Key Takeaways
- The accuracy of PPG recordings in wrist-worn wearables depends heavily on the contact pressure between the sensor and the skin.
- HRV estimation error jumps from 0.89ms to 10.95ms when wrist contact pressure moves away from optimal, a 12x accuracy drop from one physical variable.
- Wrist PPG shifts through five morphology types as contact pressure changes; only Type 2L produces signals reliable enough for accurate recordings.
- Fingertip sensors outperform wrist PPG for heart rate and HRV accuracy because capillary density is higher and contact pressure is more stable across wearing conditions.
Health products that measure HRV without signal quality awareness are building on variable data; adding quality context to your pipeline changes outcomes.
Every smartwatch on the market promises continuous heart rate monitoring. Millions of people check their heart rate and heart rate variability (HRV) scores each morning, trust recovery metrics before training sessions, and use sleep data to adjust their routines.
There is a measurement problem most health apps do not discuss: the quality of the PPG signal that produces those heart rate and HRV readings depends heavily on how the watch sits on the wrist. A new dataset published in Nature Scientific Data quantifies exactly how much this matters, and the results should change how developers think about wearable-derived heart rate and HRV data [1,2].
What is PPG used for? HRV explained
Wearables use photoplethysmography (PPG) to derive heart rate in beats per minute (bpm) and HRV in milliseconds (ms). HRV refers to the variation in time between individual heartbeats. It is measured as the interval between successive R peaks on an ECG (the RR interval) or between the equivalent pulse peaks detected by a PPG sensor.
Consumer wearables most commonly report HRV as RMSSD (Root Mean Square of Successive Differences), which emphasizes short-term, parasympathetic variability. SDNN (Standard Deviation of NN intervals) is more widely used in clinical settings, capturing both short- and long-term variability components. Both metrics are derived from the same NN-interval data, and both depend on the same PPG signal quality [3].
This distinction matters for developers: when integrating with a wearables data platform that normalizes HRV across providers, you should verify which metric each device actually reports before building scoring logic on top of it.
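To make the distinction concrete, here is a minimal sketch of both metrics computed from the same NN-interval series. The function names and the sample intervals are illustrative, not from the study:

```python
import math

def rmssd(nn_ms):
    """Root Mean Square of Successive Differences: short-term, parasympathetic variability."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def sdnn(nn_ms):
    """Standard deviation of NN intervals: overall variability (short- and long-term)."""
    mean = sum(nn_ms) / len(nn_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in nn_ms) / len(nn_ms))

# The same NN-interval series (ms between consecutive beats) feeds both metrics.
nn = [812, 790, 845, 801, 828, 795, 860, 810]
print(f"RMSSD: {rmssd(nn):.1f} ms, SDNN: {sdnn(nn):.1f} ms")
```

Because both functions consume the same interval series, any error in pulse-peak detection propagates into both metrics, which is exactly why PPG signal quality matters for either one.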
What is SDNN?
A higher SDNN value generally indicates healthy autonomic nervous system function, meaning the heart appropriately speeds up and slows down in response to demands. A lower SDNN can signal accumulated fatigue, chronic stress, or cardiovascular strain. The metric has clinical standing: guidelines from the European Society of Cardiology associate SDNN below 50ms with elevated cardiovascular risk in long-term recordings [4].
Consumer wearables usually record HRV using RMSSD during sleep or in short morning sessions, which means their absolute numbers are not directly comparable to clinical benchmarks. This context matters when building health products. A user with an HRV of 35ms on a 5-minute wrist reading is not in the same category as a patient with 35ms on a 24-hour Holter. The wearable and clinical numbers live in different reference frames.
What the WF-PPG Research Found
Researchers at Singapore Management University and Eindhoven University of Technology built a custom measurement device that records PPG signals from the wrist and fingertip simultaneously while precisely controlling contact pressure between the sensor and the skin.

They tested 27 participants across 6 sessions each, gradually increasing wrist contact pressure while recording three PPG wavelengths (green, red, infrared). For ground truth, they simultaneously recorded ECG from a Polar H10 chest strap, blood pressure from an Omron monitor, and oxygen saturation from a pulse oximeter [1].
The core finding: as contact pressure changes, wrist PPG waveforms shift through five distinct morphology types. At optimal pressure, wrist PPG produces a clean waveform (Type 2L) with a visible dicrotic notch, the small feature indicating transition from systole to diastole. Too little pressure, and the signal loses its diastolic component. Too much, and the waveform degrades into noise.
The practical impact on cardiac accuracy: at optimal contact pressure, wrist PPG heart rate error is 0.37 bpm and HRV (RMSSD) error is 0.89ms. When pressure drifts from the optimal range, HRV error jumps to 10.95ms. That is a 12x accuracy drop from a single physical variable [1].
Fingertip PPG, by comparison, maintained consistent accuracy regardless of conditions: 0.08 bpm heart rate error and 0.15ms RMSSD error [1].
The Five PPG Morphology Types
The dataset categorizes wrist PPG waveforms into five types based on how contact pressure distorts the signal [1]:
Type 1: Single centralized peak. Occurs at lower contact pressures. The missing diastolic information makes it unreliable for HRV estimation.
Type 2E: Two peaks with similar amplitude (less than 10% difference). Slightly insufficient contact pressure. Better than Type 1, but the dicrotic notch is unclear.
Type 2L (Ideal): Two peaks with the systolic peak clearly dominant (10% or more larger). Visible dicrotic notch. This is the waveform clinical algorithms expect, produced at optimal contact pressure.
Type 1L: Only the systolic peak remains visible. Excessive pressure compresses blood vessels, eliminating the diastolic component.
Type 3: Characterized by low signal-to-noise ratio. Extreme contact pressure makes the waveform uninterpretable.

The transition from Type 1 through Type 2L to Type 3 follows a predictable pattern as pressure increases. The exact pressure thresholds vary between individuals, meaning there is no universal "correct" tightness for a wrist-worn device.
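As a rough illustration of how such a classification could be automated, the sketch below assigns a morphology type from the detected peak amplitudes of a single beat, using the 10% amplitude threshold from the dataset's definitions. The function name, the normalized-amplitude input, and the noise-floor value are assumptions; a production classifier would also need peak detection upstream and the pressure reading to separate Type 1 from Type 1L:

```python
def classify_beat(peaks, noise_floor=0.1):
    """Classify one PPG beat by its detected peak amplitudes (normalized 0-1).

    `peaks` lists the peak amplitudes within one cardiac cycle, systolic first.
    Thresholds mirror the dataset's definitions: Type 2E when the two peaks
    differ by less than 10%, Type 2L when the systolic peak is >=10% larger.
    """
    if not peaks or max(peaks) < noise_floor:
        return "Type 3"          # signal lost in noise: extreme pressure
    if len(peaks) == 1:
        # A lone peak is Type 1 (low pressure) or Type 1L (high pressure);
        # telling them apart needs the pressure reading, not the waveform alone.
        return "Type 1 / 1L"
    systolic, diastolic = peaks[0], peaks[1]
    ratio = (systolic - diastolic) / systolic
    return "Type 2L" if ratio >= 0.10 else "Type 2E"

print(classify_beat([1.0, 0.7]))   # clear systolic dominance -> Type 2L
print(classify_beat([1.0, 0.95]))  # near-equal peaks -> Type 2E
```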
Why This Matters for Health Product Builders
If you are building features that depend on wrist-worn PPG data (sleep staging, recovery scores, HRV trends, stress detection), this research quantifies something practitioners have known anecdotally: wrist data quality fluctuates.
A user who wears their watch loosely during sleep gets different signal quality than someone who cinches it tight during a workout. The same user's data quality changes when they move their wrist, when ambient temperature shifts, or when skin moisture varies.
This does not mean wrist PPG data is useless. It means the gap between having data and having reliable data is real and measurable. Products that treat raw PPG-derived metrics as ground truth are building on variable foundations.
The 12x RMSSD accuracy difference between optimal and sub-optimal contact pressure is not an edge case. It is the everyday reality of wrist-worn measurement. Users do not calibrate their watch band tension before each reading.
For teams working on patient-level use cases, combining wearable data with clinical context matters even more. The patient EHR and wearables integration problem is partly a data quality problem: clean wearable signals are more useful for clinical decision support than noisy ones.
Implications for Platform Architecture
This research points to a structural need in wearable health platforms: signal quality awareness needs to be part of the data pipeline, not an afterthought.
Scoring layers matter. Raw data aggregation (averaging heart rates, summing sleep minutes) does not account for the quality of the underlying signal. Health scores that incorporate signal confidence intervals give downstream applications better information to work with. This is one reason the Open Wearables MCP server is designed to carry metadata alongside raw metrics: context about the source, device, and recording conditions.
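One way to carry that context is to attach a confidence score to every reading and let downstream aggregation weight by it. A minimal sketch, with field names that are illustrative rather than the Open Wearables MCP schema:

```python
from dataclasses import dataclass

@dataclass
class HrvReading:
    """An HRV sample carried with its provenance and a signal-confidence score."""
    rmssd_ms: float
    source: str        # e.g. "apple_watch", "oura_ring" (illustrative labels)
    sensor_site: str   # "wrist" or "finger"
    quality: float     # 0.0-1.0 signal confidence from upstream heuristics

def weighted_trend(readings):
    """Quality-weighted mean RMSSD: noisy windows contribute less to the trend."""
    total = sum(r.quality for r in readings)
    return sum(r.rmssd_ms * r.quality for r in readings) / total

readings = [
    HrvReading(42.0, "oura_ring", "finger", 0.95),
    HrvReading(55.0, "apple_watch", "wrist", 0.40),  # loose band: low confidence
]
print(f"{weighted_trend(readings):.1f} ms")
```

The point of the design is that a plain average would treat both samples equally, while the weighted version lets the high-confidence finger reading dominate.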
Open algorithms enable validation. When a health score is a black box, there is no way to verify whether it accounts for signal quality variation. Open, auditable scoring algorithms let developers and researchers inspect the reasoning chain from raw sensor data to derived metric. Opaque vendor scoring does not.
Normalization across devices is necessary but not sufficient. Unifying the data format from Garmin, Apple Watch, and Whoop solves the interoperability problem. The Open Wearables 0.3 release added Google Health Connect and Samsung Health support, bringing more device ecosystems under a single normalized API. But if a platform does not account for measurement quality differences between devices and wearing conditions, the unified data still inherits all the noise from each source.
AI coaching needs accurate inputs. Health intelligence applications, including AI-powered coaching that interprets trends, detects anomalies, and generates personalized recommendations, depend on HRV data as core inputs. Signal quality context is not optional for these use cases.
This is the difference between health data and health intelligence. Data is the number. Intelligence includes the context: where the data came from, how reliably it was collected, what it means in isolation, and how it combines with other signals.
What Developers Can Do
Use open datasets for validation. The WF-PPG dataset is publicly available on Figshare [2]. If you are building algorithms that use PPG-derived metrics, test them against data with known signal quality variation. Your algorithm might perform well on clean lab data and poorly on the variable signals users actually generate.
Build quality-aware pipelines. Consider adding signal quality indicators to your data model. Even simple heuristics (waveform regularity, signal-to-noise ratio) can help downstream features distinguish high-confidence readings from noisy ones. If you are building on Apple Health SDK data, Apple devices report heart rate confidence levels that can serve as proxy quality indicators.
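A heuristic along those lines can be as small as a coefficient-of-variation check on beat intervals; motion artifacts and poor contact inflate apparent variability far beyond physiological HRV. A sketch, where the 0.25 cutoff is an assumed, untuned threshold:

```python
def interval_regularity(nn_ms, max_cv=0.25):
    """Flag windows whose beat-to-beat intervals vary implausibly.

    Uses the coefficient of variation (std / mean) of NN intervals as a
    crude quality proxy for the window.
    """
    mean = sum(nn_ms) / len(nn_ms)
    var = sum((x - mean) ** 2 for x in nn_ms) / len(nn_ms)
    cv = (var ** 0.5) / mean
    return {"cv": cv, "usable": cv <= max_cv}

print(interval_regularity([812, 790, 845, 801]))        # steady beats -> usable
print(interval_regularity([400, 1200, 500, 1100]))      # erratic intervals -> flagged
```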
Account for device differences. Finger-based sensors (like Oura Ring) produce fundamentally different signal quality than wrist-based sensors (like Apple Watch or Garmin). If your platform supports multiple device types, your HRV scoring should reflect those measurement characteristics. For teams building multi-source health platforms, combining wearable data with health record data adds clinical context that helps interpret outliers.
Consider multi-signal approaches. The research shows that combining wrist and finger PPG improves reliability. Platforms that support multiple data sources per user (wrist watch plus ring) can cross-validate measurements, which reduces reliance on any single device's signal quality.
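A cross-validation step could look like the following sketch, which trusts the finger reading when the two sites agree and flags the window when they diverge. The 20% tolerance is an assumption for illustration, not a value from the study:

```python
def cross_validate(wrist_rmssd, ring_rmssd, max_rel_diff=0.20):
    """Cross-check two simultaneous RMSSD estimates from different sensor sites.

    When wrist and finger agree within tolerance, report the finger value
    (the more stable site per the WF-PPG findings); when they diverge,
    flag the window rather than trust either reading alone.
    """
    rel_diff = abs(wrist_rmssd - ring_rmssd) / ring_rmssd
    if rel_diff <= max_rel_diff:
        return {"rmssd": ring_rmssd, "confident": True}
    return {"rmssd": None, "confident": False}

print(cross_validate(40.0, 42.0))  # sites agree: high confidence
print(cross_validate(20.0, 42.0))  # large disagreement: flag the window
```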
{{lead-magnet}}
The Bigger Picture
The wearable health market keeps growing. More devices, more users, more data. But the quality of that data varies more than most product teams realize.
Datasets like WF-PPG make this variability visible and measurable. That shifts the conversation from "how much data can we collect" to "how reliable is the data we have."
For companies building health products, the competitive advantage is not just in accessing wearable data. It is in understanding what that data actually means, accounting for its limitations, and turning noisy measurements into reliable health insights. That is the platform challenge: moving from raw HRV numbers to actionable health intelligence.
If your team is building on wearable data and needs help designing quality-aware pipelines or navigating the device integration landscape, talk to Momentum.
Related Reads:
- Open Wearables 0.3 Release: Google Health Connect and Samsung Health support, and what expanded device coverage means for health platform developers
- Wearables Integration Service: Momentum's support for building production-grade wearable data pipelines
- EHR Integration Service: Connecting wearable health data to electronic health records
- Open Wearables on GitHub: Open-source platform for health intelligence, from wearable data to AI coaching
References:
[1] Ho, M.Y. et al. (2025). WF-PPG paper. Scientific Data, 12, 200. [doi link]
[2] Ho, M.Y. et al. (2024). WF-PPG Dataset. Figshare. [doi link]
[3] Shaffer & Ginsberg (2017). HRV Metrics and Norms. Frontiers in Public Health, 5, 258. [doi link]
[4] Task Force of the ESC (1996). HRV Standards. Circulation, 93(5). [doi link]
Frequently Asked Questions
How should developers account for PPG signal quality when building HRV features?
Test algorithms against datasets with known signal quality variation, not just clean lab data. Add signal quality indicators to your data model where devices provide them. Account for measurement differences between device types (finger vs wrist sensors) in your HRV scoring logic. If your product depends heavily on HRV accuracy, multi-device validation (cross-referencing wrist and ring data) adds meaningful confidence to your readings. The wearables integration layer you choose should provide access to device metadata, not just normalized metrics.