
Building AI Features on Health Data: Momentum's Implementation Process

Author
Piotr Ratkowski
Published
April 24, 2026
Last update
April 24, 2026


Key Takeaways

  1. Most AI projects in health products stall at data preparation, not model development. FHIR normalization and PII handling typically take 2-4 weeks before any model work begins.
  2. Timeline ranges from 8-10 weeks for a basic health score on clean structured data to 14-20 weeks for multi-model, clinically validated features with FDA SaMD review.
  3. HIPAA compliance for AI inference requires audit logging tied to individual model decisions, not just data storage. Teams that skip this add 2-3 weeks at the end of the project.
  4. A data readiness audit run before development scopes which features are viable given your current data state, and prevents timeline overruns caused by discovering gaps mid-build.


When a healthtech team decides to ship their first AI feature, the first obstacle is rarely the model. It's the data.

What looks like structured health data in your EHR turns out to be a mix of FHIR R4, HL7 v2, and vendor-specific formats that require custom parsers. Your wearable integration surfaces raw device data that needs normalization before it's usable. PII handling that wasn't designed for AI inference creates problems when you start building training pipelines. And compliance questions, such as whether you need FDA SaMD classification and what HIPAA audit logging requires for AI-generated outputs, surface late and stretch the timeline.

This article covers how Momentum builds AI features for health products: the data preparation work that precedes model development, the specific feature types we've shipped, and realistic timelines broken down by complexity.

From Raw Data to Model-Ready Inputs

Data preparation is the most consistently underestimated phase in health AI projects. Teams budget for model training. They don't budget for making data trainable.

FHIR Normalization

Health data arrives from multiple sources in different formats. A care management platform might pull from three EHRs with different FHIR implementation profiles. A wearable integration layer might produce JSON payloads with proprietary field names. A lab results feed might still use HL7 v2 segments.

Before any model can use this data, it has to be normalized to a consistent schema. For Momentum, that baseline is FHIR R4. Normalization work includes mapping vendor-specific fields to standard FHIR resource types (Observation, Patient, Condition, MedicationRequest), resolving conflicting values across sources for the same patient, and handling missing or null values in a way that doesn't introduce model bias.
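As a minimal sketch of that mapping step, the function below converts a hypothetical vendor heart-rate payload (field names like "hr_bpm" and "taken_at" are illustrative, not any specific wearable vendor's schema) into a FHIR R4 Observation with a standard LOINC code and UCUM unit:

```python
# Sketch: normalize a proprietary heart-rate reading into a FHIR R4
# Observation. Vendor field names here are illustrative assumptions.

def to_fhir_observation(vendor_record: dict, patient_id: str) -> dict:
    """Map a vendor vitals record to a FHIR R4 Observation resource."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {  # LOINC 8867-4 is the standard code for heart rate
            "coding": [{"system": "http://loinc.org", "code": "8867-4",
                        "display": "Heart rate"}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": vendor_record["taken_at"],
        "valueQuantity": {
            "value": vendor_record["hr_bpm"],
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",  # UCUM code for per-minute
        },
    }

obs = to_fhir_observation(
    {"hr_bpm": 62, "taken_at": "2026-04-01T08:30:00Z"}, "123")
```

A real pipeline adds one mapper per source system plus conflict resolution when two sources report the same observation.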

PII Handling and De-identification

HIPAA's Safe Harbor de-identification method requires removing 18 specific identifiers. The Expert Determination method requires statistical analysis confirming that re-identification risk falls below a defined threshold. For AI development, which method you use affects what data you can use in training environments.

Momentum builds de-identification into the pipeline architecture, not as a post-processing step. Structured fields (names, dates, location data) get masked or generalized; free-text clinical notes require NLP-based entity recognition to catch PHI embedded in narrative.

Data Quality Assessment

  • Structured FHIR data from a well-maintained EHR: lowest overhead. Field mapping, validation, de-identification.
  • Semi-structured data from a mix of sources: requires reconciliation logic, gap analysis, handling of inconsistent timestamps.
  • Raw wearable device data: highest overhead. Sensor-level signals need aggregation, noise filtering, and alignment to clinical time windows.
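For the highest-overhead case, the sketch below shows the aggregation and noise-filtering step in miniature: raw heart-rate samples are filtered against a crude physiological plausibility range (the 25-220 bpm bounds are illustrative) and averaged into hourly windows:

```python
from collections import defaultdict
from statistics import mean

# Sketch: aggregate raw wearable heart-rate samples into hourly
# clinical windows, dropping implausible readings first.

def hourly_hr(samples: list[tuple[str, float]]) -> dict[str, float]:
    """samples: (ISO-8601 timestamp, bpm). Returns mean bpm per hour."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for ts, bpm in samples:
        if 25 <= bpm <= 220:              # crude noise filter
            buckets[ts[:13]].append(bpm)  # bucket key: "YYYY-MM-DDTHH"
    return {hour: round(mean(vals), 1) for hour, vals in buckets.items()}
```

The same pattern extends to steps, sleep stages, or SpO2 with different windows and plausibility bounds.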

AI Features Momentum Has Built

Health scores. A composite output derived from multiple data sources: wearable signals, lab results, activity data, sleep metrics. Momentum has built health scores on top of Open Wearables deployments across Garmin, Whoop, and Apple Health data.
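In its simplest rules-based form, a composite score is a weighted sum of pre-normalized sub-scores. The weights and inputs below are illustrative, not Momentum's production formula:

```python
# Sketch of a rules-based composite health score. Weights and the
# 0-100 sub-scores are illustrative assumptions.

WEIGHTS = {"sleep": 0.40, "activity": 0.35, "resting_hr": 0.25}

def health_score(sub_scores: dict[str, float]) -> float:
    """Each sub-score is pre-normalized to 0-100; returns the weighted
    composite on the same 0-100 scale."""
    return round(sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS), 1)
```

The hard part is upstream: producing comparable 0-100 sub-scores from Garmin, Whoop, and Apple Health data that report the same signal differently.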

Anomaly detection. Three approaches: threshold-based (values outside a defined range), statistical (deviation from patient baseline), and ML-based (patterns too complex for rules). For medication adherence anomalies in a consistent patient population, a statistical baseline model is often sufficient and more explainable.
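The statistical approach reduces to a small amount of code, which is part of why it is more explainable. A sketch, flagging a new reading that sits more than three standard deviations from the patient's own baseline (the 3-sigma cutoff is a common convention, not a clinical standard):

```python
from statistics import mean, stdev

# Sketch of statistical anomaly detection: deviation from the
# patient's own baseline, measured in standard deviations.

def is_anomalous(baseline: list[float], new_value: float,
                 z_max: float = 3.0) -> bool:
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:                 # flat baseline: any change is a flag
        return new_value != mu
    return abs(new_value - mu) / sigma > z_max
```

An anomalous flag is a one-sentence explanation ("26 standard deviations above this patient's baseline"), which matters in clinical review.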

Coaching recommendations. Require a recommendation logic layer, sufficient user history for personalization, and safety guardrails specific to the health context: avoiding suggestions that conflict with a user's conditions or medications, flagging when trends warrant clinical review rather than a coaching nudge.
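One of those guardrails can be sketched as a contraindication filter that drops suggestions tagged as conflicting with a user's conditions. The condition-to-blocked-tags map is entirely illustrative:

```python
# Sketch of a safety guardrail: filter out coaching suggestions that
# conflict with a user's conditions. The mapping is illustrative.

CONTRAINDICATIONS: dict[str, set[str]] = {
    "atrial_fibrillation": {"high_intensity_cardio"},
    "type_1_diabetes": {"fasting_protocol"},
}

def safe_suggestions(suggestions: list[dict],
                     conditions: set[str]) -> list[dict]:
    blocked: set[str] = set()
    for condition in conditions:
        blocked |= CONTRAINDICATIONS.get(condition, set())
    return [s for s in suggestions if not blocked & set(s["tags"])]
```

In production this filter sits after the recommendation model, so model updates cannot bypass it.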

Risk stratification. Identifies patients at elevated risk within a population for targeted care management outreach. If the feature crosses into FDA SaMD territory, it triggers a different compliance track.

Timeline by Complexity

  • Basic. Example: health score from existing structured FHIR data. Total timeline: 8-10 weeks. Key drivers: clean data available, single EHR source, rules-based scoring.
  • Moderate. Example: anomaly detection on wearable + EHR data. Total timeline: 10-14 weeks. Key drivers: multi-source normalization, statistical model training, baseline data requirements.
  • Full-featured. Example: personalized coaching with clinical validation. Total timeline: 14-20 weeks. Key drivers: multi-model architecture, FDA SaMD review, clinical sign-off, compliance audit.

Data availability is the single biggest driver of timeline variance. The 8-10 week range for a basic health score assumes data is already in a usable state. If it isn't, add 2-4 weeks for the data pipeline phase. Moving from one to three data sources typically adds 3-4 weeks to the data pipeline phase.

Compliance Factors

FDA SaMD classification. A wellness score describing trends is different from a feature informing a clinical decision. SaMD classification adds a compliance track that includes predicate device analysis, clinical validation requirements, and potentially a 510(k) submission. Momentum assesses SaMD classification risk at the start of every project.

HIPAA audit logging for AI inference. When a model generates a recommendation or risk score using a patient's health data, that inference is a PHI access event requiring a log: what data was used, which model version ran, what output was produced, and when. Retrofitting inference-level audit logging typically adds 2-3 weeks to a project that believed it was complete.
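A minimal inference-level audit record capturing those four elements might look like the sketch below. Field names are illustrative; hashing the inputs (rather than copying them) is one design choice that proves what data was used without duplicating PHI into the log store:

```python
import hashlib
import json
from datetime import datetime, timezone

# Sketch of an inference-level audit record: what data was used,
# which model version ran, what it produced, and when.

def log_inference(patient_id: str, inputs: dict,
                  model_version: str, output) -> dict:
    return {
        "event": "ai_inference",
        "patient_id": patient_id,
        # SHA-256 over canonical JSON: verifiable without storing PHI.
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()).hexdigest(),
        "model_version": model_version,
        "output": output,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # In production, the record goes to an append-only log store.
```

Because the record references a model version, the log stays meaningful after retraining, which is what "tied to individual model decisions" requires.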

Model monitoring. Every production deployment Momentum ships includes drift detection on input distributions, performance metric tracking, and automated alerts when model behavior falls outside defined bounds.
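One common way to implement input-distribution drift detection is the Population Stability Index (PSI) over binned feature distributions. The sketch below uses the conventional 0.2 alert threshold; the binning and threshold are general practice, not Momentum-specific values:

```python
from math import log

# Sketch of input drift detection via Population Stability Index.
# Inputs are bin proportions (each list sums to 1); higher = more drift.

def psi(expected: list[float], actual: list[float],
        eps: float = 1e-4) -> float:
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        score += (a - e) * log(a / e)
    return score

def drifted(expected: list[float], actual: list[float],
            threshold: float = 0.2) -> bool:
    """Common convention: PSI > 0.2 signals significant drift."""
    return psi(expected, actual) > threshold
```

The expected distribution comes from training data; the actual distribution is recomputed over a rolling production window, and a `drifted` result fires the alert.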

Clinical validation. For features touching clinical workflows or informing care decisions, a clinician needs to review model behavior before production release. Timeline ranges from one to three weeks.

Where to Start

Before any model development begins, Momentum recommends a data readiness audit. In 1-2 weeks, it produces a map of available data sources, an assessment of quality and schema consistency, a classification of HIPAA-sensitive data and applicable de-identification approach, and a list of which AI features are viable given the current data state.

For more on the interoperability layer that feeds AI pipelines, see our Healthcare Interoperability Solutions article.

Frequently Asked Questions

How long does it take to build AI features for a health app?
Timeline depends on complexity and data readiness. A basic health score built on clean, structured FHIR data takes 8-10 weeks. Anomaly detection on combined wearable and EHR data takes 10-14 weeks. Full-featured personalized coaching with clinical validation and FDA SaMD review takes 14-20 weeks.

If your data isn't ready, add 2-4 weeks for the normalization and de-identification pipeline before any model work begins. A data readiness audit before development determines which bucket you're in.

What is FDA SaMD classification and when does it apply to health AI?
Software as a Medical Device (SaMD) is an FDA classification for software that performs a medical purpose without being part of a hardware medical device. It applies when your software informs, drives, or influences clinical decisions rather than just providing general wellness information.

A feature that displays health trends is typically not SaMD. A feature that flags a patient as high-risk for readmission and triggers a clinical intervention may be. SaMD classification adds a compliance track including predicate device analysis, clinical validation, and potentially a 510(k) submission. Momentum assesses SaMD classification risk at the start of every AI project.

What is FHIR normalization and why does it matter for health AI?
FHIR normalization is the process of converting health data from multiple sources and formats into a consistent FHIR R4 schema. Different EHRs use different field names, coding systems (ICD-10, SNOMED CT, LOINC, RxNorm), and resource structures for the same clinical concepts.

Before AI models can work with this data, it must be mapped to a unified schema, conflicts resolved, and missing values handled consistently. Without normalization, models trained on data from one EHR perform poorly on data from another. See our interoperability article for more on how normalization pipelines work.

What is a data readiness audit?
A data readiness audit is a 1-2 week assessment Momentum runs before any AI development begins. It maps available data sources, assesses data quality and schema consistency, classifies HIPAA-sensitive fields and identifies the applicable de-identification approach, and produces a list of which AI features are viable given the current data state.

The audit determines whether the project falls in the 8-10 week range or the 14-20 week range and surfaces normalization and compliance work that would otherwise appear mid-project. Teams that run the audit first move faster through implementation and have fewer timeline surprises during compliance review.

Does HIPAA require audit logging for AI-generated outputs?
Yes. When an AI model generates a recommendation or risk score using a patient's protected health information, that inference is a PHI access event that requires an audit log under HIPAA. The log must capture what data was used, which model version ran, what output was produced, and when.

Many teams discover this requirement late in development and add 2-3 weeks to retrofit inference-level logging after the fact. Momentum builds HIPAA audit logging for AI inference into the architecture from the start of every project.

Written by Piotr Ratkowski

Head of Growth
Grows Momentum's client portfolio and advises HealthTech teams on product strategy, market positioning, and where AI actually makes a difference. Writes about the trends and decisions shaping digital health.


Ready to add AI to your health product?

Let's Create the Future of Health Together

Start with a data readiness audit. Momentum maps your available data sources, identifies quality gaps, classifies what's HIPAA-sensitive, and outlines which AI features are viable given what you have.

Looking for a partner who not only understands your challenges but anticipates your future needs? Get in touch, and let’s build something extraordinary in the world of digital health.
