What Counts As PHI Under HIPAA, and How To Know if Your App Collects It

Close-up of hands holding a phone, showing mobile data entry where apps may collect PHI and need HIPAA safeguards.
Author
Piotr Sobusiak
Published
November 5, 2025
Last update
November 5, 2025

Key Takeaways

  1. PHI is not a fixed list of fields but a status that appears when identifiable data is used for care, payment, or operations.
  2. You can spot whether your app collects PHI by tracing real user flows and noticing where people become identifiable in portals, chats, forms, logs, and prompts.
  3. Wearables, tracking tech, and LLM features become safer when you govern the interface, keep identifiers out, and use only the minimum necessary data.
  4. De-identification works best as a living process with linkability controlled, keys separated, and releases reviewed.
  5. Clear labels at ingestion and BAAs where needed turn compliance into momentum, and the full article walks you through each decision so you can move faster with confidence.

Most teams ask “Is this PHI?” as if it’s a property of the data itself. It isn’t. PHI is mostly about context: who holds the data, why they hold it, and whether a person can be identified in a healthcare-related activity.

Under HIPAA, information is considered PHI if it includes specific identifiers that can be linked to an individual and is created, maintained, or transmitted by covered entities or their business associates in connection with healthcare services.

The exact same heart-rate value can be non-PHI in a consumer fitness app and PHI inside a clinic’s care-navigation platform. If your product touches care, payment, or operations (even indirectly via vendors), assume you need guardrails until you’ve mapped the flows.

If you’re unsure whether your data flows meet the PHI threshold, we can review your app’s user journeys and tell you exactly where PHI appears, and what to change first.

A quick plain-language explanation of what is PHI under HIPAA

Protected Health Information (PHI) is any information that can identify a person and relates to their health, care, or payment for care, when that information is created, received, maintained, or transmitted by a Covered Entity (a healthcare provider, health plan, or clearinghouse) or a Business Associate (a vendor working for one). Two things must be true at the same time: the person is identifiable (directly or indirectly), and the data sits inside a healthcare purpose (care, payment, or operations). That’s why the same data point can be harmless in a general consumer app and PHI inside a clinic workflow. If you’re unsure, treat it as PHI first, then tighten scope through minimization and de-identification.

Under HIPAA, how protected health information is used or disclosed is strictly regulated, with specific rules governing access, sharing, and privacy protections.

Short answer you can read in one minute

If you (or your vendors) handle information that can single out a person and you do it because you’re delivering or supporting healthcare, you’re handling PHI. If you’re not a provider/plan/clearinghouse and you’re not working for one, you may still hold sensitive PII or state-regulated consumer health data, but HIPAA’s PHI rules may not apply. Context is the switch.

Why PHI depends on context and not on a single data field

Asking “is email PHI?” misses the point. HIPAA doesn’t bless or ban specific columns; it looks at why you have the data and who you are in the chain. An email address or phone number collected for a newsletter on a marketing site is just PII. The same email captured during patient intake at a provider is PHI because it ties an identifiable person to care.

How can the same data point be PHI in one app and not in another?

A heart-rate value inside a step-challenge app is typically outside HIPAA. Pipe that same value into a clinic’s post-op program, and it supports care decisions—now it’s PHI. Nothing about the number changed. The purpose did.

How to figure out who you are under HIPAA (Covered Entity, Business Associate, or neither)

Covered Entities are healthcare providers, health plans, and healthcare clearinghouses (entities that process health information on behalf of others). Business Associates are vendors that create, receive, maintain, or transmit PHI for a Covered Entity. HIPAA applies to these entities and to their workforce, meaning every staff member who handles protected health information and shares responsibility for compliance.

Many startups start outside HIPAA, then step into BA territory the moment a hospital uses their tool for triage, coordination, follow-ups, eligibility, or similar operational tasks. When that happens, the relationship needs a Business Associate Agreement (BAA) and HIPAA-aligned controls.

When does a startup become a Business Associate and need a BAA?

If a clinic routes patient messages through your chat, if a plan shares member files for your algorithm to run, or if your platform hosts intake forms that feed a provider’s EHR, you’re acting on behalf of a Covered Entity. That’s BA status. You’ll need a BAA with the customer and matching safeguards with your own vendors that touch PHI.

What makes information PHI: identifiability plus a healthcare purpose

PHI is built from two parts. First, identifiability: a person can be picked out directly (name, email, MRN) or indirectly (dates tied to events, precise locations, device IDs, linkable patterns). Second, healthcare purpose: the data exists because of care, payment, or operations, and a Covered Entity or BA is handling it. If both are true, it’s PHI. The health side is broad: it covers an individual’s past, present, or future physical or mental health or condition, the care provided to them, and past, present, or future payment for that care.
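As a rough illustration, the two-part test can be written as a small predicate. This is a minimal sketch for reasoning about flows, not a legal determination; the `DataContext` fields and the `Handler` enum are names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Handler(Enum):
    COVERED_ENTITY = "covered_entity"
    BUSINESS_ASSOCIATE = "business_associate"
    NEITHER = "neither"


@dataclass(frozen=True)
class DataContext:
    identifiable: bool        # can a person be picked out, directly or by linkage?
    healthcare_purpose: bool  # does the data exist because of care, payment, or operations?
    handler: Handler          # who creates, receives, maintains, or transmits it


def is_phi(ctx: DataContext) -> bool:
    """Both parts must hold, and a Covered Entity or BA must be handling the data."""
    regulated = ctx.handler in (Handler.COVERED_ENTITY, Handler.BUSINESS_ASSOCIATE)
    return ctx.identifiable and ctx.healthcare_purpose and regulated


# Same heart-rate value, two contexts (illustrative).
fitness_app = DataContext(identifiable=True, healthcare_purpose=False, handler=Handler.NEITHER)
rehab_program = DataContext(identifiable=True, healthcare_purpose=True, handler=Handler.BUSINESS_ASSOCIATE)
```

Here `is_phi(fitness_app)` is false and `is_phi(rehab_program)` is true, even though the underlying measurement is identical.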

How can people be identified directly or indirectly (even without names)?

Names and emails are obvious. Less obvious are combinations that narrow to one person: a full date of birth with clinic visit times, precise geo around a specialty clinic, a URL with a record ID, or a device identifier that can be joined to account data. HIPAA’s identifier list also names unique identifying numbers, license plate numbers and other vehicle identifiers, IP addresses, web URLs, street addresses, and geographic units such as ZIP codes or counties. In modern stacks, linkage creates identifiability even when single fields look harmless.
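One way to see linkage risk is to count how many records share each combination of seemingly harmless fields. The sketch below is a simple group-size check in the spirit of k-anonymity; the column names and the threshold `k` are assumptions for illustration, and a real assessment would weigh far more than group size.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k=5):
    """Return field combinations shared by fewer than k records."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical export with no names or emails in it.
rows = [
    {"zip3": "941", "birth_year": 1980, "clinic": "cardiology"},
    {"zip3": "941", "birth_year": 1980, "clinic": "cardiology"},
    {"zip3": "941", "birth_year": 1951, "clinic": "oncology"},
]

risky = small_groups(rows, ["zip3", "birth_year", "clinic"], k=2)
```

In this sample, the oncology row is the only record with its combination, so it is flagged even though no single field identifies anyone.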

How to check if your app collects PHI by walking real user flows

Don’t audit your schema first; audit the journey. Where does a user move from browsing to receiving care or managing coverage? Look at authenticated screens, forms, uploads, background jobs, and export paths. Follow what customer support sees, what logs capture, and what analytics collect on patient pages. If the flow exists to deliver or support care, and a person can be identified, you are collecting PHI.

Where does PHI often hide in apps (support tickets, screenshots, URLs, logs, analytics)?

Support inboxes and chat tools receive screenshots of patient dashboards or billing records. Error trackers log full URLs with IDs. Session replay and pixels on patient portals record typed fields. Cloud object keys reveal names or dates. LLM prompt stores keep clinical text pasted by staff. Each of these can carry PHI if they link an identifiable person to healthcare activity.
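For the logging case, a scrubbing filter can mask obvious identifiers before a message is written. This is a sketch using Python's standard `logging` module; the URL shape (`/patients/<id>`, `/records/<id>`) and the `portal` logger name are assumptions, and pattern-based scrubbing will miss identifiers it has no pattern for.

```python
import logging
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed URL convention: record IDs follow /patients/ or /records/.
PATH_ID_RE = re.compile(r"(/(?:patients|records)/)[^/?\s]+")


def scrub(text: str) -> str:
    """Mask email addresses and record IDs embedded in URL paths."""
    text = EMAIL_RE.sub("[email]", text)
    return PATH_ID_RE.sub(r"\1[id]", text)


class ScrubFilter(logging.Filter):
    """Rewrite each log record so the formatted message is scrubbed."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = ()  # message is already formatted
        return True


logger = logging.getLogger("portal")
logger.addFilter(ScrubFilter())
```

A filter like this is a backstop. The stronger fix is to stop putting identifiers into URLs and log messages in the first place.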

When does wearable and Apple Health data become PHI and when doesn’t?

Consumer wearables produce “health” data that often sits outside HIPAA until a provider or health plan uses it for care or operations, or a vendor handles it for them. A step count in a personal coaching app may be outside HIPAA. The same step count in a provider-sponsored cardiac rehab program, or inside a BA workflow, becomes PHI because it informs care.

What changes when a provider or plan is involved (programs, referrals, BAAs)?

The tipping point is the relationship. If a provider invites patients to sync Apple Health to guide treatment, or a plan runs an incentives program tied to benefits, the data enters a healthcare purpose. That flips context, triggers BA status for the vendor, and turns shared metrics into PHI.

What should you do about tracking technologies on patient pages and portals?

Tracking isn’t banned; capturing PHI with tracking is the problem. On authenticated patient pages, intake forms, scheduling, results, and messages, pixels and session replay can record identifiers and medical context. Configure analytics to avoid collecting PHI or disable them on these surfaces. Use server-side, aggregate measurement where possible and keep logs free of PHI.

Which pixels, session replay, and analytics should be disabled or reconfigured?

Disable broad client-side trackers on patient flows. Remove session replay on forms that can contain diagnoses, medications, or member numbers. Scrub URLs and query strings of IDs. Ensure analytics events don’t include names, dates of birth, MRNs, or free-text fields. Where tracking is required for operations, document the lawful purpose, limit fields to non-PHI metadata, and protect the data like any sensitive store.
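An allowlist is usually easier to defend than a blocklist here: only fields you have explicitly approved leave the page. The sketch below assumes hypothetical allowlists (`ALLOWED_EVENT_FIELDS`, `ALLOWED_QUERY_PARAMS`) and uses only the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlists; anything not listed is dropped.
ALLOWED_EVENT_FIELDS = {"event", "page_type", "app_version"}
ALLOWED_QUERY_PARAMS = {"tab", "lang"}


def strip_query(url: str) -> str:
    """Keep only approved query parameters and drop the fragment."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_QUERY_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def sanitize_event(event: dict) -> dict:
    """Drop every analytics field that is not explicitly approved."""
    return {k: v for k, v in event.items() if k in ALLOWED_EVENT_FIELDS}
```

With this shape, a new free-text field added by a developer is excluded by default until someone reviews it and adds it to the allowlist.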

How de-identification works in practice (Safe Harbor vs Expert Determination)

Two paths are recognized. Safe Harbor removes a specified list of identifiers, including detailed dates, small-area locations, and full-face photographs and comparable images. It’s simple but brittle in mobile and IoT contexts. The result counts as de-identified only if it no longer allows individuals to be identified and the covered entity has no actual knowledge that the remaining information could be used to identify someone.

Expert Determination relies on a qualified expert to show that re-identification risk is very small for your data, recipients, and controls. Either way, treat de-identification as an ongoing process: control joins, separate keys, monitor releases, and revisit risk when datasets or access patterns change.
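For the Safe Harbor path, three of the generalizations can be sketched in a few lines: dates reduced to the year, ages of 90 and over collapsed into one bucket, and ZIP codes cut to three digits. The `SMALL_POPULATION_ZIP3` set is a placeholder; the real list of prefixes covering 20,000 or fewer people has to come from current Census data. This covers only a slice of the identifier list and is not a complete Safe Harbor implementation.

```python
from datetime import date

# Placeholder values for illustration; load the real set from current Census data.
SMALL_POPULATION_ZIP3 = {"036", "059"}


def generalize_date(d: date) -> int:
    """Keep only the year of a date."""
    return d.year


def generalize_age(age: int) -> str:
    """Collapse ages of 90 and over into a single category."""
    return "90+" if age >= 90 else str(age)


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or '000' for low-population prefixes."""
    prefix = zip_code[:3]
    return "000" if prefix in SMALL_POPULATION_ZIP3 else prefix
```

Note that for people in the 90+ bucket, a birth year would reveal the age again, so date fields that indicate such an age need the same treatment.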

How do you prevent re-identification through joins and linkage?

Keep join keys in a separate, locked zone. Rotate tokens. Limit who can link datasets. Review exports for latent identifiers like timestamps, device IDs, or rare combinations. When sharing for analytics or model training, strip linkability at the source and verify recipients cannot re-assemble identities with data they already hold.
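A keyed token is one common way to make this concrete. The sketch below uses HMAC-SHA256 from Python's standard library; the key values shown are placeholders, and in practice the key would live in a separate, access-controlled secret store rather than in code.

```python
import hashlib
import hmac


def tokenize(identifier: str, key: bytes) -> str:
    """Derive a stable token from an identifier using a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder keys; real keys belong in a locked-down secret store.
key_release_1 = b"example-key-for-release-1"
key_release_2 = b"example-key-for-release-2"

token_a = tokenize("MRN-0001", key_release_1)
token_b = tokenize("MRN-0001", key_release_2)
```

Within one release the token is stable, so records for the same person still line up. Rotating the key between releases produces unrelated tokens, so a recipient holding both exports cannot join them on the token column alone.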

How to use LLMs without exposing PHI

Govern the interface between PHI and the model. Redact or mask identifiers before prompts. Restrict retrieval to the minimum necessary records. Store prompts and outputs in access-controlled systems with short retention. Prefer structured access—e.g., FHIR resources and a policy layer—so the model never sees more than it needs to answer.

How should model access be routed through redaction, minimum necessary data, and governed APIs?

Build a gateway that: authenticates the user, authorizes fields at query time, retrieves only required elements, redacts identifiers, and injects structured facts into the prompt. Log who asked, what was retrieved, and what the model returned. Keep analytics about model quality separate from PHI, or de-identify before analysis.
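The gateway steps above can be sketched as follows. This is a minimal illustration, not a production design: it assumes the caller is already authenticated upstream, `FIELD_POLICY` and the role names are hypothetical, the redaction patterns cover only emails and US-style phone numbers, and no model is actually called.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy: which record fields each role may send to the model.
FIELD_POLICY = {
    "care_coordinator": {"medications", "last_visit_reason"},
    "billing": {"payer", "claim_status"},
}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Mask identifiers that slip into free text; real systems need broader coverage."""
    return PHONE_RE.sub("[phone]", EMAIL_RE.sub("[email]", text))


@dataclass
class Gateway:
    audit_log: list = field(default_factory=list)

    def build_prompt(self, user_id: str, role: str, record: dict, question: str) -> str:
        allowed = FIELD_POLICY.get(role, set())  # authorize fields at query time
        facts = {k: redact(str(v)) for k, v in record.items() if k in allowed}
        lines = [f"- {k}: {v}" for k, v in sorted(facts.items())]
        prompt = "Facts:\n" + "\n".join(lines) + "\nQuestion: " + redact(question)
        # Record who asked and which fields were retrieved.
        self.audit_log.append({"user": user_id, "role": role, "fields": sorted(facts)})
        return prompt

    def record_response(self, user_id: str, response: str) -> None:
        # Outputs can carry PHI too; keep this log access-controlled.
        self.audit_log.append({"user": user_id, "response": response})


record = {
    "name": "Jane Doe",
    "mrn": "12345",
    "medications": "metoprolol 25 mg",
    "last_visit_reason": "post-op check, call 555-123-4567",
    "payer": "Acme Health",
}
gateway = Gateway()
prompt = gateway.build_prompt("u-17", "care_coordinator", record, "Any interactions to flag?")
```

The name and MRN never reach the prompt because the role's policy does not include them, and the phone number inside the free-text field is masked. An unknown role gets an empty field set by default.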

How to label data at ingestion so storage, access, and LLM use follow the tag

Label data as it enters your system, not later in BI. Mark payloads and files as PHI, PII, operational, or telemetry, including designated record sets as defined by HIPAA. Route each label to matching storage zones with encryption, access controls, backups, and retention tuned to sensitivity. Make every export, analytics job, and LLM call check the label before running.
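A small sketch of label-driven routing is shown below. The four labels come from the text above; the zone names, the `ALLOWED_USES` table, and the use names are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    PHI = "phi"
    PII = "pii"
    OPERATIONAL = "operational"
    TELEMETRY = "telemetry"


# Hypothetical storage zones and permitted uses per label.
STORAGE_ZONE = {
    Label.PHI: "restricted-phi",
    Label.PII: "restricted-pii",
    Label.OPERATIONAL: "internal",
    Label.TELEMETRY: "analytics",
}
ALLOWED_USES = {
    Label.PHI: {"care_delivery"},
    Label.PII: {"care_delivery", "account_management"},
    Label.OPERATIONAL: {"care_delivery", "account_management", "product_analytics"},
    Label.TELEMETRY: {"product_analytics"},
}


@dataclass(frozen=True)
class Labeled:
    payload: dict
    label: Label


def ingest(payload: dict, label: Label) -> Labeled:
    """Attach the label at the moment data enters the system."""
    return Labeled(payload=payload, label=label)


def zone_for(item: Labeled) -> str:
    return STORAGE_ZONE[item.label]


def check_use(item: Labeled, use: str) -> None:
    """Raise before an export, analytics job, or LLM call runs on the wrong label."""
    if use not in ALLOWED_USES[item.label]:
        raise PermissionError(f"{use} is not permitted for {item.label.value} data")
```

Because the label travels with the payload, every downstream job can call `check_use` first instead of re-deciding sensitivity field by field.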

A simple checklist to start mapping flows and assigning labels

Sketch the actual patient journey from first login to follow-up, circle every place a person can be identified, and write the label next to each touchpoint. Tag events and files at ingestion, send PHI to a restricted bucket, turn off broad tracking on patient pages, and open a BAA review with any vendor that sees PHI. You will immediately see what to fix, what to de-identify, and where to add guardrails.

Short real-world examples that show PHI decisions in common products

A fitness app pulls Apple Health for personal goals: likely outside HIPAA.
The same app white-labels for a clinic’s rehab program with clinician dashboards: PHI.
A care-navigation app for a hospital handles intake forms, chat about symptoms, and scheduling: PHI throughout.
An employer wellness app not tied to a health plan benefit may sit outside HIPAA; the moment a plan sponsors it or data flows into plan operations, the program’s data becomes PHI.
A hospital website symptom checker on authenticated paths with analytics that capture IDs: PHI can leak through tracking unless controls are in place.

Is a fitness app using Apple Health collecting PHI?

Personal coaching with no provider relationship: generally not PHI. Provider-sponsored program or plan-linked incentives: PHI.

Does a clinic coordination app make all identifiable data PHI?

Everything identifiable in that flow supports care or operations, so it’s PHI end-to-end and needs BAAs and HIPAA controls.

Is an employer wellness program PHI if the health plan is involved?

Independent wellness with no plan involvement is usually outside HIPAA. Plan-sponsored wellness tied to benefits flips to PHI.

Can a hospital symptom checker leak PHI via cookies and analytics?

If analytics or pixels capture identifiable interactions on patient pages, they may be collecting PHI and must be removed or restricted.

Quick answers to the most common founder questions about PHI

Is step count PHI? Not by default. It becomes PHI when used by or for a provider/plan in care or operations.
Is email PHI? In marketing, it’s PII. In patient intake, scheduling, or results delivery, it’s PHI.
If we hash IDs, is that de-identified? Hashing alone isn’t de-identification if the value can be reversed or linked; treat linkable hashes as identifiers.
Do we need a BAA with our cloud provider? If your cloud stores or processes PHI, yes, you need a BAA and must use services configured for HIPAA.
Can we train models on de-identified data collected as PHI? Yes, if it’s properly de-identified and you prevent re-identification through joins or downstream telemetry.
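To see why a plain hash of an ID is not de-identification, consider a small identifier space. The sketch below assumes a hypothetical six-digit record number hashed with unsalted SHA-256 and recovers the original by trying every candidate.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover(hashed: str, digits: int = 6):
    """Brute-force an unsalted hash over every ID of the given length."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if sha256_hex(candidate) == hashed:
            return candidate
    return None


leaked = sha256_hex("004217")  # hypothetical hashed record number
original = recover(leaked)
```

A million candidates is a trivial search, so the hash behaves like the identifier itself. A keyed construction with the key stored separately raises the bar, but the resulting token is still linkable and should be handled as an identifier.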

Why getting PHI right helps your product ship faster and pass reviews

Clear PHI boundaries reduce incidents, cut back-and-forth in security reviews, and shorten enterprise sales cycles. Teams move faster when architecture carries policy: labeled data, governed interfaces, BAAs in place, and tracking configured correctly. Instead of debating every field, you make consistent decisions that stand up to clinical, legal, and security scrutiny.

Key takeaways

PHI isn’t a field name; it’s identifiability plus healthcare purpose. The same data flips status when it moves from a consumer context into care, payment, or operations handled by a Covered Entity or its Business Associate.

You’ll make better, faster decisions if you map real user flows (not just schemas), label data at ingestion, and let those labels drive storage, access, exports, analytics, and LLM use. De-identification is a repeatable process, not a one-time export; protect against linkage, separate keys, and re-check risk as datasets change.

LLM features become safer when the interface is governed: minimum-necessary retrieval, redaction, auditable prompts/outputs, and structured access (e.g., FHIR) behind policy. Finally, if a vendor touches PHI you need a BAA, and tracking on authenticated patient pages must be pruned or reconfigured so it doesn’t capture identifiers or clinical context.

If you want a clean, practical path to “PHI-safe and ready to ship,” we’ll run a focused HIPAA compliance audit—map flows and labels, fix tracking, close BAA gaps, design the LLM access gateway, and lock in de-identification, then hand you a prioritized fix list. Schedule a compliance audit.

Written by Piotr Sobusiak

CTO
Piotr leads the development of innovative solutions that bridge the gap between healthcare and technology. With extensive experience in software engineering and a deep understanding of the HealthTech landscape, he focuses on creating scalable, compliant, and user-centric digital health products.
