Personal health data needs local storage, not cloud

Health data inside a corporate platform means health data inside a jurisdiction you do not control, encrypted or not. Self-hosting it locally—with proper backups and audit trails—trades convenience for actual ownership; for medical records, that trade is worth making.

Why Copilot Health is a reminder to keep personal records offline

Microsoft Copilot Health aggregates data from hospitals, wearables, and health apps into a single AI-accessible layer. The pitch is convenience: one place to view your records, ask questions, get summaries. The privacy pledge that comes with it is equally polished. What it cannot offer, and what no vendor-controlled platform can offer, is actual ownership of your own health data.

The distinction matters because health data is not ordinary personal data. GDPR Article 9 classifies it as a special category, alongside genetic data, biometric identifiers, and data revealing racial or ethnic origin. Processing it requires explicit consent or a recognised legal basis, and the bar for both is higher than for standard PII handling. That classification applies regardless of whether the data sits on a server in Dublin or in Texas. GDPR’s reach follows the data subject, not the data centre postcode.

This is where the vendor promise starts to fray. Microsoft holds an EU Data Boundary commitment, and Copilot Chat became a Core Online Service under that boundary in September 2025. That means Microsoft commits to storing and processing EU data within EU borders for certain services. What it does not change is the corporate structure. Microsoft is a US-incorporated entity, and the CLOUD Act, passed in 2018, grants US law enforcement the authority to compel US-headquartered companies to produce data regardless of where that data physically lives. EU residency of a server is not the same as EU jurisdictional control. GDPR Article 48 prohibits simply handing data to foreign authorities without an international agreement, but the structural tension between the two laws has not been resolved by any data residency commitment from any US hyperscaler. Storing your scan results in an Amsterdam data centre owned by a Seattle-headquartered company does not close that gap.

There is also the question of what happens inside the platform before any government agency becomes relevant. Microsoft 365 Copilot had a documented bug, confirmed in early 2026, in which the model read, summarised, and surfaced emails marked as confidential, including protected health information, ignoring sensitivity labels for weeks. The liability position in that situation is instructive. AI health platforms routinely disclaim responsibility for the accuracy of outputs and for unintended data exposure caused by misconfiguration or model behaviour. That responsibility lands with the person or entity who chose to put the data there. When that person is you, and the data is your own medical history, the transfer of risk is complete.

Vendor lock-in compounds the problem quietly. Health records that live inside a platform you cannot export cleanly are records you do not fully control. The export path may exist in theory, but in practice it often means an incomplete format, missing metadata, no versioning, and no audit trail of what was accessed and when. By the time you want to leave, or by the time the platform changes its terms, the practical cost of extraction is high enough that most people absorb it and stay.

The self-hosted alternative is not frictionless, but it is honest about what it is. Running Nextcloud on your own hardware gives you a local health records store with role-based access controls, server-side encryption at rest (plus optional end-to-end encryption for selected folders), and an audit log you can actually read. The default configuration requires explicit share grants; nothing is accessible without a deliberate decision. You set the retention period. You decide what syncs and what does not. There is no background model reading your files to surface a summary.
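Enabling encryption at rest and the audit trail is a one-time step via Nextcloud's occ admin tool. A minimal sketch, assuming a stock install under /var/www/nextcloud with www-data as the web server user; adjust both for your distribution:

```shell
# Assumed layout: Nextcloud root in /var/www/nextcloud, web user www-data.
cd /var/www/nextcloud

# Server-side encryption at rest for stored files.
sudo -u www-data php occ app:enable encryption
sudo -u www-data php occ encryption:enable

# Audit trail: records logins, file access, shares, permission changes.
sudo -u www-data php occ app:enable admin_audit
```

Note that server-side encryption protects files on disk; the separate end-to-end encryption app, enabled per folder from a client, keeps keys off the server entirely.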

Getting offline backup right for health records means three things: local encryption before the backup leaves the machine (VeraCrypt volumes or encrypted ZFS datasets work for this), versioned snapshots so you can roll back to a known good state rather than losing a corrupted file, and at least one air-gapped copy that never touches a network. The air-gapped copy is the one that matters when everything else goes wrong. An encrypted USB drive stored somewhere physically separate from the server is not elegant, but it is yours, and no court order served on a US company can compel its contents.
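The three parts above can be sketched as a short script. This is a hedged illustration, not a hardened backup tool: the /tmp demo paths stand in for a real records folder and backup disk, openssl with a passphrase file stands in for VeraCrypt or ZFS encryption, and the 12-archive retention count is arbitrary.

```shell
#!/bin/sh
# Sketch of the three-part backup: encrypt locally, keep versioned
# snapshots, copy one archive to air-gapped media by hand.
set -eu

SRC=/tmp/health-records     # stand-in for the real records folder
SNAP_DIR=/tmp/snapshots     # versioned snapshot store
PASSFILE=/tmp/backup-pass   # passphrase file; keep it mode 0600 in real use

# Demo data so the sketch runs end to end.
mkdir -p "$SRC" "$SNAP_DIR"
echo "2026-01-12 bloodwork: all values in range" > "$SRC/labs.txt"
echo "correct horse battery staple" > "$PASSFILE"
chmod 600 "$PASSFILE"

STAMP="$(date +%Y%m%d-%H%M%S)"

# 1. Encrypt locally, before anything leaves the machine.
tar -C "$SRC" -cf - . \
  | openssl enc -aes-256-cbc -pbkdf2 -salt \
      -pass "file:$PASSFILE" -out "$SNAP_DIR/health-$STAMP.tar.enc"

# 2. Versioned snapshots: keep the newest 12 archives, prune the rest.
ls -1t "$SNAP_DIR"/health-*.tar.enc | tail -n +13 | xargs -r rm --

# 3. Air-gapped copy: run by hand with the USB drive mounted, then unmount.
# cp "$SNAP_DIR/health-$STAMP.tar.enc" /mnt/usb/ && sync

# Verify the newest archive decrypts and lists its contents.
NEWEST="$(ls -1t "$SNAP_DIR"/health-*.tar.enc | head -n 1)"
openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$PASSFILE" -in "$NEWEST" \
  | tar -tf -
```

The verify step is worth keeping in any real version: a backup you have never restored from is a hope, not a backup.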

Privacy by design in practice means keeping PII out of logs from the start. Nextcloud’s logging level should be set to warning in config.php rather than debug; debug logs will capture file paths and user activity in ways that leak information. Restrict sync scope to a dedicated health records folder rather than giving a sync client access to the entire instance. Set a data retention period and stick to it: if you no longer need a document, delete it and empty the trash. The goal is to minimise the surface area of what exists, not to encrypt an ever-growing pile of data and hope the encryption holds forever.
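Both settings can be applied through occ rather than editing config.php by hand. A sketch assuming the same /var/www/nextcloud layout as above; the 30-day trash retention window is an example value, not a recommendation:

```shell
cd /var/www/nextcloud

# loglevel 2 = warning; debug (0) would log file paths and user activity.
sudo -u www-data php occ config:system:set loglevel --value=2 --type=integer

# Retention: files in the trash are purged after at most 30 days.
sudo -u www-data php occ config:system:set trashbin_retention_obligation --value="30, 30"
```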

The audit gap that cloud-hosted solutions leave open is real and structural. When your health data lives on someone else’s infrastructure, your view of what has been accessed, by which service, at what time, is whatever that vendor chooses to show you. On your own hardware, the audit log is yours. You can check it, ship it to a SIEM if you want, or grep it at 2am without asking anyone’s permission. That is not a minor operational detail. For data classified as special category under GDPR, knowing precisely what was accessed and when is part of GDPR compliance itself.
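The 2am grep is literal. Nextcloud writes JSON lines to its log file, and with the admin_audit app enabled, audit entries are tagged with the app name. A sketch using a sample log in /tmp as a stand-in for your instance's data/nextcloud.log; the field names match Nextcloud's default JSON log format, but verify against your own instance:

```shell
LOG=/tmp/nextcloud.log   # stand-in for your instance's data/nextcloud.log

# Sample entries so the sketch runs end to end.
cat > "$LOG" <<'EOF'
{"reqId":"abc123","level":1,"time":"2026-02-01T02:13:44+00:00","user":"alice","app":"admin_audit","method":"GET","message":"File accessed: /health/labs.pdf"}
{"reqId":"def456","level":2,"time":"2026-02-01T02:14:02+00:00","user":"alice","app":"files","method":"GET","message":"unrelated entry"}
EOF

# Who touched what, and when: one tab-separated row per audit event.
jq -r 'select(.app == "admin_audit") | [.time, .user, .message] | @tsv' "$LOG"
```

The same one-liner pointed at the real log answers the question no cloud dashboard is obliged to: exactly which files were read, by whom, at what time.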

The trade-off is time. Self-hosting Nextcloud on a small machine, a Raspberry Pi 5 or a repurposed thin client, takes an afternoon to set up properly. Maintaining it, applying updates, checking backup integrity, rotating encryption keys, takes a few hours a year. The cloud alternative takes ten minutes to sign up for and costs that time plus the ongoing uncertainty about where your data actually is, who can reach it, and what the platform will do with it next. That is not a comparison that favours the cloud for data this sensitive.