Video file naming and structure for decade-scale retrieval

A video archive you cannot read a decade later is just a warm drive waiting to die. Ambiguous filenames and missing verification schedules kill cold storage video archival; fix those two things, and the rest is manageable.

Naming files so they survive decade-scale storage

The only sort key that works across every file system, every OS, and every decade is an ISO 8601 date prefix: YYYY-MM-DD. Put it at the very start of the filename. 2024-07-14 sorts correctly on ext4, APFS, NTFS, and any file system not yet invented, because lexical order and chronological order coincide. A name like holiday_july_14.mkv is ambiguous from day one and completely useless once you have twenty years of files mixed together.

After the date, embed the source, resolution, and codec directly in the filename. A naming pattern like 2024-07-14_cornwall-coast_4k_h265.mkv tells you everything without opening the file. Source identifies the camera or capture device; resolution and codec matter because your playback capability in 2035 may be different to today, and you need to know which files require transcoding before you can use them.

Keep the path strictly ASCII, lower case, and hyphen-separated. Spaces break command-line tools. Parentheses, ampersands, and locale-specific characters such as accented vowels cause silent truncation or rename failures on Windows paths. You will not remember which convention you used in five years; pick ASCII hyphens now and never deviate.
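If your cameras and phones produce arbitrary names, it is easier to normalise them mechanically than by hand. A minimal sketch in bash; the function name is illustrative:

```bash
# sanitize_name: lowercase a proposed filename, turn spaces and
# underscores into hyphens, and delete anything outside a-z, 0-9,
# dot, and hyphen, so the result is safe on any filesystem and shell.
sanitize_name() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr ' _' '--' \
    | tr -cd 'a-z0-9.-'
}

sanitize_name "2024-07-14 Cornwall Coast (4K).mkv"
# -> 2024-07-14-cornwall-coast-4k.mkv
```

Run it at ingest, before the file ever lands on the archive, so the convention is enforced rather than remembered.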

Write a sidecar manifest at ingest. A plain-text .manifest.csv file sitting alongside each directory should list filename, SHA-256 hash, file size in bytes, capture date, and a human-readable description. Embedded metadata in MKV or MP4 containers gets stripped during re-encodes, copies through certain NAS applications, and cloud sync clients. The sidecar file is the ground truth. Treat it as part of the archive, not a convenience.
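Generating the sidecar at ingest can be scripted. A sketch assuming the naming convention above (date prefix in the filename) and GNU coreutils; `write_manifest` is an illustrative name, and the description column is left blank for hand-editing:

```bash
# write_manifest DIR: write DIR/.manifest.csv listing filename, SHA-256,
# size in bytes, capture date (taken from the YYYY-MM-DD filename prefix),
# and an empty description column to fill in by hand.
write_manifest() {
  local dir="$1" manifest="$1/.manifest.csv" f name hash size
  echo "filename,sha256,size_bytes,capture_date,description" > "$manifest"
  for f in "$dir"/*.mkv "$dir"/*.mp4; do
    [ -e "$f" ] || continue                # skip globs that matched nothing
    name=$(basename "$f")
    hash=$(sha256sum "$f" | awk '{print $1}')
    size=$(stat -c %s "$f")                # GNU stat; on macOS: stat -f %z
    echo "$name,$hash,$size,${name:0:10}," >> "$manifest"
  done
}
```

Invoke it once per event directory after copying files in, e.g. `write_manifest /mnt/archive/2024/2024-07-14_cornwall-coast`.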

For re-encodes, add a versioning suffix rather than overwriting. _v1 for the original capture, _v2 for a first transcode, _proxy for a low-resolution viewing copy. This keeps your offline backup strategy honest: you always know which copy is the master and which is derivative. Overwriting the original with a re-encode is a one-way door you will regret on a day when the re-encode turns out to have a fault.

Directory structure and redundancy schedule

Keep the folder structure flat. Two levels is the right depth: one top-level directory per year, one subdirectory per event or subject. 2024/2024-07-14_cornwall-coast/ covers everything you need. Deeper nesting pushes total path length toward the Windows default limit of 260 characters (unless long path support is enabled), and some copy tools on macOS silently truncate over-long paths. Flat structures also keep your manifest files close to the content they describe.

The nearline NAS setup should be your working layer. Set up a mirrored or RAIDZ pool on ZFS, or a Btrfs RAID-1 mirror, for day-to-day access. Both file systems store per-block checksums and verify them on read, so bit rot caught early can be repaired automatically from the mirror. Schedule a scrub (zpool scrub, or btrfs scrub start) monthly at minimum. For drives that spin down regularly or see low read traffic, monthly scrubs are more important, not less: data that goes unread for months can accumulate silent corruption that only surfaces when you actually need the file.
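Scheduling is the part most setups forget. A cron entry along these lines covers the monthly scrub; the pool name tank is a placeholder, and Btrfs users would substitute btrfs scrub start /mnt/archive:

```
# /etc/cron.d/archive-scrub -- full scrub at 03:00 on the 1st of each month.
# "tank" is an example pool name; adjust to your own.
0 3 1 * * root /sbin/zpool scrub tank
```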

Pair the NAS with a snapshot rotation cycle. Take a ZFS or Btrfs snapshot on the NAS before every new ingest, keep four weekly snapshots and three monthly snapshots, and prune automatically. This protects against accidental deletions and bad copies without eating your entire pool capacity.
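A hand-rolled sketch of the pre-ingest snapshot and pruning, assuming a ZFS dataset named tank/archive and simplified to a single keep-the-newest-four policy; the weekly/monthly split described above is better handled by a tool such as sanoid or zfs-auto-snapshot:

```bash
# Snapshot before ingest; the date in the name keeps snapshots sortable.
zfs snapshot "tank/archive@ingest-$(date +%F)"

# Prune: keep the four newest ingest snapshots, destroy the rest.
# -s creation sorts oldest first; head -n -4 is GNU coreutils.
zfs list -H -t snapshot -o name -s creation tank/archive \
  | grep '@ingest-' \
  | head -n -4 \
  | xargs -r -n1 zfs destroy
```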

The cold drives are a separate layer. Every quarter, clone the NAS archive to a new offline backup drive using rsync --checksum or rclone copy --checksum. Do not use a plain rsync -av; it compares timestamps and sizes, not content. The --checksum flag re-reads every byte and compares hashes, which is the only verification worth having for long-term media preservation.
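Written out with example mount points (both paths are assumptions), the quarterly clone is one line:

```bash
# Clone the NAS archive to the mounted cold drive. --checksum forces
# rsync to re-read and hash every file instead of trusting size + mtime.
rsync -a --checksum /mnt/archive/ /mnt/cold-drive/
```

The trailing slashes matter to rsync: they copy the contents of the source directory rather than the directory itself.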

At ingest, generate a SHA-256 checksum log from inside the archive directory, so the logged paths are relative and the same log verifies cleanly on whichever drive it is copied to:

```bash
cd /mnt/archive/2024
find . -type f ! -name 'checksums-2024.sha256' -exec sha256sum {} \; \
  > checksums-2024.sha256
```

Store that file on the NAS and copy it to each cold drive alongside the data. Every time you spin up a cold drive, whether for a quarterly copy or an annual verify, run the check from the drive's own mount point:

```bash
cd /mnt/cold-drive/2024
sha256sum -c checksums-2024.sha256
```

Any mismatch is a corruption event. Cross-reference against the NAS copy immediately.
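When the check does report failures, a triage sketch like the following narrows down which side is corrupt (mount points as above; assumes GNU tools and filenames without colons):

```bash
# Collect the paths that failed verification on the cold drive.
cd /mnt/cold-drive/2024
sha256sum -c --quiet checksums-2024.sha256 2>/dev/null \
  | awk -F': ' '/FAILED/ {print $1}' > /tmp/failed.txt

# Re-hash the same paths on the NAS copy. If the NAS hashes still match
# the log, the cold drive is the corrupt side and needs re-cloning.
cd /mnt/archive/2024
xargs -d '\n' -a /tmp/failed.txt sha256sum
```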

Labelling offline drives without trusting the filesystem

Do not rely on volume labels as your only identifier. A filesystem label can be changed accidentally, or it may not mount correctly on a different OS. Every cold drive should carry a permanent physical label on the drive itself, a matching entry in a plain-text drive-registry.csv file on the NAS, and a duplicate of that registry burned to each drive. The registry should record: drive serial number, purchase date, capacity, archive date range, pool name if applicable, and physical location. When you pull a drive out of a box in 2033 and the volume name reads UNTITLED, the serial number on the label and the registry entry together tell you exactly what is on it. The filesystem metadata is a convenience; the registry is the record.
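A registry with exactly those columns might look like this; every value below is an invented example:

```
serial,purchase_date,capacity,archive_range,pool,location
WD-WX11A2345678,2023-01,8TB,2018-2020,n/a,fireproof-box-loft
ST-ZA987654321,2024-06,12TB,2021-2024,tank,desk-drawer-office
```

Because it is plain CSV, the registry is greppable from any machine made this century, which is the point.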

This entire approach, filename convention plus sidecar manifests plus quarterly checksum verification plus physical labelling, is the boring strategy. The boring strategy is the one that has worked every time.