Long-term data archival

A few words about my research on making data archives that will stand the test of time.

Goal: Archive data for long-term storage.

Requirements:

- Durable storage.

- Resiliency to bit rot.

- No need for special rooms or conditions to store media.

- Easy retrieval; no need to wait hours to restore data.

- Encryption.

- Indexed archives for easy reference.

Selecting storage media

Common media choices are:

💿 Optical storage (with one notable exception) is very unreliable and will usually develop read errors within a few years. The exception is M-DISC, which we'll talk about in a bit. There's also the Syylex Glass Master Disc, but it's ridiculously expensive ($1000 per disc).

🖴 Flash storage and solid-state drives need to be powered on occasionally to "refresh" their bits, or they can lose data. Plus, cheap consumer-grade SSDs are notoriously unreliable. Avoid. HDDs are not reliable either (they're susceptible to sudden bad sectors) and, if you're unlucky, may not even start after sitting dormant for a few years. Avoid as well.

📼 Tapes (LTO) are slow, since access is sequential and retrieving data takes a lot of time. They also need expensive equipment and special storage conditions (low humidity, climate control, etc.) to be reliable for long-term data storage.

💾 Floppies. Gotta love them for nostalgia, but no.

Avoid other obscure media. Chances are the hardware you'll need to read them will be obsolete and very difficult to find in a few decades' time.

So, what to choose?

M-DISC (unless you have huge datasets, in which case tape is the only realistic option). From Wikipedia:

M-DISC's design is intended to provide archival media longevity. M-Disc claims that properly stored M-DISC DVD recordings will last up to 1000 years. The patents protecting the M-DISC technology assert that the data layer is a glassy carbon material that is substantially inert to oxidation and has a melting point of 200–1,000 °C (392–1,832 °F). M-Discs are readable by most regular DVD players made after 2005 and Blu-Ray & BDXL disc drives and writable by most made after 2011.

Accelerated aging tests have shown M-DISCs to be more durable than even the best-quality alternatives, but whether they'll last 50 or 500 years remains to be seen. Other advantages:

- No need for special equipment to read them. DVD and Blu-ray drives will probably be around for a long time.

- No need for a special storage environment; stash them in a drawer and forget about them.

- No need to purchase special equipment to write them. A good-quality writer is recommended nevertheless; I got a Toshiba USB3 M-DISC writer for around $200 a few years ago.

There are M-DISC DVDs and Blu-rays; I chose the latter, at 25GB capacity, which is decent. If you have huge storage requirements, you should revisit LTO storage instead.
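The 25GB has to hold the container plus its recovery data, so it helps to size each container up front. A minimal sketch, assuming one Veracrypt container per disc and Par2 recovery data at the 5% used in step 2: it computes the largest container that still fits and how many discs a dataset needs. The capacity constant is the commonly quoted single-layer BD-R size (12,219,392 sectors of 2,048 bytes) and the function names are hypothetical; check the capacity your burning software reports for your media.

```python
# Commonly quoted single-layer BD-R capacity: 12,219,392 sectors x 2,048 bytes.
BD_R_25GB_BYTES = 25_025_314_816


def max_volume_bytes(disc_bytes: int = BD_R_25GB_BYTES, redundancy_pct: int = 5,
                     reserve_bytes: int = 0) -> int:
    """Largest container that fits on one disc next to its Par2 recovery data.

    Treats recovery data as exactly redundancy_pct percent of the container.
    Real Par2 files are slightly larger (headers, checksums), so use
    reserve_bytes to leave room for that, the index file and the filesystem.
    """
    usable = disc_bytes - reserve_bytes
    if usable <= 0:
        raise ValueError("reserve_bytes must be smaller than the disc capacity")
    # Largest v with v * (100 + pct) / 100 <= usable, in exact integer arithmetic.
    return usable * 100 // (100 + redundancy_pct)


def discs_needed(data_bytes: int, disc_bytes: int = BD_R_25GB_BYTES,
                 redundancy_pct: int = 5, reserve_bytes: int = 0) -> int:
    """Discs required if data_bytes is split into one container per disc.

    Ignores the Veracrypt header and filesystem overhead inside each container.
    """
    per_disc = max_volume_bytes(disc_bytes, redundancy_pct, reserve_bytes)
    return -(-data_bytes // per_disc)  # ceiling division on integers
```

With no reserve, `max_volume_bytes()` comes to 23,833,633,158 bytes, roughly 23.8GB; in practice pass a reserve of a few hundred megabytes for the index, Par2 headers and disc filesystem.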

Backup procedure

Steps:

1. Encryption: Create a Veracrypt volume and put your data in it.

2. Recovering from corruption: Fortify the volume file with Par2 recovery data so it can be repaired if it gets corrupted.

3. Indexing: Make sure you know where's what.

4. Persistence: Burn the final files to the disc.

1. Encryption

I use Veracrypt. It's easy to use, uses solid crypto, and it's cross-platform: it runs on Windows, macOS, Linux and OpenBSD (which is what I use).

To create a new Veracrypt volume:

To mount it:

To unmount it:
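A minimal sketch of creating, mounting and unmounting a volume, assuming the `veracrypt` binary is on your PATH and using its text-mode interface (`--text`). These aren't necessarily the exact commands used in this post, and the helper names are hypothetical. Passing no size, algorithm or password flags to `--create` makes Veracrypt ask for them interactively. The commands are kept as argument lists so they can be printed as the equivalent shell line and run without shell-quoting surprises.

```python
import shlex
import subprocess
from typing import List

# Assumption: the `veracrypt` command-line binary is installed and on PATH.


def create_cmd(volume: str) -> List[str]:
    # Text-mode wizard: with no extra flags, Veracrypt prompts for size,
    # encryption and hash algorithms, filesystem and password.
    return ["veracrypt", "--text", "--create", volume]


def mount_cmd(volume: str, mountpoint: str) -> List[str]:
    return ["veracrypt", "--text", "--mount", volume, mountpoint]


def dismount_cmd(volume: str) -> List[str]:
    return ["veracrypt", "--text", "--dismount", volume]


def run(cmd: List[str]) -> None:
    """Print the equivalent shell command, then run it attached to the terminal."""
    print("$", shlex.join(cmd))
    # The child inherits stdin/stdout, so password prompts work as usual.
    subprocess.run(cmd, check=True)
```

For example: `run(create_cmd("archive.hc"))`, then `run(mount_cmd("archive.hc", "/mnt/archive"))`, copy your data in, and finish with `run(dismount_cmd("archive.hc"))`. The file name and mount point are placeholders; on Linux, mounting typically requires administrator privileges.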

2. Recovering from corruption

To ensure we can recover our data in case of errors, we'll use Parchive (Par2).
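Par2 splits a file into blocks, stores a checksum for each block so damaged ones can be located, and adds Reed-Solomon recovery blocks; it can rebuild as many damaged blocks as there are recovery blocks. A much simpler scheme, a single XOR parity block as used in RAID 5, shows the underlying idea. This is a toy illustration only, not Par2's algorithm: it can rebuild just one missing block, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks: List[bytes]) -> bytes:
    """XOR all data blocks together into one parity block."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return reduce(xor_blocks, blocks)


def recover(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild a single block marked None (i.e. detected as damaged).

    XOR-ing the parity with every surviving block cancels them out,
    leaving exactly the missing block.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild at most one block")
    out = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        out[missing[0]] = reduce(xor_blocks, survivors, parity)
    return out
```

Par2 generalizes this: with 5% redundancy, roughly 5% of the blocks can be lost or damaged and still be rebuilt, wherever they are in the file.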

Create a Par2 archive with 5% recovery size and one recovery file:

To validate a Par2 archive:

In case of errors, repair:
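These three operations map onto par2cmdline's `create`, `verify` and `repair` subcommands. A minimal sketch that builds those command lines, assuming the `par2` binary from par2cmdline is on your PATH: `-r5` asks for 5% redundancy and `-n1` for a single recovery file. The helper names are hypothetical and this isn't necessarily the exact invocation used in this post.

```python
import subprocess
from typing import List

# Assumption: par2cmdline's `par2` binary is installed and on PATH.


def par2_create_cmd(target: str, redundancy_pct: int = 5,
                    recovery_files: int = 1) -> List[str]:
    # -r<n>: recovery data as a percentage of the source size
    # -n<n>: number of recovery files the recovery blocks are spread over
    return ["par2", "create", f"-r{redundancy_pct}", f"-n{recovery_files}",
            f"{target}.par2", target]


def par2_verify_cmd(target: str) -> List[str]:
    return ["par2", "verify", f"{target}.par2"]


def par2_repair_cmd(target: str) -> List[str]:
    return ["par2", "repair", f"{target}.par2"]


def par2_intact(target: str) -> bool:
    """Run a verify; par2cmdline exits with status 0 when no damage is found."""
    return subprocess.run(par2_verify_cmd(target)).returncode == 0
```

Run these from the directory holding the volume, e.g. `subprocess.run(par2_create_cmd("archive.hc"), check=True)`, where `archive.hc` is a placeholder name; if `par2_intact("archive.hc")` returns False, run the repair command.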

3. Indexing

To create an encrypted list of files included in the backup:

To see all files included in the backup:
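One way to build the file list, assuming Python 3 and running it against the mounted volume: each line records a file's SHA-256 digest, size in bytes and relative path, so the index doubles as a checksum list. The post doesn't say which tool encrypts the list; GnuPG symmetric encryption is used here as an assumption, and the function names are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root: str) -> str:
    """Return one line per file: '<sha256>  <size>  <relative path>'.

    Lines are sorted by path so the same tree always yields the same index.
    """
    base = Path(root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            p = Path(dirpath) / name
            rel = p.relative_to(base).as_posix()
            entries.append((rel, p.stat().st_size, sha256_file(p)))
    entries.sort()
    return "".join(f"{digest}  {size}  {rel}\n" for rel, size, digest in entries)
```

Saved as a hypothetical `make_index.py`, the output can be piped straight into GnuPG so the plaintext list never touches the disc: `python3 -c "import make_index; print(make_index.build_index('/mnt/archive'), end='')" | gpg --symmetric --output index.txt.gpg`. Later, `gpg --decrypt index.txt.gpg` prints the list back. The mount point is a placeholder.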

4. Persistence: Burning files to disc

After following the above steps, you'll have a set of four files. Burn them to the disc using Brasero or your favourite optical disc burning software. The first time you do this, before you stash the disc away, I suggest you follow the procedure backwards to make sure you can decrypt and restore the files correctly.
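Checking a test restore by eye is error-prone. A minimal sketch, assuming you restore into a separate directory while the original data is still available: it hashes every file in both trees with SHA-256 and reports anything missing, unexpected or different. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): sha256_file(p)
            for p in base.rglob("*") if p.is_file()}


def compare_trees(original: str, restored: str) -> List[str]:
    """List every difference between two directory trees; empty means identical."""
    a, b = tree_digests(original), tree_digests(restored)
    problems = [f"missing: {k}" for k in sorted(a.keys() - b.keys())]
    problems += [f"unexpected: {k}" for k in sorted(b.keys() - a.keys())]
    problems += [f"mismatch: {k}" for k in sorted(a.keys() & b.keys()) if a[k] != b[k]]
    return problems
```

An empty list from `compare_trees("/data/photos", "/mnt/restored")` means the round trip worked; both paths are placeholders.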

http://archive.org/details/lne-syylex-glass-dvd-accelerated-aging-report

https://en.wikipedia.org/wiki/Parchive

https://github.com/Parchive/par2cmdline

https://www.veracrypt.fr/en/Home.html

--

📅 2023-05-25

🏷 computers, backup, archival, storage, bitrot

📧 a@antanst.com

CC BY-NC-ND 4.0

Proxied content from gemini://smol.gr:1965/20230525-backup.gmi (external content)
