How to store and preserve lots of data?

Solution 1

The simple answer is: multiple copies. Whatever else you do, don't trust any single medium, location or service.

Personally, I currently use external (USB-connected) hard disks for backup. A 2-3 TB drive can be sourced quite cheaply and will give you plenty of capacity, not just for now but for any reasonable future needs. At roughly the physical size of a large paperback book, it will easily hold the content of 200-300 movie DVDs without further compression. Buy two, or three if you are paranoid, preferably with at least one from a completely different manufacturer (you might want to mix Seagate and Western Digital, for example, since they are unlikely to share the exact same design or manufacturing defects). Keep at least one of them in a physically separate location: a bank safety deposit box is a relatively cheap option that gives you physical security as well, but even just keeping one copy at work or at a friend's home will almost always work just as well. If you can arrange to refresh an off-site copy without bringing it to your own location, that is even better. If any of the content is privacy-sensitive, keep that in mind when planning how to handle off-site copies.

Also keep in mind that the amount of data you are talking about (300 GB counting as more or less "irrecoverable", plus another 500 GB that is "nice to keep" but which in a pinch you could probably get from other sources, such as second-hand movie DVDs) is not really all that much. I currently have a grand total of about 100 GB of digital photos alone, and it's not hard for me to add some 10-15 GB to that in a single day; I have done so on a few occasions at events where I had reason to take lots of photos. Many of those photos are of questionable quality in various ways, and many are mundane (nice to have, but there's nothing truly special about them), but some of them are irreplaceable in content as well as being of good quality. For backup purposes, though, I treat them all the same way: multiple copies. I've had a few hard drives fail on me, and while a few times I've lost data I would really have liked to keep, overall this strategy has meant that I can restore the most recent backup to a new drive and be on my merry way. If the live copy fails, restore the backup to a replacement primary drive; if the backup drive fails, get a replacement backup drive and make a new backup.

If you do go the multiple storage media route, remember to keep checking each copy for signs of degradation. It's fairly quick and easy to run a SHA1 hash over all the files on each drive and compare the results, and to store the list of hashes itself in multiple locations. That way, even if you get read errors at some point, you can determine which copy is "good".
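A minimal Python sketch of the hash-and-compare check described above, using only the standard library. The function names (`hash_file`, `build_manifest`, `compare_manifests`) are illustrative and not part of any existing tool, and the manifest is kept in memory here; in practice you would write it to a file and keep copies of that file in several places.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-1 digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = hash_file(full_path)
    return manifest


def compare_manifests(reference, current):
    """Return (missing, changed): paths from the reference manifest that are
    absent from the current one, and paths whose digest no longer matches."""
    missing = sorted(p for p in reference if p not in current)
    changed = sorted(
        p for p in reference if p in current and reference[p] != current[p]
    )
    return missing, changed
```

Run `build_manifest` once on a known-good drive, then again on each copy during a periodic check; any path reported by `compare_manifests` is a candidate for restoring from another copy.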

Solution 2

You have to consider how valuable the data you're backing up is to you. I would split it into at least 2 categories:

  1. Irreplaceable
  2. Would like to keep but won't be heartbroken if the data is lost.

Strategy

For category 1: I would suggest one of the popular online storage systems (Amazon S3, Dropbox, etc.). Here you're paying for someone to help you manage the backup and to ensure longer-term and easier access. Otherwise, follow the suggestions for category 2 and make sure the data has proper redundancy and is handled with care. This all assumes that category 1 is a smaller percentage of the 800 GB total.

For category 2: it's your decision how much to invest in storage fees and in the time needed to upload data online. For that reason I would suggest you simply use a large HDD to back up all the data and store that drive disconnected from a PC. Just be aware that its useful lifetime is tied to PCs that support its current interface, e.g. SATA, so plan to port the data to a new drive or new technology in the future. A 1 or 2 TB HDD is reasonably priced and will cover your data requirements now and in the short term.

Redundancy

Multiple drives with the same data would be your redundancy, which again can even be stored 'off site' if you're truly concerned about data safety.

Security

As an added measure, if you're trying to protect the data from unauthorized access, encrypt it locally before uploading it online and/or storing it on an HDD. Something like TrueCrypt would be ideal.

Solution 3

First, think about the ways you might lose your data, and decide which you want to protect against. Some examples:

  • accidentally delete something
  • hard disk dies
  • a software or hardware bug
  • malware
  • theft
  • fire
  • natural disaster (fire, flood, quake, volcano, lightning, etc.)
  • government seizure

In my life, I've lost plenty of data, but only to the first two causes on that list.

Hard disks are compact, hold a lot of data, get bigger all the time, are readily available, don't require special equipment to use, and are cheap and getting cheaper (flooding in Thailand notwithstanding).

I keep all my data on one drive; a second USB drive holds regular automated backups; a third, identical USB drive sits offsite (at work or in a safe deposit box is good). Monthly, I carry the current backup drive to the offsite location and bring the other drive back.

All storage media decay; the only way to be sure that your data is good is to use it. As part of that monthly routine, I pick an arbitrary file and restore it from my backup.
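The monthly spot check could be partly automated along these lines. This is a rough sketch under the assumption that the backup is a plain file tree mirroring the live one; `list_files` and `spot_check` are hypothetical names, and the check compares the backed-up file with the live copy rather than performing a real restore. A file that was edited after the last backup will legitimately differ.

```python
import filecmp
import os
import random


def list_files(root):
    """Return every file path under root, relative to root, in sorted order."""
    found = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            found.append(os.path.relpath(os.path.join(folder, name), root))
    return sorted(found)


def spot_check(live_root, backup_root, rng=random):
    """Pick one arbitrary file from the backup and compare it with the live copy.

    Returns (relative_path, result) where result is True if the contents
    match, False if they differ, and None if the live copy no longer exists.
    """
    candidates = list_files(backup_root)
    if not candidates:
        raise ValueError("backup contains no files")
    chosen = rng.choice(candidates)
    live_path = os.path.join(live_root, chosen)
    backup_path = os.path.join(backup_root, chosen)
    if not os.path.exists(live_path):
        return chosen, None
    # shallow=False forces a comparison of file contents, not just metadata
    return chosen, filecmp.cmp(live_path, backup_path, shallow=False)
```

Reading the file all the way through is the point: it exercises the backup drive in the same way an actual restore would.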

Solution 4

I had the same problem as you and looked long and hard for a solution. Many NAS drives have 2 HDD bays.. I wanted more.. but that always came with a big price tag..

But then I found this! (The model shown has been superseded since the original post.)

[Image: the HP MicroServer]

  • 4 SATA-2 BAYS (not hot swap) max of 8TB (modding can provide more space)
  • 1 SATA on top for CDROM/SYSTEM HDD running on DMA33
  • ESATA on the BACK
  • 4 USB on the FRONT! 2 ON THE BACK
  • USB INSIDE (where the door is, so you can boot off a USB drive and not have it sticking out)
  • 8GB MAX DDR3!
  • Hardware Embedded SATA RAID 0/1 (but you should use ZFS instead)

Product Spec Sheet

I bought mine for £100 (there was an HP deal on) but the normal price is £200. Still half the price of many other NAS boxes.. AND! It's a full-fledged processor! 1.6 GHz dual-core AMD (a special type that uses low power)

I could not believe my freaking eye balls!

Then what? This is what

I went over to FreeNAS and installed FreeNAS 7. I know FreeNAS 8 is out, and its ZFS driver is a lot more stable and faster, but it seems to chew through memory. So the choice is yours.

I installed 4x1TB and used RAIDZ, which is similar to RAID-5, on ZFS (the HP server does have hardware RAID (0, 1) too.. but the driver does not work on FreeNAS 7 :( -- though it does work on Windows Essentials!)

So I got 3TB of storage with 1 drive's worth of redundancy (RAID-5 style).
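The capacity arithmetic behind that figure is simple enough to sketch. The helper below is a hypothetical illustration, not part of FreeNAS or ZFS; it assumes equal-sized drives and ignores filesystem overhead and TB versus TiB rounding, so real pools report somewhat less.

```python
def raidz_usable_tb(drive_count, drive_size_tb, parity_drives=1):
    """Estimate usable capacity of a parity-based pool of equal-sized drives.

    parity_drives=1 models single parity (RAID-Z1, RAID-5 style),
    parity_drives=2 models double parity (RAID-Z2, RAID-6 style).
    """
    if parity_drives < 1:
        raise ValueError("need at least one parity drive")
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_size_tb


# Four 1 TB drives with single parity, as in the setup above
print(raidz_usable_tb(4, 1))  # 3
```

With double parity the same four bays give up two drives' worth of space, so the trade-off between capacity and the number of failures survived is easy to compare before buying drives.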

Did I forget to mention it has a full PCI-E slot in it? (low profile) Here is a sneak peek at the motherboard and the nice USB port directly on the board.

[Image: the MicroServer motherboard with its internal USB port]

MODDING

-EDIT AUGUST 2012

I found this post that has a section on how to hack the BIOS and enable that extra CD-ROM port to run at 1.5 Gb/s SATA! - YES! I did it and it works well.

-EDIT OCT 2017

We have since had the Gen9 and Gen10 of the microservers - I am still using the original configuration from the time of this post.. yea.. still going strong and so happy with it.

It is probably high time to invest in a newer MicroServer and get four 3TB drives in there using ZFS RaidZ-2 (2 redundant drives out of 4 total). Currently I am just using RaidZ-1, which is not fully optimised for 4 drives, but it has been fine for me for over 5 years :D

Solution 5

Well, if you are serious about saving your data, I would suggest you build two NAS units from inexpensive computer parts. The hardware does not have to be bleeding edge, just capable of accessing terabyte-sized SATA drives. You could use one of the many UNIX/Linux flavours or an old Windows version as the operating system. Load your DVDs and disk files onto one unit and then copy them to the other. You will have to keep the two systems synced with each other in some fashion once a day, week or month, depending on your paranoia level.

This is better than tapes or Blu-ray as the data is readily available and you will have two copies floating around. Once in a while, you will have to scan the NAS hard disks and decide if you need to replace them as hardware does fail.

Author by

Alan

Updated on September 18, 2022

Comments

  • Alan
    Alan almost 2 years

    I have many DVDs with movies and games, they are getting old and dusty, some of them are damaged. And I have more than 300 GB of general files in my PC. I want to store and preserve all of my data and keep them for a long long time.

    I want to know which storage media is the best and most reliable that I should use and... that I can buy.
    I will copy all of my DVDs to the new media and throw all of my DVDs away. It will take up much less physical space.

    I think that everything that I need to store is currently 800 GB.

    I do not like optical discs, because they are fragile and I have lost many CDs and DVDs.
    Hard drives would be good, but they can suddenly stop working, and it is very expensive to recover data from a broken hard drive.

    Other questions:
    How often will I have to copy the files to a new storage medium?
    Is it a good idea to have another copy of each file stored on another storage medium?

    Edit:
    If I put all of the movies and games in a HDD, if the HDD stops working, I will lose everything.
    This is how valuable they are for me: If I lose 10% of them, I will not cry, because I can get new ones, but if I lose 90%, it will be a disaster.

    Edit 2:
    I do not think that online storage is reliable even if I pay for it, because the provider may shut down or go bankrupt. They are not from my country and it would be difficult to get the files back.

  • soandos
    soandos over 12 years
    It does not. Please cite something that shows why this approach is good (cost effective and reliable).
  • soandos
    soandos over 12 years
    How often do they swap tapes? Is that fact that they store everything in triplicate (or more) a factor in this?
  • Kris
    Kris over 12 years
    What does not what, please?
  • soandos
    soandos over 12 years
    The question does not ask for opinions. If it does, it does not deserve to be a question here.
  • Synetech
    Synetech over 12 years
    @soandos, "The question does not ask for opinions." Sure it does. "If it does, it does not deserve to be a question here." Why not? A lot of questions here are like this. Not every question has a single, definitive answer. Even questions on SO don’t always (usually?) have just one way to do things.
  • Xavierjazz
    Xavierjazz over 12 years
    @Synetech inc.: If you read the site FAQ, you will see that questions asking for a subjective response are prohibited. When I re-read the question, I saw that I had misunderstood the implications. Regards,
  • HikeMike
    HikeMike over 12 years
    This is a horrible idea. If it's irreplaceable and not family photos, chances are you don't want others to have access (e.g. business or insurance documents).
  • Blrfl
    Blrfl over 12 years
    This fails miserably when you have an accident that deletes data on the first copy and the second one is automagically synced to mirror it.
  • Dan Is Fiddling By Firelight
    Dan Is Fiddling By Firelight over 12 years
    Pity the HP microserver is about half again as large as the 4x3.5" drive WHS boxes HP used to sell. It's still better than the PC-Q08 which is about twice the volume. Does the HP use a standard miniITX board or a proprietary size?
  • Mark Booth
    Mark Booth over 12 years
    I've just built my folks a Mini-ITX PC using a Lian Li PC-Q08 case, which could happily house six 3.5" drives and a 2.5" drive, maxing out at around 25TB, if you so desired. *8')
  • Mark Booth
    Mark Booth over 12 years
    @DanNeely - Thanks - Corrected my comment...
  • Synetech
    Synetech over 12 years
    @Xavierjazz, subjective means what is the best this or that, not in what ways can this be done or can you recommend a method for doing that. Like I said, there is more than one way to do most things, so there is nothing wrong with asking for people’s opinions on how to do something; there are plenty of such questions.
  • J. Shmoe
    J. Shmoe over 12 years
    Hot Swap is actually important. When a drive fails, the last thing you want to do is subject the other drives to a full power cycle.
  • kingchris
    kingchris over 12 years
    Well. Never trust automation. Things will go wrong, so one always has to be sensible about things. If your NASs are big enough you could keep generations of backups. You can also plug USB or eSATA drives in and copy to them for off-site external storage.
  • sblair
    sblair over 12 years
    Why did you use RAID 5 with ZFS instead of RAID-Z1?
  • Piotr Kula
    Piotr Kula over 12 years
    Hot swap in a production environment is critical. At home? Will your mama shout at you 'cause she can't.. ummm save photos to the NAS? Powering down to swap a hard drive and using the hardware RAID to resync is an amazing trade-off at home with this machine. Using ZFS.. you just shut down, replace the broken drive, boot up and it resilvers in the background.. all for about £300 compared to a £3000 corporate server..
  • Piotr Kula
    Piotr Kula over 12 years
    @sblair I used FreeNAS's ZFS RAID5 option.. I know it's software and slower.. but it's not bad. I wanted the most out of 4TB, and sacrificing 1TB for full redundancy is amazing. So I only have 2.9TB, but if any disk starts causing problems it's very easy to replace.
  • Nick Josevski
    Nick Josevski over 12 years
    @Daniel Beck that's why I put in the security section, and state to encrypt locally before uploading online. If your data is strongly encrypted before uploading, then it will be fine to store online.
  • sblair
    sblair over 12 years
    RAID-Z1 is also software-based, and offers the same redundancy as software RAID 5. I'm not sure there's a huge difference between these two options in practice, but RAID-Z1 avoids partial stripe writes, so there's no painful read-modify-write cycle. Also, RAID-Z can heal itself after silent data corruption, but ZFS on RAID 5 cannot - it would just know that something was wrong.
  • Piotr Kula
    Piotr Kula over 12 years
    @sblair hmm. I am not sure. I look at management and it says raidz1 raid5 single parity.. so it is raidz1 and raid5?? .. or is it just FreeNAS's ZFS raidz that is called raid5 for clarification?
  • user
    user over 12 years
    While this is an interesting aspect for long-term archival storage and retrieval, the OP's question does not read to me as to take such aspects into account.
  • DanX
    DanX over 12 years
    In the past I have answered the exact question asked, then someone came up with a more general and robust answer. And they were right to do so.
  • user
    user over 12 years
    Not only that, but while it works quite well (cost notwithstanding) as a backup medium, tape is inappropriate for random access storage because of its sequential access nature.
  • Daniel R Hicks
    Daniel R Hicks over 12 years
    And another option is film. Good quality safety film, as 35mm, microfilm, or microfiche, is good for 100 years or so, and a reader could always be cobbled up fairly easily.
  • Fiasco Labs
    Fiasco Labs over 12 years
    @Blrfl One of the funniest stories I read was where the guy manually synched over garbage onto his secondary system. 1.5T gone in a self inflicted SNAFU. Funny because he wrote very humorously about the situation, but as he said, a really good lesson in having offline backup copies so he ended up recovering about 95%. Multiple backups and diversity in storage mostly save the day. Offsite helps as well.
  • samthebrand
    samthebrand over 12 years
    I'm a little unclear which method you prefer here: CD-Rs, 2HDs, or a combo of methods?
  • samthebrand
    samthebrand over 12 years
    NickJosevski - are you admitting that your answer is incorrect and the best course of action would be that suggested by @Sirex?
  • Psycogeek
    Psycogeek over 12 years
    CD-Rs have held up "ok". Their data density is not as high as other optical media, so they usually survive over time, but they do not hold a lot of data. Two HDs, one active and one inactive, is the way that I have preserved tons of data. There are also many hidden tips in there about storage: I actually have a storage room for my own creations that attempts to model itself (cheaply) on the storage methods of the Library of Congress. De-humidification, temperature control, and dust control, well beyond normal. O2 and moisture are the enemies :-) Dust was the enemy of tape.
  • Nick Josevski
    Nick Josevski almost 12 years
    @SamTheBrand not exactly, the point raised was about trusting someone else to handle your irreplaceable data. If the data is so important you should back it up to as many locations as possible, and not just rely on 3rd parties, have your own backup too.
  • endolith
    endolith over 11 years
    Hope your house doesn't get flooded.
  • Piotr Kula
    Piotr Kula over 11 years
    Is that irony I detect there? Sorry if you were a victim of flooding. But yes, a flood would not help, nor an earthquake or a fire ...
  • gaborous
    gaborous over 7 years
    I second the redundancy principle, this is the only way to ensure long term file fixity. Another way to achieve that is by using error correction codes like Reed-Solomon (see PAR, DVDisaster, pyFileFixity), but multiple copies are always a good and necessary precaution.
  • user
    user over 7 years
    @gaborous "Why is RAID not a backup?" on Server Fault is relevant.
  • gaborous
    gaborous over 7 years
    Yes that's why error correction codes are a valuable addition, which people often confuse with RAID but they are two very different technologies for different purposes. I also did a more technically extensive explanation of the differences here.
  • user
    user over 7 years
    @gaborous Actually, some RAID levels (basically everything but RAID 0 and RAID 1, and their combined variants) use what you could call error correction codes. I'd rather have two pieces of media with full, known good copies, than five media of which three must be successfully read for me to be able to reconstruct a copy, let alone the issues with online vs offline backups.
  • gaborous
    gaborous over 7 years
    I disagree. RAID5/6 are indeed similar to, but lesser variants of, error correction codes, because they are limited to respectively 1 failing disk or k failing disks (where k is usually only 2 in most current RAID6 implementations...). In other words, if more than 2 copies of one byte are corrupted, the byte is irrecoverable. Error correction codes, on the other hand, allow a window ("block") of up to 255 bytes, and are far more efficient against burst errors (which are the most common type according to several studies). Anyway, both ECC and duplication are good complementary strategies.
  • gaborous
    gaborous over 7 years
    Also about having multiple copies, 2 is not enough, because if one gets corrupted, you have no way to know which one is correct (except if you computed a hash beforehand, and then you also need the hash to NOT be corrupted either!). It's advised to have at least 3 copies (triple redundancy principle) because then if one fails, you can cast a bitwise majority vote to recover a byte. Of course, this means that only up to 1 byte corruption can be recovered, but if you have 2/3 copies corrupted the vote will tell you that the byte is lost.
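The majority vote mentioned in the last comment can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names, assuming three copies of equal length held in memory. Note that a bitwise vote always produces some answer, so the second function separately flags positions where all three copies disagree and the voted byte cannot be trusted.

```python
def majority_vote(copy_a, copy_b, copy_c):
    """Rebuild data from three equal-length copies by bitwise majority vote.

    Each output bit is set if it is set in at least two of the three copies.
    """
    if not (len(copy_a) == len(copy_b) == len(copy_c)):
        raise ValueError("all three copies must have the same length")
    return bytes(
        (a & b) | (a & c) | (b & c)
        for a, b, c in zip(copy_a, copy_b, copy_c)
    )


def disputed_positions(copy_a, copy_b, copy_c):
    """Return the byte offsets at which no two copies agree."""
    return [
        index
        for index, (a, b, c) in enumerate(zip(copy_a, copy_b, copy_c))
        if a != b and a != c and b != c
    ]
```

If each copy is damaged at a different offset, `majority_vote` returns the original data; any offset reported by `disputed_positions` needs another source, such as a fourth copy or a stored hash.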