Back up!

edited September 2011 in Technology
I have a pet peeve and it is people who don't back their shit up. To quote Scott:
Apreche: Y U NO BACKUP?
In the spirit of not just telling you to back up, here is how to do it, and how to do it well.

1) Back up to the cloud. This costs about $50-$60 per year; choose any of the many providers. I use JungleDisk, which is a nice front end to Amazon S3, and CrashPlan, which has its own data centers. Amazon S3 is great because it is huge and reliable. Also, your storage is accessible via public APIs, and in an emergency you can write a three-line Python script to get it all back (JungleDisk actually provides a free open source program for this on their site). Finally, Amazon S3 is cheap; for small amounts of data (hundreds of MB) it is literally pennies a month. CrashPlan is free unless you use their own cloud storage. They have a neat built-in system where you can give space on your machine for a friend's backup, or just back up all your machines to each other. I use their unlimited plan for $36/year (you have to buy a 4-year plan), which covers one computer plus attached drives.

2a) Local backup. If you have a desktop, permanently attach an external drive to it. If you have a laptop, attach the external disk to your WiFi router (if your router does not support this, get yourself a modern router).

2b) Use some software to do the backup for you. This will cost some amount of money, possibly 0 since there are some free programs out there (e.g. CrashPlan, Carbon Copy Cloner). If you need to do it yourself, you will forget. I use ChronoSync (mac only) because it has customizations out the wazoo. I back up all my machines to a RAID disk that is attached to a server and that server backs itself, including the external disk, up to CrashPlan. This is in addition to each machine individually having an automated JungleDisk backup running.

You need to do both 1) and 2). If your house burns down and you lose all of your data, then it was not a backup. If Amazon flubs big and your data is gone, then it was not a backup. Don't call it a backup if it is not a backup.

A couple more general points

Never, ever, mirror your data. This, coupled with automatic backups can result in a glitch where your backup gets corrupted and backing up corrupts your data. Always have primary backups be one way only and keep copies of deleted/modified files. After that you can have a secondary backup which is, e.g., a cloned image of your boot drive.
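To make "one way only, and keep copies of deleted/modified files" concrete, here is a minimal Python sketch. It is not how ChronoSync or CrashPlan work internally; the function name `one_way_backup` and the three folders are made up for illustration, and it assumes the archive folder lives outside the destination folder.

```python
import filecmp
import shutil
import time
from pathlib import Path


def _archive(old_copy: Path, target: Path) -> None:
    """Move a superseded backup copy into the archive instead of losing it."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_copy), str(target))


def one_way_backup(source: Path, dest: Path, archive: Path) -> None:
    """Copy source into dest, never the other way round.

    Backup copies of files that changed in, or vanished from, source are
    moved into a timestamped folder under archive (assumed to be outside
    dest) rather than being overwritten or deleted.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    source_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}

    # 1) Files deleted from the source: keep the old backup copy in the archive.
    for old in [p for p in dest.rglob("*") if p.is_file()]:
        rel = old.relative_to(dest)
        if rel not in source_files:
            _archive(old, archive / stamp / rel)

    # 2) New or modified files: archive the previous version, then copy.
    for rel in sorted(source_files):
        src, dst = source / rel, dest / rel
        if dst.exists():
            if filecmp.cmp(src, dst, shallow=False):
                continue  # identical content, nothing to do
            _archive(dst, archive / stamp / rel)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Nothing in here ever writes to the source, and nothing in the backup is ever overwritten without the old version landing in the archive first, so a slowly corrupting drive cannot silently replace your healthy copies.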

Always have reporting. JungleDisk had an RSS feature that reports every backup, successful or not. ChronoSync has customizable emails, and I have my mail filters set up such that warnings get flagged. CrashPlan is the best, since their server end will actually email you if your backup hasn't happened within a specified timeframe.
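If your backup tool has no reporting of its own, a rough do-it-yourself version of the "tell me when a backup hasn't happened" check could look like the sketch below. It assumes your backup job touches a marker file after every successful run; the marker file and the function name `backup_is_stale` are my own invention, not a feature of any of the programs mentioned.

```python
import time
from pathlib import Path
from typing import Optional


def backup_is_stale(marker: Path, max_age_hours: float,
                    now: Optional[float] = None) -> bool:
    """Return True if the marker file is missing or older than max_age_hours.

    The marker is a hypothetical file that the backup job is assumed to
    touch at the end of each successful run.
    """
    if not marker.is_file():
        return True  # no successful run has ever been recorded
    now = time.time() if now is None else now
    age_seconds = now - marker.stat().st_mtime
    return age_seconds > max_age_hours * 3600
```

Run something like this from a scheduled job on a different machine than the one being backed up and have it email you when it returns True. Otherwise the check dies together with the backup.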

A backup is never a backup if you can't get your files back. So after setting everything up, test the setup by retrieving some files. Note where you need to know passwords and try to figure out if you could do the retrieval without them (e.g. by requesting new ones). If there are passwords which are absolutely necessary, write them down on a piece of paper and store it safely somewhere other than your house.
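One way to make the restore test less hand-wavy is to compare checksums of the originals against what you got back. A minimal sketch, assuming you restored a folder to some scratch location (the function names are just for illustration):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Return a list of problems: files missing from, or different in, restored."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        copy = restored / rel
        if not copy.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel}")
    return problems
```

An empty list means every file in the original folder came back bit for bit. Files you changed since the last backup will of course show up as "differs", so test with a folder that has been sitting still.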

This last one is to protect your strategy from biological memory loss and, in case you are backing up not just your own data, from the possible biological loss of yourself. Write down your backup strategy with specific instructions on how to retrieve the backup. Give it to a person you trust.

So, how do you back up? Feel free to suggest improvements to my setup.

Comments

  • To have perfect backups you need 3 kinds.

    1) Storage that is super safe and far away. S3 is perfect for this.
    2) A second hard drive that is nearby that you back up to once in a while. Apple Time Machine does this. It will save you if you accidentally delete a file or something.
    3) A RAID 1 mirror. This will save you if one of the drives crashes, but it won't save you if you accidentally make a mistake and delete or corrupt something. That's why you need the other two kinds.

    Remember, you only need to backup data that can not be replaced. I do not backup anything that can be downloaded again. For example, I only backup music that is weird and rare that I downloaded back in the Napster/WinMX college days. I know if I didn't back it up, it might be lost forever. I don't backup NES roms or Madoka fansubs, because I know I can torrent them again when I need them.
  • Never, ever, mirror your data. This, coupled with automatic backups can result in a glitch where your backup gets corrupted and backing up corrupts your data. Always have primary backups be one way only and keep copies of deleted/modified files. After that you can have a secondary backup which is, e.g., a cloned image of your boot drive.
    What do you mean by mirroring data?
  • What do you mean by mirroring data?
    As in, RAID 1 can not be your only backup. If RAID 1 is all you have, then you have nothing.
  • What do you mean by mirroring data?
    As in, RAID 1 can not be your only backup. If RAID 1 is all you have, then you have nothing.
    But you are protected against a single hard drive failure.
  • I don't backup NES roms or Madoka fansubs, because I know I can torrent them again when I need them.
    I don't know about NES ROMs, but getting fansubbed anime after the season is over can be dicey if the show wasn't super popular. Just sayin.

    For me, I'm really struggling to think of something on my home computer or work computer that I can't possibly go on without. I mean maybe my resumes, but that would only be annoying to lose completely.
  • but getting fansubbed anime after the season is over can be dicey if the show wasn't super popular. Just sayin.
    If I can torrent "Flying Ghost Ship" then I'm confident I can get everything else I need.
  • One aspect of not backing up replaceable content that I've been considering lately is to automatically pull a listing of what this content is and back that up. Meaning, save the output of 'ls /data/movies/' to a text file and then backup that tiny text file. That way if I suffer a drive failure that held replaceable data, I know what data was actually lost and can decide what data I want to go to the effort of re-acquiring.

    In regards to video games, of course there is no need to backup Steam games, but I do backup my save game directories if saves aren't stored to the cloud.
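The "save the output of ls to a text file" idea from this comment could be sketched in Python roughly as follows. The function name and paths are placeholders; the only assumption is that you then include the small manifest file in your real backup.

```python
from pathlib import Path


def write_manifest(content_dir: Path, manifest: Path) -> int:
    """Write one relative path per line for every file under content_dir.

    Returns the number of files listed. Only the names are recorded,
    not the (replaceable) data itself.
    """
    names = sorted(
        p.relative_to(content_dir).as_posix()
        for p in content_dir.rglob("*")
        if p.is_file()
    )
    manifest.write_text("\n".join(names) + "\n", encoding="utf-8")
    return len(names)
```

Unlike a plain `ls`, this walks subfolders too, so after a drive failure the text file tells you exactly what was on there.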
  • I back up my important docs to Dropbox, and I have a full storage drive backup on my external. I just finished uploading all my music to Google Music too. I'm going to start using Dropbox for my game saves more too. What I've done in the past is use a combination of SyncToy and Dropbox to back up my saves and access them from multiple computers. I just pair a game's save folder with a folder in Dropbox on each computer. It's kind of a pain, but it works for those games that don't have Steam Cloud support. Man, they really need to make that standard.
  • I'm hard-pressed to think of any truly irreplaceable data that I've got kicking around. My most important documents are work-related, and those are all handled by a system that I cannot access from home.

    I do have some weird music lying around, but I have so much music that I really won't care about losing those few rare songs.
  • edited September 2011
    I realized this year at school that, at any given time, I have maybe five irreplaceable and extremely important documents on my drive (mostly lab reports and zip files of scanned data). I just Dropbox those, and put a second copy on a 1.5TB ultraportable external drive in case my power, internets, or both fail. Media I'm fond of or have yet to watch gets sent to the same drive; I use Spotify for music, so I don't even bother backing up music anymore. Otherwise, every program I have is downloaded, and I have a Windows 7 ISO and a file of serial numbers for various programs on the drive as well. I pretty much have my bases covered.
    Post edited by WindUpBird on
  • edited September 2011
    Never, ever, mirror your data. This, coupled with automatic backups can result in a glitch where your backup gets corrupted and backing up corrupts your data. Always have primary backups be one way only and keep copies of deleted/modified files. After that you can have a secondary backup which is, e.g., a cloned image of your boot drive.
    What do you mean by mirroring data?
    Like Scott says, RAID 1 is an example of mirroring data, however the concept is more general and I refer to the practice where you have two places (e.g. your laptop's hard drive and the external drive attached to the router) and you use the backup program to always have an exact copy of your laptop's drive on the external drive. If you delete a file on your laptop, next time you back up it gets deleted on the external drive. If a file changes location on the laptop the same change gets applied to the external drive, etc..

    This may sound ideal, since if something goes wrong with your laptop, you'll get it back bit for bit the way it was at the last backup, with minimal effort. Also the external drive has to be at most the same size as the internal drive in your laptop. However, if there is something wrong with your drive in such a way that non critical files just get slowly corrupted, you may not notice the problem before those corruptions have been backed up and replaced the healthy files on the external drive.

    It gets worse if you use two-way mirroring (and yes, some backup programs actually let you choose this feature!), because then trouble that originates with your external drive can propagate back to your laptop. You have effectively doubled the chances that your data will get corrupted.
    Remember, you only need to backup data that can not be replaced.
    This is true, but with the current plans the cloud backup companies have, it costs you little to nothing to back up everything. This is again a precaution against edge cases: if you set up this and that directory to be excluded from the backup, or only back up that directory and those files, there will come a time when you accidentally have an important file in the wrong place. For some people, who are awesome at having their shit together, this may never happen, but as a general rule, just back everything up.
    Post edited by Dr. Timo on
  • I don't back my shit up. I also don't bottle my piss.
  • Hell, right now I work for a company that makes enterprise disk-based backup equipment and even our product has a feature for off site backups where you have backup gizmo A in say Boston and backup gizmo B in New York. The data is automatically transferred from A to B and vice versa as necessary because you never know if a fire will burn down one of your offices containing your backup gizmo.

    Before cloud-based backups became common and cheap, I would sometimes burn some DVDs of my files and mail them to my mom, just to give me an off-site backup strategy. If you don't want to use a cloud service for some reason, this may still be a valid solution for you, if somewhat annoying.
  • I used to keep my USB backup (the 3rd backup) in a fire safe prior to cloud. Think that would've worked?
  • I used to keep my USB backup (the 3rd backup) in a fire safe prior to cloud. Think that would've worked?
    Too many variables, especially as I'm not an expert on the effects of heat on flash memory/hard drives or fire safe construction. Fire safes are more or less designed to keep the temperature low enough to prevent paper from burning, but I'm not sure if that's also low enough to prevent data corruption/destruction on flash/hard drives. In addition, if the fire was severe enough it could result in a building collapse that destroys the fire safe and all its contents. Also, while I mentioned fire as a possible disaster you'd want to use offsite backups to recover from, it's not the only one -- floods, earthquakes, riots, tornadoes, etc., all are things you'd want an offsite backup and recovery strategy to handle.
  • I have many GB of irreplaceable data, but not so irreplaceable as to pay for cloud storage for all of it. I take so many photos and shoot so much video that I don't think I'd be able to upload it all before I have another TB of data to upload.

    At the moment I'm making an effort to sort through all my data going back to 1998, since I have it on an assortment of internal hard drives, external hard drives, DVDs and CDs. I'm copying everything onto two 2TB hard drives (except for video rushes and RAW image files). I'll keep every old hard drive too, and all the old media, in case things don't copy across. The only things I'm having trouble recovering are a few folders of photos that I know are on an old laptop. But the laptop's power adapter is broken, and the hard drive is an ATA type thing, and I don't have an adapter for that either. The photos that I know are on this drive, but that I can't find backed up anywhere else, are those my girlfriend took when we broke through the ceiling of our old apartment into the apartment above. We cut a hole and just kept going. It was pretty crazy, with both of us pushing the other on. As it was totally illegal, we never shared the photos with anyone. I really want those photos! And there are some of the original recordings of the SFBRP on there too. It would be nice to have the uncompressed data for every one, not just the 64k mp3 files. And a few other folders of this and that that I can't find elsewhere.

    Once I get everything on to one hard drive I'll see if there is a way to easily de-duplicate them, as I know there is a lot of repetition.

    My ultimate goal is to scour the internet, on as many forums and usenet groups I've posted in, and collect together everything I've ever posted online. Usenet shouldn't be a problem, but a few forums probably won't be on the wayback machine or anywhere else. I'm not sure how I'll go about this, but it should be a fun research project.

    Scott, would it be possible to list every post I've ever made to this forum in a single page? With the link to the original thread for context? I think there could be a lot of fun stuff there, things I wouldn't even remember I've posted.

    Anyway, to address the first post, Time Machine is awesome. I always have at least two Time Machine disks up to date. And every other file I have on an external hard drive is on at least one other. And I use ftp for a lot of stuff. And Dropbox is cool too. I don't use any other online backup, because I travel so much, and it's not practical for me to do so.
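On the de-duplication question in this comment: one common approach (not the only one, and not something anyone in the thread says they used) is to group files by size first and then by a content hash, so only files that could possibly be identical get read in full. A minimal sketch with made-up function names:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so big video files do not fill up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have identical content."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

This only reports the groups. Deciding which copy to keep is deliberately left to a human, since a script that deletes things automatically is exactly the kind of glitch the first post warns about.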
  • Scott, would it be possible to list every post I've ever made to this forum in a single page? With the link to the original thread for context? I think there could be a lot of fun stuff there, things I wouldn't even remember I've posted.
    There's a search syntax that can do that for you.
  • edited September 2011
    but not so irreplaceable as to pay for cloud storage for all of it.
    I pay less than $1 a month for backups on Amazon S3. It's so insanely cheap.
    I take so many photos and shoot so much video that I don't think I'd be able to upload it all before I have another TB of data to upload.
    It probably would take forever to upload, but services like carbonite will allow unlimited backup for $60 per year. If you can take your hard drives and your laptop to a place that has really fast upload speeds, it might be worth it. At Rym's house with 20 megabit upload speeds you could upload a terabyte of data in a little over 100 hours. If you could go to the Netherlands where they have Gigabit fiber, supposedly, you could probably do it leaving a computer plugged in for a day. Of course, that depends on who you are uploading TO.
    Scott, would it be possible to list every post I've ever made to this forum in a single page? With the link to the original thread for context? I think there could be a lot of fun stuff there, things I wouldn't even remember I've posted.
    The forum is very well backed up, and things like that are certainly possible because it is all in an SQL database. The problem is I can't just make the database publicly available because that gives everyone's hashed passwords away and anyone with a rainbow table and a brain will destroy the forum. I could give you the data by hand, but then I have to do it for everyone. I don't have time for that shit. I say wait for Vanilla 2 and I'll find some way to make data available through an API in a safe manner.

    Maybe I can charge money to do these things by hand for the time being.
    Post edited by Apreche on
  • but not so irreplaceable as to pay for cloud storage for all of it.
    I pay less than $1 a month for backups on Amazon S3. It's so insanely cheap.
    S3 gets expensive if you have TBs of stuff though. CrashPlan is the cheapest I found at $3/month for unlimited. I have ~2TB on their servers, but it took a couple of months to get it all up there.
  • but not so irreplaceable as to pay for cloud storage for all of it.
    I pay less than $1 a month for backups on Amazon S3. It's so insanely cheap.
    The key word here is "all". I back up a fair bit, but not everything. How much storage in GB does $1 per month get you?
    services like carbonite will allow unlimited backup for $60 per year.
    From my (admittedly not very extensive) research, these "unlimited" offers tend to only upload data from internal hard drives. I have a 256GB internal hard drive, but I'm constantly archiving photos and videos and audio files to other hard drives. And it is the stuff that is archived on other drives which is the main bulk of what I would like to put in the cloud but which isn't there.

    And the main reason that I wouldn't be able to keep ahead of the upload is that I travel so much. I've not been home for over three weeks now, and my main internet connection is this satellite connection on a cruise ship. It's slightly faster than dialup, and costs about 10 cents per minute each time I log on.

    In that same three weeks I've created 53GB of new files that I'm going to archive. Am I even going to try to back this up to the cloud in the 10 days I have at home before I leave to Portland and SF for another two weeks? Nope!
    I could give you the data by hand, but then I have to do it for everyone.
    No you wouldn't, you'd just have to do it for me. Alternatively I'll see if my python kung fu is good enough for me to work out how to scrape it myself.
  • edited September 2011
    From my (admittedly not very extensive) research, these "unlimited" offers tend to only upload data from internal hard drives.
    Read my posts. CrashPlan has an unlimited plan for $3/month ($140 for a four year plan) which includes one computer and any number of attached drives. Drives can even be disconnected and reconnected at any time.
    In that same three weeks I've created 53GB of new files that I'm going to archive. Am I even going to try to back this up to the cloud in the 10 days I have at home before I leave to Portland and SF for another two weeks?
    I just checked my backup reports and 2TB of data took ~100 days, so 10 days gives you ~200GB of upload. I have 10Mb/s upstream but for some reason the fastest I could get data onto their servers was 3.6Mb/s. From reading their support forum, this is a problem with transatlantic customers in general, which CrashPlan has no control over. I did some separate checking with SpeedTest which confirmed that this problem is not unique to CrashPlan.

    Also, at your rate of data creation, this problem is not going to get better for you. Sure, data speeds increase with time, but so do pixel counts in DSLRs :-). You'd better start backing up now because later it will be an even greater pain in the butt. Do what I did: get an old-ass computer, set it up next to your router at home, attach all drives and back up for three months.
    Post edited by Dr. Timo on
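For anyone who wants to redo the upload-time arithmetic from the last few posts with their own numbers, here is a small sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbit/s = 10^6 bits/s), a link that stays saturated the whole time, and no protocol overhead, so real uploads will be slower.

```python
TB = 10**12  # decimal terabyte, in bytes


def upload_hours(size_bytes: float, megabits_per_second: float) -> float:
    """Hours needed to push size_bytes through a link of the given speed."""
    seconds = size_bytes * 8 / (megabits_per_second * 1_000_000)
    return seconds / 3600


def effective_mbps(size_bytes: float, days: float) -> float:
    """Average upload speed implied by moving size_bytes in the given days."""
    return size_bytes * 8 / (days * 86_400) / 1_000_000


print(round(upload_hours(1 * TB, 20), 1))     # 1 TB at 20 Mbit/s -> 111.1 hours
print(round(upload_hours(1 * TB, 1000), 1))   # 1 TB at 1 Gbit/s  -> 2.2 hours
print(round(effective_mbps(2 * TB, 100), 2))  # 2 TB in 100 days  -> 1.85 Mbit/s
```

So "a little over 100 hours" for a terabyte at 20 megabit checks out, and 2TB in ~100 days works out to an average of under 2 Mbit/s, well below the 3.6Mb/s peak mentioned above.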
  • I could give you the data by hand, but then I have to do it for everyone.
    No you wouldn't, you'd just have to do it for me. Alternatively I'll see if my python kung fu is good enough for me to work out how to scrape it myself.
    Turns out this is a pretty fun exercise. After four or five attempts, I've finally got the hang of object oriented programming on python. Or to put it this way; I've finally made something that both works and is useful.
    Read my posts. CrashPlan has an unlimited plan
    I'll look into it when I get home on Sunday. Thanks for the info.
  • Turns out this is a pretty fun exercise. After four or five attempts, I've finally got the hang of object oriented programming on python. Or to put it this way; I've finally made something that both works and is useful.
    Are you using BeautifulSoup and/or Scrapy?
  • What do you mean by mirroring data?
    As in, RAID 1 can not be your only backup. If RAID 1 is all you have, then you have nothing.
    One "backup" is a copy, two is a backup, three is a reliable backup.
  • Turns out this is a pretty fun exercise. After four or five attempts, I've finally got the hang of object oriented programming on python. Or to put it this way; I've finally made something that both works and is useful.
    Are you using BeautifulSoup and/or Scrapy?
    What do you think? I'm using stuff I write myself, looking at python documentation. It's hard work, but it's fun to learn.
  • What do you think? I'm using stuff I write myself, looking at python documentation. It's hard work, but it's fun to learn.
    You should use libraries. The whole point of Python is that batteries are included.
  • What do you think? I'm using stuff I write myself, looking at python documentation. It's hard work, but it's fun to learn.
    You should use libraries. The whole point of Python is that batteries are included.
    Yeah, I know, but I did the entire project while offline, so can't look that stuff up easily. And in the end, parsing and extracting the relevant bits of text out of the HTML was the easiest part. Anyway, after a few years and many attempts to "get" object oriented programming, it has finally happened with this project.

    Some results!
    A history of the movies I have seen since December 2009 (or Every Post Luke Made To The Movies Thread).
    Luke's dating history since 2009 (or Every Post Luke Made To The Dating Thread).
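For the curious, pulling post text out of HTML with nothing but the standard library can be done with `html.parser`. This is not Luke's actual script (which isn't posted here), just a minimal sketch; the `Message` class name for the element wrapping each post is an assumption about the page markup and would need adjusting to the real pages.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the text inside every <div class="Message"> element.

    "Message" is a hypothetical class name chosen for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.comments = []  # finished comment texts, in page order
        self._depth = 0     # <div> nesting depth inside the current message
        self._parts = []    # text fragments of the current message

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # a nested div inside a message
        elif "Message" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.comments.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)
```

Usage is `parser = CommentExtractor(); parser.feed(page_html)` and then reading `parser.comments`. If you fetch many pages, sleeping a second or so between requests is kinder to the server.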
  • You're also killing the server since you are doing things really inefficiently.
  • Killing the server by asking for 50 pages? Sorry, I didn't think that would be a problem.
  • Killing the server by asking for 50 pages? Sorry, I didn't think that would be a problem.
    Well, it's not dying, but I can see when you do it.