How do I encrypt large (20Gb+) compressed files from a backup?


Others have suggested various symmetric encryption tools which are suitable for pipelining, such as aespipe. I suspect those will be about as efficient as you can get, given that encryption is a fairly CPU-expensive sort of thing to do, and it's not a bad suggestion.

But I'd suggest considering an asymmetric tool such as gpg. The under-the-hood bulk encryption will still be done by a symmetric cipher using a one-time session key, but the whole issue of key management becomes vastly easier with the GPG toolchain available.

In particular, using a simple symmetric cipher to encrypt the backups either fatally torpedoes the security (in that the same key is used to encrypt all backups, and the server must have access to it on a regular basis) or substantially increases your complexity (in that you use a different key for each backup, and must now manage them all manually).

With a tool like gpg, you can generate a single keypair for the encryption of all backups, and the server that encrypts needs only the public half, to which it encrypts the backups. You can keep the private key separately, either under normal security or - if business reasons require it - arrange to split the key amongst several individuals, some subgroup of whom is needed to reconstitute the private key, using standard key-sharing protocols.
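As a hedged illustration of that workflow (the user ID `backup@example.invalid` and all file names are placeholders, not taken from the question; the demo uses a throwaway keyring so it can be run safely):

```shell
# One-time setup, done somewhere safe: generate the keypair.
# In production you'd export only the *public* half to the backup server;
# here we use a throwaway keyring so the demo is self-contained.
export GNUPGHOME="$(mktemp -d)"; chmod 700 "$GNUPGHOME"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "backup@example.invalid" default default never

# Encrypt a finished archive to the public key (no private key needed):
echo "stand-in for a 20 GB vzdump archive" > vzdump-demo.tar.lzo
gpg --batch --yes --recipient backup@example.invalid \
    --output vzdump-demo.tar.lzo.gpg --encrypt vzdump-demo.tar.lzo

# gpg is also happy in a pipeline, so nothing unencrypted need hit disk:
cat vzdump-demo.tar.lzo | \
    gpg --batch --recipient backup@example.invalid --encrypt \
    > piped.tar.lzo.gpg
```

Decryption (`gpg --decrypt`) then only ever happens wherever the private key lives, which need not be the backup server at all.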



Author: initall

Updated on September 18, 2022

Comments

  • initall
    initall over 1 year

I want to encrypt regularly created and compressed backup files from OpenVZ containers before they leave the server and get copied to the provider's backup space via FTP.

They vary in size from 6 to 50 GB (after LZO compression). UPDATE: they are created automatically by the virtualization environment Proxmox.

I've seen various comments here and there on mcrypt ("code quality / support") and openssl ("not for large files") that make me wonder whether they fit. What can you recommend?

Furthermore: I can't split the backup file into smaller parts during compression and don't want to do it afterwards for performance reasons. I've had bad experiences using duplicity and want to avoid it, should you want to mention that.

    The server environment is Debian 7.

    • MadHatter
      MadHatter over 9 years
Is gpg out of the question? You also don't say how you're doing the backups, which makes it harder to recommend something that does the encryption on the fly whilst not splitting the file into smaller pieces.
    • Michael Hampton
      Michael Hampton over 9 years
Have your backup tool do this. You won't get decent performance otherwise.
    • initall
      initall over 9 years
@MadHatter This is a Proxmox setup that does backups on the fly with LVM snapshots. They end up .lzo-compressed in my case. After that I could run whatever fits to encrypt them.
    • MadHatter
      MadHatter over 9 years
      But you said you don't want to do that: "I ... don't want to do it afterwards for performance reasons".
    • initall
      initall over 9 years
Pardon me? Proxmox creates a huge, compressed backup file. I don't want to split THIS FILE and encrypt the individual parts (the point being: just because encryption tool x or y only accepts, say, up to 500 MB). I'm just asking whether there is a solution to encrypt this whole existing compressed file.
    • MadHatter
      MadHatter over 9 years
      @initall: ah, I see. Your comment was ambiguous - the "it" that you don't want to do after is splitting the file, not encrypting it. Understood. Then may I again suggest gpg, which works very well in pipelines?
  • initall
    initall over 9 years
Ok, I gave it a try. While I didn't find exact arguments for gpg being the perfect tool for encrypting LARGE files, I did some measurements of my own. On my server it took about 4 1/2 minutes to encrypt the 20 Gig+ file, and adding more recipients was just a matter of seconds. I also found that the backup utility vzdump offers scripting with various hooks, so that I can encrypt right after the backup file is created.
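For anyone following the same route, a vzdump hook script is just an executable that vzdump calls with the phase name as its first argument. A minimal sketch follows; the recipient ID is a placeholder, and it assumes (for this Proxmox generation) that the finished archive path arrives in the TARFILE environment variable during the backup-end phase - newer Proxmox releases expose it as TARGET instead, so check which your version sets:

```shell
#!/bin/sh
# Hedged sketch of a vzdump hook, wired in via vzdump.conf's "script:" option.
# vzdump passes the phase as $1; in the backup-end phase the archive path is
# assumed to be in $TARFILE (later Proxmox releases use $TARGET instead).
phase="$1"
if [ "$phase" = "backup-end" ] && [ -n "$TARFILE" ]; then
    gpg --batch --yes --recipient backup@example.invalid \
        --output "$TARFILE.gpg" --encrypt "$TARFILE" \
    && rm -f "$TARFILE"          # keep only the encrypted copy
fi
```

Invoked in any other phase (backup-start, log-end, ...) the script simply does nothing and exits cleanly, which is what vzdump expects of a hook.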
  • David Timothy Strauss
    David Timothy Strauss over 6 years
    Encryption with AES is not CPU-expensive in modern systems; today's processors have hardware acceleration. What is expensive (relative to data size) is using an asymmetric encryption algorithm like RSA. That's why the typical approach for large files is to (1) generate a one-time-use secret, (2) encrypt using a symmetric algorithm (like AES) and the one-time-use secret, (3) encrypt the one-time-use secret using an asymmetric algorithm (like RSA) for whoever will need to decrypt the file. This gives the performance of AES with the flexibility of RSA. Some tools can automate this.
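For the record, that nested scheme can even be done by hand with plain openssl - roughly what gpg automates internally. A hedged sketch (all file names are placeholders, and the small inputs stand in for a real archive and a real recipient key):

```shell
# Demo inputs: an RSA keypair for the recipient and a stand-in "archive".
openssl genrsa -out recipient.pem 2048
openssl rsa -in recipient.pem -pubout -out recipient-pub.pem
echo "stand-in for a 20 GB archive" > backup.tar.lzo

# (1) generate a one-time-use secret
openssl rand -hex 32 > session.key
# (2) bulk-encrypt the data symmetrically with AES under that secret
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -pass file:session.key -in backup.tar.lzo -out backup.tar.lzo.enc
# (3) RSA-encrypt the small secret for whoever must decrypt later
openssl pkeyutl -encrypt -pubin -inkey recipient-pub.pem \
    -in session.key -out session.key.enc

# Decryption reverses the nesting: unwrap the secret, then the data.
openssl pkeyutl -decrypt -inkey recipient.pem \
    -in session.key.enc -out session.key.dec
openssl enc -d -aes-256-cbc -pbkdf2 \
    -pass file:session.key.dec -in backup.tar.lzo.enc -out backup.tar.lzo.dec
```

Only the AES pass touches all 20+ GB; RSA ever sees just the 64-byte secret, which is why the hybrid construction scales.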
  • MadHatter
    MadHatter over 6 years
    @DavidTimothyStrauss errm yes, I say exactly that in my answer. But for the record, I agree with you.
  • David Timothy Strauss
    David Timothy Strauss over 6 years
@MadHatter You're right; your complete answer does cover that approach. I was thrown off by the claim that AES is "a fairly CPU-expensive sort of thing to do," which is not true. A modern processor (with AES-NI) can encrypt AES at speeds exceeding 10 Gbps, and the necessary hardware has shipped on Intel processors since 2010 (servers) and 2011 (desktops and laptops). This makes the answer confusing when it explains the combination of asymmetric and symmetric ciphers, because calling AES "fairly CPU-expensive" doesn't begin to capture the speed difference between AES and RSA (or similar), and why people nest them.
  • MadHatter
    MadHatter over 6 years
    @DavidTimothyStrauss I think we're pretty much agreeing. Even symmetric encryption requires quite a lot of CPU operations per byte processed, compared to (say) addition or multiplication, or merely reading data from storage. I freely concede that hardware support (eg AES-NI) will optimise one particular encryption algorithm, but the general point (I think) stands given that AES-NI is no help with, say, Threefish. When picking a security strategy, ignoring the cost of all symmetric encryption because you can perform one particular transform very efficiently strikes me as unwise.