Base64 Encoding safe for filenames?

Solution 1

Modified Base64 (with /, = and + replaced) is safe for creating names, but it does not guarantee the reverse transformation, because many file systems and URLs are case insensitive.

Base64 is case sensitive, so it cannot guarantee a 1-to-1 mapping on case-insensitive file systems (all Windows file systems, ignoring the POSIX subsystem cases). URLs are also frequently treated as case insensitive, which likewise prevents a 1-to-1 mapping.
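To make the collision concrete, here is a minimal Java sketch. It uses the two example strings quoted in the comments ("Baaa" and "BAAA"); the class and method names are made up for illustration. The two strings decode to different bytes, yet they are the same name once case is ignored.

```java
import java.util.Base64;
import java.util.Locale;

class Base64CaseCollision {
    // Decodes a Base64 string and renders the bytes as upper-case hex pairs.
    static String decodeToHex(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // True when two names are equal once case is ignored,
    // which is how a case-insensitive file system compares them.
    static boolean collidesIgnoringCase(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(decodeToHex("Baaa"));                  // 05 A6 9A
        System.out.println(decodeToHex("BAAA"));                  // 04 00 00
        System.out.println(collidesIgnoringCase("Baaa", "BAAA")); // true
    }
}
```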

I would use Base32 in this case. The names will be a bit longer, but Base32-encoded values are safe for file/URI usage without replacing any characters, and they guarantee a 1-to-1 mapping even in case-insensitive environments (FAT, or NTFS accessed through Win32).

Unfortunately, frameworks usually have no built-in support for this encoding. On the other hand, the code is relatively simple to write yourself or to find online.

http://en.wikipedia.org/wiki/Base32.
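As a rough idea of how little code is involved, here is a hand-written sketch in Java. It assumes the standard A-Z, 2-7 Base32 alphabet and omits the "=" padding; the class name `Base32Names` is hypothetical. Treat it as an illustration rather than a vetted library.

```java
import java.io.ByteArrayOutputStream;
import java.util.Locale;

class Base32Names {
    // Standard Base32 alphabet: one case only, so case folding cannot merge two names.
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Encodes bytes as Base32 without '=' padding.
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder((data.length * 8 + 4) / 5);
        int buffer = 0; // bits read but not yet emitted (only the low 'bits' bits matter)
        int bits = 0;   // number of pending bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET.charAt((buffer >> (bits - 5)) & 0x1F));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Final partial group: fill with zero bits on the right.
            out.append(ALPHABET.charAt((buffer << (5 - bits)) & 0x1F));
        }
        return out.toString();
    }

    // Decodes unpadded Base32; input case is ignored on purpose.
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;
        int bits = 0;
        for (char c : text.toUpperCase(Locale.ROOT).toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("Not a Base32 character: " + c);
            }
            buffer = (buffer << 5) | value;
            bits += 5;
            if (bits >= 8) {
                out.write((buffer >> (bits - 8)) & 0xFF);
                bits -= 8;
            }
        }
        return out.toByteArray(); // leftover filler bits are dropped
    }
}
```

Because `decode` upper-cases its input first, a name that comes back from the file system in a different case still maps to the original bytes.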

Solution 2

RFC 3548 suggests replacing more than just the / character. Its URL and filename safe alphabet replaces:

  • the / character (value 63) with the underscore _
  • the + character (value 62) with the minus -
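In Java 8 and later, `java.util.Base64.getUrlEncoder()` produces this alphabet directly, so no manual replacing is needed. A minimal sketch follows; the class and method names are hypothetical, and dropping the "=" padding is a choice made for this example, not something the RFC requires.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class UrlSafeBase64Name {
    // Encodes bytes with the URL and filename safe alphabet ('-' and '_'), without '=' padding.
    static String toName(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // MD5 of the UTF-8 bytes of the text, encoded as above.
    static String md5Name(String plainText) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(plainText.getBytes(StandardCharsets.UTF_8));
            return toName(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // The standard alphabet would give "+/8=" for these two bytes.
        System.out.println(toName(new byte[] {(byte) 0xFB, (byte) 0xFF})); // -_8
        System.out.println(md5Name("some text"));
    }
}
```

The result is still case sensitive, so this only removes the forbidden characters.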

But maybe you would be better off with a hex string. It has been a while since I stored a hash value in a filename. I started with a Base64 string but switched to a hex string. I don't remember why I switched; maybe because Windows makes no distinction between 'a' and 'A', as AndiDog said.

Solution 3

I'm not sure what you are using the encoding for, but consider percent encoding file names.

  • It works in every file system
  • It keeps file names human readable as long as they're within the ASCII range
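A minimal sketch of percent encoding for file names, in Java. The set of characters left unescaped (ASCII letters, digits, `-`, `_` and `.`) is an assumption chosen for this example, and the class name is hypothetical. `java.net.URLEncoder` is deliberately not used, because it targets HTML form encoding: it turns spaces into `+` and leaves `*` untouched.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodedName {
    // Escapes every UTF-8 byte outside [A-Za-z0-9._-] as %XX.
    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean safe = (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '_' || v == '.';
            if (safe) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a/b?.txt")); // a%2Fb%3F.txt
    }
}
```

Plain ASCII names pass through unchanged, which is what keeps them human readable.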

Solution 4

Usually MD5 hashes (hashes in general) are represented as hexadecimal strings instead of Base64, which then only contain [a-f0-9]. Those names would be supported by all filesystems.
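For the Java code in the question, a hexadecimal name could be produced roughly like this. It is a sketch with hypothetical class and method names, and it fixes the charset to UTF-8, which the original code does not do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HexDigestName {
    // Renders bytes as lower-case hexadecimal, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // MD5 of the UTF-8 bytes of the text, as a 32-character hex string.
    static String md5Hex(String plainText) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(plainText.getBytes(StandardCharsets.UTF_8));
            return toHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("")); // d41d8cd98f00b204e9800998ecf8427e
    }
}
```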

If you really want to use Base64, your solution (replacing slashes) will not work correctly, because Windows file systems make no distinction between 'A' and 'a'. Maybe you want to use Base32 instead? But mind that Base32 carries only 5 bits per character (hex carries 4), so the names get only slightly shorter, and it will be easier to just take the hexadecimal representation.

In general, the following characters are not allowed in Windows and/or Linux: \ / : * ? " < > |

Solution 5

One-liner for C#:

String filename = Convert.ToBase64String(new SHA256Managed().ComputeHash(Encoding.UTF8.GetBytes("UTF-8 string with snowmen"))).Replace("+", "_").Replace("/", "-").Replace("=","");

Needs the following at the beginning of the file:

using System;
using System.Security.Cryptography;
using System.Text;
Author by

heroddaji

Updated on June 29, 2020

Comments

  • heroddaji
    heroddaji almost 4 years

    Is Base64 encoding safe to use for filenames on Windows and Linux systems? From my research I have found that replacing all / characters of the result with - or _ should resolve any issues.

    Can anyone provide more details on this?

    Currently in Java I am using the following piece of code:

    MessageDigest md5Digest = MessageDigest.getInstance("MD5");
    md5Digest.reset();
    md5Digest.update(plainText.getBytes());
    
    byte[] digest = md5Digest.digest();
    
    BASE64Encoder encoder = new BASE64Encoder();
    hash = encoder.encode(digest);
    hash = hash.replace('/','_'); // String is immutable, so keep the returned value
    
  • Mark
    Mark over 13 years
    You can use base_encode function in php, see stackoverflow.com/questions/1848601/…
  • thomasrutter
    thomasrutter about 11 years
    @Mark base_encode is only suitable for values that can be represented as numbers in PHP and the exact precision is platform-dependent, but anything over 14 decimal digits (about 9 base32 digits) may not retain integer precision, making it unsuitable for strings/hashes.
  • Rob Worsnop
    Rob Worsnop over 9 years
    Guava supports Base32 encoding now.
  • Dan Bechard
    Dan Bechard over 9 years
    This doesn't address the case-insensitivity problem.
  • Andrii Abramov
    Andrii Abramov about 6 years
    And I would not call this a one-liner. It requires horizontal scrolling to read the code.
  • gstackoverflow
    gstackoverflow about 6 years
    "Modified Base64 (when /,= and + are replaced) is safe to create names but does not guarantee reverse transformation due to case insensitivity of many file systems and urls." Can you explain it in detail?
  • Alexei Levenkov
    Alexei Levenkov about 6 years
    @gstackoverflow Base64 is case sensitive, so if you want to reverse the transformation back to bytes, case must be preserved ("Baaa" -> "05 A6 9A", but "BAAA" -> "04 00 00"). On FAT/FAT32 the file name is normalized to upper case, so you can't transform a file name read from the file system back to the byte array. Similarly, URL parameters are often normalized (usually through a misunderstanding, but still), which causes the same problem. Additionally, some file systems (NTFS) preserve case but do not allow names that differ only by case, so you can't construct "unique" NTFS names from distinct bytes.
  • user2864740
    user2864740 over 4 years
    In actuality, the only forbidden characters in Linux (and many UNIX-derivatives) are / and NUL on their native file-systems. It's usually unnecessary (and confusing) to deviate far from the more restrictive Windows set.. here is a breakdown by filesystem (and assumes programs written correctly per target restrictions; then again, even Windows Explorer failed that test on valid FAT32 and NTFS filenames for many years..).