How to rename file names to avoid conflict in Windows or Mac?

Solution 1

You could do something like:

rename 's/[<>:"\\|?*]/_/g' /path/to/file

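This replaces each of these characters with a _. Note that you need not replace /: it is an invalid character for filenames in both filesystems, and on Unix it is the path separator. To extend this to a directory and all its contents, run rename from inside each file's own directory with -execdir, so that only the last path component is rewritten (with plain -exec, a file under a not-yet-renamed a:b/ would be given a target under a_b/, which does not exist yet):

find /path/to/directory -depth -execdir rename 's/[<>:"\\|?*]/_/g' {} +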

In the example below, both / (which delimits the pattern) and \ are escaped inside the character class. To retain uniqueness, you could prepend a random number to the name:

$ rename -n 's/[<>:"\/\\|?*]/_/g && s/^/int(rand(10000))/e' a\\b
a\b renamed as 8714a_b
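
The same substitution can be sketched in Python, which makes it easy to touch only the last path component and leave the directories alone. This is a minimal illustration, not part of the answer above; the function name sanitize_path is made up for the example, and POSIX path rules are assumed.

```python
import os
import re

# Characters that Windows rejects in file names. "/" is left out because
# it can only ever appear as the path separator.
RESERVED = re.compile(r'[<>:"\\|?*]')


def sanitize_path(path):
    """Return path with reserved characters in its last component replaced by _."""
    head, tail = os.path.split(path)
    return os.path.join(head, RESERVED.sub("_", tail))


if __name__ == "__main__":
    # Only the file name changes; the a:b directory is left as it is.
    print(sanitize_path("/tmp/a:b/Screenshot 2015-09-07-25:10:10"))
```

Because the function only computes the new path, it can be used as a dry run before any call to os.rename.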

A more complete solution should, at least:

  1. Convert all characters to the same case
  2. Use a sane counting system

That's to say, foo.mp3 should not become foo.mp3.1, but foo.1.mp3, since Windows is more reliant on extensions.
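
As a rough sketch of those two requirements (not the script that follows), the counter can be inserted before the extension once names have been lower-cased. The helper names numbered_name and unique_name are hypothetical, and the foo.1.mp3 pattern is the one suggested above.

```python
import os


def numbered_name(name, n):
    """Insert the counter n before the extension: foo.mp3 -> foo.1.mp3."""
    stem, ext = os.path.splitext(name)
    return f"{stem}.{n}{ext}"


def unique_name(name, taken):
    """Return a lower-cased name that is not yet in the set taken."""
    base = name.lower()
    candidate = base
    n = 0
    while candidate in taken:
        n += 1
        candidate = numbered_name(base, n)
    taken.add(candidate)
    return candidate


if __name__ == "__main__":
    taken = set()
    for name in ["foo.mp3", "Foo.mp3", "FOO.MP3", "README"]:
        print(name, "->", unique_name(name, taken))
```

Here the three spellings of foo.mp3 end up as foo.mp3, foo.1.mp3 and foo.2.mp3, so the extension always stays at the end of the name.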

With that in mind, I wrote the following script. I tried to be non-destructive, by using a prefix path into which I can copy the renamed files, instead of modifying the original.

#! /bin/bash

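# Characters Windows forbids in file names. tr needs \\ for a literal
# backslash; a lone \| would only match the pipe.
windows_chars='<>:"\\|?*'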
prefix="windows/"

# Count the files/directories that already use this name as a prefix.
# Prints nothing if the plain name is still free.
find_num_files ()
(
    if [[ -e $prefix$1$2 ]]
    then
        shopt -s nullglob
        files=( "$prefix$1-"*"$2" )
        echo ${#files[@]}
    fi
)

# From http://www.shell-fu.org/lister.php?id=542
# Joins strings with a separator. Separator not present for
# edge case of single string.
str_join ()
(
    IFS=${1:?"Missing separator"}
    shift
    printf "%s" "$*"
)

for i
do
    # convert to lower case, then replace special chars with _
    new_name=$(tr "$windows_chars" _ <<<"${i,,}")

    # if a directory, make it, instead of copying contents
    if [[ -d $i ]]
    then
        mkdir -p "$prefix$new_name"
        echo mkdir -p "$prefix$new_name"
    else
        # get filename without extension
        name_wo_ext=${new_name%.*}
        # get extension
        # The trick is to make sure that, for:
        # "a.b.c", name_wo_ext is "a.b" and ext is ".c"
        # "abc", name_wo_ext is "abc" and ext is empty
        # Then, we can join the strings without worrying about the
        # . before an extension
        # (quoted so that characters such as [ in the name are not read as a pattern)
        ext=${new_name#"$name_wo_ext"}
        count=$(find_num_files "$name_wo_ext" "$ext")
        name_wo_ext=$(str_join - "$name_wo_ext" $count)
        cp "$i" "$prefix$name_wo_ext$ext"
        echo cp "$i" "$prefix$name_wo_ext$ext"
    fi
done

In action:

$ tree a:b
a:b
├── b:c
│   ├── a:d
│   ├── A:D
│   ├── a:d.b
│   └── a:D.b
├── B:c
└── B"c
    └── a<d.b

3 directories, 5 files
$ find a:b -exec ./rename-windows.sh {} +
mkdir -p windows/a_b
mkdir -p windows/a_b/b_c
mkdir -p windows/a_b/b_c
cp a:b/B"c/a<d.b windows/a_b/b_c/a_d.b
mkdir -p windows/a_b/b_c
cp a:b/b:c/a:D.b windows/a_b/b_c/a_d-0.b
cp a:b/b:c/A:D windows/a_b/b_c/a_d
cp a:b/b:c/a:d windows/a_b/b_c/a_d-1
cp a:b/b:c/a:d.b windows/a_b/b_c/a_d-1.b
$ tree windows/
windows/
└── a_b
    └── b_c
        ├── a_d
        ├── a_d-0.b
        ├── a_d-1
        ├── a_d-1.b
        └── a_d.b

2 directories, 5 files
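
Before copying anything, it can help to see which names would end up clashing. The following is only a sketch under simplifying assumptions: the function name collision_groups is made up, and lower-casing stands in for the case-insensitive comparison that Windows or macOS would really apply.

```python
import re
from collections import defaultdict

RESERVED = re.compile(r'[<>:"\\|?*]')


def collision_groups(names):
    """Group names that become identical after replacement and lower-casing."""
    groups = defaultdict(list)
    for name in names:
        key = RESERVED.sub("_", name).lower()
        groups[key].append(name)
    # Keep only the targets claimed by more than one source name.
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # The four file names from the b:c directory in the example tree.
    clashes = collision_groups(["a:d", "A:D", "a:d.b", "a:D.b"])
    for target, sources in clashes.items():
        print(target, "<-", sources)
```

For the example tree this reports two groups, a_d and a_d.b, which are exactly the names that received a counter in the run above.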

The script is available in my GitHub repo.

Solution 2

Recursively replace a list of strings or characters in filenames by other strings or characters

The script below can be used to replace a list of strings or characters, possibly occurring in a file's name, by an arbitrary replacement per string. Since the script only renames the file itself (not the path), there is no risk of messing with directories.

The replacements are defined in the list chars (see further below). Each string can be given its own replacement, which makes it possible to reverse the renaming later, provided every replacement is a unique string. If you would rather replace all problematic strings with an underscore, simply define the list like this:

chars = [
    ("<", "_"),
    (">", "_"),
    (":", "_"),
    ('"', "_"),
    ("/", "_"),
    ("\\", "_"),
    ("|", "_"),
    ("?", "_"),
    ("*", "_"),
    ]
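
If you would rather keep the renaming reversible, as mentioned above, each character needs its own token. The sketch below assumes a few made-up tokens (only _bsl_ appears elsewhere in this answer) and shows the round trip. It only works if none of the tokens already occurs in your file names.

```python
# Hypothetical tokens, chosen only for illustration; any strings work as long
# as they are unique and do not already occur in your file names.
chars = [
    ("<", "_lt_"),
    (">", "_gt_"),
    (":", "_col_"),
    ("\\", "_bsl_"),
]


def encode(name):
    """Replace each problematic character with its token."""
    for original, token in chars:
        name = name.replace(original, token)
    return name


def decode(name):
    """Undo encode() by turning each token back into its character."""
    for original, token in chars:
        name = name.replace(token, original)
    return name


if __name__ == "__main__":
    encoded = encode("Screenshot 2015-09-07-25:10:10")
    print(encoded)
    print(decode(encoded))
```

With a single shared replacement such as _, this round trip is not possible, because the original character can no longer be told apart.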

Dupes

To prevent duplicate names, the script first builds the new name and then checks whether a file with that name already exists in the same directory. If so, it prefixes the name with dupe_1_, dupe_2_, and so on, until it finds an available name for the file.

The script

#!/usr/bin/env python3
import os
import shutil
import sys

directory = sys.argv[1]

# --- set replacement below in the format ("<string>", "<replacement>") as below
chars = [
    ("<", "_"),
    (">", "_"),
    (":", "_"),
    ('"', "_"),
    ("/", "_"),
    ("\\", "_"),
    ("|", "_"),
    ("?", "_"),
    ("*", "_"),
    ]
# ---

for root, dirs, files in os.walk(directory):
    for file in files:
        newfile = file
        for old, new in chars:
            newfile = newfile.replace(old, new)
        if newfile != file:
            # on a name clash, prefix dupe_1_, dupe_2_, ... until the name is free
            tempname = newfile
            n = 0
            while os.path.exists(os.path.join(root, newfile)):
                n += 1
                newfile = "dupe_" + str(n) + "_" + tempname
            shutil.move(os.path.join(root, file), os.path.join(root, newfile))

How to use

  1. Copy the script into an empty file, save it as rename_chars.py.
  2. Edit the replacement list if you want. As it is, the script replaces all occurrences of problematic characters with an underscore, but the choice is yours.
  3. Test-run it on a directory with the command:

    python3 /path/to/rename_chars.py <directory_to_rename>
    

Note

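Note that in a line such as:

("\\", "_bsl_"),

the backslash has to be escaped by another backslash in Python, so "\\" stands for a single backslash. Here _bsl_ is just an example of a unique replacement; the script above uses _.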

Author: don.joey

Updated on September 18, 2022

Comments

  • don.joey
    don.joey over 1 year

    How can I batch rename files so that their names do not include characters that clash with other file systems? For instance:

    Screenshot 2015-09-07-25:10:10
    

    Note that the colons are the issue in this file name. These will not be digested by Windows or Mac.

    These files could be renamed to

    Screenshot 2015-09-07-25--10--10
    

    I have to move a large number of files from Ubuntu to another OS. I copied them to an NTFS drive using rsync, but that lost some files. I also copied them to an ext4 drive.

    The following are the reserved characters:

    < (less than)
    > (greater than)
    : (colon)
    " (double quote)
    / (forward slash)
    \ (backslash)
    | (vertical bar or pipe)
    ? (question mark)
    * (asterisk)
    

    Another issue is that Windows is not case-sensitive when it comes to file names (and neither are most OS X systems).

    • Panther
      Panther over 8 years
      How much of the information do you want to preserve ? Use a loop for i in Screenshot* .. n=1 ... mv $i $i$n ... n=n+1 ...
    • Jacob Vlijm
      Jacob Vlijm over 8 years
      Do you have any preferences on how to rename? Also: is there a risk of dupes after renaming?
    • Rinzwind
      Rinzwind over 8 years
      @JacobVlijm I would assume that to be a yes (just to be safe and yes I know ... that regex will be long :D )
    • Rinzwind
      Rinzwind over 8 years
      You probably need to name a character for every char you want to replace. And that could be added to 1 regex or to multiple rename.ul instructions.
    • don.joey
      don.joey over 8 years
      To be honest: the chance for me having dupes is small. For future purposes, though, I think a snippet should avoid dupes and should preserve as much info as possible
  • Rinzwind
    Rinzwind over 8 years
    1 slight issue: 1<1.txt and 1:1.txt will have you end up with 1 file less than intended.
  • muru
    muru over 8 years
    @Rinzwind True. But how do you decide which one is which in the Windows world?
  • don.joey
    don.joey over 8 years
    I think Rinz has a point. Maybe manpages.ubuntu.com/manpages/oneiric/man1/rename.ul.1.html could come in handy?
  • muru
    muru over 8 years
    @don.joey Ok - then where do you stop? Have you taken into account case-sensitivity?
  • don.joey
    don.joey over 8 years
    Could you take it into account?
  • Rinzwind
    Rinzwind over 8 years
    "case" could be solved with a backup parameter with "cp" or "rsync"(?)
  • muru
    muru over 8 years
    @Rinzwind will ruin extensions (not a bother for Linux, but will mess up the Windows world).
  • don.joey
    don.joey over 8 years
    Actually this messes up the dir structure because it will replace the slashes.
  • muru
    muru over 8 years
    @don.joey yes, it does. I was in the process of editing it, then my attention went to other things.
  • muru
    muru over 8 years
    @don.joey see update.
  • conualfy
    conualfy about 4 years
    Length is also a thing in the Windows world: NTFS does not allow file and folder names as long as ext4 does. That should be handled, too.
  • Andy
    Andy almost 3 years
    Yo this ruined my git subfolder. all i did was find . <etc>. I guess it kinda was my fault. but still, no warning.
  • muru
    muru almost 3 years
    @AndiHamolli none of the problematic characters are likely to appear in a .git folder (unless used in a branch name or something). In any case, files are only copied, not moved, so your original stuff should remain as is.
  • Andy
    Andy almost 3 years
    Does it delete anything?
  • Andy
    Andy almost 3 years
    @muru , can you also make a version that just renames the existing files, without creating a separate windows folder.
  • muru
    muru almost 3 years
    @AndiHamolli Currently it doesn't delete anything. It just makes copies of whatever needs to be renamed. If you want to just rename, change cp to mv and delete the prefix="windows" line.