rsync to get a list of only file names

file list filenames rsync

23,204

Solution 1

Hoping the question will be moved to the appropriate site, I'll answer here nevertheless.

You could append a pipe with awk:

rsync ... | awk '{ $1=$2=$3=$4=""; print substr($0,5); }' >output.txt

This eliminates all the unwanted information by outputting everything from the 5th field, but works only if none of the first four fields in the output format gets an additional whitespace somewhere (which is unlikely).

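This awk solution won't work for file names that start with whitespace. In addition, because assigning to a field makes awk rebuild the line with single-space separators, runs of whitespace inside a file name are collapsed to one space.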
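For comparison, the same field-based idea can be sketched in Python. This is only an illustration under assumptions: the function name `name_by_fields` and the sample lines are made up, and the line layout (mode, size, date, time, name) is the one described in this answer. Unlike the awk command it keeps whitespace inside the name, but it still loses leading whitespace.

```python
def name_by_fields(line):
    """Drop the first four whitespace-separated fields of one listing line.

    Returns None if the line has fewer than five fields.
    """
    # maxsplit=4 leaves everything after the fourth field in one piece,
    # but any whitespace directly in front of that piece is discarded.
    parts = line.rstrip("\n").split(None, 4)
    return parts[4] if len(parts) == 5 else None


# Hypothetical sample lines in the assumed listing layout.
inner = "-rw-r--r--        1234 2012/05/01 12:00:00 My favourite  song.mp3\n"
leading = "-rw-r--r--        1234 2012/05/01 12:00:00  lead.txt\n"

print(repr(name_by_fields(inner)))    # 'My favourite  song.mp3'
print(repr(name_by_fields(leading)))  # 'lead.txt' (the leading space is lost)
```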

A more robust approach is a somewhat more complex program, although it too makes assumptions about the output format.

It works this way: For each line,

Cut off the first 10 bytes. Verify that they are followed by a number of spaces. Cut them off as well.
Cut off all following digits. Verify that they are followed by one space. Cut that off as well.
Cut off the next 19 bytes. Verify that they contain a date and a time stamp in the appropriate format. (I don't know why the date's components are separated with / instead of -; that is not compliant with ISO 8601.)
Verify that now one space follows. Cut that off as well. Leave any following whitespace characters intact, as they belong to the file name.
If the line has passed all these verifications, it is likely that the remainder of that line contains the file name.
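The steps above can be sketched literally with string slicing. This is a rough illustration, not a definitive parser: the function name `strip_listing_prefix` is hypothetical, it assumes the size is printed as plain digits, and it does not undo any escaping in the file name.

```python
from datetime import datetime


def strip_listing_prefix(line):
    """Return the file name part of one listing line, or None if a check fails."""
    line = line.rstrip("\n")
    # Step 1: 10 bytes of mode string, followed by a number of spaces.
    rest = line[10:]
    if len(line) < 10 or not rest.startswith(" "):
        return None
    rest = rest.lstrip(" ")
    # Step 2: the size digits, followed by exactly one space.
    digits = len(rest) - len(rest.lstrip("0123456789"))
    if digits == 0 or rest[digits:digits + 1] != " ":
        return None
    rest = rest[digits + 1:]
    # Step 3: 19 bytes holding date and time, e.g. 2012/05/01 12:00:00.
    stamp, rest = rest[:19], rest[19:]
    try:
        datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return None
    # Step 4: one separating space; any further whitespace is part of the name.
    if not rest.startswith(" "):
        return None
    return rest[1:]


# Hypothetical sample line whose file name starts with a space.
sample = "-rw-r--r--        1234 2012/05/01 12:00:00  lead.txt\n"
print(repr(strip_listing_prefix(sample)))  # ' lead.txt'
```

Returning None instead of raising lets the caller decide whether an unrecognised line is an error or something to skip.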

It gets even worse: in very esoteric corner cases there is more to watch out for. File names can be escaped: certain unprintable bytes are replaced by an escape sequence (\#ooo, with ooo being their octal code), a process which must be reversed.
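Reversing that escaping can be sketched on raw bytes. This is a minimal sketch assuming Python 3 and assuming every escape is a backslash, a `#`, and three octal digits encoding a single byte; the names `ESCAPE_RE` and `unescape` are made up for the illustration.

```python
import re

# A backslash, '#', then three octal digits in the range 000-377 (one byte).
ESCAPE_RE = re.compile(rb"\\#([0-3][0-7]{2})")


def unescape(raw):
    """Replace each \\#ooo sequence in a bytes file name with the byte it encodes."""
    return ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)


print(unescape(rb"line\#012break"))  # b'line\nbreak'
```

Working on bytes rather than text avoids having to guess the encoding of the file names.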

Thus, neither awk nor a simple sed script will do here if we want to do it properly.

Instead, the following Python script can be used:

def rsync_list(fileobj):
    import re
    # Regex to identify a line
    line_re = re.compile(r'.{10} +\d+ ..../../.. ..:..:.. (.*)\n')
    # Regex for escaping
    quoted_re = re.compile(r'\\#(\d\d\d)')
    for line in fileobj:
        match = line_re.match(line)
        assert match, repr(line) # error if not found...
        quoted_fname = match.group(1) # the filename part ...
        # ... must be unquoted:
        fname = quoted_re.sub( # Substitute the matching part...
            lambda m: chr(int(m.group(1), 8)), # ... with the result of this function ...
            quoted_fname)                      # ... while looking at this string.
        yield fname

if __name__ == '__main__':
    import sys
    for fname in rsync_list(sys.stdin):
        # Debugging aids (Python 3 syntax):
        #import os
        #print(repr(fname), os.access(fname, os.F_OK))
        sys.stdout.write(fname + '\0')

This outputs the list of file names separated by NUL characters, similar to the way find -print0 and many other tools work, so that even a file name containing a newline character (which is valid!) is retained correctly:

rsync . | python rsf.py | xargs -0 stat -c '%i'

correctly shows the inode number of every given file.
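The consuming side of that pipeline can also be sketched in Python, which shows why NUL works as a separator. The helper names `split_nul` and `inode_numbers` are hypothetical, and the sample data is made up.

```python
import os


def split_nul(data):
    """Split NUL-separated file names; a trailing NUL yields no empty entry."""
    return [name for name in data.split(b"\0") if name]


def inode_numbers(names):
    """Yield (name, inode number) for each file name, given as bytes."""
    for name in names:
        yield name, os.stat(name).st_ino


# A newline inside a name survives, because only NUL ends a name.
print(split_nul(b"a.txt\0with\nnewline.txt\0"))
# [b'a.txt', b'with\nnewline.txt']
```

In a real pipeline the input would be read from `sys.stdin.buffer` rather than from a literal.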

I may well have missed a corner case or two, but I think the script handles the vast majority of cases correctly (I tested it with all 255 conceivable one-byte file names as well as a file name starting with a space).

Solution 2

After years of work, here is my solution to this age-old problem:

DIR=`mktemp -d /tmp/rsync.XXXXXX`
rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $DIR > output.txt
rmdir $DIR
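The same approach can be driven from Python. This is a sketch under assumptions: it uses only the rsync options shown above, the function names are hypothetical, and it assumes the listing decodes as text and has one name per line.

```python
import subprocess
import tempfile


def build_listing_command(source, dest):
    """Build the rsync command line used above as an argument list."""
    # -n: dry run, -r: recurse, --out-format=%n: print only each file name.
    return ["rsync", "-nr", "--out-format=%n", source, dest]


def list_names(source):
    """Run rsync against an empty temporary destination and return the names."""
    # The temporary directory only serves as a destination for the dry run
    # and is removed again when the with-block ends.
    with tempfile.TemporaryDirectory(prefix="rsync.") as dest:
        result = subprocess.run(
            build_listing_command(source, dest),
            check=True, capture_output=True, text=True,
        )
    return result.stdout.splitlines()
```

A call such as `list_names("serveripaddress::pt/dir/files/")` would then return the names as a list instead of writing them to output.txt.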

Solution 3

Further to https://stackoverflow.com/a/29522388/2858703

If your mktemp supports the --dry-run option, there's no need to actually create the temporary directory:

rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $(mktemp -d --dry-run) > output.txt


Author by

user1172282

Updated on October 18, 2020

Comments

user1172282 over 3 years
Here's an example of the command I'm using:
```
rsync --list-only --include "*2012*.xml" --exclude "*.xml" serveripaddress::pt/dir/files/ --port=111 > output.txt
```
How can I get a listing of just the file names without the extra information like permissions, timestamp, etc.?

Edit: And is it possible to output each file name on a new line?
André Keller over 11 years

Well, awk is probably the better fit for this, as it has a last-field operator: rsync ... | awk '{ print $NF }'
Ark-kun over 10 years

Obscure and fragile solutions like this should never be used.
glglgl over 10 years

@rbtux Good luck with a file name such as My favourite song.mp3.
glglgl over 10 years

@Ark-kun You are right; my original solution breaks with a file with a size of more than 99999999999.
Ark-kun over 10 years

@glglgl How could your cut -c 44- solution even work when rsync outputs owner names?
glglgl over 10 years

@Ark-kun Does it output them in this mode? Or are there variations between output formats? In that case, it is just not possible to parse the output. But my impression is that the output is always the way mentioned.
glglgl over 10 years

This manpage suggests that there are format changes starting at a given version and only if the --human-readable option is used. This implies that they intend to keep the output format stable. And as there is no username given, but just file mode, size, date, time and file name, ignoring the first 4 whitespace-separated fields should do. (Well, file names starting with whitespace could lead to trouble if we are not careful enough.)
Ark-kun over 10 years

Looks like I was wrong about usernames. rsync doesn't print them.
Ark-kun over 10 years

What a pity that cut doesn't have a "consecutive spaces" field separator option like sort and awk.
William Entriken about 9 years

Also, this may work, but I'm not sure if it is documented: rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ /dev/false > output.txt (and no, /dev/null won't work).
William Entriken over 4 years

Thank you @bxm for the note: rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $(mktemp -d --dry-run) > output.txt
Antoine Viallon over 2 years

Can't --dry-run be used here?