rsync to get a list of only file names
Solution 1
Hoping the question will be moved to the appropriate site, I'll answer here nevertheless.
You could append a pipe with awk:
rsync ... | awk '{ $1=$2=$3=$4=""; print substr($0,5); }' >output.txt
This eliminates all the unwanted information by outputting everything from the 5th field onwards, but it only works if none of the first four fields in the output format contains extra whitespace (which is unlikely).
This awk solution also won't work for file names starting with whitespace, and because assigning to a field makes awk rebuild $0, runs of whitespace inside a file name are squeezed to a single space.
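For comparison, here is a rough Python analogue of the "skip the first four fields" idea. It is a swapped-in technique, not the awk command itself: `str.split` with `maxsplit=4` splits off mode, size, date and time and returns the rest of the line without rebuilding it, so whitespace inside a name survives, but leading whitespace is still lost. The sample lines are made up for illustration.

```python
def name_after_four_fields(line: str) -> str:
    """Return everything after the first four whitespace-separated fields."""
    # Drop only the newline: trailing spaces may belong to the file name.
    line = line.rstrip("\n")
    # maxsplit=4 yields mode, size, date, time and the untouched remainder
    # (Python skips the whitespace in front of the remainder, though).
    return line.split(None, 4)[4]


sample = "-rw-r--r--          1,234 2012/01/01 12:00:00 my  file.xml\n"
print(repr(name_after_four_fields(sample)))  # prints 'my  file.xml'
```

The inner double space is kept, but a name such as " lead.txt" would come back as "lead.txt", which is exactly the leading-whitespace problem mentioned above.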
A more robust approach is a somewhat more complex program, which makes assumptions as well.
It works as follows. For each line:
- Cut off the first 10 bytes. Verify that they are followed by a number of spaces. Cut them off as well.
- Cut off all following digits. Verify that they are followed by one space. Cut that off as well.
- Cut off the next 19 bytes. Verify that they contain a date and a time stamp in the appropriate format. (I don't know why the date's components are separated with / instead of -; it is not compliant with ISO 8601.)
- Verify that exactly one space follows. Cut that off as well. Leave any following whitespace characters intact, as they belong to the file name.
- If the line has passed all these verifications, the remainder of that line is most likely the file name.
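The steps above can be sketched literally in Python, one check per step. This is an illustrative sketch, not the script below: it works on `str` rather than raw bytes for readability, the function and constant names are hypothetical, and it additionally accepts `,` and `.` in the size field on the assumption that newer rsync versions may print digit separators there.

```python
import re
from typing import Optional

# Step 3 check: "YYYY/MM/DD HH:MM:SS", exactly 19 characters.
DATE_TIME_RE = re.compile(r"\d{4}/\d\d/\d\d \d\d:\d\d:\d\d")
# Assumption: the size may contain ',' or '.' as digit separators.
SIZE_CHARS = set("0123456789,.")


def parse_listing_line(line: str) -> Optional[str]:
    """Apply the checks step by step; return the name or None."""
    line = line.rstrip("\n")
    # Step 1: cut off the 10-byte mode string, then at least one space.
    rest = line[10:]
    after_spaces = rest.lstrip(" ")
    if len(after_spaces) == len(rest):
        return None
    rest = after_spaces
    # Step 2: cut off the size, which must be followed by one space.
    i = 0
    while i < len(rest) and rest[i] in SIZE_CHARS:
        i += 1
    if i == 0 or rest[i:i + 1] != " ":
        return None
    rest = rest[i + 1:]
    # Step 3: the next 19 characters must be the date and time.
    if DATE_TIME_RE.fullmatch(rest[:19]) is None:
        return None
    rest = rest[19:]
    # Step 4: one space; everything after it, including any leading
    # whitespace, is the (still escaped) file name.
    if not rest.startswith(" "):
        return None
    return rest[1:]
```

For example, `parse_listing_line("-rw-r--r--          1,234 2012/01/01 12:00:00  lead.txt\n")` keeps the leading space of the name, while a line that fails any check yields `None` instead of a wrong name.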
It gets even worse: for very esoteric corner cases there is even more to watch out for. File names can be escaped: certain unprintable bytes are replaced by an escape sequence \#ooo (with ooo being their octal code), a process which must be reversed.
Thus, neither awk nor a simple sed script will do here if we want to do it properly.
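To make the escaping concrete, the reversal step alone can be sketched in Python. It is a minimal sketch that assumes the \#ooo form described above and works on raw bytes, since file names need not be valid text; the function name is hypothetical.

```python
import re

# rsync's escape as described above: backslash, '#', three octal digits.
ESCAPE_RE = re.compile(rb"\\#([0-7]{3})")


def unescape_rsync_name(raw: bytes) -> bytes:
    """Turn every \\#ooo sequence back into the single byte it encodes."""
    return ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)


# \#012 is octal 12 = decimal 10, i.e. a newline byte.
print(unescape_rsync_name(rb"line\#012break"))  # prints b'line\nbreak'
```

Restricting the digits to `[0-7]` means a sequence like `\#089` is left alone instead of making `int(..., 8)` raise.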
Instead, the following Python script can be used:
import re
import sys

# One line of rsync's listing: a 10-byte mode string, one or more spaces,
# the size (newer rsync versions may group digits with ',' or '.'), one
# space, "YYYY/MM/DD HH:MM:SS", one space, then the escaped file name.
LINE_RE = re.compile(rb'.{10} +[\d,.]+ \d{4}/\d\d/\d\d \d\d:\d\d:\d\d (.*)\n?')
# rsync writes unprintable bytes as a backslash, '#' and 3 octal digits.
QUOTED_RE = re.compile(rb'\\#([0-7]{3})')


def rsync_list(fileobj):
    """Yield each file name from rsync's listing as raw bytes."""
    for line in fileobj:
        match = LINE_RE.fullmatch(line)
        if match is None:
            raise ValueError('unexpected line: %r' % line)
        quoted_fname = match.group(1)  # the file name part, still escaped
        # Replace every \#ooo escape with the single byte it stands for.
        yield QUOTED_RE.sub(lambda m: bytes([int(m.group(1), 8)]), quoted_fname)


if __name__ == '__main__':
    out = sys.stdout.buffer
    for fname in rsync_list(sys.stdin.buffer):
        # Terminate each name with NUL so names containing newlines survive.
        out.write(fname + b'\0')
This outputs the list of file names separated by NUL characters, similar to the way find -print0 and many other tools work, so that even a file name containing a newline character (which is valid!) is retained correctly:
rsync . | python3 rsf.py | xargs -0 stat -c '%i'
correctly shows the inode number of every given file.
I may well have missed a corner case or two, but I think the script handles the vast majority of cases correctly (I tested it with all 255 conceivable one-byte file names as well as a file name starting with a space).
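On the consuming side, a NUL-terminated stream such as the one rsf.py writes is easy to split in Python as well. A minimal sketch; the function name and the sample bytes are hypothetical.

```python
from typing import List


def split_nul_terminated(data: bytes) -> List[bytes]:
    """Split NUL-terminated names; a newline inside a name is kept."""
    names = data.split(b"\0")
    # Every name ends with NUL, so the last piece is an empty string.
    if names and names[-1] == b"":
        names.pop()
    return names


print(split_nul_terminated(b"a\nb\0c.xml\0"))  # prints [b'a\nb', b'c.xml']
```

Splitting on NUL rather than on newlines is what keeps the name `a\nb` in one piece.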
Solution 2
After years of work, here is my solution to this age-old problem:
DIR=$(mktemp -d /tmp/rsync.XXXXXX)
rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ "$DIR" > output.txt
rmdir "$DIR"
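The same trick can be driven from Python, where `tempfile.TemporaryDirectory` removes the empty destination even if rsync fails (the shell version skips `rmdir` in that case only if the script aborts). This is a sketch under assumptions: the helper names are hypothetical, `serveripaddress::pt/dir/files/` is the placeholder from the question, and names are returned as raw bytes split on newlines.

```python
import subprocess
import tempfile
from typing import List


def build_rsync_command(source: str, dest_dir: str) -> List[str]:
    """Dry run (-n), recursive (-r), printing only names (%n)."""
    # No shell is involved, so '%n' needs no extra quoting here.
    return ["rsync", "-nr", "--out-format=%n", source, dest_dir]


def list_remote_names(source: str) -> List[bytes]:
    """Run the dry run against a throwaway empty directory."""
    # The directory is deleted when the block exits, even on error.
    with tempfile.TemporaryDirectory(prefix="rsync.") as empty_dir:
        result = subprocess.run(
            build_rsync_command(source, empty_dir),
            stdout=subprocess.PIPE,
            check=True,
        )
    # One name per line; per the rsync man page, %n adds "/" to directories.
    return [line for line in result.stdout.split(b"\n") if line]
```

Usage would be `list_remote_names("serveripaddress::pt/dir/files/")`; extra options such as the question's `--include`/`--exclude` filters or `--port=111` would have to be appended to the list in `build_rsync_command`.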
Solution 3
Further to https://stackoverflow.com/a/29522388/2858703
If your mktemp supports the --dry-run option, there's no need to actually create the temporary directory:
rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $(mktemp -d --dry-run) > output.txt
user1172282
Updated on October 18, 2020
Comments
-
user1172282 over 3 years
Here's an example of the command I'm using:
rsync --list-only --include "*2012*.xml" --exclude "*.xml" serveripaddress::pt/dir/files/ --port=111 > output.txt
How can I get a listing of just the file names without the extra information like permissions, timestamp, etc.?
Edit: And is it possible to output each file name on a new line?
-
André Keller over 11 years
Well, awk is probably a better fit for this, as awk can address the last field directly with $NF:
rsync ... | awk '{ print $NF }'
-
Ark-kun over 10 years
Obscure and fragile solutions like this should never be used.
-
glglgl over 10 years
@rbtux Good luck with a file name such as My favourite song.mp3.
-
glglgl over 10 years
@Ark-kun You are right; my original solution breaks with a file with a size of more than 99999999999.
-
Ark-kun over 10 years
@glglgl How could your cut -c 44- solution even work when rsync outputs owner names?
-
glglgl over 10 years
@Ark-kun Does it output them in this mode? Or are there variations between output formats? In that case, it is simply not possible to parse the output. But my impression is that the output always has the format mentioned.
-
glglgl over 10 years
This man page suggests that the format changes only starting at a given version and only if the --human-readable option is used. This implies that they intend to keep the output format stable. And as no user name is given, just file mode, size, date, time and file name, ignoring the first 4 whitespace-separated fields should do. (Well, file names starting with whitespace could lead to trouble if we are not careful.)
-
Ark-kun over 10 years
Looks like I was wrong about user names. rsync doesn't print them.
-
Ark-kun over 10 years
What a pity that cut doesn't have a "consecutive spaces" field separator option like sort and awk.
-
William Entriken about 9 years
Also, this may work, but I'm not sure whether it is documented:
rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ /dev/false > output.txt
And no, /dev/null won't work.
-
William Entriken over 4 years
Thank you @bxm for the note:
rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $(mktemp -d --dry-run) > output.txt
-
Antoine Viallon over 2 years
Can't --dry-run be used here?