rsync multiple files from multiple directories in Linux
Let's take a look at your command
rsync -avh --ignore-existing -e ssh -r /home/data/logs/2017-09-* {dns,http}.*.log.gz / [email protected]:/home/pnlogs/
The collection source for this is all files (recursion is enabled with -r, but actually -a already does that) that match these four paths:
/home/data/logs/2017-09-*
dns.*.log.gz
http.*.log.gz
/
The last of these will ensure that the rsync command tries to copy everything.
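To see why the shell hands rsync those separate paths, here is a rough sketch that prints the expanded arguments instead of running rsync. It assumes bash with default globbing settings (no nullglob), and the scratch directory and file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout standing in for /home/data/logs.
demo=$(mktemp -d)
mkdir -p "$demo/logs/2017-09-01" "$demo/logs/2017-09-02"
touch "$demo/logs/2017-09-01/dns.00:00:00-01:00:00.log.gz" \
      "$demo/logs/2017-09-02/http.00:00:00-01:00:00.log.gz"

# Original form: the space makes the brace pattern a separate word.
# It is matched against the current directory, finds nothing, and is
# therefore passed through literally as two more source arguments.
original=$(cd "$demo" && printf '%s\n' logs/2017-09-* {dns,http}.*.log.gz)

# Corrected form: directory glob and file pattern joined into one path.
joined=$(cd "$demo" && printf '%s\n' logs/2017-09-*/{dns,http}.*.log.gz)

printf 'original:\n%s\n\njoined:\n%s\n' "$original" "$joined"
```

With the space in place, the first word matches whole date directories (which -a then copies recursively), while the joined form matches only the dns and http files inside them.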
I think what you're wanting to copy is all the DNS and HTTP log files inside the YYYY-MM-DD folders, keeping the directory structure:
cd /home/data/logs &&
rsync -avhR 2017-09-*/{dns,http}.*.log.gz [email protected]:/home/pnlogs/
I wouldn't use --ignore-existing unless you have a really strong reason for knowing you want to do that. (This flag prevents rsync restarting a partial transfer. Mind you, you don't have --partial so perhaps this is moot.)
You don't need -r because you're using -a and this archive mode already includes recursion. But if I've interpreted your requirement correctly there is no recursion anyway.
The -R flag tells rsync to keep the relative paths from the source trees in the destination. I've used cd to get the command into a useful starting point because otherwise you'd end up with /home/data/logs also being included in the target path. Wrap all of this in brackets ( ... ) if you want to avoid the directory change being effective for the remainder of any script in which this runs.
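As a small illustration of the bracket trick, the sketch below shows that a cd inside ( ... ) does not leak into the rest of a script. The scratch directory is hypothetical, and pwd stands in for the rsync command so that nothing is actually transferred.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory standing in for /home/data/logs.
logdir=$(mktemp -d)
start=$PWD

# The brackets run the enclosed commands in a subshell, so the cd is
# forgotten as soon as the closing bracket is reached.
(
    cd "$logdir" || exit 1
    pwd > "$logdir/where.txt"   # stand-in for the rsync command
)
after=$PWD

echo "inside the brackets: $(cat "$logdir/where.txt")"
echo "after the brackets:  $after"
```

The second line printed should be the directory the script started in, not the log directory.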
Blitzkrieg
Updated on September 18, 2022
Comments
Blitzkrieg over 1 year
I have multiple directories named by date (ex: 2017-09-05) and inside those directories multiple log.gz files from BRO IDS. I am trying to enter each directory, and get only specific log.gz files by name, and send those to a remote system using rsync. A log file looks like this:
dns.00:00:00-01:00:00.log.gz
I am attempting to use wildcards to accomplish this. Ex:
rsync -avh --ignore-existing -e ssh -r /home/data/logs/2017-09-* {dns,http}.*.log.gz / [email protected]:/home/pnlogs/
This is close, but it's just copying all the files in each folder and ignoring my attempt at getting just http and dns logs as seen in the example. Is this possible to do in one line? Is there a better method?
Ideally I would like to keep the files in their original directories upon transfer. For example in my original command the directories will stay on the remote system. This would be nice for organization's sake. Interestingly, that command copied HTTP and DNS logs from 2017-09-01 and 2017-09-02, but not from the 3rd, 4th, or 5th of this month.
How could I adjust this to account for changes in month/year? This will sit in a script and I wouldn't want to have to change it every month (i.e. to 2017-10, etc.). Could I just use $date?
Got this working with:
rsync -avhR $(date +????-??-??)/{dns,http,conn}.*.log.gz [email protected]:/home/pnlogs/
Thanks for all the help.
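For anyone puzzling over that last command, here is a short sketch of what the date call contributes. It relies only on the standard behaviour that date copies any characters outside % sequences straight to its output; the month variable and pattern names are illustrative and not part of the original command.

```shell
#!/usr/bin/env bash
# A format string with no % sequences is echoed back unchanged by date,
# so this simply yields the literal glob ????-??-??.
literal=$(date +'????-??-??')

# Current month as YYYY-MM, for a pattern limited to this month's folders.
month=$(date +%Y-%m)

# Quoted here only for display; unquoted on an rsync command line the
# shell would expand the braces and the wildcards.
pattern="$month-*/{dns,http,conn}.*.log.gz"

echo "date passes the glob through: $literal"
echo "this month's source pattern:  $pattern"
```

So the working command behaves the same as writing ????-??-??/{dns,http,conn}.*.log.gz directly, and a month-specific variant would substitute the date +%Y-%m output into the pattern.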
Satō Katsura over 6 years
Make a list of files, then use the --files-from option of rsync.
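A rough sketch of that suggestion follows: it builds the list with find in a scratch directory and prints it. The directory layout, the list file name and the remote address in the final comment are all hypothetical, and the rsync call is left as a comment so nothing is transferred.

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout standing in for /home/data/logs.
src=$(mktemp -d)
mkdir -p "$src/2017-09-01"
touch "$src/2017-09-01/dns.00:00:00-01:00:00.log.gz" \
      "$src/2017-09-01/ssl.00:00:00-01:00:00.log.gz"

# Write one path per line, relative to the source directory;
# only dns and http logs are selected.
list="$src/filelist.txt"
(
    cd "$src" &&
    find . -type f \( -name 'dns.*.log.gz' -o -name 'http.*.log.gz' \) | sort
) > "$list"

cat "$list"

# The list would then be handed to rsync, which reads the paths relative
# to the source directory (user@remote.example is a placeholder):
#   rsync -avh --files-from="$list" "$src/" user@remote.example:/home/pnlogs/
```

Because the listed paths are relative, the date directories should be recreated on the destination in the same way as with -R.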
roaima over 6 years
Use $( ... ) for the date command interpolation. Backticks are deprecated these days. Also, why not just ????-??-?? instead of specifying a date? This way it will pick up older files that haven't been transferred, for example if the job fails to run one day.
Austin Hemmelgarn over 6 years
For the record, rsync uses a replace-by-rename model by default, so without --partial, you can resume an interrupted transfer just fine (provided you don't mind files that were already transferred but have changed on the source not being updated). From a practical perspective, --ignore-existing also makes things insanely fast for an initial transfer because it skips a lot of the expensive checks done for deciding what to transfer.
Blitzkrieg over 6 years
Is it possible to keep the files in their original directories upon transfer? For example in my original command the directories will stay on the remote system. This would be nice for organization's sake. Interestingly, that command copied HTTP and DNS logs from 2017-09-01 and 2017-09-02, but not from the 3rd, 4th, or 5th of this month.
roaima over 6 years
@AustinHemmelgarn the --partial flag is not default. Here's what the man page has to say on the subject: «--partial By default, rsync will delete any partially transferred file if the transfer is interrupted.»
Austin Hemmelgarn over 6 years
@roaima I never said that --partial was the default, I said that rsync uses a replace-by-rename method by default, which is controlled by --inplace, not --partial. Apologies if the wording of my comment implied that --partial was the default.
roaima over 6 years
@Blitzkrieg if it's missing files then they didn't match the template pattern (2017-09-*/{dns,http}.*.log.gz) or else they were created after the rsync was begun.
roaima over 6 years
@AustinHemmelgarn sorry, I'm now confused what you're trying to clarify (!) The --inplace option isn't a default, either. Files are usually transferred using a temporary filename, which is renamed to the target filename on completion. Unless you're suggesting --ignore-existing is a safe option. It can be a good option under careful control, but when recommending options to a relative rsync beginner it's not one I'd usually even contemplate mentioning let alone using.
Blitzkrieg over 6 years
@roaima This should work! Thank you very much for your help :)
Austin Hemmelgarn over 6 years
@roaima That's what I meant by saying 'replace-by-rename'. It's just the common term among most of the programmers I deal with. I would argue also that --ignore-existing isn't exactly 'unsafe', but it does behave in a way many people would not intuitively expect. I don't use it often myself, but when I need to move big directory structures for the first time and don't care about metadata, I'll often use it to get rid of having to call stat() on everything on the source system, which can save a lot of transfer time.
roaima over 6 years
@AustinHemmelgarn I'm familiar with the term; I just couldn't reconcile it with my interpretation of your comments. In specific instances I would agree that it can be advantageous (as can --append, --inplace and other behavioural modifiers) but I'd still avoid it in the general case.