rsync multiple files from multiple directories in Linux


Let's take a look at your command

rsync -avh --ignore-existing -e ssh -r /home/data/logs/2017-09-*  {dns,http}.*.log.gz / [email protected]:/home/pnlogs/

The source for this command is every file matching these four paths, once the shell has expanded {dns,http} (recursion is enabled with -r, although -a already implies it):

  • /home/data/logs/2017-09-*
  • dns.*.log.gz
  • http.*.log.gz
  • /

The last of these, /, means rsync will try to copy the entire filesystem. The two relative patterns are matched against the current directory, not against the date folders.
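To see how the shell splits that argument list before rsync ever runs, here is a small sketch. It assumes bash (brace expansion is not part of POSIX sh), uses a hypothetical helper called show_args, and leaves the destination out so nothing is transferred.

```shell
# Requires bash: brace expansion is not part of POSIX sh.
# Hypothetical helper: print each argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Work in an empty scratch directory so the relative globs match nothing
# and stay visible in their unexpanded form.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Same source operands as the original command, without the destination.
expanded=$(show_args /home/data/logs/2017-09-* {dns,http}.*.log.gz /)
printf '%s\n' "$expanded"
```

With bash's default settings, a glob that matches nothing is passed through literally, so the dns and http patterns appear as separate operands and / shows up as a source of its own.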

I think what you're wanting to copy is all the DNS and HTTP log files inside the YYYY-MM-DD folders, keeping the directory structure:

cd /home/data/logs &&
rsync -avhR 2017-09-*/{dns,http}.*.log.gz [email protected]:/home/pnlogs/

I wouldn't use --ignore-existing unless you have a really strong reason for knowing you want to do that. (This flag prevents rsync restarting a partial transfer. Mind you, you don't have --partial so perhaps this is moot.)

You don't need -r because you're using -a and this archive mode already includes recursion. But if I've interpreted your requirement correctly there is no recursion anyway.

The -R flag tells rsync to keep the relative paths from the source trees in the destination. I've used cd to get the command into a useful starting point because otherwise you'd end up with /home/data/logs also being included in the target path. Wrap all of this in parentheses ( ... ), which run it in a subshell, if you want to avoid the directory change affecting the remainder of any script in which this runs.
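As a rough check of which paths the pattern would hand to rsync, and that a parenthesised cd does not leak into the rest of a script, here is a sketch against a throwaway directory tree. The tree and its file names are made up for illustration, bash is assumed for the brace expansion, and rsync itself is not invoked.

```shell
# Requires bash for the {dns,http} brace expansion.
# Scratch tree standing in for /home/data/logs (names invented for the demo).
logs=$(mktemp -d)
mkdir -p "$logs/2017-09-01" "$logs/2017-09-02"
touch "$logs/2017-09-01/dns.00:00:00-01:00:00.log.gz" \
      "$logs/2017-09-01/conn.00:00:00-01:00:00.log.gz" \
      "$logs/2017-09-02/http.00:00:00-01:00:00.log.gz"

before=$PWD

# The parentheses start a subshell, so the cd is undone when it exits.
( cd "$logs" && printf '%s\n' 2017-09-*/{dns,http}.*.log.gz ) > "$logs/matched.txt"

# These relative paths are what rsync -R would recreate under the destination.
cat "$logs/matched.txt"
after=$PWD
```

Only the dns and http files should be listed, each with its date directory as a prefix, and the working directory after the subshell is the same as before it.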


Author: Blitzkrieg

Updated on September 18, 2022

Comments

  • Blitzkrieg
    Blitzkrieg over 1 year

    I have multiple directories named by date (ex: 2017-09-05) and inside those directories multiple log.gz files from Bro IDS. I am trying to enter each directory, get only specific log.gz files by name, and send those to a remote system using rsync. A log file looks like this: dns.00:00:00-01:00:00.log.gz. I am attempting to use wildcards to accomplish this.

    Ex: rsync -avh --ignore-existing -e ssh -r /home/data/logs/2017-09-* {dns,http}.*.log.gz / [email protected]:/home/pnlogs/

    This is close, but it's just copying all the files in each folder and ignoring my attempt at getting just the http and dns logs as seen in the example. Is this possible to do in one line? Is there a better method?

    Ideally I would like to keep the files in their original directories upon transfer. For example in my original command the directories will stay on the remote system. This would be nice for organization's sake. Interestingly, that command copied HTTP and DNS logs from 2017-09-01 and 2017-09-02, but not from the 3rd, 4th, or 5th of this month.

    How could I adjust this to account for changes in month/year? This will sit in a script and I wouldn't want to have to change it every month (i.e. to 2017-10, etc). Could I just use $date?

    Got this working with: rsync -avhR $(date +????-??-??)/{dns,http,conn}.*.log.gz [email protected]:/home/pnlogs/ Thanks for all the help.

    • Satō Katsura
      Satō Katsura over 6 years
      Make a list of files, then use option --files-from of rsync.
    • roaima
      roaima over 6 years
      Use $( ... ) for the date command interpolation. Backticks are deprecated these days. Also, why not just use ????-??-?? instead of specifying a date? This way it will pick up older files that haven't been transferred, for example if the job fails to run one day.
  • Austin Hemmelgarn
    Austin Hemmelgarn over 6 years
    For the record, rsync uses a replace-by-rename model by default, so without --partial, you can resume an interrupted transfer just fine (provided you don't mind files that were already transferred but have changed on the source not being updated). From a practical perspective, --ignore-existing also makes things insanely fast for an initial transfer because it skips a lot of the expensive checks done for deciding what to transfer.
  • Blitzkrieg
    Blitzkrieg over 6 years
    Is it possible to keep the files in their original directories upon transfer? For example in my original command the directories will stay on the remote system. This would be nice for organization's sake. Interestingly, that command copied HTTP and DNS logs from 2017-09-01 and 2017-09-02, but not from the 3rd, 4th, or 5th of this month.
  • roaima
    roaima over 6 years
    @AustinHemmelgarn the --partial flag is not default. Here's what the man page has to say on the subject: « --partial By default, rsync will delete any partially transferred file if the transfer is interrupted. »
  • Austin Hemmelgarn
    Austin Hemmelgarn over 6 years
    @roaima I never said that --partial was the default, I said that rsync uses a replace-by-rename method by default, which is controlled by --inplace, not --partial. Apologies if the wording of my comment implied that --partial was the default.
  • roaima
    roaima over 6 years
    @Blitzkrieg if it's missing files then they didn't match the template pattern (2017-09-*/{dns,http}.*.log.gz) or else they were created after the rsync was begun.
  • roaima
    roaima over 6 years
    @AustinHemmelgarn sorry, I'm now confused about what you're trying to clarify (!) The --inplace option isn't a default, either. Files are usually transferred using a temporary filename, which is renamed to the target filename on completion. Unless you're suggesting --ignore-existing is a safe option. It can be a good option under careful control, but when recommending options to a relative rsync beginner it's not one I'd usually even contemplate mentioning, let alone using.
  • Blitzkrieg
    Blitzkrieg over 6 years
    @roaima This should work! Thank you very much for your help :)
  • Austin Hemmelgarn
    Austin Hemmelgarn over 6 years
    @roaima That's what I meant by saying 'replace-by-rename'. It's just the common term among most of the programmers I deal with. I would argue also that --ignore-existing isn't exactly 'unsafe', but it does behave in a way many people would not intuitively expect. I don't use it often myself, but when I need to move big directory structures for the first time and don't care about metadata, I'll often use it to get rid of having to call stat() on everything on the source system, which can save a lot of transfer time.
  • roaima
    roaima over 6 years
    @AustinHemmelgarn I'm familiar with the term; I just couldn't reconcile it with my interpretation of your comments. In specific instances I would agree that it can be advantageous (as can --append, --inplace and other behavioural modifiers) but I'd still avoid it in the general case.
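
For what it's worth, the --files-from suggestion from the comments could be sketched along these lines, combined with the ????-??-?? pattern so the month never needs editing. This is only an illustration: the scratch tree and file names are invented, user@remote.example is a placeholder destination, bash is assumed, and the rsync command is printed rather than executed.

```shell
# Requires bash for the {dns,http} brace expansion.
# Scratch tree standing in for /home/data/logs (names invented for the demo).
logs=$(mktemp -d)
mkdir -p "$logs/2017-09-30" "$logs/2017-10-01"
touch "$logs/2017-09-30/dns.23:00:00-00:00:00.log.gz" \
      "$logs/2017-10-01/http.00:00:00-01:00:00.log.gz" \
      "$logs/2017-10-01/ssl.00:00:00-01:00:00.log.gz"

# Build the list of wanted files, relative to the log directory.
list=$(mktemp)
( cd "$logs" && printf '%s\n' ????-??-??/{dns,http}.*.log.gz ) > "$list"
cat "$list"

# Printed, not run: the destination is a placeholder, not a real host.
echo rsync -avh --files-from="$list" "$logs/" 'user@remote.example:/home/pnlogs/'
```

With --files-from, rsync treats the listed paths as relative to the source directory and keeps them on the destination, so -R does not need to be given separately. Note that a pattern with no matches would be written to the list literally unless something like bash's nullglob option is enabled.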