How do I concatenate all the files in a given directory in order of date, where I want the newest file on top?

Solution 1

To concatenate files you use

cat file1 file2 file3 ...

To get a list of filenames sorted by modification time, newest first, you use

ls -t

Putting it all together,

cat $(ls -t) > outputfile

You might want to give some arguments to ls (e.g., *.html).

But if you have filenames with spaces in them, this will not work. My file.html will be treated as two filenames: My and file.html. You can make ls quote the filenames, and then use xargs, which understands the quoting, to pass the arguments to cat.

ls -tQ | xargs cat
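
If GNU tools are available, a null-delimited pipeline sidesteps quoting altogether. The following is only a sketch: the function name cat_newest_first is made up for illustration, and it assumes GNU find, sort, cut, and xargs (for -printf, -z, and -0). It also skips directories, since find is restricted to regular files.

```shell
# Hypothetical helper: concatenate the regular files in a directory, newest first.
# Assumes GNU find, sort, cut, and xargs; other implementations may lack these options.
cat_newest_first() {
  dir=${1:-.}
  # Emit "mtime path" records terminated by NUL instead of newline.
  find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' |
    # Sort numerically on the timestamp, largest (newest) first.
    sort -z -r -n |
    # Drop the timestamp; everything after the first space is the path.
    cut -z -d ' ' -f 2- |
    # Hand the NUL-separated paths to cat; do nothing if there are none.
    xargs -0 -r cat --
}
```

Usage would look like cat_newest_first /path/to/pages > /tmp/outputfile. Writing the output file outside the directory being read keeps it from showing up as one of the inputs.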

As for your second question, filtering out parts of files isn't difficult, but it depends on what exactly you want to strip out. What are the “redundant headers”?

Solution 2

The easiest way of listing files in an order other than lexicographic is with zsh glob qualifiers. The om qualifier sorts the matches by modification time, newest first. Without zsh, you can use ls, but parsing the output of ls is fraught with danger.

cat *(om)

If you want to strip some lines, use sed or awk or perl. For example, to take the <head> from the first file and combine the <body> parts from the other files, assuming that the <body> and </body> tags are alone on a line in every file:

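{
  # Second-youngest file: print everything before </body>, then quit.
  sed -n -e '/<\/body>/ q' -e p *.html(om[2])
  # Other files: run sed once per file so that 1 and $ apply to each file separately.
  for f in *.html(om[3,-1]); do
    sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"
  done
  echo '</body>'
  echo '</html>'
} >concatenated.html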

Explanation:

  • First, concatenated.html is created. It is therefore the youngest *.html file (assuming no file has a date in the future).
  • Then copy from the second-youngest *.html file, but quit at the </body> line.
  • Then copy from the other files, but skip everything down to the <body> line and starting with the </body> line.
  • Finally produce the last closing tags.
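
For shells without glob qualifiers, the same steps can be sketched as a plain sh function that takes the files as arguments, already in the desired order. The name merge_html is hypothetical, and the sketch keeps the assumption that <body> and </body> each sit alone on a line in every file.

```shell
# Hypothetical helper: merge HTML files passed as arguments, in the order given.
# The first argument supplies the <head> and the opening <body> line.
# Assumes <body> and </body> are each alone on a line in every file.
merge_html() {
  # First file: print every line before </body>, then stop reading.
  sed -n -e '/<\/body>/ q' -e p "$1"
  shift
  # Other files: one sed run per file, keeping only the lines
  # strictly between <body> and </body>.
  for f in "$@"; do
    sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"
  done
  # Close the tags that were stripped from every input.
  echo '</body>'
  echo '</html>'
}
```

A call such as merge_html newest.html older.html oldest.html > /tmp/concatenated.html would then produce a single page. The output file must not be one of the arguments.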

Solution 3

The solution given by @angus is good, but it will have issues if the folder contains directories. This fixes that:

cat $(ls -tpa | grep -v / )
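
The command above still splits names on spaces and expands any glob characters they contain. A hedged variant that addresses both is sketched below: it runs in a subshell, turns off pathname expansion with set -f, and splits only on newlines. The function name cat_ls_newest_first is made up for illustration, names containing newlines remain unsupported, and it assumes ls prints names unmodified when its output is a pipe.

```shell
# Hypothetical helper: the ls-based approach with word splitting and globbing tamed.
# The body runs in a subshell, so cd, set -f, and IFS do not leak to the caller.
cat_ls_newest_first() (
  cd "$1" || exit
  # Disable pathname expansion so names like file-[old].html stay literal.
  set -f
  # Split the ls output on newlines only, so spaces in names survive.
  IFS='
'
  # -p marks directories with a trailing slash; grep drops them (including ./ and ../).
  cat -- $(ls -tpa | grep -v '/$')
)
```

If the directory contains no regular files, cat receives no arguments and reads standard input, so check for that case before relying on it in a script.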

Author by

InquilineKea


Updated on September 18, 2022

Comments

  • InquilineKea
    InquilineKea almost 2 years

    And with the oldest file on bottom?

    Also, if I do this, is it also possible to strip out the redundant headers contained within each HTML file? I expect to be concatenating a lot of HTML files, and it would be nice to reduce the size of the final file a bit.

  • Mike Pennington
    Mike Pennington about 12 years
    This isn't working on my debian system... I have to use cat $(ls -t) > outputfile, otherwise cat rejects the quoted file names
  • angus
    angus about 12 years
    My mistake. I always get caught on these things. See updated answer.
  • InquilineKea
    InquilineKea about 12 years
    Oh - by redundant headers I mean things that are normally put in some header.php/footer.php file, but which are saved separately when saved to HTML (and can really increase the file size when you mass-download PHP pages).
  • Barefoot IO
    Barefoot IO over 8 years
    cat $(ls -t) is also vulnerable to filename expansion. If a filename contains *, ?, or a bracket expression (e.g. file-[old].html), and that filename, interpreted as a pattern, matches other filenames, the approach will produce an incorrect list. set -f would address this deficiency.
  • Barefoot IO
    Barefoot IO over 8 years
    Caveat: This answer is also vulnerable to pathname expansion, as explained in my comment to angus' answer.
  • Barefoot IO
    Barefoot IO over 8 years
    ls -Q may produce output which is not suitable for xargs. For example, "foo" becomes "\"foo\"", but xargs does not understand escaped double quotes within double quoted strings.
  • Barefoot IO
    Barefoot IO over 8 years
    Unless cat's exit status is tested, a directory argument should be inconsequential. cat will simply emit a message to stderr and proceed to the next argument.