How do I concatenate all the files in a given directory in order of date, where I want the newest file on top?
Solution 1
To concatenate files you use
cat file1 file2 file3 ...
To get a list of filenames sorted by modification time, newest first, you use
ls -t
Putting it all together,
cat $(ls -t) > outputfile
You might want to give some arguments to ls (e.g., *.html).
But if you have filenames with spaces in them, this will not work. My file.html will be assumed to be two filenames: My and file.html. You can make ls quote the filenames, and then use xargs, which understands the quoting, to pass the arguments to cat.
ls -tQ | xargs cat
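Since ls and xargs do not always agree on quoting, a NUL-delimited pipeline may be more robust for unusual filenames. The following is only a sketch: it assumes the GNU versions of find, sort, cut and xargs (for -printf, -z and -0), and cat_newest_first is a hypothetical helper name, not something from the answer above. It also skips directories, because of -type f.

```shell
# Concatenate the regular files directly inside a directory, newest first.
# Assumes GNU find/sort/cut/xargs; the function name is illustrative only.
cat_newest_first() {
    dir=${1:-.}
    # Emit "mtime path" records terminated by NUL, so that spaces and
    # newlines in filenames cannot be mistaken for separators.
    find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' |
        LC_ALL=C sort -z -k1,1nr |   # numeric sort on the timestamp, newest first
        cut -z -d ' ' -f 2- |        # drop the timestamp, keep the path
        xargs -0 -r cat --           # -r: do nothing if there are no files
}
```

Write the result outside the directory being read (for example, cat_newest_first ./pages > /tmp/outputfile), otherwise the output file itself may be picked up as the newest input.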
As for your second question, filtering out parts of files isn't difficult, but it depends on what exactly you want to strip out. What are the “redundant headers”?
Solution 2
The easiest way of listing files in an order other than lexicographic is with zsh glob qualifiers. Without zsh, you can use ls, but parsing the output of ls is fraught with dangers.
cat *(om)
If you want to strip some lines, use sed or awk or perl. For example, to take the <head> from the first file and combine the <body> parts from the other files, assuming that the <body> and </body> tags are alone on a line in every file:
{
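# delete from </body> onward so the closing tag is not copied (q would still print it)
sed -e '/<\/body>/,$ d' *.html(om[2])
# one sed per file: given several files, sed reads them as a single stream
for f in *.html(om[3,-1]); do sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"; done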
echo '</body>'
echo '</html>'
} >concatenated.html
Explanation:
- First, concatenated.html is created. It is therefore the youngest *.html file (assuming no file has a date in the future).
- Then copy from the second-youngest *.html file, but quit at the </body> line.
- Then copy from the other files, but skip everything down to the <body> line and starting with the </body> line.
- Finally produce the last closing tags.
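The same steps can be sketched without zsh if the files are passed in the wanted order. This is a rough illustration, not the answer's own code: merge_html is a hypothetical function name, and it keeps the assumption that <body> and </body> each sit alone on a line. It runs sed once per file, because sed otherwise treats several input files as one continuous stream.

```shell
# merge_html FIRST [REST...]
# Keep FIRST up to (not including) its </body> line, append only the body
# content of each remaining file, then close the document.
merge_html() {
    first=$1
    shift
    # Everything in the first file before its </body> line.
    sed -e '/<\/body>/,$ d' "$first"
    # From every other file, only the lines strictly between <body> and </body>.
    for f in "$@"; do
        sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"
    done
    echo '</body>'
    echo '</html>'
}
```

A call might look like merge_html newest.html older.html oldest.html > concatenated.html, with the file names here being placeholders.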
Solution 3
The solution given by @angus is good, but it will have issues if there are directories in the folder. This will fix it:
cat $(ls -tpa | grep -v / )
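A variant that also copes with spaces and glob characters in filenames might look like the sketch below. It still parses ls output, so it assumes no filename contains a newline; cat_by_ls is a hypothetical name, and -A (list hidden files except . and ..) stands in for the -a used above.

```shell
# Concatenate regular files in a directory, newest first, skipping directories.
# Assumption: no filename contains a newline (ls output is split on newlines).
cat_by_ls() (
    # The function body is a subshell, so cd, IFS and set -f do not leak out.
    cd "${1:-.}" || exit
    IFS='
'
    set -f    # disable pathname expansion so names like star*.html stay literal
    for f in $(ls -tA); do
        # Skip anything that is not a regular file (e.g. directories).
        if [ -f "$f" ]; then
            cat -- "$f"
        fi
    done
)
```

Splitting only on newlines keeps My file.html as a single name, and set -f prevents a name such as file-[old].html from being treated as a pattern.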
InquilineKea
Updated on September 18, 2022
Comments
-
InquilineKea almost 2 years
And with the oldest file on bottom?
Also, if I do this, is it also possible to strip out the redundant headers contained within each HTML file? I'm seeing myself concatenate a lot of HTML files up, and it would be nice to reduce the file size of the ultimate file a bit.
-
Mike Pennington about 12 years
This isn't working on my debian system... I have to use cat $(ls -t) > outputfile, otherwise cat rejects the quoted file names
-
angus about 12 years
My mistake. I always get caught on these things. See updated answer.
-
InquilineKea about 12 years
Oh - by redundant headers I mean things that are normally put in some header.php/footer.php file, but which are saved separately when saved to HTML (and can really increase the file size when you mass-download PHP pages).
-
Barefoot IO over 8 years
cat $(ls -t) is also vulnerable to filename expansion. If there's a filename with an *, or ?, or a bracket expression (e.g. file-[old].html); and if the filename interpreted as a pattern matches other filenames; the approach will produce an incorrect list. set -f would address this deficiency.
-
Barefoot IO over 8 years
Caveat: This answer is also vulnerable to pathname expansion, as explained in my comment to angus' answer.
-
Barefoot IO over 8 years
ls -Q may produce output which is not suitable for xargs. For example, "foo" becomes "\"foo\"", but xargs does not understand escaped double quotes within double quoted strings.
-
Barefoot IO over 8 years
Unless cat's exit status is tested, a directory argument should be inconsequential. cat will simply emit a message to stderr and proceed to the next argument.