How can I automatically convert all source code files in a folder (recursively) to a single PDF with syntax highlighting?

linux pdf printing html

16,174

Solution 1

I was intrigued by your question and got kinda carried away. This solution will generate a nice PDF file with a clickable index and color highlighted code. It will find all files in the current directory and subdirectories and create a section in the PDF file for each of them (see the notes below for how to make your find command more specific).

It requires that you have the following installed (the install instructions are for Debian-based systems but these should be available in your distribution's repositories):

pdflatex, color and listings
```
sudo apt-get install texlive-latex-extra latex-xcolor texlive-latex-recommended
```
This should also install a basic LaTeX system if you don't have one installed.

Once these are installed, use this script to create a LaTeX document with your source code. The trick is using the listings (part of texlive-latex-recommended) and color (installed by latex-xcolor) LaTeX packages. The \usepackage[..]{hyperref} is what makes the listings in the table of contents clickable links.

#!/usr/bin/env bash

tex_file=$(mktemp) ## Random temp file name

cat<<EOF >$tex_file   ## Print the tex file header
\documentclass{article}
\usepackage{listings}
\usepackage[usenames,dvipsnames]{color}  %% Allow color names
\lstdefinestyle{customasm}{
  belowcaptionskip=1\baselineskip,
  xleftmargin=\parindent,
  language=C++,   %% Change this to whatever you write in
  breaklines=true, %% Wrap long lines
  basicstyle=\footnotesize\ttfamily,
  commentstyle=\itshape\color{Gray},
  stringstyle=\color{Black},
  keywordstyle=\bfseries\color{OliveGreen},
  identifierstyle=\color{blue},
  xleftmargin=-8em,
}        
\usepackage[colorlinks=true,linkcolor=blue]{hyperref} 
\begin{document}
\tableofcontents

EOF

find . -type f ! -regex ".*/\..*" ! -name ".*" ! -name "*~" ! -name 'src2pdf'|
sed 's/^\..//' |                 ## Change ./foo/bar.src to foo/bar.src

while read  i; do                ## Loop through each file
    name=${i//_/\\_}             ## escape underscores
    echo "\newpage" >> $tex_file   ## start each section on a new page
    echo "\section{$i}" >> $tex_file  ## Create a section for each filename

   ## This command will include the file in the PDF
    echo "\lstinputlisting[style=customasm]{$i}" >>$tex_file
done &&
echo "\end{document}" >> $tex_file &&
pdflatex $tex_file -output-directory . && 
pdflatex $tex_file -output-directory .  ## This needs to be run twice 
                                           ## for the TOC to be generated

Run the script in the directory that contains the source files

bash src2pdf

That will create a file called all.pdf in the current directory. I tried this with a couple of random source files I found on my system (specifically, two files from the source of vlc-2.0.0) and this is a screenshot of the first two pages of the resulting PDF:

enter image description here

A couple of comments:

The script will not work if your source code file names contain spaces. Since we are talking about source code, I will assume they don't.
I added ! -name "*~" to avoid backup files.
I recommend you use a more specific find command to find your files though, otherwise any random file will be included in the PDF. If your files all have specific extensions (.c and .h for example), you should replace the find in the script with something like this
```
find . -name "*\.c" -o -name "\.h" | sed 's/^\..//' | 
```
Play around with the listings options, you can tweak this to be exactly as you want it.

Solution 2

I know I'm waaaay too late, but somebody looking for a solution might find this useful.

Based on @terdon's answer, I have created a BASH script that does the job: https://github.com/eljuanchosf/source-code-to-pdf

Solution 3

(from StackOverflow)

for i in *.src; do echo "$i"; echo "---"; cat "$i"; echo ; done > result.txt

This will result a result.txt containing:

Filename
separator (---)
Content of .src file
Repeat from the top until all *.src files are done

If your source code have different extension, just change as needed. You can also edit the echo bit to add necessary information (maybe echo "filename $1" or change the separator, or add an end-of-file separator).

the link have other methods, so use whatever method you like best. I find this one to be most flexible, although it does come with a slight learning curve.

The code will run perfectly from a bash terminal (just tested on a VirtualBox Ubuntu)

If you don't care about filename and just care about content of files merged together:

cat *.src > result.txt

will work perfectly fine.

Another method suggested was:

grep "" *.src > result.txt

Which will prefix every single line with the filename, which can be good for some people, personally I find it too much information, hence why my first suggestion is the for loop above.

Credit to those in the StackOverflow forum people.

EDIT: I just realized that you are after specifically HTML or PDF as the end result, some solutions I've seen is to print the text file into PostScript and then convert postscript to PDF. Some code I've seen:

groff -Tps result.txt > res.ps

then

ps2pdf res.ps res.pdf

(Requires you to have ghostscript)

Hope this helps.

16,174

Bentley4

Profile with mostly old questions where I learned about programming. Thank you SO!

Updated on September 18, 2022

Comments

Bentley4 over 1 year

I would like to convert source code of a few projects to one printable file to save on a usb and print out easily later. How can I do that?

Edit

First off I want to clarify that I only want to print the non-hidden files and directories(so no contents of .git e.g.).

To get a list of all non-hidden files in non-hidden directories in the current directory you can run the find . -type f ! -regex ".*/\..*" ! -name ".*" command as seen as the answer in this thread.

As suggested in that same thread I tried making a pdf file of the files by using the command find . -type f ! -regex ".*/\..*" ! -name ".*" ! -empty -print0 | xargs -0 a2ps -1 --delegate no -P pdf but unfortunately the resulting pdf file is a complete mess.
- mpy almost 11 years
  
  Don't know if it fits your need, but with a2ps -P file *.src you can produce postscript files out of your source code. But the PS files need to be converted and combined afterwards.
- SBI almost 11 years
  
  Using convert (linux.about.com/od/commands/l/blcmdl1_convert.htm, imagemagick) you should then be able to make one pdf from the ps files.
- mpy almost 11 years
  
  Can you comment, what you mean with "complete mess"? This (i.stack.imgur.com/LoRhv.png) looks not too bad to me, using a2ps -1 --delegate=0 -l 100 --line-numbers=5 -P pdf -- I added -l for 100 chars per row to prevent some word wraps and line numbers, but that's only personal preference.
- Bentley4 almost 11 years
  
  For converting this project(4 non-empty non-hidden files each about a page long in non-hidden directories) to pdf I had about 5 pages of source code and 39 pages of gibberish.
Bentley4 almost 11 years

This only works for files of a specific extension(.src) but I want every file to be put in that pdf regardless of extension. I do would like to omit non-hidden dirs and non-hidden files though. I edited the original post, could you take a look at it?
mpy almost 11 years

Wow, that's what I call an answer! :)
Bentley4 almost 11 years

OMG terdon, you owned that question ^^. To other people trying the script: if you run into src2pdf: line 36: warning: here-document at line 5 delimited by end-of-file (wanted EOF') when running the script you have to delete the whitespace on the EOF line in order for it to work.
Bentley4 almost 11 years

The script src2pdf also adds the contents of the script to the pdf, any idea how I can omit that?
Bentley4 almost 11 years

If your file is called src2pdf then insert ! -name "src2pdf" in the find line in the script like this find . -type f ! -regex ".*/\..*" ! -name "src2pdf" ! -name ".*" ! -name "*~" | to omit it in the pdf.
terdon almost 11 years

@Bentley4 thanks! I removed the whitespace (it was added whern I pasted the script into the answer) and added the filter to remove the script itself from the find results (I had saved the script in another directory that was in my $PATH so I did not have that problem). Also, you can change the language used for the source files to have better markup by changing language=C++ to whatever you want, it can deal with many different languages, see here.
Bentley4 almost 11 years

Wow, nice. Didn't notice that before, thanks for the reminder.
qubodup about 9 years

Awesome! But how would I handle UTF8? Adding \usepackage[utf8]{inputenc} to the header doesn't make it possible to process a Lua 5.2 file with German language (äöüÄÖÜß) comments in UTF-8 Unicode text, with CRLF line terminators file.
terdon about 9 years

@qubodup I don't really know. LaTeX and UTF8 can be tricky. It should work with \usepackage[utf8]{inputenc} \usepackage[german]{babel}` but it fails on my tests. However, I suspect I'm not feeding it true utf8. That might be worth its own question but I suggest you ask on TeX - LaTeX, they should know.
qubodup about 9 years

@terdon Thank you for checking! I was lucky enough to find a solution thanks to multiple questions and answers (links inside), I'm now using codepad.org/7iLVB8Se which gives me pages that look like i.imgur.com/lGaYcAi.png
DavidPostill almost 8 years

Please quote the essential parts of the answer from the reference link(s), as the answer can become invalid if the linked page(s) change.
Matt Liberty about 6 years

Underscores in file names throw off the PDF since underscore means subscript to Latex. I modified the \section{$i} to use sanitized input: while read i; do ## Loop through each file name=${i//_/\\_} echo "\newpage" >> $tex_file ## start each section on a new page echo "\section{$name}" >> $tex_file ## Create a section for each file
terdon about 6 years

@MattLiberty oh, good catch! I edited that in, thanks!
geisterfurz007 over 5 years

Hey there! I just noticed that there is an article with stuff that looks pretty similar to this if you want to investigate: samhobbs.co.uk/2017/01/….
terdon over 5 years

@geisterfurz007 thanks for the heads up, but the author of the article links back to this post in their code, so everything's fine. Nice when the system works :)
geisterfurz007 over 5 years

Ah! I didn't check the code :D Cheers for checking and the response!
Jabba over 4 years

Thanks a lot! The line echo "\section{$i}" >> $tex_file should be echo "\section{$name}" >> $tex_file.
Liam almost 4 years

I have a lot of binary files mixed in my tree, so I found modifying this SO answer (see my comment) with ! -exec grep -Il . "{}" \; at the end to be very helpful