How can I automatically convert all source code files in a folder (recursively) to a single PDF with syntax highlighting?

16,174

Solution 1

I was intrigued by your question and got kinda carried away. This solution will generate a nice PDF file with a clickable index and color highlighted code. It will find all files in the current directory and subdirectories and create a section in the PDF file for each of them (see the notes below for how to make your find command more specific).

It requires that you have the following installed (the install instructions are for Debian-based systems but these should be available in your distribution's repositories):

  • pdflatex, color and listings

    sudo apt-get install texlive-latex-extra latex-xcolor texlive-latex-recommended
    

    This should also install a basic LaTeX system if you don't have one installed.

Once these are installed, use this script to create a LaTeX document with your source code. The trick is using the listings (part of texlive-latex-recommended) and color (installed by latex-xcolor) LaTeX packages. The \usepackage[..]{hyperref} is what makes the listings in the table of contents clickable links.

#!/usr/bin/env bash

tex_file=$(mktemp) ## Random temp file name

cat<<EOF >$tex_file   ## Print the tex file header
\documentclass{article}
\usepackage{listings}
\usepackage[usenames,dvipsnames]{color}  %% Allow color names
\lstdefinestyle{customasm}{
  belowcaptionskip=1\baselineskip,
  xleftmargin=\parindent,
  language=C++,   %% Change this to whatever you write in
  breaklines=true, %% Wrap long lines
  basicstyle=\footnotesize\ttfamily,
  commentstyle=\itshape\color{Gray},
  stringstyle=\color{Black},
  keywordstyle=\bfseries\color{OliveGreen},
  identifierstyle=\color{blue},
  xleftmargin=-8em,
}        
\usepackage[colorlinks=true,linkcolor=blue]{hyperref} 
\begin{document}
\tableofcontents

EOF

find . -type f ! -regex ".*/\..*" ! -name ".*" ! -name "*~" ! -name 'src2pdf'|
sed 's/^\..//' |                 ## Change ./foo/bar.src to foo/bar.src

while read  i; do                ## Loop through each file
    name=${i//_/\\_}             ## escape underscores
    echo "\newpage" >> $tex_file   ## start each section on a new page
    echo "\section{$i}" >> $tex_file  ## Create a section for each filename

   ## This command will include the file in the PDF
    echo "\lstinputlisting[style=customasm]{$i}" >>$tex_file
done &&
echo "\end{document}" >> $tex_file &&
pdflatex $tex_file -output-directory . && 
pdflatex $tex_file -output-directory .  ## This needs to be run twice 
                                           ## for the TOC to be generated    

Run the script in the directory that contains the source files

bash src2pdf

That will create a file called all.pdf in the current directory. I tried this with a couple of random source files I found on my system (specifically, two files from the source of vlc-2.0.0) and this is a screenshot of the first two pages of the resulting PDF:

enter image description here


A couple of comments:

  • The script will not work if your source code file names contain spaces. Since we are talking about source code, I will assume they don't.
  • I added ! -name "*~" to avoid backup files.
  • I recommend you use a more specific find command to find your files though, otherwise any random file will be included in the PDF. If your files all have specific extensions (.c and .h for example), you should replace the find in the script with something like this

    find . -name "*\.c" -o -name "\.h" | sed 's/^\..//' | 
    
  • Play around with the listings options, you can tweak this to be exactly as you want it.

Solution 2

I know I'm waaaay too late, but somebody looking for a solution might find this useful.

Based on @terdon's answer, I have created a BASH script that does the job: https://github.com/eljuanchosf/source-code-to-pdf

Solution 3

(from StackOverflow)

for i in *.src; do echo "$i"; echo "---"; cat "$i"; echo ; done > result.txt

This will result a result.txt containing:

  • Filename
  • separator (---)
  • Content of .src file
  • Repeat from the top until all *.src files are done

If your source code have different extension, just change as needed. You can also edit the echo bit to add necessary information (maybe echo "filename $1" or change the separator, or add an end-of-file separator).

the link have other methods, so use whatever method you like best. I find this one to be most flexible, although it does come with a slight learning curve.

The code will run perfectly from a bash terminal (just tested on a VirtualBox Ubuntu)

If you don't care about filename and just care about content of files merged together:

cat *.src > result.txt

will work perfectly fine.

Another method suggested was:

grep "" *.src > result.txt

Which will prefix every single line with the filename, which can be good for some people, personally I find it too much information, hence why my first suggestion is the for loop above.

Credit to those in the StackOverflow forum people.

EDIT: I just realized that you are after specifically HTML or PDF as the end result, some solutions I've seen is to print the text file into PostScript and then convert postscript to PDF. Some code I've seen:

groff -Tps result.txt > res.ps

then

ps2pdf res.ps res.pdf 

(Requires you to have ghostscript)

Hope this helps.

Share:
16,174

Related videos on Youtube

Bentley4
Author by

Bentley4

Profile with mostly old questions where I learned about programming. Thank you SO!

Updated on September 18, 2022

Comments

  • Bentley4
    Bentley4 over 1 year

    I would like to convert source code of a few projects to one printable file to save on a usb and print out easily later. How can I do that?

    Edit

    First off I want to clarify that I only want to print the non-hidden files and directories(so no contents of .git e.g.).

    To get a list of all non-hidden files in non-hidden directories in the current directory you can run the find . -type f ! -regex ".*/\..*" ! -name ".*" command as seen as the answer in this thread.

    As suggested in that same thread I tried making a pdf file of the files by using the command find . -type f ! -regex ".*/\..*" ! -name ".*" ! -empty -print0 | xargs -0 a2ps -1 --delegate no -P pdf but unfortunately the resulting pdf file is a complete mess.

    • mpy
      mpy almost 11 years
      Don't know if it fits your need, but with a2ps -P file *.src you can produce postscript files out of your source code. But the PS files need to be converted and combined afterwards.
    • SBI
      SBI almost 11 years
      Using convert (linux.about.com/od/commands/l/blcmdl1_convert.htm, imagemagick) you should then be able to make one pdf from the ps files.
    • mpy
      mpy almost 11 years
      Can you comment, what you mean with "complete mess"? This (i.stack.imgur.com/LoRhv.png) looks not too bad to me, using a2ps -1 --delegate=0 -l 100 --line-numbers=5 -P pdf -- I added -l for 100 chars per row to prevent some word wraps and line numbers, but that's only personal preference.
    • Bentley4
      Bentley4 almost 11 years
      For converting this project(4 non-empty non-hidden files each about a page long in non-hidden directories) to pdf I had about 5 pages of source code and 39 pages of gibberish.
  • Bentley4
    Bentley4 almost 11 years
    This only works for files of a specific extension(.src) but I want every file to be put in that pdf regardless of extension. I do would like to omit non-hidden dirs and non-hidden files though. I edited the original post, could you take a look at it?
  • mpy
    mpy almost 11 years
    Wow, that's what I call an answer! :)
  • Bentley4
    Bentley4 almost 11 years
    OMG terdon, you owned that question ^^. To other people trying the script: if you run into src2pdf: line 36: warning: here-document at line 5 delimited by end-of-file (wanted EOF') when running the script you have to delete the whitespace on the EOF line in order for it to work.
  • Bentley4
    Bentley4 almost 11 years
    The script src2pdf also adds the contents of the script to the pdf, any idea how I can omit that?
  • Bentley4
    Bentley4 almost 11 years
    If your file is called src2pdf then insert ! -name "src2pdf" in the find line in the script like this find . -type f ! -regex ".*/\..*" ! -name "src2pdf" ! -name ".*" ! -name "*~" | to omit it in the pdf.
  • terdon
    terdon almost 11 years
    @Bentley4 thanks! I removed the whitespace (it was added whern I pasted the script into the answer) and added the filter to remove the script itself from the find results (I had saved the script in another directory that was in my $PATH so I did not have that problem). Also, you can change the language used for the source files to have better markup by changing language=C++ to whatever you want, it can deal with many different languages, see here.
  • Bentley4
    Bentley4 almost 11 years
    Wow, nice. Didn't notice that before, thanks for the reminder.
  • qubodup
    qubodup about 9 years
    Awesome! But how would I handle UTF8? Adding \usepackage[utf8]{inputenc} to the header doesn't make it possible to process a Lua 5.2 file with German language (äöüÄÖÜß) comments in UTF-8 Unicode text, with CRLF line terminators file.
  • terdon
    terdon about 9 years
    @qubodup I don't really know. LaTeX and UTF8 can be tricky. It should work with \usepackage[utf8]{inputenc} \usepackage[german]{babel}` but it fails on my tests. However, I suspect I'm not feeding it true utf8. That might be worth its own question but I suggest you ask on TeX - LaTeX, they should know.
  • qubodup
    qubodup about 9 years
    @terdon Thank you for checking! I was lucky enough to find a solution thanks to multiple questions and answers (links inside), I'm now using codepad.org/7iLVB8Se which gives me pages that look like i.imgur.com/lGaYcAi.png
  • DavidPostill
    DavidPostill almost 8 years
    Please quote the essential parts of the answer from the reference link(s), as the answer can become invalid if the linked page(s) change.
  • Matt Liberty
    Matt Liberty about 6 years
    Underscores in file names throw off the PDF since underscore means subscript to Latex. I modified the \section{$i} to use sanitized input: while read i; do ## Loop through each file name=${i//_/\\_} echo "\newpage" >> $tex_file ## start each section on a new page echo "\section{$name}" >> $tex_file ## Create a section for each file
  • terdon
    terdon about 6 years
    @MattLiberty oh, good catch! I edited that in, thanks!
  • geisterfurz007
    geisterfurz007 over 5 years
    Hey there! I just noticed that there is an article with stuff that looks pretty similar to this if you want to investigate: samhobbs.co.uk/2017/01/….
  • terdon
    terdon over 5 years
    @geisterfurz007 thanks for the heads up, but the author of the article links back to this post in their code, so everything's fine. Nice when the system works :)
  • geisterfurz007
    geisterfurz007 over 5 years
    Ah! I didn't check the code :D Cheers for checking and the response!
  • Jabba
    Jabba over 4 years
    Thanks a lot! The line echo "\section{$i}" >> $tex_file should be echo "\section{$name}" >> $tex_file.
  • Liam
    Liam almost 4 years
    I have a lot of binary files mixed in my tree, so I found modifying this SO answer (see my comment) with ! -exec grep -Il . "{}" \; at the end to be very helpful