Starting off a simple (the simplest perhaps) C compiler?

28,016

Solution 1

A compiler consists of three pieces:

  1. A parser
  2. An abstract syntax tree (AST)
  3. An assembly code generator

There are lots of nice parser generators that start with language grammars. Maybe ANTLR would be a good place for you to start. If you want to stick to C roots, try lex/yacc or bison.

There are grammars for C, but I think C in its entirety is complex. You'd do well to start off with a subset of the language and work your way up.

Once you have an AST, you use it to generate the machine code that you'll run.

It's doable, but not trivial.

I'd also check Amazon for books about writing compilers. The Dragon Book is the classic, but there are more modern ones available.

UPDATE: There have been similar questions on Stack overflow, like this one. Check out those resources as well.

Solution 2

I advise you this tutorial:

It is a small example on how to implement a "small language" compiler. The source code is very small and is explained step by step.

There is also the C front end library for the LLVM (Low Level Virtual Machine which represent the internal structure of a program) library:

Solution 3

For what it's worth, the Tiny C Compiler is a pretty full-featured C compiler in a relatively small source package. You might benefit from studying that source, as it's probably significantly easier to understand than trying to comprehend all of GCC's source base, for instance.

Solution 4

This is my opinion (and conjecture) it will be hard to write a compiler without understanding data structures normally covered in undergraduate (post secondary) Computer Science classes. This doesn't mean you cannot, but you will need to know essential data structures such as linked lists, and trees.

Rather than writing a full or standards compliant C language compiler (at least in the start), I would suggest limiting yourself to a basic subset of the language, such as common operators, integer only support, and basic functions and pointers. One classic example of this was Ron Cain's Small-C, made popular by a series of articles written in Dr. Dobbs Journal in I believe the 1980s. They publish a CD with the James Hendrix's out-of-print book, A Small-C Compiler.

What I would suggest is following Crenshaw's tutorial, but write it for a C-like language compiler, and whatever CPU target (Crenshaw targets the Motorola 68000 CPU) you wish to target. In order to do this, you will need to know basic assembly of which ever target you want to run the compiled programs on. This could include a emulator for a 68000, or MIPS which are arguably nicer assembly instruction sets than the venerable CISC instruction set of the Intel x86 (16/32-bit).

There are many potential books that can be used as starting points for learning compiler / translator theory (and practice). Read the comp.compilers FAQ, and reviews at various online book sellers. Most introductory books are written as textbooks for sophomore to senior level undergraduate Computer Science classes, so they can be slow reading without a CS background. One older book that might be more introductory, but easier to read than "The Dragon Book" is Introduction to Compiler Construction by Thomas Parsons. It is older, so you should be able to find an used copy from your choice of online book sellers at a reasonable price.

So I'd say, try starting with Jack Crenshaw's Let's Build a Compiler tutorial, write your own, following his examples as a guide, and build the basics of a simple compiler. Once you have that working, you can better decide where you wish to take it from that point.

Added:

In regards to the bootstrapping process. Since there are existing C compilers freely available, you do not need to worry about bootstrapping. Write your compiler with separate, existing tools (GCC, Visual C++ Express, Mingw / djgpp, tcc), and you can worry about self-compiling your project at a much later stage. I was surprised by this part of the question until I realized you were brought to the idea of writing your own compiler by reading Ken Thomas' ACM Turing award speech, Reflections on Trusting Trust, which does go into the compiler bootstrapping process. It's a moderated advanced topic, and is also simply a lot of hassle as well. I find even bootstrapping the GCC C compiler under older Unix systems (Digital OSF/1 on the 64-bit Alpha) that included a C compiler a slow and time consuming, error prone process.

The other sort-of question was what a compiler tool like Yacc actually does. Yacc (Yet Another Compiler Compiler or Bison from GNU) is a tool designed to make writing a compiler (or translator) parser easier. Based on the formal grammar for your target language that you input to yacc, it generates a parser, which is one portion of a compiler's overall design. Next is Lex (or flex from GNU) which used to generate a lexical analyzer or scanner, which is often used in combination with the yacc generated parser to form the skeleton of the front-end of a compiler. These tools make writer a front end arguably easier than writing an lexical analyzer and parser yourself. Crenshaw's tutorial does not use these tools, and you don't need to either, many compiler writers don't always use them. Of course Crenshaw admits the tutorial's parser is quite basic.

Crenshaw's tutorial also skips generating an AST (abstract syntax tree), which simplifies but also limits the tutorial compiler. It lacks most if not all optimization, and is very tied to the specific programming language and the particular assembly language emitted by the "back-end" of the compiler. Normally the AST is a middle piece where some optimization can be performed, and serves to de-couple the compiler front-end and back-end in design. For a beginner without a Computer Science background, I'd suggest not worrying about not having an AST for your first compiler (or at least the first version of it). I think keeping it small and simple will help you finish writing a compiler, in its first version, and you can decide from there how you want to proceed then.

Solution 5

You might be interested in the book/course The Elements of Computing Systems:Building a Modern Computer from First Principles.

Note that this isn't about building a "pc" from stuff you bought off newegg. It begins with a description of Boolean logic fundamentals, and builds a virtual computer from the lowest levels of abstraction to progressively higher levels of abstraction. The course materials are all online, and the book itself is fairly inexpensive from Amazon.

In the course, in addition to "building the hardware", you'll also implement an assembler, virtual machine, compiler, and rudimentary OS, in a step-wise fashion. I think this would give you enough of a background to delve deeper into the subject area with some of the more commonly recommended resources listed in the other answers.

Share:
28,016
Legend
Author by

Legend

Just a simple guy :)

Updated on July 13, 2022

Comments

  • Legend
    Legend almost 2 years

    I came across this: Writing a compiler using Turbo Pascal

    I am curious if there are any tutorials or references explaining how to go about creating a simple C compiler. I mean, it is enough if it gets me to the level of making it understand arithmetic operations. I became really curious after reading this article by Ken Thompson. The idea of writing something that understands itself seems exciting.

    Why did I put up this question instead of asking Google? I tried Google and the Pascal one was the first link. The rest did no seem relevant and added to that... I am not a CS major (so I still need to learn what all those tools like yacc do) and I want to learn this by doing and am hoping people with more experience are always better at these things than Google. I want to read some article written in the same spirit as the one I listed above but that which highlights at least the bootstrapping phases of building a simple C compiler.

    Also, I don't know the best way to learn. Do I start off building a C compiler in C or some other language? Do I write a C compiler or some other language? I feel questions like this are better answered once I have some direction to explore. Any suggestions?

    Any suggestions?

  • Phong
    Phong over 14 years
    @duffymo: thanks, What I really liked on this tutorial is that they dont rely on any external soft for the lexer/parser function.
  • Legend
    Legend over 14 years
    Awesome! Thank you very much... That thread seems like a beast by itself.. Will start digging into it...
  • Legend
    Legend over 14 years
    Sorry just trying to understand... Are you suggesting writing something using the higher level languages like Python or Ruby? If yes, sure, I am open to trying that as well.. As I mentioned, I am a complete n00b in this area so anything that makes me understand the main concept is welcome... Also could you clarify the resume part? :D
  • Andrew McGregor
    Andrew McGregor over 14 years
    A non-trivial program in one of those modern languages will look good on your resume. It would be substantially easier to implement a compiler in something like Ruby or Python. Haskell would be by no means easy, but it might lead you to a much better compiler.
  • Legend
    Legend over 14 years
    Understood... I've only heard of Haskell... Never tried it though. Writing one in Python or Ruby never occurred to me. If that's the case, I'll definitely look into it. Thanks
  • duffymo
    duffymo over 14 years
    Yes, it's really terrific. Nice find. Thanks for posting it.
  • mctylr
    mctylr over 14 years
    Can we skip the which language is better contest? As much as I personally enjoy Perl or Python, the OP said C was the desired language.
  • Potatoswatter
    Potatoswatter over 14 years
    @Legend: and by the way, don't listen to all the naysayers. You probably aren't going to make a breakthrough or even a competitive language, but neither would someone with a degree on their first attempt. So long as you're having fun, you'll always find something more to achieve.
  • Potatoswatter
    Potatoswatter over 14 years
    Agreed with mctylr, but at least my background makes a functional language look like a much better "host" than Python. Compilers involve building structures quickly and carefully, and the "easy" languages get too hairy when it comes to things like identifier bindings and the underlying meaning of their builtin data structures. Also disagreeing on the Agile part. For a beginner project, this should be one person working one step at a time. Do save your test cases. Don't obsess over your plan.
  • mctylr
    mctylr over 14 years
    Isn't the original article (Schorre's) behind ACM's pay-wall? If so, please note it. I happen to be an ACM member, but not everyone is.
  • Legend
    Legend over 14 years
    So true... Sometimes I do get nervous looking at the 'real' CS grads... In fact, a recent post at slashdot: ask.slashdot.org/comments.pl?sid=10/02/19/147251 made me think more. But at the end of the day, I guess it all depends on how we think... :)
  • Legend
    Legend over 14 years
    I'm just not sure if I can post a link here but I'm sure alternate versions are available through Google Scholar... Thanks a lot Ira Baxter.
  • Legend
    Legend over 14 years
    Oh... I think I have that book with me somewhere..! Thanks
  • DigitalRoss
    DigitalRoss over 14 years
    @Potatoswatter, regarding Agile, I realize he is just one guy so the team parts of Agile won't apply. But I think it's very important that he incrementally prototype the program. He really needs to have something trivial running, then something slightly larger, then slightly larger. If he tries to really write a compiler and then start testing he will be totally doomed.
  • Legend
    Legend over 14 years
    It took me quite a while to digest what you've written. Very informative post. Thank you for your time...
  • Legend
    Legend over 14 years
    I know I did not really ask this question but you pretty much gave me a solution to a series of questions that I was about to ask in the future :) Looks like a really interesting book. Ordered it yesterday... just want to see how it feels to start from the roots... Thank again...
  • mctylr
    mctylr over 14 years
    Thank you, I am glad you found it useful. I hope I can answer your question in a manner that is helpful, and encourages you to be successful.
  • Tim_K
    Tim_K over 14 years
    +1 for ANTLR. It might not be the best parser generator but the debugging and testing tools are hard to top.
  • Wei Hu
    Wei Hu over 14 years
    I wouldn't call CIL a C compiler. CIL consists of a parser that reads C files and a writer that outputs equivalent C files. It does help with code analysis and transformation.
  • Norman Ramsey
    Norman Ramsey over 14 years
    I don't think I did call CIL a compiler. It's a front end. With C, that's a "big chunk", as I said.
  • The111
    The111 over 12 years
    This book looks like what I've wanted for a very long time. I'm partially through a MS in CS, but I have no undergrad in CS and know I am lacking tons of low level knowledge. This books looks like an excellent place to start. Thanks.
  • Joe Internet
    Joe Internet over 12 years
  • mctylr
    mctylr about 9 years
    A link to an (public) ACM Queue article about following the META II paper using Python. META II: Digital Vellum in the Digital Scriptorium by Dave Long (Jan 2015, Vol 13, Issue 1).
  • Ira Baxter
    Ira Baxter about 9 years
    @mctylr: His article on MetaII, is longer than the MetaII paper itself :-}