Parse a .py file, read the AST, modify it, then write back the modified source code

61,937

Solution 1

Pythoscope does this to the test cases it automatically generates, as does the 2to3 tool for Python 2.6 (which converts Python 2.x source into Python 3.x source).

Both of these tools use the lib2to3 library, an implementation of the Python parser/compiler machinery that can preserve comments in the source when it is round-tripped from source -> syntax tree -> source. (lib2to3 was deprecated in Python 3.11 and removed in Python 3.13.)

The rope project may meet your needs if you want to do more refactoring-like transforms.

The ast module is your other option, and there's an older example of how to "unparse" syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.

The redbaron project may also be a good fit (h/t Xavier Combelle).

Solution 2

The builtin ast module doesn't seem to have a method to convert back to source. However, the codegen module provides a pretty printer for the AST that would enable you to do so, e.g.:

import ast
import codegen  # third-party package, not part of the standard library

expr = """
def foo():
    print("hello world")
"""
p = ast.parse(expr)

# Replace the function body with "return 42"
p.body[0].body = [ast.parse("return 42").body[0]]

print(codegen.to_source(p))

This will print:

def foo():
    return 42

Note that you may lose the exact formatting and comments, as these are not preserved.

However, you may not need to. If all you need is to execute the modified AST, you can simply call compile() on the AST and exec the resulting code object.
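A minimal sketch of that compile-and-exec route using only the standard library (the names `source`, `tree` and `namespace` are illustrative): it applies the same body replacement as above, then runs the result without ever producing source text.

```python
import ast

source = """
def foo():
    print("hello world")
"""

tree = ast.parse(source)

# Replace the body of the first top-level statement (the def) with "return 42".
tree.body[0].body = [ast.parse("return 42").body[0]]

# Fill in any missing line/column info so compile() accepts the tree.
ast.fix_missing_locations(tree)

code = compile(tree, filename="<ast>", mode="exec")
namespace = {}
exec(code, namespace)

print(namespace["foo"]())
```

Here the function is defined straight from the modified tree, so formatting and comments never matter.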

Solution 3

In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:

>>> import ast
>>> import astunparse
>>> print(astunparse.unparse(ast.parse('def foo(x): return 2 * x')))


def foo(x):
    return (2 * x)

I have tested this on Python 3.5.

Solution 4

You might not need to re-generate source code. That's a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:

  • If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don't want to change it into an AST and back, because you'll lose all formatting (think of the blank lines that make Python so readable by grouping related sets of lines together) and all comments; AST nodes only keep lineno and col_offset attributes. Instead, you'll probably want to use a templating engine (the Django template language, for example, is designed to make templating even plain text files easy) to customize the .py file, or else use Rick Copeland's MetaPython extension.

  • If you are trying to make a change during compilation of a module, note that you don't have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.

  • But in almost every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them ever needed to write a .py file. So, I must admit, I'm a bit of a skeptic that you've found the first good use-case. :-)
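For the first case in the list above, here is a minimal sketch using the standard library's `string.Template` as a stand-in for a full templating engine (it is not the Django template language or MetaPython). The template contents and the `render_settings` helper are hypothetical. Because this is plain text substitution, comments and blank lines come through untouched.

```python
from string import Template

# Hypothetical template for a generated settings module. Comments and blank
# lines survive because nothing is parsed, only substituted.
MODULE_TEMPLATE = Template('''\
# Generated settings for $project_name

# Database connection
DATABASE_URL = $database_url

# Feature flags
DEBUG = $debug
''')


def render_settings(project_name: str, database_url: str, debug: bool) -> str:
    # repr() turns the values into valid Python literals (quoted strings, True/False).
    return MODULE_TEMPLATE.substitute(
        project_name=project_name,
        database_url=repr(database_url),
        debug=repr(debug),
    )


if __name__ == "__main__":
    print(render_settings("demo", "sqlite:///demo.db", True))
```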

Update: now that you've explained what you're trying to do, I'd be tempted to just operate on the AST anyway. You will want to mutate by removing, not lines of a file (which could result in half-statements that simply die with a SyntaxError), but whole statements — and what better place to do that than in the AST?
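A minimal sketch of statement-deletion mutation on the AST, assuming Python 3.9+ for `ast.unparse`. `statement_deletion_mutants` and `_statement_slots` are hypothetical helper names, not part of any library. Each mutant removes exactly one whole statement, and a block that would otherwise become empty gets a `pass` so the mutant still parses.

```python
import ast
from typing import Iterator, List, Tuple

# Node fields that can hold a list of statements (module/def/class/loop/if/with/try
# bodies, else-blocks and finally-blocks). Lambda.body and IfExp.body are single
# expressions, so the isinstance(list) check below skips them.
STMT_LIST_FIELDS = ("body", "orelse", "finalbody")


def _statement_slots(tree: ast.AST) -> List[Tuple[list, int]]:
    """Return (statement_list, index) pairs for every statement, in ast.walk order."""
    slots = []
    for node in ast.walk(tree):
        for field in STMT_LIST_FIELDS:
            stmts = getattr(node, field, None)
            if isinstance(stmts, list):
                slots.extend((stmts, i) for i in range(len(stmts)))
    return slots


def statement_deletion_mutants(source: str) -> Iterator[str]:
    """Yield one mutant of `source` per statement, with exactly that statement removed."""
    count = len(_statement_slots(ast.parse(source)))
    for k in range(count):
        # Re-parse for every mutant so each one starts from the unmodified tree;
        # ast.walk visits nodes in the same order for the same source.
        tree = ast.parse(source)
        stmts, i = _statement_slots(tree)[k]
        if len(stmts) == 1:
            # An empty block is a SyntaxError, so substitute `pass`.
            stmts[i] = ast.Pass()
        else:
            del stmts[i]
        yield ast.unparse(tree)


if __name__ == "__main__":
    src = "def f(x):\n    y = x + 1\n    return y\n"
    for mutant in statement_deletion_mutants(src):
        print(mutant)
        print("---")
```

Rerunning the test suite against each mutant is left out; each string could be written to a temporary module or compiled directly.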

Solution 5

Took a while, but Python 3.9 has this: https://docs.python.org/3.9/whatsnew/3.9.html#ast https://docs.python.org/3.9/library/ast.html#ast.unparse

ast.unparse(ast_obj)

Unparse an ast.AST object and generate a string with code that would produce an equivalent ast.AST object if parsed back with ast.parse().
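A short sketch of the read/modify/write cycle with `ast.unparse` (Python 3.9+): rename a function, regenerate the source, and check the documented guarantee, which is AST equivalence rather than identical text. The comment in the input is dropped, since comments are not part of the AST.

```python
import ast

source = '''
# This comment is not part of the AST, so it will not survive.
def foo(x):
    return (2 * x)
'''

tree = ast.parse(source)

# A trivial modification: rename the function.
tree.body[0].name = "double"

regenerated = ast.unparse(tree)  # available from Python 3.9
print(regenerated)

# Parsing the output again yields an equivalent tree
# (ast.dump ignores line/column attributes by default).
assert ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```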



Author: Amandasaurus

I'm a Linux user

Updated on July 31, 2021

Comments

  • Amandasaurus
    Amandasaurus almost 3 years

    I want to programmatically edit python source code. Basically I want to read a .py file, generate the AST, and then write back the modified python source code (i.e. another .py file).

    There are ways to parse/compile Python source code using standard Python modules, such as ast or compiler. However, I don't think any of them support ways to modify the source code (e.g. delete this function declaration) and then write back the modified Python source code.

    UPDATE: The reason I want to do this is I'd like to write a Mutation testing library for python, mostly by deleting statements / expressions, rerunning tests and seeing what breaks.

    • dfa
      dfa about 15 years
      Deprecated since version 2.6: The compiler package has been removed in Python 3.0.
    • user1066101
      user1066101 about 15 years
      Why can't you edit the source? Why can't you write a decorator?
    • Ryan
      Ryan about 15 years
      Holy cow! I wanted to make a mutation tester for python using the same technique (specifically creating a nose plugin), are you planning on open sourcing it?
    • Amandasaurus
      Amandasaurus about 15 years
      @Ryan Yeah I'll open source anything I create. We should keep in contact on this
    • chiffa
      chiffa over 10 years
      Are you running any genetic algorithms on your mutations? :P
    • jfs
      jfs over 9 years
      macropy provides syntax sugar for manipulating ast at import time.
  • Ryan
    Ryan about 15 years
    Good overview of possible solution and likely alternatives.
  • Rick Copeland
    Rick Copeland about 15 years
    Real world use case for code generation: Kid and Genshi (I believe) generate Python from XML templates for speedy rendering of dynamic pages.
  • mattbasta
    mattbasta over 13 years
    Just for anyone using this in the future, codegen is largely out-of-date and has a few bugs. I've fixed a couple of them; I have this as a gist on github: gist.github.com/791312
  • Janus Troelsen
    Janus Troelsen over 11 years
    the unparse example is still maintained, here is the updated py3k version: hg.python.org/cpython/log/tip/Tools/parser/unparse.py
  • Ira Baxter
    Ira Baxter over 9 years
    I haven't looked at this answer in 4 years. Wow, it has been downvoted several times. That's really stunning, since it answers OP's question directly, and even shows how to do the mutations he wants to do. I don't suppose any of the downvoters would care to explain why they downvoted.
  • Zoran Pavlovic
    Zoran Pavlovic over 9 years
    Because it promotes a very expensive, closed-source tool.
  • Ira Baxter
    Ira Baxter over 9 years
    @ZoranPavlovic: So you are not objecting to any of its technical accuracy or utility?
  • Zoran Pavlovic
    Zoran Pavlovic over 9 years
    Check the requesters edit/comments. I doubt an open-sourced library would be compatible with your proposed solution of using a closed-source tool.
  • Ira Baxter
    Ira Baxter over 9 years
    @Zoran: He didn't say he had an open source library. He said he wanted to modify Python source code (using ASTs), and the solutions he could find did not do that. This is such a solution. You don't think people use commercial tools on programs written in languages like Python or Java?
  • Ira Baxter
    Ira Baxter about 9 years
    @Zoran: Some would call DMS expensive; it certainly isn't in the $100 category. On the other hand, it does a lot out-of-the-box that many other "free" or cheap tools simply do not, including parsing and transforming Python reliably. If you believe your time is free and you have a lot of it, DMS is expensive. If you need to solve a problem, can't find another answer, and don't want to build it all yourself from scratch, then your opinion of its cost may be considerably different. This is the classic reason that paid software exists rather than everything being freeware.
  • wim
    wim about 8 years
    I'm not a down-voter, but the post reads a bit like an advertisement. To improve the answer, you could disclose that you're affiliated with the product
  • Ira Baxter
    Ira Baxter about 8 years
    @wim: The phrase "our" is long-accepted SO indication of affiliation. There are a lot of folk that seem to down-vote because it is a commercial product, and not free for them to use. Unfortunately for me, it wasn't free to build; I've put 20 years of my career into it.
  • wim
    wim about 8 years
    Disclose more clearly, I meant. It wasn't obvious to me at all until I clicked through to your profile. I also note that the word "Our" was edited from the word "The" four years later than the original posting, so it's likely you may have collected those downvotes before your edits.
  • Ira Baxter
    Ira Baxter about 8 years
    There was a long, heated argument about what it meant to disclose affiliation. "Our" was deemed adequate, even if you don't think it enough; everybody has an opinion but they don't often know the site policies. The "our" was added several years later, after SO decided they needed a policy for this. Since the policy, I have gone back over various answers and adjusted them; some take a while to notice. There was a downvote before that adjustment because I hadn't adjusted yet; now there are several (3-4) more recent than that. The naysayers have a piling-on effect.
  • mbdevpl
    mbdevpl about 8 years
    With regard to unparse.py script - it may be really cumbersome to use it from another script. But, there is a package called astunparse (on github, on pypi) which is basically a properly packaged version of unparse.py.
  • zjffdu
    zjffdu almost 8 years
    Note that the latest codegen was updated in 2012, which is after the above comment, so I guess codegen has been updated. @mattbasta
  • medmunds
    medmunds almost 6 years
    astor appears to be a maintained successor to codegen