OrderedDict comprehensions


Solution 1

There is no direct way to change Python's syntax from within the language. A dictionary comprehension (or plain display) is always going to create a dict, and there's nothing you can do about that. If you're using CPython, it's using special bytecodes that generate a dict directly, which ultimately call the PyDict API functions and/or the same underlying functions used by that API. If you're using PyPy, those bytecodes are instead implemented on top of an RPython dict object which in turn is implemented on top of a compiled-and-optimized Python dict. And so on.
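To see those special bytecodes for yourself, here is a minimal sketch using the standard dis module. It assumes CPython; opcode names are an implementation detail and vary between versions, and the helper name collect_opnames is just made up for this example.

```python
import dis


def collect_opnames(code):
    """Return the set of opcode names used by a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        # Older CPython versions compile the comprehension into a nested code object.
        if hasattr(const, "co_code"):
            names |= collect_opnames(const)
    return names


code = compile("{i: i * i for i in range(3)}", "<demo>", "eval")
ops = collect_opnames(code)

# Show only the dict-building opcodes.
print(sorted(op for op in ops if "MAP" in op))
```

On the CPython versions I know of, the list includes BUILD_MAP and MAP_ADD, and neither of them looks up the name dict at runtime, which is why rebinding that name has no effect.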

There is an indirect way to do it, but you're not going to like it. If you read the docs on the import system, you'll see that it's the importer that searches for cached compiled code or calls the compiler, the compiler that calls the parser, and so on. In Python 3.3+, almost everything in this chain either is written in pure Python or has an alternate pure Python implementation, meaning you can fork the code and do your own thing. That includes parsing source with your own PyParsing code that builds ASTs, or compiling a dict comprehension AST node into your own custom bytecode instead of the default, or post-processing the bytecode, or…
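If you want to know what a "dict comprehension AST node" actually looks like before you start rewriting one, the standard ast module will show you. A minimal sketch:

```python
import ast

# Parse a single expression and pull out its top-level node.
tree = ast.parse("{i: i * i for i in range(3)}", mode="eval")
node = tree.body

print(type(node).__name__)   # the node class for a dict comprehension
print(node._fields)          # the child fields you would rewrite
print(ast.dump(node.key), ast.dump(node.value))
```

The node is an ast.DictComp with key, value and generators fields, and those three pieces are all you need to build an equivalent generator expression.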

In many cases, an import hook is sufficient; if not, you can always write a custom finder and loader.
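As a rough sketch of the loader idea in Python 3: subclass importlib.machinery.SourceFileLoader, override source_to_code, and rewrite the AST before compiling it. The names DictCompToOrderedDict, OrderedCompLoader and load_with_ordered_comps are made up for this example, and it loads one file by hand instead of installing a finder on sys.meta_path. Also note that a cached .pyc will bypass source_to_code entirely, so treat this as a toy, not something production-ready.

```python
import ast
import importlib.machinery
import importlib.util
import os
import tempfile
from collections import OrderedDict


class DictCompToOrderedDict(ast.NodeTransformer):
    """Rewrite {k: v for ...} into __import__('collections').OrderedDict((k, v) for ...)."""

    def visit_DictComp(self, node):
        self.generic_visit(node)  # rewrite nested comprehensions first
        pair = ast.Tuple(elts=[node.key, node.value], ctx=ast.Load())
        genexp = ast.GeneratorExp(elt=pair, generators=node.generators)
        # Use __import__ so the target module needs no import of its own.
        module = ast.Call(
            func=ast.Name(id="__import__", ctx=ast.Load()),
            args=[ast.Constant(value="collections")],
            keywords=[],
        )
        factory = ast.Attribute(value=module, attr="OrderedDict", ctx=ast.Load())
        call = ast.Call(func=factory, args=[genexp], keywords=[])
        return ast.copy_location(call, node)


class OrderedCompLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that transforms the AST before compiling it."""

    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data, filename=path)
        tree = DictCompToOrderedDict().visit(tree)
        ast.fix_missing_locations(tree)
        return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)


def load_with_ordered_comps(name, path):
    """Load a single source file through the custom loader."""
    loader = OrderedCompLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


# Demo: write a tiny module to a temp directory and load it through the hook.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("squares = {i: i * i for i in range(3)}\n")
    demo = load_with_ordered_comps("demo_mod", path)

print(type(demo.squares).__name__)
```

The module's source still says {i: i * i for i in range(3)}, but what gets compiled is the OrderedDict call, so demo.squares comes back as an OrderedDict.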

If you're not already using Python 3.3 or later, I'd strongly suggest migrating before playing with this stuff. In older versions, it's harder, and less well documented, and you'll ultimately be putting in 10x the effort to learn something that will be obsolete whenever you do migrate.

Anyway, if this approach sounds interesting to you, you might want to take a look at MacroPy. You could borrow some code from it—and, maybe more importantly, learn how some of these features (that have no good examples in the docs) are used.

Or, if you're willing to settle for something less cool, you can just use MacroPy to build an "odict comprehension macro" and use that. (Note that, at the time of writing, MacroPy only works in Python 2.7, not 3.x.) You can't quite get O{…}, but you can get, say, od[{…}], which isn't too bad. Download od.py, realmain.py, and main.py, and run python main.py to see it working. The key is this code, which takes a DictComp AST node, converts it to an equivalent GeneratorExp over key-value Tuples, and wraps it in a Call to collections.OrderedDict:

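import ast

def od(tree, **kw):
    # tree is the ast.DictComp node found inside od[...]
    pair = ast.Tuple(elts=[tree.key, tree.value], ctx=ast.Load())
    gx = ast.GeneratorExp(elt=pair, generators=tree.generators)
    # Build the expression collections.OrderedDict(<generator expression>)
    odict = ast.Attribute(value=ast.Name(id='collections', ctx=ast.Load()),
                          attr='OrderedDict', ctx=ast.Load())
    call = ast.Call(func=odict, args=[gx], keywords=[])
    return call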

A different alternative is, of course, to modify the Python interpreter.

I would suggest dropping the O{…} syntax idea for your first go, and just making normal dict comprehensions compile to odicts. The good news is, you don't really need to change the grammar (which is beyond hairy…), just any one of:

  • the bytecodes that dictcomps compile to,
  • the way the interpreter runs those bytecodes, or
  • the implementation of the PyDict type

The bad news is that, while all of those are a lot easier than changing the grammar, none of them can be done from an extension module. (Well, you can do the first one by doing basically the same thing you'd do from pure Python… and you can do any of them by hooking the .so/.dll/.dylib to patch in your own functions, but that's the exact same work as hacking on Python plus the extra work of hooking at runtime.)

If you want to hack on CPython source, the code you want is in Python/compile.c, Python/ceval.c, and Objects/dictobject.c, and the dev guide tells you how to find everything you need. But you might want to consider hacking on PyPy source instead, since it's mostly written in (a subset of) Python rather than C.


As a side note, your attempt wouldn't have worked even if everything were done at the Python language level. olddict, dict = dict, OrderedDict creates a binding named dict in your module's globals, which shadows the name in builtins, but doesn't replace it. You can replace things in builtins (well, Python doesn't guarantee this, but there are implementation/version-specific things-that-happen-to-work for every implementation/version I've tried…), but what you did isn't the way to do it.
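A quick way to check the shadowing-versus-replacing distinction, as a sketch that assumes CPython 3. The second half really does replace builtins.dict for a moment (one of those things-that-happen-to-work), so it restores the original in a finally block.

```python
import builtins
from collections import OrderedDict

builtin_dict = builtins.dict

# 1. Rebinding the name only creates a new global that shadows the builtin.
dict = OrderedDict
builtins_untouched = builtins.dict is builtin_dict
comp_type_shadowed = type({i: i * i for i in range(3)})
del dict  # drop the shadowing global so the builtin is visible again

# 2. Actually replacing the entry in builtins (implementation-dependent hack).
builtins.dict = OrderedDict
try:
    comp_type_patched = type({i: i * i for i in range(3)})
finally:
    builtins.dict = builtin_dict  # always put the real dict back

print(builtins_untouched, comp_type_shadowed, comp_type_patched)
```

Both comprehensions still produce the real built-in dict, because the comprehension never looks up the name dict at all.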

Solution 2

Sorry, not possible. Dict literals and dict comprehensions map to the built-in dict type, in a way that's hardcoded at the C level. That can't be overridden.

You can use this as an alternative, though:

OrderedDict((i, i * i) for i in range(3))

Addendum: as of CPython 3.6, dictionaries preserve insertion order as an implementation detail. As of 3.7, that ordering is part of the language spec. If you're using those versions of Python and only need insertion order, there's no need for OrderedDict: the dict comprehension will Just Work (TM).
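A small check of that, assuming Python 3.7 or later. The second half is a reminder that OrderedDict still behaves differently in a few places, equality being the one most people run into.

```python
from collections import OrderedDict

# Keys deliberately not in sorted order, so insertion order is visible.
squares = {i: i * i for i in (3, 1, 2)}
keys_in_order = list(squares)
print(keys_in_order)

# Plain dicts compare equal regardless of order; OrderedDicts do not.
plain_equal = {"a": 1, "b": 2} == {"b": 2, "a": 1}
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
print(plain_equal, ordered_equal)
```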

Solution 3

Slightly modifying the response of @Max Noel, you can use a list comprehension instead of a generator expression to create an OrderedDict in insertion order (something a plain dict comprehension did not guarantee before Python 3.7).

>>> from collections import OrderedDict
>>> OrderedDict([(i, i * i) for i in range(5)])
OrderedDict([(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)])
Author by

wim


Updated on June 06, 2022

Comments

  • wim
    wim almost 2 years

    Can I extend syntax in python for dict comprehensions for other dicts, like the OrderedDict in collections module or my own types which inherit from dict?

    Just rebinding the dict name obviously doesn't work, the {key: value} comprehension syntax still gives you a plain old dict for comprehensions and literals.

    >>> from collections import OrderedDict
    >>> olddict, dict = dict, OrderedDict
    >>> {i: i*i for i in range(3)}.__class__
    <type 'dict'>
    

    So, if it's possible, how would I go about doing that? It's OK if it only works in CPython. For syntax I guess I would try it with an O{k: v} prefix like we have on the r'various' u'string' b'objects'.

    note: Of course we can use a generator expression instead, but I'm more interested in seeing how hackable python is in terms of the grammar.

  • Admin
    Admin over 10 years
    I'm interested in getting involved with the Python C api. Is the C api for 3 substantially different from 2? (My day job is 2 and won't ever go to 3)
  • abarnert
    abarnert over 10 years
    @EdgarAroutiounian: The C API is even more conservative than the language itself—long and unicode changed to int and str, but the C types are still PyLong and PyUnicode. Almost all of the differences are related to new functionality that didn't exist in 2.x. (If you dive into hacking on CPython itself, there are much bigger differences. But in most cases—with the notable exception of Unicode internal storage—3.4 is simpler than 2.7, so it still makes sense to learn the easy way first.)
  • abarnert
    abarnert over 10 years
    @EdgarAroutiounian: Anyway, the best way to get involved with the C API is to build a simple extension that wraps some C library and exposes it to Python in a nice way. The Extending and Embedding tutorial in the official docs is pretty good. You might want to try doing the same wrapper with ctypes/cffi and a native extension (and maybe Cython, too) to really understand how things look from the different sides.
  • Inversus
    Inversus almost 10 years
    "There is an indirect way to do it, but you're not going to like it." -- I like it already :)
  • user694733
    user694733 over 7 years
    This seems to give you the same end result as Max's answer. Does this have any benefits/difference over the other?
  • Quentin Pradet
    Quentin Pradet over 7 years
    @user694733 This shows that you can use OrderedDict([(0, 2), (2, 5)]) using arbitrary values.
  • Alexander
    Alexander over 7 years
    @user694733 The OP's question ends with "note: Of course we can use a generator expression instead, but I'm more interested seeing how hackable python is in terms of the grammar." This solution accomplishes the same thing without a generator.
  • balki
    balki over 7 years
    I think 'generators' in python is a bit of an advanced concept and not everyone using python needs to know it. So it makes sense to use a list comprehension if the target audience who will read/maintain the script are not experts with the language.
  • Yonatan
    Yonatan over 7 years
    Sir, this is one impressive answer.
  • wisbucky
    wisbucky over 2 years
    FYI, this syntax is a generator expression that is passing tuples to the OrderedDict() constructor. To demonstrate, list( (i, i * i) for i in range(3) ) gives [(0, 0), (1, 1), (2, 4)]