Generator as function argument


Both 3. and 4. should be syntax errors on all Python versions. However, you've found a bug that affects Python versions 2.5 - 3.4, and which was subsequently reported on the Python issue tracker. Because of the bug, an unparenthesized generator expression was accepted as an argument to a function if it was accompanied only by *args and/or **kwargs. While Python 2.6+ allowed both cases 3. and 4., Python 2.5 allowed only case 3. - yet both of them were against the documented grammar:

call    ::=     primary "(" [argument_list [","]
                            | expression genexpr_for] ")"

i.e. the documentation says that a function call consists of a primary (the expression that evaluates to a callable) followed, in parentheses, by either an argument list or a single unparenthesized generator expression; within an argument list, every generator expression must be parenthesized.
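
The documented rule can be checked from Python itself. This is only a minimal sketch, assuming a Python 3.5 or later interpreter; `is_valid_call` is a hypothetical helper name, not part of any library. It only compiles each snippet, so `f` and `y` do not need to exist.

```python
def is_valid_call(source):
    """Return True if `source` compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Sole argument: the generator expression may reuse the call's parentheses.
print(is_valid_call("f(x for x in y)"))       # True
# Inside an argument list: the generator expression needs its own parentheses.
print(is_valid_call("f(1, x for x in y)"))    # False
print(is_valid_call("f(1, (x for x in y))"))  # True
```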


This bug, which had apparently gone unnoticed, was fixed in the Python 3.5 prereleases. In Python 3.5, parentheses are always required around a generator expression unless it is the only argument to the function:

Python 3.5.0a4+ (default:a3f2b171b765, May 19 2015, 16:14:41) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f(1 for i in [42], *a)
  File "<stdin>", line 1
SyntaxError: Generator expression must be parenthesized if not sole argument

This is now documented in What's New in Python 3.5, thanks to DeTeReR for spotting this bug.


Analysis of the bug

There was a change made to Python 2.6 which allowed the use of keyword arguments after *args:

It’s also become legal to provide keyword arguments after a *args argument to a function call.

>>> def f(*args, **kw):
...     print args, kw
...
>>> f(1,2,3, *(4,5,6), keyword=13)
(1, 2, 3, 4, 5, 6) {'keyword': 13}

Previously this would have been a syntax error. (Contributed by Amaury Forgeot d’Arc; issue 3473.)
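
The quoted example uses the Python 2 print statement. A rough Python 3 equivalent, written here only for illustration, returns the received arguments instead of printing them:

```python
def f(*args, **kw):
    # Return what was received so the result can be inspected.
    return args, kw


# A keyword argument placed after *args, legal since Python 2.6.
result = f(1, 2, 3, *(4, 5, 6), keyword=13)
print(result)  # ((1, 2, 3, 4, 5, 6), {'keyword': 13})
```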


However, the Python 2.6 grammar does not make any distinction between keyword arguments, positional arguments, or bare generator expressions - they are all of type argument to the parser.

As per the Python rules, a generator expression must be parenthesized if it is not the sole argument to the function. This is validated in Python/ast.c:

for (i = 0; i < NCH(n); i++) {
    node *ch = CHILD(n, i);
    if (TYPE(ch) == argument) {
        if (NCH(ch) == 1)
            nargs++;
        else if (TYPE(CHILD(ch, 1)) == gen_for)
            ngens++;
        else
            nkeywords++;
    }
}
if (ngens > 1 || (ngens && (nargs || nkeywords))) {
    ast_error(n, "Generator expression must be parenthesized "
              "if not sole argument");
    return NULL;
}

However, this function does not consider *args at all; it specifically looks only for ordinary positional arguments and keyword arguments.
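
The effect of that counting loop can be modelled in a few lines of Python. This is a toy sketch, not CPython code: `old_check_rejects` and the string labels are hypothetical, and it assumes that *args and **kwargs never pass the `TYPE(ch) == argument` test, so they are counted nowhere.

```python
def old_check_rejects(children):
    """Toy model of the pre-3.5 counting loop (hypothetical helper).

    `children` lists the kinds of items in the call: "positional",
    "genexp" and "keyword" stand for `argument` nodes; "star" and
    "doublestar" stand for *args and **kwargs.
    """
    nargs = ngens = nkeywords = 0
    for kind in children:
        if kind == "positional":
            nargs += 1
        elif kind == "genexp":
            ngens += 1
        elif kind == "keyword":
            nkeywords += 1
        # "star" and "doublestar" fall through and are not counted (assumption)
    return bool(ngens > 1 or (ngens and (nargs or nkeywords)))


print(old_check_rejects(["genexp", "positional"]))  # True: correctly rejected
print(old_check_rejects(["genexp", "star"]))        # False: slips through
print(old_check_rejects(["star", "genexp"]))        # False: slips through
```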

Further down in the same function, there is an error message generated for non-keyword arg after keyword arg:

if (TYPE(ch) == argument) {
    expr_ty e;
    if (NCH(ch) == 1) {
        if (nkeywords) {
            ast_error(CHILD(ch, 0),
                      "non-keyword arg after keyword arg");
            return NULL;
        }
        ...

But this check again applies only to arguments that are not unparenthesized generator expressions, as evidenced by the else if branch:

else if (TYPE(CHILD(ch, 1)) == gen_for) {
    e = ast_for_genexp(c, ch);
    if (!e)
        return NULL;
    asdl_seq_SET(args, nargs++, e);
}

Thus an unparenthesized generator expression was allowed to slip past.


Now in Python 3.5 one can use *args anywhere in a function call, so the Grammar was changed to accommodate this:

arglist: argument (',' argument)*  [',']

and

argument: ( test [comp_for] |
            test '=' test |
            '**' test |
            '*' test )

and the for loop was changed to

for (i = 0; i < NCH(n); i++) {
    node *ch = CHILD(n, i);
    if (TYPE(ch) == argument) {
        if (NCH(ch) == 1)
            nargs++;
        else if (TYPE(CHILD(ch, 1)) == comp_for)
            ngens++;
        else if (TYPE(CHILD(ch, 0)) == STAR)
            nargs++;
        else
            /* TYPE(CHILD(ch, 0)) == DOUBLESTAR or keyword argument */
            nkeywords++;
    }
}

This fixed the bug.
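
The changed loop can be modelled in Python as well. Again this is only a sketch with hypothetical names (`new_check_rejects` and the string labels), and it assumes that the condition tested after the loop is the same one as before the change.

```python
def new_check_rejects(arguments):
    """Toy model of the 3.5 counting loop (hypothetical helper).

    Every item is now an `argument` node, so *args and **kwargs are counted.
    """
    nargs = ngens = nkeywords = 0
    for kind in arguments:
        if kind == "positional":
            nargs += 1
        elif kind == "genexp":
            ngens += 1
        elif kind == "star":
            nargs += 1
        else:
            # "doublestar" or "keyword"
            nkeywords += 1
    # Assumed unchanged: reject a bare generator that is not the sole argument.
    return bool(ngens > 1 or (ngens and (nargs or nkeywords)))


print(new_check_rejects(["genexp"]))                # False: sole argument is fine
print(new_check_rejects(["genexp", "star"]))        # True
print(new_check_rejects(["genexp", "doublestar"]))  # True
print(new_check_rejects(["star", "positional"]))    # False: no generator involved
```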

However, an inadvertent side effect is that the valid-looking constructions

func(i for i in [42], *args)

and

func(i for i in [42], **kwargs)

in which an unparenthesized generator expression precedes *args or **kwargs have stopped working.
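
For code that relied on the old behaviour, adding parentheses around the generator expression should be enough. A minimal sketch, assuming Python 3.5 or later; `func`, `args` and `kwargs` are placeholder names:

```python
def func(*args, **kwargs):
    return args, kwargs


args = (1, 2)
kwargs = {"key": "value"}

# Parenthesized generator expression followed by *args.
positional, _ = func((i for i in [42]), *args)
first = list(positional[0])
print(first, positional[1:])  # [42] (1, 2)

# Parenthesized generator expression followed by **kwargs.
positional, keywords = func((i for i in [42]), **kwargs)
second = list(positional[0])
print(second, keywords)  # [42] {'key': 'value'}

# The unparenthesized forms no longer compile.
rejected = []
for source in ("func(i for i in [42], *args)",
               "func(i for i in [42], **kwargs)"):
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        rejected.append(source)
print(len(rejected))  # 2
```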


To locate this bug, I tried various Python versions. In 2.5 you'd get a SyntaxError:

Python 2.5.5 (r255:77872, Nov 28 2010, 16:43:48) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f(*[1], 2 for x in [2])
  File "<stdin>", line 1
    f(*[1], 2 for x in [2])

And this had been fixed by the time of this Python 3.5 prerelease:

Python 3.5.0a4+ (default:a3f2b171b765, May 19 2015, 16:14:41) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f(*[1], 2 for x in [2])
  File "<stdin>", line 1
SyntaxError: Generator expression must be parenthesized if not sole argument

However, the parenthesized generator expression works in Python 3.5 but does not work in Python 3.4:

f(*[1], (2 for x in [2]))

And this is the clue. In Python 3.5 the *splatting is generalized; you can use it anywhere in a function call:

>>> print(*range(5), 42)
0 1 2 3 4 42
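
A slightly larger illustration of the generalized unpacking, assuming Python 3.7 or later so that the keyword order shown in the comment is reliable; `f` is just a placeholder function:

```python
def f(*args, **kwargs):
    return args, kwargs


# Several * and ** unpackings mixed with ordinary arguments.
args, kwargs = f(*[1], 2, *[3, 4], **{"a": 5}, b=6, **{"c": 7})
print(args)    # (1, 2, 3, 4)
print(kwargs)  # {'a': 5, 'b': 6, 'c': 7}

# A parenthesized generator expression after *unpacking is an ordinary
# positional argument.
args, _ = f(*[1], (2 for x in [2]))
generated = list(args[1])
print(args[0], generated)  # 1 [2]
```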

So the actual bug (a generator expression working alongside *star without parentheses) was indeed fixed in Python 3.5, and its cause could be found by looking at what changed between Python 3.4 and 3.5.
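
Trying various Python versions is easier with a small probe script. The one below is only a sketch: `SNIPPETS` and `probe` are hypothetical names, and it sticks to syntax that old interpreters should also accept, so the same file can be run under each version being compared. It compiles the calls from the question without executing them.

```python
import sys

# The four calls from the question, plus the parenthesized variant of case 4.
SNIPPETS = [
    "f(1, *[2])",
    "f(*[2], 1)",
    "f(1 for x in [1], *[2])",
    "f(*[2], 1 for x in [1])",
    "f(*[1], (2 for x in [2]))",
]


def probe(snippets):
    """Map each snippet to True if this interpreter's compiler accepts it."""
    results = {}
    for source in snippets:
        try:
            compile(source, "<probe>", "eval")
        except SyntaxError:
            results[source] = False
        else:
            results[source] = True
    return results


results = probe(SNIPPETS)
print("Python %d.%d" % tuple(sys.version_info[:2]))
for source in SNIPPETS:
    print("%-8s %s" % ("accepted" if results[source] else "rejected", source))
```

Going by the behaviour described above, on Python 3.5 and later the two calls with an unparenthesized generator expression should be the only ones reported as rejected.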


DeTeReR
Author by

DeTeReR

Updated on June 24, 2022

Comments

  • DeTeReR
    DeTeReR almost 2 years

    Can anyone explain why passing a generator as the only positional argument to a function seems to have special rules?

    If we have:

    def f(*args):
        print "Success!"
        print args
    
    1. This works, as expected.

      >>> f(1, *[2])
      Success!
      (1, 2)
      
    2. This does not work, as expected.

      >>> f(*[2], 1)
        File "<stdin>", line 1
      SyntaxError: only named arguments may follow *expression
      
    3. This works, as expected

      >>> f(1 for x in [1], *[2])
      Success! 
      (<generator object <genexpr> at 0x7effe06bdcd0>, 2)
      
    4. This works, but I don't understand why. Shouldn't it fail in the same way as 2)

      >>> f(*[2], 1 for x in [1])
      Success!
      (<generator object <genexpr> at 0x7effe06bdcd0>, 2)
      
    • J0HN
      J0HN over 8 years
      Not an exact duplicate, but quite similar: stackoverflow.com/questions/12720450/…. TL;DR seems like it's an implementation detail - it just works like that.
    • Bakuriu
      Bakuriu over 8 years
      Note: case 2 should work in python 3.5+ (due to the PEP 448)
    • Antti Haapala -- Слава Україні
      Antti Haapala -- Слава Україні over 8 years
      Python 3.5 is out, and it now tells that the case 3 (actually also the case 4) has been fixed. What's new in Python 3.5
  • viraptor
    viraptor over 8 years
    It's not fixed in 3.5 - just put parens around the generator and the behaviour is the same.
  • Antti Haapala -- Слава Україні
    Antti Haapala -- Слава Україні over 8 years
    @viraptor good point, in 3.4 the parenthesized expression gives an error
  • viraptor
    viraptor over 8 years
    huh? Running on 3.4.3: f(*[1], 1 for x in [1]) => (<generator object <genexpr> at 0x7fa56c889288>, 1)
  • Antti Haapala -- Слава Україні
    Antti Haapala -- Слава Україні over 8 years
    @viraptor f(*[1], (1 for x in [1])) is syntax error on Python 3.4. It is valid in Python 3.5.
  • Nick Sweeting
    Nick Sweeting over 8 years
    I would gild this answer if I could, thanks for including the relevant C source!