Class inheritance in Python 3.7 dataclasses

83,483

Solution 1

The way dataclasses combines attributes prevents you from using attributes with defaults in a base class and then attributes without a default (positional attributes) in a subclass.

That's because the attributes are combined by starting from the bottom of the MRO, and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'] and because school doesn't have a default, this results in an invalid argument listing for __init__.
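You can see this ordering directly with dataclasses.fields(). In this minimal sketch, Child gets a placeholder default for school ("unknown" is an assumption, there only so the decorator accepts the class). The override of ugly takes the new default but keeps Parent's position:

```python
from dataclasses import dataclass, fields

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass
class Child(Parent):
    school: str = "unknown"  # placeholder default so the decorator accepts it
    ugly: bool = True        # override: new default, but Parent's slot is kept

order = [f.name for f in fields(Child)]
defaults = {f.name: f.default for f in fields(Child)}
print(order)  # ['name', 'age', 'ugly', 'school']
```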

This is documented in PEP 557 (Data Classes), under Inheritance:

When the Data Class is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.

and under Specification:

TypeError will be raised if a field without a default value follows a field with a default value. This is true either when this occurs in a single class, or as a result of class inheritance.
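Both cases from that rule can be reproduced in a few lines. This is a sketch: dataclass_error is a hypothetical helper that applies the decorator and returns the error text. The exact wording of the message differs between Python versions, so only the offending field name is relied on:

```python
from dataclasses import dataclass

def dataclass_error(cls):
    """Hypothetical helper: apply @dataclass and return the TypeError text, or None."""
    try:
        dataclass(cls)
    except TypeError as exc:
        return str(exc)
    return None

# Case 1: a field without a default follows one with a default in a single class.
class Single:
    ugly: bool = False
    name: str

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

# Case 2: the same ordering produced by inheritance ('school' lands after 'ugly').
class Child(Parent):
    school: str

single_error = dataclass_error(Single)
inherited_error = dataclass_error(Child)
print(single_error)
print(inherited_error)
```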

You do have a few options here to avoid this issue.

The first option is to use separate base classes to force fields with defaults into a later position in the MRO. At all costs, avoid setting fields directly on classes that are to be used as base classes, such as Parent.

The following class hierarchy works:

from dataclasses import dataclass

# Base classes with fields; fields without defaults are kept separate from fields with defaults.
@dataclass
class _ParentBase:
    name: str
    age: int

@dataclass
class _ParentDefaultsBase:
    ugly: bool = False

@dataclass
class _ChildBase(_ParentBase):
    school: str

@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
    ugly: bool = True

# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.

@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} years old")

@dataclass
class Child(Parent, _ChildDefaultsBase, _ChildBase):
    pass

By splitting the fields into separate base classes, one set for fields without defaults and one for fields with defaults, and choosing the inheritance order carefully, you can produce an MRO that puts all fields without defaults before those with defaults. The reversed MRO (ignoring object) for Child is:

_ParentBase
_ChildBase
_ParentDefaultsBase
_ChildDefaultsBase
Parent

Note that Parent doesn't set any new fields, so it doesn't matter here that it ends up 'last' in the field listing order. The classes with fields without defaults (_ParentBase and _ChildBase) precede the classes with fields with defaults (_ParentDefaultsBase and _ChildDefaultsBase).
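To check the reversed MRO yourself, here is a condensed, self-contained copy of the hierarchy above (the print_* methods are left out for brevity) that prints it alongside the resulting field order:

```python
from dataclasses import dataclass, fields

@dataclass
class _ParentBase:
    name: str
    age: int

@dataclass
class _ParentDefaultsBase:
    ugly: bool = False

@dataclass
class _ChildBase(_ParentBase):
    school: str

@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
    ugly: bool = True

@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
    pass

@dataclass
class Child(Parent, _ChildDefaultsBase, _ChildBase):
    pass

# Drop Child itself (first entry) and object (last entry), then reverse.
reversed_mro = [c.__name__ for c in reversed(Child.__mro__[1:-1])]
field_order = [f.name for f in fields(Child)]
print(reversed_mro)  # ['_ParentBase', '_ChildBase', '_ParentDefaultsBase', '_ChildDefaultsBase', 'Parent']
print(field_order)   # ['name', 'age', 'school', 'ugly']
```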

The result is Parent and Child classes with a sane field order, while Child is still a subclass of Parent:

>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True

and so you can create instances of both classes:

>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='harvard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='harvard', ugly=True)

Another option is to only use fields with defaults; you can still make it an error to not supply a school value by raising one in __post_init__:

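from dataclasses import dataclass

# Parent is the class from the question (name, age, ugly: bool = False).
_no_default = object()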

@dataclass
class Child(Parent):
    school: str = _no_default
    ugly: bool = True

    def __post_init__(self):
        if self.school is _no_default:
            raise TypeError("__init__ missing 1 required argument: 'school'")

but this does alter the field order; school ends up after ugly:

<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>

and a type hint checker will complain about _no_default not being a string.
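A variation on the same idea (an assumption, not part of the answer above) swaps the sentinel object for a default_factory that raises. The factory only runs when no value is passed, so the check happens inside __init__ itself, and because no non-str object is used as the default, type checkers may complain less. The _required helper is hypothetical; field order is unchanged, so school still comes after ugly:

```python
from dataclasses import dataclass, field
from typing import NoReturn

def _required(name: str):
    """Hypothetical helper: build a default_factory that always fails."""
    def factory() -> NoReturn:
        raise TypeError(f"__init__ missing 1 required argument: '{name}'")
    return factory

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass
class Child(Parent):
    # The factory only runs when no 'school' value is passed to __init__.
    school: str = field(default_factory=_required('school'))
    ugly: bool = True

jack_son = Child('jack jnr', 12, school='harvard')
print(jack_son)  # Child(name='jack jnr', age=12, ugly=True, school='harvard')
```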

You can also use the attrs project, which was the project that inspired dataclasses. It uses a different inheritance merging strategy: it pulls overridden fields in a subclass to the end of the fields list, so ['name', 'age', 'ugly'] in the Parent class becomes ['name', 'age', 'school', 'ugly'] in the Child class. By overriding the field with a default, attrs allows the override without needing an MRO dance.

attrs supports defining fields without type hints, but let's stick to the supported type hinting mode by setting auto_attribs=True:

import attr

@attr.s(auto_attribs=True)
class Parent:
    name: str
    age: int
    ugly: bool = False

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} years old")

@attr.s(auto_attribs=True)
class Child(Parent):
    school: str
    ugly: bool = True
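The difference between the two merging strategies can be modelled in plain Python on field names alone. Both functions below are hypothetical simplifications (they ignore types and defaults and only handle a single base class), not the actual implementation of either library:

```python
def merge_keep_position(base_fields, own_fields):
    """dataclasses-style: an overridden name keeps the slot the base gave it."""
    merged = list(base_fields)
    for name in own_fields:
        if name not in merged:
            merged.append(name)
    return merged

def merge_move_to_end(base_fields, own_fields):
    """attrs-style: the subclass's own fields, overrides included, go last."""
    return [n for n in base_fields if n not in own_fields] + list(own_fields)

parent_fields = ['name', 'age', 'ugly']
child_own = ['school', 'ugly']
print(merge_keep_position(parent_fields, child_own))  # ['name', 'age', 'ugly', 'school']
print(merge_move_to_end(parent_fields, child_own))    # ['name', 'age', 'school', 'ugly']
```

With the second strategy, the defaulted ugly ends up after the required school, which is why the attrs version of Child is valid.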

Solution 2

You can use attributes with defaults in parent classes if you exclude them from the __init__ function. If you need the possibility to override the default at init, extend the code with Praveen Kulkarni's answer.

from dataclasses import dataclass, field

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(default=False, init=False)

@dataclass
class Child(Parent):
    school: str

jack = Parent('jack snr', 32)
jack_son = Child('jack jnr', 12, school='harvard')
jack_son.ugly = True

Or even

@dataclass
class Child(Parent):
    school: str
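    ugly = True
    # A plain (unannotated) class attribute overrides the default without adding a field.
    # Annotating it (ugly: bool = True) does not work: it would turn ugly into an
    # __init__ argument with a default placed before school, raising TypeError.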

jack_son = Child('jack jnr', 12, school='harvard')
assert jack_son.ugly
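A short check of why this works: ugly is still a field, but with init=False it is not an __init__ parameter, and only __init__ parameters take part in the "non-default argument follows default argument" check. This sketch combines both variants above (UglyChild is a hypothetical name for the second one):

```python
from dataclasses import dataclass, field, fields

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(default=False, init=False)  # excluded from __init__

@dataclass
class Child(Parent):
    school: str

@dataclass
class UglyChild(Parent):
    school: str
    ugly = True  # plain class attribute: overrides the default, adds no field

init_params = [f.name for f in fields(Child) if f.init]
print(init_params)  # ['name', 'age', 'school']
print(Child('jack jnr', 12, 'harvard').ugly)      # False
print(UglyChild('jack jnr', 12, 'harvard').ugly)  # True
```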

Solution 3

Note that with Python 3.10, it is now possible to do this natively with dataclasses.

In Python 3.10, dataclasses gained a kw_only parameter (similar to attrs). It lets you mark fields as keyword-only; keyword-only fields are placed after all the other parameters of __init__, so they don't cause the inheritance problem.

Quoting directly from Eric Smith's blog post on the subject, there are two reasons people were asking for this feature:

  • When a dataclass has many fields, specifying them by position can become unreadable. It also requires that for backward compatibility, all new fields are added to the end of the dataclass. This isn't always desirable.
  • When a dataclass inherits from another dataclass, and the base class has fields with default values, then all of the fields in the derived class must also have defaults.

What follows is the simplest way to do it with this new argument, but there are multiple ways to use it for inheritance with default values in the parent class:

from dataclasses import dataclass

@dataclass(kw_only=True)
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass(kw_only=True)
class Child(Parent):
    school: str

ch = Child(name="Kevin", age=17, school="42")
print(ch.ugly)

Take a look at the blog post linked above for a more thorough explanation of kw_only.

Cheers!

PS: As this is fairly new, your IDE might still flag a possible error, but it works at runtime.

Solution 4

The approach below deals with this problem while using pure Python dataclasses and without much boilerplate code.

The ugly_init: dataclasses.InitVar[bool] serves as a pseudo-field that only helps with initialization; it is not kept once the instance is created. Meanwhile, ugly: bool = field(init=False) is an instance member that is not initialized by the __init__ method but can be initialized in the __post_init__ method instead (you can find more here.).

import dataclasses
from dataclasses import dataclass, field

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(init=False)
    ugly_init: dataclasses.InitVar[bool]

    def __post_init__(self, ugly_init: bool):
        self.ugly = ugly_init

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f'The Name is {self.name} and {self.name} is {self.age} years old')

@dataclass
class Child(Parent):
    school: str

jack = Parent('jack snr', 32, ugly_init=True)
jack_son = Child('jack jnr', 12, school='havard', ugly_init=True)

jack.print_id()
jack_son.print_id()
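To confirm that ugly_init really is only an __init__ parameter, here is a minimal single-class sketch (Example is a hypothetical name): fields() does not report the InitVar, and the instance does not store it:

```python
from dataclasses import InitVar, dataclass, field, fields

@dataclass
class Example:
    name: str
    ugly: bool = field(init=False)  # real field, filled in by __post_init__
    ugly_init: InitVar[bool]        # pseudo-field: an __init__ parameter only

    def __post_init__(self, ugly_init: bool) -> None:
        self.ugly = ugly_init

e = Example('jack snr', True)
print([f.name for f in fields(Example)])  # ['name', 'ugly']
print(e.ugly)                  # True
print('ugly_init' in vars(e))  # False
```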

If you want to use a pattern where ugly_init is optional, you can define a class method on the Parent that includes ugly_init as an optional parameter:

from dataclasses import dataclass, field, InitVar

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(init=False)
    ugly_init: InitVar[bool]

    def __post_init__(self, ugly_init: bool):
        self.ugly = ugly_init
    
    @classmethod
    def create(cls, ugly_init=True, **kwargs):
        return cls(ugly_init=ugly_init, **kwargs)

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f'The Name is {self.name} and {self.name} is {self.age} years old')

@dataclass
class Child(Parent):
    school: str

jack = Parent.create(name='jack snr', age=32, ugly_init=False)
jack_son = Child.create(name='jack jnr', age=12, school='harvard')

jack.print_id()
jack_son.print_id()

Now you can use the create class method as a factory method for creating Parent/Child instances with a default value for ugly_init. Note that you must use keyword arguments for this approach to work.

Solution 5

You're seeing this error because an argument without a default value is being added after an argument with a default value. The insertion order of inherited fields into the dataclass is the reverse of the Method Resolution Order, which means that the Parent fields come first, even if they are overwritten later by their children.

An example from PEP 557 (Data Classes):

from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

The final list of fields is, in order, x, y, z. The final type of x is int, as specified in class C.

Unfortunately, I don't think there's any way around this. My understanding is that if the parent class has a default argument, then no child class can have non-default arguments.


Author: Mysterio

A proud African and a wanna-be-geek

Updated on February 21, 2022

Comments

  • Mysterio
    Mysterio about 2 years

    I'm currently trying my hand at the new dataclass constructs introduced in Python 3.7. I am stuck on trying to inherit from a parent class. It looks like the order of the arguments is botched by my current approach, such that the bool parameter in the child class is passed before the other parameters. This is causing a TypeError.

    from dataclasses import dataclass
    
    @dataclass
    class Parent:
        name: str
        age: int
        ugly: bool = False
    
        def print_name(self):
            print(self.name)
    
        def print_age(self):
            print(self.age)
    
        def print_id(self):
            print(f'The Name is {self.name} and {self.name} is {self.age} year old')
    
    @dataclass
    class Child(Parent):
        school: str
        ugly: bool = True
    
    
    jack = Parent('jack snr', 32, ugly=True)
    jack_son = Child('jack jnr', 12, school = 'havard', ugly=True)
    
    jack.print_id()
    jack_son.print_id()
    

    When I run this code I get this TypeError:

    TypeError: non-default argument 'school' follows default argument
    

    How do I fix this?

    • Battery_Al
      Battery_Al over 2 years
      I think it's worth noting that within the attrs / dataclass typed python paradigm, composition is usually preferred over inheritance. Extending your subclass's __init__ like this is vaguely a violation of LSP, because your various subclasses won't be interchangeable. To be clear, I think this way is often practical, but in case you haven't considered using composition: it might also make sense to make a Child dataclass that doesn't inherit, and then have a child attribute on the Parent class.
  • Mysterio
    Mysterio almost 6 years
    I get that the non-default argument must come before the default one, but how can it when the parent arguments are initialised before the child arguments are added?
  • Patrick Haugh
    Patrick Haugh almost 6 years
    I don't think there's any way around it unfortunately. My understanding is that if the parent class has a default argument, then no child class can have non-default arguments.
  • Mysterio
    Mysterio almost 6 years
    Can you add that info to the answer before I mark it? It will help someone one day. That limitation of dataclasses is quite unfortunate; it renders them moot for my current Python project. It's nice to see such implementations, though.
  • Mysterio
    Mysterio over 5 years
    Thanks a lot for the detailed answer
  • Scott P.
    Scott P. about 5 years
    Did you intend to write "class MyDataclass(DataclassWithDefaults, NoDefaultAttributesPostInitMixin)" above in 2)?
  • Ollie
    Ollie about 4 years
    This is very helpful. I'm confused about the MRO though. Running print(Child.mro()) I get: [<class '__main__.Child'>, <class '__main__.Parent'>, <class '__main__._ChildDefaultsBase'>, <class '__main__._ParentDefaultsBase'>, <class '__main__._ChildBase'>, <class '__main__._ParentBase'>, <class 'object'>] So don't the default bases precede the base classes?
  • Martijn Pieters
    Martijn Pieters about 4 years
    @Ollie that’s the correct order; note that I listed it in my answer. When you have multiple base classes you need a way to linearise the classes involved to decide what classes come before others when inheriting. Python uses the C3 linearisation method and my answer takes advantage of how this works to ensure attributes with defaults always come after all attributes without defaults.
  • laike9m
    laike9m almost 4 years
    Actually, attrs can work but you need to use attr.ib(kw_only=True), see github.com/python-attrs/attrs/issues/38
  • Vadym Tyemirov
    Vadym Tyemirov over 3 years
    ugly_init is now a required parameter with no default.
  • Nils Bengtsson
    Nils Bengtsson over 3 years
    I think this answer should be more recognised. It's solved the problem of having a default field in the parent class, thus removes the TypeError.
  • lmiguelvargasf
    lmiguelvargasf about 3 years
    @SimonMarcin, this is a great answer!
  • Daniel Albarral
    Daniel Albarral almost 3 years
    Thx, super nice solution, the only problem that I see is that this is not compatible with mypy, I'm trying to fix it.
  • BinarSkugga
    BinarSkugga over 2 years
    This is the right answer. Unless you support the new and shiny (>= 3.10), this solves the problem! +1
  • boudewijn21
    boudewijn21 over 2 years
    You could add a __post_init__ with the default value: def __post_init__(self): self.ugly = True
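Battery_Al's composition suggestion could look roughly like the sketch below. It is one possible reading of "a Child dataclass that doesn't inherit, and then a child attribute on the Parent class"; the field layout is an assumption. Because nothing is inherited, no required field is ever pushed behind an inherited default:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Child:
    """Child-specific data only; note that it does not inherit from Parent."""
    school: str

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False
    # Composition: optional child details are held, not inherited.
    child: Optional[Child] = None

jack = Parent('jack snr', 32, ugly=True)
jack_son = Parent('jack jnr', 12, ugly=True, child=Child(school='harvard'))
print(jack_son.child.school)  # harvard
```

The trade-off is that a "child" is no longer an instance of a Child subclass, so code that needs the school has to go through the child attribute.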