How can dataclasses be made to work better with __slots__?


Solution 1

2021 UPDATE: direct support for __slots__ was added in Python 3.10. I am leaving this answer for posterity and won't be updating it.

The problem is not unique to dataclasses. ANY conflicting class attribute will stomp all over a slot:

>>> class Failure:
...     __slots__ = tuple("xyz")
...     x=1
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

This is simply how slots work. The error happens because __slots__ creates a class-level descriptor object for each slot name:

>>> class Success:
...     __slots__ = tuple("xyz")
...
>>>
>>> type(Success.x)
<class 'member_descriptor'>

In order to prevent this conflicting variable name error, the class namespace must be altered before the class object is instantiated such that there are not two objects competing for the same member name in the class:

  • the specified (default) value*
  • the slot descriptor (created by the slots machinery)

For this reason, an __init_subclass__ method on a parent class will not be sufficient, nor will a class decorator, because in both cases the class object has already been created by the time these functions have received the class to alter it.
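A small check makes the timing concrete. In this sketch (the names `record` and `seen` are made up for illustration), the decorator is never called, because the ValueError is raised while the class object is still being built:

```python
seen = []  # names of classes that actually reached the decorator


def record(cls):
    # A stand-in for any class decorator, including dataclass.
    seen.append(cls.__name__)
    return cls


try:
    @record
    class Broken:
        __slots__ = ("x",)
        x = 1  # conflicts with the slot of the same name
except ValueError as exc:
    message = str(exc)
else:
    message = None

# seen is still empty here: class creation failed before record() could run.
```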

Current option: write a metaclass

Until such time as the slots machinery is altered to allow more flexibility, or the language itself provides an opportunity to alter the class namespace before the class object is instantiated, our only choice is to use a metaclass.

Any metaclass written to solve this problem must, at minimum:

  • remove the conflicting class attributes/members from the namespace
  • instantiate the class object to create the slot descriptors
  • save references to the slot descriptors
  • put the previously removed members and their values back in the class __dict__ (so the dataclass machinery can find them)
  • pass the class object to the dataclass decorator
  • restore the slots descriptors to their respective places
  • also take into account plenty of corner cases (such as what to do if there is a __dict__ slot)
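The steps listed above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the name `SlottedDataclassMeta` is hypothetical, the dataclass decorator is applied with its default options, and none of the corner cases (a `__dict__` slot, private name mangling, inheritance of slots) are handled.

```python
from dataclasses import dataclass


class SlottedDataclassMeta(type):
    """Hypothetical metaclass: slots plus in-body defaults, corner cases ignored."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        slots = namespace.get("__slots__", ())
        slots = (slots,) if isinstance(slots, str) else tuple(slots)

        # 1. Remove the conflicting class attributes from the namespace.
        defaults = {s: namespace.pop(s) for s in slots if s in namespace}

        # 2. Create the class object; this creates the slot descriptors.
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)

        # 3. Save references to the slot descriptors.
        descriptors = {s: cls.__dict__[s] for s in defaults}

        # 4. Put the defaults back so the dataclass machinery can find them.
        for slot_name, value in defaults.items():
            setattr(cls, slot_name, value)

        # 5. Pass the class object to the dataclass decorator.
        cls = dataclass(cls)

        # 6. Restore the slot descriptors to their places.
        for slot_name, descriptor in descriptors.items():
            setattr(cls, slot_name, descriptor)
        return cls


class C(metaclass=SlottedDataclassMeta):
    __slots__ = ("x", "y")
    x: int
    y: int = 2


c = C(1)  # y falls back to its default
```

The generated `__init__` keeps its own reference to each default value, which is why the class attributes can be swapped back to descriptors after the decorator has run.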

To say the least, this is an extremely complicated endeavor. It would be easier to define the class without a default value, so that the conflict doesn't occur at all, and then add the default value afterward.

Current option: make alterations after class object instantiation

The unaltered dataclass would look like this:

from dataclasses import dataclass

@dataclass
class C:
    __slots__ = "x"
    x: int

The alteration is straightforward. Change the __init__ signature to reflect the desired default value, and then change the __dataclass_fields__ to reflect the presence of a default value.

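from functools import wraps

def change_init_signature(init):
    # Wrap the generated __init__ so that x becomes optional.
    @wraps(init)
    def __init__(self, x=1):
        init(self, x)
    return __init__

C.__init__ = change_init_signature(C.__init__)

# Keep the field metadata consistent with the new signature.
C.__dataclass_fields__["x"].default = 1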

Test:

>>> C()
C(x=1)
>>> C(2)
C(x=2)
>>> C.x
<member 'x' of 'C' objects>
>>> vars(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: vars() argument must have __dict__ attribute

It works!

Current option: a setmember decorator

With some effort, a so-called setmember decorator could be employed to automatically alter the class in the manner above. This would require deviating from the dataclasses API in order to define the default value in a location other than inside the class body, perhaps something like:

@setmember(x=field(default=1))
@dataclass
class C:
    __slots__="x"
    x: int
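One way such a decorator might be written is sketched below. Nothing here is part of the dataclasses API: `setmember` is the hypothetical name used above, and the sketch assumes plain default values only (a `default_factory` is rejected) and does not check field ordering.

```python
from dataclasses import MISSING, Field, dataclass, field, fields
from functools import wraps


def setmember(**members):
    """Hypothetical decorator: add defaults to an already-built slotted dataclass."""

    def decorator(cls):
        defaults = {}
        for name, value in members.items():
            # Accept either field(default=...) or a plain value.
            default = value.default if isinstance(value, Field) else value
            if default is MISSING:
                raise TypeError(f"{name!r} needs a plain default value")
            # Keep the field metadata consistent with the new behavior.
            cls.__dataclass_fields__[name].default = default
            defaults[name] = default

        original_init = cls.__init__
        init_names = [f.name for f in fields(cls) if f.init]

        @wraps(original_init)
        def __init__(self, *args, **kwargs):
            # Names the caller already supplied, positionally or by keyword.
            supplied = set(init_names[: len(args)]) | kwargs.keys()
            missing = {n: d for n, d in defaults.items() if n not in supplied}
            original_init(self, *args, **kwargs, **missing)

        cls.__init__ = __init__
        return cls

    return decorator


@setmember(x=field(default=1))
@dataclass
class C:
    __slots__ = ("x",)
    x: int
```

Because the default never appears in the class body, the slot descriptor for x is left untouched.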

The same thing could also be accomplished through an __init_subclass__ method on a parent class:

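from dataclasses import field

class SlottedDataclass:
    __slots__ = ()  # keep subclasses free of an instance __dict__

    def __init_subclass__(cls, **kwargs):
        # kwargs holds the per-field defaults, e.g. x=field(default=1)
        super().__init_subclass__()
        # make the class changes here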

class C(SlottedDataclass, x=field(default=1)):
    __slots__ = "x"
    x: int

Future possibility: change the slots machinery

Another possibility, as mentioned above, would be for the Python language to alter the slots machinery to allow more flexibility. One way of doing this might be to change the slot descriptor itself to store class-level data at the time of class definition.

This could be done, perhaps, by supplying a dict as the __slots__ argument (see below). The class-level data (1 for x, 2 for y) could just be stored on the descriptor itself for retrieval later:

class C:
    __slots__ = {"x": 1, "y": 2}

assert C.x.value == 1
assert C.y.value == 2
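For comparison, this is what a dict-valued __slots__ does today. The keys become the slot names and the slots machinery itself ignores the values (the documentation describes them as per-attribute docstrings that inspect.getdoc() and help() can pick up), so the `.value` attribute proposed above does not exist. A minimal check:

```python
class C:
    # Keys become slot names; the values are not stored on the descriptors.
    __slots__ = {"x": "horizontal position", "y": "vertical position"}


c = C()
c.x = 1  # ordinary slot assignment still works

descriptor_type = type(C.x).__name__
has_value = hasattr(C.x, "value")  # the proposed attribute is not there
```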

One difficulty: it may be desired to only have a slot_member.value present on some slots and not others. This could be accommodated by importing a null-slot factory from a new slottools library:

from slottools import nullslot

class C:
    __slots__ = {"x": 1, "y": 2, "z": nullslot()}

assert not hasattr(C.z, "value")

The style of code suggested above would be a deviation from the dataclasses API. However, the slots machinery itself could even be altered to allow for this style of code, with accommodation of the dataclasses API specifically in mind:

class C:
    __slots__ = "x", "y", "z"
    x = 1  # 1 is stored on C.x.value
    y = 2  # 2 is stored on C.y.value

assert C.x.value == 1
assert C.y.value == 2
assert not hasattr(C.z, "value")

Future possibility: "prepare" the class namespace inside the class body

The other possibility is altering/preparing the class namespace, analogous to what the __prepare__ method of a metaclass does.

Currently, there is no opportunity (other than writing a metaclass) to write code that alters the class namespace before the class object is instantiated, and the slots machinery goes to work. This could be changed by creating a hook for preparing the class namespace beforehand, and making it so that an error complaining about the conflicting names is only produced after that hook has been run.

This so-called __prepare_slots__ hook could look something like this, which I think is not too bad:

from dataclasses import dataclass, field, prepare_slots  # prepare_slots is hypothetical

@dataclass
class C:
    __slots__ = ('x',)
    __prepare_slots__ = prepare_slots
    x: int = field(default=1)

The dataclasses.prepare_slots function would simply be a function-- similar to the __prepare__ method-- that receives the class namespace and alters it before the class is created. For this case in particular, the default dataclass field values would be stored in some other convenient place so that they can be retrieved after the slot descriptor objects have been created.
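The __prepare_slots__ hook does not exist, but the existing __prepare__ method of a metaclass can approximate the idea today. The sketch below is an assumption-laden illustration, not a proposal for the real API: `PrepareSlotsMeta`, `_SlotAwareNamespace` and `_slot_defaults` are made-up names, and it only works when __slots__ is assigned before the defaults in the class body.

```python
class _SlotAwareNamespace(dict):
    """Class-body namespace that diverts assignments to slot names."""

    def __init__(self):
        super().__init__()
        self.slot_defaults = {}

    def __setitem__(self, key, value):
        slots = self.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        if key in slots:
            # Store the default elsewhere so it never collides with the slot.
            self.slot_defaults[key] = value
        else:
            super().__setitem__(key, value)


class PrepareSlotsMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Called before the class body runs; returns the namespace to use.
        return _SlotAwareNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwargs)
        # The slot descriptors exist now; keep the defaults for later use.
        cls._slot_defaults = dict(getattr(namespace, "slot_defaults", {}))
        return cls


class C(metaclass=PrepareSlotsMeta):
    __slots__ = ("x", "y")
    x: int = 1
    y: int
```

A decorator applied afterward could read `C._slot_defaults` to build the `__init__` defaults without disturbing the descriptors.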


* Note that the default field value conflicting with the slot might also be created by the dataclass machinery if dataclasses.field is being used.

Solution 2

As noted already in the answers, data classes from dataclasses cannot generate slots for the simple reason that slots must be defined before a class is created.

In fact, the PEP for data classes explicitly mentions this:

At least for the initial release, __slots__ will not be supported. __slots__ needs to be added at class creation time. The Data Class decorator is called after the class is created, so in order to add __slots__ the decorator would have to create a new class, set __slots__, and return it. Because this behavior is somewhat surprising, the initial version of Data Classes will not support automatically setting __slots__.

I wanted to use slots because I needed to initialise many, many data class instances in another project. I ended up writing my own alternative implementation of data classes which supports this, among a few extra features: dataclassy.

dataclassy uses a metaclass approach which has numerous advantages: it enables decorator inheritance, considerably reduces code complexity and, of course, allows the generation of slots. With dataclassy the following is possible:

from dataclassy import dataclass

@dataclass(slots=True)
class Pet:
    name: str
    age: int
    species: str
    fluffy: bool = True

Printing Pet.__slots__ outputs the expected {'name', 'age', 'species', 'fluffy'}, instances have no __dict__ attribute and the overall memory footprint of the object is therefore lower. These observations indicate that __slots__ has been successfully generated and is effective. Plus, as evidenced, default values work just fine.

Solution 3

In Python 3.10+ you can use slots=True with a dataclass to make it more memory-efficient:

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Point:
    x: int = 0
    y: int = 0

This way you can set default field values as well.

Solution 4

The least involved solution I've found for this problem is to specify a custom __init__ using object.__setattr__ to assign values.

from dataclasses import dataclass
from typing import Optional

@dataclass(init=False, frozen=True)
class MyDataClass(object):
    __slots__ = (
        "required",
        "defaulted",
    )
    required: object
    defaulted: Optional[object]

    def __init__(
        self,
        required: object,
        defaulted: Optional[object] = None,
    ) -> None:
        super().__init__()
        object.__setattr__(self, "required", required)
        object.__setattr__(self, "defaulted", defaulted)

Solution 5

Another solution is to generate the __slots__ attribute inside the class body, from the type annotations. This can look like:

@dataclass
class Client:
    first: str
    last: str
    age_of_signup: int

    __slots__ = slots(__annotations__)

where the slots function is:

from typing import Dict, FrozenSet

def slots(anotes: Dict[str, object]) -> FrozenSet[str]:
    return frozenset(anotes.keys())

Running that would generate a __slots__ value that looks like: frozenset({'first', 'last', 'age_of_signup'})

This takes the annotations above it and makes a set of the declared names. The limitations are that you must repeat the __slots__ = slots(__annotations__) line in every class, that it must be positioned below all the annotations, and that it does not work for annotations with default values. The advantage is that the slots will never get out of sync with the annotations, so you can add or remove members without maintaining separate lists.


Author: Rick (Civil Engineer)

Updated on March 18, 2022

Comments

  • Rick
    Rick about 2 years

    It was decided to remove direct support for __slots__ from dataclasses for Python 3.7.

    Despite this, __slots__ can still be used with dataclasses:

    from dataclasses import dataclass
    
    @dataclass
    class C():
        __slots__ = "x"
        x: int
    

    However, because of the way __slots__ works it isn't possible to assign a default value to a dataclass field:

    from dataclasses import dataclass
    
    @dataclass
    class C():
        __slots__ = "x"
        x: int = 1
    

    This results in an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: 'x' in __slots__ conflicts with class variable
    

    How can __slots__ and default dataclass fields be made to work together?

  • user2357112
    user2357112 about 6 years
    When you define both a slot and a default for an attribute, you're not actually using the slot; you remove the generated slot descriptor to replace it with the default value, so instances have space allocated for a slot that cannot be accessed.
  • Rick
    Rick about 6 years
    @user2357112 I don't think that's right- the slot still can be accessed, it just can't be changed after the descriptor is removed. It is read-only at that point: C.slot=1, print(C().slot) works fine. Even C.slot, then C.slot=1, print(C().slot) works.
  • user2357112
    user2357112 about 6 years
    That's not accessing the slot. That's accessing the class variable you replaced the slot descriptor with.
  • Rick
    Rick about 6 years
    @user2357112 I see. You're right. But in actual usage it certainly appears as if the slot is being used- but you are correct, answer needs editing.
  • Rick
    Rick about 6 years
    @user2357112 Interesting: even if the slot is truly gone, you are still prevented adding the slot to the instance: del C.slot, C.slot=1, C().slot=2 <- ERROR -- there's not a lot of practical difference between this result and my incorrect interpretation of what was happening
  • user2357112
    user2357112 about 6 years
    I believe that currently, to get both the default value and the slot to work correctly, your metaclass would have to perform a very delicate shuffle: remove the defaults, call super().__new__, replace the defaults (while saving the slot descriptors), call dataclass, and finally replace the slot descriptors. This seems very fragile and bug-prone.
  • Rick
    Rick about 6 years
    @user2357112 perhaps. the dataclasses module seems to be taking care of the last stop on its own just fine.
  • Rick
    Rick about 6 years
    @user2357112 not that it is replacing the descriptors, but it doesn't need to; dataclasses works with descriptors already. nothing special is needed.
  • user2357112
    user2357112 about 6 years
    That's not right. With your C1, it uses the __dict__ instead of the slots, because it can't find the slots (except for x). With C2, there is no __dict__, so it can't handle instance attribute assignment at all.
  • Rick
    Rick about 6 years
    @user2357112 I see- yup, z should not be in instance dict in C1. so as you said: the dataclass defaults need to be replaced with the slots descriptors. hmmm.
  • Rick
    Rick about 6 years
    @user2357112 Updated with a much simpler approach.
  • Anonymouse
    Anonymouse over 4 years
    I've followed your suggestion and created slotted_dataclass descriptor, but since the code would be too long for a comment, I did it in a new answer — I hope that's OK with you.
  • Rick
    Rick over 4 years
    the real bummer here of course is it deviates from the dataclasses API, but the very nice API is a big part of the reason why i like dataclasses so much in the first place.
  • Anonymouse
    Anonymouse over 4 years
    Why is it wrong to "deviate from the dataclasses API" in this case, in your opinion?
  • Rick
    Rick over 4 years
    It's not wrong necessarily, it's just that the API is very nice to use. Also, here's an add_slots utility function for people who want to do this: add_slots
  • Rick
    Rick over 4 years
    Such a bummer that the easiest method requires writing the init yourself, thus defeating one of the biggest wins dataclasses brings to the table!
  • mcguip
    mcguip over 4 years
    Yes, I'd definitely agree. Beyond that, you're specifying the attribute names as strings, which are easily missed by IDEs. All the existing solutions I've seen seem ugly/error-prone in some way...
  • Rick
    Rick almost 4 years
    Huh. Who knew that putting slots after the class members would make such a difference. I'll have to try this later.
  • Rick
    Rick almost 4 years
    However it's still broken if you assign anything to the slot using the dataclasses API.
  • WieeRd
    WieeRd almost 3 years
    But why convert it to frozenset instead of just using __annotations__.keys()?
  • Jules G.M.
    Jules G.M. about 2 years
    Note that, as others have mentioned, this is not true anymore