Is it possible to override __new__ in an enum to parse strings to an instance?


Solution 1

Yes, you can override the __new__() method of an enum subclass to implement a parse method if you're careful, but in order to avoid specifying the integer encoding in two places, you'll need to define the method separately, after the class, so you can reference the symbolic names defined by the enumeration.

Here's what I mean:

import enum

class Types(enum.Enum):
    Unknown = 0
    Source = 1
    NetList = 2

    def __str__(self):
        if (self == Types.Unknown):     return "??"
        elif (self == Types.Source):    return "src"
        elif (self == Types.NetList):   return "nl"
        else:                           raise TypeError(self)

def _Types_parser(cls, value):
    if not isinstance(value, str):
        # forward call to Types' superclass (enum.Enum)
        return super(Types, cls).__new__(cls, value)
    else:
        # map strings to enum values, default to Unknown
        return { 'nl': Types.NetList,
                'ntl': Types.NetList,  # alias
                'src': Types.Source,}.get(value, Types.Unknown)

setattr(Types, '__new__', _Types_parser)


if __name__ == '__main__':

    print("Types('nl') ->",  Types('nl'))   # Types('nl') -> nl
    print("Types('ntl') ->", Types('ntl'))  # Types('ntl') -> nl
    print("Types('wtf') ->", Types('wtf'))  # Types('wtf') -> ??
    print("Types(1) ->",     Types(1))      # Types(1) -> src

Update

Here's a more table-driven version that eliminates some of the repetitious coding that would otherwise be involved:

from collections import OrderedDict
import enum

class Types(enum.Enum):
    Unknown = 0
    Source = 1
    NetList = 2
    __str__ = lambda self: Types._value_to_str.get(self)

# Define after Types class.
Types.__new__ = lambda cls, value: (cls._str_to_value.get(value, Types.Unknown)
                                        if isinstance(value, str) else
                                    super(Types, cls).__new__(cls, value))

# Define look-up table and its inverse.
Types._str_to_value = OrderedDict((( '??', Types.Unknown),
                                   ('src', Types.Source),
                                   ('ntl', Types.NetList),  # alias
                                   ( 'nl', Types.NetList),))
Types._value_to_str = {val: key for key, val in Types._str_to_value.items()}


if __name__ == '__main__':

    print("Types('nl')  ->", Types('nl'))   # Types('nl')  -> nl
    print("Types('ntl') ->", Types('ntl'))  # Types('ntl') -> nl
    print("Types('wtf') ->", Types('wtf'))  # Types('wtf') -> ??
    print("Types(1)     ->", Types(1))      # Types(1)     -> src

    print(list(Types))  # -> [<Types.Unknown: 0>, <Types.Source: 1>, <Types.NetList: 2>]

    import pickle  # Demonstrate picklability
    print(pickle.loads(pickle.dumps(Types.NetList)) == Types.NetList)  # -> True

Note that in Python 3.7+ regular dictionaries are ordered, so the use of OrderedDict in the code above would not be needed and it could be simplified to just:

# Define look-up table and its inverse.
Types._str_to_value = {'??': Types.Unknown,
                       'src': Types.Source,
                       'ntl': Types.NetList,  # alias
                       'nl': Types.NetList}
Types._value_to_str = {val: key for key, val in Types._str_to_value.items()}

Solution 2

The __new__ method on your enum.Enum type is used for creating new instances of the enum values, that is, the Types.Unknown, Types.Source, etc. singleton instances. The enum call (e.g. Types('nl')) is handled by EnumMeta.__call__, which you could subclass.

Name aliases fit your use case

Overriding __call__ is perhaps overkill for this situation. Instead, you can easily use name aliases:

class Types(enum.Enum):
    Unknown = 0

    Source = 1
    src = 1

    NetList = 2
    nl = 2

Here Types.nl is an alias and will return the same object as Types.NetList. You then access members by name (using Types[..] index access), so Types['nl'] works and returns Types.NetList.

Your assertion that it won't be possible to iterate the enum's values alias free is incorrect. Iteration explicitly doesn't include aliases:

Iterating over the members of an enum does not provide the aliases

Aliases are part of the Enum.__members__ ordered dictionary, if you still need access to these.

A demo:

>>> import enum
>>> class Types(enum.Enum):
...     Unknown = 0
...     Source = 1
...     src = 1
...     NetList = 2
...     nl = 2
...     def __str__(self):
...         if self is Types.Unknown: return '??'
...         if self is Types.Source:  return 'src'
...         if self is Types.NetList: return 'nl'
... 
>>> list(Types)
[<Types.Unknown: 0>, <Types.Source: 1>, <Types.NetList: 2>]
>>> list(Types.__members__)
['Unknown', 'Source', 'src', 'NetList', 'nl']
>>> Types.Source
<Types.Source: 1>
>>> str(Types.Source)
'src'
>>> Types.src
<Types.Source: 1>
>>> str(Types.src)
'src'
>>> Types['src']
<Types.Source: 1>
>>> Types.Source is Types.src
True

The only thing missing here is translating unknown schemas to Types.Unknown; I'd use exception handling for that:

try:
    scheme = Types[scheme]
except KeyError:
    scheme = Types.Unknown

Overriding __call__

If you want to treat your strings as values, and use calling instead of item access, this is how you override the __call__ method of the metaclass:

class TypesEnumMeta(enum.EnumMeta):
    def __call__(cls, value, *args, **kw):
        if isinstance(value, str):
            # map strings to enum values, defaults to Unknown
            value = {'nl': 2, 'src': 1}.get(value, 0)
        return super().__call__(value, *args, **kw)

class Types(enum.Enum, metaclass=TypesEnumMeta):
    Unknown = 0
    Source = 1
    NetList = 2

Demo:

>>> class TypesEnumMeta(enum.EnumMeta):
...     def __call__(cls, value, *args, **kw):
...         if isinstance(value, str):
...             value = {'nl': 2, 'src': 1}.get(value, 0)
...         return super().__call__(value, *args, **kw)
... 
>>> class Types(enum.Enum, metaclass=TypesEnumMeta):
...     Unknown = 0
...     Source = 1
...     NetList = 2
... 
>>> Types('nl')
<Types.NetList: 2>
>>> Types('?????')
<Types.Unknown: 0>

Note that we translate the string value to integers here and leave the rest to the original Enum logic.
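The hard-coded dictionary above repeats the integer values from the class body. One way to avoid that, sketched here under the assumption that the class also defines name aliases and a member called Unknown, is to look the string up in __members__ instead. The metaclass name NameLookupEnumMeta is made up for this illustration:

```python
import enum

class NameLookupEnumMeta(enum.EnumMeta):  # hypothetical name
    def __call__(cls, value, *args, **kw):
        # Only intercept plain Types('...') calls with a single string argument.
        if isinstance(value, str) and not args and not kw:
            # __members__ includes aliases; fall back to the assumed Unknown member.
            return cls.__members__.get(value, cls.Unknown)
        return super().__call__(value, *args, **kw)

class Types(enum.Enum, metaclass=NameLookupEnumMeta):
    Unknown = 0
    Source = 1
    src = 1      # name alias for Source
    NetList = 2
    nl = 2       # name alias for NetList

print(Types('nl') is Types.NetList)    # string matched by alias name
print(Types('wtf') is Types.Unknown)   # unmatched string falls back to Unknown
print(Types(1) is Types.Source)        # integers still use the normal value lookup
```

Non-string values are passed straight through to EnumMeta.__call__, so value lookup and pickling keep their normal behaviour.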

Fully supporting value aliases

So, enum.Enum supports name aliases, but you appear to want value aliases. Overriding __call__ can offer a facsimile, but we can do better still by putting the definition of the value aliases into the enum class itself. What if specifying duplicate names gave you value aliases, for example?

You'll have to provide a subclass of the enum._EnumDict too as it is that class that prevents names from being re-used. We'll assume that the first enum value is a default:

class ValueAliasEnumDict(enum._EnumDict):
     def __init__(self):
        super().__init__()
        self._value_aliases = {}

     def __setitem__(self, key, value):
        if key in self:
            # register a value alias
            self._value_aliases[value] = self[key]
        else:
            super().__setitem__(key, value)

class ValueAliasEnumMeta(enum.EnumMeta):
    @classmethod
    def __prepare__(metacls, cls, bases):
        return ValueAliasEnumDict()

    def __new__(metacls, cls, bases, classdict):
        enum_class = super().__new__(metacls, cls, bases, classdict)
        enum_class._value_aliases_ = classdict._value_aliases
        return enum_class

    def __call__(cls, value, *args, **kw):
        if value not in cls._value2member_map_:
            value = cls._value_aliases_.get(value, next(iter(cls)).value)
        return super().__call__(value, *args, **kw)

This then lets you define aliases and a default in the enum class:

class Types(enum.Enum, metaclass=ValueAliasEnumMeta):
    Unknown = 0

    Source = 1
    Source = 'src'

    NetList = 2
    NetList = 'nl'

Demo:

>>> class Types(enum.Enum, metaclass=ValueAliasEnumMeta):
...     Unknown = 0
...     Source = 1
...     Source = 'src'
...     NetList = 2
...     NetList = 'nl'
... 
>>> Types.Source
<Types.Source: 1>
>>> Types('src')
<Types.Source: 1>
>>> Types('?????')
<Types.Unknown: 0>

Solution 3

Is it possible to override __new__ in a python enum to parse strings to an instance?

In a word, yes. As martineau illustrates, you can replace the __new__ method after the class has been created (his original code):

class Types(enum.Enum):
    Unknown = 0
    Source = 1
    NetList = 2
    def __str__(self):
        if (self == Types.Unknown):     return "??"
        elif (self == Types.Source):    return "src"
        elif (self == Types.NetList):   return "nl"
        else:                           raise TypeError(self) # completely unnecessary

def _Types_parser(cls, value):
    if not isinstance(value, str):
        raise TypeError(value)
    else:
        # map strings to enum values, default to Unknown
        return { 'nl': Types.NetList,
                'ntl': Types.NetList,  # alias
                'src': Types.Source,}.get(value, Types.Unknown)

setattr(Types, '__new__', _Types_parser)

and also as his demo code illustrates, if you are not extremely careful you will break other things such as pickling, and even basic member-by-value lookup:

--> print("Types(1) ->", Types(1))  # doesn't work
Traceback (most recent call last):
  ...
TypeError: 1
--> import pickle
--> pickle.loads(pickle.dumps(Types.NetList))
Traceback (most recent call last):
  ...
TypeError: 2

Martijn showed a clever way of enhancing EnumMeta to get what we want:

class TypesEnumMeta(enum.EnumMeta):
    def __call__(cls, value, *args, **kw):
        if isinstance(value, str):
            # map strings to enum values, defaults to Unknown
            value = {'nl': 2, 'src': 1}.get(value, 0)
        return super().__call__(value, *args, **kw)

class Types(enum.Enum, metaclass=TypesEnumMeta):
    ...

but this leaves us with duplicated code and working against the Enum type.

The only thing lacking in basic Enum support for your use-case is the ability to have one member be the default, but even that can be handled gracefully in a normal Enum subclass by creating a new class method.

The class that you want is:

class Types(enum.Enum):
    Unknown = 0
    Source = 1
    src = 1
    NetList = 2
    nl = 2
    def __str__(self):
        if self is Types.Unknown:
            return "??"
        elif self is Types.Source:
            return "src"
        elif self is Types.NetList:
            return "nl"
    @classmethod
    def get(cls, name):
        try:
            return cls[name]
        except KeyError:
            return cls.Unknown

and in action:

--> for obj in Types:
...   print(obj)
... 
??
src
nl

--> Types.get('PoC')
<Types.Unknown: 0>

If you really need value aliases, even that can be handled without resorting to metaclass hacking:

from enum import Enum

class Types(Enum):
    Unknown = 0,
    Source  = 1, 'src'
    NetList = 2, 'nl'
    def __new__(cls, int_value, *value_aliases):
        obj = object.__new__(cls)
        obj._value_ = int_value
        for alias in value_aliases:
            cls._value2member_map_[alias] = obj
        return obj

print(list(Types))
print(Types(1))
print(Types('src'))

which gives us:

[<Types.Unknown: 0>, <Types.Source: 1>, <Types.NetList: 2>]
Types.Source
Types.Source
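The value-alias class above still raises ValueError for a string it does not recognize. On Python 3.6 and later, a possible alternative is the _missing_ hook, which Enum calls when a by-value lookup fails. The following is only a sketch: the alias table and the choice of Unknown as the fallback are assumptions made for illustration.

```python
from enum import Enum

class Types(Enum):
    Unknown = 0
    Source = 1
    NetList = 2

    @classmethod
    def _missing_(cls, value):
        # Called only when `value` is not a regular member value.
        # The alias table below is an assumption for this example.
        aliases = {'src': cls.Source, 'nl': cls.NetList, 'ntl': cls.NetList}
        # Anything unrecognized maps to Unknown instead of raising ValueError.
        return aliases.get(value, cls.Unknown)

print(Types('src') is Types.Source)    # resolved through the alias table
print(Types('wtf') is Types.Unknown)   # unrecognized value falls back
print(Types(2) is Types.NetList)       # normal lookup never reaches _missing_
```

Because __new__ is left alone, iteration, lookup by value, and pickling are unaffected.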

Solution 4

I think by far the easiest solution to your problem is to use the functional API of the Enum class, which gives more freedom when it comes to choosing names since we specify them as strings:

from enum import Enum

Types = Enum(
    value='Types',
    names=[
        ('??', 0),
        ('Unknown', 0),
        ('src', 1),
        ('Source', 1),
        ('nl', 2),
        ('NetList', 2),
    ]
)

This creates an enum with name aliases. Mind the order of the entries in the names list: for each value, the first entry becomes the canonical member (and supplies its name), while later entries with the same value become aliases. Either name can be used:

>>> Types.src
<Types.src: 1>
>>> Types.Source
<Types.src: 1>

To use the name property as a return value for str(Types.src) we replace the default version from Enum:

>>> Types.__str__ = lambda self: self.name
>>> Types.__format__ = lambda self, _: self.name
>>> str(Types.Unknown)
'??'
>>> '{}'.format(Types.Source)
'src'
>>> Types['src']
<Types.src: 1>

Note that we also replace the __format__ method which is called by str.format().
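Item access such as Types['wtf'] raises KeyError for an unknown name. If a default is wanted, a small helper can look the name up in __members__ instead. This is a sketch; the function name parse_type is hypothetical:

```python
from enum import Enum

Types = Enum(
    value='Types',
    names=[
        ('??', 0),
        ('Unknown', 0),
        ('src', 1),
        ('Source', 1),
        ('nl', 2),
        ('NetList', 2),
    ]
)

def parse_type(name):  # hypothetical helper, not part of the enum module
    # __members__ maps every name, aliases included, to its member.
    return Types.__members__.get(name, Types.Unknown)

print(parse_type('nl').name)    # canonical name of the matched member
print(parse_type('wtf').name)   # canonical name of the fallback member
```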

Solution 5

I don't have enough rep to comment on the accepted answer, but in Python 2.7 with the enum34 package the following error occurs at run-time:

"unbound method <lambda>() must be called with instance MyEnum as first argument (got EnumMeta instance instead)"

I was able to correct this by changing:

# define after Types class
Types.__new__ = lambda cls, value: (cls._str_to_value.get(value, Types.Unknown)
                                    if isinstance(value, str) else
                                    super(Types, cls).__new__(cls, value))

to the following, wrapping the lambda in staticmethod():

# define after Types class
Types.__new__ = staticmethod(
    lambda cls, value: (cls._str_to_value.get(value, Types.Unknown)
                        if isinstance(value, str) else
                        super(Types, cls).__new__(cls, value)))

This code tested correctly in both Python 2.7 and 3.6.

Author: Paebbels


Updated on June 07, 2022

Comments

  • Paebbels, almost 2 years ago

    I want to parse strings into python enums. Normally one would implement a parse method to do so. A few days ago I spotted the __new__ method which is capable of returning different instances based on a given parameter.

    Here is my code, which does not work:

    import enum
    class Types(enum.Enum):
      Unknown = 0
      Source = 1
      NetList = 2
    
      def __new__(cls, value):
        if (value == "src"):  return Types.Source
    #    elif (value == "nl"): return Types.NetList
    #    else:                 raise Exception()
    
      def __str__(self):
        if (self == Types.Unknown):     return "??"
        elif (self == Types.Source):    return "src"
        elif (self == Types.NetList):   return "nl"
    

    When I execute my Python script, I get this message:

    [...]
      class Types(enum.Enum):
    File "C:\Program Files\Python\Python 3.4.0\lib\enum.py", line 154, in __new__
      enum_member._value_ = member_type(*args)
    TypeError: object() takes no parameters
    

    How can I return a proper instance of an enum value?

    Edit 1:

    This Enum is used in URI parsing, in particular for parsing the schema. So my URI would look like this

    nl:PoC.common.config
    <schema>:<namespace>[.<subnamespace>*].entity
    

    So after a simple string.split operation I would pass the first part of the URI to the enum creation.

    type = Types(splitList[0])
    

    type should now contain a member of the enum Types, which has 3 possible values (Unknown, Source, NetList).

    If I allowed aliases in the enum's member list, it would no longer be possible to iterate over the enum's values alias-free.