How can I memoize a class instantiation in Python?
Solution 1
Let us see two points about your question.
Using memoize
You can use memoization, but you should decorate the class, not the __init__ method. Suppose we have this memoizer:
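def get_id_tuple(f, args, kwargs, mark=object()):
    """
    Quick-and-dirty way to generate a unique key for a specific call.
    """
    l = [id(f)]
    for arg in args:
        l.append(id(arg))
    # Sentinel separating positional arguments from keyword arguments.
    l.append(id(mark))
    # kwargs is a dict, so iterate over its items to get (name, value) pairs.
    for k, v in kwargs.items():
        l.append(k)
        l.append(id(v))
    return tuple(l)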
_memoized = {}

def memoize(f):
    """
    Some basic memoizer
    """
    def memoized(*args, **kwargs):
        key = get_id_tuple(f, args, kwargs)
        if key not in _memoized:
            _memoized[key] = f(*args, **kwargs)
        return _memoized[key]
    return memoized
Now you just need to decorate the class:
@memoize
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue
Let us test it:
tests = [Test(1), Test(2), Test(3), Test(2), Test(4)]
for test in tests:
    print(test.somevalue, id(test))
The output is below. Note that the same parameters yield the same id of the returned object:
1 3072319660
2 3072319692
3 3072319724
2 3072319692
4 3072319756
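As the comments on this answer point out, keying on id() is fragile: two equal strings (or two equal large integers) can be distinct objects with different ids. Here is a minimal sketch that keys the cache on the argument values instead. It assumes every argument is hashable, uses Python 3 syntax, and the name memoize_by_value is made up for this illustration:

```python
def memoize_by_value(f):
    """Hypothetical variant of memoize that keys on argument values, not ids."""
    cache = {}  # one cache per decorated callable

    def memoized(*args, **kwargs):
        # Sort the keyword arguments so their order does not change the key.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = f(*args, **kwargs)
        return cache[key]
    return memoized


@memoize_by_value
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue


# Two equal strings built separately still map to the same instance.
first = Test("photo.jpg")
second = Test("".join(["photo", ".jpg"]))
print(first is second)
```

Unhashable arguments such as lists will raise a TypeError when the key is looked up, so this only suits calls whose arguments can serve as dict keys.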
Anyway, I would prefer to create a function that generates the objects and memoize that function instead. It seems cleaner to me, though that may just be a pet peeve of mine:
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue

@memoize
def get_test_from_value(somevalue):
    return Test(somevalue)
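If you are on Python 3.2 or later, the standard library's functools.lru_cache can play the role of the memoizer for such a factory function. This is only a sketch of that substitution, not the memoizer defined above; it assumes the argument is hashable, and the cache is keyed on the argument's value:

```python
import functools


class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue


# maxsize=None makes the cache unbounded, so instances are never evicted.
@functools.lru_cache(maxsize=None)
def get_test_from_value(somevalue):
    return Test(somevalue)


a = get_test_from_value(2)
b = get_test_from_value(2)
print(a is b)
```

One side benefit of memoizing the factory rather than the class is that Test remains an ordinary class, so isinstance checks and subclassing keep working.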
Using __new__
Or, of course, you can override __new__. Some days ago I posted an answer about the ins, outs and best practices of overriding __new__ that can be helpful. Basically, it says to always pass *args, **kwargs to your __new__ method.
I, for one, would prefer to memoize a function that creates the objects, or even write a specific function that takes care of never recreating an object for the same parameters. Of course, this is mostly an opinion of mine, not a rule.
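For completeness, here is a rough sketch of what the __new__ route could look like. It is an illustration under assumptions, not code from the linked answer: the _instances dict is a hypothetical cache, the arguments are assumed to be hashable, and the syntax is Python 3:

```python
class Test(object):
    _instances = {}  # hypothetical cache, keyed on class and argument values

    def __new__(cls, *args, **kwargs):
        key = (cls, args, tuple(sorted(kwargs.items())))
        if key not in cls._instances:
            # object.__new__ must only receive the class here.
            cls._instances[key] = super(Test, cls).__new__(cls)
        return cls._instances[key]

    def __init__(self, somevalue):
        # Careful: __init__ runs on every call, even when __new__
        # returned an instance that already existed.
        self.somevalue = somevalue


print(Test(2) is Test(2))
print(Test(1) is Test(2))
```

Because __init__ is invoked again on each call, any state it sets is reset every time a cached instance is returned; that is one reason the factory function can be the simpler option.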
Solution 2
The solution that I ended up using is this:
class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)

        # This bit allows staticmethods to work as you would expect.
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]
And then you decorate the class with this, not __init__. Although brandizzi provided me with that key piece of information, his example decorator didn't function as desired.
I found this concept quite subtle, but basically when you're using decorators in Python, you need to understand that the thing that gets decorated (whether it's a method or a class) is actually replaced by whatever the decorator returns. So for example when I'd try to access Photograph.instances or Camera.generate_id() (a staticmethod), I couldn't actually access them because Photograph doesn't actually refer to the original Photograph class, it refers to the memoized function (from brandizzi's example).
To get around this, I had to create a decorator class that actually took all the attributes and static methods from the decorated class and exposed them as its own. Almost like a subclass, except that the decorator class doesn't know ahead of time what classes it will be decorating, so it has to copy the attributes over after the fact.
The end result is that any instance of the memoize class becomes an almost transparent wrapper around the actual class that it has decorated, with the exception that attempting to instantiate it (but really calling it) will provide you with cached copies when they're available.
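To show how the wrapper behaves in practice, here is a small usage sketch. The memoize class is repeated so the snippet runs on its own; the Photograph class with its count() staticmethod is a made-up stand-in for the real classes, and the decorated class is assumed to define its own instances dict:

```python
class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)
        # Expose staticmethods as plain functions on the wrapper.
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]


@memoize
class Photograph(object):  # hypothetical example class
    instances = {}  # the decorator expects this attribute to exist

    def __init__(self, filename):
        self.filename = filename

    @staticmethod
    def count():
        return len(Photograph.instances)


a = Photograph('a.jpg')
b = Photograph('a.jpg')
c = Photograph('b.jpg')
print(a is b, a is c, Photograph.count())
# The name Photograph now refers to the wrapper; the real class is Photograph.cls.
print(isinstance(a, Photograph.cls))
```

Note that the name Photograph is bound to a memoize instance rather than to a class, so type checks have to go through Photograph.cls, as in the last line.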
Solution 3
The parameters to __new__ also get passed to __init__, so:
def __init__(self, flubid):
    ...
You need to accept the flubid argument there, even if you don't use it in __init__.
Here is the relevant comment taken from typeobject.c in Python 2.7.3:
/* You may wonder why object.__new__() only complains about arguments
when object.__init__() is not overridden, and vice versa.
Consider the use cases:
1. When neither is overridden, we want to hear complaints about
excess (i.e., any) arguments, since their presence could
indicate there's a bug.
2. When defining an Immutable type, we are likely to override only
__new__(), since __init__() is called too late to initialize an
Immutable object. Since __new__() defines the signature for the
type, it would be a pain to have to override __init__() just to
stop it from complaining about excess arguments.
3. When defining a Mutable type, we are likely to override only
__init__(). So here the converse reasoning applies: we don't
want to have to override __new__() just to stop it from
complaining.
4. When __init__() is overridden, and the subclass __init__() calls
object.__init__(), the latter should complain about excess
arguments; ditto for __new__().
Use cases 2 and 3 make it unattractive to unconditionally check for
excess arguments. The best solution that addresses all four use
cases is as follows: __init__() complains about excess arguments
unless __new__() is overridden and __init__() is not overridden
(IOW, if __init__() is overridden or __new__() is not overridden);
symmetrically, __new__() complains about excess arguments unless
__init__() is overridden and __new__() is not overridden
(IOW, if __new__() is overridden or __init__() is not overridden).
However, for backwards compatibility, this breaks too much code.
Therefore, in 2.6, we'll *warn* about excess arguments when both
methods are overridden; for all other cases we'll use the above
rules.
*/
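To make those rules concrete, here is a small sketch in Python 3 syntax. The class names OnlyNew and Broken are invented for the illustration. Overriding only __new__ works with an extra argument, while overriding both methods requires __init__ to accept the same arguments:

```python
class OnlyNew(object):
    # Only __new__ is overridden, so the inherited object.__init__
    # does not complain about the extra argument (use case 2 above).
    def __new__(cls, flubid):
        self = super(OnlyNew, cls).__new__(cls)
        self.flubid = flubid
        return self


class Broken(object):
    def __new__(cls, flubid):
        return super(Broken, cls).__new__(cls)

    # This __init__ forgets to accept flubid, yet it still receives it.
    def __init__(self):
        pass


ok = OnlyNew('foo')

raised = False
try:
    Broken('foo')
except TypeError:
    raised = True

print(ok.flubid, raised)
```

This is also why a class that defines __new__ but no __init__ at all can be called with arguments without any error.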
Comments
-
robru over 1 year
Ok, here is the real world scenario: I'm writing an application, and I have a class that represents a certain type of files (in my case this is photographs but that detail is irrelevant to the problem). Each instance of the Photograph class should be unique to the photo's filename.
The problem is, when a user tells my application to load a file, I need to be able to identify when files are already loaded, and use the existing instance for that filename, rather than create duplicate instances on the same filename.
To me this seems like a good situation to use memoization, and there's a lot of examples of that out there, but in this case I'm not just memoizing an ordinary function, I need to be memoizing __init__(). This poses a problem, because by the time __init__() gets called it's already too late as there's a new instance created already.
In my research I found Python's __new__() method, and I was actually able to write a working trivial example, but it fell apart when I tried to use it on my real-world objects, and I'm not sure why (the only thing I can think of is that my real world objects were subclasses of other objects that I can't really control, and so there were some incompatibilities with this approach). This is what I had:
class Flub(object):
    instances = {}

    def __new__(cls, flubid):
        try:
            self = Flub.instances[flubid]
        except KeyError:
            self = Flub.instances[flubid] = super(Flub, cls).__new__(cls)
            print('making a new one!')
        self.flubid = flubid
        print(id(self))
        return self

    @staticmethod
    def destroy_all():
        for flub in Flub.instances.values():
            print('killing', flub)

a = Flub('foo')
b = Flub('foo')
c = Flub('bar')
print(a)
print(b)
print(c)
print(a is b, b is c)
Flub.destroy_all()
Which output this:
making a new one!
139958663753808
139958663753808
making a new one!
139958663753872
<__main__.Flub object at 0x7f4aaa6fb050>
<__main__.Flub object at 0x7f4aaa6fb050>
<__main__.Flub object at 0x7f4aaa6fb090>
True False
killing <__main__.Flub object at 0x7f4aaa6fb050>
killing <__main__.Flub object at 0x7f4aaa6fb090>
It's perfect! Only two instances were made for the two unique id's given, and Flub.instances clearly only has two listed.
But when I tried to take this approach with the objects I was using, I got all kinds of nonsensical errors about how __init__() took only 0 arguments, not 2. So I'd change some things around and then it would tell me that __init__() needed an argument. Totally bizarre.
After a while of fighting with it, I basically just gave up and moved all the __new__() black magic into a staticmethod called get, such that I could call Photograph.get(filename) and it would only call Photograph(filename) if filename wasn't already in Photograph.instances.
Does anybody know where I went wrong here? Is there some better way to do this?
Another way of thinking about it is that it's similar to a singleton, except it's not globally singleton, just singleton-per-filename.
Here's my real-world code using the staticmethod get if you want to see it all together.
-
robru almost 12 years
I have edited the question to remove those things you said.
-
robru almost 12 years
What you say makes sense, but how does my trivial example work without defining __init__ at all? Shouldn't it also give me errors about incorrect number of arguments passed?
-
John La Rooy almost 12 years
@Robru, I updated my answer with the explanation given in typeobject.c
-
robru almost 12 years
Thanks. I didn't realize that you could put the decorator directly on the class instead of on the methods. That was the key bit of info that I was missing. Your memoize decorator isn't quite what I need because strings aren't singletons like numbers are (and therefore ids aren't unique from one identical string to another), but for my simplified needs I was able to just use the first argument directly as the key.
-
brandizzi almost 12 years
@Robru surely my memoize is just some quick code I use in examples, do not pay much attention to it :)
-
robru almost 12 years
Of course, after an hour of finessing your memoize decorator to work with my particular configuration of classes, it occurs to me that this solution is not actually going to work because I have a number of methods and functions that iterate over the ClassName.instances dict in order to do operations on all loaded instances, and this particular memoization technique jumbles all the different instances of different classes into a single dict. It looks like I am going to have to go with __new__ after all.
-
robru almost 12 years
After a few more hours of fiddling, I gave up on __new__ and went back to the decorator. I got it to work exactly as I wanted, including functional staticmethods! (decorators break staticmethods by default because the original class hides behind the decorator object). Solution here: github.com/robru/gottengeography/blob/…
-
scooterman almost 11 years
Beware that when an input parameter is a string or unicode, id('string') is not guaranteed to be the same for equal strings. You should use its hash instead.
-
CrazyCasta over 8 years
@Robru Not all equal ints are guaranteed to have equal id's either. I don't recall the specifics, but if you check out 10**10 vs 100**5, you'll find they're equal but don't share the same id. iirc there's some maximum number beyond which python stops retrieving the existing object for ints.
-
CrazyCasta over 8 years
@brandizzi You're not being paranoid about not decorating classes. For one thing, unless I'm missing something, you can't extend the decorated class.
-
CrazyCasta over 8 years
@Robru P.S. According to PyInt_FromLong docs.python.org/2/c-api/int.html#c.PyInt_FromLong the only values that keep the property id(a)==id(b) if a==b are from -5 to 256. I've tested 257, and it does indeed have different ids if you instantiate it multiple times.
-
Carl Meyer over 7 years
Another limitation of the decorate-the-class approach is that your "class" object is no longer actually the class, it's a wrapper function returned by your decorator. For normal use (calling it to instantiate an object) that's fine, but if you try to use it with e.g. isinstance or issubclass or any kind of introspection, it will have unexpected results.
-
Lars Ericson almost 6 years
When you say "Anyway, I would prefer to create a function to generate the objects and memoize it.", isn't that function get_test_from_value exactly a factory? So we do like factories?
-
MarcTheSpark over 5 years
This was very helpful to me. I'll just add that my use case involved classmethods as well and therefore required adding these lines after the staticmethod check:
if type(val) is classmethod:
    self.__dict__[attr] = functools.partial(val.__func__, cls)
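Putting that last suggestion together with the decorator from Solution 2 might look like the sketch below. The Camera class and its describe() classmethod are invented for the illustration, and the decorated class is again assumed to provide an instances dict:

```python
import functools


class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__
            # Bind classmethods to the original class, as suggested above.
            if type(val) is classmethod:
                self.__dict__[attr] = functools.partial(val.__func__, cls)

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]


@memoize
class Camera(object):  # hypothetical example class
    instances = {}

    def __init__(self, name):
        self.name = name

    @classmethod
    def describe(cls):
        return cls.__name__


print(Camera.describe())
print(Camera('x100') is Camera('x100'))
```

Here functools.partial pre-fills the cls argument with the original class, so calling the classmethod through the wrapper behaves as it would on the undecorated class.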