Descriptors

About Me

Hi, I’m Simeon Franklin.

Technical instructor at Twitter teaching Python + other stuff to the #flock.

Find me @simeonfranklin or http://simeonfranklin.com/

What are descriptors and why do I care?

Great! Show me some descriptor magic!

images/wand.gif

Classes, objects, and attributes

Everybody already knows about classes and objects, right?

>>> class Circle(object):
...     PI = 3.14
...     def __init__(self, radius):
...         self.radius = radius
...
>>> mycircle = Circle(2)
>>> mycircle.radius
2
>>> mycircle.PI
3.14

How about object attributes vs class attributes?

Object Attribute Access

When you access an attribute of an object like mycircle.radius you are actually getting back a value stored in a dict on the object.

>>> mycircle.__dict__
{'radius': 2}

Class attribute access

But of course you can fall back to class level attributes which are stored in a dict on the class.

>>> Circle.PI
3.14
>>> Circle.__dict__
dict_proxy({...'PI': 3.14...})
>>> mycircle.PI
3.14

Well, a dict-like thing at least. dict_proxy is used by Python where it needs a dict but doesn’t want to allow modifications. You can use this yourself in Python 3.3 with types.MappingProxyType.

Just Three Simple Rules!

We can build some rules to model our understanding so far.

Accessing an attribute on an object like obj.foo gets you:

  1. the corresponding value in obj.__dict__ if it exists

  2. or else it falls back to look in the type(obj).__dict__

  3. And assignment always creates an entry in obj.__dict__.
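
A quick sketch of the three rules in action, reusing a throwaway Circle like the one above (the variable names are just for illustration, and the code is written so it also runs on Python 3):

```python
class Circle(object):
    PI = 3.14

    def __init__(self, radius):
        self.radius = radius


c = Circle(2)

# Rule 1: 'radius' lives in the instance dict.
in_instance = "radius" in c.__dict__        # True

# Rule 2: 'PI' is not on the instance, so lookup falls back to the class dict.
pi_on_instance = "PI" in c.__dict__         # False
pi_on_class = "PI" in type(c).__dict__      # True

# Rule 3: assignment creates an instance entry and leaves the class alone.
c.PI = 3.14159
shadowed = (c.__dict__["PI"], Circle.PI)    # (3.14159, 3.14)
```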

Plus inheritance

Adding inheritance to the mix just means paying attention to the mro.

>>> class Widget(object):
...     copyright = "Witrett, inc."
...
>>> class Circle(Widget):
...     PI = 3.14
...     def __init__(self, radius):
...         self.radius = radius
...
>>> mycircle = Circle(2)
>>> type(mycircle).mro()
[<class '__main__.Circle'>, <class '__main__.Widget'>, <type 'object'>]
>>> mycircle.copyright
'Witrett, inc.'

Got it?

Three, er, Four Simple Rules

Let’s update our rules:

Accessing an attribute on an object like obj.foo gets you:

  1. the corresponding value in obj.__dict__ if it exists

  2. or else it falls back to look in the type(obj).__dict__ on the class

  3. repeating for each type in the mro until it finds a match

  4. And assignment always creates an entry in obj.__dict__.
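
The read half of these rules can be sketched as a small function. simple_lookup is a hypothetical helper, not anything Python provides, and it deliberately ignores descriptors, which come later:

```python
class Widget(object):
    copyright = "Witrett, inc."


class Circle(Widget):
    PI = 3.14

    def __init__(self, radius):
        self.radius = radius


def simple_lookup(obj, name):
    """Hypothetical model of rules 1-3; real lookup also handles descriptors."""
    # Rule 1: the instance dict wins.
    if name in obj.__dict__:
        return obj.__dict__[name]
    # Rules 2 and 3: walk each class in the mro and check its dict.
    for klass in type(obj).mro():
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)
```

With this, simple_lookup(Circle(2), "copyright") finds the value on Widget, one step up the mro.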

One more thing

then we’ll get to descriptors

Sometimes attributes aren’t enough.

>>> class Circle(Widget):
...     PI = 3.14
...     def __init__(self, radius):
...         self.radius = radius
...         self.circumference = 2 * radius * self.PI
...
>>> mycircle = Circle(2)
>>> mycircle.radius = 3
>>> mycircle.circumference # Whoops!
12.56

Classic OOP mistake - now I’ve got a broken class!

Yeah I’m stealing from Raymond Hettinger

See his PyCon 2013 talk @ http://pyvideo.org/video/1779/pythons-class-development-toolkit

Steal from the best, right?

@property to the rescue!

Everybody knows how to fix this:

>>> class Circle(Widget):
...     PI = 3.14
...     def __init__(self, radius):
...         self.radius = radius
...     @property
...     def circumference(self):
...         return 2 * self.radius * self.PI
...
>>> mycircle = Circle(2)
>>> mycircle.radius = 3
>>> mycircle.circumference # Fixed!
18.84

We can add getters and setters to our class while maintaining what looks like simple attribute access.

You gotta love properties.

I love making people forced to write Java envious with this feature.

Let’s review our attribute access rules.

Five, er, Six Simple Rules?

Accessing an attribute on an object like obj.foo gets you:

  1. The result of the property of the same name if it is defined

  2. Or the corresponding value in obj.__dict__ if it exists

  3. or else it falls back to look in the type(obj).__dict__

  4. repeating for each type in the mro until it finds a match

  5. And assignment always creates an entry in obj.__dict__.

  6. Unless there was a setter property in which case you’re calling a function.
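
Rule 6 is easy to check. In this sketch (the validation and the _radius name are my own assumptions, not from the talk) assigning to radius runs the setter and never creates a 'radius' entry in the instance dict:

```python
class Circle(object):
    def __init__(self, radius):
        self.radius = radius          # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value          # plain attribute, stored in __dict__


c = Circle(2)
c.radius = 3
keys = sorted(c.__dict__)             # ['_radius'], no 'radius' key
```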

Rule #1.

Rule #1 is really:

Accessing an attribute on an object like obj.foo gets you:

  1. the result of the __get__ method of the data descriptor of the same name attached to the class if it exists

What’s a data descriptor?

Heck - what’s a descriptor?

images/diver.jpg

All Clear?

We’ll look at the implementation and signature of the methods in a moment…

but first… The Descriptor Protocol!

The Descriptor Protocol

or as we’ve been calling it: Rule #1.

And a new Rule #3.

Plus a few more details…

Six, er, Seven Simple Rules?

Accessing an attribute on an object like obj.foo gets you:

  1. The result of the __get__ method of the data descriptor of the same name attached to the class if it exists

  2. Or the corresponding value in obj.__dict__ if it exists

  3. Or the result of the __get__ method of the non-data descriptor of the same name on the class

  4. or else it falls back to look in the type(obj).__dict__

  5. repeating for each type in the mro until it finds a match.

  6. And assignment always creates an entry in obj.__dict__.

  7. Unless there was a setter property (which we now know is a descriptor) in which case you’re calling a function.
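
The difference between rules 1 and 3 shows up once the instance dict has an entry of the same name. A minimal sketch, with made-up class names:

```python
class DataDesc(object):
    """Defines __get__ and __set__, so it is a data descriptor."""
    def __get__(self, obj, type=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc(object):
    """Defines only __get__, so it is a non-data descriptor."""
    def __get__(self, obj, type=None):
        return "from non-data descriptor"


class Demo(object):
    d = DataDesc()
    n = NonDataDesc()


obj = Demo()
n_before = obj.n                      # nothing in obj.__dict__ yet

# Write to the instance dict directly, bypassing any __set__.
obj.__dict__["d"] = "from instance dict"
obj.__dict__["n"] = "from instance dict"

d_after = obj.d                       # data descriptor still wins (rule 1)
n_after = obj.n                       # instance dict now wins (rule 2 beats rule 3)
```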

Who knew

simple attribute access could be so complicated?

This is the most complicated thing ever!

Maybe not!

images/8-rules.jpg

Writing Descriptors

The signatures of __get__, __set__ and __delete__ are fixed.

descr.__get__(self, obj, type=None) --> value

descr.__set__(self, obj, value) --> None

descr.__delete__(self, obj) --> None

We’ll ignore __delete__ for now.

Who wants to delete attributes anyways?
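
For the curious, here is a minimal sketch of a descriptor that implements all three hooks. Logged, Thing and the "_value" key are hypothetical names chosen for illustration:

```python
class Logged(object):
    """Hypothetical descriptor that records which hooks fired."""
    def __init__(self):
        self.calls = []

    def __get__(self, obj, type=None):
        self.calls.append("get")
        return obj.__dict__.get("_value")

    def __set__(self, obj, value):
        self.calls.append("set")
        obj.__dict__["_value"] = value

    def __delete__(self, obj):
        self.calls.append("delete")
        obj.__dict__.pop("_value", None)


class Thing(object):
    attr = Logged()


t = Thing()
t.attr = 1                              # calls __set__
value = t.attr                          # calls __get__
del t.attr                              # calls __delete__
calls = Thing.__dict__["attr"].calls    # reading the class dict bypasses __get__
```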

__get__ and __set__

Descriptors look weird - they’re attached to the class and the methods have a funky signature.

>>> class MyDescriptor(object):
...     def __get__(self, obj, type):
...         print self, obj, type
...     def __set__(self, obj, val):
...         print "Got %s" % val
...
>>> class MyClass(object):
...     x = MyDescriptor() # Attached at class definition time!
...

But they allow us to simulate attribute access with functions instead.

>>> obj = MyClass()
>>> obj.x # a function call is hiding here
<...MyDescriptor object ...> <....MyClass object ...> <class '__main__.MyClass'>
>>>
>>> MyClass.x # and here!
<...MyDescriptor object ...> None <class '__main__.MyClass'>
>>>
>>> obj.x = 4 # and here
Got 4

Method signature details:

obj and type are both provided on object attribute access; only type is provided on class attribute access (obj is None).

Why doesn’t MyClass.x = 5 call the __set__ method of the descriptor?
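
One way to see the answer: assignment on a class is handled by the class's own type (normally type), which looks for data descriptors on the metaclass, not on the class itself. So the assignment simply replaces the entry in the class dict. A small sketch:

```python
class MyDescriptor(object):
    def __get__(self, obj, type=None):
        return "via __get__"

    def __set__(self, obj, val):
        raise AssertionError("not reached by class-level assignment")


class MyClass(object):
    x = MyDescriptor()


before = MyClass.x                # "via __get__"
MyClass.x = 5                     # replaces the descriptor, __set__ is not called
after = MyClass.__dict__["x"]     # 5
```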

Ok, let’s do something useful

We could store values in the descriptor itself. But watch out!

What’s wrong with this code?

>>> class MyDescriptor(object):
...     def __get__(self, obj, type):
...         return self.data
...     def __set__(self, obj, val):
...         self.data = val
...

Whoops! We just re-implemented a class level attribute!

>>> class MyClass(object):
...     val = MyDescriptor()
...
>>> obj1 = MyClass()
>>> obj1.val = 10
>>> obj2 = MyClass()
>>> obj2.val
10

Try again

Possible strategies:

Storing on self

We know we can’t use the same field name for all the pieces of data.

We have to vary by the instance.

Another classic pitfall

>>> class MyDescriptor(object):
...     def __init__(self):
...         self.data = {}
...     def __get__(self, obj, type):
...         return self.data[obj]
...     def __set__(self, obj, val):
...         self.data[obj] = val
...

This works!

But now the descriptor’s data dict holds a strong reference to every instance of every class the descriptor is attached to.

So much for garbage collection.
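
The leak can be observed with a weak reference used purely as a probe. This is a sketch assuming the dict-backed descriptor above; probe and still_alive are illustrative names:

```python
import gc
import weakref


class MyDescriptor(object):
    def __init__(self):
        self.data = {}

    def __get__(self, obj, type=None):
        return self.data[obj]

    def __set__(self, obj, val):
        self.data[obj] = val


class MyClass(object):
    val = MyDescriptor()


obj = MyClass()
obj.val = 10
probe = weakref.ref(obj)      # lets us check whether obj is still alive

del obj                       # drop our only name for the instance
gc.collect()
still_alive = probe() is not None   # True: the descriptor's dict still holds it
```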

Weak-references to the rescue!

Go read PEP 205 and then:

>>> from weakref import WeakKeyDictionary
>>> class MyDescriptor(object):
...     def __init__(self):
...         self.data = WeakKeyDictionary()
...     def __get__(self, obj, type):
...         return self.data.get(obj)
...     def __set__(self, obj, val):
...         self.data[obj] = val
...

Kinda sorta

This solves the reference problem… but not everything can be weakref’ed.

In particular, weakrefs and the use of __slots__ to optimize your class don’t mix unless you add '__weakref__' to __slots__, and your type must inherit from a type that is weakref-able.

Of course the instance must also be hashable to be used as a dict key. That means classes inheriting from mutable types like list or dict won’t work with this solution.
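
Both limits are easy to reproduce. In this sketch Plain, Slotted, MyList and try_store are hypothetical names used only to probe what a WeakKeyDictionary will accept as a key:

```python
import weakref


class Plain(object):
    pass                      # weakref-able and hashable


class Slotted(object):
    __slots__ = ("x",)        # no '__weakref__' slot, so no weak references


class MyList(list):
    pass                      # weakref-able, but unhashable like list itself


def try_store(obj):
    """Report whether obj can be used as a WeakKeyDictionary key."""
    try:
        weakref.WeakKeyDictionary()[obj] = 1
    except TypeError:
        return "TypeError"
    return "ok"


results = [try_store(Plain()), try_store(Slotted()), try_store(MyList())]
```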

What about storing values on the object itself?

Problem is - we don’t know the name of the attribute our descriptor is stored under.

val = MyDescriptor()

The descriptor constructor can’t know about "val" yet.

So sometimes we see just a little duplication

class MyClass(object):
    val = MyDescriptor("val") # must put in field name manually

Which makes the descriptor easy to write

>>> class MyDescriptor(object):
...     def __init__(self, field=""):
...         self.field = field
...     def __get__(self, obj, type):
...         print "Called __get__"
...         return obj.__dict__.get(self.field)
...     def __set__(self, obj, val):
...         print "Called __set__"
...         obj.__dict__[self.field] = val
...

Everybody gets that right?

If obj.x is always going to get you the descriptor then obj.__dict__['x'] is hidden from normal access and the descriptor can use it to store values…
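
A sketch of that idea without the print calls. The obj is None guard is my own addition so that class-level access returns the descriptor instead of failing:

```python
class MyDescriptor(object):
    def __init__(self, field=""):
        self.field = field

    def __get__(self, obj, type=None):
        if obj is None:               # accessed on the class, not an instance
            return self
        return obj.__dict__.get(self.field)

    def __set__(self, obj, val):
        obj.__dict__[self.field] = val


class MyClass(object):
    x = MyDescriptor("x")             # field name still typed in by hand


a, b = MyClass(), MyClass()
a.x = 1
b.x = 2                               # each instance keeps its own value
```

After this, a.__dict__ is {'x': 1}, but a.x never reads that entry directly: as a data descriptor, MyDescriptor always gets asked first.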

Fortunately

A little bit of code duplication

… doesn’t bug anybody here, right? RIGHT?

If only we knew something about metaclasses…

Or maybe class decorators …

We could do something …

Like this

>>> def named_descriptors(klass):
...     for name, attr in klass.__dict__.items():
...         if isinstance(attr, MyDescriptor):
...             attr.field = name
...     return klass
...
>>> @named_descriptors
... class MyClass(object):
...     x = MyDescriptor()
...

Which works

>>> obj = MyClass()
>>> obj.x = 10
Called __set__
>>> obj.x
Called __get__
10

But might be too much magic…
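
A side note beyond the Python 2 used in these slides: Python 3.6 and later call a __set_name__ hook on each descriptor when the owning class is created, which removes the need for the decorator. A minimal sketch, assuming Python 3.6+:

```python
class MyDescriptor(object):
    def __set_name__(self, owner, name):
        # Called automatically by Python 3.6+ when the owning class is created.
        self.field = name

    def __get__(self, obj, type=None):
        if obj is None:               # class-level access returns the descriptor
            return self
        return obj.__dict__.get(self.field)

    def __set__(self, obj, val):
        obj.__dict__[self.field] = val


class MyClass(object):
    x = MyDescriptor()                # no name argument, no decorator


obj = MyClass()
obj.x = 10
```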

What’s the point of all this?

Let’s abandon the details of how we might handle implementation

@property was cool.

Why do I need descriptors anyways?

@property is just sugar

So is @staticmethod and @classmethod.

It’s all the descriptor protocol underneath.
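
To make that concrete, here is a simplified pure-Python stand-in for property. MyProperty is a hypothetical name, and the real built-in also handles deleters and docstrings:

```python
class MyProperty(object):
    """Hypothetical, simplified stand-in for the built-in property."""
    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def setter(self, fset):
        # Return a new descriptor with the same getter plus the given setter.
        return type(self)(self.fget, fset)


class Circle(object):
    PI = 3.14

    def __init__(self, radius):
        self.radius = radius

    @MyProperty
    def circumference(self):
        return 2 * self.radius * self.PI
```

Because MyProperty defines __set__, it is a data descriptor, so assigning to circumference raises AttributeError instead of silently shadowing it.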

Great!

@property is doing the Pythonic thing and giving me a simple interface to a complicated API.

Do I ever have to write custom descriptors?

Yes!

@property doesn’t work for every case where you need to intercept attribute access.

Imagine a class that needs to store various dollar amounts in attributes. Better use decimal.Decimal! And fix the representation to 2 decimal places.

I know - @property to the rescue!

Just a little code duplication

>>> from decimal import Decimal, ROUND_UP
>>> class BankTransaction(object):
...     _cents = Decimal('.01')
...     def __init__(self, account, before, after, min, max):
...         self.account = account
...         self._before = before
...         self._after = after
...         self._min = min
...         self._max = max
...     @property
...     def before(self):
...         return Decimal(self._before).quantize(self._cents, ROUND_UP)
...     @before.setter
...     def before(self, val):
...         self._before = str(val)
...     # repeat boilerplate getters and setters over and over and over...

images/nope.gif

I thought @property was supposed to save me from boilerplate code!

Takeaway

Descriptors let us write re-usable properties.

Isn’t this much nicer?

class BankTransaction(object):
    before = CurrencyField(0)
    after = CurrencyField(0)

    def __init__(self, account, before, after):
        self.account = account
        self.before = before
        self.after = after

Descriptors are a great solution for attributes with common behaviour across multiple classes
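
The slides don't show CurrencyField itself, so the following is only one possible sketch. It assumes the weak-reference storage strategy from earlier and the same quantize-to-cents behaviour as the property version:

```python
from decimal import Decimal, ROUND_UP
from weakref import WeakKeyDictionary


class CurrencyField(object):
    """Hypothetical reusable field: reads back a Decimal with 2 places."""
    _cents = Decimal(".01")

    def __init__(self, default=0):
        self.default = default
        self.data = WeakKeyDictionary()   # per-instance values, weakly keyed

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        raw = self.data.get(obj, self.default)
        return Decimal(raw).quantize(self._cents, ROUND_UP)

    def __set__(self, obj, val):
        self.data[obj] = str(val)         # store as str, as the property version did


class BankTransaction(object):
    before = CurrencyField(0)
    after = CurrencyField(0)

    def __init__(self, account, before, after):
        self.account = account
        self.before = before
        self.after = after
```

The getter and setter logic is written once and reused by every field, on this class or any other.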

Use cases

Think database fields: each has its own validation logic but might be attached to many different classes with many different names.

class Person(object):
    id = PrimaryKeyField()
    name = VarCharField(max_length=255)

class NickName(object):
    id = PrimaryKeyField()
    person_id = ForeignKey(Person)
    name = VarCharField(max_length=255)

This may look vaguely familiar

Or GUI fields that all need to fire off events when updated.

class PongBall(Widget):
    velocity_x = NumericProperty(0)
    velocity_y = NumericProperty(0)

That too.

It may be cool to simply provide a "declarative" API.

Or implement advanced attribute access patterns like "cached fields".

Every Framework Ever

>>> class LazyProperty(object):
...     def __init__(self, func):
...         self._func = func
...         self.__name__ = func.__name__
...
...     def __get__(self, obj, klass):
...         print "Called the func"
...         result = self._func(obj)
...         obj.__dict__[self.__name__] = result
...         return result
...
>>> class MyClass(object):
...     @LazyProperty
...     def x(self):
...         return 42
...
>>> obj = MyClass()
>>> obj.x
Called the func
42
>>> obj.x
42

Do you get why it works?
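
A hint, as a sketch: LazyProperty only defines __get__, so it is a non-data descriptor, and the entry it writes into obj.__dict__ takes priority on the next lookup. The calls list is an illustrative addition to count how often the function runs:

```python
class LazyProperty(object):
    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__

    def __get__(self, obj, klass=None):
        if obj is None:
            return self
        result = self._func(obj)
        obj.__dict__[self.__name__] = result   # shadows this descriptor from now on
        return result


calls = []


class MyClass(object):
    @LazyProperty
    def x(self):
        calls.append("computed")
        return 42


obj = MyClass()
first = obj.x                  # runs the function via __get__
cached = dict(obj.__dict__)    # {'x': 42}
second = obj.x                 # served straight from obj.__dict__
```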

Congratulations!

Go forth and wizard!

Helpful Resources
