Simeon Franklin

Blog :: Hidden Django QuerySet Features II

20 February 2011

Recently while reading a co-workers code I discovered that Django's Queryset can be OR'ed together - but this may not always be a good idea. Django's ORM overloads the bitwise OR operator to express logical OR and the documentation demonstrates this with a Q object. Let's query the ORM to find recently active users - users who joined this year or who have logged in this year:


>>> from django.db.models import Q
>>> from django.contrib.auth.models import User
>>> from datetime import date
>>> jan_1st = date(2011, 1, 1)
>>> recent = User.objects.filter(Q(last_login__gte=jan_1st) 
                             | Q(date_joined__gte=jan_1st))
>>> print(recent._as_sql())

('SELECT U0."id"
  FROM "auth_user" U0
  WHERE (U0."last_login" >= %s
          OR U0."date_joined" >= %s )',
 (u'2011-01-01 00:00:00', u'2011-01-01 00:00:00'))

The where clause in the generated SQL is exactly what we wanted. It turns out, however, that you can use the | operator on querysets directly to yield the same thing.


>>> from django.contrib.auth.models import User
>>> from datetime import date
>>> jan_1st = date(2011, 1, 1)
>>> recent_login = User.objects.filter(last_login__gte=jan_1st)
>>> recent_join = User.objects.filter(date_joined__gte=jan_1st)
>>> recent = recent_login | recent_join
>>> print(recent._as_sql())
('SELECT U0."id"
  FROM "auth_user" U0
  WHERE (U0."last_login" >= %s
          OR U0."date_joined" >= %s )',
 (u'2011-01-01 00:00:00', u'2011-01-01 00:00:00'))

Sweet! It's always bothered me that chained calls were AND'ed together and there was apparently no way to do a simple OR without importing an additional class. However this feature has to be used with care - you are OR'ing entire queries instead of clauses so careless usage might lead to expensive queries. For instance imagine that we want to only look at certain automatically created recent users whose usernames start with "applicant". OR'ing entire querysets works. Sort of.



>>> from django.contrib.auth.models import User
>>> from datetime import date
>>> jan_1st = date(2011, 1, 1)
>>> applicants = User.objects.filter(username__startswith="applicant")
>>> recent_login = applicants.filter(last_login__gte=jan_1st)
>>> recent_join = applicants.filter(date_joined__gte=jan_1st)
>>> recent = recent_login | recent_join
>>> print(recent._as_sql())
('SELECT U0."id"
  FROM "auth_user" U0
  WHERE
   ((U0."username"::text LIKE %s  AND U0."last_login" >= %s )
     OR
    (U0."username"::text LIKE %s  AND U0."date_joined" >= %s ))',
 (u'applicant%',
  u'2010-01-01 00:00:00',
  u'applicant%',
  u'2010-01-01 00:00:00'))

Note that this did exactly what we asked - OR'ed two querysets. This results in duplication in the where clause - we really only want to run the LIKE search on the username once and then filter the results by OR'ing the two date criterion. The results from the this query will be correct but the execution (depending on your DB backend) may be slower as the LIKE operator is run on the entire data set twice. In my case I noticed an excessively complicated query in the SQL results pane of my django-debug-toolbar. Due to directly OR'ing two complicated querysets four or five expensive operations were being duplicated on a table with a million records. Switching to a Q object produced a much shorter (and faster) query with the same results.


blog comments powered by Disqus