Concerning text queries, Django ORM offers several helper functions. However, some of those are not backed by a database index and will be slow. Fortunately, on PostgreSQL these searches can be sped up with GIN or GIST indexes quite easily.
This article will focus on Django 1.6 with a backing PostgreSQL. It probably applies to Django 1.7, as well.
The Django ORM layer offers some useful functionality when it comes to text search:
- contains and icontains (https://docs.djangoproject.com/en/1.6/ref/models/querysets/#std:fieldlookup-contains)
- startswith and istartswith (https://docs.djangoproject.com/en/1.6/ref/models/querysets/#startswith)
- endswith and iendswith (https://docs.djangoproject.com/en/1.6/ref/models/querysets/#endswith)
When querying the database, these will result in wildcard SQL queries. ‘endswith’ will use a trailing wildcard, ‘startswith’ a leading wildcard, and ‘contains’ leading and trailing wildcards.
In the following, I will focus on ‘contains’ as it covers both the leading and trailing wildcards.
The Django ORM layer will create the following WHERE clause when a ‘contains’ lookup is requested:
# python
models.Article.objects.filter(title__icontains='fancy')
# SQL
WHERE UPPER("blog_article"."title"::text) LIKE UPPER('%fancy%')
Asking PostgreSQL to explain a query making use of ‘icontains’ on shows:
blog_db=# explain analyze SELECT COUNT(*) FROM "blog_article" WHERE UPPER("blog_article"."title"::text) LIKE UPPER('%fancy%');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=557.64..557.65 rows=1 width=0) (actual time=29.156..29.156 rows=1 loops=1)
-> Seq Scan on blog_article (cost=0.00..557.64 rows=2 width=0) (actual time=0.236..29.126 rows=3 loops=1)
Filter: (upper(title) ~~ '%FANCY%'::text)
Rows Removed by Filter: 19373
Total runtime: 29.280 ms
(5 rows)
So, a table with around 20 thousand entries will already require around 30 milliseconds for a simple ‘count’ when the result is filtered with ‘icontains’. Can this be reduced?
It can, but it requires a GIN or GIST index, that Django does not create by default as either requires an extension.
I decided to go with GIN, which seemed more appropriate in our case. Installing the extension is fairly simply under Ubuntu:
$ sudo apt-get install postgresql-contrib-9.3
$ psql blog_db
blog_db=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
To add the index to a specific column run:
blog_db=# CREATE INDEX blog_article_title_trigram ON blog_article USING gin (UPPER(title) gin_trgm_ops);
CREATE INDEX
Now, check that the index is really being used in the query:
_db=# explain analyze SELECT COUNT(*) FROM "blog_article" WHERE UPPER("blog_article"."title"::text) LIKE UPPER('%fancy%');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=51.53..51.54 rows=1 width=0) (actual time=0.323..0.323 rows=1 loops=1)
-> Bitmap Heap Scan on blog_article (cost=44.02..51.53 rows=2 width=0) (actual time=0.307..0.319 rows=3 loops=1)
Recheck Cond: (upper(title) ~~ '%FANCY%'::text)
-> Bitmap Index Scan on blog_article_title_trigram (cost=0.00..44.02 rows=2 width=0) (actual time=0.287..0.287 rows=3 loops=1)
Index Cond: (upper(title) ~~ '% FANCY%'::text)
Total runtime: 0.411 ms
(6 rows)
The query now lasts around half a millisecond instead of nearly 30 milliseconds.
To find any other queries, that might profit from a special index, simply turn on PostgreSQL’s logging. You can tell Postgres to only log queries that last longer than a configured amount of time.
Further reading: Tuning Django ORM (part 2): Many-to-Many Queries