Saturday, October 11, 2014

Multicolumn Indexes in Postgres

Multicolumn index means putting an index on a combination of two column. Let's look into an example to see how multicolumn index works. Let's say we have user table with following columns

human
  • id - primary key of record
  • day_of_year - Day from 1 to 366, when the human was born
  • first_name - First name given to human

Let's say we want to build this table for all the humans on earth.  Quite a number of records. Before we look into multicolumn index, let me be very clear that the indexing should be driven by the kind
of queries you are going to do. 

For creating the table you can use the following query in Postgres



CREATE TABLE human
(
  id bigint NOT NULL,
  day_of_year integer,
  first_name character varying(50),
  CONSTRAINT human_pkey PRIMARY KEY (id)
);

CREATE SEQUENCE human_seq INCREMENT 1 MINVALUE 1 NO MAXVALUE START 1 CACHE 1;

Now let's insert a set of records in the table.

How to define multicolumn index

Let's say we want to put multicolumn index on day_of_year and first_name. The index can be applied by running the following sql

CREATE INDEX human_day_of_year_first_name ON human (day_of_year , first_name );

In defining the index, the first column is considered as major and second as minor. This index will be scanned for the queries with following where clauses

where day_of_year =? and first_name = ?

This index will also be scanned for 

where day_of_year =? 

However this index will not be scanned for 

where day_of_year =? or first_name = ?

and also not for

where first_name =? 

Let's run two queries and see the query plan. For query plan see the query plan post

First Query

explain analyze select * from human where day_of_year=100 and first_name='abc';

and the plan
                      QUERY PLAN
--------------------------------------------------------------------
 Index Scan using human_day_of_year_first_name on human  (cost=0.28..8.30 rows=1 width=29) (actual time=0.054..0.054 rows=0 loops=1)
   Index Cond: ((day_of_year = 100) AND ((first_name)::text = 'abc'::text))
 Total runtime: 0.083 ms

(3 rows)

Note that the index scan happens.

Second Query

explain analyze select * from human where day_of_year=100 or first_name='abc';

and the plan
                                QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on human  (cost=0.00..124.11 rows=15 width=29) (actual time=0.059..2.782 rows=14 loops=1)
   Filter: ((day_of_year = 100) OR ((first_name)::text = 'abc'::text))
   Rows Removed by Filter: 5527
 Total runtime: 2.842 ms

(4 rows)

This results in full sequential scan of the records. 


If you have predominantly 'or' queries than you would like to create index on the columns separately. The database can run scan on individual indexes and then 'or' the results to build the final set. However if the queries are predominantly 'and' than multicolumn indexes are better as that will result in only one index scan.  

No comments:

Post a Comment