Saturday, October 11, 2014

Query Plan in Postgres

When a query is issued to a database, the query is executed as per query plan. To fetch the relevant results, database has to scan the records and filter them based on criteria. For example if for fetching the results of query, the database can fetch the records from indexes or will have to do a complete search on table. 

Understanding query plans is an important tool to understand and diagnose the query performances. Most of the database provide the tools to study query plans including Oracle. In this post we will look into how to study query plans of Postgres but conceptually it's similar in other databases.

Let's take and example of a table and understand how we can study the query plan. Let's say we have
a users table with following columns. Let 's keep the table structure simple

users
  • id (primary key)
  • email
For query plan postgres provides two commanda which need to be appended before query
  • Explain: Explain does not runs the actual query and returns the plan. It will tell if query is run how it will get executed.
  • Explain Analyze: Explain also returns the query plan but it executes the query also on the way. So be careful. If you are running DDL or DML statements, this will result in changing the state of the database.
Let's see now how we can use them. Run the following query on users table using Explain

explain select * from users;

It will return

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on users  (cost=0.00..10.70 rows=70 width=1062)
(1 row)

This tells that the query will be run by doing a plain sequential scan on users. As we need all rows, that makes sense. We can format the query in JSON format also

explain (format json) select * from users;

And it will return

           QUERY PLAN
---------------------------------
 [                              +
   {                            +
     "Plan": {                  +
       "Node Type": "Seq Scan", +
       "Relation Name": "users",+
       "Alias": "users",        +
       "Startup Cost": 0.00,    +
       "Total Cost": 10.70,     +
       "Plan Rows": 70,         +
       "Plan Width": 1062       +
     }                          +
   }                            +
 ]
(1 row)

Let's look into what each field means. We will look from the cost row as other are self explanatory
  • Startup Cost: This is the time before output phase begins
  • Total Cost: Total cost to complete the query.
  • Plan Rows: Estimated number of rows outputted.
  • Plan Width: Width of row in bytes.
The cost are measures in arbitrary units. Usually it's measured in number of disk fetches. For practical purposes look it in a relative way. 

If we do explain analyze than the query is actually executed

 explain analyze select * from users;

The results are
                                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..10.70 rows=70 width=1062) (actual time=0.011..0.011 rows=1 loops=1)
 Total runtime: 0.043 ms

(2 rows)

This executes the query and also tell the actual time taken to execute the query.  The actual time is in ms and may vary time to time, but that's what the query takes to execute. Again it has no relation with the cost as cost is in arbitrary units.  The loops tell how many time this node is run. Here we have query which has just one node but complex queries contain many nodes.

Now let's run a query which has a where clause on name

explain analyze select * from users where email = 'abc';

The output
                                            QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..10.88 rows=1 width=1062) (actual time=0.029..0.029 rows=0 loops=1)
   Filter: ((email)::text = 'abc'::text)
   Rows Removed by Filter: 1
 Total runtime: 0.066 ms
(4 rows)

See how a filter is applied in the sequential scan.

Let's see how the query plan changes if we put an index on email

CREATE INDEX users_email_index ON users USING btree (email);

The query

explain analyze select * from users where email = 'abc';

The output is 
                                            QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.01 rows=1 width=1062) (actual time=0.021..0.021 rows=0 loops=1)
   Filter: ((email)::text = 'abc'::text)
   Rows Removed by Filter: 1
 Total runtime: 0.071 ms

(4 rows)

Note that the database still chose to use the sequential scan and not index. Database are smart enough to find the most optimum path.

Now I will insert a lot of records in my users table. and after inserting the number of records are now 2500.

Running the explain analyze again gives me

                                                        QUERY PLAN
--------------------------------------------------------------------
 Index Scan using users_email_index on users  (cost=0.28..8.30 rows=1 width=100)
 (actual time=0.029..0.029 rows=0 loops=1)
   Index Cond: ((email)::text = 'abc'::text)
 Total runtime: 0.059 ms

(3 rows)

See now the index scan is done on the email_index.

Evaluating query plans are one of the best tool to understand how database executes query. Again be careful, 'Explain Analyze' will actually execute the query and may result in change in database state. You would not want to do that in your production databases.

For further details see

No comments:

Post a Comment