Index examples
Simple one table example
Test data preparation
CREATE TABLE persons (
id SERIAL PRIMARY KEY,
email TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO persons (email)
SELECT md5(random()::text) || '@example.com'
FROM generate_series(1, 1000000);
Execute a command while there is no index created for email
column
EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';
What you can see (IIUC when number of rows is about ~100k):
Seq Scan on persons (cost=0.00..2387.00 rows=1 width=57) (actual time=3.825..3.825 rows=0 loops=1)
Filter: (email = 'test@example.com'::text)
Rows Removed by Filter: 100000
Planning Time: 0.107 ms
Execution Time: 3.836 ms
Seq Scan on persons
Seq Scan on persons
Meaning: PostgreSQL performed a sequential scan (full table scan) on the table
persons
.Implication: Since there's no index on the
email
column, PostgreSQL had to scan every row to find a match.
2. (cost=0.00..2387.00 rows=1 width=57)
(cost=0.00..2387.00 rows=1 width=57)
Meaning: This is the query planner's estimate:
0.00
: Startup cost (cost to begin the operation).2387.00
: Total estimated cost to complete the operation.rows=1
: Estimated number of rows returned.width=57
: Estimated average size (in bytes) of each row.
Implication: The planner expected to find 1 matching row, but had to scan all rows (which it estimated would cost 2387 units of work).
3. (actual time=3.825..3.825 rows=0 loops=1)
(actual time=3.825..3.825 rows=0 loops=1)
Meaning:
actual time
: Time taken to execute this node in milliseconds.First number (
3.825
): Time to return the first row.Second number (
3.825
): Time to return all rows.
rows=0
: Actual number of rows returned.loops=1
: Number of times this operation was executed (usually 1 unless in a loop or subquery).
Implication: The query took ~3.825 ms to scan the entire table and found no rows matching
email = 'test@example.com'
.
4. Filter: (email = 'test@example.com'::text)
Filter: (email = 'test@example.com'::text)
Meaning: The condition used to filter rows during the scan.
Implication: Since there's no index, the filter is applied to every row during the scan.
5. Rows Removed by Filter: 100000
Rows Removed by Filter: 100000
Meaning: Out of 100,000 rows in the table, all were scanned and none matched the filter condition.
Implication: This confirms that the query had to process all rows, and none met the condition — which is inefficient without an index.
6. Planning Time: 0.107 ms
Planning Time: 0.107 ms
Meaning: Time taken by PostgreSQL to plan the query execution.
Implication: Very fast, as expected for a simple query.
7. Execution Time: 3.836 ms
Execution Time: 3.836 ms
Meaning: Total time taken to execute the query from start to finish.
Implication: This is the real-world performance metric. Without an index, scanning 100,000 rows took ~3.8 ms — acceptable for small datasets, but could be much slower on larger tables.
What you can see (when number of rows is about ~1,1kk)
Gather (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)
Filter: (email = 'test@example.com'::text)
Rows Removed by Filter: 366667
Planning Time: 0.037 ms
Execution Time: 22.043 ms
Gather (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)
Meaning: This is the Gather node, which collects results from parallel worker processes.
Cost: Estimated cost range for the entire operation.
Rows: Estimated number of rows returned (still 1).
Actual Time: Time taken to gather results from all workers (20.481 ms to get first row, 22.033 ms to finish).
Rows: 0 rows returned (no match found).
Loops: 1 (the gather operation ran once).
2. Workers Planned: 2
Workers Planned: 2
Meaning: PostgreSQL planned to use 2 parallel workers to speed up the scan.
Implication: The query planner decided that the table was large enough to benefit from parallelism.
3. Workers Launched: 2
Workers Launched: 2
Meaning: PostgreSQL successfully launched 2 parallel workers.
Implication: The scan was split across 3 processes (1 main + 2 workers).
4. -> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)
-> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)
Meaning: Each worker (and the main process) performed a parallel sequential scan on a portion of the
persons
table.Cost: Estimated cost for this node (lower than the gather node).
Actual Time: Time taken per scan — all workers finished in ~18.120 ms.
Rows: 0 rows returned.
Loops: 3 (main process + 2 workers).
5. Filter: (email = 'test@example.com'::text)
Filter: (email = 'test@example.com'::text)
Meaning: The same filter condition as before — applied during the scan.
Implication: No index, so every row had to be checked.
6. Rows Removed by Filter: 366667
Rows Removed by Filter: 366667
Meaning: Each worker scanned ~366,667 rows and removed them all because they didn’t match the filter.
Total Rows Scanned: ~366,667 × 3 = 1,100,001 rows
Implication: The entire table was scanned in parallel, but still no match found.
7. Planning Time: 0.037 ms
Planning Time: 0.037 ms
Meaning: Time taken to plan the query.
Implication: Very fast, as the query is simple.
8. Execution Time: 22.043 ms
Execution Time: 22.043 ms
Meaning: Total time taken to execute the query.
Implication: Even with parallelism, scanning over a million rows took ~22 ms — and still no match.
Create an index for email column
CREATE INDEX idx_persons_email ON persons(email);
Index Scan using idx_persons_email on persons (cost=0.43..8.45 rows=1 width=57) (actual time=0.018..0.018 rows=0 loops=1)
Index Cond: (email = 'test@example.com'::text)
Planning Time: 0.176 ms
Execution Time: 0.025 ms
Index Scan using idx_persons_email on persons
Meaning: PostgreSQL used an index scan on the
idx_persons_email
index to find matching rows.Implication: Instead of scanning the entire table, it used the B-tree index to directly locate the rows where
email = 'test@example.com'
.
2. (cost=0.43..8.45 rows=1 width=57)
(cost=0.43..8.45 rows=1 width=57)
Meaning: Estimated cost of the operation:
0.43
: Startup cost (cost to begin the index scan).8.45
: Total estimated cost to retrieve all matching rows.rows=1
: Estimated number of rows returned.width=57
: Estimated average size (in bytes) of each row.
Implication: The query planner expects this to be a very fast operation, as it involves only a few index node lookups and possibly one table row fetch.
3. (actual time=0.018..0.018 rows=0 loops=1)
(actual time=0.018..0.018 rows=0 loops=1)
Meaning:
actual time
: Time taken to execute this node.First number (
0.018
): Time to return the first row.Second number (
0.018
): Time to return all rows.
rows=0
: No rows matched the condition.loops=1
: The scan was executed once.
Implication: The index allowed PostgreSQL to quickly determine that no matching rows exist, without scanning the entire table.
4. Index Cond: (email = 'test@example.com'::text)
Index Cond: (email = 'test@example.com'::text)
Meaning: The condition used to search the index.
Implication: The index was used effectively to evaluate the
WHERE
clause.
5. Planning Time: 0.176 ms
Planning Time: 0.176 ms
Meaning: Time taken by PostgreSQL to plan the query.
Implication: Very fast, as expected for a simple indexed query.
More complex example
Asssume there are tables
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
...
);
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
...
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(user_id),
product_id INT REFERENCES products(product_id),
...
);
EXPLAIN ANALYZE
SELECT o.order_id, p.name, o.quantity, p.price
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN products p ON o.product_id = p.product_id
WHERE u.email = 'user500@example.com'
AND p.category = 'Electronics';
Nested Loop (cost=8.59..1911.00 rows=2 width=25) (actual time=0.829..10.144 rows=2 loops=1)
-> Hash Join (cost=8.31..1908.04 rows=10 width=12) (actual time=0.824..10.114 rows=11 loops=1)
Hash Cond: (o.user_id = u.user_id)
-> Seq Scan on orders o (cost=0.00..1637.09 rows=100009 width=16) (actual time=0.007..5.726 rows=100000 loops=1)
-> Hash (cost=8.30..8.30 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using users_email_key on users u (cost=0.29..8.30 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: (email = 'user500@example.com'::text)
-> Index Scan using products_pkey on products p (cost=0.28..0.30 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=11)
Index Cond: (product_id = o.product_id)
Filter: (category = 'Electronics'::text)
Rows Removed by Filter: 1
Planning Time: 0.217 ms
Execution Time: 10.161 ms
After index creation
CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);
CREATE INDEX idx_products_category ON products(category);
Nested Loop (cost=4.93..52.09 rows=2 width=25) (actual time=0.652..0.680 rows=2 loops=1)
-> Nested Loop (cost=4.66..49.14 rows=10 width=12) (actual time=0.648..0.661 rows=11 loops=1)
-> Index Scan using users_email_key on users u (cost=0.29..8.30 rows=1 width=4) (actual time=0.630..0.630 rows=1 loops=1)
Index Cond: (email = 'user500@example.com'::text)
-> Bitmap Heap Scan on orders o (cost=4.37..40.74 rows=10 width=16) (actual time=0.015..0.027 rows=11 loops=1)
Recheck Cond: (user_id = u.user_id)
Heap Blocks: exact=11
-> Bitmap Index Scan on idx_orders_user_product (cost=0.00..4.37 rows=10 width=0) (actual time=0.008..0.008 rows=11 loops=1)
Index Cond: (user_id = u.user_id)
-> Index Scan using products_pkey on products p (cost=0.28..0.30 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=11)
Index Cond: (product_id = o.product_id)
Filter: (category = 'Electronics'::text)
Rows Removed by Filter: 1
Planning Time: 3.442 ms
Execution Time: 0.698 ms
SELECT * FROM pg_stat_user_indexes;
Last updated
Was this helpful?