Index examples
Simple single-table example
Test data preparation
CREATE TABLE persons (
id SERIAL PRIMARY KEY,
email TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO persons (email)
SELECT md5(random()::text) || '@example.com'
FROM generate_series(1, 1000000);

Run the query while there is no index on the email column:

EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';

What you can see (when the table holds about 100,000 rows):
Seq Scan on persons (cost=0.00..2387.00 rows=1 width=57) (actual time=3.825..3.825 rows=0 loops=1)
Filter: (email = 'test@example.com'::text)
Rows Removed by Filter: 100000
Planning Time: 0.107 ms
Execution Time: 3.836 ms

1. Seq Scan on persons
Meaning: PostgreSQL performed a sequential scan (a full table scan) on the persons table.
Implication: Since there is no index on the email column, PostgreSQL had to check every row to find a match.
2. (cost=0.00..2387.00 rows=1 width=57)
Meaning: This is the query planner's estimate:
0.00: startup cost (cost to begin the operation).
2387.00: total estimated cost to complete the operation.
rows=1: estimated number of rows returned.
width=57: estimated average size (in bytes) of each row.
Implication: The planner expected to find 1 matching row, but still had to scan all rows (estimated at 2387 cost units of work).
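Where does 2387.00 come from? For a sequential scan the planner's estimate is roughly relpages * seq_page_cost + reltuples * (cpu_tuple_cost + cpu_operator_cost), using statistics from pg_class. With the default cost settings (1.0, 0.01, 0.0025) the numbers line up for a table of about 1,137 pages: 1137 + 100000 * 0.0125 = 2387. You can inspect both sides yourself:

```sql
-- Statistics the planner uses for the persons table
SELECT relpages, reltuples FROM pg_class WHERE relname = 'persons';

-- Cost parameters that enter the sequential-scan estimate
SHOW seq_page_cost;     -- default 1.0
SHOW cpu_tuple_cost;    -- default 0.01
SHOW cpu_operator_cost; -- default 0.0025
```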
3. (actual time=3.825..3.825 rows=0 loops=1)
Meaning:
actual time: time taken to execute this node, in milliseconds. The first number (3.825) is the time to return the first row; the second (3.825) is the time to return all rows.
rows=0: actual number of rows returned.
loops=1: number of times this node was executed (usually 1, unless it sits inside a loop or subquery).
Implication: The query took ~3.825 ms to scan the entire table and found no rows matching email = 'test@example.com'.
4. Filter: (email = 'test@example.com'::text)
Meaning: The condition used to filter rows during the scan.
Implication: Since there's no index, the filter is applied to every row during the scan.
5. Rows Removed by Filter: 100000
Meaning: All 100,000 rows in the table were scanned, and none matched the filter condition.
Implication: This confirms that the query had to process every row and none met the condition, which is inefficient without an index.
6. Planning Time: 0.107 ms
Meaning: Time taken by PostgreSQL to plan the query execution.
Implication: Very fast, as expected for a simple query.
7. Execution Time: 3.836 ms
Meaning: Total time taken to execute the query from start to finish.
Implication: This is the real-world performance metric. Without an index, scanning 100,000 rows took ~3.8 ms: acceptable for small datasets, but much slower on larger tables.
What you can see (when the table holds about 1.1 million rows):

1. Gather (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)
Meaning: This is the Gather node, which collects results from parallel worker processes.
Cost: estimated cost range for the entire operation.
Rows: estimated number of rows returned (still 1).
Actual Time: time to gather results from all workers (20.481 ms to the first row, 22.033 ms to finish).
Rows: 0 rows actually returned (no match found).
Loops: 1 (the Gather node ran once).
2. Workers Planned: 2
Meaning: PostgreSQL planned to use 2 parallel workers to speed up the scan.
Implication: The query planner decided that the table was large enough to benefit from parallelism.
3. Workers Launched: 2
Meaning: PostgreSQL successfully launched 2 parallel workers.
Implication: The scan was split across 3 processes (1 leader + 2 workers).
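Whether the planner chooses a parallel plan depends on table size and the parallel-cost settings. To compare the parallel and non-parallel plans side by side, you can disable parallelism for the session (standard PostgreSQL settings):

```sql
-- Setting this to 0 forces a non-parallel plan (the default is typically 2)
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';

-- Restore the default for the session
RESET max_parallel_workers_per_gather;
```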
4. -> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)
Meaning: Each worker (and the leader process) performed a parallel sequential scan over a portion of the persons table.
Cost: estimated cost for this node (lower than the Gather node).
Actual Time: time per scan; the workers finished in ~18.120 ms.
Rows: 0 rows returned.
Loops: 3 (main process + 2 workers).
5. Filter: (email = 'test@example.com'::text)
Meaning: The same filter condition as before, applied during the scan.
Implication: No index, so every row had to be checked.
6. Rows Removed by Filter: 366667
Meaning: Each worker scanned ~366,667 rows and discarded all of them because none matched the filter.
Total rows scanned: ~366,667 × 3 = 1,100,001 rows.
Implication: The entire table was scanned in parallel, but still no match found.
7. Planning Time: 0.037 ms
Meaning: Time taken to plan the query.
Implication: Very fast, as the query is simple.
8. Execution Time: 22.043 ms
Meaning: Total time taken to execute the query.
Implication: Even with parallelism, scanning over a million rows took ~22 ms, and still no match was found.
Create an index for the email column
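The index statement itself is not preserved in these notes; based on the index name that appears in the plan, it would look like this:

```sql
-- B-tree index (the default index type) on persons.email;
-- the name matches the one referenced in the plan
CREATE INDEX idx_persons_email ON persons (email);
```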
1. Index Scan using idx_persons_email on persons
Meaning: PostgreSQL used a scan of the idx_persons_email index to find matching rows.
Implication: Instead of scanning the entire table, it used the B-tree index to directly locate the rows where email = 'test@example.com'.
2. (cost=0.43..8.45 rows=1 width=57)
Meaning: Estimated cost of the operation:
0.43: startup cost (cost to begin the index scan).
8.45: total estimated cost to retrieve all matching rows.
rows=1: estimated number of rows returned.
width=57: estimated average size (in bytes) of each row.
Implication: The query planner expects this to be a very fast operation, as it involves only a few index node lookups and possibly one table row fetch.
3. (actual time=0.018..0.018 rows=0 loops=1)
Meaning:
actual time: time taken to execute this node. The first number (0.018) is the time to return the first row; the second (0.018) is the time to return all rows.
rows=0: no rows matched the condition.
loops=1: the scan was executed once.
Implication: The index allowed PostgreSQL to quickly determine that no matching rows exist, without scanning the entire table.
4. Index Cond: (email = 'test@example.com'::text)
Meaning: The condition used to search the index.
Implication: The index was used directly to evaluate the WHERE clause.
5. Planning Time: 0.176 ms
Meaning: Time taken by PostgreSQL to plan the query.
Implication: Very fast, as expected for a simple indexed query.
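To see how little I/O the index scan does compared with the sequential scan, the BUFFERS option of EXPLAIN is useful (standard PostgreSQL syntax):

```sql
-- Shows shared-buffer hits and reads per plan node alongside the timings
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM persons WHERE email = 'test@example.com';
```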
More complex example
Assume there are three tables: users, orders, and products.
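The schema and query were not preserved in these notes. Reconstructed from the plan that follows (table and column names are inferred, so treat this as an illustrative sketch), the setup might look like:

```sql
-- Inferred schema: users has a unique email (index users_email_key),
-- orders references users and products, products has a category column
CREATE TABLE users    (user_id SERIAL PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE products (product_id SERIAL PRIMARY KEY, category TEXT);
CREATE TABLE orders   (order_id SERIAL PRIMARY KEY,
                       user_id INT REFERENCES users,
                       product_id INT REFERENCES products);

-- The query the plan appears to describe
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN users u    ON u.user_id = o.user_id
JOIN products p ON p.product_id = o.product_id
WHERE u.email = 'user500@example.com'
  AND p.category = 'Electronics';
```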
Explanation of the plan, node by node:
Outer Loop: Nested Loop
Type: Nested Loop
Cost: 8.59..1911.00
Rows: expected 2 rows
Actual Time: 0.829..10.144 ms
Loops: 1
This is the top-level operation. It joins the result of the Hash Join (the inner part of the plan) with the products table using a nested loop.
Inner Loop: Hash Join
Type: Hash Join
Cost: 8.31..1908.04
Rows: expected 10 rows
Actual Time: 0.824..10.114 ms
Loops: 1
This joins the orders table with the users table using a hash join.
Seq Scan on orders o
Type: Seq Scan
Cost: 0.00..1637.09
Rows: 100,009 estimated
Actual Time: 0.007..5.726 ms
Loops: 1
Rows Scanned: 100,000
This is a full table scan of the orders table, which is very inefficient: it scans all 100,000 rows just to find a few matching orders for one user.
Hash on users u
Type: Hash
Cost: 8.30..8.30
Rows: 1
Actual Time: 0.017..0.018 ms
Loops: 1
This builds a hash table from the result of the index scan on users, to speed up the join with orders.
Index Scan using users_email_key on users u
Type: Index Scan
Cost: 0.29..8.30
Rows: 1
Actual Time: 0.015..0.015 ms
Loops: 1
This is efficient: it uses the unique index on users.email to find the user user500@example.com.
Index Scan using products_pkey on products p
Type: Index Scan
Cost: 0.28..0.30
Rows: 1
Actual Time: 0.002..0.002 ms
Loops: 11
This looks up the products table via the primary key (product_id): an efficient per-row lookup, but it runs 11 times (once per row from the hash join).
Filter: (category = 'Electronics'::text)
Rows Removed by Filter: 1
Even though the index finds the product by ID, the category filter is applied after the index lookup, so the index does not help with filtering by category.
Summary Table

| Operation | Type | Rows | Time (ms) | Verdict |
| --- | --- | --- | --- | --- |
| Seq Scan on orders | Sequential scan | 100,000 | ~5.7 | Inefficient |
| Index Scan on users | Index scan | 1 | ~0.015 | Efficient |
| Hash Join | Hash join | 11 rows | ~10.1 | OK, but could be better |
| Index Scan on products | Index scan | 11 times | ~0.002 × 11 | Efficient per row |
| Filter on category | In-memory filter | 1 removed | N/A | Not indexed |
Key Observations

The main bottleneck is the full scan of the orders table: it scans 100,000 rows to find a few matching orders.
The users table is accessed efficiently via the index on email.
The products table is accessed efficiently via its primary key, but the category filter is applied after the index lookup.
The Nested Loop and Hash Join are not the problem; they are just the tools used to join the data. The real issue is the missing indexes on the join and filter columns.
How to Improve This
1. Add an index on orders.user_id
This allows PostgreSQL to use an index scan instead of a full table scan when joining orders with users.
2. Add a composite index on (orders.user_id, orders.product_id)
This allows PostgreSQL to join and filter efficiently in one step.
3. Add an index on products.category
This allows PostgreSQL to filter by category using an index, instead of scanning and filtering in memory.
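The corresponding statements, matching the index names used later in these notes, would be:

```sql
-- Composite index covering both the join and the product lookup;
-- its leading column also serves queries that filter on user_id alone
CREATE INDEX idx_orders_user_product ON orders (user_id, product_id);

-- Index for filtering products by category
CREATE INDEX idx_products_category ON products (category);
```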
After index creation
Explanation of the new plan, node by node:
Outer Loop: Nested Loop
Type: Nested Loop
Cost: 4.93..52.09
Rows: expected 2 rows
Actual Time: 0.652..0.680 ms
Loops: 1
This is the top-level loop, joining the result of the inner loop (user + orders) with the products table.
Inner Loop: Nested Loop
Type: Nested Loop
Cost: 4.66..49.14
Rows: expected 10 rows
Actual Time: 0.648..0.661 ms
Loops: 1
This loop joins the users table with the orders table.
Index Scan using users_email_key on users u
Type: Index Scan
Cost: 0.29..8.30
Rows: 1
Actual Time: 0.630..0.630 ms
Loops: 1
Efficiently finds the user user500@example.com using the unique index on email.
Bitmap Heap Scan on orders o
Type: Bitmap Heap Scan
Cost: 4.37..40.74
Rows: 10
Actual Time: 0.015..0.027 ms
Loops: 1
Heap Blocks: exact=11
This is where the composite index on orders (user_id, product_id) shines:
It finds all orders for the user efficiently.
It uses a Bitmap Index Scan to gather matching TIDs (tuple IDs), then fetches the actual rows from the heap.
Bitmap Index Scan on idx_orders_user_product
Type: Bitmap Index Scan
Cost: 0.00..4.37
Rows: 10
Actual Time: 0.008..0.008 ms
Loops: 1
This is the core optimization: it uses the composite index on (user_id, product_id) to find the matching orders. It is very fast and avoids scanning the entire table.
Final Loop: Index Scan using products_pkey on products p
Type: Index Scan
Cost: 0.28..0.30
Rows: 1
Actual Time: 0.002..0.002 ms
Loops: 11
This looks up the products table via the primary key (product_id): an efficient per-row lookup.
Filter: (category = 'Electronics'::text)
Rows Removed by Filter: 1
This is still a post-index filter: the primary-key index on products (product_id) does not help with filtering by category.
Summary Table

| Operation | Type | Rows | Time (ms) | Verdict |
| --- | --- | --- | --- | --- |
| Index Scan on users | Index scan | 1 | ~0.63 | Efficient |
| Bitmap Index Scan on orders | Bitmap index scan | 11 | ~0.008 | Efficient |
| Bitmap Heap Scan on orders | Bitmap heap scan | 11 | ~0.015 | Efficient |
| Index Scan on products | Index scan | 11 times | ~0.002 × 11 | Efficient |
| Filter on category | In-memory filter | 1 removed | N/A | Not indexed |
Key Improvements

| Before | After | Result |
| --- | --- | --- |
| Full scan of 100,000 rows in orders | Only 11 rows scanned | ~99.99% fewer rows |
| High execution time (~10 ms) | Execution time ~0.7 ms | ~14x faster |
| No index on orders.user_id | Composite index used | Index-based join |
| No index on products.category | Still missing | Opportunity for further optimization |
Final Recommendation
You've already made huge performance gains by adding:

CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);
CREATE INDEX idx_products_category ON products(category);
Bonus Tip: The category filter is still applied in memory after the primary-key lookup on products. If you often filter by one specific category, a partial index can avoid that final in-memory filter, and it can be extremely fast for common filters.
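A sketch of such a partial index (the index name and the choice to restrict it to 'Electronics' are illustrative assumptions, not from the original notes):

```sql
-- Partial index: only rows with category = 'Electronics' are indexed,
-- so it stays small and matches the query's WHERE clause directly
CREATE INDEX idx_products_electronics
    ON products (product_id)
    WHERE category = 'Electronics';
```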