Index examples

Simple single-table example

Test data preparation

CREATE TABLE persons (
    id SERIAL PRIMARY KEY,
    email TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO persons (email)
SELECT md5(random()::text) || '@example.com'
FROM generate_series(1, 1000000);

Run the query before any index exists on the email column

EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';

What you can see (when the table holds about 100,000 rows):

Seq Scan on persons (cost=0.00..2387.00 rows=1 width=57) (actual time=3.825..3.825 rows=0 loops=1)
Filter: (email = 'test@example.com'::text)
Rows Removed by Filter: 100000
Planning Time: 0.107 ms
Execution Time: 3.836 ms

1. Seq Scan on persons

  • Meaning: PostgreSQL performed a sequential scan (full table scan) on the table persons.

  • Implication: Since there's no index on the email column, PostgreSQL had to scan every row to find a match.


2. (cost=0.00..2387.00 rows=1 width=57)

  • Meaning: This is the query planner's estimate:

    • 0.00: Startup cost (cost to begin the operation).

    • 2387.00: Total estimated cost to complete the operation.

    • rows=1: Estimated number of rows returned.

    • width=57: Estimated average size (in bytes) of each row.

  • Implication: The planner expected to find 1 matching row, but had to scan all rows (which it estimated would cost 2387 units of work).


3. (actual time=3.825..3.825 rows=0 loops=1)

  • Meaning:

    • actual time: Time taken to execute this node in milliseconds.

      • First number (3.825): Time to return the first row.

      • Second number (3.825): Time to return all rows.

    • rows=0: Actual number of rows returned.

    • loops=1: Number of times this operation was executed (usually 1 unless in a loop or subquery).

  • Implication: The query took ~3.825 ms to scan the entire table and found no rows matching email = 'test@example.com'.


4. Filter: (email = 'test@example.com'::text)

  • Meaning: The condition used to filter rows during the scan.

  • Implication: Since there's no index, the filter is applied to every row during the scan.


5. Rows Removed by Filter: 100000

  • Meaning: Out of 100,000 rows in the table, all were scanned and none matched the filter condition.

  • Implication: This confirms that the query had to process all rows, and none met the condition β€” which is inefficient without an index.


6. Planning Time: 0.107 ms

  • Meaning: Time taken by PostgreSQL to plan the query execution.

  • Implication: Very fast, as expected for a simple query.


7. Execution Time: 3.836 ms

  • Meaning: Total time taken to execute the query from start to finish.

  • Implication: This is the real-world performance metric. Without an index, scanning 100,000 rows took ~3.8 ms β€” acceptable for small datasets, but could be much slower on larger tables.

What you can see (when the table holds about 1.1 million rows):

1. Gather (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)

  • Meaning: This is the Gather node, which collects results from parallel worker processes.

  • Cost: Estimated cost range for the entire operation.

  • Rows: Estimated number of rows returned (still 1).

  • Actual Time: Time taken to gather results from all workers (20.481 ms to get first row, 22.033 ms to finish).

  • Rows: 0 rows returned (no match found).

  • Loops: 1 (the gather operation ran once).


2. Workers Planned: 2

  • Meaning: PostgreSQL planned to use 2 parallel workers to speed up the scan.

  • Implication: The query planner decided that the table was large enough to benefit from parallelism.


3. Workers Launched: 2

  • Meaning: PostgreSQL successfully launched 2 parallel workers.

  • Implication: The scan was split across 3 processes (1 main + 2 workers).
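Whether the planner considers parallel workers at all is governed by configuration. A quick way to inspect or temporarily override it for an experiment (these are standard PostgreSQL settings):

```sql
-- Upper bound on workers per Gather node (default: 2)
SHOW max_parallel_workers_per_gather;

-- Disable parallel scans in this session to compare plans
SET max_parallel_workers_per_gather = 0;
```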


4. -> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)

  • Meaning: Each worker (and the main process) performed a parallel sequential scan on a portion of the persons table.

  • Cost: Estimated cost for this node (lower than the gather node).

  • Actual Time: Time taken per scan β€” all workers finished in ~18.120 ms.

  • Rows: 0 rows returned.

  • Loops: 3 (main process + 2 workers).


5. Filter: (email = 'test@example.com'::text)

  • Meaning: The same filter condition as before β€” applied during the scan.

  • Implication: No index, so every row had to be checked.


6. Rows Removed by Filter: 366667

  • Meaning: Each of the three processes (leader + 2 workers) scanned and discarded ~366,667 rows on average, because none matched the filter.

  • Total Rows Scanned: ~366,667 Γ— 3 = 1,100,001 rows

  • Implication: The entire table was scanned in parallel, but still no match found.


7. Planning Time: 0.037 ms

  • Meaning: Time taken to plan the query.

  • Implication: Very fast, as the query is simple.


8. Execution Time: 22.043 ms

  • Meaning: Total time taken to execute the query.

  • Implication: Even with parallelism, scanning over a million rows took ~22 ms β€” and still no match.

Create an index on the email column
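Based on the index name that appears in the plan below (idx_persons_email), the statement would be:

```sql
-- B-tree index (the default) on the email column
CREATE INDEX idx_persons_email ON persons (email);

-- Re-run the same query to see the new plan
EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';
```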

1. Index Scan using idx_persons_email on persons

  • Meaning: PostgreSQL used an index scan on the idx_persons_email index to find matching rows.

  • Implication: Instead of scanning the entire table, it used the B-tree index to directly locate the rows where email = 'test@example.com'.


2. (cost=0.43..8.45 rows=1 width=57)

  • Meaning: Estimated cost of the operation:

    • 0.43: Startup cost (cost to begin the index scan).

    • 8.45: Total estimated cost to retrieve all matching rows.

    • rows=1: Estimated number of rows returned.

    • width=57: Estimated average size (in bytes) of each row.

  • Implication: The query planner expects this to be a very fast operation, as it involves only a few index node lookups and possibly one table row fetch.


3. (actual time=0.018..0.018 rows=0 loops=1)

  • Meaning:

    • actual time: Time taken to execute this node.

      • First number (0.018): Time to return the first row.

      • Second number (0.018): Time to return all rows.

    • rows=0: No rows matched the condition.

    • loops=1: The scan was executed once.

  • Implication: The index allowed PostgreSQL to quickly determine that no matching rows exist, without scanning the entire table.


4. Index Cond: (email = 'test@example.com'::text)

  • Meaning: The condition used to search the index.

  • Implication: The index was used effectively to evaluate the WHERE clause.


5. Planning Time: 0.176 ms

  • Meaning: Time taken by PostgreSQL to plan the query.

  • Implication: Very fast, as expected for a simple indexed query.

More complex example

Assume there are three tables: users, orders, and products.
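The table definitions and the analyzed query are not shown; the following sketch is reconstructed from the plan nodes discussed below. Column names such as user_id and product_id are assumptions inferred from the index names (users_email_key, products_pkey, idx_orders_user_product).

```sql
-- Illustrative schema (assumed; inferred from the plan below)
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email   TEXT UNIQUE               -- backs the users_email_key index
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,    -- backs the products_pkey index
    category   TEXT
);

CREATE TABLE orders (
    order_id   SERIAL PRIMARY KEY,
    user_id    INT REFERENCES users (user_id),
    product_id INT REFERENCES products (product_id)
);

-- The analyzed query, reconstructed from the plan's filters and joins
EXPLAIN ANALYZE
SELECT p.*
FROM users u
JOIN orders o   ON o.user_id = u.user_id
JOIN products p ON p.product_id = o.product_id
WHERE u.email = 'user500@example.com'
  AND p.category = 'Electronics';
```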

Explanation

πŸ” Outer Loop: Nested Loop

  • Type: Nested Loop

  • Cost: 8.59..1911.00

  • Rows: Expected 2 rows

  • Actual Time: 0.829..10.144 ms

  • Loops: 1

This is the top-level operation. It joins the result of the Hash Join (inner query) with the products table using a nested loop.


πŸ” Inner Loop: Hash Join

  • Type: Hash Join

  • Cost: 8.31..1908.04

  • Rows: Expected 10 rows

  • Actual Time: 0.824..10.114 ms

  • Loops: 1

This joins the orders table with the users table using a hash join.


πŸ“„ Seq Scan on orders o

  • Type: Seq Scan

  • Cost: 0.00..1637.09

  • Rows: 100,009

  • Actual Time: 0.007..5.726 ms

  • Loops: 1

  • Rows Scanned: 100,000

This is a full table scan of the orders table β€” very inefficient. It's scanning all 100,000 rows just to find a few matching orders for one user.


🧠 Hash on users u

  • Type: Hash

  • Cost: 8.30..8.30

  • Rows: 1

  • Actual Time: 0.017..0.018 ms

  • Loops: 1

This builds a hash table from the result of the inner index scan (on users), to speed up the join with orders.


πŸ” Index Scan using users_email_key on users u

  • Type: Index Scan

  • Cost: 0.29..8.30

  • Rows: 1

  • Actual Time: 0.015..0.015 ms

  • Loops: 1

This is efficient β€” it uses the unique index on users.email to find the user user500@example.com.


πŸ” Index Scan using products_pkey on products p

  • Type: Index Scan

  • Cost: 0.28..0.30

  • Rows: 1

  • Actual Time: 0.002..0.002 ms

  • Loops: 11

This scans the products table using the primary key (product_id) β€” efficient per-row lookup, but it runs 11 times (once per row from the hash join).


🧹 Filter: (category = 'Electronics'::text)

  • Rows Removed by Filter: 1

Even though the index finds the product by ID, the category filter is applied after the index lookup β€” meaning the index doesn't help with filtering by category.


πŸ“Š Summary Table

| Step | Type | Rows Scanned | Time (ms) | Notes |
|------|------|--------------|-----------|-------|
| Seq Scan on orders | Sequential Scan | 100,000 | ~5.7 | ❌ Inefficient |
| Index Scan on users | Index Scan | 1 | ~0.015 | ✅ Efficient |
| Hash Join | Hash Join | 11 rows | ~10.1 | OK, but could be better |
| Index Scan on products | Index Scan | 11 times | ~0.002 × 11 | ✅ Efficient per row |
| Filter on category | In-memory filter | 1 removed | N/A | ❌ Not indexed |


🧠 Key Observations

  • The main bottleneck is the full scan of the orders table β€” it's scanning 100,000 rows to find a few matching orders.

  • The users table is efficiently accessed via an index on email.

  • The products table is accessed efficiently via its primary key, but the category filter is applied after the index lookup.

  • The Nested Loop and Hash Join are not the problem β€” they're just the tools used to join data. The real issue is missing indexes on the join and filter columns.


βœ… How to Improve This

1. Add an index on orders.user_id

This will allow PostgreSQL to use an index scan instead of a full table scan when joining orders with users.
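Assuming the join column is named user_id (as in the plan above), the statement would be as follows; the index name is illustrative:

```sql
-- Single-column index on the join key (subsumed by the
-- composite index suggested next, so pick one or the other)
CREATE INDEX idx_orders_user_id ON orders (user_id);
```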


2. Add a composite index on (orders.user_id, orders.product_id)

This allows PostgreSQL to efficiently join and filter in one step.
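This is the index the post-creation plan refers to as idx_orders_user_product:

```sql
-- Composite index covering the join key and the product lookup
CREATE INDEX idx_orders_user_product ON orders (user_id, product_id);
```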


3. Add an index on products.category

This allows PostgreSQL to filter by category using an index, instead of scanning and filtering in memory.
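Matching the name used in the final recommendation (idx_products_category):

```sql
-- Index supporting the category filter
CREATE INDEX idx_products_category ON products (category);
```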

After index creation

Explanation

πŸ” Outer Loop: Nested Loop

  • Type: Nested Loop

  • Cost: 4.93..52.09

  • Rows: Expected 2 rows

  • Actual Time: 0.652..0.680 ms

  • Loops: 1

This is the top-level loop joining the result of the inner loop (user + orders) with the products table.


πŸ” Inner Loop: Nested Loop

  • Type: Nested Loop

  • Cost: 4.66..49.14

  • Rows: Expected 10 rows

  • Actual Time: 0.648..0.661 ms

  • Loops: 1

This loop joins the users table with the orders table.


πŸ” Index Scan using users_email_key on users u

  • Type: Index Scan

  • Cost: 0.29..8.30

  • Rows: 1

  • Actual Time: 0.630..0.630 ms

  • Loops: 1

Efficiently finds the user user500@example.com using the unique index on email.


🧠 Bitmap Heap Scan on orders o

  • Type: Bitmap Heap Scan

  • Cost: 4.37..40.74

  • Rows: 10

  • Actual Time: 0.015..0.027 ms

  • Loops: 1

  • Heap Blocks: exact=11 (11 heap pages fetched)

This is where the index on orders(user_id) (or part of a composite index) shines:

  • It finds all orders for the user efficiently.

  • It uses a Bitmap Index Scan to gather matching TIDs (tuple IDs), then fetches the actual rows.


πŸ“¦ Bitmap Index Scan on idx_orders_user_product

  • Type: Bitmap Index Scan

  • Cost: 0.00..4.37

  • Rows: 10

  • Actual Time: 0.008..0.008 ms

  • Loops: 1

This is the core optimization:

  • It uses your composite index on (user_id, product_id) to find matching orders.

  • It's very fast and avoids scanning the entire table.


πŸ” Final Loop: Index Scan using products_pkey on products p

  • Type: Index Scan

  • Cost: 0.28..0.30

  • Rows: 1

  • Actual Time: 0.002..0.002 ms

  • Loops: 11

This scans the products table using the primary key (product_id) β€” efficient per-row lookup.


🧹 Filter: (category = 'Electronics'::text)

  • Rows Removed by Filter: 1

This is still a post-index filter β€” the index on products(product_id) doesn't help with filtering by category.


πŸ“Š Summary Table

| Step | Type | Rows Scanned | Time (ms) | Notes |
|------|------|--------------|-----------|-------|
| Index Scan on users | Index Scan | 1 | ~0.63 | ✅ Efficient |
| Bitmap Index Scan on orders | Bitmap Index Scan | 11 | ~0.008 | ✅ Efficient |
| Bitmap Heap Scan on orders | Bitmap Heap Scan | 11 | ~0.015 | ✅ Efficient |
| Index Scan on products | Index Scan | 11 times | ~0.002 × 11 | ✅ Efficient |
| Filter on category | In-memory filter | 1 removed | N/A | ❌ Not indexed |


🧠 Key Improvements

| Before | After | Improvement |
|--------|-------|-------------|
| Full scan of 100,000 rows in orders | Only 11 rows scanned | ✅ 99.99% fewer rows |
| High execution time (~10 ms) | Execution time ~0.7 ms | ✅ ~14× faster |
| No index on orders.user_id | Composite index used | ✅ Index-based join |
| No index on products.category | Still missing | ⚠️ Opportunity for further optimization |


βœ… Final Recommendation

You've already made huge performance gains by adding:

  • CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);

  • CREATE INDEX idx_products_category ON products(category);

πŸš€ Bonus Tip: If you add the category condition to the index on orders, you can avoid the final in-memory filter:

Or better yet, if you often filter by category:

This is a partial index, and it can be extremely fast for common filters.
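As a sketch (the exact definition is an assumption, since the original snippet is not shown), a partial index for the common 'Electronics' filter could look like:

```sql
-- Partial index: only 'Electronics' rows are indexed, so the index
-- stays small and the category condition is resolved by the index itself
CREATE INDEX idx_products_electronics
    ON products (product_id)
    WHERE category = 'Electronics';
```

The planner will use it only for queries whose WHERE clause implies category = 'Electronics', so it complements rather than replaces the plain category index.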
