Index examples
Simple single-table example
Test data preparation
CREATE TABLE persons (
id SERIAL PRIMARY KEY,
email TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO persons (email)
SELECT md5(random()::text) || '@example.com'
FROM generate_series(1, 1000000);

Run the query while there is no index on the email column:

EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';

What you can see (when the table holds about 100,000 rows):
Seq Scan on persons (cost=0.00..2387.00 rows=1 width=57) (actual time=3.825..3.825 rows=0 loops=1)
Filter: (email = 'test@example.com'::text)
Rows Removed by Filter: 100000
Planning Time: 0.107 ms
Execution Time: 3.836 ms

1. Seq Scan on persons
Meaning: PostgreSQL performed a sequential scan (a full table scan) on the persons table.
Implication: Since there is no index on the email column, PostgreSQL had to check every row to find a match.
2. (cost=0.00..2387.00 rows=1 width=57)
Meaning: This is the query planner's estimate:
0.00: startup cost (cost to begin the operation).
2387.00: total estimated cost to complete the operation.
rows=1: estimated number of rows returned.
width=57: estimated average size (in bytes) of each row.
Implication: The planner expected to find 1 matching row, but still had to scan all rows (estimated at 2387 cost units of work).
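Where does 2387.00 come from? For a sequential scan the planner's estimate is roughly relpages * seq_page_cost + reltuples * (cpu_tuple_cost + cpu_operator_cost), using statistics from pg_class. With the default cost settings (1.0, 0.01, 0.0025) the numbers line up for a table of about 1,137 pages: 1137 + 100000 * 0.0125 = 2387. You can inspect both sides yourself:

```sql
-- Statistics the planner uses for the persons table
SELECT relpages, reltuples FROM pg_class WHERE relname = 'persons';

-- Cost parameters that enter the sequential-scan estimate
SHOW seq_page_cost;     -- default 1.0
SHOW cpu_tuple_cost;    -- default 0.01
SHOW cpu_operator_cost; -- default 0.0025
```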
3. (actual time=3.825..3.825 rows=0 loops=1)
Meaning:
actual time: time taken to execute this node, in milliseconds. The first number (3.825) is the time to return the first row; the second (3.825) is the time to return all rows.
rows=0: actual number of rows returned.
loops=1: number of times this node was executed (usually 1, unless it sits inside a loop or subquery).
Implication: The query took ~3.825 ms to scan the entire table and found no rows matching email = 'test@example.com'.
4. Filter: (email = 'test@example.com'::text)
Meaning: The condition used to filter rows during the scan.
Implication: Since there's no index, the filter is applied to every row during the scan.
5. Rows Removed by Filter: 100000
Meaning: All 100,000 rows in the table were scanned, and none matched the filter condition.
Implication: This confirms that the query had to process every row and none met the condition, which is inefficient without an index.
6. Planning Time: 0.107 ms
Meaning: Time taken by PostgreSQL to plan the query execution.
Implication: Very fast, as expected for a simple query.
7. Execution Time: 3.836 ms
Meaning: Total time taken to execute the query from start to finish.
Implication: This is the real-world performance metric. Without an index, scanning 100,000 rows took ~3.8 ms: acceptable for small datasets, but much slower on larger tables.
What you can see (when the table holds about 1.1 million rows):

1. Gather (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)
Meaning: This is the Gather node, which collects results from parallel worker processes.
Cost: estimated cost range for the entire operation.
Rows: estimated number of rows returned (still 1).
Actual Time: time to gather results from all workers (20.481 ms to the first row, 22.033 ms to finish).
Rows: 0 rows actually returned (no match found).
Loops: 1 (the Gather node ran once).
2. Workers Planned: 2
Meaning: PostgreSQL planned to use 2 parallel workers to speed up the scan.
Implication: The query planner decided that the table was large enough to benefit from parallelism.
3. Workers Launched: 2
Meaning: PostgreSQL successfully launched 2 parallel workers.
Implication: The scan was split across 3 processes (1 leader + 2 workers).
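Whether the planner chooses a parallel plan depends on table size and the parallel-cost settings. To compare the parallel and non-parallel plans side by side, you can disable parallelism for the session (standard PostgreSQL settings):

```sql
-- Setting this to 0 forces a non-parallel plan (the default is typically 2)
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';

-- Restore the default for the session
RESET max_parallel_workers_per_gather;
```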
4. -> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)
Meaning: Each worker (and the leader process) performed a parallel sequential scan over a portion of the persons table.
Cost: estimated cost for this node (lower than the Gather node).
Actual Time: time per scan; the workers finished in ~18.120 ms.
Rows: 0 rows returned.
Loops: 3 (main process + 2 workers).
5. Filter: (email = 'test@example.com'::text)
Meaning: The same filter condition as before, applied during the scan.
Implication: No index, so every row had to be checked.
6. Rows Removed by Filter: 366667
Meaning: Each worker scanned ~366,667 rows and discarded all of them because none matched the filter.
Total rows scanned: ~366,667 × 3 = 1,100,001 rows.
Implication: The entire table was scanned in parallel, but still no match found.
7. Planning Time: 0.037 ms
Meaning: Time taken to plan the query.
Implication: Very fast, as the query is simple.
8. Execution Time: 22.043 ms
Meaning: Total time taken to execute the query.
Implication: Even with parallelism, scanning over a million rows took ~22 ms, and still no match was found.
Create an index for the email column
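The index statement itself is not preserved in these notes; based on the index name that appears in the plan, it would look like this:

```sql
-- B-tree index (the default index type) on persons.email;
-- the name matches the one referenced in the plan
CREATE INDEX idx_persons_email ON persons (email);
```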
1. Index Scan using idx_persons_email on persons
Meaning: PostgreSQL used a scan of the idx_persons_email index to find matching rows.
Implication: Instead of scanning the entire table, it used the B-tree index to directly locate the rows where email = 'test@example.com'.
2. (cost=0.43..8.45 rows=1 width=57)
Meaning: Estimated cost of the operation:
0.43: startup cost (cost to begin the index scan).
8.45: total estimated cost to retrieve all matching rows.
rows=1: estimated number of rows returned.
width=57: estimated average size (in bytes) of each row.
Implication: The query planner expects this to be a very fast operation, as it involves only a few index node lookups and possibly one table row fetch.
3. (actual time=0.018..0.018 rows=0 loops=1)
Meaning:
actual time: time taken to execute this node. The first number (0.018) is the time to return the first row; the second (0.018) is the time to return all rows.
rows=0: no rows matched the condition.
loops=1: the scan was executed once.
Implication: The index allowed PostgreSQL to quickly determine that no matching rows exist, without scanning the entire table.
4. Index Cond: (email = 'test@example.com'::text)
Meaning: The condition used to search the index.
Implication: The index was used directly to evaluate the WHERE clause.
5. Planning Time: 0.176 ms
Meaning: Time taken by PostgreSQL to plan the query.
Implication: Very fast, as expected for a simple indexed query.
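To see how little I/O the index scan does compared with the sequential scan, the BUFFERS option of EXPLAIN is useful (standard PostgreSQL syntax):

```sql
-- Shows shared-buffer hits and reads per plan node alongside the timings
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM persons WHERE email = 'test@example.com';
```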
More complex example
Assume there are three tables: users, orders, and products.
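The schema and query were not preserved in these notes. Reconstructed from the plan that follows (table and column names are inferred, so treat this as an illustrative sketch), the setup might look like:

```sql
-- Inferred schema: users has a unique email (index users_email_key),
-- orders references users and products, products has a category column
CREATE TABLE users    (user_id SERIAL PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE products (product_id SERIAL PRIMARY KEY, category TEXT);
CREATE TABLE orders   (order_id SERIAL PRIMARY KEY,
                       user_id INT REFERENCES users,
                       product_id INT REFERENCES products);

-- The query the plan appears to describe
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN users u    ON u.user_id = o.user_id
JOIN products p ON p.product_id = o.product_id
WHERE u.email = 'user500@example.com'
  AND p.category = 'Electronics';
```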
Explanation of the plan, node by node:
Outer Loop: Nested Loop
Type: Nested Loop
Cost: 8.59..1911.00
Rows: expected 2 rows
Actual Time: 0.829..10.144 ms
Loops: 1
This is the top-level operation. It joins the result of the Hash Join (the inner part of the plan) with the products table using a nested loop.
Inner Loop: Hash Join
Type: Hash Join
Cost: 8.31..1908.04
Rows: expected 10 rows
Actual Time: 0.824..10.114 ms
Loops: 1
This joins the orders table with the users table using a hash join.
Seq Scan on orders o
Type: Seq Scan
Cost: 0.00..1637.09
Rows: 100,009 estimated
Actual Time: 0.007..5.726 ms
Loops: 1
Rows Scanned: 100,000
This is a full table scan of the orders table, which is very inefficient: it scans all 100,000 rows just to find a few matching orders for one user.
Hash on users u
Type: Hash
Cost: 8.30..8.30
Rows: 1
Actual Time: 0.017..0.018 ms
Loops: 1
This builds a hash table from the result of the index scan on users, to speed up the join with orders.
Index Scan using users_email_key on users u
Type: Index Scan
Cost: 0.29..8.30
Rows: 1
Actual Time: 0.015..0.015 ms
Loops: 1
This is efficient: it uses the unique index on users.email to find the user user500@example.com.
Index Scan using products_pkey on products p
Type: Index Scan
Cost: 0.28..0.30
Rows: 1
Actual Time: 0.002..0.002 ms
Loops: 11
This looks up the products table via the primary key (product_id): an efficient per-row lookup, but it runs 11 times (once per row from the hash join).
Filter: (category = 'Electronics'::text)
Rows Removed by Filter: 1
Even though the index finds the product by ID, the category filter is applied after the index lookup, so the index does not help with filtering by category.
Summary Table

| Operation | Type | Rows | Time (ms) | Verdict |
| --- | --- | --- | --- | --- |
| Seq Scan on orders | Sequential scan | 100,000 | ~5.7 | Inefficient |
| Index Scan on users | Index scan | 1 | ~0.015 | Efficient |
| Hash Join | Hash join | 11 rows | ~10.1 | OK, but could be better |
| Index Scan on products | Index scan | 11 times | ~0.002 × 11 | Efficient per row |
| Filter on category | In-memory filter | 1 removed | N/A | Not indexed |
Key Observations

The main bottleneck is the full scan of the orders table: it scans 100,000 rows to find a few matching orders.
The users table is accessed efficiently via the index on email.
The products table is accessed efficiently via its primary key, but the category filter is applied after the index lookup.
The Nested Loop and Hash Join are not the problem; they are just the tools used to join the data. The real issue is the missing indexes on the join and filter columns.
How to Improve This
1. Add an index on orders.user_id
This allows PostgreSQL to use an index scan instead of a full table scan when joining orders with users.
2. Add a composite index on (orders.user_id, orders.product_id)
This allows PostgreSQL to join and filter efficiently in one step.
3. Add an index on products.category
This allows PostgreSQL to filter by category using an index, instead of scanning and filtering in memory.
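The corresponding statements, matching the index names used later in these notes, would be:

```sql
-- Composite index covering both the join and the product lookup;
-- its leading column also serves queries that filter on user_id alone
CREATE INDEX idx_orders_user_product ON orders (user_id, product_id);

-- Index for filtering products by category
CREATE INDEX idx_products_category ON products (category);
```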
After index creation
Explanation of the new plan, node by node:
Outer Loop: Nested Loop
Type: Nested Loop
Cost: 4.93..52.09
Rows: expected 2 rows
Actual Time: 0.652..0.680 ms
Loops: 1
This is the top-level loop, joining the result of the inner loop (user + orders) with the products table.
Inner Loop: Nested Loop
Type: Nested Loop
Cost: 4.66..49.14
Rows: expected 10 rows
Actual Time: 0.648..0.661 ms
Loops: 1
This loop joins the users table with the orders table.
Index Scan using users_email_key on users u
Type: Index Scan
Cost: 0.29..8.30
Rows: 1
Actual Time: 0.630..0.630 ms
Loops: 1
Efficiently finds the user user500@example.com using the unique index on email.
Bitmap Heap Scan on orders o
Type: Bitmap Heap Scan
Cost: 4.37..40.74
Rows: 10
Actual Time: 0.015..0.027 ms
Loops: 1
Heap Blocks: exact=11
This is where the composite index on orders (user_id, product_id) shines:
It finds all orders for the user efficiently.
It uses a Bitmap Index Scan to gather matching TIDs (tuple IDs), then fetches the actual rows from the heap.
Bitmap Index Scan on idx_orders_user_product
Type: Bitmap Index Scan
Cost: 0.00..4.37
Rows: 10
Actual Time: 0.008..0.008 ms
Loops: 1
This is the core optimization: it uses the composite index on (user_id, product_id) to find the matching orders. It is very fast and avoids scanning the entire table.
Final Loop: Index Scan using products_pkey on products p
Type: Index Scan
Cost: 0.28..0.30
Rows: 1
Actual Time: 0.002..0.002 ms
Loops: 11
This looks up the products table via the primary key (product_id): an efficient per-row lookup.
Filter: (category = 'Electronics'::text)
Rows Removed by Filter: 1
This is still a post-index filter: the primary-key index on products (product_id) does not help with filtering by category.
Summary Table

| Operation | Type | Rows | Time (ms) | Verdict |
| --- | --- | --- | --- | --- |
| Index Scan on users | Index scan | 1 | ~0.63 | Efficient |
| Bitmap Index Scan on orders | Bitmap index scan | 11 | ~0.008 | Efficient |
| Bitmap Heap Scan on orders | Bitmap heap scan | 11 | ~0.015 | Efficient |
| Index Scan on products | Index scan | 11 times | ~0.002 × 11 | Efficient |
| Filter on category | In-memory filter | 1 removed | N/A | Not indexed |
Key Improvements

| Before | After | Result |
| --- | --- | --- |
| Full scan of 100,000 rows in orders | Only 11 rows scanned | ~99.99% fewer rows |
| High execution time (~10 ms) | Execution time ~0.7 ms | ~14x faster |
| No index on orders.user_id | Composite index used | Index-based join |
| No index on products.category | Still missing | Opportunity for further optimization |
Final Recommendation
You've already made huge performance gains by adding:

CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);
CREATE INDEX idx_products_category ON products(category);
Bonus Tip: The category filter is still applied in memory after the primary-key lookup on products. If you often filter by one specific category, a partial index can avoid that final in-memory filter, and it can be extremely fast for common filters.
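A sketch of such a partial index (the index name and the choice to restrict it to 'Electronics' are illustrative assumptions, not from the original notes):

```sql
-- Partial index: only rows with category = 'Electronics' are indexed,
-- so it stays small and matches the query's WHERE clause directly
CREATE INDEX idx_products_electronics
    ON products (product_id)
    WHERE category = 'Electronics';
```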