Index examples

Simple one-table example

Test data preparation

CREATE TABLE persons (
    id SERIAL PRIMARY KEY,
    email TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO persons (email)
SELECT md5(random()::text) || '@example.com'
FROM generate_series(1, 1000000);

Run the query while there is no index on the email column:

EXPLAIN ANALYZE SELECT * FROM persons WHERE email = 'test@example.com';

What you see when the table contains roughly 100,000 rows (note: the INSERT above generates 1,000,000 rows; adjust generate_series accordingly):

Seq Scan on persons (cost=0.00..2387.00 rows=1 width=57) (actual time=3.825..3.825 rows=0 loops=1)
Filter: (email = 'test@example.com'::text)
Rows Removed by Filter: 100000
Planning Time: 0.107 ms
Execution Time: 3.836 ms

1. Seq Scan on persons

  • Meaning: PostgreSQL performed a sequential scan (full table scan) on the table persons.

  • Implication: Since there's no index on the email column, PostgreSQL had to scan every row to find a match.


2. (cost=0.00..2387.00 rows=1 width=57)

  • Meaning: This is the query planner's estimate:

    • 0.00: Startup cost (cost to begin the operation).

    • 2387.00: Total estimated cost to complete the operation.

    • rows=1: Estimated number of rows returned.

    • width=57: Estimated average size (in bytes) of each row.

  • Implication: The planner expected to find 1 matching row, but had to scan all rows (which it estimated would cost 2387 units of work).
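As a rough sketch of where the 2387.00 comes from: the planner charges a disk cost for every heap page plus CPU costs for every row it inspects. With PostgreSQL's default cost settings, a heap of about 1,137 pages (an assumed figure chosen to reproduce the estimate, not read from the plan) gives:

```python
# PostgreSQL's default planner cost constants:
SEQ_PAGE_COST = 1.0         # seq_page_cost
CPU_TUPLE_COST = 0.01       # cpu_tuple_cost
CPU_OPERATOR_COST = 0.0025  # cpu_operator_cost

def seq_scan_cost(pages, rows, quals=1):
    # One unit per heap page read, plus per-row tuple and filter-evaluation costs.
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + quals * CPU_OPERATOR_COST)

print(round(seq_scan_cost(pages=1137, rows=100_000), 2))  # → 2387.0
```

The startup cost is 0.00 because a sequential scan can emit its first row immediately.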


3. (actual time=3.825..3.825 rows=0 loops=1)

  • Meaning:

    • actual time: Time taken to execute this node in milliseconds.

      • First number (3.825): Time to return the first row.

      • Second number (3.825): Time to return all rows.

    • rows=0: Actual number of rows returned.

    • loops=1: Number of times this operation was executed (usually 1 unless in a loop or subquery).

  • Implication: The query took ~3.825 ms to scan the entire table and found no rows matching email = 'test@example.com'.


4. Filter: (email = 'test@example.com'::text)

  • Meaning: The condition used to filter rows during the scan.

  • Implication: Since there's no index, the filter is applied to every row during the scan.


5. Rows Removed by Filter: 100000

  • Meaning: Out of 100,000 rows in the table, all were scanned and none matched the filter condition.

  • Implication: This confirms that the query had to process all rows, and none met the condition — which is inefficient without an index.


6. Planning Time: 0.107 ms

  • Meaning: Time taken by PostgreSQL to plan the query execution.

  • Implication: Very fast, as expected for a simple query.


7. Execution Time: 3.836 ms

  • Meaning: Total time taken to execute the query from start to finish.

  • Implication: This is the real-world performance metric. Without an index, scanning 100,000 rows took ~3.8 ms — acceptable for small datasets, but could be much slower on larger tables.

What you see when the table contains roughly 1.1 million rows:

Gather  (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on persons  (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)
        Filter: (email = 'test@example.com'::text)
        Rows Removed by Filter: 366667
Planning Time: 0.037 ms
Execution Time: 22.043 ms

1. Gather (cost=1000.00..19226.06 rows=1 width=57) (actual time=20.481..22.033 rows=0 loops=1)

  • Meaning: This is the Gather node, which collects results from parallel worker processes.

  • Cost: Estimated cost range for the entire operation.

  • Rows: Estimated number of rows returned (still 1).

  • Actual Time: Time taken to gather results from all workers (20.481 ms to get first row, 22.033 ms to finish).

  • Rows: 0 rows returned (no match found).

  • Loops: 1 (the gather operation ran once).


2. Workers Planned: 2

  • Meaning: PostgreSQL planned to use 2 parallel workers to speed up the scan.

  • Implication: The query planner decided that the table was large enough to benefit from parallelism.


3. Workers Launched: 2

  • Meaning: PostgreSQL successfully launched 2 parallel workers.

  • Implication: The scan was split across 3 processes (1 main + 2 workers).


4. -> Parallel Seq Scan on persons (cost=0.00..18225.96 rows=1 width=57) (actual time=18.120..18.120 rows=0 loops=3)

  • Meaning: Each worker (and the main process) performed a parallel sequential scan on a portion of the persons table.

  • Cost: Estimated cost for this node (lower than the gather node).

  • Actual Time: Average time per process; each scan finished in ~18.120 ms.

  • Rows: 0 rows returned.

  • Loops: 3 (main process + 2 workers).


5. Filter: (email = 'test@example.com'::text)

  • Meaning: The same filter condition as before — applied during the scan.

  • Implication: No index, so every row had to be checked.


6. Rows Removed by Filter: 366667

  • Meaning: Each of the three processes (leader + 2 workers) scanned ~366,667 rows on average, all of which were removed because they didn't match the filter.

  • Total Rows Scanned: ~366,667 × 3 = 1,100,001 rows

  • Implication: The entire table was scanned in parallel, but still no match found.
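A small arithmetic check: for parallel nodes, EXPLAIN ANALYZE reports per-process averages, so multiplying by `loops` recovers the total work.

```python
# Per-process averages times `loops` gives the total rows scanned.
rows_removed_per_loop = 366_667
loops = 3  # leader process + 2 workers
total_rows_scanned = rows_removed_per_loop * loops
print(total_rows_scanned)  # → 1100001
```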


7. Planning Time: 0.037 ms

  • Meaning: Time taken to plan the query.

  • Implication: Very fast, as the query is simple.


8. Execution Time: 22.043 ms

  • Meaning: Total time taken to execute the query.

  • Implication: Even with parallelism, scanning over a million rows took ~22 ms — and still no match.

Create an index on the email column

CREATE INDEX idx_persons_email ON persons(email);

Re-running the same EXPLAIN ANALYZE query now produces:

Index Scan using idx_persons_email on persons  (cost=0.43..8.45 rows=1 width=57) (actual time=0.018..0.018 rows=0 loops=1)
  Index Cond: (email = 'test@example.com'::text)
Planning Time: 0.176 ms
Execution Time: 0.025 ms

1. Index Scan using idx_persons_email on persons

  • Meaning: PostgreSQL used an index scan on the idx_persons_email index to find matching rows.

  • Implication: Instead of scanning the entire table, it used the B-tree index to directly locate the rows where email = 'test@example.com'.
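The reason the lookup is so cheap: a B-tree search descends a few levels of the tree instead of visiting every row. A back-of-the-envelope comparison for ~1.1 million rows:

```python
import math

# A B-tree lookup inspects O(log N) entries; a sequential scan inspects all N.
n = 1_100_000
comparisons_indexed = math.ceil(math.log2(n))
print(comparisons_indexed)  # → 21
```

In practice the tree is much shallower still (each B-tree page holds hundreds of keys), so a lookup usually touches only 2-4 pages.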


2. (cost=0.43..8.45 rows=1 width=57)

  • Meaning: Estimated cost of the operation:

    • 0.43: Startup cost (cost to begin the index scan).

    • 8.45: Total estimated cost to retrieve all matching rows.

    • rows=1: Estimated number of rows returned.

    • width=57: Estimated average size (in bytes) of each row.

  • Implication: The query planner expects this to be a very fast operation, as it involves only a few index node lookups and possibly one table row fetch.


3. (actual time=0.018..0.018 rows=0 loops=1)

  • Meaning:

    • actual time: Time taken to execute this node.

      • First number (0.018): Time to return the first row.

      • Second number (0.018): Time to return all rows.

    • rows=0: No rows matched the condition.

    • loops=1: The scan was executed once.

  • Implication: The index allowed PostgreSQL to quickly determine that no matching rows exist, without scanning the entire table.


4. Index Cond: (email = 'test@example.com'::text)

  • Meaning: The condition used to search the index.

  • Implication: The index was used effectively to evaluate the WHERE clause.


5. Planning Time: 0.176 ms

  • Meaning: Time taken by PostgreSQL to plan the query.

  • Implication: Very fast, as expected for a simple indexed query.


6. Execution Time: 0.025 ms

  • Meaning: Total time taken to execute the query.

  • Implication: About 150x faster than the 3.836 ms sequential scan; the index reduced the work to a handful of B-tree page reads.

More complex example

Assume the following tables exist (non-essential columns elided):

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
...
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
...
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(user_id),
    product_id INT REFERENCES products(product_id),
...
);
EXPLAIN ANALYZE
SELECT o.order_id, p.name, o.quantity, p.price
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN products p ON o.product_id = p.product_id
WHERE u.email = 'user500@example.com'
  AND p.category = 'Electronics';
Nested Loop  (cost=8.59..1911.00 rows=2 width=25) (actual time=0.829..10.144 rows=2 loops=1)
  ->  Hash Join  (cost=8.31..1908.04 rows=10 width=12) (actual time=0.824..10.114 rows=11 loops=1)
        Hash Cond: (o.user_id = u.user_id)
        ->  Seq Scan on orders o  (cost=0.00..1637.09 rows=100009 width=16) (actual time=0.007..5.726 rows=100000 loops=1)
        ->  Hash  (cost=8.30..8.30 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Index Scan using users_email_key on users u  (cost=0.29..8.30 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=1)
                    Index Cond: (email = 'user500@example.com'::text)
  ->  Index Scan using products_pkey on products p  (cost=0.28..0.30 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=11)
        Index Cond: (product_id = o.product_id)
        Filter: (category = 'Electronics'::text)
        Rows Removed by Filter: 1
Planning Time: 0.217 ms
Execution Time: 10.161 ms
Explanation

🔁 Outer Loop: Nested Loop

  • Type: Nested Loop

  • Cost: 8.59..1911.00

  • Rows: Expected 2 rows

  • Actual Time: 0.829..10.144 ms

  • Loops: 1

This is the top-level operation. It joins the result of the Hash Join (inner query) with the products table using a nested loop.


🔁 Inner Loop: Hash Join

  • Type: Hash Join

  • Cost: 8.31..1908.04

  • Rows: Expected 10 rows

  • Actual Time: 0.824..10.114 ms

  • Loops: 1

This joins the orders table with the users table using a hash join.


📄 Seq Scan on orders o

  • Type: Seq Scan

  • Cost: 0.00..1637.09

  • Rows: 100,009

  • Actual Time: 0.007..5.726 ms

  • Loops: 1

  • Rows Scanned: 100,000

This is a full table scan of the orders table — very inefficient. It's scanning all 100,000 rows just to find a few matching orders for one user.


🧠 Hash on users u

  • Type: Hash

  • Cost: 8.30..8.30

  • Rows: 1

  • Actual Time: 0.017..0.018 ms

  • Loops: 1

This builds a hash table from the result of the inner index scan (on users), to speed up the join with orders.
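A toy model of this hash join, with hypothetical user_id and order_id values: the small side (the one matching user) is hashed, then the large side (orders) is streamed and probed against it.

```python
# Build side: hash the single matching user by user_id.
users_by_id = {42: "user500@example.com"}        # hypothetical user_id
# Probe side: stream every order and look up its user_id in the hash.
orders = [(1, 42), (2, 7), (3, 42), (4, 99)]     # (order_id, user_id), hypothetical
matching = [(oid, uid) for oid, uid in orders if uid in users_by_id]
print(matching)  # → [(1, 42), (3, 42)]
```

The build side is tiny (1 row, 9kB of memory in the plan above); the expense is entirely in streaming all 100,000 orders through the probe.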


🔍 Index Scan using users_email_key on users u

  • Type: Index Scan

  • Cost: 0.29..8.30

  • Rows: 1

  • Actual Time: 0.015..0.015 ms

  • Loops: 1

This is efficient — it uses the unique index on users.email to find the user user500@example.com.


🔁 Index Scan using products_pkey on products p

  • Type: Index Scan

  • Cost: 0.28..0.30

  • Rows: 1

  • Actual Time: 0.002..0.002 ms

  • Loops: 11

This scans the products table using the primary key (product_id) — efficient per-row lookup, but it runs 11 times (once per row from the hash join).
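Note that the per-loop figures in the plan are averages, so the total inner-side work of a nested loop is roughly the per-loop time multiplied by `loops`:

```python
# Total inner-side time of the nested loop ≈ time_per_loop * loops.
time_per_loop_ms = 0.002
loops = 11
print(round(time_per_loop_ms * loops, 3))  # → 0.022
```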


🧹 Filter: (category = 'Electronics'::text)

  • Rows Removed by Filter: 1

Even though the index finds the product by ID, the category filter is applied after the index lookup — meaning the index doesn't help with filtering by category.


📊 Summary Table

Step | Type | Rows Scanned | Time (ms) | Notes
Seq Scan on orders | Sequential Scan | 100,000 | ~5.7 | ❌ Inefficient
Index Scan on users | Index Scan | 1 | ~0.015 | ✅ Efficient
Hash Join | Hash Join | 11 rows | ~10.1 | OK, but could be better
Index Scan on products | Index Scan | 11 times | ~0.002 ×11 | ✅ Efficient per row
Filter on category | In-memory filter | 1 removed | N/A | ❌ Not indexed


🧠 Key Observations

  • The main bottleneck is the full scan of the orders table — it's scanning 100,000 rows to find a few matching orders.

  • The users table is efficiently accessed via an index on email.

  • The products table is accessed efficiently via its primary key, but the category filter is applied after the index lookup.

  • The Nested Loop and Hash Join are not the problem — they're just the tools used to join data. The real issue is missing indexes on the join and filter columns.


✅ How to Improve This

1. Add an index on orders.user_id

CREATE INDEX idx_orders_user_id ON orders(user_id);

This will allow PostgreSQL to use an index scan instead of a full table scan when joining orders with users.


2. Add a composite index on (orders.user_id, orders.product_id)

CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);

This lets PostgreSQL join and filter in one step. Its leading column also makes the single-column index from step 1 redundant, so you only need one of the two.
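A toy model of why the composite index makes this lookup cheap: entries are kept sorted by (user_id, product_id), so all orders for one user form a contiguous range that two binary searches can delimit (values hypothetical):

```python
from bisect import bisect_left

# Index entries sorted by (user_id, product_id), as in a composite B-tree.
index_entries = sorted([(1, 10), (1, 12), (2, 10), (2, 11), (3, 10)])
user_id = 2
lo = bisect_left(index_entries, (user_id,))      # first entry for user 2
hi = bisect_left(index_entries, (user_id + 1,))  # first entry past user 2
print(index_entries[lo:hi])  # → [(2, 10), (2, 11)]
```

This is also why column order matters: the index above can serve "all orders for a user" but not "all orders for a product" on its own.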


3. Add an index on products.category

CREATE INDEX idx_products_category ON products(category);

This allows PostgreSQL to filter by category using an index, instead of scanning and filtering in memory.

After index creation

CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);
CREATE INDEX idx_products_category ON products(category);
Nested Loop  (cost=4.93..52.09 rows=2 width=25) (actual time=0.652..0.680 rows=2 loops=1)
  ->  Nested Loop  (cost=4.66..49.14 rows=10 width=12) (actual time=0.648..0.661 rows=11 loops=1)
        ->  Index Scan using users_email_key on users u  (cost=0.29..8.30 rows=1 width=4) (actual time=0.630..0.630 rows=1 loops=1)
              Index Cond: (email = 'user500@example.com'::text)
        ->  Bitmap Heap Scan on orders o  (cost=4.37..40.74 rows=10 width=16) (actual time=0.015..0.027 rows=11 loops=1)
              Recheck Cond: (user_id = u.user_id)
              Heap Blocks: exact=11
              ->  Bitmap Index Scan on idx_orders_user_product  (cost=0.00..4.37 rows=10 width=0) (actual time=0.008..0.008 rows=11 loops=1)
                    Index Cond: (user_id = u.user_id)
  ->  Index Scan using products_pkey on products p  (cost=0.28..0.30 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=11)
        Index Cond: (product_id = o.product_id)
        Filter: (category = 'Electronics'::text)
        Rows Removed by Filter: 1
Planning Time: 3.442 ms
Execution Time: 0.698 ms
Explanation

🔁 Outer Loop: Nested Loop

  • Type: Nested Loop

  • Cost: 4.93..52.09

  • Rows: Expected 2 rows

  • Actual Time: 0.652..0.680 ms

  • Loops: 1

This is the top-level loop joining the result of the inner loop (user + orders) with the products table.


🔁 Inner Loop: Nested Loop

  • Type: Nested Loop

  • Cost: 4.66..49.14

  • Rows: Expected 10 rows

  • Actual Time: 0.648..0.661 ms

  • Loops: 1

This loop joins the users table with the orders table.


🔍 Index Scan using users_email_key on users u

  • Type: Index Scan

  • Cost: 0.29..8.30

  • Rows: 1

  • Actual Time: 0.630..0.630 ms

  • Loops: 1

Efficiently finds the user user500@example.com using the unique index on email.


🧠 Bitmap Heap Scan on orders o

  • Type: Bitmap Heap Scan

  • Cost: 4.37..40.74

  • Rows: 10

  • Actual Time: 0.015..0.027 ms

  • Loops: 1

  • Heap Blocks: 11 exact matches

This is where the composite index idx_orders_user_product on orders(user_id, product_id) shines:

  • It finds all orders for the user efficiently.

  • It uses a Bitmap Index Scan to gather matching TIDs (tuple IDs), then fetches the actual rows.
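A toy model of that two-phase access, with hypothetical tuple IDs: the index scan produces (block, offset) TIDs in index order, and the bitmap sorts them into physical order so each heap block is fetched at most once.

```python
# TIDs as produced by the index scan: (heap block, offset), hypothetical values.
tids = [(7, 2), (3, 1), (7, 5), (3, 4), (9, 1)]
# The bitmap deduplicates and orders blocks so each is read once, sequentially.
heap_blocks = sorted({block for block, _ in tids})
print(heap_blocks)  # → [3, 7, 9]
```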


📦 Bitmap Index Scan on idx_orders_user_product

  • Type: Bitmap Index Scan

  • Cost: 0.00..4.37

  • Rows: 10

  • Actual Time: 0.008..0.008 ms

  • Loops: 1

This is the core optimization:

  • It uses your composite index on (user_id, product_id) to find matching orders.

  • It's very fast and avoids scanning the entire table.


🔁 Final Loop: Index Scan using products_pkey on products p

  • Type: Index Scan

  • Cost: 0.28..0.30

  • Rows: 1

  • Actual Time: 0.002..0.002 ms

  • Loops: 11

This scans the products table using the primary key (product_id) — efficient per-row lookup.


🧹 Filter: (category = 'Electronics'::text)

  • Rows Removed by Filter: 1

This is still a post-index filter — the index on products(product_id) doesn't help with filtering by category.


📊 Summary Table

Step | Type | Rows Scanned | Time (ms) | Notes
Index Scan on users | Index Scan | 1 | ~0.63 | ✅ Efficient
Bitmap Index Scan on orders | Bitmap Index Scan | 11 | ~0.008 | ✅ Efficient
Bitmap Heap Scan on orders | Bitmap Heap Scan | 11 | ~0.015 | ✅ Efficient
Index Scan on products | Index Scan | 11 times | ~0.002 ×11 | ✅ Efficient
Filter on category | In-memory filter | 1 removed | N/A | ❌ Not indexed


🧠 Key Improvements

Before | After | Improvement
Full scan of 100,000 rows in orders | Only 11 rows scanned | ✅ 99.99% fewer rows
High execution time (~10 ms) | Execution time ~0.7 ms | ✅ ~14x faster
No index on orders.user_id | Composite index used | ✅ Index-based join
Filter on products.category | idx_products_category exists but is not used by this PK lookup | ⚠️ Opportunity for further optimization
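Sanity-checking the speedup claim from the two Execution Time figures:

```python
# Execution times taken from the two plans above.
before_ms = 10.161  # sequential-scan plan
after_ms = 0.698    # indexed plan
print(round(before_ms / after_ms, 1))  # → 14.6
```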


✅ Final Recommendation

You've already made huge performance gains by adding:

  • CREATE INDEX idx_orders_user_product ON orders(user_id, product_id);

  • CREATE INDEX idx_products_category ON products(category);

🚀 Bonus Tip: The remaining in-memory filter is on products.category, so any index meant to remove it must be on products, not orders. A composite index that covers both the lookup key and the filter column (index names here are hypothetical) lets PostgreSQL evaluate the category condition inside the index:

CREATE INDEX idx_products_id_category ON products(product_id, category);

Or better yet, if you often filter by one specific category:

CREATE INDEX idx_products_electronics ON products(product_id)
WHERE category = 'Electronics';

The second form is a partial index, and it can be extremely fast for common filters.

To check whether your indexes are actually being used, query the statistics view (watch the idx_scan column; an index that stays at 0 scans under a representative workload is a candidate for removal):

SELECT * FROM pg_stat_user_indexes;
