Citus: Distributed PostgreSQL

Citus is an open-source PostgreSQL extension that transforms PostgreSQL into a distributed database, enabling horizontal scaling through sharding and distributed query execution. Citus maintains PostgreSQL compatibility, allowing you to use standard SQL, indexes, transactions, and JSON support while scaling out across multiple nodes.

Key Concepts in Citus:

  1. Coordinator and Worker Nodes:

    • Coordinator Node: Acts as the entry point for queries. It parses queries, plans distributed execution, and aggregates results.

    • Worker Nodes: Store and process individual shards of distributed tables.

  2. Sharding Strategies:

    • Hash Distribution: Rows are distributed across shards using a hash of the distribution column (e.g., user_id). Best for even distribution.

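    • Range Distribution: Rows are assigned to shards by value ranges of the distribution column (e.g., created_at). In current Citus releases, time-series data is more commonly handled by hash-distributing a table that is also partitioned by time using PostgreSQL's native partitioning.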

    • Reference Tables: Small lookup tables replicated across all workers for efficient joins.

  3. Distributed Query Execution:

    • Citus pushes down query execution to the worker nodes where the data resides, minimizing data transfer.

    • Supports distributed JOINs, aggregations, subqueries, and CTEs.

  4. Colocation:

    • Tables that share the same distribution column can be colocated, enabling efficient joins and transactions across them.

  5. High Availability and Failover:

    • Citus supports replication of shards across workers for fault tolerance.

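    • Shard rebalancing, formerly limited to Citus Enterprise and Citus Cloud, has been part of the open-source extension since Citus 11. Automatic failover is not built into the extension itself; it is typically provided by a managed service or by external tooling such as Patroni or pg_auto_failover.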

  6. Use Cases:

    • Multi-tenant SaaS applications

    • Real-time analytics

    • Time-series data platforms

    • Large-scale transactional systems
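
The sharding and colocation concepts above can be sketched with a small multi-tenant schema. This is a minimal illustration, not a recommended design: the table and column names (plans, accounts, events, tenant_id) and the file name citus_schema.sql are hypothetical. The Citus functions used are create_reference_table and create_distributed_table, the latter with its colocate_with parameter.

```shell
# Write an illustrative schema to a file; all table and column names are hypothetical.
cat > citus_schema.sql <<'SQL'
-- Small lookup table, replicated to every node as a reference table.
CREATE TABLE plans (
    plan_id int PRIMARY KEY,
    name    text NOT NULL
);
SELECT create_reference_table('plans');

-- Hash-distributed on tenant_id.
CREATE TABLE accounts (
    tenant_id bigint PRIMARY KEY,
    plan_id   int,
    name      text
);
SELECT create_distributed_table('accounts', 'tenant_id');

-- Same distribution column and type, explicitly colocated with accounts,
-- so rows for one tenant land on the same node in both tables.
CREATE TABLE events (
    tenant_id  bigint NOT NULL,
    event_id   bigint NOT NULL,
    payload    jsonb,
    created_at timestamptz DEFAULT now(),
    PRIMARY KEY (tenant_id, event_id)
);
SELECT create_distributed_table('events', 'tenant_id', colocate_with => 'accounts');

-- A join on the shared distribution column, plus a join to the reference table.
SELECT a.tenant_id, p.name AS plan, count(e.event_id) AS events
FROM accounts a
JOIN plans p USING (plan_id)
LEFT JOIN events e USING (tenant_id)
GROUP BY a.tenant_id, p.name;
SQL
```

Once a Citus instance is running (for example the single-node container described under "Install single-node citus"), the file can be applied with:

psql -U postgres -h localhost -d postgres -f citus_schema.sql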

Install single-node Citus

docker run -d --name citus -p 5432:5432 -e POSTGRES_PASSWORD=mypass \
           citusdata/citus:13.0

psql -U postgres -h localhost -d postgres -c "SELECT * FROM citus_version();"
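
With the container running, one way to see sharding and query pushdown in action is to distribute a small table and look at its shards and query plans. The sketch below assumes the container from the docker run command above. The table page_views, its columns, and the file name citus_inspect.sql are hypothetical. citus_shards is a metadata view provided by Citus.

```shell
# Write an illustrative inspection script; table and column names are hypothetical.
cat > citus_inspect.sql <<'SQL'
CREATE TABLE page_views (
    site_id   bigint NOT NULL,
    path      text,
    viewed_at timestamptz DEFAULT now()
);
SELECT create_distributed_table('page_views', 'site_id');

-- Load some sample rows spread over ten site_id values.
INSERT INTO page_views (site_id, path)
SELECT i % 10, '/page/' || i
FROM generate_series(1, 1000) AS i;

-- List the shards of the table and the node each one is placed on.
SELECT shard_name, nodename, nodeport
FROM citus_shards
WHERE table_name = 'page_views'::regclass;

-- Filter on the distribution column: the query can be routed to a single shard.
EXPLAIN SELECT count(*) FROM page_views WHERE site_id = 3;

-- No filter on the distribution column: the query fans out across all shards.
EXPLAIN SELECT site_id, count(*) FROM page_views GROUP BY site_id;
SQL
```

Run it with:

psql -U postgres -h localhost -d postgres -f citus_inspect.sql

Comparing the two EXPLAIN outputs should show how many tasks Citus plans for each query, which reflects whether the work goes to one shard or to all of them.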
