PostreSQL

Terminology

  • PostgreSQL instance is called a cluster, i.e. an instance can serve multiple DBs.

  • DB can be organised into schemas

  • Users

    • normal

      • operations are based on privilege set

    • super

      • do anything with any DB object

  • PostgreSQL relies on underlying filesystem

    • all content is kept in directory PGDATA

      • directory is made by write-ahead logs (WALs)

      • and by data storage

  • start_PostgreSQL_cluster();
    Process postmaster = PostgreSQL.launchProcess();
    
    postmaster.wait(for_client_connection);
    
    if (client_connection.open()) {
        Process backend_process = postmaster.forkProcess();
        backend_process.serveOnly(client_connection);
        
        when (client_connection.ends()) {
            backend_process.destroy();
        }
    }

How to install on mac

brew install libpq (sometimes I needed to execute brew link --force libpq to have pasl available)

Useful commands

connect to db psql -d mart_db -h localhost -U postgres

  • \l list all databases

  • \c database_name connect to specific database

  • \dn list all schemas

  • \df list all stored procedures and functions

  • \dv lst all views

  • \dt lists all tables in a current database

  • \dt+ get more information on tables in the current database

  • \d+ table_name get detailed information on a table

  • SET ROLE new_role; change role for the current session to the new_role

  • \du list roles

    • SELECT * FROM information_schema.table_privileges

Roles

Wonderful AWS article about users and roles with great examples.

PostgreSQL lets you grant permissions directly to the database users. However, as a good practice, it is recommended that you create multiple roles with specific sets of permissions based on application and access requirements. Then assign the appropriate role to each user.

The master user that is created during Amazon RDS and Aurora PostgreSQL instance creation should be used only for database administration tasks like creating other users, roles, and databases. The master user should never be used by the application.

⚠️Keep in mind Public schema and public role:

concept of search path: if you do not specify a schema when selecting a table this search path will be used (this is basically a list of schemas to search table in)

Detail on SET ROLE command

Commands for readonly and readwrite user creation

Inspiration article about how to split roles for db migration and application usage LINK

Role attributes

CREATE ROLE name WITH option; In this syntax, the WITH keyword is optional. And the option can be one or more attributes including SUPER, CREATEDB, CREATEROLE.

Create login role

CREATE ROLE alice LOGIN PASSWORD 'securePass1'; This will create a role with login privileges and initial password. Now it is possible to login using this role psql -d mart_db -h localhost -U alice

Create superuser role

CREATE ROLE john SUPERUSER LOGIN PASSWORD 'securePass1'; The superuser can override all access restrictions within the database therefore you should create this role only when needed. Notice that you must be a superuser in order to create another superuser role.

Create roles that can create databases

CREATE ROLE dba CREATEDB LOGIN PASSWORD 'Abcd1234';

Create roles with validity period

To set a date and time after which the role’s password is no longer valid, you use the valid until attribute:

CREATE ROLE dev_api WITH LOGIN PASSWORD 'securePass1' VALID UNTIL '2030-01-01';

Create roles with connection limit

To specify the number of concurrent connections a role can make, you use the CONNECTION LIMIT attribute

CREATE ROLE api LOGIN PASSWORD 'securePass1' CONNECTION LIMIT 1000;

Alter Role

Create database

OWNER

Assign a role that will be the owner of the database. If you omit the OWNER option, the owner of the database is the role that you use to execute the CREATE DATABASE statement.

TEMPLATE

Specify the template database from which the new database is created. By default, PostgreSQL uses the template1 database as the template database if you don’t explicitly specify the template database.

ENCODING

Determine the character set encoding in the new database.

LC_COLLATE

Specify the collation order (LC_COLLATE) that the new database will use. This parameter affects the sort order of string in the queries that contain the ORDER BY clause. It defaults to the LC_COLLATE of the template database.

LC_CTYPE

Specify the character classification that the new database will use. It affects the classification of character e.g., lower, upper, and digit. It defaults to the LC_CTYPE of the template database

TABLESPACE

Specify the tablespace name for the new database. The default is the tablespace of the template database.

CONNECTION LIMIT

Specify the maximum concurrent connections to the new database. The default is -1 i.e., unlimited. This parameter is useful in the shared hosting environments where you can configure the maximum concurrent connections for a particular database.

ALLOW_CONNECTIONS

The allow_connections parameter is a boolean value. If it is false, you cannot connect to the database.

TABLESPACE

Specify the tablespace that the new database will use. It defaults to the tablespace of the template database.

IS_TEMPLATE

If the IS_TEMPLATE is true, any user with the CREATEDB privilege can clone it. If false, only superusers or the database owner can clone it.

Schema

Create

CREATE SCHEMA [IF NOT EXISTS] schema_name;

It is also possible to create a schema for a user, in this case the schema name will match a username;

CREATE SCHEMA [IF NOT EXISTS] AUTHORIZATION username;

CREATE SCHEMA IF NOT EXISTS doe AUTHORIZATION john;

In above case it will be a schema with name doe and the owner of it will be john

PostgreSQL also allows you to create a schema and a list of objects such as tables and views using a single statement as follows:

Notice that each subcommand does not end with a semicolon (;).

\dn list schemas

SELECT * FROM pg_catalog.pg_namespace ORDER BY nspname; query schemas

Alter

Rename schema (you have to be an owner of the schema and have CREATE privilege)

ALTER SCHEMA schema_name RENAME TO new_name;

It is possible to change an owner of the schema

ALTER SCHEMA schema_name OWNER TO { new_owner | CURRENT_USER | SESSION_USER};

Drop

use CASCADE to delete schema and all of its objects, and in turn, all objects that depend on those objects. If you want to delete schema only when it is empty, you can use the RESTRICT option. By default, the DROP SCHEMA uses the RESTRICT option.

To execute the DROP SCHEMA statement, you must be the owner of the schema that you want to drop or a superuser.

Keys

Primary

With many RDBMS, the AUTO_INCREMENT keyword is used. With PostgreSQL, we use serial to auto increment our ids and set their type as integer. It is not required to manually specify the primary key value each time you insert data, PostgreSQL will do it for you if you specify serial as the type for id.

Foreign key columns are used to reference another row of data, perhaps in another table. In order to reference another row, the database needs a unique identifier for that row. Therefore, foreign key columns contain the value of the referenced row's primary key. For instance, a table might have a column called user_id as a foreign key column, which corresponds to the id column of the users table. Foreign keys are how RDBMS sets up relationships between rows of data, either in the same table or across tables.

The ON DELETE CASCADE clause indicates that if a book is deleted all reviews associated with that book are also deleted.

Besides the Primary Key, a database also allows applying a Unique constraint on any column in the database. For example, in the books table, we may choose to not only have the book_id as a unique primary key, but also need to store the book's ISBN number. The ISBN, International Standard Book Number is a 10 digit unique number for a book. We can choose to design our database to make this column unique. Even though this is not a primary key, duplicate ISBNs will not be allowed in the books table, and the data quality will be maintained.

NOT NULL

We have seen "NOT NULL" in many of our SQL statements. The NOT NULL constraint prevents a column from allowing a null value (i.e. this column must contain data). This is important for key fields. For example: If you want to add a user to a database, you must provide a username. Without this constraint the database would get filled with null values and become less useful.

Materialized View

View
Materialized View

It is a virtual table from one or more base tables which is never stored on disk

It is a physical copy of the base table and stored on disk

It is updated every time when base tables are updated

Updated manually or using triggers. One common approach is to define triggers that listen for inserts/updates on the tables that feed the materialized view and let those events execute the refresh.

It is slow processing. (if you update any content in view, it is reflected in original table. If any changes are done to original base table it would reflect in View. It makes performance of View slower.

Fast processing. It responds faster than View, because it is precomputed.

It is not precomputed as needed to compute each time when it is used or accessed. So data in view is always updated.

It is precomputed and stored on disk like an object. It is not updated each time it is used. Has to be updated manually or by trigger. Update process is Materialized View Maintenance

Use when: - The view query is fast, or you can tolerate the slowness. - You can’t tolerate stale data

Use when: - The view query is slow, and you can’t tolerate the slowness. - You have some tolerance for stale data.

Interesting article about real world usage of MATVIEW

I can understand the policy of "no VIEWs allowed". It's simply because in the past I did come across heavy queries making use of VIEWs more than once. The main problem is that if - for some reason - your predicates don't get pushed down into the VIEWs definition, you end up fetching all the rows that the VIEW in question would fetch without the predicate - then you filter through them. That's quite a penalty and you probably wouldn't use the VIEW in the first place if you knew about that.

(it was about non-Materialized View)

Executing this refresh query will lock the materialized view so it can’t be accessed while refreshing. This can be a problem if your application can’t tolerate downtime while the refresh is happening.

PostgreSQL 9.4 allows you to refresh your view in a way that enables queries during the refresh:

To better optimize your materialized view queries, you can add indexes to the materialized view columns just as you would with a database table.

Procedure

By default, PostgreSQL supports three procedural languages: SQL, PL/pgSQL, and C. You can also load other procedural languages e.g., Perl, Python, and TCL into PostgreSQL using extensions.

Advantages
Disadvantages

Reduce the number of round trips between applications and database servers. All SQL statements are wrapped inside a function stored in the PostgreSQL database server so the application only has to issue a function call to get the result back instead of sending multiple SQL statements and wait for the result between each call.

Slowness in software development because stored procedure programming requires specialized skills that many developers do not possess.

Increase application performance because the user-defined functions and stored procedures are pre-compiled and stored in the PostgreSQL database server.

Difficult to manage versions and hard to debug.

Reusable in many applications. Once you develop a function, you can reuse it in any applications.

May not be portable to other database management systems e.g., MySQL or Microsoft SQL Server.

User-defined function

examples

Function stability

Functions can have different stabilities, meaning that calling them multiple times with the same inputs can result in different outputs, or that the result can be affected by parameters that can be changed at any point. You can declare the stability on the function by specifying either:

IMMUTABLE — given the same inputs, the result will never change no matter what

STABLE — given the same inputs, the result can change between statements

VOLATILE — given the same inputs, the result can change at any point, and may even make changes to the database to make subsequent call results change

Why would you care about specifying which one of these applies? Well, PostgreSQL's optimizer will assess whether it can cache the result of the function when the call is being made multiple times, which saves the function from being called unnecessarily.

If your function takes two parameters and adds them together, the result will be IMMUTABLE, because nothing can affect the operation, and the planner knows that if it has called the function before with the same parameters, it doesn't need to bother running it again; it just needs to remember what the result was from last time. This can be particularly helpful if a lot of calculations are performed, as it will save a lot of time and improve performance.

However, if your result depends on parameters such as timezone, or use functions that are themselves STABLE or VOLATILE, then your function has to match that same stability level.

You can declare the stability on your function like this:

Function vs Procedure

Function
Procedure

Returns a result

Does not return a result (do the work and return control to the caller)

Do NOT execute transactions

Support transactions

Data types

Character types

Name
Description

character varying(n), varchar(n)

variable-length with limit

character(n), char(n)

fixed-length, blank padded

text

variable unlimited length

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

Date

DATE type uses 4 bytes, yyyy-mm-dd

Timestamp

  • timestamp: a timestamp without timezone one.

    • It means that when you change the timezone of your database server, the timestamp value stored in the database will not change automatically.

  • timestamptz: timestamp with a timezone.

    • When you insert a value into a timestamptz column, PostgreSQL converts the timestamptz value into a UTC value and stores the UTC value in the table.

    • When you query timestamptz from the database, PostgreSQL converts the UTC value back to the time value of the timezone set by the database server, the user, or the current database connection.

It’s important to note that timestamptz value is stored as a UTC value. PostgreSQL does not store any timezone data with the timestamptz value.

8 bytes

Usage example

Name
Storage Size
Description
Range

smallint

2 bytes

small-range integer

-32768 to +32767

integer

4 bytes

typical choice for integer

-2147483648 to +2147483647

bigint

8 bytes

large-range integer

-9223372036854775808 to +9223372036854775807

decimal

variable

user-specified precision, exact

up to 131072 digits before the decimal point; up to 16383 digits after the decimal point

numeric

variable

user-specified precision, exact

up to 131072 digits before the decimal point; up to 16383 digits after the decimal point

real

4 bytes

variable-precision, inexact

6 decimal digits precision

double precision

8 bytes

variable-precision, inexact

15 decimal digits precision

smallserial

2 bytes

small autoincrementing integer

1 to 32767

serial

4 bytes

autoincrementing integer

1 to 2147483647

bigserial

8 bytes

large autoincrementing integer

1 to 9223372036854775807

Last updated

Was this helpful?