PostgreSQL

Useful page with "Don't Do This" advice for Postgres: LINK

Terminology

  • A PostgreSQL instance is called a cluster, i.e. one instance can serve multiple databases.

  • A database can be organised into schemas.

  • Users

    • normal

      • allowed operations are determined by the granted privilege set

    • super

      • can do anything with any DB object

  • PostgreSQL relies on the underlying filesystem

    • all content is kept in the PGDATA directory

      • the directory consists of the write-ahead logs (WALs)

      • and of the data storage itself

  • Connection handling (pseudocode): the postmaster process listens for client connections and forks a dedicated backend process for each one.

    start_PostgreSQL_cluster();
    Process postmaster = PostgreSQL.launchProcess();
    
    // the postmaster waits for incoming client connections
    postmaster.wait(for_client_connection);
    
    if (client_connection.open()) {
        // one backend process is forked per client connection
        Process backend_process = postmaster.forkProcess();
        backend_process.serveOnly(client_connection);
        
        // when the connection ends, its backend process goes away too
        when (client_connection.ends()) {
            backend_process.destroy();
        }
    }

How to install on macOS

brew install libpq
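
Note that libpq is a keg-only formula that only ships the client library and CLI tools such as psql; a local server needs a separate formula. A minimal sketch, assuming Homebrew's postgresql@16 formula:

# client tools only (psql, pg_dump, ...); keg-only, so link it onto PATH
brew install libpq
brew link --force libpq

# local server: install it and run it as a background service
brew install postgresql@16
brew services start postgresql@16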

Useful commands

Connect to a database: psql -d mart_db -h localhost -U postgres

  • \l list all databases

  • \c database_name connect to specific database

  • \dn list all schemas

  • \df list all stored procedures and functions

  • \dv list all views

  • \dt list all tables in the current database

  • \dt+ get more information on tables in the current database

  • \d+ table_name get detailed information on a table

  • SET ROLE new_role; change role for the current session to the new_role

  • \du list roles

    • SELECT * FROM information_schema.table_privileges; queries granted privileges directly

mart_db=# set role readonly     -- the terminating ; is missing
mart_db-# \g                    -- so psql shows the -# continuation prompt; \g also terminates a command
SET

Roles

Wonderful AWS article about users and roles with great examples.

PostgreSQL lets you grant permissions directly to the database users. However, as a good practice, it is recommended that you create multiple roles with specific sets of permissions based on application and access requirements. Then assign the appropriate role to each user.

The master user that is created during Amazon RDS and Aurora PostgreSQL instance creation should be used only for database administration tasks like creating other users, roles, and databases. The master user should never be used by the application.

⚠️ Keep in mind the public schema and the public role:

new_database_created();
PostgreSQL.run {
   new_default_schema_is_created();              // named public
   grant_access_on_this_schema_to_public_role(); // every role is implicitly a member of the public role
}

Concept of the search path: if you do not specify a schema when referencing a table, the search path is used (basically an ordered list of schemas in which to look the table up).

cars=> show search_path;
   search_path
-----------------
 "$user", public
(1 row)
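
The search path can be changed per session or per role (myschema is an illustrative name here):

-- prepend a schema for the current session only
SET search_path TO myschema, public;

-- or persist it for a role (applies to new sessions)
ALTER ROLE alice SET search_path TO myschema, public;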

Details on the SET ROLE command

cars=# SELECT SESSION_USER, CURRENT_USER;
 session_user | current_user
--------------+--------------
 postgres     | postgres
(1 row)

cars=# SET ROLE readonly;
SET
cars=> SELECT SESSION_USER, CURRENT_USER;
 session_user | current_user
--------------+--------------
 postgres     | readonly

Commands for readonly and readwrite user creation

-- Revoke privileges from 'public' role
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON DATABASE mydatabase FROM PUBLIC;

-- Read-only role
CREATE ROLE readonly;
GRANT CONNECT ON DATABASE mydatabase TO readonly;
GRANT USAGE ON SCHEMA myschema TO readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA myschema TO readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema GRANT SELECT ON TABLES TO readonly;

-- Read/write role
CREATE ROLE readwrite;
GRANT CONNECT ON DATABASE mydatabase TO readwrite;
GRANT USAGE, CREATE ON SCHEMA myschema TO readwrite;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA myschema TO readwrite;
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA myschema TO readwrite;
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema GRANT USAGE ON SEQUENCES TO readwrite;

-- Users creation
CREATE USER reporting_user1 WITH PASSWORD 'some_secret_passwd';
CREATE USER reporting_user2 WITH PASSWORD 'some_secret_passwd';
CREATE USER app_user1 WITH PASSWORD 'some_secret_passwd';
CREATE USER app_user2 WITH PASSWORD 'some_secret_passwd';

-- Grant privileges to users
GRANT readonly TO reporting_user1;
GRANT readonly TO reporting_user2;
GRANT readwrite TO app_user1;
GRANT readwrite TO app_user2;
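
To double-check what the new roles can actually do, the granted table privileges can be inspected via the standard information_schema (a quick sketch):

SELECT grantee, table_schema, table_name, privilege_type
FROM information_schema.role_table_grants
WHERE grantee IN ('readonly', 'readwrite')
ORDER BY grantee, table_name;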

Inspiration article about how to split roles for db migration and application usage LINK

Role attributes

CREATE ROLE name WITH option;

In this syntax, the WITH keyword is optional, and option can be one or more attributes, including SUPERUSER, CREATEDB, CREATEROLE, and LOGIN.

Create login role

CREATE ROLE alice LOGIN PASSWORD 'securePass1';

This creates a role with the LOGIN attribute and an initial password. Now it is possible to log in using this role: psql -d mart_db -h localhost -U alice

Create superuser role

CREATE ROLE john SUPERUSER LOGIN PASSWORD 'securePass1';

A superuser can override all access restrictions within the database, so create this role only when really needed. Note that you must be a superuser in order to create another superuser role.

Create roles that can create databases

CREATE ROLE dba CREATEDB LOGIN PASSWORD 'Abcd1234';

Create roles with validity period

To set a date and time after which the role's password is no longer valid, you use the VALID UNTIL attribute:

CREATE ROLE dev_api WITH LOGIN PASSWORD 'securePass1' VALID UNTIL '2030-01-01';

Create roles with connection limit

To specify the number of concurrent connections a role can make, you use the CONNECTION LIMIT attribute:

CREATE ROLE api LOGIN PASSWORD 'securePass1' CONNECTION LIMIT 1000;

Alter Role

ALTER ROLE role_name | CURRENT_USER | SESSION_USER | ALL
[IN DATABASE database_name]
SET configuration_param = { value | DEFAULT }
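
For example, to override a configuration parameter for one role in one database (reusing roles created above; statement_timeout is just an illustration), or to change other attributes with the same command:

ALTER ROLE alice IN DATABASE mart_db SET statement_timeout = '10s';

-- other attributes use the plain ALTER ROLE form
ALTER ROLE api CONNECTION LIMIT 500;
ALTER ROLE dev_api VALID UNTIL '2031-01-01';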

Create database

CREATE DATABASE database_name
WITH
   [OWNER =  role_name]
   [TEMPLATE = template]
   [ENCODING = encoding]
   [LC_COLLATE = collate]
   [LC_CTYPE = ctype]
   [TABLESPACE = tablespace_name]
   [ALLOW_CONNECTIONS = true | false]
   [CONNECTION LIMIT = max_concurrent_connection]
   [IS_TEMPLATE = true | false ]

OWNER

Assign a role that will be the owner of the database. If you omit the OWNER option, the owner of the database is the role that you use to execute the CREATE DATABASE statement.

TEMPLATE

Specify the template database from which the new database is created. By default, PostgreSQL uses the template1 database as the template database if you don’t explicitly specify the template database.

ENCODING

Determine the character set encoding in the new database.

LC_COLLATE

Specify the collation order (LC_COLLATE) that the new database will use. This parameter affects the sort order of strings in queries that contain the ORDER BY clause. It defaults to the LC_COLLATE of the template database.

LC_CTYPE

Specify the character classification that the new database will use. It affects the classification of characters, e.g., lower, upper, and digit. It defaults to the LC_CTYPE of the template database.

TABLESPACE

Specify the tablespace name for the new database. The default is the tablespace of the template database.

CONNECTION LIMIT

Specify the maximum concurrent connections to the new database. The default is -1 i.e., unlimited. This parameter is useful in the shared hosting environments where you can configure the maximum concurrent connections for a particular database.

ALLOW_CONNECTIONS

The allow_connections parameter is a boolean value. If it is false, you cannot connect to the database.

IS_TEMPLATE

If the IS_TEMPLATE is true, any user with the CREATEDB privilege can clone it. If false, only superusers or the database owner can clone it.
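
Putting a few of these options together (the owner role dba was created earlier; the other values are purely illustrative):

CREATE DATABASE mydatabase
WITH
   OWNER = dba
   TEMPLATE = template0
   ENCODING = 'UTF8'
   CONNECTION LIMIT = 100
   IS_TEMPLATE = false;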

Schema

Create

CREATE SCHEMA [IF NOT EXISTS] schema_name;

It is also possible to create a schema for a user; in this case the schema name will match the username:

CREATE SCHEMA [IF NOT EXISTS] AUTHORIZATION username;

CREATE SCHEMA IF NOT EXISTS doe AUTHORIZATION john;

In the case above, a schema named doe is created and john is its owner.

PostgreSQL also allows you to create a schema and a list of objects such as tables and views using a single statement as follows:

CREATE SCHEMA schema_name
    CREATE TABLE table_name1 (...)
    CREATE TABLE table_name2 (...)
    CREATE VIEW view_name1
        SELECT select_list FROM table_name1;

Notice that each subcommand does not end with a semicolon (;).

\dn list schemas

SELECT * FROM pg_catalog.pg_namespace ORDER BY nspname; query schemas

Alter

Rename a schema (you must own the schema and have the CREATE privilege on the database):

ALTER SCHEMA schema_name RENAME TO new_name;

It is also possible to change the owner of a schema:

ALTER SCHEMA schema_name OWNER TO { new_owner | CURRENT_USER | SESSION_USER};

Drop

DROP SCHEMA [IF EXISTS] schema_name 
[ CASCADE | RESTRICT ];

Use CASCADE to delete the schema together with all of its objects and, in turn, all objects that depend on those objects. If you want to delete the schema only when it is empty, use the RESTRICT option. By default, DROP SCHEMA uses RESTRICT.

To execute the DROP SCHEMA statement, you must be the owner of the schema that you want to drop or a superuser.

Keys

Primary

CREATE TABLE users (
  id serial,
  username VARCHAR(25) NOT NULL,
  PRIMARY KEY (id)
);

With many RDBMSs, the AUTO_INCREMENT keyword is used for this. With PostgreSQL, we use serial to auto-increment our ids and set their type to integer. It is not required to manually specify the primary key value each time you insert data; PostgreSQL does it for you when the id column is declared as serial.
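
For example, inserting without an id lets the sequence behind the serial column fill it in:

INSERT INTO users (username) VALUES ('alice');
-- id is assigned automatically from the users_id_seq sequence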

Foreign key columns are used to reference another row of data, perhaps in another table. In order to reference another row, the database needs a unique identifier for that row. Therefore, foreign key columns contain the value of the referenced row's primary key. For instance, a table might have a column called user_id as a foreign key column, which corresponds to the id column of the users table. Foreign keys are how an RDBMS sets up relationships between rows of data, either in the same table or across tables.

CREATE TABLE reviews (
  id serial,
  book_id int NOT NULL,
  user_id int NOT NULL,
  review_content VARCHAR(255),
  
  PRIMARY KEY (id),
  FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

The ON DELETE CASCADE clause indicates that if a book is deleted, all reviews associated with that book are also deleted.

CREATE TABLE users_books (
  user_id int NOT NULL ,
  book_id int NOT NULL,
  checkout_date timestamp,
  
  PRIMARY KEY (user_id, book_id),
  FOREIGN KEY (user_id) REFERENCES users(id) ON UPDATE CASCADE,
  FOREIGN KEY (book_id) REFERENCES books(id) ON UPDATE CASCADE
);

Besides the primary key, a database also allows applying a UNIQUE constraint to any column. For example, in the books table, we may choose not only to have id as the unique primary key, but also to store the book's ISBN. The ISBN (International Standard Book Number) is a unique 10-digit number for a book (in practice it is safer to store ISBNs as text, since ISBN-13 values overflow integer and an ISBN-10 check digit can be 'X', but the integer column below keeps the example simple). We can design our database to make this column unique. Even though it is not a primary key, duplicate ISBNs will not be allowed in the books table, and data quality will be maintained.

CREATE TABLE books (
  id serial,
  title VARCHAR(100) NOT NULL,
  author VARCHAR(100) NOT NULL,
  published_date timestamp NOT NULL,
  isbn int,
  
  PRIMARY KEY (id),
  UNIQUE (isbn)
);
library=# \d books
                                       Table "public.books"
     Column     |            Type             |                     Modifiers
----------------+-----------------------------+----------------------------------------------------
 id             | integer                     | not null default nextval('books_id_seq'::regclass)
 title          | character varying(100)      | not null
 author         | character varying(100)      | not null
 published_date | timestamp without time zone | not null
 isbn           | integer                     |
Indexes:
    "books_pkey" PRIMARY KEY, btree (id)
    "books_isbn_key" UNIQUE CONSTRAINT, btree (isbn)

library=# INSERT INTO books( title , author, published_date, isbn) values ('My first rails book', 'Mereda', now(),1234567890);
INSERT 0 1
library=# INSERT INTO books( title , author, published_date, isbn) values ('My first rails book', 'Mereda', now(),1234567890);
ERROR:  duplicate key value violates unique constraint "books_isbn_key"
DETAIL:  Key (isbn)=(1234567890) already exists.

NOT NULL

We have seen "NOT NULL" in many of our SQL statements. The NOT NULL constraint prevents a column from allowing a null value (i.e. this column must contain data). This is important for key fields. For example: If you want to add a user to a database, you must provide a username. Without this constraint the database would get filled with null values and become less useful.

Materialized View

View

  • A virtual table built from one or more base tables; it is never stored on disk.

  • Always up to date: the defining query is re-executed every time the view is accessed, so the data is never stale.

  • Slower processing: every access recomputes the query, and changes made through the view go straight to the base tables (and changes in the base tables are immediately visible through the view).

  • Use when the view query is fast (or you can tolerate the slowness) and you cannot tolerate stale data.

Materialized View

  • A physical copy of the query result, stored on disk.

  • Refreshed manually or with triggers; a common approach is to define triggers on the tables that feed the materialized view and let those events execute the refresh.

  • Faster processing: it responds quicker than a plain view because the result is precomputed.

  • Precomputed and stored on disk like an object; it is not recomputed on each access and has to be refreshed manually or by a trigger (the refresh process is known as materialized view maintenance).

  • Use when the view query is slow, you cannot tolerate that slowness, and you have some tolerance for stale data.

Interesting article about real world usage of MATVIEW

I can understand the policy of "no VIEWs allowed". It's simply because in the past I did come across heavy queries making use of VIEWs more than once. The main problem is that if - for some reason - your predicates don't get pushed down into the VIEWs definition, you end up fetching all the rows that the VIEW in question would fetch without the predicate - then you filter through them. That's quite a penalty and you probably wouldn't use the VIEW in the first place if you knew about that.

(the quote above is about plain, non-materialized views)

-- create a materialized view
CREATE MATERIALIZED VIEW my_view AS
  your query here;

-- refresh a materialized view
REFRESH MATERIALIZED VIEW my_view;

Executing this refresh query will lock the materialized view so it can’t be accessed while refreshing. This can be a problem if your application can’t tolerate downtime while the refresh is happening.

PostgreSQL 9.4 allows you to refresh your view in a way that enables queries during the refresh:

REFRESH MATERIALIZED VIEW CONCURRENTLY my_view;

To better optimize your materialized view queries, you can add indexes to the materialized view columns just as you would with a database table.

CREATE INDEX my_index_1 ON my_view (some_column);
CREATE INDEX my_index_2 ON my_view (some_other_column);
...
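
A concrete sketch tying this together, based on the ad_clicks table used in the functions section below (the clicked_at column is an assumption). Note that REFRESH ... CONCURRENTLY requires a unique index on the materialized view:

CREATE MATERIALIZED VIEW daily_clicks AS
  SELECT page, date_trunc('day', clicked_at) AS day, count(*) AS clicks
  FROM ad_clicks
  GROUP BY page, date_trunc('day', clicked_at);

-- the unique index is what makes CONCURRENTLY possible (and speeds up lookups)
CREATE UNIQUE INDEX daily_clicks_page_day ON daily_clicks (page, day);

-- readers can keep querying daily_clicks while the refresh runs
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_clicks;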

Procedure

By default, PostgreSQL supports three procedural languages: SQL, PL/pgSQL, and C. You can also load other procedural languages e.g., Perl, Python, and TCL into PostgreSQL using extensions.

Advantages

  • Fewer round trips between application and database server: the SQL statements are wrapped inside a function stored in the PostgreSQL server, so the application only issues one call to get the result instead of sending multiple SQL statements and waiting for the result of each.

  • Better application performance, because user-defined functions and stored procedures are pre-compiled and stored in the PostgreSQL database server.

  • Reusable in many applications: once you develop a function, you can reuse it in any application.

Disadvantages

  • Slower software development, because stored procedure programming requires specialized skills that many developers do not possess.

  • Difficult to manage versions and hard to debug.

  • May not be portable to other database management systems, e.g., MySQL or Microsoft SQL Server.

create or replace procedure transfer(
   sender int,
   receiver int, 
   amount dec
)
language plpgsql    
as $$
begin
    -- subtracting the amount from the sender's account 
    update accounts 
    set balance = balance - amount 
    where id = sender;

    -- adding the amount to the receiver's account
    update accounts 
    set balance = balance + amount 
    where id = receiver;

    commit;
end;$$;
-- how to call/execute a procedure
call transfer(1, 2, 1000);

User-defined function

create [or replace] function function_name(param_list)
   returns return_type 
   language plpgsql
  as
$$
declare 
-- variable declaration
begin
 -- logic
end;
$$;

Examples:
create function get_film_count(len_from int, len_to int)
returns int
language plpgsql
as
$$
declare
   film_count integer;
begin
   select count(*) 
   into film_count
   from film
   where length between len_from and len_to;
   
   return film_count;
end;
$$;
create or replace function count_by_page(page_name text)
returns int
language plpgsql
as
$$
declare
   page_count integer;
begin
   select count(*) 
   into page_count
   from ad_clicks
   where page=page_name;
   
   return page_count;
end;
$$;

select get_film_count(40,90);
-- or, using named parameters:
select get_film_count(
    len_from => 40, 
     len_to => 90
);

Function stability

Functions can have different stabilities, meaning that calling them multiple times with the same inputs can result in different outputs, or that the result can be affected by parameters that can be changed at any point. You can declare the stability on the function by specifying either:

IMMUTABLE — given the same inputs, the result will never change no matter what

STABLE — given the same inputs, the result can change between statements

VOLATILE — given the same inputs, the result can change at any point, and may even make changes to the database to make subsequent call results change

Why would you care about specifying which one of these applies? Well, PostgreSQL's optimizer will assess whether it can cache the result of the function when the call is being made multiple times, which saves the function from being called unnecessarily.

If your function takes two parameters and adds them together, the result will be IMMUTABLE, because nothing can affect the operation, and the planner knows that if it has called the function before with the same parameters, it doesn't need to bother running it again; it just needs to remember what the result was from last time. This can be particularly helpful if a lot of calculations are performed, as it will save a lot of time and improve performance.

However, if your result depends on settings such as the time zone, or your function calls functions that are themselves STABLE or VOLATILE, then your function cannot be declared any stricter than that and has to be STABLE or VOLATILE itself.

You can declare the stability on your function like this:

CREATE FUNCTION add_numbers (first_number integer, second_number integer) RETURNS integer
LANGUAGE SQL IMMUTABLE
AS $$
  SELECT $1 + $2;
$$;
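
Conversely, a sketch of a function that must not be IMMUTABLE because it calls now(), which is itself STABLE (order_age is a made-up name):

CREATE FUNCTION order_age (ordered_at timestamptz) RETURNS interval
LANGUAGE SQL STABLE
AS $$
  -- now() returns the transaction start time: stable within a transaction,
  -- different across transactions, so IMMUTABLE would be wrong here
  SELECT now() - $1;
$$;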

Function vs Procedure

Function

  • Returns a result.

  • Cannot execute transaction control statements (COMMIT / ROLLBACK).

Procedure

  • Does not return a result (does the work and returns control to the caller).

  • Supports transaction control statements.

Data types

Character types

 Name                             | Description
----------------------------------+----------------------------
 character varying(n), varchar(n) | variable-length with limit
 character(n), char(n)            | fixed-length, blank padded
 text                             | variable unlimited length

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
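
A small illustration of the length limit (t is a throwaway table):

CREATE TABLE t (a varchar(5));
INSERT INTO t VALUES ('too long');
-- fails: value too long for type character varying(5)

-- an explicit cast, in contrast, truncates silently
SELECT 'too long'::varchar(5);   -- 'too l'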

Date

The DATE type uses 4 bytes and stores values in yyyy-mm-dd format.

CREATE TABLE documents (
	posting_date DATE NOT NULL DEFAULT CURRENT_DATE
);
-- select current date
SELECT CURRENT_DATE;
SELECT NOW()::date;

-- use specific format
SELECT TO_CHAR(NOW() :: DATE, 'dd/mm/yyyy');
SELECT TO_CHAR(NOW() :: DATE, 'Mon dd, yyyy');

-- get interval between two dates
SELECT now() - hire_date as diff FROM employees;

-- extract year/.. from value
SELECT
	EXTRACT (YEAR FROM birth_date) AS YEAR,
	EXTRACT (MONTH FROM birth_date) AS MONTH,
	EXTRACT (DAY FROM birth_date) AS DAY
FROM employees;

Timestamp

  • timestamp: a timestamp without a time zone.

    • It means that when you change the timezone of your database server, the timestamp value stored in the database will not change automatically.

  • timestamptz: timestamp with a timezone.

    • When you insert a value into a timestamptz column, PostgreSQL converts the timestamptz value into a UTC value and stores the UTC value in the table.

    • When you query timestamptz from the database, PostgreSQL converts the UTC value back to the time value of the timezone set by the database server, the user, or the current database connection.

It’s important to note that timestamptz value is stored as a UTC value. PostgreSQL does not store any timezone data with the timestamptz value.

Both timestamp and timestamptz take 8 bytes of storage.

Usage example

-- show current date time
SELECT NOW();
SELECT CURRENT_TIMESTAMP;

SHOW TIMEZONE;
SELECT timezone('America/New_York','2016-06-01 00:00');
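
A quick sketch of the difference between the two types (ts_demo is a throwaway table; the time zone value is just an example):

CREATE TABLE ts_demo (plain timestamp, with_tz timestamptz);
INSERT INTO ts_demo VALUES ('2024-01-01 10:00', '2024-01-01 10:00');

SET timezone = 'America/New_York';
-- with_tz is rendered in the new session time zone, plain is returned exactly as stored
SELECT * FROM ts_demo;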

Numeric types

 Name             | Storage Size | Description                     | Range
------------------+--------------+---------------------------------+----------------------------------------------
 smallint         | 2 bytes      | small-range integer             | -32768 to +32767
 integer          | 4 bytes      | typical choice for integer      | -2147483648 to +2147483647
 bigint           | 8 bytes      | large-range integer             | -9223372036854775808 to +9223372036854775807
 decimal          | variable     | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
 numeric          | variable     | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
 real             | 4 bytes      | variable-precision, inexact     | 6 decimal digits precision
 double precision | 8 bytes      | variable-precision, inexact     | 15 decimal digits precision
 smallserial      | 2 bytes      | small autoincrementing integer  | 1 to 32767
 serial           | 4 bytes      | autoincrementing integer        | 1 to 2147483647
 bigserial        | 8 bytes      | large autoincrementing integer  | 1 to 9223372036854775807
