PostreSQL
Useful page with Don't Do This for Postgres LINK
Terminology
PostgreSQL instance is called a cluster, i.e. an instance can serve multiple DBs.
DB can be organised into schemas
Users
normal
operations are based on privilege set
super
do anything with any DB object
PostgreSQL relies on underlying filesystem
all content is kept in directory
PGDATA
directory is made by write-ahead logs (WALs)
and by data storage
How to install on mac
brew install libpq
Useful commands
connect to db psql -d mart_db -h localhost -U postgres
\l
list all databases\c database_name
connect to specific database\dn
list all schemas\df
list all stored procedures and functions\dv
lst all views\dt
lists all tables in a current database\dt+
get more information on tables in the current database\d+ table_name
get detailed information on a tableSET ROLE new_role;
change role for the current session to the new_role\du
list rolesSELECT * FROM information_schema.table_privileges
Roles
PostgreSQL lets you grant permissions directly to the database users. However, as a good practice, it is recommended that you create multiple roles with specific sets of permissions based on application and access requirements. Then assign the appropriate role to each user.
The master user that is created during Amazon RDS and Aurora PostgreSQL instance creation should be used only for database administration tasks like creating other users, roles, and databases. The master user should never be used by the application.
Detail on SET ROLE command
Commands for readonly and readwrite user creation
Inspiration article about how to split roles for db migration and application usage LINK
Role attributes
CREATE ROLE name WITH option;
In this syntax, the WITH keyword is optional. And the option can be one or more attributes including SUPER, CREATEDB, CREATEROLE.
Create login role
CREATE ROLE alice LOGIN PASSWORD 'securePass1';
This will create a role with login privileges and initial password.
Now it is possible to login using this role psql -d mart_db -h localhost -U alice
Create superuser role
CREATE ROLE john SUPERUSER LOGIN PASSWORD 'securePass1';
The superuser can override all access restrictions within the database therefore you should create this role only when needed. Notice that you must be a superuser in order to create another superuser role.
Create roles that can create databases
CREATE ROLE dba CREATEDB LOGIN PASSWORD 'Abcd1234';
Create roles with validity period
To set a date and time after which the roleβs password is no longer valid, you use the valid until attribute:
CREATE ROLE dev_api WITH LOGIN PASSWORD 'securePass1' VALID UNTIL '2030-01-01';
Create roles with connection limit
To specify the number of concurrent connections a role can make, you use the CONNECTION LIMIT
attribute
CREATE ROLE api LOGIN PASSWORD 'securePass1' CONNECTION LIMIT 1000;
Alter Role
Create database
OWNER
Assign a role that will be the owner of the database. If you omit the OWNER
option, the owner of the database is the role that you use to execute the CREATE DATABASE
statement.
TEMPLATE
Specify the template database from which the new database is created. By default, PostgreSQL uses the template1
database as the template database if you donβt explicitly specify the template database.
ENCODING
Determine the character set encoding in the new database.
LC_COLLATE
Specify the collation order (LC_COLLATE
) that the new database will use. This parameter affects the sort order of string in the queries that contain the ORDER BY
clause. It defaults to the LC_COLLATE
of the template database.
LC_CTYPE
Specify the character classification that the new database will use. It affects the classification of character e.g., lower, upper, and digit. It defaults to the LC_CTYPE
of the template database
TABLESPACE
Specify the tablespace name for the new database. The default is the tablespace of the template database.
CONNECTION LIMIT
Specify the maximum concurrent connections to the new database. The default is -1 i.e., unlimited. This parameter is useful in the shared hosting environments where you can configure the maximum concurrent connections for a particular database.
ALLOW_CONNECTIONS
The allow_connections
parameter is a boolean value. If it is false
, you cannot connect to the database.
TABLESPACE
Specify the tablespace that the new database will use. It defaults to the tablespace of the template database.
IS_TEMPLATE
If the IS_TEMPLATE
is true, any user with the CREATEDB
privilege can clone it. If false, only superusers or the database owner can clone it.
Schema
Create
CREATE SCHEMA [IF NOT EXISTS] schema_name;
It is also possible to create a schema for a user, in this case the schema name will match a username;
CREATE SCHEMA [IF NOT EXISTS] AUTHORIZATION username;
CREATE SCHEMA IF NOT EXISTS doe AUTHORIZATION john;
In above case it will be a schema with name doe
and the owner of it will be john
PostgreSQL also allows you to create a schema and a list of objects such as tables and views using a single statement as follows:
Notice that each subcommand does not end with a semicolon (;).
\dn
list schemas
SELECT * FROM pg_catalog.pg_namespace ORDER BY nspname;
query schemas
Alter
Rename schema (you have to be an owner of the schema and have CREATE
privilege)
ALTER SCHEMA schema_name RENAME TO new_name;
It is possible to change an owner of the schema
ALTER SCHEMA schema_name OWNER TO { new_owner | CURRENT_USER | SESSION_USER};
Drop
use CASCADE
to delete schema and all of its objects, and in turn, all objects that depend on those objects. If you want to delete schema only when it is empty, you can use the RESTRICT
option. By default, the DROP SCHEMA
uses the RESTRICT
option.
To execute the DROP SCHEMA
statement, you must be the owner of the schema that you want to drop or a superuser.
Keys
Primary
With many RDBMS, the AUTO_INCREMENT
keyword is used. With PostgreSQL, we use serial to auto increment our ids and set their type as integer. It is not required to manually specify the primary key value each time you insert data, PostgreSQL will do it for you if you specify serial as the type for id.
Foreign key columns are used to reference another row of data, perhaps in another table. In order to reference another row, the database needs a unique identifier for that row. Therefore, foreign key columns contain the value of the referenced row's primary key. For instance, a table might have a column called user_id
as a foreign key column, which corresponds to the id
column of the users
table. Foreign keys are how RDBMS sets up relationships between rows of data, either in the same table or across tables.
The ON DELETE CASCADE
clause indicates that if a book is deleted all reviews associated with that book are also deleted.
Besides the Primary Key, a database also allows applying a Unique constraint on any column in the database. For example, in the books table, we may choose to not only have the book_id
as a unique primary key, but also need to store the book's ISBN number. The ISBN, International Standard Book Number is a 10 digit unique number for a book. We can choose to design our database to make this column unique. Even though this is not a primary key, duplicate ISBNs will not be allowed in the books table, and the data quality will be maintained.
NOT NULL
We have seen "NOT NULL" in many of our SQL statements. The NOT NULL
constraint prevents a column from allowing a null value (i.e. this column must contain data). This is important for key fields. For example: If you want to add a user to a database, you must provide a username. Without this constraint the database would get filled with null values and become less useful.
Materialized View
It is a virtual table from one or more base tables which is never stored on disk
It is a physical copy of the base table and stored on disk
It is updated every time when base tables are updated
Updated manually or using triggers. One common approach is to define triggers that listen for inserts/updates on the tables that feed the materialized view and let those events execute the refresh.
It is slow processing. (if you update any content in view, it is reflected in original table. If any changes are done to original base table it would reflect in View. It makes performance of View slower.
Fast processing. It responds faster than View, because it is precomputed.
It is not precomputed as needed to compute each time when it is used or accessed. So data in view is always updated.
It is precomputed and stored on disk like an object. It is not updated each time it is used. Has to be updated manually or by trigger. Update process is Materialized View Maintenance
Use when: - The view query is fast, or you can tolerate the slowness. - You canβt tolerate stale data
Use when: - The view query is slow, and you canβt tolerate the slowness. - You have some tolerance for stale data.
Interesting article about real world usage of MATVIEW
I can understand the policy of "no VIEWs allowed". It's simply because in the past I did come across heavy queries making use of VIEWs more than once. The main problem is that if - for some reason - your predicates don't get pushed down into the VIEWs definition, you end up fetching all the rows that the VIEW in question would fetch without the predicate - then you filter through them. That's quite a penalty and you probably wouldn't use the VIEW in the first place if you knew about that.
(it was about non-Materialized View)
Executing this refresh query will lock the materialized view so it canβt be accessed while refreshing. This can be a problem if your application canβt tolerate downtime while the refresh is happening.
PostgreSQL 9.4 allows you to refresh your view in a way that enables queries during the refresh:
To better optimize your materialized view queries, you can add indexes to the materialized view columns just as you would with a database table.
Procedure
By default, PostgreSQL supports three procedural languages: SQL, PL/pgSQL, and C. You can also load other procedural languages e.g., Perl, Python, and TCL into PostgreSQL using extensions.
Reduce the number of round trips between applications and database servers. All SQL statements are wrapped inside a function stored in the PostgreSQL database server so the application only has to issue a function call to get the result back instead of sending multiple SQL statements and wait for the result between each call.
Slowness in software development because stored procedure programming requires specialized skills that many developers do not possess.
Increase application performance because the user-defined functions and stored procedures are pre-compiled and stored in the PostgreSQL database server.
Difficult to manage versions and hard to debug.
Reusable in many applications. Once you develop a function, you can reuse it in any applications.
May not be portable to other database management systems e.g., MySQL or Microsoft SQL Server.
User-defined function
Function stability
Functions can have different stabilities, meaning that calling them multiple times with the same inputs can result in different outputs, or that the result can be affected by parameters that can be changed at any point. You can declare the stability on the function by specifying either:
IMMUTABLE β given the same inputs, the result will never change no matter what
STABLE β given the same inputs, the result can change between statements
VOLATILE β given the same inputs, the result can change at any point, and may even make changes to the database to make subsequent call results change
Why would you care about specifying which one of these applies? Well, PostgreSQL's optimizer will assess whether it can cache the result of the function when the call is being made multiple times, which saves the function from being called unnecessarily.
If your function takes two parameters and adds them together, the result will be IMMUTABLE, because nothing can affect the operation, and the planner knows that if it has called the function before with the same parameters, it doesn't need to bother running it again; it just needs to remember what the result was from last time. This can be particularly helpful if a lot of calculations are performed, as it will save a lot of time and improve performance.
However, if your result depends on parameters such as timezone, or use functions that are themselves STABLE or VOLATILE, then your function has to match that same stability level.
You can declare the stability on your function like this:
Function vs Procedure
Returns a result
Does not return a result (do the work and return control to the caller)
Do NOT execute transactions
Support transactions
Data types
Character types
character varying(
n
)
, varchar(
n
)
variable-length with limit
character(
n
)
, char(
n
)
fixed-length, blank padded
text
variable unlimited length
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(
n
)
has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(
n
)
is usually the slowest of the three because of its additional storage costs. In most situations text
or character varying
should be used instead.
Date
DATE type uses 4 bytes, yyyy-mm-dd
Timestamp
timestamp
: a timestamp without timezone one.It means that when you change the timezone of your database server, the timestamp value stored in the database will not change automatically.
timestamptz
: timestamp with a timezone.When you insert a value into a
timestamptz
column, PostgreSQL converts thetimestamptz
value into a UTC value and stores the UTC value in the table.When you query
timestamptz
from the database, PostgreSQL converts the UTC value back to the time value of the timezone set by the database server, the user, or the current database connection.
Itβs important to note that timestamptz
value is stored as a UTC value. PostgreSQL does not store any timezone data with the timestamptz
value.
8 bytes
smallint
2 bytes
small-range integer
-32768 to +32767
integer
4 bytes
typical choice for integer
-2147483648 to +2147483647
bigint
8 bytes
large-range integer
-9223372036854775808 to +9223372036854775807
decimal
variable
user-specified precision, exact
up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric
variable
user-specified precision, exact
up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real
4 bytes
variable-precision, inexact
6 decimal digits precision
double precision
8 bytes
variable-precision, inexact
15 decimal digits precision
smallserial
2 bytes
small autoincrementing integer
1 to 32767
serial
4 bytes
autoincrementing integer
1 to 2147483647
bigserial
8 bytes
large autoincrementing integer
1 to 9223372036854775807
Last updated
Was this helpful?