Query Optimization in PostgreSQL - Part 1

postgresql

database

Query optimization in PostgreSQL involves identifying and resolving performance issues within SQL queries. PostgreSQL provides the “EXPLAIN” command, which outputs the query execution plan and helps identify potential bottlenecks. Here’s a step-by-step breakdown:

Using EXPLAIN:
When you prefix a query with the keyword EXPLAIN, PostgreSQL produces an execution plan without actually running the query.
For example:
```
  EXPLAIN SELECT * 
  FROM your_table 
  WHERE some_column = 'some_value';
```
Reading the Execution Plan:
The execution plan provides insights into how PostgreSQL intends to execute the query.
It includes details such as the order of table scans, index scans, join algorithms, and potential use of aggregate or window functions.
Analyzing the Plan:
Look for sequential scans, index scans, join types (nested loop, hash, merge), and potential sequences of operations.
Identify inefficiencies, such as unnecessary full table scans or unutilized indexes.
Understanding Cost:
PostgreSQL’s query planner estimates the cost of various operations and aims to choose the most efficient execution plan.
Watch out for high-cost operations and identify potential areas for improvement.
Index Usage:
Ensure that appropriate indexes exist and are being utilized effectively by the query.
Verify through the execution plan whether the query is using the expected indexes.
Query Rewrites:
Consider rewriting the query to optimize its performance. This could involve restructuring the query to make better use of indexes, avoiding unnecessary joins, or rewriting subqueries for efficiency.
Post-Optimization Analysis:
After making changes, re-run the EXPLAIN command to ensure that the query planner is now using a more efficient execution plan.

By analyzing the output of EXPLAIN and understanding the query planner’s decision-making process, you can make informed decisions to optimize query performance.

PostgreSQL EXPLAIN Command

The PostgreSQL EXPLAIN command provides insights into how the database plans to execute a query. It reveals information about the query execution plan, including the types of scans, join methods, and other operations that will be performed. Here’s an explanation of the various types of scan and other key details:

Sequential Scan:
A sequential scan involves scanning the entire table or index from start to finish.
It is generally the slowest method, especially for large tables, and is usually employed when an index is not available or would not be beneficial.
Index Scan:
An index scan involves using an index to look up matching rows directly, potentially more efficiently than a sequential scan.
There can be different types of index scans, such as bitmap index scans or index-only scans, depending on the query and index configuration.
- Bitmap Index Scan:
  This scan method is used when multiple indexes can be combined through bitmap operations to efficiently evaluate complex query conditions.
- Index Only Scan:
  If the query only requires data from the index itself, PostgreSQL may perform an index-only scan, avoiding the need to access the actual table data.

Examples

Let’s say we have a table “users” with columns “id”, “name”, and “age”. We have separate indexes on the “name” and “age” columns. We want to perform a query that uses both indexes to speed up the data retrieval.

-- Creating a sample table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    age INT
);

-- Creating indexes on the name and age columns
CREATE INDEX idx_name ON users (name);
CREATE INDEX idx_age ON users (age);

Now let’s say we want to perform a query that combines the use of these indexes. For example, we want to retrieve the IDs of users with a specific name and age range. We can use the bitmap index scan in the following query:

EXPLAIN SELECT id
FROM users
WHERE name = 'John' AND age BETWEEN 30 AND 40;

When you run this query, you might see that PostgreSQL uses a bitmap index scan to efficiently combine the results from the “name” and “age” indexes to retrieve the required data. The bitmap index scan is effective when the combination of multiple indexes can significantly reduce the number of rows that need to be fetched from the table.

Suppose we have a table “products” with columns “product_id”, “product_name”, and “price”. We create an index on the “price” column, and we want to retrieve only the prices from the table using the index without touching the actual table data.

-- Creating a sample table
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    price DECIMAL(10, 2)
);

-- Creating an index on the price column
CREATE INDEX idx_price ON products (price);

Now, let’s perform a query that can take advantage of index-only scan to avoid accessing the table data:

EXPLAIN SELECT price
FROM products;

In this case, as long as the index on the “price” column provides all the necessary information to answer the query, PostgreSQL might choose an index-only scan to directly retrieve the prices from the index without accessing the actual table data. This can lead to improved query performance since it avoids disk I/O related to fetching data from the table.

Now, let’s consider a scenario where we have a query that may use an index scan or a sequential scan.

Suppose we have a similar table “orders” with columns “order_id”, “customer_id”, “order_date”, and “total_amount”. We create an index on the “order_date” column.

-- Creating a sample table
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10, 2)
);

-- Creating an index on the order_date column
CREATE INDEX idx_order_date ON orders (order_date);

Now let’s construct a query that may either use an index scan or a sequential scan:

EXPLAIN SELECT order_id, total_amount
FROM orders
WHERE order_date >= '2023-01-01' AND 
      order_date < '2023-02-01';

Depending on the data distribution and the selectivity of the “order_date” values, PostgreSQL may choose to use an index scan if it finds that using the index is more efficient due to the filter conditions. Alternatively, it may resort to a sequential scan if a large portion of the table needs to be accessed and the index scan becomes less efficient.

We will talk about join types, aggregation functions and window functions in another post soon.

Written on January 27, 2024