PostgreSQL’s GREATEST (and LEAST) function

PostgreSQL has two functions that are not standard SQL but are very helpful in certain situations: GREATEST and LEAST. In SQL, MIN and MAX are aggregate functions that operate across all rows, but there are times when you want the smallest or largest value from a fixed set of columns, i.e. a scalar min/max. That is where GREATEST and LEAST come in. Both accept any number of arguments and also handle NULLs. From the docs:

The GREATEST and LEAST functions select the largest or smallest value from a list of any number of expressions. The expressions must all be convertible to a common data type, which will be the type of the result (see Section 10.5 for details). NULL values in the list are ignored. The result will be NULL only if all the expressions evaluate to NULL.

PostgreSQL 11 Documentation
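A quick illustration of those NULL semantics (made-up values, run in psql against any Postgres version):

```sql
SELECT GREATEST(10, 20, NULL);        -- 20: the NULL argument is ignored
SELECT LEAST(10, 20, NULL);           -- 10
SELECT GREATEST(NULL::int, NULL);     -- NULL: only because every argument is NULL
```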

What about Other Database Vendors?

MySQL also has GREATEST and LEAST functions. Like Postgres, they accept any number of arguments. However, the NULL semantics are different:

GREATEST() returns NULL if any argument is NULL.

SQL Server (2022 and later) has the same functions, GREATEST and LEAST, with NULL semantics similar to those of Postgres.

SQLite has scalar MAX and MIN functions, with MySQL-like NULL semantics.
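So the same expression can behave differently across engines. On MySQL (or SQLite's scalar MAX), a single NULL argument poisons the result; wrapping arguments in COALESCE with a sentinel is a common portable workaround (sentinel value 0 is illustrative, pick one appropriate to your data):

```sql
SELECT GREATEST(10, NULL);        -- Postgres: 10; MySQL: NULL
SELECT GREATEST(COALESCE(10, 0),
                COALESCE(NULL, 0));  -- 10 on both
```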

Real World Example

I worked on an order purchase system that had a VIEW joining together multiple tables. This view exposed a MOST_RECENT_UPDATE column representing the latest UPDATED_AT value from all of the tables backing the view. This was necessary because the view was being used to snapshot data into another system, which periodically checked for changes by comparing against that column (i.e. WHERE vw.MOST_RECENT_UPDATE > ?).

Here is a simplified version of that view:

CREATE VIEW vw_customer_order_line_items (customer_id, order_id, amount, sku,
item_price, item_quantity, most_recent_update)
AS
  SELECT c.id,
         o.id,
         o.amount,
         li.sku,
         li.price,
         li.quantity,
         (SELECT MAX(updatecheck.updated_at)
          FROM   ( VALUES (c.updated_at),
                          (o.updated_at),
                          (li.updated_at) ) updatecheck(updated_at))
  FROM   customers c
         JOIN orders o
           ON o.customer_id = c.id
         JOIN line_items li
           ON li.orders_id = o.id;

SELECT *
FROM   vw_customer_order_line_items
WHERE  most_recent_update > '2019-12-01';
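Stripped of the surrounding view, the inline VALUES table computes a scalar maximum on its own (a standalone sketch with made-up numbers):

```sql
SELECT MAX(v)
FROM   (VALUES (1), (5), (3)) AS t(v);  -- 5
```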

The nested SELECT statement creates an aliased inline table, updatecheck, with a single column, updated_at, and takes the aggregate MAX over its three rows. Let's look at the query plan to see what is going on:

  • SELECT
    • Merge Join : 119200
      • Index Scan : orders_pk : 18047
      • Temporary (Materialize) : 111767
        • Sort: 111413
          • Hash Join : 86875
            • Full Scan : customers : 14310
            • Transformation Hash : 24128
              • Full scan : line_items : 24128
            • Aggregate : 0.4
              • Value : 0.3
      • Aggregate : 0.4
        • Value: 0.3

A query plan cost of 119,200? That is pretty high, but not surprising given the lack of other WHERE clauses. With additional predicates, such as one on customer_id, that cost would come down to around 25.
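For instance, checking the plan with a narrowing predicate might look like this (customer id 42 is hypothetical; actual costs depend on your data and indexes):

```sql
EXPLAIN
SELECT *
FROM   vw_customer_order_line_items
WHERE  customer_id = 42
  AND  most_recent_update > '2019-12-01';
```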

While trying to optimize this query, one of the portions that stood out was the nested table for computing the maximum updated_at value. I knew that GREATEST was available, so I refactored the query to use it and re-ran the plan:

CREATE VIEW vw_customer_order_line_items (customer_id, order_id, amount, sku,
item_price, item_quantity, most_recent_update)
AS
  SELECT c.id,
         o.id,
         o.amount,
         li.sku,
         li.price,
         li.quantity,
         GREATEST(c.updated_at, o.updated_at, li.updated_at)
  FROM   customers c
         JOIN orders o
           ON o.customer_id = c.id
         JOIN line_items li
           ON li.orders_id = o.id;

SELECT *
FROM   vw_customer_order_line_items
WHERE  most_recent_update > '2019-12-01';

After optimization:

  • SELECT
    • Gather : 81701
      • Hash Join : 66558
        • Hash Join : 50999
          • Full Scan : line_items : 19226
          • Transformation Hash : 11718
            • Full Scan : orders : 11718
      • Transformation Hash : 8252
        • Full Scan : customers : 8252

The cost went from 119,200 down to 81,701, roughly a 32% reduction. What is interesting is that the query optimizer chose to do three full scans this time, whereas previously it needed only two plus an index scan (further optimizations took care of this). But for a quick one-line change to the query, it was still a good improvement in query execution time, and it made the SQL much more readable.

Next time you need to find the largest (or smallest) value in a list of scalar values, give GREATEST and LEAST a try.
