The following are scenarios where the EDB Advanced Storage Pack TAMs are useful.
Refdata example
A scenario where Refdata is useful is creating a reference table of all
the New York Stock Exchange (NYSE) stock symbols and their corporate names.
This data is expected to change very rarely and be referenced frequently from a
table tracking all stock trades for the entire market.
Consider the following two tables:
When heap is used for nyse_symbol, manipulating rows in nyse_trade causes
row locks to be created in nyse_symbol. But only row locks are used in
nyse_symbol:
However, when refdata is used for nyse_symbol, the locking pattern changes. The table is created with the USING refdata clause:
In this case, manipulating data in nyse_trade doesn't generate row locks in nyse_symbol. But manipulating nyse_symbol directly causes an EXCLUSIVE lock to be acquired on the entire relation:
Autocluster example
A scenario where Autocluster is useful is with Internet of Things (IoT) data, which are usually inserted with many rows that relate to each other and often use append-only data. When using heap instead of Autocluster, Postgres can't cluster together these related rows, so access to the set of rows touches many data blocks, can be very slow, and is input/output heavy.
This example is for an IoT thermostat that reports house temperatures and temperature settings every 60 seconds:
Using Autocluster, rows with the same thermostat_id are clustered together and are easier to access:
Note
The cols parameter specifies the table that's clustered. In this case, {1} corresponds to the first column of the table, thermostat_id, which is the most common access pattern.
Populate the table with the thermostat_id and recordtime data:
When you select the data from the IoT table, you can see from the ctid location that the data with the same thermostat_id was clustered together:
Advanced example
This is an advanced example where Refdata and Autocluster are used together. It involves referencing the NYSE table from the Refdata example and clustering together the rows in the trade table based on the stock symbol. This approach makes it easier to find the latest number of trades.
Start with the NYSE table from the Refdata example:
Create a highly updated table containing NYSE trades, referencing the mostly
static stock symbols in the Refdata table. Cluster the rows on the stock symbol
to make it easier to look up the last x trades for a given stock:
Prepopulate the static data (shortened for brevity):
Insert artificial stock trades, one trade per stock symbol, repeating the
pattern multiple times:
Given that the inserts intercalated nyse_symbol_id, a query that consults one
stock touches most pages if the table uses heap, but touches far
fewer pages using Autocluster.
The following query operates on attributes that must be fetched from the table
after an index scan and shows the number of buffers touched: