Version: 1.0.1

QuestDB

QuestDB is a relational column-based database designed for time-series and event data. It utilizes SQL along with extensions for time-series to assist with real-time analytics.

Storage Model

QuestDB uses a column-based storage model where data is stored in tables, and each column is stored in its own file in a local format. To allow for organic ingestion of data in the order it is received, new data is appended beneath each column.

Append Model

QuestDB adds one column at a time, and each is updated using the same method. The queue of the column file is mapped to a memory page in RAM, and adding a column is effectively a memory write operation at a specific address. When the memory page is exhausted, it is unmapped, and a new page is mapped.

This method ensures minimal resource overhead and consistent append delays.

Read Model

Random access is possible to table columns. Columns with fixed-size data types are read by converting a record number into a file offset through a simple bit shift. The distance in the column file is then translated into a shift in a lazy-mapped memory page where the required value is read.

Consistency and Durability

QuestDB provides table-level isolation and consistency by applying table updates atomically. Updates to a table, within the context of a table transaction that is either committed or rolled back in an atomic operation, are applied. Simultaneous queries with table updates are consistent in returning data as if the query results were either before or after the execution of the table transaction—no interim uncommitted data is shown in the query result.

To guarantee atomicity, each table maintains a last_comwed_record_count in a separate file. Generally, a table reader never reads more records than the transaction count. This enforces the isolation feature: uncommitted data cannot be read. As uncommitted data is directly added to the table, the transaction size is limited only by the available disk space.

After all data is added, QuestDB ensures the atomic update of the commit() transaction count in both multi-threaded and multi-process environments to have minimal impact on concurrent reads. It is lock-free to minimize the impact on concurrent reads.

The consistency guarantee of stored data is limited to QuestDB automatically recovering from abnormally terminated transactions. User-defined constraints, checks, and triggers are not yet supported.

By default, QuestDB relies on operating system-level data durability and leaves it to the operating system to write dirty pages to disk. Data durability can be configured with the commit() call, which can optionally invoke msync() with synchronous or asynchronous I/O options. msync() calls are made only for column files, so synchronization/asynchronous completion modes increase general durability while not guaranteeing it against operating system errors or power loss.

Designated Timestamp

QuestDB offers the option to choose a column as a designated timestamp. This allows determining which column of tables will be indexed based on time-based language features and high-performance functionalities.

The designated timestamp is selected using the Timestamp(columnName) function:

During CREATE TABLE operation
During SELECT operation (dynamic timestamp)
Through ILP when data is retrieved, for tables not already present in QuestDB, partitions are automatically applied by default per day with a timestamp column

SQL Extensions

QuestDB aims to implement standard ANSI SQL. SQL in QuestDB is extended to support the data storage model and simplify the meaning of time-series analytics.

LATEST ON

LATEST ON is a subsidiary clause provided to assist in finding the latest entry by timestamp for a specific key or key combination in a SELECT statement.

SELECT * FROM balances
WHERE balance > 800
LATEST ON ts PARTITION BY customer_id, currency;

SAMPLE BY

SAMPLE BY is used for time-based aggregations with an effective syntax. The following short query returns the average balance from an account list based on monthly partitions.

SELECT avg(balance) FROM accounts SAMPLE BY 1M

Timestamp Search

Timestamp search can be performed using standard operators like >, <=, etc. However, QuestDB provides a native notation that is faster and less verbose.

SELECT * FROM scores WHERE ts IN '2018';

Differences from Standard SQL

**SELECT * FROM is Optional**

In QuestDB, using SELECT * FROM is optional, meaning that SELECT * FROM my_table; is equivalent to my_table;. While adding SELECT * FROM may make SQL look more complete, there are examples where removing these keywords makes queries much easier to read.

my_table;
-- equivalent to:
SELECT * FROM my_table;

GROUP BY is Optional

The GROUP BY clause is optional, and QuestDB's optimizer can derive grouping from the SELECT clause. In standard SQL, users might write a query like:

SELECT a, b, c, d, sum(e) FROM tab GROUP BY a, b, c, d;

However, in QuestDB's SQL dialect, the same SQL can be written as:

SELECT a, b, c, d, sum(e) FROM tab;

Implicit HAVING

In standard SQL, consider a more complex example using HAVING:

SELECT a, b, c, d, sum(e)
FROM tab
GROUP BY a, b, c, d
HAVING sum(e) > 100;

In QuestDB's dialect, subqueries come to the rescue to create a smaller, more readable query without unnecessary repeated aggregations. The HAVING function can be indirectly achieved as follows:

(SELECT a, b, c, d, sum(e) s FROM tab) WHERE s > 100;

Partitions

QuestDB offers the option to partition tables based on time intervals, where data for each interval is stored in separate sets of files.

Features

Available partition intervals are NONE, YEAR, MONTH, DAY, and HOUR.
When using ILP ingestion with CREATE TABLE and PARTITION BY DAY, the default behavior is PARTITION BY NONE.
Partitions are defined when creating the table. For more information, refer to the CREATE TABLE section.
Naming rule for partition directories:

Table Partition Partition format
HOUR YYYY-MM-DD-HH
DAY YYYY-MM-DD
MONTH YYYY-MM
YEAR YYYY

💡 Partitioning is only possible for tables with a designated timestamp.

Table Partition	Partition format
HOUR	YYYY-MM-DD-HH
DAY	YYYY-MM-DD
MONTH	YYYY-MM
YEAR	YYYY

Advantages

Reduced disk IO for timestamp interval queries due to the SQL optimizer taking advantage of partitioning.
Substantially improved calculations and search times by leveraging data chronology and relative immutability from previous partitions.
Physical separation of data files, facilitating adherence to file storage policies or easy removal of specific intervals.

Storage Example

Each partition effectively corresponds to a directory on the host machine, representing the partitioning interval. In the example below, we assume navigating a table partitioned using PARTITION BY MONTH.

[quest-user trips]$ dir
2017-03     2017-10   2018-05     2019-02
2017-04     2017-11   2018-06     2019-03
2017-05     2017-12   2018-07     2019-04
2017-06     2018-01   2018-08   2019-05
2017-07     2018-02   2018-09   2019-06
2017-08     2018-03   2018-10
2017-09     2018-04   2018-11

Each partition on disk contains the column data files for the corresponding timestamp interval.

[quest-user 2019-06]$ dir
_archive    cab_type.v              dropoff_latitude.d     ehail_fee.d
cab_type.d  congestion_surcharge.d  dropoff_location_id.d  extra.d
cab_type.k  dropoff_datetime.d      dropoff_longitude.d    fare_amount.d

Symbol

QuestDB provides a data type called symbol to store repeating strings, implemented as an internal structure with integer tables and corresponding string values.

This section covers the concept, optional settings, and indicators for symbol types.

Advantages

Improved query performance as string operations compare and write int types instead of strings.
Enhanced storage efficiency as int types are mapped to string types.
SQL execution is indistinguishable for users, eliminating the need for additional tables or joins and reducing the complexity of database schemas.

Features

Symbol tables are stored separately from column data.
Quick conversion between string and int when reading or writing data.
Columns defined as Symbol types support indexing.
By default, QuestDB caches Symbol types in memory for advanced query speed and ILP ingestion speed. This setting is configurable.

Example

Columns can be specified as Symbol using CREATE TABLE similar to other types:

CREATE TABLE my_table
  (symb SYMBOL CAPACITY 128 NOCACHE, price DOUBLE, ts TIMESTAMP)
timestamp(ts);

Additional symbol settings, defined either globally as part of server configuration or locally when creating a table:

Symbol Capacity: An optional setting specifying how many different values this column should have. Depending on the used value, data structures will resize themselves to allow QuestDB to function properly when needed. Setting a low symbol value may result in performance degradation, while setting it too high may increase disk space and memory consumption. Symbol capacity is also used to set the initial symbol cache size when caching is enabled.
- Server-wide setting: cairo.default.symbol.capacity, default 256
- Column-specific setting for CREATE TABLE: CAPACITY option
Cache: An optional setting indicating whether a symbol should be cached. When a symbol column is cached, QuestDB uses a Java heap-based hash table to resolve symbol values and keys. If a column has a large number of different symbol values (e.g., over 100,000), heap impact can be significant and may result in OutOfMemory errors. Not caching utilizes a memory-mapped structure that can handle larger value counts but is slower.
- Server-wide setting: cairo.default.symbol.cache.flag, default true
- Column-specific setting when creating a table: CACHE | NOCACHE keyword for CREATE TABLE

Indexes

An index stores row locations for each value of the target column to provide faster read access. It allows bypassing full table scans during WHERE conditional queries.

Indexing is available for symbol columns, and support for other types will be added over time.

Creating and Deleting Index

Ways to index a symbol column:

During table creation using CREATE TABLE.
Indexing an existing symbol column using ALTER TABLE ALTER COLUMN ADD INDEX.

To delete an index:

Use ALTER TABLE ALTER COLUMN DROP INDEX.

How It Works?

An index creates a table of row positions for each distinct value for the target symbol. After the index is created, adding data to the table will update the index. Searches on indexed values will directly operate on the index table, providing the memory locations of the items, thus avoiding unnecessary table scans.

Table example and index table:

Table                                       Index
|Row ID | Symbol    | Value |             | Symbol     | Row IDs       |
| 1     | A         | 1     |             | A          | 1, 2, 4       |
| 2     | A         | 0     |             | B          | 3             |
| 3     | B         | 1     |             | C          | 5             |
| 4     | A         | 1     |
| 5     | C         | 0     |

INSERT INTO Table values(B, 1); triggers two updates, one for the Table and another for the Index.

Table                                       Index
|Row ID | Symbol    | Value |             | Symbol     | Row IDs       |
| 1     | A         | 1     |             | A          | 1, 2, 4       |
| 2     | A         | 0     |             | B          | 3, 6          |
| 3     | B         | 1     |             | C          | 5             |
| 4     | A         | 1     |
| 5     | C         | 0     |
| 6     | B         | 1     |

Index Capacity

When a symbol column is added to the index, an additional index capacity can be defined to specify how many row IDs will be stored in a single storage block on disk:

Server-wide setting: cairo.index.value.block.size, default value is 256.
Column-wide setting: [index](https://questdb.io/docs/reference/sql/create-table/#column-indexes) option for CREATE TABLE.
Column-wide setting: ALTER TABLE COLUMN ADD INDEX

Using fewer blocks to store row IDs provides better performance. Oversizing the setting will result in higher disk space usage than necessary.

Advantages

An index significantly reduces the complexity of queries covering a subset of an indexed column when using WHERE subclauses.

Consider the query applied to the table above: SELECT sum(Value) FROM Table WHERE Symbol='A';

Without an Index: The query engine will scan the entire table to execute the query, requiring 6 operations (reading each of the 6 rows once).
With an Index: The query engine first scans the relatively small index table. In our example, it will find 'A' in the first row. Then, the query engine checks the values at specific locations (1, 2, 4) in the table to read the corresponding values. As a result, it only scans the relevant rows in the table and does not touch irrelevant rows.

Trade-offs

Storage Space: An index will maintain a table for each distinct symbol value, containing the positions where these symbols can be found. As a result, there is a small storage cost associated with indexing a symbol column.
Ingestion Performance: Each new entry in the table triggers an entry in the Index table. This means that any write now requires two write operations, taking twice as long.

Index Capacity

Consider an example table over time with 200 unique stock symbols and 1,000,000,000 records. The index should store 1,000,000,000 / 200 row IDs per symbol, meaning 5,000,000 should be stored per symbol.

If the index capacity is set to 1,048,576, QuestDB will use 5 blocks to store row IDs.
If the index capacity is set to 1,024, the number of blocks will be 4,883.

Storage Model​

Append Model​

Read Model​

Consistency and Durability​

Designated Timestamp​

SQL Extensions​

LATEST ON​

SAMPLE BY​

Timestamp Search​

Differences from Standard SQL​

SELECT * FROM is Optional​

GROUP BY is Optional​

Implicit HAVING​

Partitions

Features​

Advantages​

Storage Example​

Symbol​

Advantages​

Features​

Example​

Indexes​

Creating and Deleting Index​

How It Works?

Index Capacity​

Advantages​

Trade-offs​

Index Capacity​

Storage Model

Append Model

Read Model

Consistency and Durability

Designated Timestamp

SQL Extensions

LATEST ON

SAMPLE BY

Timestamp Search

Differences from Standard SQL

**SELECT * FROM is Optional**

GROUP BY is Optional

Implicit HAVING

Features

Advantages

Storage Example

Symbol

Advantages

Features

Example

Indexes

Creating and Deleting Index

Index Capacity

Advantages

Trade-offs

Index Capacity