QuestDB
QuestDB is a relational column-based database designed for time-series and event data. It utilizes SQL along with extensions for time-series to assist with real-time analytics.
Storage Model
QuestDB uses a column-based storage model where data is stored in tables, and each column is stored in its own file in a local format. To allow for organic ingestion of data in the order it is received, new data is appended beneath each column.
Append Model
QuestDB adds one column at a time, and each is updated using the same method. The queue of the column file is mapped to a memory page in RAM, and adding a column is effectively a memory write operation at a specific address. When the memory page is exhausted, it is unmapped, and a new page is mapped.
This method ensures minimal resource overhead and consistent append delays.
Read Model
Random access is possible to table columns. Columns with fixed-size data types are read by converting a record number into a file offset through a simple bit shift. The distance in the column file is then translated into a shift in a lazy-mapped memory page where the required value is read.
Consistency and Durability
QuestDB provides table-level isolation and consistency by applying table updates atomically. Updates to a table, within the context of a table transaction that is either committed or rolled back in an atomic operation, are applied. Simultaneous queries with table updates are consistent in returning data as if the query results were either before or after the execution of the table transaction—no interim uncommitted data is shown in the query result.
To guarantee atomicity, each table maintains a last_comwed_record_count
in a separate file. Generally, a table reader never reads more records than the transaction count. This enforces the isolation feature: uncommitted data cannot be read. As uncommitted data is directly added to the table, the transaction size is limited only by the available disk space.
After all data is added, QuestDB ensures the atomic update of the commit()
transaction count in both multi-threaded and multi-process environments to have minimal impact on concurrent reads. It is lock-free to minimize the impact on concurrent reads.
The consistency guarantee of stored data is limited to QuestDB automatically recovering from abnormally terminated transactions. User-defined constraints, checks, and triggers are not yet supported.
By default, QuestDB relies on operating system-level data durability and leaves it to the operating system to write dirty pages to disk. Data durability can be configured with the commit()
call, which can optionally invoke msync()
with synchronous or asynchronous I/O options. msync()
calls are made only for column files, so synchronization/asynchronous completion modes increase general durability while not guaranteeing it against operating system errors or power loss.
Designated Timestamp
QuestDB offers the option to choose a column as a designated timestamp. This allows determining which column of tables will be indexed based on time-based language features and high-performance functionalities.
The designated timestamp is selected using the Timestamp(columnName)
function:
- During
CREATE TABLE
operation - During
SELECT
operation (dynamic timestamp) - Through ILP when data is retrieved, for tables not already present in QuestDB, partitions are automatically applied by default per day with a timestamp column
SQL Extensions
QuestDB aims to implement standard ANSI SQL. SQL in QuestDB is extended to support the data storage model and simplify the meaning of time-series analytics.
LATEST ON
LATEST ON
is a subsidiary clause provided to assist in finding the latest entry by timestamp for a specific key or key combination in a SELECT
statement.
SELECT * FROM balances
WHERE balance > 800
LATEST ON ts PARTITION BY customer_id, currency;
SAMPLE BY
SAMPLE BY
is used for time-based aggregations with an effective syntax. The following short query returns the average balance from an account list based on monthly partitions.
SELECT avg(balance) FROM accounts SAMPLE BY 1M
Timestamp Search
Timestamp search can be performed using standard operators like >
, <=
, etc. However, QuestDB provides a native notation that is faster and less verbose.
SELECT * FROM scores WHERE ts IN '2018';
Differences from Standard SQL
SELECT * FROM is Optional
In QuestDB, using SELECT * FROM
is optional, meaning that SELECT * FROM my_table;
is equivalent to my_table;
. While adding SELECT * FROM
may make SQL look more complete, there are examples where removing these keywords makes queries much easier to read.
my_table;
-- equivalent to:
SELECT * FROM my_table;
GROUP BY is Optional
The GROUP BY
clause is optional, and QuestDB's optimizer can derive grouping from the SELECT
clause. In standard SQL, users might write a query like:
SELECT a, b, c, d, sum(e) FROM tab GROUP BY a, b, c, d;
However, in QuestDB's SQL dialect, the same SQL can be written as:
SELECT a, b, c, d, sum(e) FROM tab;
Implicit HAVING
In standard SQL, consider a more complex example using HAVING
:
SELECT a, b, c, d, sum(e)
FROM tab
GROUP BY a, b, c, d
HAVING sum(e) > 100;
In QuestDB's dialect, subqueries come to the rescue to create a smaller, more readable query without unnecessary repeated aggregations. The HAVING
function can be indirectly achieved as follows:
(SELECT a, b, c, d, sum(e) s FROM tab) WHERE s > 100;
Partitions
QuestDB offers the option to partition tables based on time intervals, where data for each interval is stored in separate sets of files.
Features
-
Available partition intervals are
NONE
,YEAR
,MONTH
,DAY
, andHOUR
. -
When using ILP ingestion with
CREATE TABLE
andPARTITION BY DAY
, the default behavior isPARTITION BY NONE
. -
Partitions are defined when creating the table. For more information, refer to the
CREATE TABLE
section. -
Naming rule for partition directories:
Table Partition Partition format HOUR YYYY-MM-DD-HH DAY YYYY-MM-DD MONTH YYYY-MM YEAR YYYY
Advantages
- Reduced disk IO for timestamp interval queries due to the SQL optimizer taking advantage of partitioning.
- Substantially improved calculations and search times by leveraging data chronology and relative immutability from previous partitions.
- Physical separation of data files, facilitating adherence to file storage policies or easy removal of specific intervals.
Storage Example
Each partition effectively corresponds to a directory on the host machine, representing the partitioning interval. In the example below, we assume navigating a table partitioned using PARTITION BY MONTH
.
[quest-user trips]$ dir
2017-03 2017-10 2018-05 2019-02
2017-04 2017-11 2018-06 2019-03
2017-05 2017-12 2018-07 2019-04
2017-06 2018-01 2018-08 2019-05
2017-07 2018-02 2018-09 2019-06
2017-08 2018-03 2018-10
2017-09 2018-04 2018-11
Each partition on disk contains the column data files for the corresponding timestamp interval.
[quest-user 2019-06]$ dir
_archive cab_type.v dropoff_latitude.d ehail_fee.d
cab_type.d congestion_surcharge.d dropoff_location_id.d extra.d
cab_type.k dropoff_datetime.d dropoff_longitude.d fare_amount.d
Symbol
QuestDB provides a data type called symbol
to store repeating strings, implemented as an internal structure with integer tables and corresponding string values.
This section covers the concept, optional settings, and indicators for symbol
types.
Advantages
- Improved query performance as string operations compare and write
int
types instead of strings. - Enhanced storage efficiency as
int
types are mapped tostring
types. - SQL execution is indistinguishable for users, eliminating the need for additional tables or joins and reducing the complexity of database schemas.
Features
Symbol
tables are stored separately from column data.- Quick conversion between
string
andint
when reading or writing data. - Columns defined as
Symbol
types support indexing. - By default, QuestDB caches
Symbol
types in memory for advanced query speed and ILP ingestion speed. This setting is configurable.
Example
Columns can be specified as Symbol
using CREATE TABLE
similar to other types:
CREATE TABLE my_table
(symb SYMBOL CAPACITY 128 NOCACHE, price DOUBLE, ts TIMESTAMP)
timestamp(ts);
Additional symbol settings, defined either globally as part of server configuration or locally when creating a table:
- Symbol Capacity: An optional setting specifying how many different values this column should have. Depending on the used value, data structures will resize themselves to allow QuestDB to function properly when needed. Setting a low symbol value may result in performance degradation, while setting it too high may increase disk space and memory consumption. Symbol capacity is also used to set the initial symbol cache size when caching is enabled.
- Server-wide setting:
cairo.default.symbol.capacity
, default256
- Column-specific setting for
CREATE TABLE
:CAPACITY
option
- Server-wide setting:
- Cache: An optional setting indicating whether a symbol should be cached. When a symbol column is cached, QuestDB uses a Java heap-based hash table to resolve symbol values and keys. If a column has a large number of different symbol values (e.g., over 100,000), heap impact can be significant and may result in
OutOfMemory
errors. Not caching utilizes a memory-mapped structure that can handle larger value counts but is slower.- Server-wide setting:
cairo.default.symbol.cache.flag
, defaulttrue
- Column-specific setting when creating a table:
CACHE | NOCACHE
keyword forCREATE TABLE
- Server-wide setting:
Indexes
An index stores row locations for each value of the target column to provide faster read access. It allows bypassing full table scans during WHERE
conditional queries.
Indexing is available for symbol columns, and support for other types will be added over time.
Creating and Deleting Index
Ways to index a symbol column:
- During table creation using
CREATE TABLE
. - Indexing an existing symbol column using
ALTER TABLE ALTER COLUMN ADD INDEX
.
To delete an index:
How It Works?
An index creates a table of row positions for each distinct value for the target symbol. After the index is created, adding data to the table will update the index. Searches on indexed values will directly operate on the index table, providing the memory locations of the items, thus avoiding unnecessary table scans.
Table example and index table:
Table Index
|Row ID | Symbol | Value | | Symbol | Row IDs |
| 1 | A | 1 | | A | 1, 2, 4 |
| 2 | A | 0 | | B | 3 |
| 3 | B | 1 | | C | 5 |
| 4 | A | 1 |
| 5 | C | 0 |
INSERT INTO Table values(B, 1);
triggers two updates, one for the Table and another for the Index.
Table Index
|Row ID | Symbol | Value | | Symbol | Row IDs |
| 1 | A | 1 | | A | 1, 2, 4 |
| 2 | A | 0 | | B | 3, 6 |
| 3 | B | 1 | | C | 5 |
| 4 | A | 1 |
| 5 | C | 0 |
| 6 | B | 1 |
Index Capacity
When a symbol column is added to the index, an additional index capacity can be defined to specify how many row IDs will be stored in a single storage block on disk:
- Server-wide setting:
cairo.index.value.block.size
, default value is256
. - Column-wide setting:
[index](https://questdb.io/docs/reference/sql/create-table/#column-indexes)
option forCREATE TABLE
. - Column-wide setting: ALTER TABLE COLUMN ADD INDEX
Using fewer blocks to store row IDs provides better performance. Oversizing the setting will result in higher disk space usage than necessary.
Advantages
An index significantly reduces the complexity of queries covering a subset of an indexed column when using WHERE
subclauses.
Consider the query applied to the table above: SELECT sum(Value) FROM Table WHERE Symbol='A';
- Without an Index: The query engine will scan the entire table to execute the query, requiring 6 operations (reading each of the 6 rows once).
- With an Index: The query engine first scans the relatively small index table. In our example, it will find 'A' in the first row. Then, the query engine checks the values at specific locations (1, 2, 4) in the table to read the corresponding values. As a result, it only scans the relevant rows in the table and does not touch irrelevant rows.
Trade-offs
- Storage Space: An index will maintain a table for each distinct symbol value, containing the positions where these symbols can be found. As a result, there is a small storage cost associated with indexing a symbol column.
- Ingestion Performance: Each new entry in the table triggers an entry in the Index table. This means that any write now requires two write operations, taking twice as long.
Index Capacity
Consider an example table over time with 200 unique stock symbols and 1,000,000,000 records. The index should store 1,000,000,000 / 200 row IDs per symbol, meaning 5,000,000 should be stored per symbol.
- If the index capacity is set to 1,048,576, QuestDB will use 5 blocks to store row IDs.
- If the index capacity is set to 1,024, the number of blocks will be 4,883.