Version: 1.0.1

ClickHouse

ClickHouse is a columnar, distributed, horizontally scalable database management system (DBMS) with parallel query processing that can keep data on disk or in memory. It was originally developed for Yandex.Metrica. It supports SQL (with some deviations from standard SQL) but lacks true update/delete and transaction support. ClickHouse is a free, open-source project written in C++.

Data Compression

ClickHouse relies on data compression to reach its performance targets. In addition to general-purpose compression, it offers a variety of specialized codecs that target the different data types stored in separate columns.
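To illustrate why a type-specific codec helps, here is a minimal Python sketch. It is not ClickHouse code: `zlib` stands in for a general-purpose compressor, the timestamp column is made up, and the helper names (`delta_encode`, `delta_decode`, `pack`) are hypothetical. It compares compressing a monotonically increasing integer column directly with compressing it after delta encoding.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def pack(values):
    """Serialize integers as fixed-width little-endian 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


# Hypothetical column: one event timestamp per second.
timestamps = [1_700_000_000 + i for i in range(10_000)]

# zlib is an assumption standing in for a general-purpose compressor.
plain_size = len(zlib.compress(pack(timestamps)))
delta_size = len(zlib.compress(pack(delta_encode(timestamps))))

print(plain_size, delta_size)
```

After delta encoding, almost every stored value is 1, so the general-purpose compressor sees a highly repetitive byte stream and produces a much smaller result. The same idea applies whenever neighboring values in a column are close to each other.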

Query Processing Across Multiple Servers

ClickHouse supports distributed query processing, with data stored across different shards. Large queries are additionally parallelized across multiple CPU cores, using whatever resources the query needs.
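The general pattern behind sharded query processing can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not ClickHouse's implementation: the shards are in-memory lists, a thread pool stands in for separate servers, and `partial_aggregate` and `merge` are hypothetical names. Each shard computes a small mergeable state, and a coordinating node combines those states into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds a slice of a "response_ms" column.
shards = [
    [120, 80, 100],
    [200, 150],
    [90, 110, 70, 130],
]


def partial_aggregate(rows):
    """Runs on one shard: return a mergeable (sum, count) state."""
    return sum(rows), len(rows)


def merge(states):
    """Runs on the coordinating node: combine states into the final average."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count


# The thread pool is an assumption that stands in for independent servers.
with ThreadPoolExecutor() as pool:
    states = list(pool.map(partial_aggregate, shards))

average = merge(states)
print(average)
```

Note that the shards return (sum, count) pairs rather than local averages. Averaging the averages would give the wrong result when shards hold different numbers of rows, which is why the intermediate state has to be mergeable.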

SQL Query Syntax

ClickHouse supports SQL syntax similar to ANSI SQL. However, it is not identical, so translation may be required when migrating from another SQL-compatible system.

Vector Computation Engine

During data processing, ClickHouse operates with columnar arrays (referred to as vectors), performing operations on arrays of elements rather than individual values.
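The column-at-a-time style can be sketched as follows. This is a conceptual illustration only: plain Python lists stand in for the contiguous arrays a real engine would use, and the table, column names, and helper functions are hypothetical. Each step consumes whole columns and produces a whole column, instead of evaluating the full expression one row at a time.

```python
# Hypothetical table stored column by column.
columns = {
    "price": [10.0, 25.0, 7.5, 40.0, 12.0],
    "quantity": [3, 1, 10, 2, 5],
}


def multiply(left, right):
    """Apply one operation to two whole columns."""
    return [a * b for a, b in zip(left, right)]


def greater_than(column, threshold):
    """Produce a boolean mask for a whole column."""
    return [value > threshold for value in column]


def filter_by(column, mask):
    """Keep only the elements whose mask entry is True."""
    return [value for value, keep in zip(column, mask) if keep]


# Similar in spirit to selecting price * quantity where the product exceeds 50.
revenue = multiply(columns["price"], columns["quantity"])
mask = greater_than(revenue, 50.0)
large_orders = filter_by(revenue, mask)
print(large_orders)  # [75.0, 80.0, 60.0]
```

Because every function runs a tight loop over one or two arrays, the per-value dispatch cost is paid once per column batch rather than once per row, which is the main benefit the vector approach aims for.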

No Database Locks

ClickHouse can add new data to tables continuously without taking locks.

Primary and Data Skipping Indices

ClickHouse physically sorts data by the primary key. Secondary indices (also known as "data skipping indices") let ClickHouse determine in advance which blocks of data cannot match the filtering criteria and can therefore be skipped.
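Both ideas can be sketched in Python under simplifying assumptions. The granule size, the columns, and the function names below are hypothetical, keys are assumed unique, and a per-granule min/max summary stands in for a data skipping index. The sparse primary index stores only the first key of each granule, so a lookup locates the right granule quickly but then has to scan the rows inside it.

```python
from bisect import bisect_right

GRANULE_SIZE = 4  # hypothetical; far smaller than a realistic granule

# Rows sorted by primary key (user_id), as the text describes.
user_ids = [1, 3, 4, 7, 9, 12, 15, 18, 21, 22, 30, 31]
values = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"]

# Sparse index: only the first key of every granule is stored.
marks = user_ids[::GRANULE_SIZE]


def lookup(key):
    """Return (value, rows_scanned) for an exact primary key match."""
    granule = bisect_right(marks, key) - 1
    if granule < 0:
        return None, 0  # key is smaller than every stored key
    start = granule * GRANULE_SIZE
    scanned = 0
    for offset in range(start, min(start + GRANULE_SIZE, len(user_ids))):
        scanned += 1
        if user_ids[offset] == key:
            return values[offset], scanned
    return None, scanned


# Data skipping sketch: min and max of another column per granule.
durations = [5, 7, 6, 8, 50, 55, 52, 51, 9, 4, 6, 7]
minmax = [
    (min(durations[i:i + GRANULE_SIZE]), max(durations[i:i + GRANULE_SIZE]))
    for i in range(0, len(durations), GRANULE_SIZE)
]


def granules_to_read(threshold):
    """Granules that may contain a duration greater than threshold."""
    return [n for n, (_, high) in enumerate(minmax) if high > threshold]


print(lookup(18))            # ('h', 4)
print(granules_to_read(40))  # [1]
```

In this sketch, finding the single row with key 18 still scans all four rows of its granule, while the filter on durations above 40 reads one granule out of three. The index is tiny and cheap for range scans, but a point lookup always pays for at least part of a granule.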

Approximated Calculations

For additional performance gains, ClickHouse can run calculations on a sample of the data, trading some accuracy for speed. This is especially useful for complex data science computations.
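The trade-off can be sketched in Python. This is an illustration under stated assumptions rather than a description of ClickHouse internals: the rows are synthetic, `SAMPLE_RATE` and `in_sample` are hypothetical names, and a SHA-256 hash of the key is used here simply to pick a deterministic, roughly uniform subset. The aggregate is computed over about one tenth of the rows and then scaled back up.

```python
import hashlib

SAMPLE_RATE = 10  # keep roughly 1 row in 10

# Hypothetical rows: (user_id, page_views)
rows = [(user_id, user_id % 7 + 1) for user_id in range(100_000)]


def in_sample(user_id):
    """Deterministically select about 1/SAMPLE_RATE of the keys."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_RATE == 0


exact_total = sum(views for _, views in rows)

# Only the sampled rows are aggregated; the result is scaled by the rate.
sample_total = sum(views for user_id, views in rows if in_sample(user_id))
estimated_total = sample_total * SAMPLE_RATE

relative_error = abs(estimated_total - exact_total) / exact_total
print(exact_total, estimated_total, relative_error)
```

Selecting rows by a hash of the key, rather than at random on every query, means the same rows are chosen each time, so repeated queries return consistent estimates. The estimate touches only a fraction of the data at the cost of a small relative error.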

While ClickHouse is an excellent choice for many scenarios, it is crucial to keep its architectural features in mind. ClickHouse is quite unique, and without understanding what stands behind this DBMS and how it operates, it is easy to make mistakes that lead to suboptimal performance. Let's start by looking at its most distinctive feature, the column-oriented storage structure.

When to Use ClickHouse?

When used in the right scenarios, ClickHouse is a powerful, scalable, and fast solution that can outperform its competitors. It is tailored for OLAP applications and includes optimizations for reading data and processing complex queries at high speed.

You would benefit most from ClickHouse in the following cases:

  • Dealing with massive volumes of continuously written and read data (measured in terabytes).
  • Having tables with a large number of columns.
  • Inserting data in large batches of thousands of rows or more.
  • Not needing to change data after it is written.
  • Not needing transactions.

For instance, Yandex runs ClickHouse on more than 500 servers, processing 25 million records that arrive every day. Bloomberg, another company using ClickHouse, has over a hundred servers and ingests nearly a trillion new records every day (as of 2018).

When Not to Use ClickHouse?

ClickHouse is designed to be fast. However, the optimizations that make it an excellent solution for OLAP applications render it inadequate for other types of projects.

Do not use ClickHouse for OLTP. ClickHouse expects data to remain unchanged once written. Although it is technically possible to remove large chunks of data from a ClickHouse database, doing so is not fast, because ClickHouse is not designed for data changes. Due to its sparse indexing, it is also inefficient at finding and retrieving individual rows by key. Finally, ClickHouse does not fully support ACID transactions.

ClickHouse is not a key-value DBMS, nor is it designed to serve as file storage.

It is not a document-based database. ClickHouse uses a predefined schema that must be specified during table creation. The better the schema, the more effective and performant the queries become.