Pipeline Execution Engine

SinceVersion 2.0.0

Pipeline execution engine is an experimental feature added by Doris in version 2.0. The goal is to replace the current execution engine of Doris's volcano model, fully release the computing power of multi-core CPUs, and limit the number of Doris's query threads to solve the problem of Doris's execution thread bloat.

Its specific design, implementation and effects can be found in [DSIP-027](DSIP-027: Support Pipeline Exec Engine - DORIS - Apache Software Foundation)。

Principle

The current Doris SQL execution engine is designed based on the traditional volcano model, which has the following problems in a single multi-core scenario：

Inability to take full advantage of multi-core computing power to improve query performance,most scenarios require manual setting of parallelism for performance tuning, which is almost difficult to set in production environments.
Each instance of a standalone query corresponds to one thread of the thread pool, which introduces two additional problems.
- Once the thread pool is hit full. Doris' query engine will enter a pseudo-deadlock and will not respond to subsequent queries. At the same time there is a certain probability of entering a logical deadlock situation: for example, all threads are executing an instance's probe task.
- Blocking arithmetic will take up thread resources,blocking thread resources can not be yielded to instances that can be scheduled, the overall resource utilization does not go up.
Blocking arithmetic relies on the OS thread scheduling mechanism, thread switching overhead (especially in the scenario of system mixing)）

The resulting set of problems drove Doris to implement an execution engine adapted to the architecture of modern multi-core CPUs.

And as shown in the figure below (quoted from[Push versus pull-based loop fusion in query engines](jfp_1800010a (cambridge.org))），The resulting set of problems drove Doris to implement an execution engine adapted to the architecture of modern multi-core CPUs.：

Transformation of the traditional pull pull logic-driven execution process into a data-driven execution engine for the push model
Blocking operations are asynchronous, reducing the execution overhead caused by thread switching and thread blocking and making more efficient use of the CPU
Controls the number of threads to be executed and reduces the resource congestion of large queries on small queries in mixed load scenarios by controlling time slice switching

This improves the efficiency of CPU execution on mixed-load SQL and enhances the performance of SQL queries.

Usage

Set session variable

enable_pipeline_engine

This improves the efficiency of CPU execution on mixed-load SQL and enhances the performance of SQL queries

set enable_pipeline_engine = true;

parallel_pipeline_task_num

parallel_pipeline_task_num represents the concurrency of pipeline tasks of a query. Default value is 0 (e.g. half number of CPU cores). Users can adjust this value according to their own workloads.

set parallel_pipeline_task_num = 0;

You can limit the automatically configured concurrency by setting "max_instance_num."（The default value is 64)

Pipeline Execution Engine

Principle​

Usage​

Set session variable​

enable_pipeline_engine​

parallel_pipeline_task_num​

Principle

Usage

Set session variable

enable_pipeline_engine

parallel_pipeline_task_num