Pipeline Execution Engine
The pipeline execution engine is an experimental feature introduced in Doris 2.0. Its goal is to replace Doris's current volcano-model execution engine, make full use of the computing power of multi-core CPUs, and cap the number of query threads, solving the problem of execution thread bloat in Doris.
Its design, implementation, and effects are described in DSIP-027: Support Pipeline Exec Engine (DORIS, Apache Software Foundation).
Principle
The current Doris SQL execution engine is based on the traditional volcano model, which has the following problems on a single multi-core machine:
It cannot take full advantage of multi-core computing power to improve query performance. Most scenarios require setting the parallelism manually for performance tuning, which is difficult to get right in production environments.
Each instance of a query on a single node corresponds to one thread in the thread pool, which introduces two additional problems:
- Once the thread pool is full, Doris's query engine enters a pseudo-deadlock and stops responding to subsequent queries. There is also some probability of a logical deadlock: for example, every thread is running an instance's probe task, so no thread is left for the instances those probe tasks are waiting on.
- Blocking operators occupy thread resources. A blocked thread cannot be yielded to instances that are ready to be scheduled, so overall resource utilization stays low.
Blocking operators rely on the operating system's thread scheduling, and the thread-switching overhead is high (especially in mixed-deployment scenarios).
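The thread-pool problems above can be reproduced with a toy model. This Python sketch is a hypothetical illustration, not Doris code: one `build` instance produces data that `probe` instances wait for, and the pool has a fixed number of threads. In `thread_per_instance`, a blocked probe keeps its thread, as in the volcano model described above; `cooperative` is an assumed contrast in which a task that is not ready gives its thread back and is requeued.

```python
from collections import deque
from typing import List

BUILD = "build"  # the one instance that produces data; every other instance waits for it


def thread_per_instance(order: List[str], pool_size: int) -> bool:
    """Each instance owns a pool thread until it finishes, even while blocked.

    Returns True if every instance finishes, False if the pool stalls.
    """
    running = list(order[:pool_size])   # instances currently holding a thread
    waiting = deque(order[pool_size:])  # instances queued for a free thread
    built = False
    finished = 0
    while running:
        progressed = False
        for inst in list(running):
            if inst == BUILD:
                built = True
            elif not built:
                continue  # probe is blocked but keeps its thread
            running.remove(inst)
            finished += 1
            progressed = True
            if waiting:
                running.append(waiting.popleft())  # hand the freed thread to the next instance
        if not progressed:
            return False  # every thread is held by a blocked instance
    return finished == len(order)


def cooperative(order: List[str], pool_size: int) -> bool:
    """A task that cannot make progress gives its thread back and is requeued."""
    ready = deque(order)
    built = False
    finished = 0
    stalled = 0  # consecutive slices without progress; stops the loop if BUILD is absent
    while ready and stalled <= len(ready):
        # Each round, up to pool_size threads each run one time slice.
        for _ in range(min(pool_size, len(ready))):
            task = ready.popleft()
            if task == BUILD:
                built = True
            elif not built:
                ready.append(task)  # not ready: yield instead of blocking
                stalled += 1
                continue
            finished += 1
            stalled = 0
    return finished == len(order)
```

With `["probe1", "probe2", "build"]` and a pool of two threads, `thread_per_instance` returns `False` (both threads are held by blocked probes, so `build` never gets a thread), while `cooperative` returns `True`.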
These problems drove Doris to implement an execution engine adapted to the architecture of modern multi-core CPUs.
As shown in the figure below (quoted from Push versus pull-based loop fusion in query engines, cambridge.org), the pipeline execution engine:
- Transforms the traditional pull-based, logic-driven execution process into a data-driven, push-model execution engine.
- Makes blocking operations asynchronous, reducing the overhead caused by thread switching and thread blocking and making more efficient use of the CPU.
- Controls the number of execution threads and, through time-slice switching, reduces how much large queries crowd out small queries in mixed-workload scenarios.
This improves CPU efficiency for mixed-workload SQL and increases SQL query performance.
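The difference between the pull and push models can be illustrated with a toy scan -> filter -> sum plan. This Python sketch is an assumed simplification, not Doris's operator interface, and every name in it is hypothetical. In the pull version, the top operator drives execution by requesting batches from its child; in the push version, the source drives execution and hands each batch to the next operator.

```python
from typing import Callable, Iterator, List, Protocol

Batch = List[int]


# Pull (volcano) model: each operator asks its child for the next batch.
def scan_pull(data: List[Batch]) -> Iterator[Batch]:
    for batch in data:
        yield batch


def filter_pull(child: Iterator[Batch], pred: Callable[[int], bool]) -> Iterator[Batch]:
    for batch in child:  # control flows top-down: the parent pulls from its child
        yield [row for row in batch if pred(row)]


def sum_pull(child: Iterator[Batch]) -> int:
    return sum(sum(batch) for batch in child)


# Push model: the source drives execution and pushes each batch downstream.
class Sink(Protocol):
    def push(self, batch: Batch) -> None: ...


class SumSink:
    def __init__(self) -> None:
        self.total = 0

    def push(self, batch: Batch) -> None:
        self.total += sum(batch)


class FilterOp:
    def __init__(self, pred: Callable[[int], bool], downstream: Sink) -> None:
        self.pred = pred
        self.downstream = downstream

    def push(self, batch: Batch) -> None:
        self.downstream.push([row for row in batch if self.pred(row)])


def scan_push(data: List[Batch], downstream: Sink) -> None:
    for batch in data:  # control flows bottom-up: arriving data triggers the work
        downstream.push(batch)


def is_even(row: int) -> bool:
    return row % 2 == 0


data = [[1, 2, 3], [4, 5, 6]]

pulled = sum_pull(filter_pull(scan_pull(data), is_even))

sink = SumSink()
scan_push(data, FilterOp(is_even, sink))
pushed = sink.total
```

Both versions compute 12 for the sample data. In the push version each operator handles one batch per method call, so a scheduler could, in principle, stop between batches without unwinding a chain of nested `next()` calls; this is one reason push-style pipelines suit time-slice scheduling, although the sketch itself does no scheduling.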
Usage
Set session variables
enable_pipeline_engine
Setting enable_pipeline_engine to true makes Doris use the pipeline execution engine for queries in the current session.
```sql
set enable_pipeline_engine = true;
```
parallel_pipeline_task_num
parallel_pipeline_task_num represents the concurrency of pipeline tasks in a query. The default value is 0, which means half the number of CPU cores. Users can adjust this value according to their own workloads.
```sql
set parallel_pipeline_task_num = 0;
```
You can limit the automatically configured concurrency by setting max_instance_num (default value: 64).
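The default-concurrency rule above can be summarized as a small function. This is a hypothetical Python model of the documented behavior, not Doris source code: the function name is invented, and it assumes that only the automatic value (parallel_pipeline_task_num = 0) is capped by max_instance_num, as the text states, and that "half the CPU cores" rounds down with a minimum of 1.

```python
def effective_pipeline_concurrency(parallel_pipeline_task_num: int,
                                   cpu_cores: int,
                                   max_instance_num: int = 64) -> int:
    """Hypothetical model of how the pipeline task concurrency is chosen."""
    if parallel_pipeline_task_num > 0:
        # An explicit setting is used as-is.
        return parallel_pipeline_task_num
    # 0 means "automatic": half the CPU cores (assumed to round down, minimum 1) ...
    auto = max(1, cpu_cores // 2)
    # ... limited by max_instance_num (default 64).
    return min(auto, max_instance_num)


print(effective_pipeline_concurrency(0, cpu_cores=16))    # 8
print(effective_pipeline_concurrency(0, cpu_cores=192))   # 64 (capped)
print(effective_pipeline_concurrency(12, cpu_cores=16))   # 12 (explicit)
```

Under these assumptions, raising max_instance_num only matters on machines with more than 128 cores, where half the core count would exceed the default cap of 64.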