Transparent Rewriting with Sync-Materialized View
Principle
A sync-materialized view is a special type of table that precomputes and stores data based on a predefined SELECT statement. Its primary purpose is to satisfy users' analytical needs for arbitrary dimensions of raw detailed data while also enabling rapid fixed-dimension analytical queries.
Materialized views are suitable for the following scenarios:
Analytical requirements cover both detailed data queries and fixed-dimension queries.
Queries only involve a small subset of columns or rows in the table.
Queries contain time-consuming processing operations, such as long aggregation operations.
Queries require matching different prefix indexes.
For queries that frequently reuse the same subquery results, a sync-materialized view can significantly enhance performance. Doris automatically maintains the data in the materialized view, ensuring data consistency between the base table and the materialized view without additional manual maintenance costs. During queries, the system automatically matches the optimal materialized view and reads data directly from it.
When using materialized views, please note the following points:
In Doris version 2.0, materialized views have enhanced features. It is recommended that users confirm in a test environment whether the expected queries can hit the desired materialized views before using them in a formal production environment.
It is not advisable to create multiple similar materialized views on the same table, as this may lead to conflicts between them, resulting in query misses.
Usage Process
The usage process for materialized views is as follows:
1 Create a Materialized View
Determine the type of materialized view to create based on the characteristics of the query statement.
Extract the common grouping and aggregation methods from multiple queries as the basis for defining the materialized view.
It is not necessary to create materialized views for all dimension combinations; only create them for commonly used dimension combinations.
Creating a materialized view is an asynchronous operation. After submitting the creation task, Doris will compute the existing data in the background until the creation is successful.
2 Automatic Query Matching
After the materialized view is successfully created, when a user queries the base table, Doris will automatically select an optimal materialized view and read data from it for computation.
Users can use the EXPLAIN command to check whether the current query is using a materialized view.
3 Update Strategy
To ensure data consistency between the materialized view and the base table, Doris synchronizes operations on the base table to the materialized view, using incremental updates to improve update efficiency and ensuring the atomicity of operations through transactions.
4 Supported Aggregation Functions
SUM, MIN, MAX (applicable to Version 0.12)
COUNT, BITMAP_UNION, HLL_UNION (applicable to Version 0.13)
General aggregation functions (applicable to Version 2.0)
Tuning Usage Case
The following is a specific example to illustrate the use of single-table materialized views:
Suppose we have a detailed sales record table sales_records
that records various information for each transaction, including transaction ID, salesperson ID, store ID, sales date, and transaction amount. Now, we frequently need to perform analytical queries on sales volumes for different stores.
To optimize the performance of these queries, we can create a materialized view store_amt
that groups by store ID and sums the sales amounts for the same store. The specific steps are as follows:
Create a Materialized View
First, we use the following SQL statement to create the materialized view store_amt
:
CREATE MATERIALIZED VIEW store_amt AS
SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
After submitting the creation task, Doris will asynchronously build this materialized view in the background. We can view the creation progress of the materialized view through the following command:
SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name;
When the State
field changes to FINISHED
, it indicates that the store_amt
materialized view has been successfully created.
Query Data
After the materialized view is created, when we query the sales volumes of different stores, Doris will automatically match the store_amt
materialized view and read the pre-aggregated data directly from it, significantly improving query efficiency.
The query statement is as follows:
SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
We can also use the EXPLAIN
command to check whether the query successfully hits the materialized view:
EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
At the end of the execution plan, if similar content is displayed, it indicates that the query successfully hits the store_amt
materialized view:
TABLE: default_cluster:test.sales_records(store_amt), PREAGGREGATION: ON
By following these steps, we can utilize single-table materialized views to optimize query performance and improve the efficiency of data analysis.
Summary
By creating materialized views, we can significantly enhance the query speed for related aggregation analyses. Materialized views not only enable us to perform statistical analyses quickly but also flexibly support the query requirements of detailed data, making them a very powerful feature in Doris.