跳到主要内容
跳到主要内容

异步物化视图

异步物化视图

物化视图的构建和维护

创建物化视图

准备两张表和数据

use tpch;

CREATE TABLE IF NOT EXISTS orders (
o_orderkey integer not null,
o_custkey integer not null,
o_orderstatus char(1) not null,
o_totalprice decimalv3(15,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority integer not null,
o_comment varchar(79) not null
)
DUPLICATE KEY(o_orderkey, o_custkey)
PARTITION BY RANGE(o_orderdate)(
FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY)
DISTRIBUTED BY HASH(o_orderkey) BUCKETS 3
PROPERTIES ("replication_num" = "1");

insert into orders values
(1, 1, 'o', 99.5, '2023-10-17', 'a', 'b', 1, 'yy'),
(2, 2, 'o', 109.2, '2023-10-18', 'c','d',2, 'mm'),
(3, 3, 'o', 99.5, '2023-10-19', 'a', 'b', 1, 'yy');

CREATE TABLE IF NOT EXISTS lineitem (
l_orderkey integer not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity decimalv3(15,2) not null,
l_extendedprice decimalv3(15,2) not null,
l_discount decimalv3(15,2) not null,
l_tax decimalv3(15,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null
)
DUPLICATE KEY(l_orderkey, l_partkey, l_suppkey, l_linenumber)
PARTITION BY RANGE(l_shipdate)
(FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY)
DISTRIBUTED BY HASH(l_orderkey) BUCKETS 3
PROPERTIES ("replication_num" = "1");

insert into lineitem values
(1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-17', '2023-10-17', '2023-10-17', 'a', 'b', 'yyyyyyyyy'),
(2, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-18', '2023-10-18', '2023-10-18', 'a', 'b', 'yyyyyyyyy'),
(3, 2, 3, 6, 7.5, 8.5, 9.5, 10.5, 'k', 'o', '2023-10-19', '2023-10-19', '2023-10-19', 'c', 'd', 'xxxxxxxxx');

创建物化视图

CREATE MATERIALIZED VIEW mv1 
BUILD DEFERRED REFRESH AUTO ON MANUAL
partition by(l_shipdate)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by
l_shipdate,
o_orderdate,
l_partkey,
l_suppkey;

具体的语法可查看CREATE ASYNC MATERIALIZED VIEW

查看物化视图元信息

select * from mv_infos("database"="tpch") where Name="mv1";

物化视图独有的特性可以通过mv_infos()查看

和table相关的属性,仍通过SHOW TABLES来查看

刷新物化视图

物化视图支持不同刷新策略,如定时刷新和手动刷新。也支持不同的刷新粒度,如全量刷新,分区粒度的增量刷新等。这里我们以手动刷新物化视图的部分分区为例。

首先查看物化视图分区列表

SHOW PARTITIONS FROM mv1;

刷新名字为p_20231017_20231018的分区

REFRESH MATERIALIZED VIEW mv1 partitions(p_20231017_20231018);

具体的语法可查看REFRESH MATERIALIZED VIEW

任务管理

每个物化视图都会默认有一个job负责刷新数据,job用来描述物化视图的刷新策略等信息,每次触发刷新,都会产生一个task, task用来描述具体的一次刷新信息,例如刷新用的时间,刷新了哪些分区等

查看物化视图的job

select * from jobs("type"="mv") order by CreateTime;

具体的语法可查看jobs("type"="mv")

暂停物化视图job定时调度

PAUSE MATERIALIZED VIEW JOB ON mv1;

可以暂停物化视图的定时调度

具体的语法可查看PAUSE MATERIALIZED VIEW JOB

恢复物化视图job定时调度

RESUME MATERIALIZED VIEW JOB ON mv1;

可以恢复物化视图的定时调度

具体的语法可查看RESUME MATERIALIZED VIEW JOB

查看物化视图的task

select * from tasks("type"="mv");

具体的语法可查看tasks("type"="mv")

取消物化视图的task

CANCEL MATERIALIZED VIEW TASK realTaskId on mv1;

可以取消本次task的运行

具体的语法可查看CANCEL MATERIALIZED VIEW TASK

修改物化视图

修改物化视图的属性

ALTER MATERIALIZED VIEW mv1 set("grace_period"="3333");

修改物化视图的名字,物化视图的刷新方式及物化视图特有的property可通过ALTER ASYNC MATERIALIZED VIEW来修改

物化视图本身也是一个 Table,所以 Table 相关的属性,例如副本数,仍通过ALTER TABLE相关的语法来修改。

删除物化视图

DROP MATERIALIZED VIEW mv1;

物化视图有专门的删除语法,不能通过drop table来删除,

具体的语法可查看DROP ASYNC MATERIALIZED VIEW

最佳实践

基表分区过多,物化视图只关注最近一段时间的数据

创建基表,有三个分区

CREATE TABLE t1 (
`k1` INT,
`k2` DATE NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT 'OLAP'
PARTITION BY range(`k2`)
(
PARTITION p26 VALUES [("2024-03-26"),("2024-03-27")),
PARTITION p27 VALUES [("2024-03-27"),("2024-03-28")),
PARTITION p28 VALUES [("2024-03-28"),("2024-03-29"))
)
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES (
'replication_num' = '1'
);

创建物化视图,只关注最近一天的数据,如果当前时间为2024-03-28 xx:xx:xx, 这样物化视图会仅有一个分区[("2024-03-28"),("2024-03-29"))

CREATE MATERIALIZED VIEW mv1
BUILD DEFERRED REFRESH AUTO ON MANUAL
partition by(`k2`)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
'replication_num' = '1',
'partition_sync_limit'='1',
'partition_sync_time_unit'='DAY'
)
AS
SELECT * FROM t1;

时间又过了一天,当前时间为2024-03-29 xx:xx:xx,t1新增一个分区[("2024-03-29"),("2024-03-30")),如果此时刷新物化视图,刷新完成后,物化视图会仅有一个分区[("2024-03-29"),("2024-03-30"))

物化视图的使用

请参阅 查询异步物化视图

注意事项

  • 异步物化视图仅支持在Nereids 优化器使用
  • 当前判断物化视图和基表是否同步仅支持OlapTable。对于其它外表,会直接认为是同步的。例如,物化视图的基表全是外表。在查询mv_infos()时,SyncWithBaseTables会永远为1(true)。在刷新物化视图时需要手动刷新指定的分区或指定complete刷新全部分区