跳到主要内容
跳到主要内容

Hint

数据库 Hint 是一种数据库查询优化技术,用于指导数据库查询优化器如何执行特定的查询。通过提供 Hint,用户可以对查询优化器的默认行为进行微调,以期望获得更好的性能或满足特定需求。

Hint 作用:

  • 性能优化:通过 Hint,可以影响查询优化器的执行计划,从而提升查询性能。

  • 控制执行计划:可以指定使用连接方法、排序方法等。

  • 调试和测试:在调试和测试查询性能时,Hint 可以帮助确定问题的根源。

Hint 概述

在数据库中,"Hint" 是一种指令,用于指导查询优化器制定执行计划。通过在 SQL 语句中嵌入 Hint,可以影响优化器的决策,从而选择期望的执行路径。

以下是一个使用 Hint 的背景示例:

假设有一个包含大量数据的表,而在某些特定情况下,你了解到在一个查询中,表的连接顺序可能会影响查询性能。此时,Leading Hint 允许你指定希望优化器遵循的表连接顺序。

以下面 SQL 查询为例:若执行效率不理想,我们希望调整 join 顺序,同时不改变原始 SQL,以免影响用户原始场景,并达到调优目的。

mysql> explain shape plan select * from t1 join t2 on t1.c1 = c2;
+-------------------------------------------+
| Explain String |
+-------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------PhysicalOlapScan[t1] |
+-------------------------------------------+
7 rows in set (0.06 sec)

此时,我们可以使用 Leading Hint 来任意改变 tableA 和 tableB 的 Join 顺序。例如:

mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2;
+-----------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+-----------------------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() build RFs:RF0 c1->[c2] |
| --------PhysicalOlapScan[t2] apply RFs: RF0 |
| --------PhysicalDistribute |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------------------------------------------------------------+
12 rows in set (0.06 sec)

在此示例中,使用了 /*+ leading(t2 t1) */ 这类 Hint。这类 Hint 会告知优化器在执行计划中使用指定表(t2)作为驱动表,并将其置于(t1)之前。

目前,Doris 主要支持与 Join 相关的 Hint 来指定连接操作的顺序或方式,包括:

  • LeadingHint:主要用于控制连接操作的顺序。

  • OrderedHint:主要用于固定连接操作的顺序,可以直接生成基于 SQL 写法的文本序的连接操作、

  • DistributeHint:主要用于固定连接操作右表的分布属性。

  • SetVarHint:主要用于设置在单条 SQL 里面使用的 sessionVariables,其作用仅在该条 SQL 的生命周期内生效。

接下来,将详细阐述如何在 Doris 中使用上述四类 JoinHint。

LeadingHint 使用说明

Leading Hint 是一种强大的查询优化技术,允许用户指导 Doris 优化器确定查询计划中的表连接顺序。正确使用 Leading Hint 可以显著提高复杂查询的性能。本文将详细介绍如何在 Doris 中使用 Leading Hint 来控制 join 顺序。

基本语法

Leading Hint 允许你指定希望优化器遵循的表连接顺序。在 Doris 里面,Leading Hint 的基本语法如下:

SELECT /*+ LEADING(tablespec [tablespec]...) */ ...

其中,需要注意的是:

  1. Leading Hint 由 /*+*/ 包围,并置于 SQL 语句中 SELECT 的正后方。

  2. tablespec 是表名或表别名,至少需要指定两个表。

  3. 多个表之间用空格分隔。

  4. 可以使用大括号{}来显式地指定 Join Tree 的形状。

注意

Leading Hint 后方的 / 和 SELECT 列表需要隔开至少一个分隔符,例如空格。至少需要写两个以上的表才认为这个 Leading Hint 是合理的。且任意的 Join 里面可以用大括号括起来,来显式地指定 Join Tree 的形状。

举例说明:

mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2;
+------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+------------------------------------------------------------------------------+
12 rows in set (0.01 sec)

1. 当 Leading Hint 不生效的时候会走正常的流程生成计划,EXPLAIN 会显示使用的 Hint 是否生效,主要分三种来显示:

  • Used:Leading Hint 正常生效

  • Unused:这里不支持的情况包含 Leading Hint 指定的 join order 与原 SQL 不等价或本版本暂不支持特性(详见限制)

  • SyntaxError:指 Leading Hint 语法错误,如找不到对应的表等

2. Leading Hint 语法默认构造出左深树:

查询语句如下

select /leading(t1 t2 t3)/ * from t1 join t2 on... 

查询结果如下:

      join
/ \
join t3
/ \
t1 t2

mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on c1 = c2 join t3 on c2=c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 t2 t3) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
15 rows in set (0.00 sec)

3. 同时允许使用大括号指定 Join 树形状:

mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on c1 = c2 join t3 on c2=c3;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 { t2 t3 }) |
| UnUsed: |
| SyntaxError: |
+----------------------------------------------------------------------------------+
15 rows in set (0.02 sec)

4. 当有 View 作为别名参与 JoinReorder 的时候可以指定对应的 View 作为 Leading Hint 的参数。例:

mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalProject |
| ------------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalProject |
| --------------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(alias t1) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
21 rows in set (0.06 sec)

基本用例

提示

在接下来的案例中,列命名和表命名相关,例如当只有 t1 中有 c1 字段,为了简化会将 t1.c1 直接写成 c1

建表语句如下:

CREATE DATABASE testleading;
USE testleading;

create table t1 (c1 int, c11 int) distributed by hash(c1) buckets 3 properties('replication_num' = '1');
create table t2 (c2 int, c22 int) distributed by hash(c2) buckets 3 properties('replication_num' = '1');
create table t3 (c3 int, c33 int) distributed by hash(c3) buckets 3 properties('replication_num' = '1');
create table t4 (c4 int, c44 int) distributed by hash(c4) buckets 3 properties('replication_num' = '1');

原始 plan:

mysql> explain shape plan select * from t1 join t2 on t1.c1 = c2;
+-------------------------------------------+
| Explain String |
+-------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------PhysicalOlapScan[t1] |
+-------------------------------------------+
7 rows in set (0.06 sec)

当我们需要交换 t1 和 t2 的 join 顺序时,只需在前面加上 leading(t2 t1) 即可。在执行 explain 时,会显示是否使用了这个 hint。如下 Leading plan:Used 表示 Hint 正常生效

mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2;
+------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| --------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(t2 t1) |
| UnUsed: |
| SyntaxError: |
+------------------------------------------------------------------------------+
12 rows in set (0.00 sec)

如果 Leading Hint 存在语法错误,explain 时会在SyntaxError里显示相应信息,但计划仍能照常生成,只是不会使用 Leading 而已。例如:

mysql> explain shape plan select /*+ leading(t2 t3) */ * from t1 join t2 on t1.c1 = c2;
+--------------------------------------------------------+
| Explain String |
+--------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute |
| ----------PhysicalOlapScan[t2] |
| |
| Used: |
| UnUsed: |
| SyntaxError: leading(t2 t3) Msg:can not find table: t3 |
+--------------------------------------------------------+
11 rows in set (0.01 sec)

扩展场景

1. 左深树

上文我们提及,Doris 在查询语句不使用任何括号的情况下,Leading 会默认生成左深树。

mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: leading(t1 t2 t3) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
15 rows in set (0.10 sec)

2. 右深树

当需要将计划的形状做成右深树、Bushy 树或者 zig-zag 树时,只需加上大括号来限制 plan 的形状即可,无需像 Oracle 使用 swap 从左深树一步步调整。

mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3;
+-----------------------------------------------+
| Explain String |
+-----------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| --------PhysicalOlapScan[t1] |
| --------PhysicalDistribute |
| ----------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
| ------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute |
| --------------PhysicalOlapScan[t3] |
| |
| Used: leading(t1 { t2 t3 }) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------+
14 rows in set (0.02 sec)

3. Bushy 树

mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3 join t4 on c3 = c4;
+-----------------------------------------------+
| Explain String |
+-----------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute |
| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) |
| ------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute |
| --------------PhysicalOlapScan[t4] |
| |
| Used: leading({ t1 t2 } { t3 t4 }) |
| UnUsed: |
| SyntaxError: |
+-----------------------------------------------+
17 rows in set (0.02 sec)

4. Zig-Zag 树

mysql> explain shape plan select /*+ leading(t1 {t2 t3} t4) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3 join t4 on c3 = c4;
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t3.c3 = t4.c4)) otherCondition=() |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ------------PhysicalOlapScan[t1] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalOlapScan[t3] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t4] |
| |
| Hint log: |
| Used: leading(t1 { t2 t3 } t4) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
19 rows in set (0.02 sec)

5. Non-inner Join

当遇到非 inner-join(如 Outer Join 或 Semi/Anti Join)时,Leading Hint 会根据原始 SQL 语义自动推导各个 Join 的方式。若 Leading Hint 与原始 SQL 语义不同或无法生成,则会将其放入 UnUsed 中,但这并不影响计划正常流程的生成。

以下是一个不能交换的例子:

-------- test outer join which can not swap
-- t1 leftjoin (t2 join t3 on (P23)) on (P12) != (t1 leftjoin t2 on (P12)) join t3 on (P23)
mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalOlapScan[t1] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalOlapScan[t2] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: |
| UnUsed: leading(t1 { t2 t3 }) |
| SyntaxError: |
+--------------------------------------------------------------------------------+
15 rows in set (0.01 sec)

下面是一些可以交换的例子和不能交换的例子,读者可自行验证

-------- test outer join which can swap
-- (t1 leftjoin t2 on (P12)) innerjoin t3 on (P13) = (t1 innerjoin t3 on (P13)) leftjoin t2 on (P12)
explain shape plan select * from t1 left join t2 on c1 = c2 join t3 on c1 = c3;
explain shape plan select /*+ leading(t1 t3 t2) */ * from t1 left join t2 on c1 = c2 join t3 on c1 = c3;

-- (t1 leftjoin t2 on (P12)) leftjoin t3 on (P13) = (t1 leftjoin t3 on (P13)) leftjoin t2 on (P12)
explain shape plan select * from t1 left join t2 on c1 = c2 left join t3 on c1 = c3;
explain shape plan select /*+ leading(t1 t3 t2) */ * from t1 left join t2 on c1 = c2 left join t3 on c1 = c3;

-- (t1 leftjoin t2 on (P12)) leftjoin t3 on (P23) = t1 leftjoin (t2 leftjoin t3 on (P23)) on (P12)
select /*+ leading(t2 t3 t1) SWAP_INPUT(t1) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3;
explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3;
explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3;

-------- test outer join which can not swap
-- t1 leftjoin (t2 join t3 on (P23)) on (P12) != (t1 leftjoin t2 on (P12)) join t3 on (P23)
-- eliminated to inner join
explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 join t3 on c2 = c3;
explain graph select /*+ leading(t1 t2 t3) */ * from t1 left join (select * from t2 join t3 on c2 = c3) on c1 = c2;

-- test semi join
explain shape plan select * from t1 where c1 in (select c2 from t2);
explain shape plan select /*+ leading(t2 t1) */ * from t1 where c1 in (select c2 from t2);

-- test anti join
explain shape plan select * from t1 where exists (select c2 from t2);

6. View

在涉及别名(Alias)的情况下,可以将别名作为一个完整独立的子树进行指定,并在这些子树内部根据文本序生成 Join 顺序

mysql>  explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| ----------------PhysicalProject |
| ------------------PhysicalOlapScan[t2] |
| ----------------PhysicalDistribute[DistributionSpecHash] |
| ------------------PhysicalProject |
| --------------------PhysicalOlapScan[t3] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: leading(alias t1) |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------------+
21 rows in set (0.02 sec)

OrderedHint 使用说明

OrderedHint 用于固定 Join Tree 的形状,使其按照查询中表的文本顺序进行显示和执行。这在需要精确控制查询计划时特别有用。

OrderedHint 的语法为 /*+ ORDERED */,它应该被放置在 SELECT 语句中的 SELECT 关键字之后,紧接着查询的其余部分。

以下是一个使用 OrderedHint 的示例:

mysql> explain shape plan select /*+ ORDERED */ t1.c1 from t2 join t1 on t1.c1 = t2.c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t2] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: ORDERED |
| UnUsed: |
| SyntaxError: |
+--------------------------------------------------------------------------------+
18 rows in set (0.02 sec)

与 LeadingHint 的关系

当 OrderedHint 和 LeadingHint 同时使用时,OrderedHint 将优先于 LeadingHint。这意味着,即使指定了 LeadingHint,如果同时存在 OrderedHint,查询计划将按照 OrderedHint 的规则来执行,而 LeadingHint 将被忽略。

以下是一个示例,展示了当两者同时使用时的情况:

mysql> explain shape plan select /*+ ORDERED LEADING(t1 t2 t3) */ t1.c1 from t2 join t1 on t1.c1 = t2.c2 join t3 on c2 = c3;
+--------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------+
| PhysicalResultSink |
| --PhysicalDistribute[DistributionSpecGather] |
| ----PhysicalProject |
| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t2] |
| ----------PhysicalDistribute[DistributionSpecHash] |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t1] |
| --------PhysicalDistribute[DistributionSpecHash] |
| ----------PhysicalProject |
| ------------PhysicalOlapScan[t3] |
| |
| Hint log: |
| Used: ORDERED |
| UnUsed: leading(t1 t2 t3) |
| SyntaxError: |
+--------------------------------------------------------------------------------+
18 rows in set (0.02 sec)

DistrbuteHint 使用说明

  • 目前只能指定右表的 Distribute Type,而且只有 [shuffle][broadcast] 两种。需写在 Join 右表前面,且允许使用中括号 []/+/两种写法。

  • 目前能使用任意个 DistributeHint。

  • 当遇到无法正确生成计划的 DistributeHint 时,系统不会显示错误,会按最大努力原则生效,最终以 EXPLAIN 显示的 Distribute 方式为准。

1. 与 OrderedHint 混用

利用文本序把 Join 顺序固定下来,然后再指定相应的 Join 里面我们预期使用的 Distribute 方式。例如:

使用前:

mysql> explain shape plan select count(*) from t1 join t2 on t1.c1 = t2.c2;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t1] |
| ------------PhysicalDistribute[DistributionSpecHash] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t2] |
+----------------------------------------------------------------------------------+
11 rows in set (0.01 sec)

使用后:

mysql> explain shape plan select /*+ ordered */ count(*) from t2 join[broadcast] t1 on t1.c1 = t2.c2;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| PhysicalResultSink |
| --hashAgg[GLOBAL] |
| ----PhysicalDistribute[DistributionSpecGather] |
| ------hashAgg[LOCAL] |
| --------PhysicalProject |
| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
| ------------PhysicalProject |
| --------------PhysicalOlapScan[t2] |
| ------------PhysicalDistribute[DistributionSpecReplicated] |
| --------------PhysicalProject |
| ----------------PhysicalOlapScan[t1] |
| |
| Hint log: |
| Used: ORDERED |
| UnUsed: |
| SyntaxError: |
+----------------------------------------------------------------------------------+
16 rows in set (0.01 sec)

Explain Shape Plan 里面会显示 Distribute 算子相关的信息。其中:

  • DistributionSpecReplicated 表示该算子将对应的数据复制到所有 BE 节点;

  • DistributionSpecGather 表示将数据 Gather 到 FE 节点;

  • DistributionSpecHash 表示将数据按照特定的 hashKey 以及算法打散到不同的 BE 节点。

2. 与 LeadingHint 混用

在编写 SQL 查询时,我们可以在使用 LEADING 提示的同时,为每个 JOIN 操作指定相应的 DISTRIBUTE 方式。以下是一个具体的例子,展示了如何在 SQL 查询中混合使用 DistributeHintLeadingHint

explain shape plan
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
/*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;

附录

1 Hint Log

Hint Log 主要用于在执行 EXPLAIN 时显示提示是否生效。其显示位置通常位于 EXPLAIN 输出的最下方。

Hint Log 分为三个状态:

+---------------------------------+
| Hint log: |
| Used: |
| UnUsed: |
| SyntaxError: |
+---------------------------------+
  • Used:表明该提示生效了。

  • UnUsedSyntaxError:都表明该提示未生效。但 SyntaxError 表示提示语法使用错误或该语法不支持,同时会附加不支持的原因信息。