
Persistent Storage and ConfigMap

Doris-Operator supports mounting PVs (Persistent Volumes) in the pods of every Doris component.

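PVs are usually created by the Kubernetes system administrator. When Doris-Operator deploys Doris services it does not use PVs directly; instead it declares a set of resources through a PVC to request a PV from the Kubernetes cluster. When a PVC is created, Kubernetes tries to bind it to an available PV that meets its requirements. StorageClass removes the need for the administrator to create PVs manually: when no existing PV satisfies a PVC, a PV can be provisioned dynamically according to the StorageClass. PVs offer many storage types, which fall into two broad categories: network storage and local storage. Because the two differ in principle and implementation, they differ in performance and in how they are used, so choose according to the type of containerized service and your own requirements.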

If no PVC is configured at deployment time, Doris-Operator uses emptyDir mode by default to store metadata, data files, and runtime logs. When a pod restarts, this data is lost.

Node directories recommended for persistent storage:

  • FE: doris-meta, log
  • BE: storage, log
  • CN: storage, log
  • BROKER: log
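As a rough illustration of the list above, the following Python sketch reports which recommended directories a component's mount paths do not cover, and which would therefore fall back to emptyDir. The helper name `missing_persistent_dirs` is hypothetical, and matching on the last path segment of each mountPath is an assumption made for this sketch, not behavior of Doris-Operator.

```python
# Recommended persistent directories per component, taken from the list above.
RECOMMENDED_DIRS = {
    "fe": ["doris-meta", "log"],
    "be": ["storage", "log"],
    "cn": ["storage", "log"],
    "broker": ["log"],
}


def missing_persistent_dirs(component, mount_paths):
    """Return recommended directory names that no mountPath ends with.

    Assumption for this sketch: a directory counts as persisted when the
    last segment of some mountPath equals its name.
    """
    last_segments = {path.rstrip("/").rsplit("/", 1)[-1] for path in mount_paths}
    return [name for name in RECOMMENDED_DIRS[component] if name not in last_segments]


# Only the FE metadata directory is mounted, so "log" is reported as missing.
print(missing_persistent_dirs("fe", ["/opt/apache-doris/fe/doris-meta"]))
```

A check like this could run in CI against the persistentVolumes section of a manifest before it is applied.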

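Doris-Operator writes logs both to the console and to the specified directory. If your Kubernetes system has a complete log collection capability, you can collect Doris INFO-level (default) logs from the console output. Configuring PVCs to persist log files is still recommended, because in addition to INFO-level logs there are files such as fe.out, be.out, audit.log, and the garbage collection logs, which help locate problems quickly and trace audit records.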

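ConfigMap is the Kubernetes resource object used to store configuration files. It allows configuration files to be mounted dynamically and decouples them from the application, which makes configuration management more flexible and maintainable. Like a PVC, a ConfigMap can be referenced by a Pod so that the application can use the configuration data.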

StorageClass

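Doris-Operator supports FE and BE data storage through the Kubernetes default StorageClass, with the storage path (mountPath) taken from the default configuration in the image. To specify a StorageClass yourself, set persistentVolumeClaimSpec.storageClassName under spec.feSpec.persistentVolumes (and likewise under spec.beSpec.persistentVolumes), as in the following example: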

apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-storageclass1
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          # Note: if the storage size is less than 5G, FE will not start normally.
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/fe/log
      name: storage1
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/log
      name: storage3
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

Customizing ConfigMap

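Doris on Kubernetes uses ConfigMap to decouple configuration files from services. Before deploying a doriscluster, deploy the ConfigMaps you want to use in the same namespace. The following samples show the YAML for a cluster in which FE uses a ConfigMap named fe-configmap and BE uses a ConfigMap named be-configmap: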

FE ConfigMap sample

apiVersion: v1
kind: ConfigMap
metadata:
  name: fe-configmap
  labels:
    app.kubernetes.io/component: fe
data:
  fe.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`

    # the output dir of stderr and stdout
    LOG_DIR = ${DORIS_HOME}/log

    JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE"

    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$CUR_DATE:time"

    # INFO, WARN, ERROR, FATAL
    sys_log_level = INFO

    # NORMAL, BRIEF, ASYNC
    sys_log_mode = NORMAL

    # Default dirs to put jdbc drivers, default value is ${DORIS_HOME}/jdbc_drivers
    # jdbc_drivers_dir = ${DORIS_HOME}/jdbc_drivers

    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010

    enable_fqdn_mode = true

Note: when using an FE ConfigMap, you must add enable_fqdn_mode = true to fe.conf. For the reason, see the linked documentation.
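The requirement above can be checked before the ConfigMap is created. The following is a minimal Python sketch, assuming that the relevant fe.conf settings are written as `key = value` lines and that lines starting with `#` are comments, as in the sample. The helpers `parse_conf` and `fqdn_mode_enabled` are hypothetical names and are not part of Doris or Doris-Operator.

```python
def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as JAVA_OPTS stay intact.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def fqdn_mode_enabled(fe_conf_text):
    """Return True when fe.conf sets enable_fqdn_mode to true."""
    return parse_conf(fe_conf_text).get("enable_fqdn_mode", "").lower() == "true"


sample = "http_port = 8030\nenable_fqdn_mode = true\n"
print(fqdn_mode_enabled(sample))
```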

BE ConfigMap sample

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`

    PPROF_TMPDIR="$DORIS_HOME/log/"

    JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # since 1.2, the JAVA_HOME need to be set to run BE process.
    # JAVA_HOME=/path/to/jdk/

    # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
    # https://jemalloc.net/jemalloc.3.html
    JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
    JEMALLOC_PROF_PRFIX=""

    # INFO, WARNING, ERROR, FATAL
    sys_log_level = INFO

    # ports for admin, web, heartbeat service
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060

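A sample doriscluster deployment that uses the two ConfigMaps above. The brokerSpec in this sample also references a ConfigMap named broker-configmap, which must likewise exist in the same namespace: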

apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-configmap
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      # use kubectl create configmap fe-configmap --from-file=fe.conf
      configMapName: fe-configmap
      resolveKey: fe.conf
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      # use kubectl create configmap be-configmap --from-file=be.conf
      configMapName: be-configmap
      resolveKey: be.conf
  brokerSpec:
    replicas: 3
    image: selectdb/doris.broker-ubuntu:2.0.2
    limits:
      cpu: 2
      memory: 4Gi
    requests:
      cpu: 2
      memory: 4Gi
    configMapInfo:
      # use kubectl create configmap broker-configmap --from-file=apache_hdfs_broker.conf
      configMapName: broker-configmap
      resolveKey: apache_hdfs_broker.conf

Here resolveKey is the name of the configuration file passed in. It must be fe.conf, be.conf, or apache_hdfs_broker.conf (CN nodes also use be.conf). doris-operator parses this file to guide the customized deployment of the doriscluster.
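The resolveKey rule can be expressed as a small pre-deployment check. The Python sketch below is illustrative only: `check_config_map_info` is a hypothetical helper, and the dictionaries stand in for the configMapInfo section and the ConfigMap data section after they have been loaded from YAML.

```python
# Required resolveKey per component, as stated above (CN reuses be.conf).
REQUIRED_RESOLVE_KEY = {
    "fe": "fe.conf",
    "be": "be.conf",
    "cn": "be.conf",
    "broker": "apache_hdfs_broker.conf",
}


def check_config_map_info(component, config_map_info, config_map_data):
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    resolve_key = config_map_info.get("resolveKey")
    expected = REQUIRED_RESOLVE_KEY[component]
    if resolve_key != expected:
        problems.append(
            f"{component}: resolveKey is {resolve_key!r}, expected {expected!r}"
        )
    if resolve_key not in config_map_data:
        problems.append(f"{component}: ConfigMap has no key {resolve_key!r}")
    return problems


info = {"configMapName": "be-configmap", "resolveKey": "be.conf"}
data = {"be.conf": "be_port = 9060\n"}
print(check_config_map_info("cn", info, data))
```

The second check catches the case where resolveKey is spelled correctly but the ConfigMap was created from a differently named file.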

Adding special configuration files to the conf directory

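This section describes a containerized deployment approach for cases where additional files must be placed in the conf directory of a Doris node, for example the HDFS configuration file mapping commonly needed for data lake federated queries.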

The following example uses the BE ConfigMap and a core-site.xml file that needs to be added:

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060
  core-site.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>
...

Note that data is a mapping of key-value pairs with the following structure:

data:
  file_name_1:
    file_text_content_1
  file_name_2:
    file_text_content_2
  file_name_3:
    file_text_content_3
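To make the file-name-to-content mapping concrete, the following Python sketch builds such a ConfigMap manifest from a dictionary and prints it as JSON, which kubectl accepts in place of YAML. The helper name `build_configmap` and the shortened file contents are illustrative assumptions, not part of Doris-Operator.

```python
import json


def build_configmap(name, component, files):
    """Build a ConfigMap manifest whose data maps file names to file contents."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "labels": {"app.kubernetes.io/component": component},
        },
        # Each key becomes one file; each value is that file's full text.
        "data": dict(files),
    }


manifest = build_configmap(
    "be-configmap",
    "be",
    {
        "be.conf": "be_port = 9060\nwebserver_port = 8040\n",
        "core-site.xml": "<configuration>\n</configuration>\n",
    },
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl apply -f is one way to use the result.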

BE multi-disk configuration

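The Doris BE service supports mounting multiple disks. In the era of physical servers this addressed mismatches between compute and storage resources, and using multiple disks also improves Doris storage efficiency. On Kubernetes, Doris can likewise mount multiple disks to get the most out of its storage. Using multiple disks on Kubernetes requires a matching configuration file. To decouple service and configuration, Doris uses ConfigMap to carry the configuration, so that configuration files are mounted dynamically for the service. The following doriscluster configuration has the BE service read its configuration file from a ConfigMap and mounts two disks for BE: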

apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-storageclass1
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName, as in the commented example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          # Note: if the storage size is less than 5G, FE will not start normally.
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/fe/log
      name: storage1
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName, as in the commented example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      configMapName: be-configmap
      resolveKey: be.conf
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName, as in the commented example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/storage1
      name: storage3
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName, as in the commented example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/log
      name: storage4
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName, as in the commented example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

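Compared with the default sample, this configuration adds configMapInfo and one more persistentVolumeClaimSpec entry. persistentVolumeClaimSpec fully follows the definition format of the native Kubernetes PVC spec. In the sample, configMapInfo identifies which ConfigMap in the same namespace, and which key's content, BE uses as its configuration file at startup; the key must be be.conf. The following ConfigMap must be deployed in advance to work with the doriscluster above: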

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`

    PPROF_TMPDIR="$DORIS_HOME/log/"

    JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # since 1.2, the JAVA_HOME need to be set to run BE process.
    # JAVA_HOME=/path/to/jdk/

    # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
    # https://jemalloc.net/jemalloc.3.html
    JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
    JEMALLOC_PROF_PRFIX=""

    # INFO, WARNING, ERROR, FATAL
    sys_log_level = INFO

    # ports for admin, web, heartbeat service
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060

    storage_root_path = /opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd

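When using multiple disks, the paths in the storage_root_path value in the ConfigMap must correspond to the mount paths of the persistentVolumes in the doriscluster. For the syntax of storage_root_path, see the linked documentation. When using cloud disks, set the medium to SSD throughout.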
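The correspondence between storage_root_path and the mount paths can be verified mechanically. The following Python sketch assumes the value has the form used in the sample, with entries separated by `;` and each entry starting with a path, optionally followed by `,`-separated attributes such as `medium:ssd`. Other syntaxes permitted by Doris are not handled, and both function names are hypothetical.

```python
def storage_root_paths(value):
    """Split a storage_root_path value into its directory paths."""
    paths = []
    for entry in value.split(";"):
        entry = entry.strip()
        if entry:
            # The path is everything before the first ',' attribute separator.
            paths.append(entry.split(",", 1)[0].strip())
    return paths


def unmounted_paths(storage_root_path, mount_paths):
    """Return storage_root_path directories that have no matching mountPath."""
    mounted = {path.rstrip("/") for path in mount_paths}
    return [
        path
        for path in storage_root_paths(storage_root_path)
        if path.rstrip("/") not in mounted
    ]


root = "/opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd"
mounts = [
    "/opt/apache-doris/be/storage",
    "/opt/apache-doris/be/storage1",
    "/opt/apache-doris/be/log",
]
print(unmounted_paths(root, mounts))
```

An empty result means every storage directory is backed by a persistent volume; any path returned would be written to ephemeral storage instead.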