# Alibaba Cloud DLF
Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.
Doris can access DLF the same way as it accesses Hive Metastore.
## Connect to DLF
Create a DLF Catalog:

```sql
CREATE CATALOG dlf PROPERTIES (
    "type" = "hms",
    "hive.metastore.type" = "dlf",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com",
    "dlf.region" = "cn-beijing",
    "dlf.uid" = "uid",
    "dlf.catalog.id" = "catalog_id", -- optional
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk"
);
```
`type` should always be `hms`. If you need to access Alibaba Cloud OSS over the public network, you can add `"dlf.access.public" = "true"`.

- `dlf.endpoint`: DLF endpoint. See Regions and Endpoints of DLF.
- `dlf.region`: DLF region. See Regions and Endpoints of DLF.
- `dlf.uid`: Alibaba Cloud account ID. You can find the "Account ID" in the upper-right corner of the Alibaba Cloud console.
- `dlf.catalog.id`: Optional. Specifies the DLF catalog; if not specified, the default catalog ID is used.
- `dlf.access_key`: AccessKey, which you can create and manage on the Alibaba Cloud console.
- `dlf.secret_key`: SecretKey, which you can create and manage on the Alibaba Cloud console.

Other configuration items are fixed and require no modification.
After the above steps, you can access metadata in DLF the same way as you access Hive Metastore.
Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
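Once the catalog is created, it is used like any other Doris catalog. A hypothetical session might look like this (the database and table names `db1` and `tbl1` are placeholders, not part of the DLF setup):

```sql
-- Switch to the DLF catalog and browse the metadata it exposes
SWITCH dlf;
SHOW DATABASES;

-- Query a table through the catalog using its fully qualified name
SELECT * FROM dlf.db1.tbl1 LIMIT 10;
```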
## Use OSS-HDFS as the data source
1. Enable OSS-HDFS and grant access to OSS or OSS-HDFS.
2. Download the JindoData SDK. If the Jindo SDK directory already exists on the cluster, skip this step.
3. Decompress `jindosdk.tar.gz` (or locate the existing Jindo SDK directory on the cluster), enter its `lib` directory, and copy `jindo-core.jar` and `jindo-sdk.jar` to both `${DORIS_HOME}/fe/lib` and `${DORIS_HOME}/be/lib/java_extensions/preload-extensions`.
4. Create a DLF Catalog with `oss.hdfs.enabled` set to `true`:
```sql
CREATE CATALOG dlf_oss_hdfs PROPERTIES (
    "type" = "hms",
    "hive.metastore.type" = "dlf",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com",
    "dlf.region" = "cn-beijing",
    "dlf.uid" = "uid",
    "dlf.catalog.id" = "catalog_id", -- optional
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk",
    "oss.hdfs.enabled" = "true"
);
```
If the Jindo SDK version is inconsistent with the version used on the EMR cluster, a `Plugin not found` error will be reported, and the Jindo SDK must be replaced with the matching version.
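The SDK deployment step above can be sketched as shell commands. This is only an illustration: the extracted directory name `jindosdk` is an assumption (it varies by SDK version), and `${DORIS_HOME}` must point at your Doris installation.

```shell
# Decompress the Jindo SDK archive (directory name varies by version)
tar -xzf jindosdk.tar.gz

# Copy the two jars from the SDK's lib directory into Doris FE and BE
cd jindosdk/lib
cp jindo-core.jar jindo-sdk.jar "${DORIS_HOME}/fe/lib/"
cp jindo-core.jar jindo-sdk.jar "${DORIS_HOME}/be/lib/java_extensions/preload-extensions/"
```

Repeat the copy on every FE and BE node, then restart the cluster so the jars are picked up.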
## DLF Iceberg Catalog
```sql
CREATE CATALOG dlf_iceberg PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com",
    "dlf.region" = "cn-beijing",
    "dlf.uid" = "uid",
    "dlf.catalog.id" = "catalog_id", -- optional
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk"
);
```
## Column type mapping

Consistent with Hive Catalog; please refer to the column type mapping section in the Hive Catalog documentation.