Cluster Upgrade
Overviewβ
To upgrade, please use the steps recommended in this chapter to upgrade the cluster. The Doris cluster upgrade can be upgraded using the rolling upgrade method, which does not require all cluster nodes to be shut down for upgrade, which greatly reduces the impact on upper-layer applications.
Doris Release Notesβ
When upgrading Doris, please follow the principle of not skipping two minor versions and upgrade sequentially.
For example, if you are upgrading from version 0.15.x to 2.0.x, it is recommended to first upgrade to the latest version of 1.1, then upgrade to the latest version of 1.2, and finally upgrade to the latest version of 2.0.
Upgrade Stepsβ
Upgrade Instructionsβ
-
During the upgrade process, since Doris's RoutineLoad, Flink-Doris-Connector, and Spark-Doris-Connector have implemented a retry mechanism in the code, in a multi-BE node cluster, the rolling upgrade will not cause the task to fail .
-
The StreamLoad task requires you to implement a retry mechanism in your own code, otherwise the task will fail.
-
The cluster copy repair and balance function must be closed before and opened after the completion of a single upgrade task, regardless of whether all your cluster nodes have been upgraded.
Overview of the Upgrade Processβ
-
Metadata backup
-
Turn off the cluster copy repair and balance function
-
Compatibility testing
-
Upgrade BE
-
Upgrade FE
-
Turn on the cluster replica repair and balance function
Upgrade Pre-workβ
Please perform the upgrade in sequence according to the upgrade process
01 Metadata Backup (Important)
Make a full backup of the doris-meta
directory of the FE-Master node!
02 Turn off the cluster replica repair and balance function
There will be node restart during the upgrade process, so unnecessary cluster balancing and replica repair logic may be triggered, first close it with the following command:
admin set frontend config("disable_balance" = "true");
admin set frontend config("disable_colocate_balance" = "true");
admin set frontend config("disable_tablet_scheduler" = "true");
03 Compatibility Testing
Metadata compatibility is very important, if the upgrade fails due to incompatible metadata, it may lead to data loss! It is recommended to perform a metadata compatibility test before each upgrade!
- FE Compatibility Test
-
It is recommended to do FE compatibility test on your local development machine or BE node.
-
It is not recommended to test on Follower or Observer nodes to avoid link exceptions
-
If it must be on the Follower or Observer node, the started FE process needs to be stopped
a. Use the new version alone to deploy a test FE process
```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
```
b. Modify the FE configuration file fe.conf for testing
```shell
vi ${DORIS_NEW_HOME}/conf/fe.conf
```
Modify the following port information, set all ports to different from online
```shell
...
http_port = 18030
rpc_port = 19020
query_port = 19030
arrow_flight_sql_port = 19040
edit_log_port = 19010
...
```
Save and exit
c. Modify fe.conf
- Add ClusterID configuration in fe.conf
```shell
echo "cluster_id=123456" >> ${DORIS_NEW_HOME}/conf/fe.conf
```
- Add metadata failover configuration in fe.conf (>=2.0.2 + version does not require this operation)
echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf
d. Copy the metadata directory doris-meta of the online environment Master FE to the test environment
```shell
cp ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta
```
e. Change the cluster_id in the VERSION file copied to the test environment to 123456 (that is, the same as in step 3)
```shell
vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION
clusterId=123456
```
f. In the test environment, run the startup FE
-
If the version is greater than or equal to 2.0.2, run the following command
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon --metadata_failure_recovery
-
If the version is less than 2.0.2, run the following command
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
g. Observe whether the startup is successful through the FE log fe.log
```shell
tail -f ${DORIS_NEW_HOME}/log/fe.log
```
h. If the startup is successful, it means that there is no problem with the compatibility, stop the FE process of the test environment, and prepare for the upgrade
```
sh ${DORIS_NEW_HOME}/bin/stop_fe.sh
```
2. BE Compatibility Test
You can use the grayscale upgrade scheme to upgrade a single BE first. If there is no exception or error, the compatibility is considered normal, and subsequent upgrade actions can be performed
Upgrade processβ
Upgrade BE first, then FE
Generally speaking, Doris only needs to upgrade /bin
and /lib
under the FE directory and /bin
and /lib
under the BE directory
In versions 2.0.2 and later, the custom_lib/
directory is added to the FE and BE deployment paths (if not, it can be created manually). The custom_lib/
directory is used to store some user-defined third-party jar packages, such as hadoop-lzo-*.jar
, orai18n.jar
, etc.
This directory does not need to be replaced during upgrade.
However, when a major version is upgraded, new features may be added or old functions refactored. These modifications may require replace/add more directories during the upgrade to ensure the availability of all new features. Please Carefully pay attention to the Release-Note of this version when upgrading the version to avoid upgrade failures
Upgrade BEβ
In order to ensure the safety of your data, please use 3 copies to store your data to avoid data loss caused by misoperation or failure of the upgrade
-
Under the premise of multiple copies, select a BE node to stop running and perform grayscale upgrade
sh ${DORIS_OLD_HOME}/be/bin/stop_be.sh
-
Rename the
/bin
,/lib
directories under the BE directorymv ${DORIS_OLD_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin_back
mv ${DORIS_OLD_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib_back -
Copy the new version of
/bin
,/lib
directory to the original BE directorycp -r ${DORIS_NEW_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin
cp -r ${DORIS_NEW_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib -
Start the BE node
sh ${DORIS_OLD_HOME}/be/bin/start_be.sh --daemon
-
Link the cluster to view the node information
show backends\G
If the
alive
status of the BE node istrue
, and the value ofVersion
is the new version, the node upgrade is successful -
Complete the upgrade of other BE nodes in sequence
05 Upgrade FE
Upgrade the non-Master nodes first, and then upgrade the Master nodes.
-
In the case of multiple FE nodes, select a non-Master node to upgrade and stop running first
sh ${DORIS_OLD_HOME}/fe/bin/stop_fe.sh
-
Rename the
/bin
,/lib
directories under the FE directorymv ${DORIS_OLD_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin_back
mv ${DORIS_OLD_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib_back -
Copy the new version of
/bin
,/lib
directory to the original FE directorycp -r ${DORIS_NEW_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin
cp -r ${DORIS_NEW_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib -
Start the FE node
sh ${DORIS_OLD_HOME}/fe/bin/start_fe.sh --daemon
-
Link the cluster to view the node information
show frontends\G
If the FE node
alive
status istrue
, and the value ofVersion
is the new version, the node is upgraded successfully -
Complete the upgrade of other FE nodes in turn, finally complete the upgrade of the Master node
06 Turn on the cluster replica repair and balance function
After the upgrade is complete and all BE nodes become Alive
, enable the cluster copy repair and balance function:
admin set frontend config("disable_balance" = "false");
admin set frontend config("disable_colocate_balance" = "false");
admin set frontend config("disable_tablet_scheduler" = "false");