Installation and Deployment
This topic is about the hardware and software environment needed to deploy Doris, the recommended deployment mode, cluster scaling, and common problems that occur in creating and running clusters.
Before you continue reading, you might want to compile Doris following the instructions in the General Compile topic.
Software and Hardware Requirements
Overview
Doris, as an open-source OLAP database with an MPP architecture, can run on most mainstream commercial servers. To take full advantage of the high concurrency and high availability of Doris, we recommend that your machines meet the following requirements:
Linux Operating System Version Requirements
Linux System | Version |
---|---|
CentOS | 7.1 and above |
Ubuntu | 16.04 and above |
Software Requirements
Software | Version |
---|---|
Java | 1.8 and above |
GCC | 4.8.2 and above |
OS Installation Requirements
Set the maximum number of open file descriptors in the system
vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
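The new limits apply to new login sessions. A quick way to verify, assuming you have started a fresh session after editing limits.conf:

```
# Show the open file descriptor limit for the current shell
ulimit -n
# Expected output: 65536
```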
Clock synchronization
The metadata in Doris requires a time precision of less than 5000 ms, so all machines in the cluster need to synchronize their clocks to avoid service exceptions caused by metadata inconsistencies due to clock skew.
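Any standard NTP setup works. A minimal sketch using chrony (package and service names may differ across distributions):

```
# Install and enable an NTP client (chrony here; ntpd or systemd-timesyncd also work)
sudo yum install -y chrony          # or: sudo apt-get install -y chrony
sudo systemctl enable --now chronyd # on some distributions the service is named chrony
# Check the current offset against the configured time sources
chronyc tracking
```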
Close the swap partition
The Linux swap partition can cause serious performance problems for Doris, so you need to disable the swap partition before installation.
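A common way to do this, assuming swap entries live in /etc/fstab:

```
# Turn off swap for the running system
sudo swapoff -a
# Comment out swap entries so the change survives reboot (a .bak backup is kept)
sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab
# Verify: the Swap line should show 0
free -h
```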
Linux file system
Both ext4 and xfs file systems are supported.
Development Test Environment
Module | CPU | Memory | Disk | Network | Number of Instances |
---|---|---|---|---|---|
Frontend | 8 core + | 8GB + | SSD or SATA, 10GB + * | Gigabit Network Card | 1 |
Backend | 8 core + | 16GB + | SSD or SATA, 50GB + * | Gigabit Network Card | 1-3* |
Production Environment
Module | CPU | Memory | Disk | Network | Number of Instances (Minimum Requirements) |
---|---|---|---|---|---|
Frontend | 16 core + | 64GB + | SSD or RAID card, 100GB + * | 10 Gigabit network card | 1-3* |
Backend | 16 core + | 64GB + | SSD or SATA, 100GB + * | 10 Gigabit network card | 3 * |
Note 1:
- The disk space of FE is mainly used to store metadata, including logs and images. It usually ranges from several hundred MB to several GB.
- The disk space of BE is mainly used to store user data. The total disk space taken up is 3 times the total user data (3 copies). Then an additional 40% of the space is reserved for background compaction and intermediate data storage.
- On one single machine, you can deploy multiple BE instances but only one FE instance. If you need 3 copies of the data, you need to deploy 3 BE instances on 3 machines (1 instance per machine) rather than 3 BE instances on one machine. Clocks of the FE servers must be consistent (allowing a maximum clock skew of 5 seconds).
- The test environment can also work with only 1 BE instance. In the actual production environment, the number of BE instances directly determines the overall query latency.
- Disable swap for all deployment nodes.
Note 2: Number of FE nodes
- FE nodes are divided into Followers and Observers based on their roles. (Leader is an elected role in the Follower group, hereinafter referred to as Follower, too.)
- The number of FE nodes should be at least 1 (1 Follower). If you deploy 1 Follower and 1 Observer, you can achieve high read availability; if you deploy 3 Followers, you can achieve high read-write availability (HA).
- The number of Followers must be odd, and there is no limit on the number of Observers.
- According to past experience, for businesses that require high cluster availability (e.g. online service providers), we recommend that you deploy 3 Followers and 1-3 Observers; for offline businesses, we recommend that you deploy 1 Follower and 1-3 Observers.
- Usually we recommend 10 to 100 machines to give full play to Doris' performance (deploy FE on 3 of them (HA) and BE on the rest).
- The performance of Doris is positively correlated with the number of nodes and their configuration. With a minimum of four machines (one FE and three BEs, with one BE and one Observer FE deployed on the same machine to provide metadata backup) and relatively low configuration, Doris can still run smoothly.
- In hybrid deployments of FE and BE, you might need to watch out for resource contention and ensure that the metadata directory and the data directory are on different disks (see the sketch below).
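A minimal sketch of such a hybrid-deployment layout, assuming /ssd1, /hdd1, and /hdd2 are mount points of different physical disks (the paths are illustrative):

```
# fe.conf: keep metadata on a dedicated SSD
meta_dir = /ssd1/doris/doris-meta

# be.conf: keep user data on separate disks
storage_root_path = /hdd1/doris;/hdd2/doris
```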
Broker Deployment
Broker is a process for accessing external data sources, such as HDFS. Usually, deploying one Broker instance on each machine is enough.
Network Requirements
Doris instances communicate directly over the network. The following table shows all required ports.
Instance Name | Port Name | Default Port | Communication Direction | Description |
---|---|---|---|---|
BE | be_port | 9060 | FE --> BE | Thrift server port on BE for receiving requests from FE |
BE | webserver_port | 8040 | BE <--> BE | HTTP server port on BE |
BE | heartbeat_service_port | 9050 | FE --> BE | Heartbeat service port (Thrift) on BE, used to receive heartbeats from FE |
BE | brpc_port | 8060 | FE <--> BE, BE <--> BE | BRPC port on BE for communication between BEs |
FE | http_port | 8030 | FE <--> FE, user <--> FE | HTTP server port on FE |
FE | rpc_port | 9020 | BE --> FE, FE <--> FE | Thrift server port on FE; The configurations of each FE should be consistent. |
FE | query_port | 9030 | user <--> FE | MySQL server port on FE |
FE | edit_log_port | 9010 | FE <--> FE | Port on FE for BDBJE communication |
Broker | broker_ipc_port | 8000 | FE --> Broker, BE --> Broker | Thrift server port on Broker for receiving requests |
Note:
- When deploying multiple FE instances, make sure that the http_port configuration of each FE is consistent.
- Make sure that each port is accessible in its proper direction before deployment (a simple connectivity check is sketched below).
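A simple connectivity check, assuming the default ports above and using netcat (`fe_host` and `be_host` are placeholders):

```
# From an FE node: BE heartbeat and bRPC ports
nc -zv be_host 9050
nc -zv be_host 8060
# From a client machine: FE HTTP and MySQL protocol ports
nc -zv fe_host 8030
nc -zv fe_host 9030
```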
IP Binding
Because of the existence of multiple network cards, or virtual network cards caused by the installation of Docker and other environments, the same host may have multiple different IPs. Currently, Doris does not automatically identify available IPs. So when you encounter multiple IPs on the deployment host, you must specify the correct IP via the `priority_networks` configuration item.

`priority_networks` is a configuration item that both FE and BE have. It needs to be written in fe.conf and be.conf. It tells the process which IP should be bound when FE or BE starts. Examples are as follows:

priority_networks=10.1.3.0/24

This is CIDR notation. FE or BE will find the IP matching this configuration item and use it as its own local IP.
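To decide which CIDR to configure, you can list the host's addresses and pick the subnet Doris should bind to. A quick sketch:

```
# Show the IPs configured on this host
ip addr show | grep "inet "
# If the desired address is, say, 10.1.3.21/24, set in fe.conf and be.conf:
# priority_networks = 10.1.3.0/24
```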
Note: Configuring `priority_networks` and starting FE or BE only ensures the correct IP binding of FE or BE. You also need to specify the same IP in the ADD BACKEND or ADD FRONTEND statements, otherwise the cluster cannot be created. For example:

BE is configured as `priority_networks = 10.1.3.0/24`.

If you use the following IP in the ADD BACKEND statement: ALTER SYSTEM ADD BACKEND "192.168.0.1:9050";

Then FE and BE will not be able to communicate properly.

At this point, you must DROP the wrong BE configuration and use the correct IP to perform ADD BACKEND.

The same works for FE.
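For example, the correction could look like this, assuming the default query_port 9030 and that 10.1.3.21 is the BE's IP inside the configured CIDR (both values are illustrative):

```
mysql -h fe_host -P 9030 -uroot -e 'ALTER SYSTEM DROP BACKEND "192.168.0.1:9050";'
mysql -h fe_host -P 9030 -uroot -e 'ALTER SYSTEM ADD BACKEND "10.1.3.21:9050";'
```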
Broker currently does not have the `priority_networks` configuration item, nor does it need one. Broker's services are bound to 0.0.0.0 by default. You can simply specify a correct, accessible Broker IP when executing ADD BROKER.
Table Name Case Sensitivity
By default, table names in Doris are case-sensitive. If you need to change that, you may do it before cluster initialization. The table name case sensitivity cannot be changed after cluster initialization is completed.
See the `lower_case_table_names` section in Variables for details.
Cluster Deployment
Manual Deployment
Deploy FE
Copy the FE deployment files to the specified node

Find the fe folder under the output generated by source code compilation, copy it to the specified deployment path on the FE node, and put it in the corresponding directory.
Configure FE
The configuration file is conf/fe.conf. Note:
`meta_dir` indicates the metadata storage location. The default value is `${DORIS_HOME}/doris-meta`. The directory needs to be created manually.

Note: For production environments, it is better not to put the directory under the Doris installation directory but on a separate disk (SSD would be best); for test and development environments, you may use the default configuration.
The default maximum Java heap memory of JAVA_OPTS in fe.conf is 4GB. For production environments, we recommend adjusting it to more than 8GB.
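A minimal fe.conf sketch reflecting the notes above (paths and values are illustrative):

```
# conf/fe.conf
meta_dir = /path/to/doris-meta        # create manually, ideally on a separate SSD
priority_networks = 10.1.3.0/24       # only needed on hosts with multiple IPs
# For production, raise -Xmx in the existing JAVA_OPTS line to 8 GB or more, e.g. -Xmx8192m
```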
Start FE
bin/start_fe.sh --daemon
The FE process starts and enters the background for execution. Logs are stored in the log/ directory by default. If startup fails, you can view error messages by checking out log/fe.log or log/fe.out.
For details about deployment of multiple FEs, see the FE scaling section.
Deploy BE
Copy the BE deployment files to all nodes where BE is to be deployed

Find the be folder under the output generated by source code compilation and copy it to the specified deployment paths of the BE nodes.
Modify all BE configurations
Modify be/conf/be.conf, which mainly involves configuring `storage_root_path`: the data storage directory. The default is be/storage; the directory needs to be created manually. Use `;` to separate multiple paths (do not add `;` after the last directory).

You may specify the storage medium (HDD or SSD) in the path. You may also add a capacity limit to the end of every path and use `,` for separation. Unless you use a mix of SSD and HDD disks, you do not need to follow the configuration methods in Example 1 and Example 2 below, but only need to specify the storage directory; nor do you need to modify the default storage medium configuration of FE.
Example 1:
Note: For SSD disks, add `.SSD` to the end of the directory; for HDD disks, add `.HDD`.
`storage_root_path=/home/disk1/doris.HDD;/home/disk2/doris.SSD;/home/disk2/doris`
Description
* 1./home/disk1/doris.HDD: The storage medium is HDD;
* 2./home/disk2/doris.SSD: The storage medium is SSD;
* 3./home/disk2/doris: The storage medium is HDD (default).
Example 2:
Note: You do not need to add the `.SSD` or `.HDD` suffix; instead, specify the medium in the `storage_root_path` parameter.
`storage_root_path=/home/disk1/doris,medium:HDD;/home/disk2/doris,medium:SSD`
Description
* 1./home/disk1/doris,medium:HDD : The storage medium is HDD;
* 2./home/disk2/doris,medium:SSD : The storage medium is SSD.
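Putting it together, a minimal be.conf sketch (the paths are illustrative):

```
# conf/be.conf
storage_root_path = /home/disk1/doris,medium:HDD;/home/disk2/doris,medium:SSD
priority_networks = 10.1.3.0/24       # only needed on hosts with multiple IPs
```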
BE webserver_port configuration

If the BE component is installed in a Hadoop cluster, you need to change the configuration webserver_port=8040 to avoid a port conflict.

Set the JAVA_HOME environment variable

Java UDF is supported since version 1.2, so BEs are dependent on the Java environment. It is necessary to set the `JAVA_HOME` environment variable before starting. You can also do this by adding `export JAVA_HOME=your_java_home_path` to the first line of the `start_be.sh` startup script.

Install Java UDF

Because Java UDF is supported since version 1.2, you need to download the JAR package of Java UDF from the official website and put it under the lib directory of BE, otherwise it may fail to start.

Add all BE nodes to FE

BE nodes need to be added in FE before they can join the cluster. You can use mysql-client (Download MySQL 5.7) to connect to FE:
./mysql-client -h fe_host -P query_port -uroot
`fe_host` is the node IP where FE is located; `query_port` is in fe/conf/fe.conf; the root account is used by default and no password is required for login.

After login, execute the following command to add all the BE hosts and heartbeat service ports:
ALTER SYSTEM ADD BACKEND "be_host:heartbeat_service_port";
`be_host` is the node IP where BE is located; `heartbeat_service_port` is in be/conf/be.conf.

Start BE
bin/start_be.sh --daemon
The BE process will start and go into the background for execution. Logs are stored in the be/log/ directory by default. If startup fails, you can view error messages by checking out be/log/be.log or be/log/be.out.
View BE status

Connect to FE using mysql-client and execute SHOW PROC '/backends'; to view the BE operation status. If everything goes well, the `isAlive` column should be `true`.
(Optional) FS_Broker Deployment
Broker is deployed as a plug-in, which is independent of Doris. If you need to import data from a third-party storage system, you need to deploy the corresponding Broker. By default, Doris provides fs_broker for reading from HDFS and object storage (supporting the S3 protocol). fs_broker is stateless, and we recommend that you deploy a Broker for each FE and BE node.
Copy the corresponding fs_broker directory from the source compilation output to all the nodes where Broker needs to be deployed. It is recommended to keep the Broker directory at the same level as the BE or FE directories.
Modify the corresponding Broker configuration
You can modify the configuration file in the corresponding broker/conf/ directory.
Start Broker
bin/start_broker.sh --daemon
Add Broker
To let Doris FE and BE know which nodes Broker is on, add a list of Broker nodes by SQL command.
Use mysql-client to connect to the FE that has been started, and execute the following commands:
ALTER SYSTEM ADD BROKER broker_name "broker_host1:broker_ipc_port1","broker_host2:broker_ipc_port2",...;
`broker_host` is the Broker node IP; `broker_ipc_port` is in the Broker configuration file conf/apache_hdfs_broker.conf.

View Broker status
Connect any started FE using mysql-client and execute the following command to view Broker status:
SHOW PROC '/brokers';
Note: In production environments, all instances should be started with daemons to ensure that processes are automatically restarted after they exit, for example with Supervisor (a sketch of a Supervisor program section is given below). For daemon startup, in Doris 0.9.0 and earlier versions, you need to remove the last `&` symbol in the start_xx.sh scripts. In Doris 0.10.0 and later versions, you may just call `sh start_xx.sh` directly to start.
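As an illustration only, a Supervisor program section for a BE could look roughly like this (the file path, deployment path, and user are placeholders; adjust to your layout):

```
; /etc/supervisord.d/doris_be.ini
[program:doris_be]
directory=/path/to/be
command=sh /path/to/be/bin/start_be.sh
autostart=true
autorestart=true
user=doris
```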
FAQ
Process-Related Questions
How can we know whether the FE process startup succeeds?
After the FE process starts, metadata is loaded first. Based on the role of FE, you can see `transfer from UNKNOWN to MASTER/FOLLOWER/OBSERVER` in the log. Eventually, you will see the `thrift server started` log and will be able to connect to FE through the MySQL client, which indicates that FE started successfully.

You can also check whether the startup was successful by connecting as follows:
http://fe_host:fe_http_port/api/bootstrap
If it returns:
{"status":"OK","msg":"Success"}
The startup is successful; otherwise, there may be problems.
Note: If you can't see the information about the boot failure in fe.log, you may check fe.out.
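A quick check from the command line, assuming the default http_port 8030:

```
curl http://fe_host:8030/api/bootstrap
# Expected: {"status":"OK","msg":"Success"}
```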
How can we know whether the BE process startup succeeds?
After the BE process starts, if there is pre-existing data, it might take several minutes to load the data indexes.
If BE is started for the first time or has not joined any cluster, the BE log will periodically scroll the words `waiting to receive first heartbeat from frontend`, meaning that BE has not received the Master's address through FE's heartbeat and is waiting passively. Such error logs will disappear after the BE is added by ADD BACKEND in FE and starts receiving heartbeats. If the words `master client, get client from cache failed. host:, port: 0, code: 7` appear after receiving a heartbeat, it indicates that FE has successfully connected to BE, but BE cannot actively connect back to FE. You may need to check the connectivity from BE to the rpc_port of FE.

If BE has been added to the cluster, the heartbeat log from FE will be scrolled every five seconds: `get heartbeat, host:xx.xx.xx.xx, port:9020, cluster id:xxxxxxx`, indicating that the heartbeat is normal.

Secondly, if the words `finish report task success. return code: 0` are scrolled every 10 seconds in the log, it indicates that the BE-to-FE communication is normal.

Meanwhile, if there is a data query, you will see the rolling logs and the `execute time is xxx` logs, indicating that BE started successfully and queries are normal.

You can also check whether the startup was successful by connecting as follows:
http://be_host:be_http_port/api/health
If it returns:
{"status": "OK","msg": "To Be Added"}
That means the startup is successful; otherwise, there may be problems.
Note: If you can't see the information about the boot failure in be.INFO, you may check be.out.
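Similarly, a quick check from the command line, assuming the default webserver_port 8040:

```
curl http://be_host:8040/api/health
# Expected: {"status": "OK","msg": "To Be Added"}
```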
How can we confirm that the connectivity of FE and BE is normal after building the system?
Firstly, you need to confirm that the FE and BE processes have been started separately and are working normally. Then, you need to confirm that all nodes have been added through `ADD BACKEND` or `ADD FOLLOWER/OBSERVER` statements.

If the heartbeat is normal, BE logs will show `get heartbeat, host:xx.xx.xx.xx, port:9020, cluster id:xxxxx`; if the heartbeat fails, you will see `backend [10001] got Exception: org.apache.thrift.transport.TTransportException` in FE's log, or other thrift communication error logs, indicating that the heartbeat from FE to BE 10001 fails. Here you need to check the connectivity from FE to the heartbeat port of the BE host.

If the BE-to-FE communication is normal, the BE log will display the words `finish report task success. return code: 0`. Otherwise, the words `master client, get client from cache failed` will appear. In this case, you need to check the connectivity from BE to the rpc_port of FE.

What is the Doris node authentication mechanism?
In addition to Master FE, the other role nodes (Follower FE, Observer FE, Backend) need to register to the cluster through the `ALTER SYSTEM ADD` statement before joining the cluster.

When Master FE is started for the first time, a cluster_id is generated in the doris-meta/image/VERSION file.

When an FE joins the cluster for the first time, it first retrieves this file from Master FE. Each subsequent reconnection between FEs (FE reboot) checks whether its cluster ID is the same as that of the other existing FEs. If it is not the same, the FE will exit automatically.

When BE first receives the heartbeat of Master FE, it gets the cluster ID from the heartbeat and records it in the `cluster_id` file of the data directory. Each heartbeat after that is compared against the cluster ID sent by FE. If the cluster IDs do not match, BE will refuse to respond to FE's heartbeat.

The heartbeat also contains Master FE's IP. If the Master FE changes, the new Master FE will send the heartbeat to BE together with its own IP, and BE will update the Master FE IP it has saved.
priority_networks

`priority_networks` is a configuration item that both FE and BE have. It is used to help FE or BE identify their own IP addresses in the case of multiple network cards. `priority_networks` uses CIDR notation: RFC 4632.

If the connectivity of FE and BE is confirmed to be normal, but timeouts still occur in creating tables, and the FE log shows the error message `backend does not find. host:xxxx.xxx.XXXX`, this means that there is a problem with the IP address automatically identified by Doris and that the priority_networks parameter needs to be set manually.

The explanation of this error is as follows. When the user adds a BE through the `ADD BACKEND` statement, FE recognizes whether the statement specifies a hostname or an IP. If it is a hostname, FE automatically converts it to an IP address and stores it in the metadata. When BE reports on the completion of a task, it carries its own IP address. If FE finds that the IP address reported by BE is different from the one in the metadata, the above error message will show up.

Solutions to this error: 1) Set priority_networks in FE and BE separately. Usually FE and BE are in one network segment, so their priority_networks can be set to be the same. 2) Put the correct IP address instead of the hostname in the `ADD BACKEND` statement to prevent FE from getting the wrong IP address.

What is the number of file descriptors of a BE process?
The number of file descriptors of a BE process is determined by two parameters: `min_file_descriptor_number` / `max_file_descriptor_number`.

If the current limit is not in the range [`min_file_descriptor_number`, `max_file_descriptor_number`], an error will occur when starting the BE process. You may use the ulimit command to adjust the limit.

The default value of `min_file_descriptor_number` is 65536.

The default value of `max_file_descriptor_number` is 131072.

For example, the command `ulimit -n 65536` sets the number of file descriptors to 65536.

After starting a BE process, you can use cat /proc/$pid/limits to check the actual number of file descriptors of the process.
If you have used `supervisord` and encountered a file descriptor error, you can fix it by modifying `minfds` in supervisord.conf:

vim /etc/supervisord.conf

minfds=65535 ; (min. avail startup file descriptors; default 1024)