
TPC-DS Benchmark

TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is a benchmark test that focuses on decision support and aims to evaluate the performance of data warehousing and analytics systems. It was developed by the Transaction Processing Performance Council (TPC) organization to compare the capabilities of different systems in handling complex queries and large-scale data analysis.

The design goal of TPC-DS is to simulate complex decision support workloads in the real world. It tests the performance of systems through a series of complex queries and data operations, including joins, aggregations, sorting, filtering, subqueries, and more. These query patterns cover various scenarios ranging from simple to complex, such as report generation, data mining, and OLAP (Online Analytical Processing).

This document presents the performance of Doris on the TPC-DS 1000G test set.

We ran the 99 queries of the TPC-DS standard test data set against Apache Doris 2.1.7-rc03 and Apache Doris 2.0.15.1 and compared the results.


1. Hardware Environment

| Hardware | Configuration Instructions |
|---|---|
| Number of Machines | 4 Aliyun virtual machines (1 FE, 3 BEs) |
| CPU | Intel Xeon (Ice Lake) Platinum 8369B 32C |
| Memory | 128G |
| Disk | Enterprise SSD (PL0) |

2. Software Environment

  • Doris deployed with 3 BEs and 1 FE
  • Kernel Version: Linux version 5.15.0-101-generic
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 2.1.7-rc03, Apache Doris 2.0.15.1
  • JDK: openjdk version "1.8.0_352-352"

3. Test Data Volume

The TPC-DS 1000G data generated for the test was imported into both Apache Doris 2.1.7-rc03 and Apache Doris 2.0.15.1. The tables and their row counts are listed below.

| TPC-DS Table Name | Rows |
|---|---|
| customer_demographics | 1,920,800 |
| reason | 65 |
| warehouse | 20 |
| date_dim | 73,049 |
| catalog_sales | 1,439,980,416 |
| call_center | 42 |
| inventory | 783,000,000 |
| catalog_returns | 143,996,756 |
| household_demographics | 7,200 |
| customer_address | 6,000,000 |
| income_band | 20 |
| catalog_page | 30,000 |
| item | 300,000 |
| web_returns | 71,997,522 |
| web_site | 54 |
| promotion | 1,500 |
| web_sales | 720,000,376 |
| store | 1,002 |
| web_page | 3,000 |
| time_dim | 86,400 |
| store_returns | 287,999,764 |
| store_sales | 2,879,987,999 |
| ship_mode | 20 |
| customer | 12,000,000 |

4. Test SQL

TPC-DS 99 test query statements: TPC-DS-Query-SQL

5. Test Results

We compare Apache Doris 2.1.7-rc03 with Apache Doris 2.0.15.1, using query time (ms) as the main performance indicator. On Apache Doris 2.0.15.1, q78 and q79 failed to execute because that version lacks the latest memory optimizations, so they were excluded when calculating the total. The test results are as follows:

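| Query | Apache Doris 2.1.7-rc03 (ms) | Apache Doris 2.0.15.1 (ms) |
|---|---|---|
| query01 | 630 | 890 |
| query02 | 4930 | 6930 |
| query03 | 360 | 460 |
| query04 | 11070 | 42320 |
| query05 | 620 | 15360 |
| query06 | 220 | 1020 |
| query07 | 550 | 750 |
| query08 | 330 | 670 |
| query09 | 6830 | 7550 |
| query10 | 370 | 2900 |
| query11 | 6960 | 27380 |
| query12 | 100 | 80 |
| query13 | 790 | 2860 |
| query14 | 13470 | 42340 |
| query15 | 510 | 940 |
| query16 | 520 | 550 |
| query17 | 1310 | 2650 |
| query18 | 560 | 820 |
| query19 | 200 | 400 |
| query20 | 100 | 190 |
| query21 | 80 | 80 |
| query22 | 2300 | 3070 |
| query23 | 38240 | 75260 |
| query24 | 8340 | 26580 |
| query25 | 780 | 1190 |
| query26 | 200 | 220 |
| query27 | 530 | 750 |
| query28 | 5940 | 7400 |
| query29 | 940 | 1250 |
| query30 | 270 | 490 |
| query31 | 1890 | 2530 |
| query32 | 60 | 70 |
| query33 | 350 | 450 |
| query34 | 750 | 1380 |
| query35 | 1370 | 8970 |
| query36 | 530 | 570 |
| query37 | 60 | 60 |
| query38 | 7520 | 8710 |
| query39 | 560 | 1010 |
| query40 | 150 | 180 |
| query41 | 50 | 40 |
| query42 | 100 | 140 |
| query43 | 1150 | 1960 |
| query44 | 2020 | 3220 |
| query45 | 430 | 960 |
| query46 | 1250 | 2760 |
| query47 | 2660 | 5790 |
| query48 | 630 | 2570 |
| query49 | 730 | 800 |
| query50 | 1640 | 2200 |
| query51 | 6430 | 6270 |
| query52 | 110 | 160 |
| query53 | 250 | 490 |
| query54 | 1280 | 7790 |
| query55 | 110 | 160 |
| query56 | 290 | 410 |
| query57 | 1480 | 3510 |
| query58 | 240 | 550 |
| query59 | 7760 | 11870 |
| query60 | 380 | 490 |
| query61 | 540 | 670 |
| query62 | 740 | 1560 |
| query63 | 210 | 460 |
| query64 | 5790 | 6840 |
| query65 | 4900 | 7960 |
| query66 | 480 | 810 |
| query67 | 27320 | 46110 |
| query68 | 1600 | 2380 |
| query69 | 380 | 800 |
| query70 | 3480 | 5330 |
| query71 | 460 | 790 |
| query72 | 3160 | 5390 |
| query73 | 660 | 1250 |
| query74 | 5990 | 16450 |
| query75 | 4610 | 8410 |
| query76 | 1590 | 2950 |
| query77 | 300 | 480 |
| query78 | 17970 | - |
| query79 | 3040 | - |
| query80 | 570 | 910 |
| query81 | 460 | 760 |
| query82 | 270 | 330 |
| query83 | 220 | 290 |
| query84 | 130 | 110 |
| query85 | 520 | 470 |
| query86 | 760 | 1220 |
| query87 | 800 | 8760 |
| query88 | 5560 | 9690 |
| query89 | 430 | 750 |
| query90 | 150 | 400 |
| query91 | 150 | 120 |
| query92 | 40 | 40 |
| query93 | 2440 | 2670 |
| query94 | 340 | 310 |
| query95 | 350 | 1810 |
| query96 | 660 | 1680 |
| query97 | 5020 | 14990 |
| query98 | 190 | 330 |
| query99 | 1560 | 3230 |
| Total | 261320 | 507380 |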
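
When reading the results, it can help to turn a pair of timings into a speedup ratio. The following is a minimal sketch, not part of the Doris tooling: the helper name `speedup` is hypothetical, and it only assumes that `awk` is available.

```shell
#!/bin/sh
# speedup BEFORE_MS AFTER_MS
# Prints BEFORE_MS / AFTER_MS rounded to two decimals, i.e. how many
# times faster the second measurement is than the first.
# (Hypothetical helper for illustration; not shipped with tpcds-tools.)
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# query04 in the results: 42320 ms on 2.0.15.1, 11070 ms on 2.1.7-rc03.
speedup 42320 11070   # prints 3.82
```

A ratio below 1 (for example query12, 80 ms versus 100 ms) means the older version was faster on that query.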

6. Environment Preparation

Please refer to the official documentation to install and deploy Doris and obtain a working Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).

7. Data Preparation

7.1 Download and Install TPC-DS Data Generation Tool

Execute the following script to download and compile the tpcds-tools tool.

sh bin/build-tpcds-dbgen.sh

7.2 Generating the TPC-DS Test Set

Execute the following script to generate the TPC-DS dataset:

sh bin/gen-tpcds-data.sh -s 1000

Note 1: Check the script help via sh bin/gen-tpcds-data.sh -h.

Note 2: The data is generated under the tpcds-data/ directory with the .dat suffix. The total file size is about 1000 GB, and generation may take from a few minutes to an hour.

Note 3: A standard test data set of 100G is generated by default.
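
After generation finishes, a quick look at the output directory can confirm that files were actually produced. This is a minimal sketch under the assumption that the `.dat` files sit under `tpcds-data/` relative to the current directory, as Note 2 describes; the helper name `count_dat_files` is hypothetical.

```shell
#!/bin/sh
# count_dat_files DIR
# Prints the number of regular files ending in .dat found under DIR.
# (Hypothetical helper for illustration; not shipped with tpcds-tools.)
count_dat_files() {
    find "$1" -type f -name '*.dat' | wc -l | tr -d ' '
}

data_dir="tpcds-data"   # output directory named in Note 2
if [ -d "$data_dir" ]; then
    echo "$(count_dat_files "$data_dir") .dat files generated"
    du -sh "$data_dir"  # total size on disk, human readable
fi
```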

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File

Before running the import script, you need to write the FE's IP, ports, and other information in the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/tpcds-tools/conf/.

The file contains the FE's IP, HTTP port, query port, user name, password, and the name of the database into which the data will be imported:

# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where the TPC-DS tables are located
export DB='tpcds'
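
Because the later scripts read these settings, it can be worth checking that none of them is missing before continuing. The sketch below is an illustration only: `check_conf` is a hypothetical helper, not part of tpcds-tools, and it treats an empty `PASSWORD` as valid because the sample above uses one.

```shell
#!/bin/sh
# check_conf FILE
# Sources FILE in a subshell and prints "ok" if every required setting
# is non-empty, otherwise prints the first missing name and returns 1.
# (Hypothetical helper for illustration; not shipped with tpcds-tools.)
check_conf() {
    (
        # Start clean so values inherited from the environment do not
        # hide a setting that is missing from the file.
        unset FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER DB
        . "$1" || exit 1
        for name in FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER DB; do
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "missing: $name"
                exit 1
            fi
        done
        echo "ok"
    )
}

conf="${DORIS_HOME:-.}/tools/tpcds-tools/conf/doris-cluster.conf"
if [ -f "$conf" ]; then
    check_conf "$conf" || echo "please fix $conf before continuing"
fi
```

The subshell keeps the sourced variables from leaking into the calling shell, so the check can be repeated after each edit of the file.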

7.3.2 Execute the Following Script to Generate and Create the TPC-DS Tables

sh bin/create-tpcds-tables.sh -s 1000

Alternatively, copy the table creation statements from create-tpcds-tables and execute them in Doris.

7.4 Import Data

Import the data with the following command:

sh bin/load-tpcds-data.sh
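
Once the load completes, the row counts can be compared against the table in section 3. The following is a rough sketch, not the tool's own verification: it assumes the `mysql` command-line client is installed, that doris-cluster.conf has been sourced, and that the password is empty as in the sample configuration. The helper names `normalize_count` and `count_rows` are hypothetical.

```shell
#!/bin/sh
# normalize_count "2,879,987,999"  ->  2879987999
# Strips thousands separators so a count copied from the table in
# section 3 can be compared with the output of COUNT(*).
normalize_count() {
    printf '%s\n' "$1" | tr -d ','
}

# count_rows TABLE
# Prints the row count of TABLE using the settings from doris-cluster.conf.
# Assumptions: mysql client on PATH, conf already sourced, empty password.
count_rows() {
    mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" -D"$DB" -N -B \
        -e "SELECT COUNT(*) FROM $1"
}

# Usage (needs a running cluster, so it is left commented out):
#   . "${DORIS_HOME}/tools/tpcds-tools/conf/doris-cluster.conf"
#   [ "$(count_rows store_sales)" = "$(normalize_count '2,879,987,999')" ] \
#       && echo "store_sales row count matches"
```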

7.5 Query Test

7.5.1 Executing Query Scripts

Execute the test SQL from section 4, or run the following command:

sh bin/run-tpcds-queries.sh -s 1000

7.5.2 Single SQL Execution

You can also retrieve the latest SQL from the code repository; see the latest TPC-DS test query statements.
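
To time one statement by hand, a small wrapper around the client call is enough. This is a minimal sketch under several assumptions: `date +%s%N` requires GNU coreutils, `time_cmd_ms` is a hypothetical helper, and `query01.sql` is a placeholder file name. The figure it reports is client-side wall-clock time, including connection setup.

```shell
#!/bin/sh
# time_cmd_ms COMMAND [ARGS...]
# Runs COMMAND, discards its standard output, and prints the elapsed
# wall-clock time in milliseconds.
# Assumption: GNU date, which supports the %N (nanoseconds) format.
time_cmd_ms() {
    start_ns=$(date +%s%N)
    "$@" > /dev/null
    end_ns=$(date +%s%N)
    echo $(( (end_ns - start_ns) / 1000000 ))
}

# Usage (needs a running cluster, so it is left commented out;
# query01.sql is a hypothetical file holding a single query):
#   . "${DORIS_HOME}/tools/tpcds-tools/conf/doris-cluster.conf"
#   time_cmd_ms mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" -D"$DB" \
#       -e "$(cat query01.sql)"
```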