Release 3.0.2
Dear community members, the Apache Doris 3.0.2 version was officially released on October 15, 2024, featuring updates and improvements in compute-storage decoupling, data storage, lakehouse, query optimizer, query execution and more.
Quick Download: https://doris.apache.org/download/
GitHub Release: https://github.com/apache/doris/releases
Behavioral Changes
Storage
- Limited the number of tablets in a single backup task to prevent FE memory overflow. #40518
- The
SHOW PARTITIONS
command now displays theCommittedVersion
of partitions. #28274
Other
- The default printing mode (asynchronous) of
fe.log
now includes file line number information. If performance issues are encountered due to line number output, please switch to BRIEF mode. #39419 - The default value of the session variable
ENABLE_PREPARED_STMT_AUDIT_LOG
has been changed fromtrue
tofalse
, and the audit log of prepare statements will no longer be printed. #38865 - The default value of the session variable
max_allowed_packet
has been adjusted from 1MB to 16MB to align with MySQL 8.4. #38697 - The JVM of FE and BE defaults to using the UTF-8 character set. #39521
New Features
Storage
- Backup and recovery now support clearing tables or partitions that are not in the backup. #39028
Compute-Storage Decoupled
- Support for parallel recycling of expired data on multiple tablets. #37630
- Support for changing storage vaults through
ALTER
statements. #38685 #37606 - Support for importing a large number of tablets (5000+) in a single transaction (experimental feature). #38243
- Support for automatically aborting pending transactions caused by reasons such as node restarts, solving the issue of pending transactions blocking decommission or schema change. #37669
- A new session variable
enable_segment_cache
has been added to control whether to use segment cache during queries (default istrue
). #37141 - Resolved the issue of not being able to import a large amount of data during schema changes in compute-storage decoupled mode. #39558
- Support for adding multiple follower roles of FE in compute-storage decoupled mode. #38388
- Support for using memory as file cache to accelerate queries in environments with no disks or low-performance HDDs. #38811
Lakehouse
- New Lakesoul Catalog has been added. Apache Doris Docs
- A new system table
catalog_meta_cache_statistics
has been added to view the usage of various metadata caches in external catalog. #40155
Query Optimizer
- Support for
is [not] true/false
expressions. #38623
Query Execution
- A new CRC32 function has been added. #38204
- New aggregate functions skew and kurt have been added. #41277
- Profiles are now persisted to the FE's disk to retain more profiles. #33690
- A new system table
workload_group_privileges
has been added to view permission information related to workload groups. #38436 - A new system table
workload_group_resource_usage
has been added to monitor resource statistics of workload groups. #39177 - Workload groups now support limiting reads of local IO and remote IO. #39012
- Workload groups now support cgroupv2 to limit CPU usage. #39374
- A new system table
information_schema.partitions
has been added to view some table creation attributes. #40636
Other
- Support for using the
SHOW
statement to display BE's configuration information, such asSHOW BACKEND CONFIG LIKE ${pattern}
. #36525
Improvements
Load
- Improved the import efficiency of routine load when encountering frequent EOFs from Kafka. #39975
- The stream load result now includes the time taken to read HTTP data,
ReceiveDataTimeMs
, which can quickly determine slow stream load issues caused by network reasons. #40735 - Optimized the routine load timeout logic to avoid frequent timeouts during inverted index and mow writes. #40818
Storage
- Support for batch addition of partitions. #37114
Compute-Storage Decoupled
- Added the meta-service HTTP interface
/MetaService/http/show_meta_ranges
to facilitate the statistics of KV distribution in FDB. #39208 - The meta-service/recycler stop script ensures that the process fully exits before returning. #40218
- Support for using the session variable
version_comment
(Cloud Mode) to display the current deployment mode as compute-storage decoupled. #38269 - Fixed the detailed message returned when transaction submission fails. #40584
- Support for using one meta-service process to provide both metadata services and data recycling services. #40223
- Optimized the default configuration of file_cache to avoid potential issues when not set. #41421 #41507
- Improved query performance by batch retrieving the version of multiple partitions. #38949
- Delayed the redistribution of tablets to avoid query performance issues caused by temporary network fluctuations. #40371
- Optimized the read-write lock logic in the balance. #40633
- Enhanced the robustness of file cache in handling TTL filenames during restarts/crashes. #40226
- Added the BE HTTP interface
/api/file_cache?op=hash
to facilitate the calculation of the hash file names of segment files on disk. #40831 - Optimized the unified naming to be compatible with using compute group to represent BE groups (original cloud cluster). #40767
- Optimized the waiting time for obtaining locks when calculating delete bitmaps in primary key tables. #40341
- When there are many delete bitmaps in primary key tables, optimized the high CPU consumption during queries by pre-merging multiple delete bitmaps. #40204
- Support for managing FE/BE nodes in compute-storage decoupled mode through SQL statements, hiding the logic of direct interaction with meta-service when deploying in compute-storage decoupled mode. #40264
- Added a script for rapid deployment of FDB. #39803
- Optimized the output of
SHOW CACHE HOTSPOT
to unify the column name style with otherSHOW
statements. #41322 - When using a storage vault as the storage backend, disallowed the use of
latest_fs()
to avoid binding different storage backends to the same table. #40516 - Optimized the timeout strategy for calculating delete bitmaps when importing mow tables. #40562 #40333
- The enable_file_cache in be.conf is now enabled by default in compute-storage decoupled mode. #41502
Lakehouse
- When reading tables in CSV format, support for the session
keep_carriage_return
setting to control the reading behavior of the\r
symbol. #39980 - The default maximum memory of BE's JVM has been adjusted to 2GB (affecting only new deployments). #41403
- Hive Catalog has added
hive.recursive_directories_table
andhive.ignore_absent_partitions
properties to specify whether to recursively traverse data directories and whether to ignore missing partitions. #39494 - Optimized the Catalog refresh logic to avoid generating a large number of connections during refresh. #39205
SHOW CREATE DATABASE
andSHOW CREATE TABLE
for external data sources now display location information. #39179- The new optimizer supports inserting data into JDBC external tables using the
INSERT INTO
statement. #41511 - MaxCompute Catalog now supports complex data types. #39259
- Optimized the logic for reading and merging data shards of external tables. #38311
- Optimized some refresh strategies for metadata caches of external tables. #38506
- Paimon tables now support pushing down
IN/NOT IN
predicates. #38390 - Compatible with tables created in Parquet format by Paimon version 0.9. #41020
Asynchronous Materialized Views
- Building asynchronous materialized views now supports the use of both immediate and starttime. #39573
- Asynchronous materialized views based on external tables will refresh the metadata cache of the external tables before refreshing the materialized views, ensuring construction based on the latest external table data. #38212
- Partition incremental construction now supports rolling up according to weekly and quarterly granularities. #39286
Query Optimizer
- The aggregate function
GROUP_CONCAT
now supports the use of bothDISTINCT
andORDER BY
. #38080 - Optimized the collection and use of statistical information, as well as the logic for estimating row counts and cost calculations, to generate more efficient and stable execution plans.
- Window function partition data pre-filtering now supports cases containing multiple window functions. #38393
Query Execution
- Reduced query latency by running prepare pipeline tasks in parallel. #40874
- Display Catalog information in Profile. #38283
- Optimized the computational performance of
IN
filtering conditions. #40917 - Supported cgroupv2 in K8S to limit Doris's memory usage. #39256
- Optimized the performance of converting strings to datetime types. #38385
- When a
string
is a decimal number, support casting it to anint
, which will be more compatible with certain behaviors of MySQL. #38847
Semi-Structured Data Management
- Optimized the performance of inverted index matching. #41122
- Temporarily prohibited the creation of inverted indexes with tokenization on arrays. #39062
explode_json_array
now supports binary JSON types. #37278- IP data types now support bloomfilter indexes. #39253
- IP data types now support row storage. #39258
- Nested data types such as ARRAY, MAP, and STRUCT now support schema changes. #39210
- When creating MTMV, automatically truncate KEYs encountered in VARIANT data types. #39988
- Lazy loading of inverted indexes during queries to improve performance. #38979
add inverted index file size for open file
. #37482- Reduced access to object storage interfaces during compaction to improve performance. #41079
- Added three new query profile metrics related to inverted indexes. #36696
- Reduced cache overhead for non-PreparedStatement SQL to improve performance. #40910
- Pre-warming cache now supports inverted indexes. #38986
- Inverted indexes are now cached immediately after writing. #39076
Compatibility
- Fixed the issue of Thrift ID incompatibility on the master with branch-2.1. #41057
Other
- BE HTTP API now supports authentication; set config::enable_all_http_auth to true (default is false) when authentication is required. #39577
- Optimized the user permissions required for the REFRESH operation. Permissions have been relaxed from ALTER to SHOW. #39008
- Reduced the range of nextId when calling advanceNextId(). #40160
- Optimized the caching mechanism for Java UDFs. #40404
Bug Fixes
Load
- Fixed the issue where
abortTransaction
did not handle return codes. #41275 - Fixed the issue where transactions failed to commit or abort in compute-storage decoupled mode without calling
afterCommit/afterAbort
. #41267 - Fixed the issue where Routine Load could not work properly when modifying consumer offsets in compute-storage decoupled mode. #39159
- Fixed the issue of repeatedly closing file handles when obtaining error log file paths. #41320
- Fixed the issue of incorrect job progress caching for Routine Load in compute-storage decoupled mode. #39313
- Fixed the issue where Routine Load could get stuck when failing to commit transactions in compute-storage decoupled mode. #40539
- Fixed the issue where Routine Load kept reporting data quality check errors in compute-storage decoupled mode. #39790
- Fixed the issue where Routine Load did not check transactions before committing in compute-storage decoupled mode. #39775
- Fixed the issue where Routine Load did not check transactions before aborting in compute-storage decoupled mode. #40463
- Fixed the issue where cluster keys did not support certain data types. #38966
- Fixed the issue of transactions being repeatedly committed. #39786
- Fixed the issue of use after free with WAL when BE exits. #33131
- Fixed the issue where WAL playback did not skip completed import transactions in compute-storage decoupled mode. #41262
- Fixed the logic for selecting BE in group commit in compute-storage decoupled mode. #39986 #38644
- Fixed the issue where BE might crash when group commit was enabled for insert into. #39339
- Fixed the issue where insert into with group commit enabled might get stuck. #39391
- Fixed the issue where not enabling the group commit option during import might result in a table not found error. #39731
- Fixed the issue of transaction submission timeouts due to too many tablets. #40031
- Fixed the issue of concurrent opens with Auto Partition. #38605
- Fixed the issue of import lock granularity being too large. #40134
- Fixed the issue of coredumps caused by zero-length varchars. #40940
- Fixed the issue of incorrect index Id values in log prints. #38790
- Fixed the issue of memtable shifting not closing BRPC streaming. #40105
- Fixed the issue of inaccurate bvar statistics during memtable shifting. #39075
- Fixed the issue of multi-replication fault tolerance during memtable shifting. #38003
- Fixed the issue of incorrect message length calculations for Routine Load with multiple tables in one stream. #40367
- Fixed the issue of inaccurate progress reporting for Broker Load. #40325
- Fixed the issue of inaccurate data scan volume reporting for Broker Load. #40694
- Fixed the issue of concurrency with Routine Load in compute-storage decoupled mode. #39242
- Fixed the issue of Routine Load jobs being canceled in compute-storage decoupled mode. #39514
- Fixed the issue of progress not being reset when deleting Kafka topics. #38474
- Fixed the issue of updating progress during transaction state transitions in Routine Load. #39311
- Fixed the issue of Routine Load switching from a paused state to a paused state. #40728
- Fixed the issue of Stream Load records being missed due to database deletion. #39360
Storage
- Fixed the issue of missing storage policies. #38700
- Fixed the issue of errors during cross-version backup and recovery. #38370
- Fixed the NPE issue with ccr binlog. #39909
- Fixed potential issues with duplicate keys in mow. #41309 #39791 #39958 #38369 #38331
- Fixed the issue of not being able to write after backup and recovery in high-frequency write scenarios. #40118 #38321
- Fixed the issue of data errors potentially triggered by deleting empty strings and schema changes. #41064
- Fixed the issue of incorrect statistics due to column updates. #40880
- Limited the size of tablet meta pb to prevent BE crashes due to oversized meta. #39455
- Fixed the potential column misalignment issue with the new optimizer in
begin; insert into values; commit
. #39295
Compute-Storage Decoupled
- Fixed the issue where the tablet distribution might be inconsistent across multiple FEs in compute-storage decoupled mode. #41458
- Fixed the issue where TVF might not work in multi-computing group environments. #39249
- Fixed the issue where compaction used resources that had already been released when BE exited in compute-storage decoupled mode. #39302
- Fixed the issue where automatic start-stop might cause FE replay to get stuck. #40027
- Fixed the issue where the BE status and the stored status in meta-service were inconsistent. #40799
- Fixed the issue where the FE->meta-service connection pool could not automatically expire and reconnect. #41202 #40661
- Fixed the issue where some tablets might repeatedly undergo unexpected balance processes during rebalance. #39792
- Fixed the issue where storage vault permissions were lost after FE restarted. #40260
- Fixed the issue where tablet row counts and other statistical information might be incomplete due to FDB scan range pagination. #40494
- Fixed the performance issue caused by a large number of aborted transactions associated with the same label. #40606
- Fixed the issue where
commit_txn
did not automatically re-enter, maintaining consistent behavior between compute-storage decoupled and integrated modes. #39615 - Fixed the issue where the number of projected columns increased when dropping columns. #40187
- Fixed the issue where delete statements did not correctly handle return values, causing data to still be visible after deletion. #39428
- Fixed the coredump issue caused by rowset metadata competition during file cache preheating. #39361
- Fixed the issue where the entire cache space would be used up when TTL cache enabled LRU eviction. #39814
- Fixed the issue where temporary files could not be recycled when importing commit rowset failed with HDFS storage backend. #40215
Lakehouse
- Fixed some issues with predicate pushdown in JDBC Catalog. #39064
- Fixed the issue of not being able to read when
S``TRUCT
type columns are missing in Parquet format. #38718 - Fixed the issue of FileSystem leaks on the FE side in some cases. #38610
- Fixed the issue of metadata cache information being inconsistent when Hive/Iceberg tables write back in some cases. #40729
- Fixed the issue of unstable partition ID generation for external tables in some cases. #39325
- Fixed the issue of external table queries selecting BE nodes in the blacklist in some cases. #39451
- Optimized the timeout time for batch retrieval of external table partition information to avoid long-term thread occupation. #39346
- Fixed the issue of memory leaks when querying Hudi tables in some cases. #41256
- Fixed the issue of connection pool connection leaks in JDBC Catalog in some cases. #39582
- Fixed the issue of BE memory leaks in JDBC Catalog in some cases. #41041
- Fixed the issue of not being able to query Hudi data on Alibaba Cloud OSS. #41316
- Fixed the issue of not being able to read empty partitions in MaxCompute. #40046
- Fixed the issue of poor performance when querying Oracle through JDBC Catalog. #41513
- Fixed the issue of BE crashes when querying deletion vector of Paimon tables after enabling file cache features. #39877
- Fixed the issue of not being able to access Paimon tables on HDFS clusters with HA enabled. #39806
- Temporarily disabled the page index filtering feature of Parquet to avoid potential issues. #38691
- Fixed the issue of not being able to read unsigned types in Parquet files. #39926
- Fixed the issue of potential infinite loops when reading Parquet files in some cases. #39523
Asynchronous Materialized Views
- Fixed the issue where partition construction might select the wrong table to track partitions if both sides have the same column names. #40810
- Fixed the issue where transparent rewrite partition compensation might result in incorrect results. #40803
- Fixed the issue where transparent rewrite did not take effect on external tables. #38909
- Fixed the issue where nested materialized views might not refresh properly. #40433
Synchronous Materialized Views
- Fixed the issue where creating synchronous materialized views on MOW tables might result in incorrect query results. #39171
Query Optimizer
- Fixed the issue where existing synchronous materialized views might not be usable after upgrading. #41283
- Fixed the issue of not correctly handling milliseconds when comparing datetime literals. #40121
- Fixed the issue of potential errors in conditional function partition pruning. #39298
- Fixed the issue where MOW tables with synchronous materialized views could not perform delete operations. #39578
- Fixed the issue where the nullable of slots in JDBC external table query predicates might be incorrectly planned, causing query errors. #41014
Query Execution
- Fixed the memory leak issue caused by the use of runtime filters. #39155
- Fixed the issue of excessive memory usage by window functions. #39581
- Fixed a series of function compatibility issues during rolling upgrades. #41023 #40438 #39648
- Fixed the issue of incorrect results with
encryption_function
when used with constants. #40201 - Fixed the issue of errors when importing single-table materialized views. #39061
- Fixed the issue of incorrect partition result calculations for window functions. #39100 #40761
- Fixed the issue of incorrect calculations for topn when null values are present. #39497
- Fixed the issue of incorrect results with the
map_agg
function. #39743 - Fixed the issue of incorrect messages returned by cancel. #38982
- Fixed the issue of BE core dumps caused by encrypt and decrypt functions. #40726
- Fixed the issue of queries getting stuck due to too many scanners in high-concurrency scenarios. #40495
- Supported time types in runtime filters. #38258
- Fixed the issue of incorrect results with window funnel functions. #40960
Semi-Structured Data Management
- Fixed the issue of match function errors when no indexes were present. #38989
- Fixed the issue of crashes when ARRAY data types were used as parameters for array_min/array_max functions. #39492
- Fixed the issue of nullable with the
array_enumerate_uniq
function. #38384 - Fixed the issue of bloomfilter indexes not being updated when adding or deleting columns. #38431
- Fixed the issue of es-catalog parsing exceptions with array data. #39104
- Fixed the issue of improper predicate push-down in es-catalog. #40111
- Fixed the issue of exceptions caused by modifying input data with
map()
andstruct()
functions. #39699 - Fixed the issue of index compaction crashes in special cases. #40294
- Fixed the issue of ARRAY type inverted indexes missing nullbitmaps. #38907
- Fixed the issue of incorrect results with the
count()
function on inverted indexes. #41152 - Fixed the issue of correct results with the
explode_map
function when using aliases. #39757 - Fixed the issue of VARIANT type not being able to use row storage for exceptional JSON data. #39394
- Fixed the issue of memory leaks when returning ARRAY results with VARIANT type. #41358
- Fixed the issue of changing column names with VARIANT type. #40320
- Fixed the issue of potential precision loss when converting VARIANT type to DECIMAL type. #39650
- Fixed the issue of nullable handling with VARIANT type. #39732
- Fixed the issue of sparse column reading with VARIANT type. #40295
Other
- Fixed the compatibility issue between new and old audit log plugins. #41401
- Fixed the issue where users could see processes of others in certain cases. #39747
- Fixed the issue where users with permissions could not export. #38365
- Fixed the issue where create table like required create permissions for the existing table. #37879
- Fixed the issue where some features did not verify permissions. #39726
- Fixed the issue of not correctly closing connections when using SSL. #38587
- Fixed the issue where executing ALTER VIEW operations in some cases caused FE to fail to start. #40872