Release 3.0.2

Dear community members, the Apache Doris 3.0.2 version was officially released on October 15, 2024, featuring updates and improvements in compute-storage decoupling, data storage, lakehouse, query optimizer, query execution and more.

Quick Download: https://doris.apache.org/download/

GitHub Release: https://github.com/apache/doris/releases

Behavioral Changes

Storage

Limited the number of tablets in a single backup task to prevent FE memory overflow. #40518
The SHOW PARTITIONS command now displays the CommittedVersion of partitions. #28274

Other

The default printing mode (asynchronous) of fe.log now includes file line number information. If performance issues are encountered due to line number output, please switch to BRIEF mode. #39419
The default value of the session variable ENABLE_PREPARED_STMT_AUDIT_LOG has been changed from true to false, and the audit log of prepare statements will no longer be printed. #38865
The default value of the session variable max_allowed_packet has been adjusted from 1MB to 16MB to align with MySQL 8.4. #38697
The JVM of FE and BE defaults to using the UTF-8 character set. #39521

New Features

Storage

Backup and recovery now support clearing tables or partitions that are not in the backup. #39028

Compute-Storage Decoupled

Support for parallel recycling of expired data on multiple tablets. #37630
Support for changing storage vaults through ALTER statements. #38685 #37606
Support for importing a large number of tablets (5000+) in a single transaction (experimental feature). #38243
Support for automatically aborting pending transactions caused by reasons such as node restarts, solving the issue of pending transactions blocking decommission or schema change. #37669
A new session variable enable_segment_cache has been added to control whether to use segment cache during queries (default is true). #37141
Resolved the issue of not being able to import a large amount of data during schema changes in compute-storage decoupled mode. #39558
Support for adding multiple follower roles of FE in compute-storage decoupled mode. #38388
Support for using memory as file cache to accelerate queries in environments with no disks or low-performance HDDs. #38811

Lakehouse

New Lakesoul Catalog has been added. Apache Doris Docs
A new system table catalog_meta_cache_statistics has been added to view the usage of various metadata caches in external catalog. #40155

Query Optimizer

Support for is [not] true/false expressions. #38623

Query Execution

A new CRC32 function has been added. #38204
New aggregate functions skew and kurt have been added. #41277
Profiles are now persisted to the FE's disk to retain more profiles. #33690
A new system table workload_group_privileges has been added to view permission information related to workload groups. #38436
A new system table workload_group_resource_usage has been added to monitor resource statistics of workload groups. #39177
Workload groups now support limiting reads of local IO and remote IO. #39012
Workload groups now support cgroupv2 to limit CPU usage. #39374
A new system table information_schema.partitions has been added to view some table creation attributes. #40636

Other

Support for using the SHOW statement to display BE's configuration information, such as SHOW BACKEND CONFIG LIKE ${pattern}. #36525

Improvements

Load

Improved the import efficiency of routine load when encountering frequent EOFs from Kafka. #39975
The stream load result now includes the time taken to read HTTP data, ReceiveDataTimeMs, which can quickly determine slow stream load issues caused by network reasons. #40735
Optimized the routine load timeout logic to avoid frequent timeouts during inverted index and mow writes. #40818

Storage

Support for batch addition of partitions. #37114

Compute-Storage Decoupled

Added the meta-service HTTP interface /MetaService/http/show_meta_ranges to facilitate the statistics of KV distribution in FDB. #39208
The meta-service/recycler stop script ensures that the process fully exits before returning. #40218
Support for using the session variable version_comment (Cloud Mode) to display the current deployment mode as compute-storage decoupled. #38269
Fixed the detailed message returned when transaction submission fails. #40584
Support for using one meta-service process to provide both metadata services and data recycling services. #40223
Optimized the default configuration of file_cache to avoid potential issues when not set. #41421 #41507
Improved query performance by batch retrieving the version of multiple partitions. #38949
Delayed the redistribution of tablets to avoid query performance issues caused by temporary network fluctuations. #40371
Optimized the read-write lock logic in the balance. #40633
Enhanced the robustness of file cache in handling TTL filenames during restarts/crashes. #40226
Added the BE HTTP interface /api/file_cache?op=hash to facilitate the calculation of the hash file names of segment files on disk. #40831
Optimized the unified naming to be compatible with using compute group to represent BE groups (original cloud cluster). #40767
Optimized the waiting time for obtaining locks when calculating delete bitmaps in primary key tables. #40341
When there are many delete bitmaps in primary key tables, optimized the high CPU consumption during queries by pre-merging multiple delete bitmaps. #40204
Support for managing FE/BE nodes in compute-storage decoupled mode through SQL statements, hiding the logic of direct interaction with meta-service when deploying in compute-storage decoupled mode. #40264
Added a script for rapid deployment of FDB. #39803
Optimized the output of SHOW CACHE HOTSPOT to unify the column name style with other SHOW statements. #41322
When using a storage vault as the storage backend, disallowed the use of latest_fs() to avoid binding different storage backends to the same table. #40516
Optimized the timeout strategy for calculating delete bitmaps when importing mow tables. #40562 #40333
The enable_file_cache in be.conf is now enabled by default in compute-storage decoupled mode. #41502

Lakehouse

When reading tables in CSV format, support for the session keep_carriage_return setting to control the reading behavior of the \r symbol. #39980
The default maximum memory of BE's JVM has been adjusted to 2GB (affecting only new deployments). #41403
Hive Catalog has added hive.recursive_directories_table and hive.ignore_absent_partitions properties to specify whether to recursively traverse data directories and whether to ignore missing partitions. #39494
Optimized the Catalog refresh logic to avoid generating a large number of connections during refresh. #39205
SHOW CREATE DATABASE and SHOW CREATE TABLE for external data sources now display location information. #39179
The new optimizer supports inserting data into JDBC external tables using the INSERT INTO statement. #41511
MaxCompute Catalog now supports complex data types. #39259
Optimized the logic for reading and merging data shards of external tables. #38311
Optimized some refresh strategies for metadata caches of external tables. #38506
Paimon tables now support pushing down IN/NOT IN predicates. #38390
Compatible with tables created in Parquet format by Paimon version 0.9. #41020

Asynchronous Materialized Views

Building asynchronous materialized views now supports the use of both immediate and starttime. #39573
Asynchronous materialized views based on external tables will refresh the metadata cache of the external tables before refreshing the materialized views, ensuring construction based on the latest external table data. #38212
Partition incremental construction now supports rolling up according to weekly and quarterly granularities. #39286

Query Optimizer

The aggregate function GROUP_CONCAT now supports the use of both DISTINCT and ORDER BY. #38080
Optimized the collection and use of statistical information, as well as the logic for estimating row counts and cost calculations, to generate more efficient and stable execution plans.
Window function partition data pre-filtering now supports cases containing multiple window functions. #38393

Query Execution

Reduced query latency by running prepare pipeline tasks in parallel. #40874
Display Catalog information in Profile. #38283
Optimized the computational performance of IN filtering conditions. #40917
Supported cgroupv2 in K8S to limit Doris's memory usage. #39256
Optimized the performance of converting strings to datetime types. #38385
When a string is a decimal number, support casting it to an int, which will be more compatible with certain behaviors of MySQL. #38847

Semi-Structured Data Management

Optimized the performance of inverted index matching. #41122
Temporarily prohibited the creation of inverted indexes with tokenization on arrays. #39062
explode_json_array now supports binary JSON types. #37278
IP data types now support bloomfilter indexes. #39253
IP data types now support row storage. #39258
Nested data types such as ARRAY, MAP, and STRUCT now support schema changes. #39210
When creating MTMV, automatically truncate KEYs encountered in VARIANT data types. #39988
Lazy loading of inverted indexes during queries to improve performance. #38979
add inverted index file size for open file. #37482
Reduced access to object storage interfaces during compaction to improve performance. #41079
Added three new query profile metrics related to inverted indexes. #36696
Reduced cache overhead for non-PreparedStatement SQL to improve performance. #40910
Pre-warming cache now supports inverted indexes. #38986
Inverted indexes are now cached immediately after writing. #39076

Compatibility

Fixed the issue of Thrift ID incompatibility on the master with branch-2.1. #41057

Other

BE HTTP API now supports authentication; set config::enable_all_http_auth to true (default is false) when authentication is required. #39577
Optimized the user permissions required for the REFRESH operation. Permissions have been relaxed from ALTER to SHOW. #39008
Reduced the range of nextId when calling advanceNextId(). #40160
Optimized the caching mechanism for Java UDFs. #40404

Bug Fixes

Load

Fixed the issue where abortTransaction did not handle return codes. #41275
Fixed the issue where transactions failed to commit or abort in compute-storage decoupled mode without calling afterCommit/afterAbort. #41267
Fixed the issue where Routine Load could not work properly when modifying consumer offsets in compute-storage decoupled mode. #39159
Fixed the issue of repeatedly closing file handles when obtaining error log file paths. #41320
Fixed the issue of incorrect job progress caching for Routine Load in compute-storage decoupled mode. #39313
Fixed the issue where Routine Load could get stuck when failing to commit transactions in compute-storage decoupled mode. #40539
Fixed the issue where Routine Load kept reporting data quality check errors in compute-storage decoupled mode. #39790
Fixed the issue where Routine Load did not check transactions before committing in compute-storage decoupled mode. #39775
Fixed the issue where Routine Load did not check transactions before aborting in compute-storage decoupled mode. #40463
Fixed the issue where cluster keys did not support certain data types. #38966
Fixed the issue of transactions being repeatedly committed. #39786
Fixed the issue of use after free with WAL when BE exits. #33131
Fixed the issue where WAL playback did not skip completed import transactions in compute-storage decoupled mode. #41262
Fixed the logic for selecting BE in group commit in compute-storage decoupled mode. #39986 #38644
Fixed the issue where BE might crash when group commit was enabled for insert into. #39339
Fixed the issue where insert into with group commit enabled might get stuck. #39391
Fixed the issue where not enabling the group commit option during import might result in a table not found error. #39731
Fixed the issue of transaction submission timeouts due to too many tablets. #40031
Fixed the issue of concurrent opens with Auto Partition. #38605
Fixed the issue of import lock granularity being too large. #40134
Fixed the issue of coredumps caused by zero-length varchars. #40940
Fixed the issue of incorrect index Id values in log prints. #38790
Fixed the issue of memtable shifting not closing BRPC streaming. #40105
Fixed the issue of inaccurate bvar statistics during memtable shifting. #39075
Fixed the issue of multi-replication fault tolerance during memtable shifting. #38003
Fixed the issue of incorrect message length calculations for Routine Load with multiple tables in one stream. #40367
Fixed the issue of inaccurate progress reporting for Broker Load. #40325
Fixed the issue of inaccurate data scan volume reporting for Broker Load. #40694
Fixed the issue of concurrency with Routine Load in compute-storage decoupled mode. #39242
Fixed the issue of Routine Load jobs being canceled in compute-storage decoupled mode. #39514
Fixed the issue of progress not being reset when deleting Kafka topics. #38474
Fixed the issue of updating progress during transaction state transitions in Routine Load. #39311
Fixed the issue of Routine Load switching from a paused state to a paused state. #40728
Fixed the issue of Stream Load records being missed due to database deletion. #39360

Storage

Fixed the issue of missing storage policies. #38700
Fixed the issue of errors during cross-version backup and recovery. #38370
Fixed the NPE issue with ccr binlog. #39909
Fixed potential issues with duplicate keys in mow. #41309 #39791 #39958 #38369 #38331
Fixed the issue of not being able to write after backup and recovery in high-frequency write scenarios. #40118 #38321
Fixed the issue of data errors potentially triggered by deleting empty strings and schema changes. #41064
Fixed the issue of incorrect statistics due to column updates. #40880
Limited the size of tablet meta pb to prevent BE crashes due to oversized meta. #39455
Fixed the potential column misalignment issue with the new optimizer in begin; insert into values; commit. #39295

Compute-Storage Decoupled

Fixed the issue where the tablet distribution might be inconsistent across multiple FEs in compute-storage decoupled mode. #41458
Fixed the issue where TVF might not work in multi-computing group environments. #39249
Fixed the issue where compaction used resources that had already been released when BE exited in compute-storage decoupled mode. #39302
Fixed the issue where automatic start-stop might cause FE replay to get stuck. #40027
Fixed the issue where the BE status and the stored status in meta-service were inconsistent. #40799
Fixed the issue where the FE->meta-service connection pool could not automatically expire and reconnect. #41202 #40661
Fixed the issue where some tablets might repeatedly undergo unexpected balance processes during rebalance. #39792
Fixed the issue where storage vault permissions were lost after FE restarted. #40260
Fixed the issue where tablet row counts and other statistical information might be incomplete due to FDB scan range pagination. #40494
Fixed the performance issue caused by a large number of aborted transactions associated with the same label. #40606
Fixed the issue where commit_txn did not automatically re-enter, maintaining consistent behavior between compute-storage decoupled and integrated modes. #39615
Fixed the issue where the number of projected columns increased when dropping columns. #40187
Fixed the issue where delete statements did not correctly handle return values, causing data to still be visible after deletion. #39428
Fixed the coredump issue caused by rowset metadata competition during file cache preheating. #39361
Fixed the issue where the entire cache space would be used up when TTL cache enabled LRU eviction. #39814
Fixed the issue where temporary files could not be recycled when importing commit rowset failed with HDFS storage backend. #40215

Lakehouse

Fixed some issues with predicate pushdown in JDBC Catalog. #39064
Fixed the issue of not being able to read when S``TRUCT type columns are missing in Parquet format. #38718
Fixed the issue of FileSystem leaks on the FE side in some cases. #38610
Fixed the issue of metadata cache information being inconsistent when Hive/Iceberg tables write back in some cases. #40729
Fixed the issue of unstable partition ID generation for external tables in some cases. #39325
Fixed the issue of external table queries selecting BE nodes in the blacklist in some cases. #39451
Optimized the timeout time for batch retrieval of external table partition information to avoid long-term thread occupation. #39346
Fixed the issue of memory leaks when querying Hudi tables in some cases. #41256
Fixed the issue of connection pool connection leaks in JDBC Catalog in some cases. #39582
Fixed the issue of BE memory leaks in JDBC Catalog in some cases. #41041
Fixed the issue of not being able to query Hudi data on Alibaba Cloud OSS. #41316
Fixed the issue of not being able to read empty partitions in MaxCompute. #40046
Fixed the issue of poor performance when querying Oracle through JDBC Catalog. #41513
Fixed the issue of BE crashes when querying deletion vector of Paimon tables after enabling file cache features. #39877
Fixed the issue of not being able to access Paimon tables on HDFS clusters with HA enabled. #39806
Temporarily disabled the page index filtering feature of Parquet to avoid potential issues. #38691
Fixed the issue of not being able to read unsigned types in Parquet files. #39926
Fixed the issue of potential infinite loops when reading Parquet files in some cases. #39523

Asynchronous Materialized Views

Fixed the issue where partition construction might select the wrong table to track partitions if both sides have the same column names. #40810
Fixed the issue where transparent rewrite partition compensation might result in incorrect results. #40803
Fixed the issue where transparent rewrite did not take effect on external tables. #38909
Fixed the issue where nested materialized views might not refresh properly. #40433

Synchronous Materialized Views

Fixed the issue where creating synchronous materialized views on MOW tables might result in incorrect query results. #39171

Query Optimizer

Fixed the issue where existing synchronous materialized views might not be usable after upgrading. #41283
Fixed the issue of not correctly handling milliseconds when comparing datetime literals. #40121
Fixed the issue of potential errors in conditional function partition pruning. #39298
Fixed the issue where MOW tables with synchronous materialized views could not perform delete operations. #39578
Fixed the issue where the nullable of slots in JDBC external table query predicates might be incorrectly planned, causing query errors. #41014

Query Execution

Fixed the memory leak issue caused by the use of runtime filters. #39155
Fixed the issue of excessive memory usage by window functions. #39581
Fixed a series of function compatibility issues during rolling upgrades. #41023 #40438 #39648
Fixed the issue of incorrect results with encryption_function when used with constants. #40201
Fixed the issue of errors when importing single-table materialized views. #39061
Fixed the issue of incorrect partition result calculations for window functions. #39100 #40761
Fixed the issue of incorrect calculations for topn when null values are present. #39497
Fixed the issue of incorrect results with the map_agg function. #39743
Fixed the issue of incorrect messages returned by cancel. #38982
Fixed the issue of BE core dumps caused by encrypt and decrypt functions. #40726
Fixed the issue of queries getting stuck due to too many scanners in high-concurrency scenarios. #40495
Supported time types in runtime filters. #38258
Fixed the issue of incorrect results with window funnel functions. #40960

Semi-Structured Data Management

Fixed the issue of match function errors when no indexes were present. #38989
Fixed the issue of crashes when ARRAY data types were used as parameters for array_min/array_max functions. #39492
Fixed the issue of nullable with the array_enumerate_uniq function. #38384
Fixed the issue of bloomfilter indexes not being updated when adding or deleting columns. #38431
Fixed the issue of es-catalog parsing exceptions with array data. #39104
Fixed the issue of improper predicate push-down in es-catalog. #40111
Fixed the issue of exceptions caused by modifying input data withmap() and struct() functions. #39699
Fixed the issue of index compaction crashes in special cases. #40294
Fixed the issue of ARRAY type inverted indexes missing nullbitmaps. #38907
Fixed the issue of incorrect results with the count() function on inverted indexes. #41152
Fixed the issue of correct results with the explode_map function when using aliases. #39757
Fixed the issue of VARIANT type not being able to use row storage for exceptional JSON data. #39394
Fixed the issue of memory leaks when returning ARRAY results with VARIANT type. #41358
Fixed the issue of changing column names with VARIANT type. #40320
Fixed the issue of potential precision loss when converting VARIANT type to DECIMAL type. #39650
Fixed the issue of nullable handling with VARIANT type. #39732
Fixed the issue of sparse column reading with VARIANT type. #40295

Other

Fixed the compatibility issue between new and old audit log plugins. #41401
Fixed the issue where users could see processes of others in certain cases. #39747
Fixed the issue where users with permissions could not export. #38365
Fixed the issue where create table like required create permissions for the existing table. #37879
Fixed the issue where some features did not verify permissions. #39726
Fixed the issue of not correctly closing connections when using SSL. #38587
Fixed the issue where executing ALTER VIEW operations in some cases caused FE to fail to start. #40872

Behavioral Changes​

Storage​

Other​

New Features​

Storage​

Compute-Storage Decoupled​

Lakehouse​

Query Optimizer​

Query Execution​

Other​

Improvements​

Load​

Storage​

Compute-Storage Decoupled​

Lakehouse​

Asynchronous Materialized Views​

Query Optimizer​

Query Execution​

Semi-Structured Data Management​

Compatibility​

Other​

Bug Fixes​

Load​

Storage​

Compute-Storage Decoupled​

Lakehouse​

Asynchronous Materialized Views​

Synchronous Materialized Views​

Query Optimizer​

Query Execution​

Semi-Structured Data Management​

Other​

Behavioral Changes

Storage

Other

New Features

Storage

Compute-Storage Decoupled

Lakehouse

Query Optimizer

Query Execution

Other

Improvements

Load

Storage

Compute-Storage Decoupled

Lakehouse

Asynchronous Materialized Views

Query Optimizer

Query Execution

Semi-Structured Data Management

Compatibility

Other

Bug Fixes

Load

Storage

Compute-Storage Decoupled

Lakehouse

Asynchronous Materialized Views

Synchronous Materialized Views

Query Optimizer

Query Execution

Semi-Structured Data Management

Other