Updating Data on Aggregate Key Model
Update for Aggregate Load
This guide is about ingestion-based data updates for the Aggregate Key model in Doris.
Update all columns
When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values.
Partial column update for Aggregate Key model
Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to REPLACE_IF_NOT_NULL
.
Create table
For the columns that need to be updated, set the aggregation function to REPLACE_IF_NOT_NULL
.
CREATE TABLE order_tbl (
order_id int(11) NULL,
order_amount int(11) REPLACE_IF_NOT_NULL NULL,
order_status varchar(100) REPLACE_IF_NOT_NULL NULL
) ENGINE=OLAP
AGGREGATE KEY(order_id)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(order_id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
+----------+--------------+-----------------+
| order_id | order_amount | order_status |
+----------+--------------+-----------------+
| 1 | 100 | Pending Payment |
+----------+--------------+-----------------+
1 row in set (0.01 sec)
Ingest data
For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields.
Example
Using the same example as above, the corresponding Stream Load command would be (no additional headers required):
curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:48037/api/db1/order_tbl/_stream_load
The corresponding INSERT INTO
statement would be (no additional session variables required):
INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending');
Note
The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model.
Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function.