Skip to main content
Skip to main content

Unique Key Model

When users have data update requirements, they can choose to use the Unique data model. The unique model can ensure the uniqueness of the primary key. When a user updates a piece of data, the newly written data will overwrite the old data with the same primary key.

The Unique data model provides two implementation methods:

  • Merge-on-read. In the merge-on-read implementation, users will not trigger any data deduplication-related operations when writing data. All data deduplication operations are performed during queries or compaction. Therefore, the write performance of merge-on-read is better, the query performance is poor, and the memory consumption is also higher.
  • Merge-on-write. In version 1.2, we introduced the merge-on-write implementation, which completes all data deduplication tasks during the data writing phase, thus providing excellent query performance. Since version 2.0, merge-on-write has become very mature and stable. Due to its excellent query performance, we recommend most users to choose this implementation. Since version 2.1, merge-on-write has become the default implementation for the Unique model.

The default update semantics of the Unique model is a full-row UPSERT, which stands for UPDATE OR INSERT. If the key of the row data exists, an update will be performed; if it does not exist, new data will be inserted. Under the full-row UPSERT semantics, even if the user uses INSERT INTO to specify partial columns for writing, Doris will fill the unprovided columns with NULL values or default values in the Planner.

If a user wishes to update partial fields, they need to use the merge-on-write implementation and enable support for partial column updates through specific parameters. Please refer to the Data Operate/Data Update section for more details.

Let's look at how to create a Unique model table with merge-on-read and merge-on-write using a typical user basic information table as an example. This table does not have aggregation requirements and only needs to ensure the uniqueness of the primary key (The primary key is user_id + username).

ColumnNameTypeIsKeyComment
user_idBIGINTYesUser ID
usernameVARCHAR (50)YesUsername
cityVARCHAR (20)NoUser location city
ageSMALLINTNoUser age
sexTINYINTNoUser gender
phoneLARGEINTNoUser phone number
addressVARCHAR (500)NoUser address
register_timeDATETIMENoUser registration time

Merge-on-Read

The table creation statement for Merge-on-read is as follows:

CREATE TABLE IF NOT EXISTS example_tbl_unique
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`username` VARCHAR(50) NOT NULL COMMENT "Username",
`city` VARCHAR(20) COMMENT "User location city",
`age` SMALLINT COMMENT "User age",
`sex` TINYINT COMMENT "User gender",
`phone` LARGEINT COMMENT "User phone number",
`address` VARCHAR(500) COMMENT "User address",
`register_time` DATETIME COMMENT "User registration time"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

Merge-on-Write

The table creation statement for Merge-on-write is as follows:

CREATE TABLE IF NOT EXISTS example_tbl_unique_merge_on_write
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`username` VARCHAR(50) NOT NULL COMMENT "Username",
`city` VARCHAR(20) COMMENT "User location city",
`age` SMALLINT COMMENT "User age",
`sex` TINYINT COMMENT "User gender",
`phone` LARGEINT COMMENT "User phone number",
`address` VARCHAR(500) COMMENT "User address",
`register_time` DATETIME COMMENT "User registration time"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"enable_unique_key_merge_on_write" = "true"
);

Users need to add the property with enable_unique_key_merge_on_write" = "true" when creating the table to enable Merge-on-write.

"enable_unique_key_merge_on_write" = "true"
caution

In version 2.1, merge-on-write will be the default method for the primary key model.

For new users, it is highly recommended to use version 2.0 or above. In version 2.0, the performance and stability of merge-on-write have been significantly improved and optimized.

For users of version 1.2:

  • It is recommended to use version 1.2.4 or above, which fixes some bugs and stability issues.
  • Add the configuration item disable_storage_page_cache=false to be.conf. Failing to add this configuration item may have a significant impact on data import performance.

Use attention

  • The implementation of the Unique model can only be determined during table creation and cannot be modified through schema changes.
  • The Merge-on-read table cannot be seamlessly upgraded to the Merge-on-write table (due to completely different data organization methods). If you need to switch to Merge-on-write, you must manually perform an INSERT INTO unique-mow-table SELECT * FROM source_table to re-import the data.
  • Whole-row Updates: The default update semantics for the Unique model is a whole-row UPSERT, which stands for UPDATE OR INSERT. If the key for a row of data exists, it will be updated; if it does not exist, new data will be inserted. Under the whole-row UPSERT semantics, even if the user specifies only certain columns for insertion using INSERT INTO, Doris will fill in the unprovided columns with NULL values or default values during the planning phase.
  • Partial Column Updates: If the user wishes to update only certain fields, they must use Merge-on-write and enable support for partial column updates through specific parameters. Please refer to the documentation on partial column updates for relevant usage recommendations.