Skip to main content

Load Memory Analysis

Doris data load is divided into two stages: fragment reading and channel writing. The execution logic of fragment and query fragment is the same, but Stream Load usually has only Scan Operator. Channel mainly writes data to the temporary data structure Memtable, and then Delta Writer compresses the data and writes it to the file.

Load memory view

If you see a large value of Label=load, Type=overview Memory Tracker anywhere, it means that the load memory is used a lot.

MemTrackerLimiter Label=load, Type=overview, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)

The memory load by Doris is divided into two parts. The first part is the memory used by fragment execution, and the second part is the memory used in the construction and flushing process of MemTable.

The Memory Tracker with Label=AllMemTableMemory, Parent Label=DetailsTrackerSet found in the BE web page http://{be_host}:{be_web_server_port}/mem_tracker?type=global is the memory used by all load tasks to construct and flush MemTable on this BE node. When the error process memory exceeds the limit or the available memory is insufficient, this Memory Tracker can also be found in the Memory Tracker Summary in the be.INFO log.

MemTracker Label=AllMemTableMemory, Parent Label=DetailsTrackerSet, Used=25.08 MB(26303456 B), Peak=25.08 MB(26303456 B)

Load Memory Analysis

If the value of ``Label=AllMemTableMemory` is small, the main memory used by the load task is the execution fragment. The analysis method is the same as Query Memory Analysis, so it will not be repeated here.

If the value of Label=AllMemTableMemory is large, MemTable may not be flushed in time. You can consider reducing the values ​​of load_process_max_memory_limit_percent and load_process_soft_mem_limit_percent in be.conf. This can make MemTable flush more frequently, so that fewer MemTables are cached in memory, but the number of files written will increase. If too many small files are written, the pressure of compaction will increase. If compaction is not timely, the metadata memory will increase, the query will slow down, and even the load will report an error after the number of files exceeds the limit.

During the load execution process, check the BE web page /mem_tracker?type=load. According to the values ​​of the two groups of memory trackers Label=MemTableManualInsert and Label=MemTableHookFlush, you can locate LoadID and TabletID with large MemTable memory usage.