内存超限错误分析
当查询或导入报错Memory limit exceeded
时,可能的原因:进程内存超限、系统剩余可用内存不足、超过单次查询执行的内存上限。
ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<xxx>, xxx. backend 172.1.1.1 process memory used xxx GB, limit xxx GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see be.INFO.
自 Doris 1.2 版本开始支持该功能
有关内存管理的更多介绍参考文档:Say-Goodbye-to-OOM-Crashes-en, Say-Goodbye-to-OOM-Crashes-zh-CN
进程内存超限 OR 系统剩余可用内存不足
当返回如下报错时,说明进程内存超限,或者系统剩余可用内存不足,具体原因看内存统计值。
ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<Query#Id=3c88608cf35c461d-95fe88969aa6fc30>, process memory used 2.68 GB exceed limit 2.47 GB or sys mem available 50.95 GB less than low water mark 3.20 GB, failed alloc size 2.00 MB>, executing msg:<execute:<ExecNode:VAGGREGATION_NODE (id=7)>>. backend 172.1.1.1 process memory used 2.68 GB, limit 2.47 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see be.INFO
错误信息分析
错误信息分为三部分:
Memory limit exceeded:<consuming tracker:<Query#Id=3c88608cf35c461d-95fe88969aa6fc30>
:当前正在执行 query3c88608cf35c461d-95fe88969aa6fc30
的内存申请过程中发现内存超限。process memory used 2.68 GB exceed limit 2.47 GB or sys mem available 50.95 GB less than low water mark 3.20 GB, failed alloc size 2.00 MB
:超限的原因是 BE 进程使用的内存 2.68GB 超过了 2.47GB 的 limit,limit 的值来自 be.conf 中的 mem_limit * system MemTotal,默认等于操作系统总内存的 80%,当前操作系统剩余可用内存 50.95 GB 仍高于最低水位 3.2GB,本次尝试申请 2MB 的内存。executing msg:<execute:<ExecNode:VAGGREGATION_NODE (id=7)>>, backend 172.24.47.117 process memory used 2.68 GB, limit 2.47 GB
:本次内存申请的位置是ExecNode:VAGGREGATION_NODE (id=7)>
,当前 BE 节点的 IP 是 172.1.1.1,以及再次打印 BE 节点的内存统计。
日志分析
同时可以在 log/be.INFO 中找到如下日志,确认当前进程内存使用是否符合预期,日志同样分为三部分:
Process Memory Summary
:进程内存统计。Alloc Stacktrace
:触发内存超限检测的栈,这不一定是大内存申请的位置。Memory Tracker Summary
:进程 memory tracker 统计,参考 Memory Tracker 分析使用内存的位置。
注意:
进程内存超限日志的打印间隔是 1s,进程内存超限后,BE 大多数位置的内存申请都会感知,并尝试做出预定的回调方法,并打印进程内存超限日志,所以如果日志中 Try Alloc 的值很小,则无须关注
Alloc Stacktrace
,直接分析Memory Tracker Summary
即可。当进程内存超限后,BE 会触发内存 GC。
W1127 17:23:16.372572 19896 mem_tracker_limiter.cpp:214] System Mem Exceed Limit Check Failed, Try Alloc: 1062688
Process Memory Summary:
process memory used 2.68 GB limit 2.47 GB, sys mem available 50.95 GB min reserve 3.20 GB, tc/jemalloc allocator cache 51.97 MB
Alloc Stacktrace:
@ 0x50028e8 doris::MemTrackerLimiter::try_consume()
@ 0x50027c1 doris::ThreadMemTrackerMgr::flush_untracked_mem<>()
@ 0x595f234 malloc
@ 0xb888c18 operator new()
@ 0x8f316a2 google::LogMessage::Init()
@ 0x5813fef doris::FragmentExecState::coordinator_callback()
@ 0x58383dc doris::PlanFragmentExecutor::send_report()
@ 0x5837ea8 doris::PlanFragmentExecutor::update_status()
@ 0x58355b0 doris::PlanFragmentExecutor::open()
@ 0x5815244 doris::FragmentExecState::execute()
@ 0x5817965 doris::FragmentMgr::_exec_actual()
@ 0x581fffb std::_Function_handler<>::_M_invoke()
@ 0x5a6f2c1 doris::ThreadPool::dispatch_thread()
@ 0x5a6843f doris::Thread::supervise_thread()
@ 0x7feb54f931ca start_thread
@ 0x7feb5576add3 __GI___clone
@ (nil) (unknown)
Memory Tracker Summary:
Type=consistency, Used=0(0 B), Peak=0(0 B)
Type=batch_load, Used=0(0 B), Peak=0(0 B)
Type=clone, Used=0(0 B), Peak=0(0 B)
Type=schema_change, Used=0(0 B), Peak=0(0 B)
Type=compaction, Used=0(0 B), Peak=0(0 B)
Type=load, Used=0(0 B), Peak=0(0 B)
Type=query, Used=206.67 MB(216708729 B), Peak=565.26 MB(592723181 B)
Type=global, Used=930.42 MB(975614571 B), Peak=1017.42 MB(1066840223 B)
Type=tc/jemalloc_cache, Used=51.97 MB(54494616 B), Peak=-1.00 B(-1 B)
Type=process, Used=1.16 GB(1246817916 B), Peak=-1.00 B(-1 B)
MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18 MB(680718208 B)
MemTracker Label=BufferAllocator, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
MemTracker Label=LoadChannelMgr, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
MemTracker Label=StorageEngine, Parent Label=Orphan, Used=320.56 MB(336132488 B), Peak=322.56 MB(338229824 B)
MemTracker Label=SegCompaction, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
MemTracker Label=SegmentMeta, Parent Label=Orphan, Used=948.64 KB(971404 B), Peak=943.64 KB(966285 B)
MemTracker Label=TabletManager, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
MemTrackerLimiter Label=DataPageCache, Type=global, Limit=-1.00 B(-1 B), Used=455.22 MB(477329882 B), Peak=454.18 MB(476244180 B)
MemTrackerLimiter Label=IndexPageCache, Type=global, Limit=-1.00 B(-1 B), Used=1.00 MB(1051092 B), Peak=0(0 B)
MemTrackerLimiter Label=SegmentCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
MemTrackerLimiter Label=DiskIO, Type=global, Limit=2.47 GB(2655423201 B), Used=0(0 B), Peak=0(0 B)
MemTrackerLimiter Label=ChunkAllocator, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
MemTrackerLimiter Label=LastestSuccessChannelCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
MemTrackerLimiter Label=DeleteBitmap AggCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
系统剩余可用内存计算
当错误信息中系统可用内存小于低水位线时,同样当作进程内存超限处理,其中系统可用内存的值来自于/proc/meminfo
中的MemAvailable
,当MemAvailable
不足时继续内存申请可能返回 std::bad_alloc 或者导致 BE 进程 OOM,因为刷新进程内存统计和 BE 内存 GC 都具有一定的滞后性,所以预留小部分内存 buffer 作为低水位线,尽可能避免 OOM。
其中MemAvailable
是操作系统综合考虑当前空闲的内存、buffer、cache、内存碎片等因素给出的一个在尽可能不触发 swap 的情况下可以提供给用户进程使用的内存总量,一个简单的计算公式:MemAvailable = MemFree - LowWaterMark + (PageCache - min(PageCache / 2, LowWaterMark)),和 cmd free
看到的available
值相同,具体可参考:
https://serverfault.com/questions/940196/why-is-memavailable-a-lot-less-than-memfreebufferscached
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773
低水位线默认最大 1.6G,根据MemTotal
、vm/min_free_kbytes
、confg::mem_limit
、config::max_sys_mem_available_low_water_mark_bytes
共同算出,并避免浪费过多内存。其中MemTotal
是系统总内存,取值同样来自/proc/meminfo
;vm/min_free_kbytes
是操作系统给内存 GC 过程预留的 buffer,取值通常在 0.4% 到 5% 之间,某些云服务器上vm/min_free_kbytes
可能为 5%,这会导致直观上系统可用内存比真实值少;调大config::max_sys_mem_available_low_water_mark_bytes
将在大于 16G 内存的机器上,为 Full GC 预留更多的内存 buffer,反之调小将尽可能充分使用内存。
查询或导入单次执行内存超限
当返回如下报错时,说明超过单次执行内存限制。
ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>, failed alloc size 1.03 MB, exceeded tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>, limit 100.00 MB, peak used 99.29 MB, current used 99.25 MB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=4)>>. backend 172.24.47.117 process memory used 1.13 GB, limit 98.92 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see log/be.INFO.
错误信息分析
错误信息分为三部分:
Memory limit exceeded:<consuming tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>
:当前正在执行 queryf78208b15e064527-a84c5c0b04c04fcf
的内存申请过程中发现内存超限。failed alloc size 1.03 MB, exceeded tracker:<Query#Id=f78208b15e064527-a84c5c0b04c04fcf>, limit 100.00 MB, peak used 99.29 MB, current used 99.25 MB
:本次尝试申请 1.03MB 的内存,但此时 queryf78208b15e064527-a84c5c0b04c04fcf
memory tracker 的当前 consumption 为 99.28MB 加上 1.03MB 后超过了 100MB 的 limit,limit 的值来自 session variables 中的exec_mem_limit
,默认 4G。executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=4)>>. backend 172.24.47.117 process memory used 1.13 GB, limit 98.92 GB. If query tracker exceed,
set exec_mem_limit=8Gto change limit, details mem usage see be.INFO.
:本次内存申请的位置是VHASH_JOIN_NODE (id=4)
,并提示可通过set exec_mem_limit
来调高单次查询的内存上限。
日志分析
set global enable_profile=true
后,可以在单次查询内存超限时,在 log/be.INFO 中打印日志,用于确认当前查询内存使用是否符合预期。
同时可以在 log/be.INFO 中找到如下日志,确认当前查询内存使用是否符合预期,日志同样分为三部分:
Process Memory Summary
:进程内存统计。Alloc Stacktrace
:触发内存超限检测的栈,这不一定是大内存申请的位置。Memory Tracker Summary
:当前查询的 memory tracker 统计,可以看到查询每个算子当前使用的内存和峰值,具体可参考 Memory Tracker。 注意:一个查询在内存超限后只会打印一次日志,此时查询的多个线程都会感知,并尝试等待内存释放,或者 cancel 当前查询,如果日志中 Try Alloc 的值很小,则无须关注Alloc Stacktrace
,直接分析Memory Tracker Summary
即可。
W1128 01:34:11.016165 357796 mem_tracker_limiter.cpp:191] Memory limit exceeded:<consuming tracker:<Query#Id=78208b15e064527-a84c5c0b04c04fcf>, failed alloc size 4.00 MB, exceeded tracker:<Query#Id=78208b15e064527-a84c5c0b04c04fcf>, limit 100.00 MB, peak used 98.59 MB,
current used 96.88 MB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=2)>>. backend 172.24.47.117 process memory used 1.13 GB, limit 98.92 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details mem usage see be.INFO.
Process Memory Summary:
process memory used 1.13 GB limit 98.92 GB, sys mem available 45.15 GB min reserve 3.20 GB, tc/jemalloc allocator cache 27.62 MB
Alloc Stacktrace:
@ 0x66cf73a doris::vectorized::HashJoinNode::_materialize_build_side()
@ 0x69cb1ee doris::vectorized::VJoinNodeBase::open()
@ 0x66ce27a doris::vectorized::HashJoinNode::open()
@ 0x5835dad doris::PlanFragmentExecutor::open_vectorized_internal()
@ 0x58351d2 doris::PlanFragmentExecutor::open()
@ 0x5815244 doris::FragmentExecState::execute()
@ 0x5817965 doris::FragmentMgr::_exec_actual()
@ 0x581fffb std::_Function_handler<>::_M_invoke()
@ 0x5a6f2c1 doris::ThreadPool::dispatch_thread()
@ 0x5a6843f doris::Thread::supervise_thread()
@ 0x7f6faa94a1ca start_thread
@ 0x7f6fab121dd3 __GI___clone
@ (nil) (unknown)
Memory Tracker Summary:
MemTrackerLimiter Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Type=query, Limit=100.00 MB(104857600 B), Used=64.75 MB(67891182 B), Peak=104.70 MB(109786406 B)
MemTracker Label=Scanner#QueryId=78208b15e064527-a84c5c0b04c04fcf, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
MemTracker Label=RuntimeFilterMgr, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=2.09 KB(2144 B), Peak=0(0 B)
MemTracker Label=BufferedBlockMgr2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
MemTracker Label=ExecNode:VHASH_JOIN_NODE (id=2), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=-61.44 MB(-64426656 B), Peak=290.33 KB(297296 B)
MemTracker Label=ExecNode:VEXCHANGE_NODE (id=9), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=6.12 KB(6264 B), Peak=5.84 KB(5976 B)
MemTracker Label=VDataStreamRecvr:78208b15e064527-a84c5c0b04c04fd2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=6.12 KB(6264 B), Peak=5.84 KB(5976 B)
MemTracker Label=ExecNode:VEXCHANGE_NODE (id=10), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=-41.20 MB(-43198024 B), Peak=1.46 MB(1535656 B)
MemTracker Label=VDataStreamRecvr:78208b15e064527-a84c5c0b04c04fd2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=-41.20 MB(-43198024 B), Peak=1.46 MB(1535656 B)
MemTracker Label=VDataStreamSender:78208b15e064527-a84c5c0b04c04fd2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=2.34 KB(2400 B), Peak=0(0 B)
MemTracker Label=Scanner#QueryId=78208b15e064527-a84c5c0b04c04fcf, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=58.12 MB(60942224 B), Peak=57.41 MB(60202848 B)
MemTracker Label=RuntimeFilterMgr, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
MemTracker Label=BufferedBlockMgr2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
MemTracker Label=ExecNode:VNewOlapScanNode(customer) (id=1), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=9.55 MB(10013424 B), Peak=10.20 MB(10697136 B)
MemTracker Label=VDataStreamSender:78208b15e064527-a84c5c0b04c04fd1, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=59.80 MB(62701880 B), Peak=59.16 MB(62033048 B)
MemTracker Label=Scanner#QueryId=78208b15e064527-a84c5c0b04c04fcf, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
MemTracker Label=RuntimeFilterMgr, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=13.62 KB(13952 B), Peak=0(0 B)
MemTracker Label=BufferedBlockMgr2, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=0(0 B), Peak=0(0 B)
MemTracker Label=ExecNode:VNewOlapScanNode(lineorder) (id=0), Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=6.03 MB(6318064 B), Peak=4.02 MB(4217664 B)
MemTracker Label=VDataStreamSender:78208b15e064527-a84c5c0b04c04fd0, Parent Label=Query#Id=78208b15e064527-a84c5c0b04c04fcf, Used=2.34 KB(2400 B), Peak=0(0 B)