2026/1/7 10:04:05

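I: Background

1. The story

This one came from a friend in my training camp. Their system has occasional memory spikes that they could not diagnose on their own, so they asked me to take a look and sent over a dump file of more than 20 GB. That is a big file. I generally recommend keeping dumps under 10 GB, because anything larger is slow and painful to analyze in WinDbg.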

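II: Memory spike analysis

1. Why did memory spike

As usual, start with !address -summary to check committed memory, then !eeheap -gc for the size of the managed heap. The output is as follows: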

0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                   1870     5ff8`c8447000 (  95.972 TB)           74.98%
<unknown>                              1064     2005`7faca000 (  32.021 TB)  99.98%   25.02%
Heap                                   3594        1`56a34000 (   5.354 GB)   0.02%    0.00%
Image                                  4747        0`35dfb000 ( 861.980 MB)   0.00%    0.00%
Stack                                   522        0`2b440000 ( 692.250 MB)   0.00%    0.00%
Other                                   314        0`00313000 (   3.074 MB)   0.00%    0.00%
TEB                                     174        0`0015c000 (   1.359 MB)   0.00%    0.00%
PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                               1870     5ff8`c8447000 (  95.972 TB)           74.98%
MEM_RESERVE                            2326     2001`b95a7000 (  32.007 TB)  99.93%   25.01%
MEM_COMMIT                             8090        5`7e602000 (  21.975 GB)   0.07%    0.02%

0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0000013e0f5919d8
generation 1 starts at 0x0000013e0f49a8b0
generation 2 starts at 0x0000013e09f21000
ephemeral segment allocation context: none
         segment             begin         allocated              size
0000013e09f20000  0000013e09f21000  0000013e0fb15b20  0x5bf4b20(96422688)
Large object heap starts at 0x0000013e19f21000
         segment             begin         allocated              size
0000013e19f20000  0000013e19f21000  0000013e211b6f50  0x7295f50(120151888)
...
00000143d6850000  00000143d6851000  00000143db009118  0x47b8118(75202840)
Total Size:              Size: 0x33bd0f148 (13888450888) bytes.
------------------------------
GC Heap Size:    Size: 0x33bd0f148 (13888450888) bytes.

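The output shows about 21.9 GB of committed memory (MEM_COMMIT), 5.3 GB in the native heap (Heap), and 13,888,450,888 bytes in the managed heap. That is about 13.8 GB in decimal units, or about 12.9 GiB in the binary units that !address reports. Either way, the managed heap accounts for more than half of the committed memory, so that is where to start.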
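The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the three sizes are copied from the debugger output, and the helper names `to_gib` and `share` are made up for illustration.

```python
# Sizes copied from the !address -summary and !eeheap -gc output above.
GIB = 1 << 30  # !address prints sizes in binary units (2**30 bytes per "GB")

commit_bytes = 0x5_7E60_2000         # MEM_COMMIT    5`7e602000
native_heap_bytes = 0x1_56A3_4000    # Heap          1`56a34000
managed_heap_bytes = 0x3_3BD0_F148   # GC Heap Size  0x33bd0f148


def to_gib(size_in_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return size_in_bytes / GIB


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


for name, size in [
    ("commit", commit_bytes),
    ("native heap", native_heap_bytes),
    ("managed heap", managed_heap_bytes),
]:
    print(f"{name:13s} {to_gib(size):7.3f} GiB  {share(size, commit_bytes):5.1f}% of commit")
```

Putting all three numbers in the same unit makes the split easier to read: the managed heap is a little under 60% of the committed memory and the native heap roughly a quarter.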

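2. What is wrong with the managed heap

PerfView is a quick way to see where the managed memory goes, because it shows which GC roots retain the most memory. Screenshot below:

The screenshot clearly shows that FinalizerQueue holds almost all of the managed memory. If you are familiar with how the finalizer queue works, you already know where to look next.

Next, use the !fq command to inspect the finalizer queue. Reference output: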

0:000> !fq
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------
generation 0 has 2722 finalizable objects (0000013f4c737e08->0000013f4c73d318)
generation 1 has 73 finalizable objects (0000013f4c737bc0->0000013f4c737e08)
generation 2 has 20328 finalizable objects (0000013f4c710080->0000013f4c737bc0)
Ready for finalization 34482 objects (0000013f4c73d318->0000013f4c7808a8)
Statistics for all finalizable objects (including all objects ready for finalization):

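The Ready for finalization line above is the f-reachable part of the finalizer queue, which is where the finalizer thread takes its work from. There are 34,482 objects backed up in it, which suggests that the finalizer thread has a problem.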
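The counts in the !fq output can be cross-checked against the address ranges printed next to them. The sketch below assumes a 64-bit process where each queue slot is one 8-byte pointer. That assumption matches the numbers in this dump, but it is an observation, not documented behavior. `parse_fq` and the regular expression are illustrative helpers, not part of SOS.

```python
import re

# The four range lines copied from the !fq output above.
FQ_OUTPUT = """\
generation 0 has 2722 finalizable objects (0000013f4c737e08->0000013f4c73d318)
generation 1 has 73 finalizable objects (0000013f4c737bc0->0000013f4c737e08)
generation 2 has 20328 finalizable objects (0000013f4c710080->0000013f4c737bc0)
Ready for finalization 34482 objects (0000013f4c73d318->0000013f4c7808a8)
"""

POINTER_SIZE = 8  # assumption: 64-bit process, one pointer per queue slot

LINE = re.compile(
    r"^(?P<label>generation \d+|Ready for finalization)\D+(?P<count>\d+)"
    r".*\((?P<start>[0-9a-f]+)->(?P<end>[0-9a-f]+)\)$"
)


def parse_fq(text):
    """Map each queue section to (reported count, slots implied by its range)."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        slots = (int(m["end"], 16) - int(m["start"], 16)) // POINTER_SIZE
        rows[m["label"]] = (int(m["count"]), slots)
    return rows


rows = parse_fq(FQ_OUTPUT)
for label, (count, slots) in rows.items():
    print(f"{label:24s} reported={count:6d} slots={slots:6d}")

# Objects already handed to the finalizer thread vs. those still tracked per generation.
ready, _ = rows["Ready for finalization"]
waiting = sum(count for label, (count, _) in rows.items() if label.startswith("generation"))
print(f"ready/total = {ready}/{ready + waiting} = {ready / (ready + waiting):.1%}")
```

When more than half of all finalizable objects sit in the ready section, the consumer is the bottleneck, not the producers, which fits the conclusion that the finalizer thread is not making progress.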

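3. What is wrong with the finalizer thread

To find the finalizer thread, list the threads with !t, switch to the one marked (Finalizer), and look at its call stack.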

0:000> !t
ThreadCount:      104
UnstartedThread:  0
BackgroundThread: 40
PendingThread:    0
DeadThread:       63
Hosted Runtime:   no
                                                                                                        Lock
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1 3854 0000013e082beb60    26020 Preemptive  0000013E0F6E63A0:0000013E0F6E79D8 0000013e08293ef0 0     STA
   5    2  708 0000013e082e7bd0    2b220 Preemptive  0000000000000000:0000000000000000 0000013e08293ef0 0     MTA (Finalizer)

0:000> ~~[708]s
win32u!NtUserMessageCall+0x14:
00007ff8`6b151124 c3              ret

0:005> k
 # Child-SP          RetAddr           Call Site
00 00000029`dbdfea38 00007ff8`6cce1082 win32u!NtUserMessageCall+0x14
01 00000029`dbdfea40 00007fff`9879b2d0 user32!SendMessageTimeoutW+0x102
02 00000029`dbdfead0 00007fff`985c4dc7 halcon!IOWIN32DumpToTexture+0xc90
03 00000029`dbdfef60 00007fff`974bff0e halcon!IPGenImaMask+0xae7
04 00000029`dbdfefd0 00007fff`9739d0ca halcon!HHandleClear+0x10e
05 00000029`dbdff050 00007ff7`f5d5a1a2 halcon!HLIClearHandle+0x2a
06 00000029`dbdff090 00007ff7`f5d5b571 halcondotnet!HalconDotNet.HHandleBase.ClearHandleInternal+0x92
07 00000029`dbdff140 00007ff7`f5ddf865 halcondotnet!HalconDotNet.HHandleBase.Dispose+0x21
08 00000029`dbdff180 00007ff8`542d67b6 halcondotnet!HalconDotNet.HHandleBase.Finalize+0x15
09 00000029`dbdff1c0 00007ff8`544934a1 clr!FastCallFinalizeWorker+0x6
0a 00000029`dbdff1f0 00007ff8`54493429 clr!FastCallFinalize+0x55
0b 00000029`dbdff240 00007ff8`54493358 clr!MethodTable::CallFinalizer+0xb5
0c 00000029`dbdff290 00007ff8`5449318b clr!CallFinalizer+0x5e
0d 00000029`dbdff2d0 00007ff8`544930a4 clr!FinalizerThread::DoOneFinalization+0x95
0e 00000029`dbdff3b0 00007ff8`544923fa clr!FinalizerThread::FinalizeAllObjects+0xbf
0f 00000029`dbdff3f0 00007ff8`542d7be8 clr!FinalizerThread::FinalizerThreadWorker+0xba
10 00000029`dbdff440 00007ff8`542d7b53 clr!ManagedThreadBase_DispatchInner+0x40
11 00000029`dbdff480 00007ff8`542d7a92 clr!ManagedThreadBase_DispatchMiddle+0x6c
12 00000029`dbdff580 00007ff8`5441c316 clr!ManagedThreadBase_DispatchOuter+0x4c
13 00000029`dbdff5f0 00007ff8`542dbcc5 clr!FinalizerThread::FinalizerThreadStart+0x116
14 00000029`dbdff690 00007ff8`6b3a7374 clr!Thread::intermediateThreadProc+0x8b
15 00000029`dbdff750 00007ff8`6d35cc91 kernel32!BaseThreadInitThunk+0x14
16 00000029`dbdff780 00000000`00000000 ntdll!RtlUserThreadStart+0x21

0:005> !clrstack
OS Thread Id: 0x708 (5)
        Child SP               IP Call Site
00000029dbdff0b8 00007ff86b151124 [InlinedCallFrame: 00000029dbdff0b8] HalconDotNet.HalconAPI.ClearHandle(IntPtr)
00000029dbdff0b8 00007ff7f5d5a1a2 [InlinedCallFrame: 00000029dbdff0b8] HalconDotNet.HalconAPI.ClearHandle(IntPtr)
00000029dbdff090 00007ff7f5d5a1a2 HalconDotNet.HHandleBase.ClearHandleInternal()
00000029dbdff140 00007ff7f5d5b571 HalconDotNet.HHandleBase.Dispose(Boolean)
00000029dbdff180 00007ff7f5ddf865 HalconDotNet.HHandleBase.Finalize()
00000029dbdff5d0 00007ff8542d67b6 [DebuggerU2MCatchHandlerFrame: 00000029dbdff5d0]

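The stack tells the story, and it is a nasty trap: releasing a halcon handle has to talk to a window. HHandleBase.Finalize ends up in user32!SendMessageTimeoutW and then in the underlying win32u!NtUserMessageCall. The window handle is in the rcx register (r15 holds the same value). The output is as follows: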

0:005> r
rax=0000000000001007 rbx=00000029dbdfef10 rcx=00000000000f3736
rdx=000000000000c258 rsi=000000000000c258 rdi=0000000000000000
rip=00007ff86b151124 rsp=00000029dbdfea38 rbp=00007fff985c4ed0
 r8=0000000000000015  r9=0000000000000000 r10=00007fff96d40000
r11=0000000000000000 r12=00000029dbdfefb0 r13=00007fff985c4ed0
r14=0000000000000e20 r15=00000000000f3736
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
win32u!NtUserMessageCall+0x14:
00007ff8`6b151124 c3              ret
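To carry the handle over to another tool, it helps to pull it out of the register dump mechanically and avoid copying digits by hand. This is a small sketch under two assumptions: the register text is pasted in as shown, and the tool you search in displays window handles as 8-digit hexadecimal. `parse_registers` is a hypothetical helper, not a debugger command.

```python
import re

# General-purpose registers copied from the `r` output above.
REGISTERS = """\
rax=0000000000001007 rbx=00000029dbdfef10 rcx=00000000000f3736
rdx=000000000000c258 rsi=000000000000c258 rdi=0000000000000000
rip=00007ff86b151124 rsp=00000029dbdfea38 rbp=00007fff985c4ed0
 r8=0000000000000015  r9=0000000000000000 r10=00007fff96d40000
r11=0000000000000000 r12=00000029dbdfefb0 r13=00007fff985c4ed0
r14=0000000000000e20 r15=00000000000f3736
"""


def parse_registers(text):
    """Return {register name: value} for every 64-bit rXX=... pair in the text."""
    pairs = re.findall(r"\b(r\w+)=([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(REGISTERS)
hwnd = regs["rcx"]

# Format the handle as 8 hex digits (assumed display format of the window tool).
hwnd_label = f"{hwnd:08X}"
print(f"candidate window handle: {hwnd_label}")
print(f"rcx and r15 agree: {regs['rcx'] == regs['r15']}")
```

The agreement between rcx and r15 is worth checking because rcx is a volatile register, so a matching value in a non-volatile register gives a little more confidence that the number really is the handle passed down the call chain.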

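The next question is which window rcx refers to. Spy++ is the tool for that, as I have covered in earlier articles. Screenshot below:

That clears up the whole chain of events: an unresponsive window blocks the finalizer thread, objects waiting for finalization pile up behind it, and the consequences are disastrous. I asked my friend to focus on their halcon usage and to track down the window with Spy++.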
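The failure mode is easy to reproduce in miniature. The following is a toy model in Python, not CLR code: one worker thread stands in for the single finalizer thread, and `window_responds` is a hypothetical stand-in for the window that never answers the message. All names are invented for the illustration.

```python
import queue
import threading


def run_finalizer(work, stop):
    """Drain the queue on one thread, the way a single finalizer thread would."""
    while not stop.is_set():
        try:
            finalize = work.get(timeout=0.05)
        except queue.Empty:
            continue
        finalize()
        work.task_done()


window_responds = threading.Event()  # hypothetical stand-in for the hung window
entered = threading.Event()          # set once the blocking finalizer is running


def blocking_finalizer():
    # Stand-in for a finalizer that messages a window and waits for the reply.
    entered.set()
    window_responds.wait()


def cheap_finalizer():
    # Stand-in for an ordinary finalizer that returns immediately.
    pass


work = queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=run_finalizer, args=(work, stop), daemon=True)
worker.start()

# One stuck finalizer at the head of the queue, 1000 harmless ones behind it.
work.put(blocking_finalizer)
for _ in range(1000):
    work.put(cheap_finalizer)

entered.wait(timeout=5)
backlog_while_hung = work.qsize()   # everything behind the stuck item is still queued

window_responds.set()               # the "window" finally answers
work.join()                         # wait until every queued finalizer has run
backlog_after_reply = work.qsize()

stop.set()
worker.join()
print(f"backlog while hung: {backlog_while_hung}, after reply: {backlog_after_reply}")
```

The point of the model is that the cheap finalizers are not the problem. A single call that waits on something outside the thread's control is enough to stall everything queued behind it, which is the shape of the backlog seen in the dump.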

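III: Summary

As a debugging engineer, make good use of more than one analysis tool. It often gets you to the answer with half the effort.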
