Debugging managed heap corruption

Time: 2013-08-14 06:43:23

Tags: .net memory clr

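I have an interesting problem: I have two dumps, from two different processes, that both show managed heap corruption. I am using the x64 clr.dll 4.0.30319.1008 (RTMGDR.030319-1000) on Windows 7 x64. !VerifyHeap confirms the corruption: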

0:016> !VerifyHeap
object 000000000367ec60: bad member 0000000004fba740 at 000000000367ec78
curr_object:      000000000528CF90
Last good object: 000000000367ec40

The object is an array with two elements:

0:016> !DumpObj /d 000000000367ec60
Name:        System.Object[]
MethodTable: 000007feedf6adf8
EEClass:     000007feedaefc68
Size:        48(0x30) bytes
Array:       Rank 1, Number of elements 2, Type CLASS (Print Array)
Element Type:System.Object
Fields:
None

0:016> !DumpArray /d 000000000367ec60
Name:        System.Object[]
MethodTable: 000007feedf6adf8
EEClass:     000007feedaefc68
Size:        48(0x30) bytes
Array:       Rank 1, Number of elements 2, Type CLASS
Element Methodtable: 000007feedf65a48
[0] 0000000004fba740
[1] 000000000367ec90

The first pointer is the corrupt value: it points to a bool with the value 1, which is not a managed object. That is why the GC bails out.

0:016> db 0000000004fba740-10
00000000`04fba730  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba740  01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba750  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba760  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba770  00 00 00 00 00 00 00 00-b8 1b f7 ed fe 07 00 00  ................
00000000`04fba780  d0 a7 fb 04 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba790  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`04fba7a0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

0:016> !lno 04fba740 
Before:  0000000004fba718 System.Collections.Hashtable+bucket[] 
After:   0000000004fba778 System.Collections.Hashtable 
Heap local consistency confirmed.
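
To make the "not a managed object" point concrete: on x64 a managed object reference points at the object's MethodTable pointer (the object header sits 8 bytes before it), so the first 8 bytes at a valid reference should decode to an address similar to the MethodTable values shown by !DumpObj (000007feedf6adf8). The minimal sketch below assumes a modern .NET SDK (C# 9 top-level statements, `System.Buffers.Binary.BinaryPrimitives`) and uses a hypothetical helper class `DumpWords` to decode two lines of the `db` output as little-endian 64-bit words. The target of the corrupt slot decodes to 1, while the Hashtable that !lno reports at 04fba778 decodes to 000007feedf71bb8, which lies in the same address range as the other MethodTables.

```csharp
using System;
using System.Buffers.Binary;

// Bytes dumped at 00000000`04fba740, the address stored in the corrupt array slot.
byte[] target = DumpWords.ParseBytes("01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00");
// Bytes dumped at 00000000`04fba770; !lno reports the Hashtable at 04fba778, i.e. offset 8.
byte[] hashtableLine = DumpWords.ParseBytes("00 00 00 00 00 00 00 00-b8 1b f7 ed fe 07 00 00");

// A valid object reference points at the object's MethodTable pointer.
Console.WriteLine($"{DumpWords.ReadQword(target, 0):x16}");        // 0000000000000001
Console.WriteLine($"{DumpWords.ReadQword(hashtableLine, 8):x16}"); // 000007feedf71bb8

// Hypothetical helper, only for reading WinDbg `db` output offline.
static class DumpWords
{
    // Parses hex byte pairs; the '-' that separates the two 8-byte halves is ignored.
    public static byte[] ParseBytes(string hex)
    {
        string[] parts = hex.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        var bytes = new byte[parts.Length];
        for (int i = 0; i < parts.Length; i++)
            bytes[i] = Convert.ToByte(parts[i], 16);
        return bytes;
    }

    // Reads the 8-byte little-endian word (x64 byte order) starting at `offset`.
    public static ulong ReadQword(byte[] bytes, int offset) =>
        BinaryPrimitives.ReadUInt64LittleEndian(bytes.AsSpan(offset, 8));
}
```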

The surrounding objects do not matter; they vary randomly from one dump to the next.

!GCRoot 0000000367ec60
Scan Thread 16 OSTHread 5fd0
r10:Root:  000000000367ec60(System.Object[])
Scan Thread 17 OSTHread 10cc
RSP:1de4cd58:Root:  000000000367ec60(System.Object[])

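Apart from the register and stack references above, nothing roots the array, which means it can be collected. Interestingly, the second object in the array is ThreadLocal data from a thread that has already exited. It looks like the CLR stores ThreadLocal objects in a per-thread object array that can be collected once the thread exits. Thread 17 performs the actual collection and throws the ExecutionEngineException. Thread 16, however, seems to keep the thread-local data in an array that should be pinned (it is not) and that it should have no access to.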

Thread 16 seems to hold the TLS data of a thread that has already exited, and may be writing to it.

OS Thread Id: 0x5fd0 (16)
Child SP         IP               Call Site
000000001dffdfe8 0000000076eb135a [NDirectMethodFrameStandalone: 000000001dffdfe8] MS.Win32.UnsafeNativeMethods.MsgWaitForMultipleObjects(Int32, IntPtr[], Boolean, Int32, Int32)
000000001dffdfa0 000007fecfa7e1bd DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], Boolean, Int32, Int32)
*** WARNING: Unable to verify checksum for UIAutomationClientsideProviders.ni.dll
000000001dffe090 000007fecfa7b28d MS.Internal.AutomationProxies.Misc.MsgWaitForMultipleObjects(Microsoft.Win32.SafeHandles.SafeWaitHandle, Boolean, Int32, Int32)
000000001dffe110 000007fecfab5cdd MS.Internal.AutomationProxies.QueueProcessor.WaitForWork()
000000001dffe1b0 000007feede22f78 System.Threading.ExecutionContext.runTryCode(System.Object)
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
000000001dffe8d8 000007fef08044c4 [HelperMethodFrame_PROTECTOBJ: 000000001dffe8d8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)
000000001dffea00 000007feede11661 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001dffea60 000007feede115ab System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000001dffeab0 000007feedea6d8d System.Threading.ThreadHelper.ThreadStart()
000000001dffef08 000007fef08044c4 [GCFrame: 000000001dffef08] 
000000001dfff2f0 000007fef08044c4 [DebuggerU2MCatchHandlerFrame: 000000001dfff2f0] 

This is the stack of the thread performing the garbage collection:

0:017> !DumpStack
OS Thread Id: 0x10cc (17)
Current frame: clr!WKS::gc_heap::mark_object_simple+0x75
Child-SP         RetAddr          Caller, Callee
000000001de4cce0 000007fef0877fb2 clr!WKS::gc_heap::mark_through_cards_for_segments+0x36b
000000001de4ce50 000007fef0873980 clr!WKS::gc_heap::mark_phase+0x160, calling clr!WKS::gc_heap::mark_through_cards_for_segments
000000001de4ce80 000007fef086fce7 clr!EEJitManager::CleanupCodeHeaps+0x57, calling clr!CrstBase::Leave
000000001de4cea0 000007fef07e3dc1 clr!CrstBase::Leave+0x31, calling clr!GetThread
000000001de4ced0 000007fef0873f3d clr!WKS::gc_heap::gc1+0xae, calling clr!WKS::gc_heap::mark_phase
000000001de4cef0 000007fef0874786 clr!WKS::gc_heap::update_collection_counts+0x16, calling 000000000065006e
000000001de4cf20 000007fef0a1fa56 clr!WKS::gc_heap::garbage_collect+0x42e, calling clr!WKS::gc_heap::gc1
000000001de4cf60 000007feede2d774 (MethodDesc 000007feedaa93b8 +0x124 System.TimeZoneInfo.GetDateTimeNowUtcOffsetFromUtc(System.DateTime, Boolean ByRef)), calling (MethodDesc 000007feedaa8708 +0 System.TimeSpan.Add(System.TimeSpan))
000000001de4cfa0 000007fef07fd4ff clr!SystemNative::__GetSystemTimeAsFileTime+0xf, calling kernel32!GetSystemTimeAsFileTimeStub
000000001de4cff0 000007fef087452e clr!WKS::GCHeap::GarbageCollectGeneration+0x14e, calling clr!WKS::gc_heap::garbage_collect
000000001de4d040 000007fef08734ce clr!WKS::gc_heap::try_allocate_more_space+0x25f, calling clr!WKS::GCHeap::GarbageCollectGeneration
000000001de4d080 000007fef0872f43 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p
000000001de4d110 000007fef08731fe clr!FastAllocateObject+0x73e, calling clr!WKS::gc_heap::try_allocate_more_space
000000001de4d1f0 000007fef07fc8b8 clr!JIT_NewFast+0xb8, calling clr!FastAllocateObject
000000001de4d2c8 000007feede3fa80 (MethodDesc 000007feedaaa8e8 +0x40 System.Text.StringBuilder.ExpandByABlock(Int32)), calling clr!JIT_TrialAllocSFastMP_InlineGetThread

0:016> !Threads
ThreadCount:      17
UnstartedThread:  0
BackgroundThread: 13
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                           PreEmptive                                                   Lock
       ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
   0    1  58e4 0000000000498ba0   2006020 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 STA
   2    2  4190 000000000049ee80      b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Finalizer)
   6    3  48d4 000000001ac8bb60   1000220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn (Threadpool Worker)
   8    5  5fbc 000000001aca1970   a009220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Completion Port)
   9    6  615c 000000001c4b2880      b020 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA
  10    7  5818 000000001c4e7bd0   200b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA
  11    8  6e14 000000001c4f0850      7020 Enabled  0000000000000000:0000000000000000 0000000000481df0     2 STA
  12    a  683c 000000001c512610      7220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 STA
  14    b  6f40 000000001c521120      7220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 STA
  15    c  5070 000000001c564760   100a220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Worker)
  16    d  5fd0 000000000049bc10      b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA
  17    e  10cc 000000001c62e370      b220 Enabled  0000000000000000:0000000000000000 0000000000481df0     2 MTA (GC) System.ExecutionEngineException (0000000002441228)
XXXX    f       000000001e102c80     15820 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn
  22   10  158c 000000001e103aa0   1009220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Worker)
  23   12  47e8 000000001e1048c0   8019220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn (Threadpool Completion Port)
  24    4  58a8 000000001e103390   8019220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 Ukn (Threadpool Completion Port)
  25    9  2874 000000001e102570   8009220 Enabled  0000000000000000:0000000000000000 0000000000481df0     0 MTA (Threadpool Completion Port)

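This is all very interesting, but I am not sure how to proceed. The error occurs on an automated test machine, where the test controller process dies about once or twice a day, so I cannot simply attach a debugger to the process and set data breakpoints on writes to a specific memory location. Any additional hints on how to analyze this are very welcome. I will collect more dumps so that I can at least do a differential analysis to find out which tests might trigger the problem.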

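To me it looks like the CLR array that holds the thread statics is not pinned, and someone writes an unboxed bool value into the first array element. An object array in the CLR does not contain values but the addresses of managed objects; here, however, the element points to a bare bool value (1) instead of a regular CLR object with its object header.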

Could a wrong P/Invoke signature cause this behavior? I have seen declarations like

    [DllImport( "kernel32.dll" )]
    public static extern bool Beep( int frequency_in, int time_in );

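which declare the return value as a bool (1 byte in managed code), whereas the Beep function actually returns a 4-byte BOOL. Can a wrong P/Invoke return type (bool instead of int) cause this kind of problem?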
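
For comparison, a declaration that mirrors the Win32 prototype `BOOL Beep(DWORD dwFreq, DWORD dwDuration)` might look like the sketch below. One detail that bears on the question: for P/Invoke, the marshaler converts a `bool` return value to and from a 4-byte Win32 `BOOL` by default, so `bool` versus `int` is not by itself a size mismatch; `[return: MarshalAs(UnmanagedType.Bool)]` only makes that explicit. The sketch assumes .NET 5 or later (C# 9 top-level statements, `OperatingSystem.IsWindows()`), not the .NET 4.0 runtime from the dumps, and `NativeMethods` is just a conventional class name.

```csharp
using System;
using System.Runtime.InteropServices;

// A managed bool is 1 byte, but its default marshaled size is 4 bytes (a Win32 BOOL).
Console.WriteLine(sizeof(bool));           // 1
Console.WriteLine(Marshal.SizeOf<bool>()); // 4

// Only call into kernel32 when actually running on Windows.
if (OperatingSystem.IsWindows() && !NativeMethods.Beep(750, 300))
    Console.WriteLine($"Beep failed, Win32 error {Marshal.GetLastWin32Error()}");

static class NativeMethods
{
    // Mirrors BOOL Beep(DWORD dwFreq, DWORD dwDuration):
    // DWORD is an unsigned 32-bit integer, BOOL a 4-byte int.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)] // explicit; UnmanagedType.Bool is already the default for bool
    public static extern bool Beep(uint dwFreq, uint dwDuration);
}
```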

0 answers