PInvoke: beyond the magic
Ever run into problems passing data between unmanaged and managed code? Or are you just curious what really happens when you slap that [DllImport] on a method? This post is for you: below I’ll shine some light inside the black box that’s called Platform Invoke.
Let’s start with a very minimal console app that has a call to an unmanaged Win32 function:
using System;
using System.Runtime.InteropServices;

namespace TestPInvoke
{
    class Program
    {
        // Win32 Sleep, imported through PInvoke
        [DllImport("kernel32.dll")]
        static extern void Sleep(uint dwMilliseconds);

        static void Main(string[] args)
        {
            Console.WriteLine("Press any key...");
            while (!Console.KeyAvailable)
            {
                Sleep(1000);
            }
        }
    }
}
Nothing exciting going on there: just the console polling for a keypress, and sleeping the thread for 1 second after every poll. The important thing of course is how we sleep the thread: through PInvoke instead of the usual mscorlib System.Threading.Thread.Sleep(Int32).
Now let’s run it under WinDbg + SOS, and see if we can find out what happens. The managed stack while sleeping looks like this:
Child SP IP Call Site
00ebee24 0108013d DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
00ebee28 0108008e [InlinedCallFrame: 00ebee28] TestPInvoke.Program.Sleep(UInt32)
00ebee6c 0108008e TestPInvoke.Program.Main(System.String[])
On the bottom is the entrypoint. The next frame on the stack is just an information frame telling us the call to Program.Sleep was inlined in Main (notice the same IP). The next frame is more interesting: as the last frame on the managed stack this must be our marshalling stub.
We can dump the MethodDesc of the Program.Main and DomainBoundILStubClass.IL_STUB_PInvoke methods for comparison, which gives us:
0:000> !IP2MD 0108008e
MethodDesc: 00fc37c8
Method Name: TestPInvoke.Program.Main(System.String[])
Class: 00fc12a8
MethodTable: 00fc37e4
mdToken: 06000002
Module: 00fc2ed4
IsJitted: yes
CodeAddr: 01080050
and
0:000> !IP2MD 0108013d
MethodDesc: 00fc38f0
Method Name: DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
Class: 00fc385c
MethodTable: 00fc38b0
mdToken: 06000000
Module: 00fc2ed4
IsJitted: yes
CodeAddr: 010800c0
This tells us both methods are originally IL code, and they are JIT compiled. For the Main method we knew this of course, and for the PInvoke stub it can’t be a surprise either given the class and method names. So let’s dump out the IL:
0:000> !DumpIL 00fc37c8
IL_0001: ldstr "Press any key..."
IL_0006: call System.Console::WriteLine
IL_000c: br.s IL_001b
IL_000f: ldc.i4 1000
IL_0014: call TestPInvoke.Program::Sleep
IL_001b: call System.Console::get_KeyAvailable
IL_0020: ldc.i4.0
IL_0021: ceq
IL_0025: brtrue.s IL_010e
IL_0027: ret
No surprises there. Next the stub:
0:000> !DumpIL 00fc38f0
error decoding IL
OK, that’s weird. The metadata tells us we have an IL compiled method, and the JITted code is there:
0:000> !u 010800c0
Normal JIT generated code
DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
(actual code left out)
but where is the IL body?
In fact, since .NET 4.0 all interop stubs are generated at runtime as IL and JIT compiled for the relevant architecture. This runtime IL clearly differs from the IL emitted into runtime generated assemblies (for instance the ones generated for XML serialization): the interop stubs aren’t contained in a runtime generated assembly or module. Instead, the module token is spoofed to be identical to the calling frame’s module (you can check this above). Likewise, there is only runtime data for these methods, and looking up the stub’s class info gives:
!DumpClass 00fc385c
Class Name:
mdToken: 02000000
File: C:\dev\voidcall\Profiler\ProfilerNext\TestPInvoke\TestPInvoke\bin\Debug\TestPInvoke.exe
Parent Class: 00000000
Module: 00fc2ed4
Method Table: 00fc38b0
Total Method Slots: 0
Class Attributes: 101
Transparency: Critical
This containing class (DomainBoundILStubClass) is a strange thing as well: it doesn’t inherit from anything (not even System.Object), the name isn’t filled in, and there are no method slots, even though we know there is at least one method in this class, namely the one we just followed to get to it. So this class is probably just a construct for keeping the CLR’s internal data structures consistent.
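As a loose managed-side analogy for "IL that only exists at runtime", here is a minimal sketch using System.Reflection.Emit.DynamicMethod. It is not how the CLR builds its interop stubs internally; the class name RuntimeIlDemo and the AddOne method are made up for illustration. The point is only that a method body can be emitted and JIT compiled without ever being written to an assembly on disk.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical demo class: emits a tiny method body at runtime.
static class RuntimeIlDemo
{
    public static Func<int, int> BuildAddOne()
    {
        // The method is associated with an existing module,
        // but its IL is never stored in that module's metadata.
        var method = new DynamicMethod(
            "AddOne",
            typeof(int),
            new[] { typeof(int) },
            typeof(RuntimeIlDemo).Module);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // push the argument
        il.Emit(OpCodes.Ldc_I4_1);  // push the constant 1
        il.Emit(OpCodes.Add);       // add them
        il.Emit(OpCodes.Ret);       // return the sum

        // The IL is JIT compiled the first time the delegate is invoked.
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }
}
```

Calling RuntimeIlDemo.BuildAddOne() returns a delegate that behaves like any other method, even though no assembly on disk contains its body.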
So there really seems to be no good way to get the IL of those stubs. The CLR team realized this as well and decided to publish the generated IL as ETW events. The ILStub Diagnostics tool can be used to intercept them. If we do this for our test program we see the following (formatted for readability):
// Managed Signature: void(uint32)
// Native Signature: unmanaged stdcall void(int32)
.maxstack 3
.locals (int32 A,int32 B)
// Initialize
call native int [mscorlib] System.StubHelpers.StubHelpers::GetStubContext()
call void [mscorlib] System.StubHelpers.StubHelpers::DemandPermission(native int)
// Marshal
ldc.i4 0x0
stloc.0
IL_0010:
ldarg.0
stloc.1
// CallMethod
ldloc.1
call native int [mscorlib] System.StubHelpers.StubHelpers::GetStubContext()
ldc.i4 0x14
add
ldind.i
ldind.i
calli unmanaged stdcall void(int32) //actual unmanaged method call
// Unmarshal (nothing in this case)
// Return
ret
The (un)marshalling isn’t very interesting in this case (int32 in and nothing out). To make it clearer for those who don’t use IL daily, I used ILAsm to compile this method body into a dll and used ILSpy to view it in decompiled C#:
static void ILStub_PInvoke(int A)
{
    //initialize
    StubHelpers.DemandPermission(StubHelpers.GetStubContext());
    //CallMethod
    calli(void(int32), A, *(*(StubHelpers.GetStubContext() + 20))); //not actual C#, but more readable anyway
}
The call to the unmanaged method is done with a calli instruction, which is a strongly typed indirect call through a function pointer. Its first operand (not on the stack but encoded in the IL) is the signature of the call site [void(int32)]. On the stack it expects the argument (in this case A), followed by the unmanaged function pointer. The stub obtains that pointer by reading the value at offset 20 (0x14) of the context returned from StubHelpers.GetStubContext() and dereferencing it once more, hence the two ldind.i instructions in the IL.
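For readers who want to see calli from the C# side: C# 9 function pointers compile their invocations to calli. The sketch below uses a managed function pointer so it runs anywhere; the names CalliDemo and Twice are made up for illustration. It assumes C# 9 or later with unsafe code enabled (AllowUnsafeBlocks). The unmanaged flavour used by the stub would correspond to a pointer type like delegate* unmanaged[Stdcall]<uint, void>.

```csharp
using System;

// Hypothetical demo class; requires C# 9+ and unsafe code enabled.
static unsafe class CalliDemo
{
    static int Twice(int value) => value * 2;

    public static int CallThroughPointer(int argument)
    {
        // Taking the address of a static method compiles to ldftn.
        delegate*<int, int> target = &Twice;

        // Invoking the pointer compiles to calli: the signature is encoded
        // in the IL, the argument and the pointer are taken from the stack.
        return target(argument);
    }
}
```

Viewing the compiled method in ILSpy (IL view) should show the same pattern as the stub: the argument is pushed first, then the function pointer, then calli with the signature.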
So what magic takes place in StubHelpers.GetStubContext()?
The answer will come naturally if we take for example a simple program that has 2 PInvoke methods with the same input and output arguments:
[DllImport("kernel32.dll")]
static extern void ExitThread(uint dwExitCode);
[DllImport("kernel32.dll")]
static extern void Sleep(uint dwMilliseconds);
If I let the CLR generate an IL stub for both methods, I have exactly the same input and output marshalling, and even the unmanaged function call signature (not address) is the same.
That seems a bit of a waste, so how could one optimize this?
Indeed, we would save on basically everything we care about (RAM, JIT compilation) by just generating one IL stub for every unique input+output argument signature, and injecting that stub with the unmanaged address it needs to call.
This is exactly how it works: when the CLR encounters a PInvoke method, it pushes a frame on the stack (InlinedCallFrame) that carries, among other things, the unmanaged function address, just before calling the actual IL stub.
The stub in turn requests this information through StubHelpers.GetStubContext() (aka ‘gimme my callframe’), and calls into the unmanaged function.
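The sharing idea can be sketched in plain C#. This is a toy model, not CLR code: FakeStubContext and FakeStubCache are hypothetical names, the "stub" is an ordinary delegate, and the signature key is just a string. It only illustrates generating one stub per unique signature and handing it a per-method context that holds the target.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the per-method data handed to the shared stub.
sealed class FakeStubContext
{
    public Action<uint> Target { get; }
    public FakeStubContext(Action<uint> target) { Target = target; }
}

// Hypothetical cache: one stub per unique signature string.
static class FakeStubCache
{
    static readonly Dictionary<string, Action<FakeStubContext, uint>> Stubs =
        new Dictionary<string, Action<FakeStubContext, uint>>();

    // Counts how many stubs were actually "generated".
    public static int GeneratedCount { get; private set; }

    public static Action<FakeStubContext, uint> GetStub(string signature)
    {
        Action<FakeStubContext, uint> stub;
        if (!Stubs.TryGetValue(signature, out stub))
        {
            // The stub knows nothing about the target; it asks its context.
            stub = (context, argument) => context.Target(argument);
            Stubs[signature] = stub;
            GeneratedCount++;
        }
        return stub;
    }
}
```

Two imports with the signature void(uint32), such as Sleep and ExitThread, would both receive the same delegate from GetStub and differ only in the FakeStubContext passed along with each call.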
To see this in action, consider the code:
using System;
using System.Runtime.InteropServices;

namespace TestPInvoke
{
    class Program
    {
        [DllImport("kernel32.dll")]
        static extern void Sleep(uint dwMilliseconds);

        // Second import of the same native function under another name
        [DllImport("kernel32.dll", EntryPoint = "Sleep")]
        static extern void SleepAgain(uint dwMilliseconds);

        static void Main(string[] args)
        {
            Console.WriteLine("Press any key...");
            while (!Console.KeyAvailable)
            {
                Sleep(500);
                SleepAgain(500);
            }
        }
    }
}
I’ll run this from WinDbg+SOS. Here’s the disassembly of the calls to Sleep and SleepAgain in Main:
mov ecx,1F4h
call 0042c04c (TestPInvoke.Program.Sleep(UInt32), mdToken: 06000001)
mov ecx,1F4h
call 0042c058 (TestPInvoke.Program.SleepAgain(UInt32), mdToken: 06000002)
You see the calls to Sleep and SleepAgain are pointing to different addresses. If we dump the unmanaged code at these locations we have:
!u 0042c04c (Sleep)
Unmanaged code
mov eax,42379Ch
jmp 006100d0 (DomainBoundILStubClass.IL_STUB_PInvoke(UInt32))
!u 0042c058 (SleepAgain)
Unmanaged code
mov eax,4237C8h
jmp 006100d0 (DomainBoundILStubClass.IL_STUB_PInvoke(UInt32))
Indeed, in just a few instructions a different value is loaded into eax before jumping to the same address (the IL stub). Since the value in eax is the only thing separating the two, this must be a pointer to our call frame.
So let’s consider these as memory addresses and check what’s there:
dd 42379Ch
0042379c 63000001 20ea0005 00000000 00192385
004237ac 001925ec 00423808 0042c010 00000000
dd 4237C8h
004237c8 630b0002 20ea0006 00000000 00192385
004237d8 001925ec 00423810 0042c01c 00000000
Now remember the offset in the calli sequence above? The unmanaged call went through a pointer stored at offset 20 (14h) in our stub context. In plain words: take the value at offset 20 in the call frame (00423808 for Sleep and 00423810 for SleepAgain in the dumps above) and dereference it. This gives us:
00423808 => 7747cf49 (KERNEL32!SleepStub)
00423810 => 7747cf49 (same)
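The same two-step lookup can be replayed in C# on a fake context. This is a sketch under assumptions: StubContextDemo and its members are made-up names, the buffers are allocated by the demo itself rather than by the CLR, and the 0x14 offset is simply the one observed in this 32-bit session, not a documented layout.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo: mimics "add 0x14; ldind.i; ldind.i" on a hand-built buffer.
static class StubContextDemo
{
    // Offset observed in the generated IL stub above (assumption: not a stable contract).
    const int TargetSlotOffset = 0x14;

    public static IntPtr ResolveTarget(IntPtr stubContext)
    {
        // First ldind.i: read the pointer stored at context + 0x14.
        IntPtr slot = Marshal.ReadIntPtr(stubContext, TargetSlotOffset);
        // Second ldind.i: dereference it to get the function address.
        return Marshal.ReadIntPtr(slot);
    }

    public static IntPtr Simulate(IntPtr fakeTarget)
    {
        IntPtr context = Marshal.AllocHGlobal(TargetSlotOffset + IntPtr.Size);
        IntPtr slot = Marshal.AllocHGlobal(IntPtr.Size);
        try
        {
            Marshal.WriteIntPtr(slot, fakeTarget);                // slot -> target address
            Marshal.WriteIntPtr(context, TargetSlotOffset, slot); // context + 0x14 -> slot
            return ResolveTarget(context);
        }
        finally
        {
            Marshal.FreeHGlobal(slot);
            Marshal.FreeHGlobal(context);
        }
    }
}
```

Passing the address seen in the session, for example StubContextDemo.Simulate(new IntPtr(0x7747cf49)), hands that same value back after the two reads. Nothing is called through it; the demo only follows the pointers.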
And there we have it, PInvoke demystified.
In a follow-up post I’ll address the following questions:
- Can we manually create our own marshalling stubs in C# (at compile time)?
- Can they be faster than the runtime generated ones?
- What about the reverse case (unmanaged code calling us)?