Useful null reference exceptions

As a .NET developer you’re guaranteed to have at some point run into the “Object reference not set to instance of an Object” message, or NullReferenceException. Encountering one without exception means there is a logic problem in your code, but finding what the supposed “instance of an Object” or target was can be quite hard. After all, the target was not assigned to the reference, so how do we know what should have been there ? In fact we can’t, as Eric Lippert explains.

However, just because we don’t know the target doesn’t mean we don’t know anything: in most cases there is some information we can obtain like:

  • type info: the type of the reference, and so the type of the target or at least what it inherits from (baseclass) or what it implements (interface)
  • operation info: the type of operation that caused the reference to be dereferenced

For example, in case the operation is a method call (callvirt), we know the C# compiler only compiles if it knows the methods are supported by the supposed target (through inheritance or directly), so with the info about the intended method we would have a good hint about what object is null.

According to MSFT, some other instructions that can throw NullReferenceException are: “The following Microsoft intermediate language (MSIL) instructions throw NullReferenceException: callvirt, cpblk, cpobj, initblk, ldelem.<type>, ldelema, ldfld, ldflda, ldind.<type>, ldlen, stelem.<type>, stfld, stind.<type>, throw, and unbox“. The amount of useful information will depend on the IL instruction context. However, currently none of this information is in the NullReferenceException: it just states some object reference was null, it was dereferenced, and in what method, nothing more, deal with it.

If you’re running debug code under Visual Studio this isn’t so much of a problem, as the debugger will break as soon as the dereferencing occurs (‘first chance’) and show you the source, but what about production code without source? Sure you can catch the exception and log it, but between the time it is thrown and the moment a catch handler is found, the runtime (CLR) is in control. At the time we are back in user code inside our catch handler, all we have to go on is the information in the NullReferenceException itself, which is essentially nothing. Debugging pros can attach WinDbg+SOS to a crashdump, but when the crash is the result of a ‘second chance exception’ (no exception handler was found) we’re at an even later stage. What we really want is a tool which can attach to the production process like a debugger, and get more info about first chance exceptions as they are thrown, as that’s the moment when all the info is available.

To give you an idea of the available info, the code below periodically throws a series of NullReferenceExceptions with a very different origin:

using System;
using System.Threading;

namespace TestNullReference
{
    interface TestInterface
    {
        void TestCall();
    }

    class TestClass
    {
        public int TestField;

        public void TestCall() { }
    }

    class TestClass2 : TestClass
    {   
    }

    class Program
    {
        #region methods to invoke null reference exceptions for various IL opcodes

        /// <summary>
        /// IL throw
        /// </summary>
        static void Throw()
        {
            throw null;
        }

        /// <summary>
        /// IL callvirt on interface
        /// </summary>
        static void CallVirtIf()
        {
            TestInterface i = null;
            i.TestCall();
        }

        /// <summary>
        /// IL callvirt on class
        /// </summary>
        static void CallVirtClass()
        {
            TestClass i = null;
            i.TestCall();
        }

        /// <summary>
        /// IL callvirt on inherited class
        /// </summary>
        static void CallVirtBaseClass()
        {
            TestClass2 i = null;
            i.TestCall();
        }

        /// <summary>
        /// IL ldelem
        /// </summary>
        /// <param name="a"></param>
        static void LdElem()
        {
            int[] array = null;
            var firstElement = array[0];
        }

        /// <summary>
        /// IL ldelema
        /// </summary>
        /// <param name="a"></param>
        static unsafe void LdElemA()
        {
            int[] array = null;
            fixed (int* firstElementA = &(array[0]))
            {                
            }
        }

        /// <summary>
        /// IL stelem
        /// </summary>
        /// <param name="a"></param>
        static void StElem()
        {
            int[] array = null;
            array[0] = 3;
        }

        /// <summary>
        /// IL ldlen
        /// </summary>
        /// <param name="a"></param>
        static void LdLen()
        {
            int[] array = null;
            var len = array.Length;
        }

        /// <summary>
        /// IL ldfld
        /// </summary>
        /// <param name="a"></param>
        static void LdFld()
        {
            TestClass c = null;
            var fld = c.TestField;
        }

        /// <summary>
        /// IL ldflda
        /// </summary>
        /// <param name="a"></param>
        static unsafe void LdFldA()
        {
            TestClass c = null;
            fixed (int* fld = &(c.TestField))
            {
                
            }
        }

        /// <summary>
        /// IL stfld
        /// </summary>
        /// <param name="a"></param>
        static void StFld()
        {
            TestClass c = null;
            c.TestField = 3;
        }

        /// <summary>
        /// IL unbox_any
        /// </summary>
        static void Unbox()
        {
            object o = null;
            var val = (int) o;
        }

        /// <summary>
        /// IL ldind
        /// </summary>
        static unsafe void LdInd()
        {
            int* valA = null;
           
            var val = *valA;
        }

        /// <summary>
        /// IL ldind
        /// </summary>
        static unsafe void StInd()
        {
            int* valA = null;

            *valA = 3;
        }
        #endregion

        static void LogNullReference(Action a)
        {
            try
            {
                a();
            }
            catch (NullReferenceException ex)
            {
                var msg = string.Format("NullReferenceException executing {0} : {1}", a.Method.Name, ex.Message);
                Console.WriteLine(msg);
            }
        }

        static void Main(string[] args)
        {
            while (!Console.KeyAvailable)
            {
                LogNullReference(Throw);

                LogNullReference(CallVirtIf);
                LogNullReference(CallVirtClass);
                LogNullReference(CallVirtBaseClass);

                LogNullReference(LdElem);
                LogNullReference(LdElemA);
                LogNullReference(StElem);
                LogNullReference(LdLen);

                LogNullReference(LdFld);
                LogNullReference(LdFldA);
                LogNullReference(StFld);

                LogNullReference(Unbox);

                LogNullReference(LdInd);
                LogNullReference(StInd);

                Thread.Sleep(2000);   
            }           
        }
    }
}

All 14 of them will give us the dreaded “Object reference not set to an instance of an object” message.

Now what happens if we attach a tracing tool that gets as much info as possible:

Attempted to throw an uninitialized exception object. In static void TestNullReference.Program::Throw() cil managed  IL 1/1 (reported/actual).
Attempted to call void TestNullReference.TestInterface::TestCall() cil managed  on an uninitialized type. In static void TestNullReference.Program::CallVirtIf() cil managed  IL 3/3 (reported/actual).
Attempted to call void TestNullReference.TestClass::TestCall() cil managed  on an uninitialized type. In static void TestNullReference.Program::CallVirtClass() cil managed  IL 3/3 (reported/actual).
Attempted to call void TestNullReference.TestClass::TestCall() cil managed  on an uninitialized type. In static void TestNullReference.Program::CallVirtBaseClass() cil managed  IL 3/3 (reported/actual).
Attempted to load elements of type System.Int32 from an uninitialized array. In static void TestNullReference.Program::LdElem() cil managed  IL 3/4 (reported/actual).
Attempted to load elements of type System.Int32 from an uninitialized array. In static void TestNullReference.Program::LdElemA() cil managed  IL 3/4 (reported/actual).
Attempted to store elements of type System.Int32 in an uninitialized array. In static void TestNullReference.Program::StElem() cil managed  IL 3/5 (reported/actual).
Attempted to get the length of an uninitialized array. In static void TestNullReference.Program::LdLen() cil managed  IL 3/3 (reported/actual).
Attempted to load non-static field int TestNullReference.TestClass::TestField from an uninitialized type. In static void TestNullReference.Program::LdFld() cil managed  IL 3/3 (reported/actual).
Attempted to load non-static field int TestNullReference.TestClass::TestField from an uninitialized type. In static void TestNullReference.Program::LdFldA() cil managed  IL 3/3 (reported/actual).
Attempted to store non-static field int TestNullReference.TestClass::TestField in an uninitialized type. In static void TestNullReference.Program::StFld() cil managed  IL 3/4 (reported/actual).
Attempted to cast/unbox a value/reference type of type System.Int32 using an uninitialized address. In static void TestNullReference.Program::Unbox() cil managed  IL 3/3 (reported/actual).
Attempted to load elements of type System.Int32 indirectly from an illegal address. In static void TestNullReference.Program::LdInd() cil managed  IL 4/4 (reported/actual).
Attempted to store elements of type System.Int32 indirectly to a misaligned or illegal address. In static void TestNullReference.Program::StInd() cil managed  IL 4/5 (reported/actual).

You can download and play with the tool already. Below I’ll shed some light on how this info can be obtained.

What the tracer does

One blog post is not enough to fully explain how to write a managed debugger. However, enough has been written about how to leverage the managed debugging API so for this post I’m going to assume we’ve attached a managed debugger to the target process, implemented debugger callback handlers, hooked them up and are handling exception callbacks.

The exception callback has the following signature:

HRESULT Exception (
    [in] ICorDebugAppDomain   *pAppDomain,
    [in] ICorDebugThread      *pThread,
    [in] ICorDebugFrame       *pFrame,
    [in] ULONG32              nOffset,
    [in] CorDebugExceptionCallbackType dwEventType,
    [in] DWORD                dwFlags
);

The actual exception can be obtained from the thread as an ICorDebugReferenceValue which can be dereferenced to an ICorDebugObjectValue of which you can ultimately get the ICorDebugClass and metadata token (mdTypeDef). To find out if this exception is a NullReferenceException, you can either look up this token using the metadata APIs, or compare it to a prefetched metadata token.

When we know we’re dealing with a 1st chance null reference exception, we can dig deeper and try to find out the offending IL instruction. From nOffset, we already have the IL offset in the method frame’s code. The code itself can be obtained by querying the ICorDebugFrame for an ICorDebugILFrame interface, and requesting it for its code (ICorDebugCode2), which has a method for retreiving the actual IL bytes.

Depending on the IL instruction we find at nOffset in the IL bytes, we can get various details and log them.

For the instructions that can throw:

  • callvirt: a call to a known instance method (mdMethodDef) on an uninitialized type
  • cpblk, cpobj, initblk: shouldn’t happen (not exposed by C#)
  • ldelem.<type>, ldelema, stelem.<type>: an attempt to load/store elements of a known type (mdTypeDef) from/to an uninitialized array
  • ldfld, ldflda, stfld: an attempt to load/store a known non-static field (mdFieldDef) of a known uninitialized type
  • ldind.<type>, stind.<type>: an invalid address was passed to the load instruction, or a misaligned address was passed to the store instruction (shouldn’t happen as this would be a compiler instead of user code bug)
  • ldlen: an attempt to get the length of an uninitialized array
  • throw: an attempt to throw an uninitialized exception object
  • unbox, unbox_any: an attempt to cast/unbox a value/reference type of a known type (mdTypeDef) using an uninitialized address

The various metadata tokens can be looked up using the metadata APIs mentioned before, and finally formatted into a nice message.

TraceCLI: a production debugging and tracing tool

Ever had this application that just crashed or malfunctioned in a Production environment ? And whatever you tried, it wasn’t reproducible in Test ? Or perhaps it was just slow in serving some pages (web application), and you wanted to see the actual SQL your Object Relational Model (ORM) generated and sent to the database ?

Well, there are some options.

For one, you could add more logging. Typically though, Murphy’s Law interferes and even though you think you log ‘everything’, you find out  when you really need it you forgot about that one spot, or need more detail (like argument or field values). Changing the code and building a new release, not to mention get it through Test, Acceptance and into Production is not really an attractive option here.

An alternative is to use WinDbg (+ SOS) on a crashdump obtained using a GUI app like DebugDiag or a command line app like ProcDump. For real crashes this ‘post mortem debugging’ option is often sufficient: you’ll have a full memory dump to work with taken at the exact time of the crash so all the details are there. However, using WinDbg is a pretty hardcore skill not every (.NET) developer has, and for more subtle malfunctions – for instance your application not behaving nicely with other systems – or in case of performance issues, you typically want to inspect a live target anyway.

And here it becomes problematic: I probably don’t have to tell you attaching a debugger to a live production process and stepping through it is not a good idea. Although even without WinDbg there’s still a large set of tools at your disposal (SQL tracing/profiling, performance counters, ETW tracing), those are mostly indirect and slow to setup: you’re watching symptoms instead of the real disease.

What I really wanted was a tool which I can attach to a live target without disrupting it, and in which I can specify some classes, methods or even namespaces to monitor (hitcounts, timings). Perhaps I even want to dump some (private) fields of a class when a method is hit. That sort of thing.

It didn’t exist (or I couldn’t find it), and I was searching for a challenge anyway, so I made one myself using native C++ and the debugging APIs.

The tool – for now named TraceCLI – allows you to live attach to a process (x86 or x64), and instrument the namespaces, classes, and methods you want. On hitting them, you can either just monitor their hitcounts or do more advanced stuff like measure the time in the method (for now including calls to other methods), as well as dump fields of the class the method is a member of on entering the method. For simple much used scenarios, there are presets (like SQL tracing) and for very advanced scenarios or multiple hit filter settings, you can also provide a configuration with an xml config file.

Although ultimately the goal is to use it on production processes, it’s still very alpha, both in terms of stability as well as feature set. So please use at your own risk.

Example: SQL tracing

For example, to trace SQL queries manually (without a preset), we could use the following commandline:

traceCLI_cmdline

which specifies we want to dump the field ‘_commandText’ of a class with a method ‘RunExecuteReader’ upon entering the method.

During startup, TraceCLI attaches to the process and checks all loaded assemblies for such methods, and instruments them:

14:33:53.588 TraceCLI (x86) v0.1 - Ruurd Keizer
14:33:53.588 ### PARAMETERS
14:33:53.589 Filter #1: namespace (null), class (null), method RunExecuteReader.
14:33:53.589   Fields dumped on entry: _commandText
14:33:53.658 ### ATTACH
14:33:53.659 Attaching to process with pId 5320: C:\Program Files (x86)\IIS Express\iisexpress.exe
14:33:53.738 Processing filter...
14:33:53.739 Searching domain: DefaultDomain
14:33:53.739 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Configuration\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Configuration.dll
14:33:53.755 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
14:33:53.800 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.Build.Utilities.v4.0\v4.0_4.0.0.0__b03f5f7f11d50a3a\Microsoft.Build.Utilities.v4.0.dll
(...)
14:33:54.077 Searching domain: /LM/W3SVC/4/ROOT-1-130487604823520283
14:33:54.078 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Data.Linq\v4.0_4.0.0.0__b77a5c561934e089\System.Data.Linq.dll
14:33:54.093 Searching assembly: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root\4ecd037f\b8cc2ad2\assembly\dl3\2c89eab9\1cd35552_3295cf01\Solvay.Common.dll
14:33:54.094 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.Web.Infrastructure\v4.0_1.0.0.0__31bf3856ad364e35\Microsoft.Web.Infrastructure.dll
(...)
14:33:54.796 Searching assembly: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root\4ecd037f\b8cc2ad2\assembly\dl3\5283b6cf\d83f43ef_2d8acf01\System.Web.Http.WebHost.dll
14:33:54.798 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
14:33:54.902 Found 4 methods satisfying the filters
14:33:54.902 Found method: internal System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: internal System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: private System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReaderTds(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: private System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReaderSmi(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.904 Activating all breakpoints

While the tool is attached it dumps SQL to the log as it executes it:

14:33:54.904 ### TRACING
14:34:00.617 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        COUNT(1) AS [A1]
        FROM  [dbo].[T_Computer] AS [Extent1]
        INNER JOIN [dbo].[T_OS] AS [Extent2] ON [Extent1].[HWComputerIndex] = [Extent2].[ComputerIndex]
    )  AS [GroupBy1]
14:34:00.626 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        COUNT(1) AS [A1]
        FROM  [dbo].[T_Computer] AS [Extent1]
        INNER JOIN [dbo].[T_OS] AS [Extent2] ON [Extent1].[HWComputerIndex] = [Extent2].[ComputerIndex]
    )  AS [GroupBy1]
14:34:00.635 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        COUNT(1) AS [A1]
        FROM ( SELECT DISTINCT
            [Extent1].[SlaSID] AS [SlaSID],
            [Extent1].[SlaInfo] AS [SlaInfo],
            [Extent1].[SLaTeam] AS [SLaTeam],
            [Extent1].[SlaSource] AS [SlaSource]
            FROM [dbo].[T_SID_SLA_DRP] AS [Extent1]
            WHERE ( NOT ((N'int' = [Extent1].[SLaTeam]) AND ([Extent1].[SLaTeam] IS NOT NULL))) AND ( NOT ((N'wintel' = [Extent1].[SlaSource]) AND ([Extent1].[SlaSource] IS NOT NULL)))
        )  AS [Distinct1]
    )  AS [GroupBy1]
Pinvoke

PInvoke: beyond the magic

Ever ran into problems passing data between unmanaged code and managed code ? Or just curious what really happens when you slap that [DllImport] on a method ? This post is for you: below I’ll shine some light inside the blackbox that’s called Platform Invoke.

Let’s start with a very minimal console app that has a call to an unmanaged Win32 function:

namespace TestPInvoke
{
    class Program
    {
        [DllImport("kernel32.dll")]
        static extern void Sleep(uint dwMilliseconds);

        static void Main(string[] args)
        {
            Console.WriteLine("Press any key...");

            while (!Console.KeyAvailable)
            {
                Sleep(1000);
            }
        }
    }
}

Nothing exciting going on there: just the console polling for a keypress, and sleeping the thread for 1 second after every poll. The important thing of course is the way in which we sleep the thread, which is with PInvoke instead of using the usual mscorlib System.Threading.Thread.Sleep(Int32).

Now let’s run it under WinDbg + SOS, and see if we can find out what happens. The managed stack while sleeping looks like this:

Child SP IP       Call Site
00ebee24 0108013d DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
00ebee28 0108008e [InlinedCallFrame: 00ebee28] TestPInvoke.Program.Sleep(UInt32)
00ebee6c 0108008e TestPInvoke.Program.Main(System.String[])

On the bottom is the entrypoint. The next frame on the stack is just an information frame telling us the call to Program.Sleep was inlined in Main (notice the same IP). The next frame is more interesting: as the last frame on the managed stack this must be our marshalling stub.

We can dump the MethodDescriptor of the Program.Main and DomainBoundILStubClass.IL_STUB_PInvoke methods for comparison, which gives us:

0:000> !IP2MD 0108008e
MethodDesc: 00fc37c8
Method Name: TestPInvoke.Program.Main(System.String[])
Class: 00fc12a8
MethodTable: 00fc37e4
mdToken: 06000002
Module: 00fc2ed4
IsJitted: yes
CodeAddr: 01080050

and

0:000> !IP2MD 0108013d
MethodDesc: 00fc38f0
Method Name: DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
Class: 00fc385c
MethodTable: 00fc38b0
mdToken: 06000000
Module: 00fc2ed4
IsJitted: yes
CodeAddr: 010800c0

This tells us both methods are originally IL code, and they are JIT compiled. For the Main method we knew this of course, and for the PInvoke stub it can’t be a surprise either given the class and method names. So let’s dump out the IL:

0:000> !DumpIL 00fc37c8
IL_0001: ldstr "Press any key..."
IL_0006: call System.Console::WriteLine
IL_000c: br.s IL_001b
IL_000f: ldc.i4 1000
IL_0014: call TestPInvoke.Program::Sleep
IL_001b: call System.Console::get_KeyAvailable
IL_0020: ldc.i4.0
IL_0021: ceq
IL_0025: brtrue.s IL_010e
IL_0027: ret

No surprises there. Next the stub:

0:000> !DumpIL 00fc38f0
error decoding IL

OK, that’s weird. The metadata tells us we have an IL compiled method, the JITted code is there:

0:000> !u 010800c0
Normal JIT generated code
DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
(actual code left out)

but where is the IL body?

In fact, it turns out since .NET v4.0, all interop stubs are generated at runtime in IL and JIT compiled for the relevant architecture. Note this runtime IL has a clear difference with the IL emitted in runtime assemblies (for instance the ones generated for XML serialization), as the interop stubs aren’t contained in a runtime generated assembly or module. Instead, the module token is spoofed to be identical to the calling frame’s module (you can check this above). Likewise, there is only runtime data for these methods, and looking up its class info gives:

!DumpClass 00fc385c
Class Name: 
mdToken: 02000000
File: C:\dev\voidcall\Profiler\ProfilerNext\TestPInvoke\TestPInvoke\bin\Debug\TestPInvoke.exe
Parent Class: 00000000
Module: 00fc2ed4
Method Table: 00fc38b0
Total Method Slots: 0
Class Attributes: 101
Transparency: Critical

This containing class – DomainBoundILStubClass – is some weird thing as well: it doesn’t inherit anything (not even System.Object), the name isn’t filled in, and there are no method slots, even though we know there is a at least one method in this class, namely the one we just followed to get to it. So probably this class is just a construct for keeping integrity in the CLR internal datastructures.

So there really seems to be no good way to get the IL of those stubs. The CLR team realized this as well and decided to publish the generated IL as ETW events. The ILStub Diagnostics tool can be used to intercept them. If we do this for our test program we see the following (formatted for readability):

// Managed Signature: void(uint32)
// Native Signature: unmanaged stdcall void(int32)
.maxstack 3
.locals (int32 A,int32 B)
// Initialize
    call native int [mscorlib] System.StubHelpers.StubHelpers::GetStubContext()
    call void [mscorlib] System.StubHelpers.StubHelpers::DemandPermission(native int)
// Marshal
    ldc.i4 0x0
    stloc.0    
IL_0010:
    ldarg.0
    stloc.1    
// CallMethod
    ldloc.1
    call native int [mscorlib] System.StubHelpers.StubHelpers::GetStubContext()
    ldc.i4 0x14
    add
    ldind.i
    ldind.i
    calli unmanaged stdcall void(int32) //actual unmanaged method call
// Unmarshal (nothing in this case)
// Return
    ret

The (un)marshalling isn’t very interesting in this case (int32 in and nothing out). To make it more clear for those who don’t use IL daily, I used ILAsm to compile this method body into a dll and used ILSpy to view it in decompiled C#:

static void ILStub_PInvoke(int A)
{
    //initialize
    StubHelpers.DemandPermission(StubHelpers.GetStubContext());
    //CallMethod
    calli(void(int32), A, *(*(StubHelpers.GetStubContext() + 20))); //not actual C#, but more readable anyway
}

The call to the unmanaged method is done with a calli instruction, which is a strongly typed call to an unmanaged callsite. The first parameter (not on the stack but encoded in IL), is the signature of the callsite [void(int32)], followed by (on the stack) the argument (in this case A), ultimately followed by the unmanaged function pointer (which must be stored in offset 20 of the context returned from StubHelpers.GetStubContext()).

So what magic takes place in StubHelpers.GetStubContext() ?

The answer will come naturally if we take for example a simple program that has 2 PInvoke methods with the same input and output arguments:

[DllImport("kernel32.dll")]
static extern void ExitThread(uint dwExitCode);

[DllImport("kernel32.dll")]
static extern void Sleep(uint dwMilliseconds);

If I let the CLR generate an IL stub for both methods, I have exactly the same input and output marshalling, and even the unmanaged function call signature (not address) is the same.

That seems a bit of a waste, so how could one optimize this ?

Indeed, we would save on basically everything we care about (RAM, JIT compilation) by just generating one IL stub for every unique input+output argument signature, and injecting that stub with the unmanaged address it needs to call.

This is exactly how it works: when the CLR encounters a PInvoke method, it pushes a frame on the stack (InlinedCallFrame) with info about – among other things – the unmanaged function address just before calling the actual IL stub.

The stub in turn requests this information through StubHelpers.GetStubContext() (aka ‘gimme my callframe’), and calls into the unmanaged function.

To see this in action, consider the code:

namespace TestPInvoke
{
    class Program
    {
        [DllImport("kernel32.dll")]
        static extern void Sleep(uint dwMilliseconds);

        [DllImport("kernel32.dll", EntryPoint = "Sleep")]
        static extern void SleepAgain(uint dwMilliseconds);

        static void Main(string[] args)
        {
            Console.WriteLine("Press any key...");

            while (!Console.KeyAvailable)
            {
                Sleep(500);
                SleepAgain(500);
            }
        }
    }
}

I’ll run this from WinDbg+SOS, here’s the disassembly of the calls to Sleep and SleepAgain in main:

mov     ecx,1F4h
call    0042c04c (TestPInvoke.Program.Sleep(UInt32), mdToken: 06000001)
mov     ecx,1F4h
call    0042c058 (TestPInvoke.Program.SleepAgain(UInt32), mdToken: 06000002)

You see the calls to Sleep and SleepAgain are pointing to different addresses. If we dump the unmanaged code at these locations we have:

!u 0042c04c (Sleep)
Unmanaged code
mov     eax,42379Ch
jmp     006100d0 (DomainBoundILStubClass.IL_STUB_PInvoke(UInt32))

!u 0042c058 (SleepAgain)
Unmanaged code
mov     eax,4237C8h
jmp     006100d0 (DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)

Indeed, we see in a few lines that some different value is loaded into eax, before jumping to the same address (the IL stub). Since the value in eax is the only thing seperating the two, this must be a pointer to our call frame.

So let’s consider these as memory addresses and check what’s there:

dd 42379Ch
0042379c  63000001 20ea0005 00000000 00192385
004237ac  001925ec 00423808 0042c010 00000000

dd 4237C8h
004237c8  630b0002 20ea0006 00000000 00192385
004237d8  001925ec 00423810 0042c01c 00000000

Now remember the offset in the calli instruction above ? The unmanaged call was to a pointer reference at offset 20 (14h) in our stubcontext. Or in plain words: take the value at offset 20 in the callframe (emphasized), and dereference it. This gives us:

00423808 => 7747cf49 (KERNEL32!SleepStub)
00423810 => 7747cf49 (same)

And there we have it, PInvoke demystified.

In a next post I’ll address the following questions:

  • can we manually create our own marshalling stubs in C# (at compile time) ?
  • can it be faster than the runtime generated one ?
  • what about the reverse case (unmanaged code calling us) ?