Getting started with Lattice

In April, Pivotal released Lattice, a platform for hosting cloud native applications which is aimed at being accessible and convenient for developers. However, it's not just that: it's also a testbed for the new elastic runtime codenamed 'Diego', which we will likely see incorporated into the next version of Lattice's big – enterprise-ready – brother, Pivotal Cloud Foundry, in due time. This new runtime comes with the ability to run Docker workloads, which makes it very interesting.

In this post, I'll describe the minimal steps required to set up a machine on which we create another VM, using Vagrant and VirtualBox, which will run Lattice and host its first containerized workloads. Note: in case you already run VMware Fusion/Workstation and the VMware integration for Vagrant, you don't need the initial hosting VM, so you can skip the first steps and go directly to 'Get Lattice'.

Create a virtual machine

In fact, it doesn't have to be virtual: just get an x64 machine, either physical or via your hypervisor of choice. Since this machine will run its own virtualized workload, it's essential that it supports virtualization instructions, either hardware-based or virtualized. For example, in VMware Workstation this option shows up as 'Virtualize Intel VT-x'.

Install Ubuntu

Install the latest stable Ubuntu (desktop), and make sure it’s updated.

Install vagrant and virtualbox

In order to spin up the Lattice machine we'll use VirtualBox, and to provision it Lattice depends on Vagrant. For Vagrant we need a version (>1.6) which is not in the default Ubuntu repos, so we install it via direct download (the URL below is where the 1.7.2 package was hosted at the time of writing; if it has moved, grab it from the Vagrant downloads page):

sudo apt-get install virtualbox
wget https://dl.bintray.com/mitchellh/vagrant/vagrant_1.7.2_x86_64.deb
sudo dpkg -i vagrant_1.7.2_x86_64.deb

Get Lattice

Install git and clone the lattice repository:

sudo apt-get install git
# repository location at the time of writing
git clone https://github.com/cloudfoundry-incubator/lattice.git
cd lattice
git checkout <VERSION>

Here <VERSION> is the version you find in the file ‘Version’.
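Scripted, the checkout step looks like this (a sketch: it assumes the file is literally named 'Version' and contains a ref that exists in the repo):

```shell
# Check out the revision pinned in the Version file.
cd lattice
git checkout "$(cat Version)"
```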

Next provision the virtual machine with
vagrant up

Of course we could SSH into the Lattice VM now, but the idea is to access it via API calls. The Lattice CLI wraps the API and offers a convenient interface.

Get/build the Lattice CLI

You can build the lattice CLI from source, or download a binary. If you take the download option you can skip the following paragraph.

Building the CLI from source

In order to do this we need Go, and again the version in the Ubuntu repos is too old (<1.4):

wget --no-check-certificate --no-verbose https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz

And to set up the build environment:

export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

If you want this to persist, add the exports to ~/.bashrc
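For example (the duplicate guard is an extra nicety, not from the original post):

```shell
# Persist the Go environment for future shells.
if ! grep -q 'GOROOT=/usr/local/go' ~/.bashrc 2>/dev/null; then
  cat >> ~/.bashrc <<'EOF'
export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
EOF
fi
```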

Now fetch and build the CLI binary (package path as it was at the time of writing):

go get -d github.com/cloudfoundry-incubator/lattice/ltc
go install github.com/cloudfoundry-incubator/lattice/ltc

You can check whether the CLI has been built successfully by typing: ltc

Connecting to Lattice

Point the CLI to the lattice instance:
ltc target <system_ip_from_Vagrantfile>

If you run into errors or timeouts at this stage, try pinging the Lattice VM, or (re)start the Lattice VM directly from VirtualBox, which will usually tell you what's wrong.

Running a test workload

A Docker container hosting a simple test web application is available on Docker Hub. You can spin up your first instance with:
ltc create lattice-app cloudfoundry/lattice-app

Check its status with
ltc status lattice-app

Next, spin up a few more with
ltc scale lattice-app 4

When this is done, point a browser at lattice-app.<system_ip_from_Vagrantfile> and refresh a couple of times to see the built-in load balancing and routing in action.
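The same round-robin behavior can be observed from the command line; in the sketch below, SYSTEM_IP is a placeholder you'd set to the system_ip from the Vagrantfile:

```shell
# Hit the app a few times; the responding instance should vary.
for i in 1 2 3 4; do
  curl -s "http://lattice-app.${SYSTEM_IP}/"
  echo
done
```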

Cloud native applications: a primer

At times, IT departments can get so large and influential that it becomes tempting to believe the fallacy that the IT department has a right to exist in and of itself. But here's the thing:

  • unless you work in a real tech company, the only raison d’être for IT is to support the business, and more specifically to support business applications
  • the business doesn’t care how you set up your infrastructure, as long as they get the apps they want, and as long as they work

App evolution

In the old days, in order to run and support one application you needed to add a new wing to your datacenter, insert tons of equipment, and staff it with grease monkeys to keep it running. Clearly, this was less than ideal, as applications had strong dependencies on hardware and the operating system (OS). This meant developers couldn't just focus on adding value for the business, but instead had to spend lots (if not most) of their time worrying about the hardware, infrastructure and OS.

As technology evolved, we have seen a movement to get rid of those dependencies and let development become as app centric as possible:

  • virtualization has allowed us to run applications without physical hardware dependencies
  • standardization and abstraction made sure we could worry about just one type of virtual hardware in the enterprise (x86, LUN/NFS storage)
  • virtual machine languages (Java/.NET) and run-anywhere languages (JavaScript, Python) have abstracted the remaining hardware (x86) and OS dependencies away

Platforms & deployment

The same kind of movement has taken place for app deployment: where the first applications were identified by the mainframe they ran on, and the later (2nd) generation of apps used a client-server scale-up platform, we now see a movement towards scale-out cloud applications.


Second and third generation platform applications.

These 3rd generation applications or cloud native applications are characterized by being independent of the hardware and OS, built to scale, built for failure, and running as Software-as-a-Service (SaaS) on a highly automated – 3rd generation – platform.


Containers are the common unit of deployment for cloud native applications, and can be seen as operating-system-level virtualization: where virtual machines share the hardware of their hosting physical machine, containers share the OS core (kernel) of their hosting (virtual) machine. Containers therefore provide a very lightweight deployment mechanism as well as OS-level isolation, enabling scale-out of new instances in seconds (compared to minutes for VMs). In practical terms, a container is the application itself, bundled together with any libraries it requires to run and any user-mode customization of the OS.


Containers are isolated but share the OS kernel of the host, and are thus lightweight. However, all containers on a host must use the same OS kernel.

An important thing to keep in mind is that containers do not lift the Operating System dependency:

  • containers can only run on the same machine if they depend on the same OS kernel
  • the app running inside the container should be made to run on that particular OS
  • the OS kernel has to support containers

Operating system level virtualization support has been in OS kernels for a long time, and the same can be said of container implementations. However, initial adoption was slow until publicly available guest OS (Linux) and container implementations (Docker) offered access to developers in a friendly way. Ever since, adoption has skyrocketed, as containers enable very short application release cycles.

3rd generation platforms

While containers are great, they are really just the deployment artifact of a true 3rd generation platform; a complete platform adds essentials on top, such as scheduling, routing and centralized logging.

The various platforms in this space are very much still in development. Platforms differ most by maturity, supported languages and/or runtimes (java/node/.net), the implied OS (Linux/Windows), out-of-the-box stack integration, intended workloads (big data vs. enterprise apps), centralized logging, management UI capabilities, and maybe most important: community. Some illustrations of the evolving landscape:

  • as it stands, in almost all cases the container OS is some flavor of Linux, but Microsoft has announced Docker engine support in the next Windows Server OS, and Cloud Foundry has alpha engine (BOSH) support for Windows in the form of IronFoundry.
  • Docker is (becoming) the dominant container packaging artifact. To illustrate: originally, Cloud Foundry used a different container construction technology based on detection of the bare application type (java/node), followed by a scripted, on-demand construction of a droplet. The next iteration, codenamed Diego, will have support for various container back-ends, including Docker.

Containers or VMs?

After seeing all the goodness containers can provide, it's natural to wonder whether we still need virtual machines. After all, if we have OS level virtualization, why would we need to stack it on top of machine level virtualization? The answer is twofold:

  • security: hardware-level virtualization is much more mature, and hardware-assisted. This provides a battle-hardened level of isolation on top of software-based container isolation. Moreover, specifying security policies such as firewalls at the hardware level can still be convenient.
  • virtualization suites such as VMware vSphere are not just a hypervisor. The extra functionality provided – for example, the ability to dynamically distribute resources and deploy new machines from templates in minutes – can still be a convincing argument in favor of hardware virtualization, even when using OS-level virtualization on top of it.

Moving to the 3rd generation

Creating new apps for the 3rd generation platform is not rocket science. It does however require a different mindset on what a solution architecture should look like. Developers and businesses should both understand the (im)possibilities in adopting this new platform generation.

Developers: the 12 factor app

I already mentioned a limited set of characteristics of cloud native applications above. Adam Wiggins wrote a manifesto about the so-called 12-factor app, in which he describes the ingredients/factors of a proper cloud native app. Depending on the specific application, not all factors described are equally relevant, so the 12 factors should be considered a set of best practices, not a law set in stone. However, developers writing new applications or migrating legacy applications to the cloud would do well to consult this document before making any final decisions.

All in or hybrid?

It's always easier to start with a clean slate. When you have, for example, a greenfield with some ESXi hosts, it's relatively easy to install a 3rd generation platform stack and start hosting 12-factor applications. In practice, however, most enterprises will have numerous legacy apps running just fine on (virtualized) hardware. Depending on the number and type of legacy apps in your organisation, you may want to go all in on the next generation and make a clean switch, or rather build a hybrid environment.

In the case of a hybrid approach you will have to identify the best candidates for early migration. To do this you can look at:

  • application workload: which apps have a substantial workload that in your current environment is scaled up, not out?
  • the work involved to migrate: stateless, modular applications are much easier to migrate than stateful, monolithic ones

Existing web applications are usually the top candidates because, when designed well, they are stateless and modular. An example of a hybrid approach is therefore to migrate specific web applications to the new platform, while keeping existing databases on predetermined fixed hardware and provisioning them as a service on the 3rd platform.

Useful null reference exceptions

As a .NET developer you're guaranteed to have run into the "Object reference not set to an instance of an object" message, or NullReferenceException, at some point. Encountering one invariably means there is a logic problem in your code, but finding out what the supposed "instance of an object" – the target – was can be quite hard. After all, the target was never assigned to the reference, so how do we know what should have been there? In fact we can't, as Eric Lippert explains.

However, just because we don’t know the target doesn’t mean we don’t know anything: in most cases there is some information we can obtain like:

  • type info: the type of the reference, and so the type of the target or at least what it inherits from (baseclass) or what it implements (interface)
  • operation info: the type of operation that caused the reference to be dereferenced

For example, if the operation is a method call (callvirt), we know the C# compiler only compiles if it knows the method is supported by the supposed target (through inheritance or directly), so with the info about the intended method we have a good hint about what object was null.

According to MSFT, the instructions that can throw NullReferenceException are: "The following Microsoft intermediate language (MSIL) instructions throw NullReferenceException: callvirt, cpblk, cpobj, initblk, ldelem.<type>, ldelema, ldfld, ldflda, ldind.<type>, ldlen, stelem.<type>, stfld, stind.<type>, throw, and unbox". The amount of useful information will depend on the IL instruction context. However, currently none of this information is in the NullReferenceException: it just states that some object reference was null, that it was dereferenced, and in what method; nothing more, deal with it.

If you're running debug code under Visual Studio this isn't much of a problem, as the debugger will break as soon as the dereferencing occurs ('first chance') and show you the source. But what about production code without source? Sure, you can catch the exception and log it, but between the time it is thrown and the moment a catch handler is found, the runtime (CLR) is in control. By the time we are back in user code inside our catch handler, all we have to go on is the information in the NullReferenceException itself, which is essentially nothing. Debugging pros can attach WinDbg+SOS to a crash dump, but when the crash is the result of a 'second chance' exception (no exception handler was found) we're at an even later stage. What we really want is a tool which can attach to the production process like a debugger, and get more info about first chance exceptions as they are thrown, as that's the moment when all the info is available.

To give you an idea of the available info, the code below periodically throws a series of NullReferenceExceptions with very different origins:

using System;
using System.Threading;

namespace TestNullReference
{
    interface TestInterface
    {
        void TestCall();
    }

    class TestClass
    {
        public int TestField;

        public void TestCall() { }
    }

    class TestClass2 : TestClass { }

    class Program
    {
        #region methods to invoke null reference exceptions for various IL opcodes

        /// <summary>
        /// IL throw
        /// </summary>
        static void Throw()
        {
            throw null;
        }

        /// <summary>
        /// IL callvirt on interface
        /// </summary>
        static void CallVirtIf()
        {
            TestInterface i = null;
            i.TestCall();
        }

        /// <summary>
        /// IL callvirt on class
        /// </summary>
        static void CallVirtClass()
        {
            TestClass i = null;
            i.TestCall();
        }

        /// <summary>
        /// IL callvirt on inherited class
        /// </summary>
        static void CallVirtBaseClass()
        {
            TestClass2 i = null;
            i.TestCall();
        }

        /// <summary>
        /// IL ldelem
        /// </summary>
        static void LdElem()
        {
            int[] array = null;
            var firstElement = array[0];
        }

        /// <summary>
        /// IL ldelema
        /// </summary>
        static unsafe void LdElemA()
        {
            int[] array = null;
            fixed (int* firstElementA = &(array[0])) { }
        }

        /// <summary>
        /// IL stelem
        /// </summary>
        static void StElem()
        {
            int[] array = null;
            array[0] = 3;
        }

        /// <summary>
        /// IL ldlen
        /// </summary>
        static void LdLen()
        {
            int[] array = null;
            var len = array.Length;
        }

        /// <summary>
        /// IL ldfld
        /// </summary>
        static void LdFld()
        {
            TestClass c = null;
            var fld = c.TestField;
        }

        /// <summary>
        /// IL ldflda
        /// </summary>
        static unsafe void LdFldA()
        {
            TestClass c = null;
            fixed (int* fld = &(c.TestField)) { }
        }

        /// <summary>
        /// IL stfld
        /// </summary>
        static void StFld()
        {
            TestClass c = null;
            c.TestField = 3;
        }

        /// <summary>
        /// IL unbox_any
        /// </summary>
        static void Unbox()
        {
            object o = null;
            var val = (int)o;
        }

        /// <summary>
        /// IL ldind
        /// </summary>
        static unsafe void LdInd()
        {
            int* valA = null;
            var val = *valA;
        }

        /// <summary>
        /// IL stind
        /// </summary>
        static unsafe void StInd()
        {
            int* valA = null;
            *valA = 3;
        }

        #endregion

        static void LogNullReference(Action a)
        {
            try
            {
                a();
            }
            catch (NullReferenceException ex)
            {
                var msg = string.Format("NullReferenceException executing {0} : {1}", a.Method.Name, ex.Message);
                Console.WriteLine(msg);
            }
        }

        static void Main(string[] args)
        {
            while (!Console.KeyAvailable)
            {
                LogNullReference(Throw);
                LogNullReference(CallVirtIf);
                LogNullReference(CallVirtClass);
                LogNullReference(CallVirtBaseClass);
                LogNullReference(LdElem);
                LogNullReference(LdElemA);
                LogNullReference(StElem);
                LogNullReference(LdLen);
                LogNullReference(LdFld);
                LogNullReference(LdFldA);
                LogNullReference(StFld);
                LogNullReference(Unbox);
                LogNullReference(LdInd);
                LogNullReference(StInd);
                Thread.Sleep(1000);
            }
        }
    }
}

All 14 of them will give us the dreaded “Object reference not set to an instance of an object” message.

Now what happens if we attach a tracing tool that gets as much info as possible:

Attempted to throw an uninitialized exception object. In static void TestNullReference.Program::Throw() cil managed  IL 1/1 (reported/actual).
Attempted to call void TestNullReference.TestInterface::TestCall() cil managed  on an uninitialized type. In static void TestNullReference.Program::CallVirtIf() cil managed  IL 3/3 (reported/actual).
Attempted to call void TestNullReference.TestClass::TestCall() cil managed  on an uninitialized type. In static void TestNullReference.Program::CallVirtClass() cil managed  IL 3/3 (reported/actual).
Attempted to call void TestNullReference.TestClass::TestCall() cil managed  on an uninitialized type. In static void TestNullReference.Program::CallVirtBaseClass() cil managed  IL 3/3 (reported/actual).
Attempted to load elements of type System.Int32 from an uninitialized array. In static void TestNullReference.Program::LdElem() cil managed  IL 3/4 (reported/actual).
Attempted to load elements of type System.Int32 from an uninitialized array. In static void TestNullReference.Program::LdElemA() cil managed  IL 3/4 (reported/actual).
Attempted to store elements of type System.Int32 in an uninitialized array. In static void TestNullReference.Program::StElem() cil managed  IL 3/5 (reported/actual).
Attempted to get the length of an uninitialized array. In static void TestNullReference.Program::LdLen() cil managed  IL 3/3 (reported/actual).
Attempted to load non-static field int TestNullReference.TestClass::TestField from an uninitialized type. In static void TestNullReference.Program::LdFld() cil managed  IL 3/3 (reported/actual).
Attempted to load non-static field int TestNullReference.TestClass::TestField from an uninitialized type. In static void TestNullReference.Program::LdFldA() cil managed  IL 3/3 (reported/actual).
Attempted to store non-static field int TestNullReference.TestClass::TestField in an uninitialized type. In static void TestNullReference.Program::StFld() cil managed  IL 3/4 (reported/actual).
Attempted to cast/unbox a value/reference type of type System.Int32 using an uninitialized address. In static void TestNullReference.Program::Unbox() cil managed  IL 3/3 (reported/actual).
Attempted to load elements of type System.Int32 indirectly from an illegal address. In static void TestNullReference.Program::LdInd() cil managed  IL 4/4 (reported/actual).
Attempted to store elements of type System.Int32 indirectly to a misaligned or illegal address. In static void TestNullReference.Program::StInd() cil managed  IL 4/5 (reported/actual).

You can download and play with the tool already. Below I’ll shed some light on how this info can be obtained.

What the tracer does

One blog post is not enough to fully explain how to write a managed debugger. However, enough has been written about how to leverage the managed debugging API, so for this post I'm going to assume we've attached a managed debugger to the target process, implemented debugger callback handlers, hooked them up, and are handling exception callbacks.

The exception callback has the following signature:

HRESULT Exception (
    [in] ICorDebugAppDomain   *pAppDomain,
    [in] ICorDebugThread      *pThread,
    [in] ICorDebugFrame       *pFrame,
    [in] ULONG32              nOffset,
    [in] CorDebugExceptionCallbackType dwEventType,
    [in] DWORD                dwFlags
);
The actual exception can be obtained from the thread as an ICorDebugReferenceValue, which can be dereferenced to an ICorDebugObjectValue, from which you can ultimately get the ICorDebugClass and metadata token (mdTypeDef). To find out if this exception is a NullReferenceException, you can either look up this token using the metadata APIs, or compare it to a prefetched metadata token.

When we know we're dealing with a 1st chance null reference exception, we can dig deeper and try to find the offending IL instruction. From nOffset, we already have the IL offset into the method frame's code. The code itself can be obtained by querying the ICorDebugFrame for an ICorDebugILFrame interface and requesting its code (ICorDebugCode2), which has a method for retrieving the actual IL bytes.

Depending on the IL instruction we find at nOffset in the IL bytes, we can get various details and log them.

For the instructions that can throw:

  • callvirt: a call to a known instance method (mdMethodDef) on an uninitialized type
  • cpblk, cpobj, initblk: shouldn’t happen (not exposed by C#)
  • ldelem.<type>, ldelema, stelem.<type>: an attempt to load/store elements of a known type (mdTypeDef) from/to an uninitialized array
  • ldfld, ldflda, stfld: an attempt to load/store a known non-static field (mdFieldDef) of a known uninitialized type
  • ldind.<type>, stind.<type>: an invalid address was passed to the load instruction, or a misaligned address was passed to the store instruction (shouldn't happen, as this would be a compiler bug rather than a user code bug)
  • ldlen: an attempt to get the length of an uninitialized array
  • throw: an attempt to throw an uninitialized exception object
  • unbox, unbox_any: an attempt to cast/unbox a value/reference type of a known type (mdTypeDef) using an uninitialized address

The various metadata tokens can be looked up using the metadata APIs mentioned before, and finally formatted into a nice message.

Creating an automatic self-updating process

Recently I was asked by a client to replace a single monolithic custom workflow engine with a more scalable and loosely coupled modern alternative. We decided on a centralized queue which contained the work items and persisted them, with a manager (scheduler) on top which would accept connections from a dynamically scalable number of processors that would request and then do the actual work. It's an interesting setup in itself, which relies heavily on dependency injection, command-query separation, Entity Framework Code First with Migrations for the database, and code-first WCF for strongly typed communication between the scheduler and its processors.

Since there would be many processors without an administration of where they were installed, one of the wishes was to make them self-update at runtime when new versions of the code become available.

Detecting a new version

A key component of the design is for the processors to register themselves with the scheduler when they start. In the same spirit, they can call an update manager service periodically to check for updates. I implemented this by placing a version inside the processor's primary assembly (in the form of an embedded resource). The update manager returns the current latest available version and download location. If this version is more recent than the built-in version, the decision to update can be made.

This completes the easy part.


The problem with updating a process in-place at runtime is that the operating system locks executable images (exe/dll) while they are mapped into a running process. So when you try to overwrite them, you get 'file is in use by another process' errors. The natural approach would therefore be to unload every non-OS library except the executable itself, followed by the overwrite action and subsequent reload.

In fact this works for native code/processes; however, managed assemblies, once loaded, cannot be unloaded. It would therefore appear we are out of luck and can't use this method. However, we have a (brute force) escape: while we can't unload managed assemblies, we can unload the whole AppDomain they have been loaded into.

Updating: managed approach

The idea therefore becomes to spin up the process with almost nothing in the default AppDomain (which can never be unloaded), and from there spawn a new AppDomain with the actual Processor code. If an update is detected, we can unload that domain, update the assemblies, and respawn it.

And still it didn't work… the problem I now ran into is that somehow the default domain persisted in loading one of the user-defined assemblies. I loaded my new AppDomain with the following lines:

public class Processor : MarshalByRefObject
{
    static AppDomain _processorDomain;

    public void Start()
    {
        // startup code here...
    }

    public static Processor HostInNewDomain()
    {
        // Setup config of new domain to look at parent domain app/web config.
        var procSetup = AppDomain.CurrentDomain.SetupInformation;
        procSetup.ConfigurationFile = procSetup.ConfigurationFile;

        // Start the processor in a new AppDomain.
        _processorDomain = AppDomain.CreateDomain("Processor", AppDomain.CurrentDomain.Evidence, procSetup);

        return (Processor)_processorDomain.CreateInstanceAndUnwrap(Assembly.GetExecutingAssembly().FullName, typeof(Processor).FullName);
    }
}

and in a separate assembly:

public class ProcessorHost
{
    Processor _proc;

    public void StartProcessor()
    {
        _proc = Processor.HostInNewDomain();
        _proc.Start();
    }
}

There are several problems in this code:

  • the Processor type is used inside the default AppDomain in order to identify the assembly and type to spawn in there – this causes the assembly which contains the type to get loaded in the default domain as well.
  • after spawning the new AppDomain, we call into the Processor.Start() to get it going. For the remoting to work, the runtime generates a proxy inside the default domain to get to the Processor (MarshalByRefObject) in the Processor domain. It does so by loading the type from the assembly containing the Processor type and reflecting on that. I tried different approaches (reflection, casting to dynamic), but it seems the underlying mechanism to generate the proxy is always the same.

So what is the solution? For one, we can make it autostart by kicking off all the action in the constructor of the Processor. That way we don't need to call anything to start the Processor, so the runtime doesn't generate a proxy. Moreover, we can take a stringly typed dependency on the assembly and type. This changes the code above to:

public class Processor : MarshalByRefObject
{
    public Processor()
    {
        Start();
    }

    public void Start()
    {
        // startup code here...
    }
}

with, in a separate assembly:

public class ProcessorHost
{
    private const string ProcessorAssembly = "Processor, Version=, Culture=neutral, PublicKeyToken=null";
    private const string ProcessorType = "Processor.Processor";

    AppDomain _processorDomain;
    ObjectHandle _hProcessor;

    public void Start()
    {
        // Setup config of new domain to look at parent domain app/web config.
        var procSetup = AppDomain.CurrentDomain.SetupInformation;
        procSetup.ConfigurationFile = procSetup.ConfigurationFile;

        // Start the processor in a new AppDomain.
        _processorDomain = AppDomain.CreateDomain("Processor", AppDomain.CurrentDomain.Evidence, procSetup);

        // Just keep an ObjectHandle, no need to unwrap this.
        _hProcessor = _processorDomain.CreateInstance(ProcessorAssembly, ProcessorType);
    }
}

Communicating with the new AppDomain

Above, I circumvented the proxy generation (and thereby type assembly loading in the default AppDomain) by kicking off the startup code automatically in the Processor constructor. However, this restriction introduces a new problem: since we cannot ever call into or out of the new domain through user-defined types, as that would cause user-defined assemblies to be locked in place, how then do we tell the parent/default domain an update is ready?

For the moment I do this by writing AppDomain data in the Processor domain – AppDomain.SetData(someKey, someData) – and reading it periodically from the parent domain – AppDomain.GetData(someKey). It's not ideal, as it requires polling, but it at least works: I only use standard framework methods and types, and so the update works.
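A minimal sketch of that exchange (the key name and payload are made up for illustration; the real code uses whatever key both sides agree on):

```csharp
// In the Processor (child) domain – framework types only, so no
// user-defined assembly gets pulled into the default domain:
AppDomain.CurrentDomain.SetData("UpdateAvailable", true);

// In the default (parent) domain, polled periodically:
var updateReady = _processorDomain.GetData("UpdateAvailable") as bool?;
if (updateReady == true)
{
    AppDomain.Unload(_processorDomain);   // unload, overwrite the assemblies, respawn
}
```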


Down the rabbit hole: JIT Cache

Some weeks ago I was working on a heavyweight .NET web application and got annoyed by the fact that I had to wait forever to see changes in my code at work. I don't have a slow machine, and while the initial compilation of the .NET/C# was pretty fast, the loading – dominated by Just-In-Time (JIT) compilation – took forever. It routinely took 30-60 seconds to come up, and most of that time was spent loading images and JITting them.

That the JIT can take time is a well known fact, and there are some ways to alleviate the burden. For one there is the Native Image Generator (NGen), a tool shipped with the .NET framework since v1.1. It allows you to pre-JIT entire assemblies and install them in a system cache for later use. Other more recent developments are the .NET Native toolchain (MSFT) and SharpLang / C# Native (community), which leverage the C++ compiler instead of the JIT compiler to compile .NET (store) apps directly to native images.

However great these techniques are, they are designed with the idea 'pay once, profit forever' in mind. That's a good idea for production releases, but it won't solve my problem: if I change a single statement in my code and rebuild using these tools, my wait will increase rather than decrease, due to the more extensive Common Intermediate Language (CIL) to native compilation.

The idea

An alternative for the scenario described above would be to keep a system global cache of JITted code (methods), only invoking the actual JIT for code that isn't in the cache yet. Such a cache requires three components:

  • Interceptor: a mechanism to hook calls from the Common Language Runtime (CLR) to the JIT compiler. We need this to be able to introduce our own ‘business’ logic (caching mechanism) in this channel.
  • Injector: a mechanism to load the Interceptor into a remote .NET process. We need this to hook every starting .NET program: most JIT compilation is done during startup, so loading the interception code should take place at that time for maximum profit.
  • Cache: the actual business logic. A smart cache keeping track of already JITted code and validity.

Note: in this article I'm going to discuss the first two, which are the technical challenge. The actual cache will be the topic of a future article, and in all honesty I'm not sure it's going to work. The awesomeness involved in the first two items was worth the effort already.



Interceptor

In the desktop version of .NET, the CLR and JIT are two separate libraries – clr.dll and clrjit.dll/protojit.dll respectively – which get loaded into every .NET process. I started from the very simple assumption that the CLR calls into some function export of the clrjit library. When I checked out the public exports of clrjit, though, there are only 2:


I called getJit and got back something which was likely a pointer to some C++ class, but I was unable to figure out what to do with it or to discern its methods and required arguments, so my best move was googling for ‘jit’, ‘getJit’, ‘hook’, etc.

I found a brilliant article by Daniel Pistelli, who identified the returned value from clrjit!getJit as an implementation of Rotor’s ICorJitCompiler. The Rotor project (officially: SSCLI) is the closest thing we non-MSFT people have to the sources of the native parts of the .NET ecosystem (runtime, JIT, GC). However, MSFT made it very clear it was only a demonstration project: it wasn’t the actual .NET source. Moreover, the latest release is from 2006. In his article, Daniel found that he could use the Rotor source headers to work with the production .NET version of the JIT: the vtable in the .NET desktop implementation is more extensive, but the first entries are identical.

For the full details I’ll refer you to his article, but once operational, this is enough for us to intercept and wrap requests for compilation to the JIT compiler with our own function with the signature:

int __stdcall compileMethod(ULONG_PTR classthis, ICorJitInfo *comp, CORINFO_METHOD_INFO *info, unsigned flags, BYTE **nativeEntry, ULONG  *nativeSizeOfCode)


Most JIT compilation takes place during process start, so to not miss anything we have to find a way to hook the JIT at a very early stage. There are two methods I explored. For the first, I self-hosted the CLR: using the unmanaged hosting APIs it’s possible to write a native application in which you have much more control over process execution. For instance, you can first load the runtime, which will automatically load the JIT compiler as well, insert your custom hook next, and only then start executing managed code. This will ensure you don’t miss a bit.

An example trace from a self-hosted CLR trivial console app with JIT hooking:


Note that the JIT is hooked before managed execution starts.

However, this method has a downside, namely that it will only work for processes we start explicitly using our native loader. Any .NET executable started in the regular way will escape our hooking code. What we really want is to load our hook at process start in every .NET process. For this we need a couple of things:

  1. process start notifications – to detect a new process start
  2. remote code injection – to load our hooking code into the newly started process

It turns out both are possible, but to do so we have to dive into the domains of kernel mode and remote code injection. For fun, try entering those keywords in a search engine and see how many references to ‘black hat’, ‘malicious’, ‘rootkit’, ‘security exploits’, etc. you find. Clearly, the methods I want to use hold some attraction for a whole different kind of audience as well.

Anyway, I still want it, so down the rabbit hole we go.

1. Process start notifications

We can register a method for receiving process start/exit notifications by calling PsSetCreateProcessNotifyRoutine. This function is part of the kernel-mode APIs, and to access them we have to write a kernel driver. When you download and install the Windows Driver Kit, which integrates with Visual Studio, you get standard templates for writing drivers. I strongly advise you to use them: writing a driver from scratch is not especially hard, but it is very troublesome, as any bug or bugcheck hit will crash your process (a.k.a. the kernel), so it’s better to start from a tested scaffold. I tested the driver in a fresh VM, which was fortunate, as I fried it a couple of times, making it completely unbootable.

Anyway, back to the code. To register the notification routine we have to call PsSetCreateProcessNotifyRoutine during Driver Entry:

HANDLE hNotifyEvent;
PKEVENT NotifyEvent = NULL;
unsigned long lastPid;

VOID NotifyRoutine(_In_ HANDLE parentId, _In_ HANDLE processId, _In_ BOOLEAN Create)
{
    UNREFERENCED_PARAMETER(parentId);
    if (Create)
    {
        DbgPrint("Execution detected. PID: %d", processId);

        if (NotifyEvent != NULL)
        {
            lastPid = (unsigned long)processId;
            KeSetEvent(NotifyEvent, 0, FALSE);
        }
    }
    else
    {
        DbgPrint("Termination detected. PID: %d", processId);
    }
}

VOID OnUnload(_In_ PDRIVER_OBJECT DriverObject)
{
    UNREFERENCED_PARAMETER(DriverObject);
    // remove notify callback
    PsSetCreateProcessNotifyRoutine(NotifyRoutine, TRUE);
}

NTSTATUS DriverEntry(_In_ PDRIVER_OBJECT DriverObject, _In_ PUNICODE_STRING RegistryPath)
{
    // Create an event object to signal when a process starts.
    DECLARE_CONST_UNICODE_STRING(NotifyEventName, L"\\NotifyEvent");
    NotifyEvent = IoCreateSynchronizationEvent((PUNICODE_STRING)&NotifyEventName, &hNotifyEvent);
    if (NotifyEvent == NULL)
        return STATUS_UNSUCCESSFUL;

    // boilerplate code omitted

    WdfDriverCreate(DriverObject, RegistryPath, &attributes, &config, WDF_NO_HANDLE);

    PsSetCreateProcessNotifyRoutine(NotifyRoutine, FALSE);

    DriverObject->DriverUnload = OnUnload; // omitting this will hang the system

    return STATUS_SUCCESS;
}

You can see we also create a kernel event object, which will be signaled every time a process start notification is received; we will need this later.

Once process start notifications were working, I explored ways to also do part 2 (remote code injection) from kernel mode, but decided against it for two reasons. First, the kernel-mode APIs, while offering some very powerful and low-level access to the machine and OS, are very limited (you cannot access regular Win32 APIs), so it’s much easier and faster to develop in user mode. And second, I got bored of restoring yet another fried VM.

1b. Getting notifications to user mode

So I needed an always-running component in user mode which communicates with the kernel-mode driver: the ideal use case for a Windows service. By default, kernel driver objects aren’t accessible from user mode. To access one, you have to expose it in the (kernel) object directory as a ‘DOS device’. Adding a symbolic link like \DosDevices\InterceptorDriver to the actual driver object – using WdfDeviceCreateSymbolicLink – is sufficient to access it by name from user mode (full path: \\.\InterceptorDriver).

Just open it like a file:

HANDLE hDevice = CreateFileW(
    L"\\\\.\\InterceptorDriver", // driver to open
    0,                           // no access to driver needed
    0,                           // no sharing
    NULL,                        // default security attributes
    OPEN_EXISTING,               // disposition
    0,                           // file attributes
    NULL);                       // no template file

For the actual communication, the preferred way is IOCTL: from user mode you can send an I/O control code to the driver:

DWORD junk = 0;
BOOL bResult = DeviceIoControl(hDevice, // device to be queried
               IOCTL_PROCESS_NOTIFYNEW, // operation to perform
               NULL, 0,                 // no input buffer
               &pId, sizeof(pId),       // output buffer
               &junk,                   // # bytes returned
               (LPOVERLAPPED)NULL);     // synchronous I/O

The driver itself has to handle the code:

VOID InterceptorDriverEvtIoDeviceControl(_In_ WDFQUEUE Queue, _In_ WDFREQUEST Request, _In_ size_t OutputBufferLength, _In_ size_t InputBufferLength, _In_ ULONG IoControlCode)
{
    NTSTATUS status = STATUS_UNSUCCESSFUL;
    size_t bytesReturned = 0;
    switch (IoControlCode)
    {
    case IOCTL_PROCESS_NOTIFYNEW:
        if (NotifyEvent == NULL)
            break;

        // Set a finite timeout to allow service shutdown (else the thread is stuck in kernel mode).
        LARGE_INTEGER timeOut;
        timeOut.QuadPart = -10000 * 1000; // relative, in 100 ns units: 1 second
        status = KeWaitForSingleObject(NotifyEvent, Executive, KernelMode, FALSE, &timeOut);

        if (status == STATUS_SUCCESS)
        {
            unsigned long *buffer;
            if (NT_SUCCESS(WdfRequestRetrieveOutputBuffer(Request, sizeof(lastPid), (PVOID *)&buffer, NULL)))
            {
                *buffer = lastPid;
                bytesReturned = sizeof(lastPid);
            }
        }
        break;
    }
    WdfRequestCompleteWithInformation(Request, status, bytesReturned);
}

and register this method in the driver’s I/O queue setup:

NTSTATUS InterceptorDriverQueueInitialize(_In_ WDFDEVICE Device)
{
    WDFQUEUE queue;
    NTSTATUS status;
    WDF_IO_QUEUE_CONFIG queueConfig;

    WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig, WdfIoQueueDispatchParallel);

    queueConfig.EvtIoDeviceControl = InterceptorDriverEvtIoDeviceControl;
    queueConfig.EvtIoStop = InterceptorDriverEvtIoStop;

    status = WdfIoQueueCreate(Device, &queueConfig, WDF_NO_OBJECT_ATTRIBUTES, &queue);
    if (!NT_SUCCESS(status))
    {
        TraceEvents(TRACE_LEVEL_ERROR, TRACE_QUEUE, "WdfIoQueueCreate failed %!STATUS!", status);
        return status;
    }
    return status;
}

The mechanism we have here is a sort of ‘long polling’ of the kernel driver: the service sends an IOCTL code to the driver, and the driver parks the thread on an event which is signaled every time a process is started. Only then does the thread return to user mode, with the ID of the started process in its output buffer. To allow for Windows service shutdown, it’s advisable to wait for the event with a timeout (and poll again if the wait returned due to this timeout); otherwise the thread will be stuck in kernel mode until you start one more process – making service shutdown impossible.

2. Remote code injection

We are back in user mode now, and we can run code once a process starts. The next step is to somehow load our JIT hooking code into every new (.NET) process and make it start executing. There are a couple of ways to do this, most involving hacks around CreateRemoteThread. This Win32 function allows a process to start a thread in the address space of another process. The challenge is how to get that process to load our hooking code. There are two approaches, which both require writing into the remote process’s memory before calling CreateRemoteThread:

  • write the hooking code directly in the remote process, and call CreateRemoteThread with an entry point in this memory
  • compile our hooking code to a dll, and only write the dll name to the remote process memory. Then call CreateRemoteThread with the address of kernel32!LoadLibrary with its argument pointing to the name

As I want to be able to hook the JIT in 32- as well as 64-bit processes, I have to compile two versions of the hooking code anyway. For the sake of code modularity and separation of concerns I opted for the second way, so the simple recipe I took is:

  • A. Write a dll which on load executes the hooking code, and compile it in two flavors (32/64 bit).
  • B. In the Windows service, on process start notification, use CreateRemoteThread + LoadLibrary to load the correct flavor of the dll in the target.

A. Auto-executing library

This is quite easy, but you have to beware of the dragons. A dll has a DllMain entry point with the signature:

BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved); 

This entry point is called (as specified in ul_reason_for_call) when the dll is first loaded or unloaded, or on thread creation/destruction. The thing to beware of is written in the Remarks section: “Access to the entry point is serialized by the system on a process-wide basis. Threads in DllMain hold the loader lock so no additional DLLs can be dynamically loaded or initialized.”. In other words: you cannot load a library from code that runs in the context of DllMain.

Why is this a problem for us? The hooking code has to query the .NET shim library (mscoree.dll) to find out if, and which, .NET runtimes are loaded in the process. Since there is no a priori way to know for sure the shim library is already loaded when we try to get a module handle, our hooking code may trigger a module load and thus a deadlock.

The fix is easy: just start a new thread in the DllMain entry point and make that thread query the shim library. This thread will start executing outside the current loader lock.

B. CreateRemoteThread + LoadLibrary

I will skip over most details here as they are described at length in various articles; however, there are some things to beware of when cross-injecting from the 64-bit service into a 32-bit process. The steps in the ‘regular’ procedure are:

  1. Get access to the process
  2. Write memory with name of hooking Dll
  3. Start remote thread with entrypoint kernel32!LoadLibrary
  4. Free memory

Most of these are straightforward, but there is a problem in cross injecting in step 3, and more specifically in finding the exact address to call.

When injecting with matching bitness, this is easy thanks to a trick: kernel32 is loaded at the same virtual address in every process. This address can change, but only at reboot. Using this trick, we can:

  1. Get the module handle (virtual address) of the kernel32 module in the injecting process – it will be identical in the remote process
  2. Call kernel32!GetProcAddress to find the address of LoadLibrary

When injecting across bitness, we have two problems: the kernel32 load address is different for 64-bit and 32-bit, and we cannot use kernel32!GetProcAddress on our 64-bit kernel32 module to find the address in the 32-bit one. To fix this, I replaced the steps above for this scenario with:

  1. Use PSAPI and call EnumProcessModulesEx on the target process with the explicit LIST_MODULES_32BIT option (there are also 64-bit modules in a 32-bit process, go figure), get the module names (GetModuleBaseName) to find kernel32, and when found, get the module address from GetModuleInformation
  2. Use ImageHlp’s MapAndLoad and extract the header information from the PE header of the 32-bit kernel32. Find the export directory and, together with the name directory, resolve the RVA of LoadLibrary ourselves (note: the RVAs in the PE are the in-memory RVAs; the on-disk layout of a PE is different, and you can use the section headers to correlate the two). Add this to the base address from step 1 to find the VA of kernel32!LoadLibrary

Working setup

A DbgView of the loading and injection in both flavors of .NET processes (32 and 64 bit):



Note: I strive to put the full code out there eventually. But it may take some time.


Contract first WCF service (and client)

There seems to be a popular misconception that WCF is obsolete/legacy, supposedly because it has been replaced by newer techniques like ASP.NET Web API. For sure: since Web API was created to simplify the development of very specific – but extensively used – HTTP(S) REST services on port 80, it’s clear that if you are developing a service exactly like that, you would be foolish to do so in WCF (since it’s just extra work).

However, to state a technology is obsolete because there is a better alternative for one very specific use case, when there are over 9000 use cases which no other tech addresses, is silly. Your personal bubble might be 100% web, but that doesn’t mean there is nothing else: WCF – unlike Web API – is a communications abstraction. Its strong point is versatility: if I want my service to use HTTP today and named pipes or message queueing tomorrow, I can do so with minimum effort.

It’s in this versatility where we hit the second misconception: ‘WCF is too hard’. I have to admit I spent quite some days cursing at my screen over WCF. Misconfigured services, contracts, versioning, endpoints, bindings, wsdls, etc. there is a lot you can do wrong, and every link in the chain has to be working before you can stop cursing.

However, that is all WCF done the classic way; none of that is necessary if you do it the right way, which is contract first, with little to no configuration.

WCF the right way™: contract first

The basic ingredients you need are:

  • a service contract in a shared library
  • the service: an implementation of the contract
  • some testing code (client)

And that’s it. I’ll give a minimal implementation of this concept below to show the power and simplicity of this approach. Did I mention no configuration?

The service contract

using System.ServiceModel;

namespace Contracts
{
    [ServiceContract]
    public interface IContractFirstServiceContract
    {
        [OperationContract]
        bool Ping();
    }
}

It’s important you place this contract/interface in a separate – shared – assembly so both the service implementation as well as the client(s) can access it.

The service implementation

using System;
using Contracts;

namespace Service
{
    public class ContractFirstService : IContractFirstServiceContract
    {
        public bool Ping()
        {
            return true;
        }
    }
}

That’s all, and the service can now be hosted. This can be done in IIS with an .svc file, or by self-hosting it (e.g. in a console app). In either case, we need a ServiceHost implementation, which I’ll place in the service project:

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using Contracts;

namespace Service
{
    public class ContractFirstServiceHost : ServiceHost
    {
        public ContractFirstServiceHost(Binding binding, Uri baseAddress)
            : base(typeof(ContractFirstService), baseAddress)
        {
            AddServiceEndpoint(typeof(IContractFirstServiceContract), binding, baseAddress);
        }
    }
}

The client (testing the service)

In the following piece of code I’ll self-host the service, open a client and test the WCF call roundtrip.

using System;
using System.ServiceModel;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Contracts;
using Service;

namespace Tests
{
    [TestClass]
    public class IntegrationTest
    {
        [TestMethod]
        public void TestClientServer()
        {
            var binding = new BasicHttpBinding();
            var endpoint = new EndpointAddress("http://localhost/contractfirstservice");

            // Self-host the service.
            var service = new ContractFirstServiceHost(binding, endpoint.Uri);
            service.Open();

            // Create the client.
            var channelFactory = new ChannelFactory<IContractFirstServiceContract>(binding, endpoint);
            var client = channelFactory.CreateChannel();

            // Call the roundtrip test function.
            var roundtripResult = client.Ping();
            Assert.IsTrue(roundtripResult);
        }
    }
}


There you have it: a contract first WCF service in a few lines.


Why so many IT projects fail

In the first year of my physics education at TU Delft, I had a mandatory course labelled ‘ethics of engineering and technology’. As part of the accompanying seminar we did a case study based on the Space Shuttle Challenger disaster with a small group of students.
If you don’t know what went wrong: in a nutshell, some critical parts – O-rings – were more likely to fail at low temperatures (they had before), and the conditions of this particular launch were especially harsh on them. The engineers, who knew about the issue, wanted to postpone or abort the launch, but were pressured by senior NASA executives into backing down.
In the seminar, some students played the engineers (NASA and Morton Thiokol), some the executives, and of course the end result was a stalemate followed by an executive decision … and a disastrous launch. Nobel-prize-winning physicist Richard Feynman later demonstrated what went wrong technically (by submerging an O-ring in ice-cold water before the committee), as well as non-technically, with a classic Feynman quote: “reality must take precedence over public relations, for nature cannot be fooled”.

So what does this have to do with software and IT?

Everything. Every year human lives and IT become more entangled. The direct consequence is that technical problems in IT can have profound effects on human lives and reputations. With it comes increasing responsibility for the technicians and executives, be it developers, engineers, infrastructure technicians, database administrators, security consultants, project managers or higher level management (CxO).

The examples of this going wrong are numerous. Most often the result is huge budget deficits, but increasingly it leads to more disturbing data leaks (or worse: tampering) and the accompanying privacy concerns.
The most striking cases have been the malfunctioning US healthcare insurance marketplace, which left millions of people involuntarily uninsured, or, closer to home, the Dutch tax agency, which spent 200 million euros on a failed project, or the Dutch government, which spends 4–5 billion every year on failed IT projects. I didn’t even mention the National Electronic Health Record (Dutch: ‘electronisch patienten dossier’) yet, which is still in an unclear state and is IT’s own Chernobyl waiting to happen.

And these examples are just the published – public – cases: most enterprises with failing IT projects won’t tell you about them, because why would they? It’s embarrassing at best (yet another data leak). From personal experience, I’ve seen solutions so badly designed that people could easily and/or accidentally access the personal details of colleagues (HR systems), or find out a case was being made for their dismissal (before they were told).

So how is this possible?

Before I answer that question, let me start with two disclaimers. First, in this post I want to focus on technical failure: even though a project can be a project-management success (delivered on time and under budget), it can still be a technical time bomb waiting to go off (like the Challenger). Second, a lot has been written about failing IT projects already, and this is by no means the definitive story. It is, however, a developer’s point of view.

With that out of the way, let’s answer the question: in my perspective this is the result of a lack of professional attitude and transparency on all fronts: by the creative people (developers/engineers), executives, and their higher level management.

A good developer or engineer has a constructive mindset: if at all possible, he will not unthinkingly accept what he is told, but instead guide his customer (his manager, functional designer or some third party) through their functional requirements, tell them the impact and consequences of different approaches, and suggest alternatives where those are much friendlier for the technical back-end (either directly or in terms of future maintenance).
However, it can happen that the functional wishes are just not feasible in the framework you have been using and there is no real alternative (except starting from scratch). Depending on where you are, going ahead and ‘doing it anyway’ may lead to negative effects on human lives, and ultimately on the company or government you work for. In cases like this, the professional engineer will dig in their heels. Not because they are lazy and don’t want to do the work. Not because they are assholes or act out of spite. But because it’s the professional thing to do.

A good executive expects exactly this behavior from their people: yes, they expect a ‘can do’ mentality and for them to ‘make it work’, but by choosing the right solution and telling them when an idea is really bad, not by unthinkingly saying ‘yes sir/ma’am’ to everything. The good executive will either make the requirement go away, or communicate a clear message to their higher-ups when such functional requests lead to delays, instead of pressuring their people to ‘just do it’ and ‘make sacrifices’.

Companies which recruit developers and engineers primarily on price – and see them as ‘production workers’ – can’t expect to get the kind of critical professionals described above. I’m not sure whether companies with such hiring practices have good executives who simply lack awareness of what this leads to, or whether they lack the good executives altogether. Either way, the end result is that they will recruit the ones just looking for quick dough, who consequently – out of fear – always say yes without being clear about the risks (if they even see them). In fact, they might even make it work on short notice, but your company ends up with a technically weakened product and some serious risks. Word of warning: this doesn’t mean the opposite is true; highly paid technicians are not necessarily good professionals.

Finally, good higher-level management should cultivate a culture of transparency and open communication. If there is a good reason an ambitious goal is not reached, all too often higher-level management (under pressure themselves from investors and/or public opinion) turns a blind eye or feigns ignorance and indiscriminately punishes the departments and project managers for not reaching it, even when those in question reported early on that this might be the outcome. This behavior instills a fear of transparency and cultivates a culture in which everyone, down the whole chain, will just make it work, regardless of future risks.

TraceCLI: a production debugging and tracing tool

Ever had an application that just crashed or malfunctioned in a Production environment? And whatever you tried, it wasn’t reproducible in Test? Or perhaps it was just slow in serving some pages (web application), and you wanted to see the actual SQL your Object-Relational Mapper (ORM) generated and sent to the database?

Well, there are some options.

For one, you could add more logging. Typically though, Murphy’s Law interferes: even though you think you log ‘everything’, you find out, right when you really need it, that you forgot about that one spot, or need more detail (like argument or field values). Changing the code and building a new release, not to mention getting it through Test, Acceptance and into Production, is not really an attractive option here.

An alternative is to use WinDbg (+ SOS) on a crash dump obtained using a GUI app like DebugDiag or a command line app like ProcDump. For real crashes this ‘post-mortem debugging’ option is often sufficient: you’ll have a full memory dump to work with, taken at the exact time of the crash, so all the details are there. However, using WinDbg is a pretty hardcore skill not every (.NET) developer has, and for more subtle malfunctions – for instance your application not behaving nicely with other systems – or in case of performance issues, you typically want to inspect a live target anyway.

And here it becomes problematic: I probably don’t have to tell you that attaching a debugger to a live production process and stepping through it is not a good idea. Although even without WinDbg there’s still a large set of tools at your disposal (SQL tracing/profiling, performance counters, ETW tracing), those are mostly indirect and slow to set up: you’re watching symptoms instead of the real disease.

What I really wanted was a tool which I can attach to a live target without disrupting it, and in which I can specify some classes, methods or even namespaces to monitor (hitcounts, timings). Perhaps I even want to dump some (private) fields of a class when a method is hit. That sort of thing.

It didn’t exist (or I couldn’t find it), and I was searching for a challenge anyway, so I made one myself using native C++ and the debugging APIs.

The tool – for now named TraceCLI – allows you to attach live to a process (x86 or x64) and instrument the namespaces, classes, and methods you want. On hitting them, you can either just monitor their hit counts or do more advanced things like measuring the time spent in the method (for now including calls to other methods), as well as dump fields of the class the method is a member of on method entry. For simple, much-used scenarios there are presets (like SQL tracing), and for very advanced scenarios or multiple hit filter settings you can also provide the configuration in an XML file.

Although ultimately the goal is to use it on production processes, it’s still very alpha, both in terms of stability as well as feature set. So please use at your own risk.

Example: SQL tracing

For example, to trace SQL queries manually (without a preset), we could use the following command line:


which specifies we want to dump the field ‘_commandText’ of a class with a method ‘RunExecuteReader’ upon entering the method.

During startup, TraceCLI attaches to the process, checks all loaded assemblies for such methods, and instruments them:

14:33:53.588 TraceCLI (x86) v0.1 - Ruurd Keizer
14:33:53.588 ### PARAMETERS
14:33:53.589 Filter #1: namespace (null), class (null), method RunExecuteReader.
14:33:53.589   Fields dumped on entry: _commandText
14:33:53.658 ### ATTACH
14:33:53.659 Attaching to process with pId 5320: C:\Program Files (x86)\IIS Express\iisexpress.exe
14:33:53.738 Processing filter...
14:33:53.739 Searching domain: DefaultDomain
14:33:53.739 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Configuration\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Configuration.dll
14:33:53.755 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
14:33:53.800 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.Build.Utilities.v4.0\v4.0_4.0.0.0__b03f5f7f11d50a3a\Microsoft.Build.Utilities.v4.0.dll
14:33:54.077 Searching domain: /LM/W3SVC/4/ROOT-1-130487604823520283
14:33:54.078 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Data.Linq\v4.0_4.0.0.0__b77a5c561934e089\System.Data.Linq.dll
14:33:54.093 Searching assembly: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root\4ecd037f\b8cc2ad2\assembly\dl3\2c89eab9\1cd35552_3295cf01\Solvay.Common.dll
14:33:54.094 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.Web.Infrastructure\v4.0_1.0.0.0__31bf3856ad364e35\Microsoft.Web.Infrastructure.dll
14:33:54.796 Searching assembly: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root\4ecd037f\b8cc2ad2\assembly\dl3\5283b6cf\d83f43ef_2d8acf01\System.Web.Http.WebHost.dll
14:33:54.798 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
14:33:54.902 Found 4 methods satisfying the filters
14:33:54.902 Found method: internal System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: internal System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: private System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReaderTds(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: private System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReaderSmi(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.904 Activating all breakpoints

While the tool is attached, it dumps SQL to the log as the target executes it:

14:33:54.904 ### TRACING
14:34:00.617 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        COUNT(1) AS [A1]
        FROM  [dbo].[T_Computer] AS [Extent1]
        INNER JOIN [dbo].[T_OS] AS [Extent2] ON [Extent1].[HWComputerIndex] = [Extent2].[ComputerIndex]
    )  AS [GroupBy1]
14:34:00.626 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        COUNT(1) AS [A1]
        FROM  [dbo].[T_Computer] AS [Extent1]
        INNER JOIN [dbo].[T_OS] AS [Extent2] ON [Extent1].[HWComputerIndex] = [Extent2].[ComputerIndex]
    )  AS [GroupBy1]
14:34:00.635 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT
        COUNT(1) AS [A1]
        FROM ( SELECT DISTINCT
            [Extent1].[SlaSID] AS [SlaSID],
            [Extent1].[SlaInfo] AS [SlaInfo],
            [Extent1].[SLaTeam] AS [SLaTeam],
            [Extent1].[SlaSource] AS [SlaSource]
            FROM [dbo].[T_SID_SLA_DRP] AS [Extent1]
            WHERE ( NOT ((N'int' = [Extent1].[SLaTeam]) AND ([Extent1].[SLaTeam] IS NOT NULL))) AND ( NOT ((N'wintel' = [Extent1].[SlaSource]) AND ([Extent1].[SlaSource] IS NOT NULL)))
        )  AS [Distinct1]
    )  AS [GroupBy1]

Compile time marshalling

In one of my posts about managed/unmanaged interop in C# (P/Invoke), I left you with the promise of answering a few questions, namely: can we manually create our own marshalling stubs in C# (at compile time), and can they be faster than the runtime-generated ones?

A bit of background

It’s funny that when I raised these questions back in March, I was still unaware of .NET Native and ASP.NET vNext, which Microsoft announced in the following months. The main idea behind these initiatives is to speed up .NET code on resource-constrained systems (mobile, cloud), especially its startup time.
For instance, while traditionally on desktop systems the intermediate language (IL) in .NET assemblies is compiled to machine code at runtime by the Just-In-Time compiler (JIT), .NET Native moves this step to compile time. While this has several advantages, a direct consequence of the lack of runtime IL compilation is that we can no longer generate and run IL code on the fly. Even though not much user code uses this feature, the framework itself critically depends on it for interop marshalling stub generation. Since it is no longer available in .NET Native, this phase had to be moved to compile time as well. In fact, this step – called Marshalling and Code Generation (MCG) – is one of the elements of the .NET Native toolchain. By the way, .NET Native isn’t the first project to do compile time marshalling. For example, it has been used for a long time in the DXSharp project.

The basic concepts are always the same: generate code which marshals the input arguments and return values, and wrap it around a calli IL instruction. Since the C# compiler will never emit a calli instruction, this actual call will always have to be implemented in IL directly (or the compiler has to be extended, something recently made possible with Roslyn). Where the desktop .NET runtime (CLR) emits the whole marshalling stub in IL, the MCG-generated code is C#, so it requires a separate call to an IL method with the calli implementation. If you drill down far enough in the generated sources for a .NET Native project, in the end you’ll find something like this (all other classes/methods omitted for brevity):

internal unsafe static partial class Interop
{
    private static partial class McgNative
    {
        internal static partial class Intrinsics
        {
            internal static T StdCall<T>(IntPtr pfn, void* arg0, int arg1)
            {
                // This method is implemented elsewhere in the toolchain
                return default(T);
            }
        }
    }
}

Note the giveaway comment ‘this method is implemented elsewhere in the toolchain’, which you can read as ‘this is as far as we can go with C#’, and which indicates that some other tool in the .NET Native chain will emit the real body for the method.

DIY compile time marshalling

So what would the .NET Native ‘implemented elsewhere’ source look like, or: how can we do our own marshalling? To call a native function which expects an integer argument (like the Sleep function I used in previous posts), first we need to create an IL calli implementation which takes the address of the native callsite and the integer argument:

.assembly extern mscorlib {}
.assembly CalliImpl { .ver 0:0:0:0 }
.module CalliImpl.dll

.class public CalliHelpers
{
    .method public static void Action_uint32(native int, unsigned int32) cil managed
    {
        .maxstack 2
        ldarg.1                              // push the uint32 argument
        ldarg.0                              // push the native function pointer
        calli unmanaged stdcall void(int32)  // call through the pointer
        ret
    }
}

If we feed it the address of the Sleep function in kernel32 (obtained using LoadLibrary and GetProcAddress, which we ironically invoke through P/Invoke…), we can see the CalliHelpers method on the managed stack instead of the familiar DomainBoundILStubClass. In other words, compile time marshalling in action:

Child SP IP Call Site
00f2f264 77a9d4bc [InlinedCallFrame: 00f2f264]
00f2f260 010b03e4 CalliHelpers.Action_uint32(IntPtr, UInt32)
00f2f290 010b013b TestPInvoke.Program.Main(System.String[])
00f2f428 63c92652 [GCFrame: 00f2f428]
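A minimal driver producing a stack like this might look as follows (a sketch: it assumes the CalliImpl assembly above is referenced, and it is Windows-only since it resolves the address of Sleep from kernel32):

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    // Ironically, we use regular P/Invoke to resolve the address we will calli into.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr LoadLibrary(string lpFileName);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);

    static void Main()
    {
        IntPtr kernel32 = LoadLibrary("kernel32.dll");
        IntPtr pSleep = GetProcAddress(kernel32, "Sleep");

        // Call Sleep(1000) through our own compile time stub instead of
        // a runtime-generated DomainBoundILStubClass stub.
        CalliHelpers.Action_uint32(pSleep, 1000);
    }
}
```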

This ‘hello world’ example is nice, but ideally you would like to use well-tested code. Therefore, I wanted to try and leverage the MCG from .NET Native, but it turned out to be a bit more work than I anticipated, as you need to somehow inject the actual IL calli stubs to make the calls work. So perhaps in a future blog post.

What about C++ interop ?

There seems to be a lot of confusion around this type of interop: some claim it to be faster, some slower. In reality it can be both, depending on what you do. The C++ compiler understands both types of code (native and managed), and with that comes its main selling point: not speed but type safety. Where in C# the developer has to provide the P/Invoke signature, including the calling convention and the marshalling of arguments and return values, the C++ compiler already knows all this from the native header files. Therefore, in C++/CLI you simply include the header and, if necessary (when you are in a managed section), the compiler does the P/Invoke for you implicitly:


#include <windows.h>

using namespace System;

int main(array<System::String ^> ^args)
{
    Console::WriteLine(L"Press any key...");
    while (!Console::KeyAvailable)
    {
        Sleep(1000); // unmanaged Sleep from Windows.h, implicit P/Invoke
    }
    return 0;
}

Sleep is an unmanaged function included from Windows.h, and invoked from a managed code body. From the managed stack in WinDbg you can see how it works:

00e3f16c 00fa2065 DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
00e3f170 00fa1fcc [InlinedCallFrame: 00e3f170] <Module>.Sleep(UInt32)
00e3f1b4 00fa1fcc <Module>.main(System.String[])
00e3f1c8 00fa1cff <Module>.mainCRTStartupStrArray(System.String[])

As you can see, there is again a marshalling stub, just as in C#; it is however generated without developer intervention. This alone should be reason enough to use C++/CLI in heavy interop scenarios, but there are more advantages. For instance, the C++ compiler can optimize away multiple dependent calls across the interop boundary, making the whole thing faster, and it can P/Invoke to native C++ class instance functions, something entirely impossible in C#. It moreover allows you to go beyond depending on external native code and create ‘mixed mode’ or IJW (‘It Just Works’) assemblies, which contain native code as well as the usual managed code in a self-contained unit.
Despite all this, the P/Invoke offered by C++/CLI still leverages the runtime stub generation mechanism, and therefore it’s not intrinsically faster than explicit P/Invoke.
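For contrast, this is the explicit declaration a C# developer has to supply by hand for the same call, the part C++/CLI derives from Windows.h automatically (kernel32's Sleep takes a DWORD, i.e. a uint, and returns nothing):

```csharp
using System.Runtime.InteropServices;

static class NativeMethods
{
    // In C#, the calling convention and argument marshalling
    // are spelled out by the developer, not inferred from a header.
    [DllImport("kernel32.dll")]
    internal static extern void Sleep(uint dwMilliseconds);
}
```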

Word of warning

Let me end with this: the aim of this post is to offer insight into the black box called interop, not to promote DIY marshalling. If you find yourself in need of creating your own (compile time) marshalling stubs for faster interop, chances are you are doing something wrong. Especially in enterprise/web development it’s not very likely that interop itself is the bottleneck. Therefore, focusing on improving the interop scenario yourself – instead of letting the .NET framework team worry about it – is very, very likely a case of premature optimization. However, in game/datacenter/scientific scenarios you can end up in situations where you want to use every CPU cycle efficiently, and perhaps after reading this post you’ll have a better idea of where to look.


Why developers don’t make millions of dollars

Let’s start with a riddle: what does a developer do to unwind at the end of a frustrating day at the office?
Answer: he goes home and takes some time to write more code.

So, does this mean all developers are workaholics? Highly questionable: why would there be more workaholics in development than in other professions? So… do they do it because they are socially awkward and have nothing fun to do? Hmmm, I know the stereotype, but most developers I know have families and friends, and love sports, parties, music and beer like any other human being out there. So that can’t be it.
So why do they do it? A cynical answer might be: working on side projects improves your resume. While it’s definitely true that doing extra stuff can improve your resume, this line of reasoning confuses cause and effect: employers didn’t somehow decide out of thin air that working on side projects is a bonus. Instead, it’s an intrinsic drive many great developers have, and as a result employers decided to include checking for side projects in their hiring practices.

So in fact the simple truth is…**drumroll**…great developers love what they do.

However, in the typical day job there is always external pressure saying you can only spend this many hours on something, or telling you to go in some direction that is politically correct but aesthetically or technically wrong. While some developers learn to appreciate this game of thrones and migrate to management positions, it remains a distraction from the thing they REALLY love: simply playing with new technologies and writing awesome code.

“Hey, I came here to read about earning millions of dollars, what’s the deal?” Patience, my young padawan: the side project was important because it sets the stage for an essential first conclusion, which is: developers are artists (craftsmen if you like). Why do I say this? Because developers share a single very important trait with ‘real’ artists: in their off-time, they still do the same thing, without direct benefit to their day job, but this time in complete freedom. Developers are not unique in this regard; other professionals sharing this characteristic include scientists and professional athletes.

So why is it that we don’t share that other very important trait: getting paid millions of dollars? Is it because we are not making things for the masses like rock stars and football heroes? On the contrary: the very core of our everyday society is made by developers. From the apps on your smartphone, the electronics in your car, and the systems wiring your paycheck every month, to the internet itself: all made by developers. These days software is as fundamental as power and food. Of course, in order to run, software needs a hardware infrastructure: the chips in the smartphones and servers, and on a larger scale the buildings they sit in and the batteries powering them. These are however like the stadiums and roads you need for sports. The actual athletes creating the content are the artists: developers and designers.

So again, why aren’t these guys paid big wads of cash? Some say it’s because professional athletes, musicians and movie stars only have a limited number of years to earn enough until they retire. While that line of thought sounds sympathetic, it’s equally false. The markets don’t care for those guys any more than they care for any other human being, and club owners and record labels would very much like to pay them less.

No, the real answer starts with lack of visibility. Even with my untrained eye I can see that one striker makes a more beautiful goal than the next, or that a goalie makes an impossible save. To a lesser degree, I can tell great art from kitsch. It is however impossible to identify great scientists without being one yourself, and the same is true for developers. This can lead to only one outcome: most companies, because they can’t or won’t differentiate skill and dedication and see developers as production workers instead of creative professionals, will let new hires compete on price instead of quality. So they hire the script kiddies, google-coders, and ‘fake it till you make it’ programmers. Some time later, because the ‘production workers’ aren’t delivering, they hire an extra management layer – most likely an ‘Agile guru’ with zero technical training – to put them to work.

And here we have the real problem: even if a great developer manages to pass this indiscriminate hiring wall and turns out to be a real asset, between him and the CEO there are 9001 layers of management, each of which has to earn more than the last, irrespective of their actual contribution to the company. In this sense the position of developers is much like that of coffee bean farmers: a very small percentage of the money reaches the ones creating the actual product. This wouldn’t be a problem if development were a more visible art, like sports or music: developers would be recognized as the creative force, and the discrepancy would be impossible to sell to a larger audience.

So how can we fix it? I’m not sure. Teaching everyone to code is a good initiative for a number of reasons, but it’s highly unlikely it will ever solve the ‘untrained eye’ problem at the levels we are talking about, and it will take ages before the effects are seen. A more short-term solution would be to put some developers on a pedestal: not just in front of other developers, but for a large audience. Any rockstar coders out there willing to take the spotlight? :)