Contract first WCF service (and client)

There seems to be a popular misconception that WCF is obsolete/legacy, supposedly because it has been replaced by newer techniques like ASP.NET WebAPI. For sure: since WebAPI was created to simplify the development of very specific – but extensively used – HTTP(S) REST services, it’s clear that if you are developing a service exactly like that, you would be foolish to do so in WCF (since it’s just extra work).

However, to state a technology is obsolete because there is a better alternative for a very specific use case when there are over 9000 use cases which no other tech addresses is silly. Your personal bubble might be 100% web, but that doesn’t mean there is nothing else: WCF – unlike WebAPI – is a communications abstraction. Its strong point is versatility: if I want my service to use HTTP today and named pipes or message queueing tomorrow, I can do so with minimum effort.

It’s in this versatility that we hit the second misconception: ‘WCF is too hard’. I have to admit I have spent quite some days cursing at my screen over WCF. Misconfigured services, contracts, versioning, endpoints, bindings, WSDLs, etc.: there is a lot you can do wrong, and every link in the chain has to be working before you can stop cursing.

However, that is all WCF done the classic way. None of that stuff is necessary if you do it the right way, which is contract first with little or no configuration.

WCF the right way™: contract first

The basic ingredients you need are:

  • a service contract in a shared library
  • the service: an implementation of the contract
  • some testing code (client)

And that’s it. I’ll give a minimal implementation of this concept to show the power and simplicity of this approach below. Did I mention no configuration ?

The service contract

using System.ServiceModel;

namespace Contracts
{
    [ServiceContract]
    public interface IContractFirstServiceContract
    {
        [OperationContract]
        bool Ping();
    }
}

It’s important that you place this contract/interface in a separate – shared – assembly so both the service implementation and the client(s) can access it.

The service implementation

using System;
using Contracts;

namespace Service
{
    public class ContractFirstService : IContractFirstServiceContract
    {
        public bool Ping()
        {
            return true;
        }
    }
}

That’s all, and the service can now be hosted. This can be done in IIS with a svc, or by self hosting it (in a console app). In any case, we need a ServiceHost implementation, which I’ll place in the service project:

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;
using Contracts;

namespace Service
{
    public class ContractFirstServiceHost : ServiceHost
    {
        public ContractFirstServiceHost(Binding binding, Uri baseAddress)
            : base(typeof(ContractFirstService), baseAddress)
        {
            AddServiceEndpoint(typeof(IContractFirstServiceContract), binding, baseAddress);
        }
    }
}

The client (testing the service)

In the following piece of code I’ll self-host the service, open a client and test the WCF call roundtrip.

using System;
using System.ServiceModel;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Contracts;
using Service;

namespace Tests
{
    [TestClass]
    public class IntegrationTest
    {
        [TestMethod]
        public void TestClientServer()
        {
            var binding = new BasicHttpBinding();
            var endpoint = new EndpointAddress("http://localhost/contractfirstservice");

            // Self host the service (the host must be opened before it accepts calls).
            var service = new ContractFirstServiceHost(binding, endpoint.Uri);
            service.Open();

            // Create client.
            var channelFactory = new ChannelFactory<IContractFirstServiceContract>(binding, endpoint);
            var client = channelFactory.CreateChannel();

            // Call the roundtrip test function.
            var roundtripResult = client.Ping();
            Assert.IsTrue(roundtripResult);

            // Clean up.
            channelFactory.Close();
            service.Close();
        }
    }
}


There you have it: a contract first WCF service in a few lines.
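To make the versatility claim from the introduction concrete, here is a minimal sketch of the same contract and service hosted first over named pipes and then over TCP, where only the binding and the address differ. It assumes the full .NET Framework with a reference to System.ServiceModel. The RoundTrip helper, the TCP port 8523 and both addresses are made up for this illustration, and the contract and service are repeated so the snippet stands on its own.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

[ServiceContract]
public interface IContractFirstServiceContract
{
    [OperationContract]
    bool Ping();
}

public class ContractFirstService : IContractFirstServiceContract
{
    public bool Ping() { return true; }
}

public static class Program
{
    // Hypothetical helper: hosts the service on the given binding and address,
    // calls Ping once through a client channel, then closes client and host.
    static bool RoundTrip(Binding binding, string address)
    {
        var endpoint = new EndpointAddress(address);

        var host = new ServiceHost(typeof(ContractFirstService));
        host.AddServiceEndpoint(typeof(IContractFirstServiceContract), binding, endpoint.Uri);
        host.Open();

        var factory = new ChannelFactory<IContractFirstServiceContract>(binding, endpoint);
        try
        {
            var client = factory.CreateChannel();
            return client.Ping();
        }
        finally
        {
            factory.Close();
            host.Close();
        }
    }

    public static void Main()
    {
        // Same contract and implementation, different transports.
        Console.WriteLine(RoundTrip(new NetNamedPipeBinding(), "net.pipe://localhost/contractfirstservice"));
        Console.WriteLine(RoundTrip(new NetTcpBinding(), "net.tcp://localhost:8523/contractfirstservice"));
    }
}
```

Neither the contract nor the service implementation knows which transport is in use; that choice lives entirely in the two lines that construct the binding and the address.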


Why so many IT projects fail

In the first year of my physics education at TU Delft, I had a mandatory course labelled ‘ethics of engineering and technology’. As part of the accompanying seminar we did a case study based on the Space Shuttle Challenger disaster with a small group of students.
If you don’t know what went wrong: in a nutshell, some critical parts – O-rings – were more likely to fail at low temperatures (they had before), and the conditions of this particular launch were especially harsh for them. The engineers, knowing about the issue, wanted to postpone or abort the launch but were pressured into not digging their heels in by senior NASA executives.
In the seminar, some students played engineers (NASA and Morton Thiokol), some executives, and of course the end result was a stalemate followed by an executive decision … and a disastrous launch. Nobel prize winning physicist Richard Feynman later demonstrated what went wrong technically (by submerging the O-ring in ice cold water before the committee), as well as non-technically, with a classic Feynman quote: “reality must take precedence over public relations, for nature cannot be fooled”.

So what does this have to do with software and IT ?

Everything. Every year human lives and IT become more entangled. The direct consequence is that technical problems in IT can have profound effects on human lives and reputations. With it comes increasing responsibility for the technicians and executives, be it developers, engineers, infrastructure technicians, database administrators, security consultants, project managers or higher level management (CxO).

The examples of this going wrong are numerous. Most often this results in huge budget deficits, but increasingly it leads to more disturbing data leaks (or worse: tampering) and the accompanying privacy concerns.
The most striking cases have been the malfunctioning US healthcare insurance marketplace, which left millions of people involuntarily uninsured, or closer to home, the Dutch tax agency which spent 200 million euros on a failed project, or the Dutch government which spends 4-5 billion every year on failed IT projects. I haven’t even mentioned the National Electronic Health Record yet (Dutch: ‘elektronisch patiëntendossier’), which is still in an unclear state and is IT’s own Chernobyl waiting to happen.

And these examples are the published – public – cases: most enterprises with failing IT projects won’t tell you about them, because why would they, since it’s embarrassing at best (yet another data leak) ? From personal experience, I’ve seen solutions which were so badly designed that people could easily and/or accidentally access the personal details of colleagues (HR systems) or find out a case was being made for their firing (before they were told).

So how is this possible ?

Before I answer that question, let me start with two disclaimers. First, in this post I want to focus on technical failure, meaning that even though a project can be a project management success (delivered on time and under budget), it can still be a technical time bomb waiting to go off (like the Challenger). Second, a lot has been written about failing IT projects already, and this is by no means the definitive story. It is however a developer’s point of view.

With that out of the way, let’s answer the question: in my perspective this is the result of a lack of professional attitude and transparency on all fronts: by the creative people (developers/engineers), executives, and their higher level management.

A good developer or engineer has a constructive mindset: if at all possible he will not unthinkingly accept what he is told but instead guide his customer (his manager, functional designer or some 3rd party) through their functional requirements, tell them the impact and consequences of different approaches, and suggest alternatives in case those are much more friendly for the technical back-end (either directly or in terms of future maintenance).
However, it can happen that the functional wishes are just not feasible in the framework you have been using and there is no real alternative (except starting from scratch). Depending on where you are, going ahead and ‘doing it anyway’ may lead to negative effects on human lives, and ultimately on the company/government you work for. In cases like this, the professional engineer will dig their heels in. Not because they are lazy and don’t want to do the work. Not because they are assholes or out of spite. But because it’s the professional thing to do.

A good executive expects their people to display this exact behavior: yes they expect a ‘can do’ mentality and for them to ‘make it work’, but by choosing the right solution and telling them when an idea is really bad, not by unthinkingly saying ‘yes sir/ma’am’ to everything. The good executive will either make sure the requirement goes away, or communicate a clear message to their higher ups when such functional requests lead to delays, instead of pressuring their people to ‘just do it’ and ‘make sacrifices’.

Companies which recruit developers and engineers primarily based on price – and see them as ‘production workers’ – can’t expect to get the kind of critical professionals described above. I’m not sure if companies with such hiring practices have good executives who just lack the awareness of what this results in, or also lack the good executives. However, the end result is that they will recruit the ones just looking for quick dough, who consequently – out of fear – always say yes without being clear about the risks (if they even see them). In fact, they might even make it work on short notice, but your company ends up with a technically weakened product and some serious risks. Word of warning: this doesn’t mean the opposite is true; highly paid technicians are not necessarily good professionals.

Finally, good higher level management should cultivate a culture of transparency and open communication. All too often, when an ambitious goal is not reached for a good reason, higher level management (under pressure themselves from investors and/or public opinion) turns a blind eye or feigns ignorance and indiscriminately punishes the departments and project managers for not reaching it, even when those in question reported early that this might be the outcome. This behavior will instill a fear of transparency and cultivate a culture in which everyone, down the whole chain, will just make it work, regardless of future risks.

TraceCLI: a production debugging and tracing tool

Ever had this application that just crashed or malfunctioned in a Production environment ? And whatever you tried, it wasn’t reproducible in Test ? Or perhaps it was just slow in serving some pages (web application), and you wanted to see the actual SQL your Object-Relational Mapper (ORM) generated and sent to the database ?

Well, there are some options.

For one, you could add more logging. Typically though, Murphy’s Law interferes and even though you think you log ‘everything’, you find out when you really need it that you forgot about that one spot, or need more detail (like argument or field values). Changing the code and building a new release, not to mention getting it through Test, Acceptance and into Production, is not really an attractive option here.

An alternative is to use WinDbg (+ SOS) on a crashdump obtained using a GUI app like DebugDiag or a command line app like ProcDump. For real crashes this ‘post mortem debugging’ option is often sufficient: you’ll have a full memory dump to work with taken at the exact time of the crash so all the details are there. However, using WinDbg is a pretty hardcore skill not every (.NET) developer has, and for more subtle malfunctions – for instance your application not behaving nicely with other systems – or in case of performance issues, you typically want to inspect a live target anyway.

And here it becomes problematic: I probably don’t have to tell you attaching a debugger to a live production process and stepping through it is not a good idea. Although even without WinDbg there’s still a large set of tools at your disposal (SQL tracing/profiling, performance counters, ETW tracing), those are mostly indirect and slow to set up: you’re watching symptoms instead of the real disease.

What I really wanted was a tool which I can attach to a live target without disrupting it, and in which I can specify some classes, methods or even namespaces to monitor (hitcounts, timings). Perhaps I even want to dump some (private) fields of a class when a method is hit. That sort of thing.

It didn’t exist (or I couldn’t find it), and I was searching for a challenge anyway, so I made one myself using native C++ and the debugging APIs.

The tool – for now named TraceCLI – allows you to live attach to a process (x86 or x64), and instrument the namespaces, classes, and methods you want. On hitting them, you can either just monitor their hitcounts or do more advanced stuff like measure the time spent in the method (for now including calls to other methods), as well as dump fields of the class the method is a member of on entering the method. For simple, frequently used scenarios there are presets (like SQL tracing), and for very advanced scenarios or multiple hit filter settings you can also provide an XML configuration file.

Although ultimately the goal is to use it on production processes, it’s still very alpha, both in terms of stability as well as feature set. So please use at your own risk.

Example: SQL tracing

For example, to trace SQL queries manually (without a preset), we could use the following commandline:


which specifies we want to dump the field ‘_commandText’ of a class with a method ‘RunExecuteReader’ upon entering the method.

During startup, TraceCLI attaches to the process and checks all loaded assemblies for such methods, and instruments them:

14:33:53.588 TraceCLI (x86) v0.1 - Ruurd Keizer
14:33:53.588 ### PARAMETERS
14:33:53.589 Filter #1: namespace (null), class (null), method RunExecuteReader.
14:33:53.589   Fields dumped on entry: _commandText
14:33:53.658 ### ATTACH
14:33:53.659 Attaching to process with pId 5320: C:\Program Files (x86)\IIS Express\iisexpress.exe
14:33:53.738 Processing filter...
14:33:53.739 Searching domain: DefaultDomain
14:33:53.739 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Configuration\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Configuration.dll
14:33:53.755 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
14:33:53.800 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.Build.Utilities.v4.0\v4.0_4.0.0.0__b03f5f7f11d50a3a\Microsoft.Build.Utilities.v4.0.dll
14:33:54.077 Searching domain: /LM/W3SVC/4/ROOT-1-130487604823520283
14:33:54.078 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Data.Linq\v4.0_4.0.0.0__b77a5c561934e089\System.Data.Linq.dll
14:33:54.093 Searching assembly: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root\4ecd037f\b8cc2ad2\assembly\dl3\2c89eab9\1cd35552_3295cf01\Solvay.Common.dll
14:33:54.094 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.Web.Infrastructure\v4.0_1.0.0.0__31bf3856ad364e35\Microsoft.Web.Infrastructure.dll
14:33:54.796 Searching assembly: C:\Windows\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files\root\4ecd037f\b8cc2ad2\assembly\dl3\5283b6cf\d83f43ef_2d8acf01\System.Web.Http.WebHost.dll
14:33:54.798 Searching assembly: C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
14:33:54.902 Found 4 methods satisfying the filters
14:33:54.902 Found method: internal System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: internal System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: private System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReaderTds(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.903 Found method: private System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand::RunExecuteReaderSmi(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.SqlDataReader) cil managed
14:33:54.904 Activating all breakpoints

While the tool is attached it dumps SQL to the log as it executes it:

14:33:54.904 ### TRACING
14:34:00.617 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
        COUNT(1) AS [A1]
        FROM  [dbo].[T_Computer] AS [Extent1]
        INNER JOIN [dbo].[T_OS] AS [Extent2] ON [Extent1].[HWComputerIndex] = [Extent2].[ComputerIndex]
    )  AS [GroupBy1]
14:34:00.626 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
        COUNT(1) AS [A1]
        FROM  [dbo].[T_Computer] AS [Extent1]
        INNER JOIN [dbo].[T_OS] AS [Extent2] ON [Extent1].[HWComputerIndex] = [Extent2].[ComputerIndex]
    )  AS [GroupBy1]
14:34:00.635 Field: private string System.Data.SqlClient.SqlCommand::_commandText, Value: SELECT
    [GroupBy1].[A1] AS [C1]
        COUNT(1) AS [A1]
            [Extent1].[SlaSID] AS [SlaSID],
            [Extent1].[SlaInfo] AS [SlaInfo],
            [Extent1].[SLaTeam] AS [SLaTeam],
            [Extent1].[SlaSource] AS [SlaSource]
            FROM [dbo].[T_SID_SLA_DRP] AS [Extent1]
            WHERE ( NOT ((N'int' = [Extent1].[SLaTeam]) AND ([Extent1].[SLaTeam] IS NOT NULL))) AND ( NOT ((N'wintel' = [Extent1].[SlaSource]) AND ([Extent1].[SlaSource] IS NOT NULL)))
        )  AS [Distinct1]
    )  AS [GroupBy1]

Compile time marshalling

In one of my posts about managed/unmanaged interop in C# (P/Invoke), I left you with the promise of answering a few questions, namely: can we manually create our own marshalling stubs in C# (at compile time), and can they be faster than the runtime generated ones ?

A bit of background

It’s funny that when I raised these questions back in March, I was still unaware of .NET Native and ASP.NET vNext, which were announced by Microsoft in the following months. The main idea behind these initiatives is to speed up especially the startup time of .NET code on resource constrained systems (mobile, cloud).
For instance, while traditionally on desktop systems intermediate language (IL) in .NET assemblies is compiled to machine code at runtime by the Just-In-Time Compiler (JIT), .NET Native moves this step to compile time. While this has several advantages, a direct consequence of the lack of runtime IL compilation is that we can’t generate and run IL code on the fly anymore. Even though not much user code uses this, the framework itself critically depends on this feature for interop marshalling stub generation. Since it is no longer available in .NET Native, this phase had to be moved to compile time as well. In fact, this step – called Marshalling and Code Generation (MCG) – is one of the elements of the .NET Native toolchain. By the way, .NET Native isn’t the first project which has done compile time marshalling. For example, it has been used for a long time in the SharpDX project.

The basic concepts are always the same: generate code which marshals the input arguments and return values, and wrap it around a calli IL instruction. Since the C# compiler will never emit a calli instruction, this actual call will always have to be implemented in IL directly (or the compiler will have to be extended, recently possible with Roslyn). Where the desktop .NET runtime (CLR) emits the whole marshalling stub in IL, the MCG generated code is C#, so it requires a separate call to an IL method with the calli implementation. If you drill down far enough in the generated sources for a .NET Native project, in the end you’ll find something like this (all other classes/methods omitted for brevity):

internal unsafe static partial class Interop
{
    private static partial class McgNative
    {
        internal static partial class Intrinsics
        {
            internal static T StdCall<T>(IntPtr pfn, void* arg0, int arg1)
            {
                // This method is implemented elsewhere in the toolchain
                return default(T);
            }
        }
    }
}

Note the giveaway comment ‘this method is implemented elsewhere in the toolchain’, which you can read as ‘this is as far as we can go with C#’, and which indicates that some other tool in the .NET Native chain will emit the real body for the method.

DIY compile time marshalling

So what would the .NET Native ‘implemented elsewhere’ source look like, or: how can we do our own marshalling ? To call a native function which expects an integer argument (like the Sleep function I used in previous posts), first we would need to create an IL calli implementation which takes the address of the native function and the integer argument:

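.assembly extern mscorlib {}
.assembly CalliImpl { .ver 0:0:0:0 }
.module CalliImpl.dll

.class public CalliHelpers
{
    .method public static void Action_uint32(native int, unsigned int32) cil managed
    {
        ldarg.1     // push the integer argument
        ldarg.0     // push the native function pointer (must be on top for calli)
        calli unmanaged stdcall void(unsigned int32)
        ret
    }
}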

If we feed it the address of the Sleep function in kernel32 (using LoadLibrary and GetProcAddress, which we ironically invoke through P/Invoke…), we can see the CalliHelpers method on the managed stack instead of the familiar DomainBoundILStubClass. In other words, compile time marshalling in action:

Child SP IP Call Site
00f2f264 77a9d4bc [InlinedCallFrame: 00f2f264]
00f2f260 010b03e4 CalliHelpers.Action_uint32(IntPtr, UInt32)
00f2f290 010b013b TestPInvoke.Program.Main(System.String[])
00f2f428 63c92652 [GCFrame: 00f2f428]
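A call that produces a stack like this could look roughly like the C# sketch below. It assumes the IL above has been assembled into CalliImpl.dll (for instance with ilasm) and that the assembly is referenced by the project. The program layout, the error handling and the one second sleep are illustrative choices, not the original test code.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class Program
{
    // Regular P/Invoke, used only to look up the native function address.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibrary(string lpFileName);

    // GetProcAddress only exists in an ANSI flavour, hence ExactSpelling.
    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);

    static void Main()
    {
        IntPtr kernel32 = LoadLibrary("kernel32.dll");
        if (kernel32 == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        IntPtr sleep = GetProcAddress(kernel32, "Sleep");
        if (sleep == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // No runtime generated stub here: the call goes through the
        // hand-written calli helper from CalliImpl.dll.
        CalliHelpers.Action_uint32(sleep, 1000);
    }
}
```

Since a uint is blittable, there is nothing to convert in this case and the helper can pass the argument straight through. For strings or structs, the conversion code would have to be written by hand before the calli.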

This ‘hello world’ example is nice but ideally you would like to use well tested code. Therefore, I wanted to try and leverage the MCG from .NET Native, but it turned out to be a bit more work than I anticipated as you need to somehow inject the actual IL calli stubs to make the calls work. So perhaps in a future blog.

What about C++ interop ?

There seems to be a lot of confusion around this type of interop: some claim it to be faster, some slower. In reality it can be both depending on what you do. The C++ compiler understands both types of code (native and managed), and with it comes its main selling point: not speed but type safety. Where in C# the developer has to provide the P/Invoke signature, including calling convention and marshalling of the arguments and return values, the C++ compiler knows this already from the native header files. Therefore, in C++/CLI you simply include the header and if necessary (you are in a managed section) the compiler does the P/Invoke for you implicitly.


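#include "windows.h"

using namespace System;

int main(array<String^>^ args)
{
    Console::WriteLine(L"Press any key...");
    while (!Console::KeyAvailable)
    {
        Sleep(100); // unmanaged call into kernel32 (interval assumed for illustration)
    }
    return 0;
}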

Sleep is an unmanaged function included from Windows.h, and invoked from a managed code body. From the managed stack in WinDbg you can see how it works:

00e3f16c 00fa2065 DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
00e3f170 00fa1fcc [InlinedCallFrame: 00e3f170] .Sleep(UInt32)
00e3f1b4 00fa1fcc .main(System.String[])
00e3f1c8 00fa1cff .mainCRTStartupStrArray(System.String[])

As you can see, there is again a marshalling stub, just as in C#. It is, however, generated without developer intervention. This alone should be reason enough to use C++/CLI in heavy interop scenarios, but there are more advantages. For instance, the C++ compiler can optimize away multiple dependent calls across the interop boundary, making the whole thing faster, or can P/Invoke to native C++ class instance functions, something that is impractical in plain C#. Moreover, apart from depending on external native code, it allows you to create ‘mixed mode’ or IJW (It Just Works) assemblies which contain native code as well as the usual managed code in a self-contained unit.
Despite all this, the P/Invoke offered by C++/CLI still leverages the runtime stub generation mechanism, and therefore, it’s not intrinsically faster than explicit P/Invoke.
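For comparison, the explicit C# counterpart of the C++/CLI program above could look like the sketch below. Here the developer has to spell out what the C++ compiler reads from Windows.h: the library name, the calling convention and the argument type. The 100 ms polling interval is an arbitrary choice for the illustration.

```csharp
using System;
using System.Runtime.InteropServices;

static class Program
{
    // Hand-written signature for: void Sleep(DWORD dwMilliseconds);
    // A mistake in this declaration is not caught by the compiler.
    [DllImport("kernel32.dll", CallingConvention = CallingConvention.StdCall)]
    static extern void Sleep(uint dwMilliseconds);

    static void Main()
    {
        Console.WriteLine("Press any key...");
        while (!Console.KeyAvailable)
        {
            Sleep(100);
        }
    }
}
```

Both versions end up in a runtime generated IL stub, so the difference is in who writes and verifies the signature, not in how the call is dispatched.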

Word of warning

Let me end with this: the aim of this post is to offer an insight into the black box called interop, not to promote DIY marshalling. If you find yourself in need of creating your own (compile time) marshalling stubs for faster interop, chances are you are doing something wrong. Especially for enterprise/web development it’s not very likely the interop itself is the bottleneck. Therefore, focussing on improving the interop scenario yourself – instead of letting the .NET framework team worry about it – is very, very likely a case of premature optimization. However, for game/datacenter/scientific scenarios, you can end up in situations where you want to use every CPU cycle efficiently, and perhaps after reading this post you’ll have a better idea of where to look.


Why developers don’t make millions of dollars

Let’s start with a riddle: what does a developer do to unwind at the end of a frustrating day at the office ?
Answer: he goes home and takes some time to write more code.

So, does this mean all developers are workaholics ? Highly questionable: why would there be more workaholics in development than in other professions? So…do they do it because they are socially awkward and have nothing fun to do ? Hmmm, I know the stereotype, but most developers I know have families, friends, love sports, parties, music and beer like any other human being out there. So that can’t be it.
So why do they do it ? A cynical answer might be: working on side projects improves your resume. While it’s definitely true that doing extra stuff can improve your resume, this line of reasoning confuses cause and effect: employers didn’t somehow invent the idea that working on side projects is a bonus. Instead, it’s an intrinsic drive many great developers have, and as a result employers decided to include checking for side projects in their hiring practices.

So in fact the simple truth is…**drumroll**…great developers love what they do.

However, in the typical day job there is always external pressure saying you can only spend this many hours on it, or have to go in some direction that is politically correct but aesthetically or technically wrong. While some developers learn to appreciate the game-of-thrones forces at play and migrate to management positions, this is still a distraction from the thing they REALLY love: simply playing with new technologies and writing awesome code.

“Hey, I came here to read about earning millions of dollars, what’s the deal?” Patience my young padawan, the side project was important, because it sets the stage for an essential first conclusion, which is: developers are artists (craftsmen if you like). Why do I say this ? Because developers share this single very important trait with ‘real’ artists: in their off-time, they still do the same thing without direct benefits to their day job, but this time in complete freedom. Developers are not unique in this regard: other professionals sharing this characteristic include scientists and professional athletes.

So why is it that we don’t share that other very important trait: getting paid millions of $ ? Is it because we are not making things for the masses like rock stars and football heroes ? On the contrary: the very core of our everyday society is made by developers. From the apps on your smartphone, the electronics in your car, the systems wiring your paycheck every month, to the internet itself: all made by developers. These days software is as fundamental as power and food. Of course, in order to run, software needs a hardware infrastructure: the chips in the smartphones and servers, and on a larger scale the buildings they sit in and batteries powering them. These are however like the stadiums and roads you need for sports. The actual athletes creating the content are the artists: developers and designers.

So again, why aren’t these guys paid big wads of cash ? Some say it’s because professional athletes, musicians and movie stars only have a limited number of years to earn enough until they retire. While that line of thought is very social, it’s equally false. The markets don’t care for those guys any more than they care for any other human being, and club owners and record labels would very much like to pay them less.

No, the real answer starts with lack of visibility. Even with my untrained eye I can see one striker scoring a more beautiful goal than the next, or a goalie making an impossible save. To a lesser degree, I can differentiate great art from kitsch. It is however impossible to identify great scientists without being one yourself, and the same is true for developers. This can lead to only one outcome: most companies, because they can’t/won’t differentiate skill and dedication and see developers as production workers instead of creative professionals, will let new hires compete on price instead of quality. So they hire the script kiddies, googlecoders, and ‘fake it till you make it’ programmers. Some time after, because the ‘production workers’ aren’t delivering, they hire an extra management layer – most likely an ‘Agile guru’ with zero technical training – to put them to work.

And here we have the real problem: even if a great developer manages to pass this indiscriminate hiring wall, and turns out to be a real asset, between him and the CEO there are 9001 layers of management, each of which has to earn more than the last, irrespective of their actual contribution to the company. In this sense the position of developers is much like the one of coffee bean farmers, in that a very small percentage of the money reaches the ones creating the actual product. This still wouldn’t be a problem if development was a more visible art, like sports or music. Developers would be recognized as the creative force and the discrepancy would be impossible to sell to a larger audience.

So how can we fix it ? I’m not sure. Teaching everyone to code is a good initiative for a number of reasons, but it’s highly unlikely this will ever solve the ‘untrained eye’ problem at the levels we are talking about. It will also take ages before the effects are seen. A more short term solution would be to put some developers on a pedestal: not just before other developers but for a large audience. Any rockstar coders out there willing to take the spotlight ? :)


Build 2014: it’s a good time to be a developer

At this year’s Build conference keynote, Microsoft’s new CEO Satya Nadella said: “it’s a good time to be a developer”, and after seeing all MSFT’s announcements that came in the following days, I believe he was right.

The new MSFT has openness written all over it. Anders Hejlsberg physically pushed the button on stage to make the sources for the next generation managed C# compiler – codename ‘Roslyn’ – available to the public. A historic event, and likely hard to beat at following conferences. However, this certainly wasn’t the first open sourcing by MSFT. Instead, it is the culmination of events started in the early 2000s with the WiX installer and the Outercurve foundation. In recent years contributions have been faster and more extensive: Entity Framework, MVC, Reactive Extensions, the .NET framework itself, and now the next generation compiler.

Openness in this case doesn’t just refer to open source though: it’s a philosophy, a way of life. While in the past MSFT worked on something in secret for years, followed by a ‘tadaaa’ moment, at this year’s Build, those present were shown projects which are still in a pre-alpha phase, like ‘.NET native’ and the new Just in Time compiler codename ‘RyuJIT’. We were even shown a roadmap of things to come, like the return of the start menu in Windows 8, windowed store apps, the compilers finally taking advantage of SIMD instruction sets (MMX, SSE, AVX), and the future possibility to develop uniform Windows store apps for all device categories (including the Xbox One).

Where in the past MSFT had a habit of competing with non-MSFT developments, today the keyword seems to be ‘embrace’: we can use Visual Studio Online to host Git as well as TFS source code collections, we can use Office on iPad, Windows Azure hosts Node.js and Java as well as the traditional .NET, and cross-platform use of C# is supported by the new MSFT/Xamarin partnership.

All in all MSFT is sending out a very positive vibe these days, and it is this vibe developers can get behind totally. It’s a good time to be a developer indeed.

Abstractions and performance

When it comes down to performance, enterprise developers are lazy, especially the managed (Java/.NET) kind. After all, when performance problems pop up, experience tells you it’s always a database, disk I/O or network latency issue in the end, right ? Well, not always.

One of the main objectives in the creation of managed platforms was to increase overall developer productivity, and they are immensely successful at that…but this comes at a cost: performance. When you think about it, it’s no surprise that in cases where performance per cycle still matters, native development is still very dominant. Think cases like games, database engines, operating systems, or apps for mobile platforms.
Current desktop CPUs are so immensely powerful that enterprise code performs well enough most of the time, despite being managed and despite being written by someone with no sense of what the code will compile to. Still, you will sometimes encounter cases where it’s just not enough.

Despite knowing the above, and having a passion for native/performance coding besides .NET, I recently almost fell into this very trap.

For the next iteration of our software asset management solution I had to, based on a set of selected assets, compile a manager’s view of the employees who these assets could be ordered for. Moreover, I had to take into account employees who already had the asset, and whether they could order the asset at all based on their place in the organization.
While the User Interface was new, the established – and time tested – business queries were already there, so instead of making a new one I used 2 existing queries to get lists of employees and assets from the database independently.
Next I joined them together in memory (LINQ to objects), and was baffled to see the overall process take 43 seconds. With my enterprise managed developer hat on, I immediately assumed this must be a problem of the database, but some simple timing showed the 2 initial queries returned almost instantly. It was undeniable: the problem was really my LINQ query itself.

How could that be? Aren’t in memory queries much faster than database queries? Well…yes and no. Compared to a CPU cycle, the overhead involved in a roundtrip to the database is so astronomical, that even when the database returns instantly, you can still do a lot of work in that time. So when you have a small in memory dataset doing an in memory query is indeed much faster.
This changes when the dataset gets larger, and it changes fast. To understand why, you have to be able to look under the covers of the LINQ abstraction and understand what really happens.

In the case of LINQ to entities (or LINQ2SQL), the concatenation of select, join, where and group by clauses is ultimately compiled into a SQL query which is sent to the database. Next, the database engine is responsible for analysing the query and establishing a query plan which makes it execute fast. In the case of a LINQ to objects query however, the result is just a series of operations on your dataset as you specified them: there is no engine optimizing it.

‘But wait! Isn’t the compiler that engine in the case of in memory operations ?’ Not in this case. As good as the compiler is, it won’t optimize a full dataset scan away and replace it with an index scan (something the database engine would do).
It’s even worse: the LINQ abstraction will even prevent the compiler from unrolling inner loops for us. This is because the LINQ syntax is a concatenation of methods (closures even), while the compiler optimizations in .NET are limited to method scope: there is no global scope optimization like in native C++ (templating).

So what now? The solution I commonly encounter in enterprise development is: if it doesn’t perform, push the code into the database. In other words: implement the same thing as a stored procedure, and the database will do the heavy lifting for you (a.k.a. lazy enterprise developer plan B).
This approach can work, but it has some disadvantages. Over time, complex business code which is pushed inside the database becomes sort of a black box for other developers: somehow there is this magical thing which gives us the right answers, and before someone else dares touch that thing again, we have to rule out everything else. Another problem is fragmentation: with parts of the logic in code and parts in the database you easily lose overview.

Either way, considering we had 2 in memory datasets already, the idea to push the whole thing into a stored procedure really didn’t sit well with me. I refused to accept I couldn’t make it perform in code. So with the knowledge above I took a minute to sit down and really look at the query. The first thing I did was move one subquery outside the main query. This already shaved off a factor of 3. Nice, but still: 15 seconds is not nearly good enough for a query used to populate a website view.
Next I realized one of the where clauses was doing a full scan of the dataset while it could easily be grouped in sets per employee. So I transformed the List of simple objects (with an employeeId and an assetId in them) into a Dictionary of Lists of assetIds indexed by employeeId, and the query immediately executed in 0.3 seconds. Over 100 times faster than the original, and now acceptable for a webpage.
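The original query isn’t shown here, so the following is only a minimal sketch of the idea, not the actual production code. The names EmployeeAsset, AssetLookup, HasAssetScan, BuildIndex and HasAssetIndexed are all made up for illustration; the point is the difference between scanning the whole list for every lookup and building a Dictionary of Lists once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: one (employee, asset) pair, as an existing query might return it.
public sealed class EmployeeAsset
{
    public int EmployeeId { get; set; }
    public int AssetId { get; set; }
}

public static class AssetLookup
{
    // Slow variant: every call scans the complete list.
    public static bool HasAssetScan(List<EmployeeAsset> rows, int employeeId, int assetId)
    {
        return rows.Any(r => r.EmployeeId == employeeId && r.AssetId == assetId);
    }

    // Build the index once: employeeId -> list of assetIds.
    public static Dictionary<int, List<int>> BuildIndex(IEnumerable<EmployeeAsset> rows)
    {
        return rows
            .GroupBy(r => r.EmployeeId)
            .ToDictionary(g => g.Key, g => g.Select(r => r.AssetId).ToList());
    }

    // Fast variant: one hash lookup, then a scan of that single employee's list only.
    public static bool HasAssetIndexed(Dictionary<int, List<int>> index, int employeeId, int assetId)
    {
        List<int> assets;
        return index.TryGetValue(employeeId, out assets) && assets.Contains(assetId);
    }
}
```

When HasAssetScan is called once per employee per asset, the total work grows with the product of the dataset sizes. With the index, each lookup only touches the handful of assets that belong to one employee, which is essentially what a database engine does for you when it picks an index.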

Abstractions are nice, but sometimes you have to lift the covers.


PInvoke: beyond the magic

Ever run into problems passing data between unmanaged code and managed code ? Or just curious what really happens when you slap that [DllImport] on a method ? This post is for you: below I’ll shine some light inside the black box that’s called Platform Invoke.

Let’s start with a very minimal console app that has a call to an unmanaged Win32 function:

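using System;
using System.Runtime.InteropServices;
namespace TestPInvoke
{
    class Program
    {
        [DllImport("kernel32.dll")]
        static extern void Sleep(uint dwMilliseconds);
        static void Main(string[] args)
        {
            Console.WriteLine("Press any key...");
            while (!Console.KeyAvailable)
                Sleep(1000); // PInvoke call: sleep 1 second between polls
        }
    }
}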

Nothing exciting going on there: just the console polling for a keypress, and sleeping the thread for 1 second after every poll. The important thing of course is the way in which we sleep the thread, which is with PInvoke instead of using the usual mscorlib System.Threading.Thread.Sleep(Int32).

Now let’s run it under WinDbg + SOS, and see if we can find out what happens. The managed stack while sleeping looks like this:

Child SP IP       Call Site
00ebee24 0108013d DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
00ebee28 0108008e [InlinedCallFrame: 00ebee28] TestPInvoke.Program.Sleep(UInt32)
00ebee6c 0108008e TestPInvoke.Program.Main(System.String[])

On the bottom is the entrypoint. The next frame on the stack is just an information frame telling us the call to Program.Sleep was inlined in Main (notice the same IP). The next frame is more interesting: as the last frame on the managed stack this must be our marshalling stub.

We can dump the MethodDescriptor of the Program.Main and DomainBoundILStubClass.IL_STUB_PInvoke methods for comparison, which gives us:

0:000> !IP2MD 0108008e
MethodDesc: 00fc37c8
Method Name: TestPInvoke.Program.Main(System.String[])
Class: 00fc12a8
MethodTable: 00fc37e4
mdToken: 06000002
Module: 00fc2ed4
IsJitted: yes
CodeAddr: 01080050


0:000> !IP2MD 0108013d
MethodDesc: 00fc38f0
Method Name: DomainBoundILStubClass.IL_STUB_PInvoke(UInt32)
Class: 00fc385c
MethodTable: 00fc38b0
mdToken: 06000000
Module: 00fc2ed4
IsJitted: yes
CodeAddr: 010800c0

This tells us both methods are originally IL code, and they are JIT compiled. For the Main method we knew this of course, and for the PInvoke stub it can’t be a surprise either given the class and method names. So let’s dump out the IL:

0:000> !DumpIL 00fc37c8
IL_0001: ldstr "Press any key..."
IL_0006: call System.Console::WriteLine
IL_000c: br.s IL_001b
IL_000f: ldc.i4 1000
IL_0014: call TestPInvoke.Program::Sleep
IL_001b: call System.Console::get_KeyAvailable
IL_0020: ldc.i4.0
IL_0021: ceq
IL_0025: brtrue.s IL_000e
IL_0027: ret

No surprises there. Next the stub:

0:000> !DumpIL 00fc38f0
error decoding IL

OK, that’s weird. The metadata tells us we have an IL compiled method, the JITted code is there:

0:000> !u 010800c0
Normal JIT generated code
(actual code left out)

but where is the IL body?

In fact, it turns out since .NET v4.0, all interop stubs are generated at runtime in IL and JIT compiled for the relevant architecture. Note this runtime IL has a clear difference with the IL emitted in runtime assemblies (for instance the ones generated for XML serialization), as the interop stubs aren’t contained in a runtime generated assembly or module. Instead, the module token is spoofed to be identical to the calling frame’s module (you can check this above). Likewise, there is only runtime data for these methods, and looking up its class info gives:

!DumpClass 00fc385c
Class Name: 
mdToken: 02000000
File: C:\dev\voidcall\Profiler\ProfilerNext\TestPInvoke\TestPInvoke\bin\Debug\TestPInvoke.exe
Parent Class: 00000000
Module: 00fc2ed4
Method Table: 00fc38b0
Total Method Slots: 0
Class Attributes: 101
Transparency: Critical

This containing class – DomainBoundILStubClass – is some weird thing as well: it doesn’t inherit anything (not even System.Object), the name isn’t filled in, and there are no method slots, even though we know there is at least one method in this class, namely the one we just followed to get to it. So probably this class is just a construct for keeping integrity in the CLR internal data structures.

So there really seems to be no good way to get the IL of those stubs. The CLR team realized this as well and decided to publish the generated IL as ETW events. The ILStub Diagnostics tool can be used to intercept them. If we do this for our test program we see the following (formatted for readability):

// Managed Signature: void(uint32)
// Native Signature: unmanaged stdcall void(int32)
.maxstack 3
.locals (int32 A,int32 B)
// Initialize
    call native int [mscorlib] System.StubHelpers.StubHelpers::GetStubContext()
    call void [mscorlib] System.StubHelpers.StubHelpers::DemandPermission(native int)
// Marshal
    ldc.i4 0x0
// CallMethod
    call native int [mscorlib] System.StubHelpers.StubHelpers::GetStubContext()
    ldc.i4 0x14
    calli unmanaged stdcall void(int32) //actual unmanaged method call
// Unmarshal (nothing in this case)
// Return

The (un)marshalling isn’t very interesting in this case (int32 in and nothing out). To make it more clear for those who don’t use IL daily, I used ILAsm to compile this method body into a dll and used ILSpy to view it in decompiled C#:

static void ILStub_PInvoke(int A)
{
    calli(void(int32), A, *(*(StubHelpers.GetStubContext() + 20))); //not actual C#, but more readable anyway
}

The call to the unmanaged method is done with a calli instruction, which is a strongly typed call to an unmanaged callsite. The first parameter (not on the stack but encoded in IL), is the signature of the callsite [void(int32)], followed by (on the stack) the argument (in this case A), ultimately followed by the unmanaged function pointer (which must be stored in offset 20 of the context returned from StubHelpers.GetStubContext()).

So what magic takes place in StubHelpers.GetStubContext() ?

The answer will come naturally if we take for example a simple program that has 2 PInvoke methods with the same input and output arguments:

[DllImport("kernel32.dll")]
static extern void ExitThread(uint dwExitCode);

[DllImport("kernel32.dll")]
static extern void Sleep(uint dwMilliseconds);

If I let the CLR generate an IL stub for both methods, I have exactly the same input and output marshalling, and even the unmanaged function call signature (not address) is the same.

That seems a bit of a waste, so how could one optimize this ?

Indeed, we would save on basically everything we care about (RAM, JIT compilation) by just generating one IL stub for every unique input+output argument signature, and injecting that stub with the unmanaged address it needs to call.

This is exactly how it works: when the CLR encounters a PInvoke method, it pushes a frame on the stack (InlinedCallFrame) with info about – among other things – the unmanaged function address just before calling the actual IL stub.

The stub in turn requests this information through StubHelpers.GetStubContext() (aka ‘gimme my callframe’), and calls into the unmanaged function.
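To make the idea a bit more tangible, here is a purely managed toy model of it. This is not how the CLR implements it internally: StubContext and StubCache are hypothetical names, a delegate plays the role of the unmanaged function pointer, and a string stands in for the signature. It only illustrates the principle of one shared stub per signature, with the target supplied through a per-method context.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the per-method data handed to a stub
// (the real stub context is a native structure, not a managed class).
public sealed class StubContext
{
    public string Name;
    public Action<uint> Target; // plays the role of the unmanaged function pointer
}

public static class StubCache
{
    // One stub per unique signature; a string key stands in for the signature here.
    private static readonly Dictionary<string, Action<StubContext, uint>> Stubs =
        new Dictionary<string, Action<StubContext, uint>>();

    public static Action<StubContext, uint> GetStub(string signature)
    {
        Action<StubContext, uint> stub;
        if (!Stubs.TryGetValue(signature, out stub))
        {
            // "Generated" only once: marshal (nothing to do for a uint),
            // then call whatever target the context points at.
            stub = (context, arg) => context.Target(arg);
            Stubs[signature] = stub;
        }
        return stub;
    }
}
```

Two methods with the signature void(uint32) get the very same stub back from GetStub; the only thing that differs per call is the context that is passed in.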

To see this in action, consider the code:

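using System;
using System.Runtime.InteropServices;
namespace TestPInvoke
{
    class Program
    {
        [DllImport("kernel32.dll")]
        static extern void Sleep(uint dwMilliseconds);

        [DllImport("kernel32.dll", EntryPoint = "Sleep")]
        static extern void SleepAgain(uint dwMilliseconds);
        static void Main(string[] args)
        {
            Console.WriteLine("Press any key...");
            while (!Console.KeyAvailable)
            {
                Sleep(500);      // 500 = 1F4h
                SleepAgain(500); // same native function, different managed name
            }
        }
    }
}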

I’ll run this from WinDbg+SOS. Here’s the disassembly of the calls to Sleep and SleepAgain in Main:

mov     ecx,1F4h
call    0042c04c (TestPInvoke.Program.Sleep(UInt32), mdToken: 06000001)
mov     ecx,1F4h
call    0042c058 (TestPInvoke.Program.SleepAgain(UInt32), mdToken: 06000002)

You see the calls to Sleep and SleepAgain are pointing to different addresses. If we dump the unmanaged code at these locations we have:

!u 0042c04c (Sleep)
Unmanaged code
mov     eax,42379Ch
jmp     006100d0 (DomainBoundILStubClass.IL_STUB_PInvoke(UInt32))

!u 0042c058 (SleepAgain)
Unmanaged code
mov     eax,4237C8h
jmp     006100d0 (DomainBoundILStubClass.IL_STUB_PInvoke(UInt32))

Indeed, in just a few lines we see a different value being loaded into eax before both jump to the same address (the IL stub). Since the value in eax is the only thing separating the two, this must be a pointer to our call frame.

So let’s consider these as memory addresses and check what’s there:

dd 42379Ch
0042379c  63000001 20ea0005 00000000 00192385
004237ac  001925ec 00423808 0042c010 00000000

dd 4237C8h
004237c8  630b0002 20ea0006 00000000 00192385
004237d8  001925ec 00423810 0042c01c 00000000

Now remember the offset in the calli instruction above ? The unmanaged call was to a pointer reference at offset 20 (14h) in our stub context. Or in plain words: take the value at offset 20 in the call frame (the sixth dword of each dump: 00423808 and 00423810 respectively), and dereference it. This gives us:

00423808 => 7747cf49 (KERNEL32!SleepStub)
00423810 => 7747cf49 (same)

And there we have it, PInvoke demystified.

In a next post I’ll address the following questions:

  • can we manually create our own marshalling stubs in C# (at compile time) ?
  • can it be faster than the runtime generated one ?
  • what about the reverse case (unmanaged code calling us) ?

Reactive Extension and ObserveOn

I’ve been actively following and using the work of the Reactive Extensions (Rx) team from early on, as Rx is truly a unique library for working with events. However, some days ago I discovered something didn’t quite work as expected, and it involves the ObserveOn and SubscribeOn methods of Observable.

The problem case

We had an event stream – in particular XML messages arriving on a TCP port – which arrived at a relatively high rate. We did all of the event detection, filtering and handling with an Rx chain, which worked great. In the end, the event data had to be persisted using a database. This last step is where we meet the real problem, as the database operation could take longer than the time between arriving events, causing the incoming events to queue up.

The solution (or so we naively thought)

Let’s put every database operation on a separate thread so we offload all IO delays and free our main thread for the real computations. How ? There is this nice little method on Observable called ObserveOn which allows you to specify where you want the observing to take place:

public static class Observable
{
    // signatures only, bodies omitted
    public static IObservable<TSource> ObserveOn<TSource>(this IObservable<TSource> source, IScheduler scheduler);

    public static IObservable<TSource> SubscribeOn<TSource>(this IObservable<TSource> source, IScheduler scheduler);
}

So let’s ObserveOn(ThreadPool), and we fix our problem !

WTF or ‘Why are my events still queueing’ ?

The essential thing to remember is that Rx does not run your handlers in parallel. If you specify you want to observe events on a particular thread, Rx will deliver them there, but still strictly one at a time: the next OnNext is not invoked until the previous one has returned, so a slow handler simply makes the events pile up. So what’s the point of ObserveOn and SubscribeOn ? It’s mostly useful for STA scenarios: most notably the one where a UI thread receives events, which you ObserveOn a background thread to prevent blocking of the UI thread, and eventually ObserveOn the UI thread again to update the UI. Sure, the case uses two threads, but it’s all sequential.

The real fix

Explicitly spawn a new thread/task in the OnNext which takes care of the database update, and immediately return to the observable.
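A minimal sketch of what that could look like, using only the IObserver interface from the base class library so it doesn’t depend on a particular Rx version. OffloadingObserver, the persist delegate and WhenAllPersisted are illustrative names, not part of Rx, and persist is just a placeholder for the real database call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of an observer that hands each event to the thread pool and returns at once.
public sealed class OffloadingObserver<T> : IObserver<T>
{
    private readonly Action<T> persist; // placeholder for the real database update
    private readonly ConcurrentBag<Task> pending = new ConcurrentBag<Task>();

    public OffloadingObserver(Action<T> persist)
    {
        this.persist = persist;
    }

    public void OnNext(T value)
    {
        // Does not wait for the database: the observable can push the next event immediately.
        pending.Add(Task.Run(() => persist(value)));
    }

    public void OnError(Exception error) { /* log or rethrow as appropriate */ }

    public void OnCompleted() { }

    // Lets a caller (or a test) wait until all work started so far has finished.
    public Task WhenAllPersisted()
    {
        return Task.WhenAll(pending);
    }
}
```

Keep in mind what you give up with this approach: the updates can now complete out of order, persist has to be thread safe, and nothing limits the number of tasks in flight if the database stays slower than the event rate.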


AppPool crashes and RapidFail protection

Yesterday, one of our production sites began to crash at random intervals. We managed to narrow the issue down to one specific user logging in at the time, and clicking on a number of (again random) pages.

Post-mortem debugging using crash dumps and WinDbg showed the last exceptions on the stack to be (again) random and pretty minor.

The only thing they had in common was that they were unhandled, and so ended up in the Application_Error method of the Web project’s HttpApplication derived class.

So what happened ?

In the end it boils down to a feature in Internet Information Services called “Rapid Fail Protection”. If enabled (the default), the application pool will stop and serve 503 Service Unavailable responses when its worker process fails X times within Y minutes (both configurable). An unhandled exception that takes the worker process down counts as such a failure.

rapid fail protection

Of course the best fix is to properly catch exceptions. However, if you ever have a case of Application Pools stopping under mysterious circumstances, check if you have Rapid Fail Protection turned on.