RAII in C#

We’ve already discussed RAII in C++. Now it’s time to see how we implement the same pattern in C#. We don’t manage our own memory in C#, but we still need to manage other resources, like network and database connections, files and mutexes. In C++, whenever control leaves a block, the local objects defined in that block are destroyed. In C# this is not the case, because the runtime uses garbage collection to manage memory. This means that the runtime periodically looks for objects that are no longer referenced and deletes those that it finds. So, unfortunately, we cannot use the same trick to manage resources.

In order to guarantee that a resource is released in C# we can use a finally block like so:

// acquire a resource
try
{
   // use resource and do other stuff as well
}
finally
{
   // release resource
}

A finally block will be executed whenever control leaves the try block, whether because it reaches the end of the block, it reaches a return statement, or an exception is thrown.
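To see this in action, here is a minimal sketch that leaves a try block once via a return statement and once via an exception, recording the order of events. The FinallyDemo class and its Log list are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class FinallyDemo
{
    // Records the order of events so we can see when each finally block runs.
    public static readonly List<string> Log = new List<string>();

    // Leaves the try block via a return statement.
    public static int ReturnEarly()
    {
        try
        {
            Log.Add("try: returning");
            return 1;
        }
        finally
        {
            Log.Add("finally after return");
        }
    }

    // Leaves the try block via an exception.
    public static void Throw()
    {
        try
        {
            Log.Add("try: throwing");
            throw new InvalidOperationException("boom");
        }
        finally
        {
            Log.Add("finally after throw");
        }
    }

    static void Main()
    {
        ReturnEarly();
        try { Throw(); } catch (InvalidOperationException) { }

        // Prints the four log entries in the order they were added.
        foreach (var line in Log) Console.WriteLine(line);
    }
}
```

In both cases the finally entry is logged straight after the corresponding try entry.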

We can also encapsulate resource management with a class like in C++. In C# we normally do this by implementing the IDisposable interface. The IDisposable interface looks like this:

interface IDisposable
{
    void Dispose();
}

We acquire the resource in the constructor and we release the resource in the Dispose method.

class ResourceHolder : IDisposable
{
   public ResourceHolder()
   {
       // acquire resource here
   }

   public void Dispose()
   {
       // release resource
   }
}

The Dispose method of our ResourceHolder must be called. It does not happen automatically like a destructor in C++. We can combine a class implementing IDisposable with the finally block like this:

var holder = new ResourceHolder();
try
{
   // use resource and do other stuff as well
}
finally
{
   holder.Dispose();
}

In fact, this pattern is so useful that some syntactic sugar exists for it, the using statement.

using (var holder = new ResourceHolder())
{
   // use resource and do other stuff as well
}

The above code is essentially equivalent to our previous code example with an explicit finally block. Whenever control exits the using block, the Dispose method of the object declared inside the using statement is called.
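As a small check of that claim, the sketch below throws from inside a using block and then inspects the holder afterwards. TrackedResource is a hypothetical stand-in for ResourceHolder that simply remembers whether Dispose was called.

```csharp
using System;

// Hypothetical resource holder that records whether it has been released.
class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public TrackedResource()
    {
        Console.WriteLine("acquired");
    }

    public void Dispose()
    {
        Disposed = true;
        Console.WriteLine("released");
    }
}

static class UsingDemo
{
    // Returns the holder so the caller can inspect it after the using block has exited.
    public static TrackedResource UseAndThrow()
    {
        TrackedResource seen = null;
        try
        {
            using (var holder = new TrackedResource())
            {
                seen = holder;
                throw new InvalidOperationException("something went wrong");
            }
        }
        catch (InvalidOperationException)
        {
            // Dispose has already run by the time we get here.
        }
        return seen;
    }

    static void Main()
    {
        var holder = UseAndThrow();
        Console.WriteLine(holder.Disposed); // True
    }
}
```

The resource is released even though the block was exited by an exception rather than by reaching its end.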

RAII is a great example of something that ends up being more complicated in C# than it is in C++. Usually, things are easier in C#!

What are Dependency Injection Frameworks and Why are they Bad?

What is Dependency Injection?

Let’s suppose we are writing some code. This is part of a big legacy code base and we are using an object oriented language like Java or C#. Say we would like to value some stocks.

Suppose we have a class, StockValuer, that takes as its inputs various interfaces, say, an IInterestRateProvider, an INewsFeedReader and an IHistoricPriceReader. These interfaces are implemented by the InterestRateProvider, NewsFeedReader and HistoricPriceReader classes respectively. Each of these types will in turn have arguments it depends on, but we will elide that for now. So we will set it all up with something like:

IInterestRateProvider interestRateProvider = new InterestRateProvider(...);
INewsFeedReader newsFeedReader = new NewsFeedReader(...);
IHistoricPriceReader historicPriceReader = new HistoricPriceReader(...);

IStockValuer stockValuer = new StockValuer(interestRateProvider, newsFeedReader, historicPriceReader);

So we have one top level class, the StockValuer, and we pass various other objects into it, that provide the functionality it needs. This style of programming is called Inversion of Control or Dependency Injection.

Usually, when we write code like this, we test it by passing fakes or mocks of the various interfaces. This can be really great for mocking out the kind of low level stuff that it is usually hard to test like database access or user input.
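A minimal sketch of what testing with a fake can look like is shown here. The post never says what the interfaces expose, so the GetRate method, the FakeInterestRateProvider class and the valuation rule are all assumptions invented for this example, and StockValuer is cut down to a single dependency.

```csharp
using System;

// Hypothetical member: the post does not say what this interface exposes.
interface IInterestRateProvider
{
    double GetRate();
}

// A fake that returns a fixed rate, so a test needs no network or database.
class FakeInterestRateProvider : IInterestRateProvider
{
    private readonly double _rate;

    public FakeInterestRateProvider(double rate) { _rate = rate; }

    public double GetRate() => _rate;
}

// Cut-down StockValuer with one injected dependency and a made-up valuation rule.
class StockValuer
{
    private readonly IInterestRateProvider _rates;

    public StockValuer(IInterestRateProvider rates) { _rates = rates; }

    // Discounts a cash flow due in one year back to today.
    public double Value(double cashFlowInOneYear) =>
        cashFlowInOneYear / (1.0 + _rates.GetRate());
}

static class FakeDemo
{
    static void Main()
    {
        var valuer = new StockValuer(new FakeInterestRateProvider(0.25));
        Console.WriteLine(valuer.Value(125.0)); // 100
    }
}
```

Because StockValuer only sees the interface, the fake can be swapped for the real provider without touching the valuation code.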

This style of programming goes hand in hand with the factory pattern. We can see as much above: we have written a factory to create our StockValuer!

What are Dependency Injection Frameworks?

There is another way for us to create our StockValuer. We can use a dependency injection framework like Spring in Java or Castle Windsor in C#. We will no longer have a factory that explicitly builds up our StockValuer. Instead, we will register the various types that we wish to use, and the framework will resolve them at runtime. What this means is that rather than using the new keyword and passing arguments to constructors, we will call some special methods from our dependency injection library.

So in our StockValuer example we would write something like:

var container = new DependencyInjectionContainer();

container.Register<IInterestRateProvider, InterestRateProvider>();
container.Register<INewsFeedReader, NewsFeedReader>();
container.Register<IHistoricPriceReader, HistoricPriceReader>();

container.Register<IStockValuer, StockValuer>();

Then, when the stock valuer is used in your real code, say, a function like,

double GetPortfolioValue(Stocks[] stocks, IStockValuer stockValuer)
{
...
}

The dependency injection framework will create all these types at run time, and provide them to the method. We have to explicitly provide the very bottom level arguments, things like the configuration, but the framework resolves everything else.

Why is it bad?

I think this is a pretty awful way to program. Here are my reasons why.

It Makes Our Code Hard to Read and Understand

One of the guiding principles behind dependency injection is that it doesn’t matter what specific implementation of the IThingy interface you get, just that it is an IThingy. This is all very well and good in principle, but not in practice. Whenever I am reading or debugging code, and I want to know what it actually does, I always need to know what specific implementation of IThingy I am dealing with. What’s even worse, is that DI frameworks break IDEs. Because various types and constructors are resolved at runtime, semantic search no longer works. I can’t even look up where a type is created anymore!

It Encourages Us to Write Bad Code

Dependency injection frameworks encourage us to write our code as a mish-mash of dependent classes without any clear logic. Each individual piece of real working code gets split out into its own class and divorced from its actual context. We end up with a bewildering collection of types that have no real world meaning, and a completely baffling dependency graph. Everything is now hidden behind an interface, and every interface has only one implementation.

It Turns Compile Time Errors into Runtime Errors

For me this is an absolutely unforgivable cardinal sin. Normally, in a statically typed language, if you don’t provide the right arguments to a method this is a compile time problem. In fact normally this is something that is picked up by your IDE before you even try to build your code. Not so with dependency injection frameworks. Now you will not discover if you have provided the correct arguments until you run your code!

To give an example, I was working on a web app that was built using dependency injection. One day we merged some changes, built and tested our code, and deployed it to the test environment. While it was running, it crashed. We had forgotten to register one of the arguments a new method was using, and the dependency injection framework couldn’t resolve it at runtime. This is something we could have easily spotted if we were writing our code without dependency injection magic. Instead our type error was only discovered when a specific endpoint of our service was hit.
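The failure mode is easy to reproduce with a toy container. The sketch below is not the API of Castle Windsor or any real framework: ToyContainer, Register and Resolve are invented here purely to show why a forgotten registration can only surface when the code runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy container for illustration only; real frameworks have different APIs.
class ToyContainer
{
    private readonly Dictionary<Type, Type> _registrations = new Dictionary<Type, Type>();

    public void Register<TInterface, TImplementation>() where TImplementation : TInterface
    {
        _registrations[typeof(TInterface)] = typeof(TImplementation);
    }

    public T Resolve<T>() => (T)Resolve(typeof(T));

    private object Resolve(Type service)
    {
        // The lookup happens at run time, so a missing registration cannot be a compile error.
        if (!_registrations.TryGetValue(service, out var implementation))
            throw new InvalidOperationException("No registration for " + service.Name);

        // Resolve every constructor argument recursively, then invoke the constructor.
        var constructor = implementation.GetConstructors().First();
        var arguments = constructor.GetParameters()
            .Select(p => Resolve(p.ParameterType))
            .ToArray();
        return constructor.Invoke(arguments);
    }
}

interface INewsFeedReader { }
class NewsFeedReader : INewsFeedReader { }

interface IStockValuer { }
class StockValuer : IStockValuer
{
    public StockValuer(INewsFeedReader reader) { }
}

static class ContainerDemo
{
    // Returns the error message, or null if resolution succeeded.
    public static string TryResolve(bool registerReader)
    {
        var container = new ToyContainer();
        container.Register<IStockValuer, StockValuer>();
        if (registerReader) container.Register<INewsFeedReader, NewsFeedReader>();
        try
        {
            container.Resolve<IStockValuer>();
            return null;
        }
        catch (InvalidOperationException e)
        {
            return e.Message;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryResolve(true) ?? "resolved"); // resolved
        Console.WriteLine(TryResolve(false));              // No registration for INewsFeedReader
    }
}
```

Both versions compile cleanly. With a hand-written factory, forgetting to pass an INewsFeedReader to the StockValuer constructor would have been rejected by the compiler.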

It is Outrageously Verbose

The sort of code we write with DI frameworks is naturally very verbose, with lots of interfaces and lots of classes. I once stripped Castle Windsor out of a C# project and halved the number of lines without changing the functionality at all. The problem with really verbose code is that it is harder to maintain. Indeed the more lines of code you have, the more bugs you will have.

Worse than this, though, are the tests. There is a solution of sorts to the issue with runtime errors mentioned above. We write unit tests to validate the type correctness of our code. This to me is a pretty mad solution. It only catches the type errors that you remember to test for and it bloats our code base even more.

It is Far Too Complicated

Using a DI framework requires us to learn an entirely new meta-language that sits on top of C# or Java. We need to understand all kinds of tricks and gotchas. Instead of building up programs with a few basic tools, we are now writing them with a complex unintuitive meta-language that describes how to evaluate dependency graphs at runtime. Dependency injection takes something simple but inelegant, a factory, and turns it into an incomprehensible mess.

The Tricky Little Differences Between C# and C++ – part 1

There are a lot of big differences between C++ and C#. Really, despite their similar syntax, they are completely different languages. C# has garbage collection, C++ has manual memory management. C# is compiled to a special intermediate language that is then just in time compiled by the run time. C++ is compiled to machine code.

But, when you transfer between the two languages you are probably not going to get caught out by obvious things like that. It’s much more likely that the small subtle differences will trip you up. I’m going to cover some of these little differences that I’ve come across.

In this post we’ll be looking at uninitialised strings. Suppose we create a string in C++ without instantiating it directly ourselves and then print its contents. Like this:

string s;
cout << s << endl; 

This will just print an empty line, because the string s is empty. However, if you try the same thing in C#, it won’t work quite the same. If we use C# and do something like:

string s;
Console.WriteLine(s);

we’ll get the following compile-time error:

Program.cs(10,31): error CS0165: Use of unassigned local variable 's'

This is a little surprising. Normally C# is the more user friendly language, but in this case C++ gives us a friendly default behaviour whereas the C# compiler refuses to build our code. Why?

This is because in our C++ example we created a string object on the stack and dealt with it directly. When we created it, C++ called the default constructor, which creates an empty string. However, in C# strings are reference types. This means that a variable of type string is only a reference to an object on the (managed) heap, and declaring the variable does not create any object for it to refer to. So our C# is roughly equivalent to the following C++ code:

string* s = NULL;
cout << *s << endl;

If you run this you’ll most likely end up with a seg fault. That’s because a null pointer conventionally points to memory address 0, which is inaccessible to you.
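The closest C# analogue to that seg fault appears once the compiler can no longer prove the variable is unassigned. In the sketch below, which uses a hypothetical Holder class, a string field is default-initialised to null, and using it fails at run time with a NullReferenceException.

```csharp
using System;

class Holder
{
    // Fields, unlike locals, are default-initialised; for a reference type the default is null.
    public string Text;
}

static class NullStringDemo
{
    // Returns true if reading the Length of the unassigned field throws.
    public static bool LengthThrows()
    {
        var holder = new Holder();
        try
        {
            Console.WriteLine(holder.Text.Length);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        string s = null;                   // explicitly assigned, so the compiler is satisfied
        Console.WriteLine(s == null);      // True
        Console.WriteLine(LengthThrows()); // True
    }
}
```

Unlike the C++ seg fault, the exception can be caught and the program can carry on.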

Don’t Get Caught out by Covariance!

Downcasting is bad, you shouldn’t do it, it’s a code smell and it’s an anti-pattern. Unfortunately, in real world code you will see plenty of it. So you need to be aware of the potential pitfalls of using downcasting. One that caught me out recently is covariance.

In C++, you can quite freely cast between types. You can take a pointer to some memory and cast it to any type you like, and read out what you get. Generally speaking you won’t get anything useful. If you’re not careful you’ll get “undefined behaviour”. If you have data stored in memory, the assumption is that you know what type that data is, and you are responsible for using it appropriately.

In languages like Java and C#, things are different. Here, the runtime checks our casts and will throw an exception if it thinks you have gone wrong. The consequences of this difference can sometimes be surprising.

Let’s look at an example. Suppose we have a base class Security defined like so:

public class Security
{
    int Id;
    public Security(int id)
    {
        Id = id;
    }
}

and two subclasses, Stock:

public class Stock : Security
{
    public string Name;
    public Stock(int id, string name) : base(id)
    {
        Name = name;
    }
}

and Bond:

public class Bond : Security
{
    double Rate;
    public Bond(int id, double rate) : base(id)
    {
        Rate = rate;
    }
}

Now if we have a reference of type Security, that actually points to a Stock, we can happily downcast it to a reference of type Stock, like this:

Security s1 = new Stock(1, "GOOG");
Stock AsStock = (Stock)s1;
Console.WriteLine(AsStock.Name);

Because Stock is a subtype of Security, we can cast a reference to a Stock to a reference to a Security. This works because every Stock will have all the fields of a Security, in the same relative locations in memory.

However, if you have a reference of type Security that points to a Security or a Bond, and try and cast it to a Stock, you’ll have trouble. If we run the following code

Security s1 = new Bond(1, 0.02);
Stock AsStock = (Stock)s1;
Console.WriteLine(AsStock.Name);

we will see a run time exception of the form:

Unhandled exception. System.InvalidCastException: Unable to cast object of type 'Casting.Bond' to type 'Casting.Stock'.

This makes sense, the runtime knows that the object we have a reference to is not a Stock. It knows that we can’t cast this object to a stock in a sensible way. So the runtime stops us by throwing an exception.

Let’s look at a slightly more complicated example. Suppose, instead of a single object, we had a whole array of them. The following code:

Security[] securities = new Stock[]{new Stock(1, "ABC"), new Stock(2, "DEF")};
Stock[] stocks = (Stock[]) securities; 
Console.WriteLine(stocks[0].Name);

will run happily. We can cast a reference of type Security[] that points to a Stock[], to a reference of type Stock[]. However if we try

Security[] securities = new Security[]{new Stock(1, "ABC"), new Stock(2, "DEF")};
Stock[] stocks = (Stock[]) securities; 
Console.WriteLine(stocks[0].Name);

we will get an InvalidCastException:

Unhandled exception. System.InvalidCastException: Unable to cast object of type 'Casting.Security[]' to type 'Casting.Stock[]'.

This might seem a little surprising. The objects in our array are still actually Stocks. We know that we can cast a reference of type Security that points to a Stock to a reference of type Stock. Why can’t we cast the Security[] reference to a Stock[] reference?

It’s a subtle one. When we cast an array reference, we are not casting the objects in the array, we are casting the array itself. So in the first array example, we are casting a reference of type Security[] to a reference of type Stock[]. The runtime knows that the reference actually points to an object of type Stock[], so this is fine. There will only ever be Stock objects in this array. Even though we have a reference of type Security[] pointing to this array, we can’t do something like:

Security[] securities = new Stock[]{new Stock(1, "ABC"), new Stock(2, "DEF")};
securities[0] = new Bond(2, 0.02);

The runtime knows that our array is of type Stock[], so it throws an exception:

Unhandled exception. System.ArrayTypeMismatchException: Attempted to access an element as a type incompatible with the array.

However, in the second example, we have a reference of type Security[] that points to an array of type Security[]. Although this array only contains stocks, the runtime cannot in general say whether that is true or not. Suppose we had done something like this instead:

Security[] securities = new Security[]{new Stock(1, "ABC"), new Stock(2, "DEF")};
securities[0] = new Bond(2, 0.02);
Stock[] stocks = (Stock[]) securities; 

We can of course add a Bond to an array of type Security[], the runtime doesn’t keep track of all this, which is why it has no way of knowing if the securities array really does contain objects of type Stock or something else.

The feature at work here is called array covariance: because Stock is a subtype of Security, C# lets us treat a Stock[] as a Security[]. Eric Lippert, a former member of the C# language design team, has a pretty good blog about it. The really important point is that when we have an array of some base type, even if we are able to downcast the individual members of that array, the runtime might stop us from downcasting the entire array.
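Here is a compact sketch of both array cases, together with one possible workaround that the post itself does not discuss: converting element by element with LINQ’s Cast, which builds a new Stock[] instead of reinterpreting the existing array. The class definitions are trimmed-down versions of the ones above, and the helper names are made up.

```csharp
using System;
using System.Linq;

class Security { }
class Stock : Security { public string Name = ""; }
class Bond : Security { }

static class CovarianceDemo
{
    // Element-wise conversion: throws InvalidCastException if any element is not a Stock.
    public static Stock[] ToStocks(Security[] securities) =>
        securities.Cast<Stock>().ToArray();

    // Returns true if casting the array reference itself fails.
    public static bool ArrayCastThrows(Security[] securities)
    {
        try
        {
            var stocks = (Stock[])securities;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Security[] reallyStocks = new Stock[] { new Stock { Name = "ABC" } };
        Security[] securities = new Security[] { new Stock { Name = "DEF" } };

        Console.WriteLine(ArrayCastThrows(reallyStocks)); // False
        Console.WriteLine(ArrayCastThrows(securities));   // True
        Console.WriteLine(ToStocks(securities)[0].Name);  // DEF
    }
}
```

The element-wise version checks each object as it goes, which is exactly the check the runtime declines to do for a plain array cast.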

This tripped me up in my day job last week. I was performing a data load that returned an array of type Security[], I knew this array contained only objects of type Stock and so I cast it to Stock[]. I merged this into master, but then our regression tests failed. Thankfully my mistake was caught before making it into prod.

What is Single Inheritance, and Why is it Bad?

In a previous post we talked about how there are two different notions of inheritance, Interface Inheritance and Implementation Inheritance. In this post we’ll be talking primarily about the latter. That means, we’ll be talking about using inheritance to share actual logic among classes.

Let’s say we defined two base classes. The first is DatabaseAccessor:

class DatabaseAccessor {
   protected:
   bool WriteToDatabase(string query) {
   // Implementation here
   }

   string ReadFromDatabase(string query) {
   // Implementation here
   }
};

which encapsulates our logic for reading from and writing to some database. The second is Logger:

class Logger {
   protected:
   void LogWarning(string message) {
   // Implementation here
   }

   void LogError(string message) {
   // Implementation here
   }
};

this encapsulates our logic for logging errors and warnings. Now, suppose also that we are using a language that supports multiple inheritance, such as C++. If we want to define a class that shares functionality with both of these classes, we can subclass both of them at once:

class SomeUsefulClass : private DatabaseAccessor, private Logger {
   // Some useful code goes here
};

So, we have a subclass named SomeUsefulClass that can use both the DatabaseAccessor and Logger functionality, great!

What if we wanted to do the same thing in a language like C# or Java? Well then we’re out of luck: in C# and Java you cannot inherit from more than one base class. This is called single inheritance. How do we achieve the same effect if we are using either of these languages? Well, one solution would be to chain our subclasses together. We would choose which of DatabaseAccessor and Logger is more fundamental and make it the base class of the other. Suppose we decided that everything has to log, then the Logger class remains the same and DatabaseAccessor becomes:

class DatabaseAccessor : public Logger {
   protected:
   bool WriteToDatabase(string query) {
   // Implementation here
   }

   string ReadFromDatabase(string query) {
   // Implementation here
   }
};

So now we can subclass DatabaseAccessor and get the Logger functionality as well. Although, what if we really did just want the DatabaseAccessor logic, and we didn’t actually need the Logger implementation? Well tough luck. Now everything that inherits from DatabaseAccessor inherits from Logger as well.

This might not seem like much of a problem with something as trivial as Logger, but in big enterprise applications, it can spiral out of control. You end up in a situation where all the basic re-usable code you might need is locked inside a lengthy chain of subclasses. What if you only need something from the bottom link of this chain? Unfortunately, you have to pick up all of its base classes as well. This makes our code unnecessarily complicated and harder to read and understand. If one of those unwanted base classes does all kinds of heavyweight initialisation on construction, then it will have performance implications as well.

One consolation that is often offered is that both languages allow multiple inheritance of interfaces. This doesn’t really help us though. Implementation inheritance and Interface Inheritance are two completely different things. We can’t convert one of DatabaseAccessor or Logger into an interface. The entire reason we want to inherit from them is to get their implementation!

We could also use composition instead of inheritance. In this case we would inherit from one of the classes, and keep a reference to the other. But, in that case, why even use single inheritance at all? Why not just let our classes keep references to a Logger and a DatabaseAccessor? The language designers have struck a bizarre compromise here, we can use inheritance, but only a little bit. If C# and Java are Object Oriented, then they should allow us to use the full features of object orientation, rather than just a flavour.
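For comparison, a pure composition version might look like the sketch below. The method bodies, the Messages list and the Load method are placeholders invented for this example; only the class names come from the post.

```csharp
using System;
using System.Collections.Generic;

class Logger
{
    public List<string> Messages { get; } = new List<string>();

    public void LogWarning(string message) => Messages.Add("WARN: " + message);
}

class DatabaseAccessor
{
    // Stand-in for a real database call.
    public string ReadFromDatabase(string query) => "result of " + query;
}

// Holds references to both helpers instead of inheriting from either.
class SomeUsefulClass
{
    private readonly Logger _logger;
    private readonly DatabaseAccessor _database;

    public SomeUsefulClass(Logger logger, DatabaseAccessor database)
    {
        _logger = logger;
        _database = database;
    }

    public string Load(string query)
    {
        _logger.LogWarning("running " + query);
        return _database.ReadFromDatabase(query);
    }
}

static class CompositionDemo
{
    static void Main()
    {
        var logger = new Logger();
        var useful = new SomeUsefulClass(logger, new DatabaseAccessor());
        Console.WriteLine(useful.Load("SELECT 1")); // result of SELECT 1
        Console.WriteLine(logger.Messages[0]);      // WARN: running SELECT 1
    }
}
```

The price is that the helper methods have to be public, and every call goes through a field rather than being available directly on the class.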

The good news is that the people behind C#, Microsoft, have realised the error of their ways. They have released two features that ameliorate the problem of single inheritance, Extension Methods and Default Implementations in Interfaces. More on these in future blog posts.

Book Review – Functional Programming in C#

I read Enrico Buonanno’s Functional Programming in C# directly after Jon Skeet’s C# in Depth. Unlike Skeet’s book, this is not a book about C#. It is a book about functional programming which uses C# as the medium of instruction.

What makes this particularly interesting is that C# isn’t really a functional language. Usually functional programming is discussed in terms of niche explicitly functional languages like Haskell. By using C#, Buonanno makes the principles and practices of functional programming a lot more accessible to the average programmer. He does advocate mixing the more traditional object oriented style of C# with functional programming, but this is not something that really comes through in the code samples.

This book covers the basic concepts of functional programming really well. He explains how and why to avoid state mutation, the concept of functions as first class citizens and higher order functions, function purity and side effects, partial application and currying, and lazy computation. One thing that interested me particularly is the good case the author makes that pure functions should not throw exceptions.

He does not spend a long time justifying functional programming. The two main benefits he repeatedly highlights are easier testing and better support for concurrency. It is clear that a functional approach is more suited to concurrent programming. However the claim that functional code is easier to test seems somewhat dubious to me, and is not really backed up with credible examples. He also claims that functional programming leads to cleaner code. Again, I am somewhat skeptical.

I really enjoyed the discussion of user defined types. In particular the pattern of creating new types that wrap low level types and add some semantic meaning. A great example he uses is an age type. This has only one member, an integer. Its constructor will only accept valid values for human ages. This means that we have added an extra layer of static type checking to this type: when we want an age, we use the age class, and not just an integer. It also means we don’t have to perform extra checks when using an age type, as we know its value was checked when it was initialised.
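A minimal sketch of such a type follows. This is not the book’s code: the bounds of 0 and 120, the exception type and the IsAdult helper are all assumptions chosen for illustration.

```csharp
using System;

// Wraps an int so that an Age can only ever hold a plausible human age.
// The bounds 0 and 120 are assumptions made for this sketch.
sealed class Age
{
    public int Value { get; }

    public Age(int value)
    {
        if (value < 0 || value > 120)
            throw new ArgumentOutOfRangeException(nameof(value), "not a valid age");
        Value = value;
    }
}

static class AgeDemo
{
    // No range check needed here: an Age is valid by construction.
    public static bool IsAdult(Age age) => age.Value >= 18;

    // Returns true if an Age can be built from the given number.
    public static bool IsValid(int value)
    {
        try
        {
            new Age(value);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsAdult(new Age(30))); // True
        Console.WriteLine(IsValid(-5));          // False
    }
}
```

A method that takes an Age rather than an int can no longer be handed a price or a count by mistake.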

The material on LINQ was also quite good. LINQ is strongly functional in style: it emphasizes data flow over mutation, composability of functions and purity. This is all well explained, and the author even shows how to integrate user defined monads into LINQ. However LINQ is not dealt with in a single place in a systematic way, which is disappointing. Another gripe I have is that the author uses LINQ query syntax. I do not like query syntax. Not only is it ugly, it is usually incomprehensible.

Buonanno uses Map, Bind and Apply to introduce functors, monads and applicatives in a very practical way. He also goes into detail explaining how and why to use the classic monads Option and Either. We even see variations on the Either monad that can be used for error handling and validation. Both the applicative and monadic versions of Traverse appear near the end of the book as well. In my opinion, monad stacking is one of the worst anti-patterns of functional programming. So it is disappointing that it only gets very limited coverage. I would also have liked it if he had used his excellent examples as a jumping-off point to dig deeper into category theory. Perhaps, however, that would have been too abstract for what is quite a practical book.
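To give a flavour of what Map and Bind do, here is a deliberately tiny Option type. It is a sketch written for this review, not the book’s implementation, and the ParseInt and SafeDivide100 helpers are made up.

```csharp
using System;

// Minimal Option type for illustration; the book's own version is richer.
readonly struct Option<T>
{
    private readonly T _value;
    public bool IsSome { get; }

    private Option(T value) { _value = value; IsSome = true; }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default;

    // Map applies f to the wrapped value, if there is one.
    public Option<R> Map<R>(Func<T, R> f) =>
        IsSome ? Option<R>.Some(f(_value)) : Option<R>.None;

    // Bind is like Map, but f itself returns an Option, so the result is not nested.
    public Option<R> Bind<R>(Func<T, Option<R>> f) =>
        IsSome ? f(_value) : Option<R>.None;

    public T GetOrElse(T fallback) => IsSome ? _value : fallback;
}

static class OptionDemo
{
    // Parsing can fail, so it returns an Option instead of throwing.
    public static Option<int> ParseInt(string s) =>
        int.TryParse(s, out var n) ? Option<int>.Some(n) : Option<int>.None;

    // Division by zero is reported as None rather than as an exception.
    public static Option<int> SafeDivide100(int n) =>
        n == 0 ? Option<int>.None : Option<int>.Some(100 / n);

    static void Main()
    {
        Console.WriteLine(ParseInt("4").Map(n => n * 2).GetOrElse(-1));        // 8
        Console.WriteLine(ParseInt("4").Bind(SafeDivide100).GetOrElse(-1));    // 25
        Console.WriteLine(ParseInt("0").Bind(SafeDivide100).GetOrElse(-1));    // -1
        Console.WriteLine(ParseInt("oops").Bind(SafeDivide100).GetOrElse(-1)); // -1
    }
}
```

Neither failure path throws, which ties in with the author’s point that pure functions should not throw exceptions.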

Handling state in a functional way is covered well, but it feels a bit academic. It is hard to imagine applying the patterns he covers in a real world code base. Sometimes you just have to use state! The final few chapters cover IObservables, the agent model and the actor model. These sections were quite interesting but felt a little out of place. They really merit a much deeper dive.

When I finished this book I had a much greater appreciation for functional programming. In particular it gave me lots of ideas of how I could practically apply it in my real life work. The code samples were all very good, occasionally though they were a little convoluted. Indeed they sometimes seemed like evidence against a functional style. But, as someone relatively new to functional programming I appreciated how grounded it was in real world examples. Overall, this book is an excellent resource for C# programmers who want to add a little functional flourish to their code. I highly recommend it.

C# In Depth – Book Review

Jon Skeet is a bit of a legend. He has the highest reputation score on Stack Overflow. He got there because of his consistently patient, helpful and correct answers. He is probably the most prominent C# developer there is. So, when I started a new job as a C# developer, I decided to read Skeet’s book, C# in Depth.

It’s a very good book. There is one big problem however, the structure. This book is divided into five parts, each dealing with a successive major numbered release of C#. This chronological structure is quite strange. The overriding assumption of the author is that the reader is familiar with C# 1. Given that C# 2 was released 13 years ago, this is a pretty strange angle. It’s hard to imagine there are many programmers today who are familiar with C# 1 but need a detailed walk through of the new additions to the language in C# versions 2 to 5. The book ends up as a sort of mix between a history of C# and an intermediate user’s guide.

One example of the problem with this structure is how delegates are covered. They are first introduced briefly in chapter 1. Improvements to the delegate syntax in C# 2 are then covered in detail in chapter 5. In neither of these chapters is there a clear explanation of what delegates actually are or why they are part of C#. Indeed when we reach chapter 10 Skeet covers lambda expressions, which, in reality, make the older delegate syntax redundant for most use cases.
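For readers who have not met the older forms, this sketch creates the same delegate in three ways: from a named method, from a C# 2 anonymous method and from a C# 3 lambda expression. The Transformer delegate and Twice method are made up for the example.

```csharp
using System;

static class DelegateDemo
{
    // A delegate type describes a method signature: here, int in and int out.
    delegate int Transformer(int x);

    static int Twice(int x) => x * 2;

    public static int[] Results()
    {
        Transformer fromMethod = new Transformer(Twice);              // C# 1: named method
        Transformer anonymous = delegate(int x) { return x * 2; };    // C# 2: anonymous method
        Transformer lambda = x => x * 2;                              // C# 3: lambda expression

        return new[] { fromMethod(5), anonymous(5), lambda(5) };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Results())); // 10,10,10
    }
}
```

All three variables hold a delegate of the same type, and the lambda is just the shortest way of writing one.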

Another victim of the unorthodox structure is the coverage of class properties. Modern C# syntax allows us to define properties in a very quick intuitive manner. In this book, first we learn about properties as they originally appeared in C# 1. Then, in chapter 7, we see how C# 2 allowed a mix of public getters with private setters. Finally in chapter 8 we see how properties are actually implemented in modern C#.
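Seen side by side, the three stages are easier to compare. The Person class below is a made-up example rather than code from the book, and the version comments reflect when each syntax was introduced.

```csharp
using System;

class Person
{
    // C# 1 style: explicit backing field with hand-written accessors.
    private string _name;
    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }

    // C# 2 allowed the getter and setter to have different accessibility.
    private int _age;
    public int Age
    {
        get { return _age; }
        private set { _age = value; }
    }

    // C# 3 automatically implemented property: the compiler supplies the field.
    public string Email { get; set; }

    public Person(string name, int age, string email)
    {
        Name = name;
        Age = age;
        Email = email;
    }
}

static class PropertyDemo
{
    public static string Describe()
    {
        var p = new Person("Ada", 36, "ada@example.com");
        p.Name = "Ada L.";  // allowed: the setter is public
        // p.Age = 37;      // would not compile: the setter is private
        return p.Name + " " + p.Age + " " + p.Email;
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // Ada L. 36 ada@example.com
    }
}
```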

There is of course a benefit to covering older versions of the language in detail. C# is a language designed for enterprise development. So, if you code in it, you are likely to be working with a large legacy code base. This means that understanding what the language looked like in its various iterations is useful. However, these topics would be a lot better served if they were covered all at once, rather than being split over multiple chapters.

Skeet spends a lot of time covering LINQ, which is great. LINQ is a really cool feature of C#, and he covers cool details, like how to use extension methods and iterators to integrate your own code into LINQ. He also covers the LINQ query expression syntax. This is the syntax that lets you write a LINQ expression in the style of a SQL query. Frankly I think query expression syntax is a monstrosity and should never be used, but it is probably useful to cover it and explain how it works (it’s really just syntactic sugar for the normal LINQ syntax). There is also a useful section on async code that gets into a lot of really useful detail.
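To make the syntactic sugar point concrete, here is the same made-up query written both ways. The compiler translates the query expression into the chain of method calls.

```csharp
using System;
using System.Linq;

static class LinqSyntaxDemo
{
    // Query expression syntax, in the style of a SQL query.
    public static int[] QuerySyntax(int[] numbers) =>
        (from n in numbers
         where n % 2 == 0
         orderby n descending
         select n * n).ToArray();

    // The equivalent method call syntax.
    public static int[] MethodSyntax(int[] numbers) =>
        numbers.Where(n => n % 2 == 0)
               .OrderByDescending(n => n)
               .Select(n => n * n)
               .ToArray();

    static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(",", QuerySyntax(numbers)));  // 36,16,4
        Console.WriteLine(string.Join(",", MethodSyntax(numbers))); // 36,16,4
    }
}
```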

Overall, Skeet has an ability to make some quite obscure topics interesting and accessible. He always presents new ideas with realistic and useful code snippets. Most important of all, he writes in a fun conversational style, that makes reading his book a lot more fun than a typical intermediate language guide.