Sunday, June 8, 2008

Extension Methods are way cool!

OK. So everyone's heard of LINQ by now. Most everyone has even heard of some of the cool features of C# 3.0 (lambdas, for instance), but in my mind the coolest one, extension methods, largely goes unnoticed. Extension methods are the plumbing on which LINQ and some of the other cool features in the .NET 3.5 libraries are implemented. They are, in my opinion, the best feature C# has introduced since generics, and are possibly one of the best features added to traditional languages EVER!

Consider the following: you have a class that someone else wrote. Its public interface does all of the things you need, but you've implemented several additional operations (as a separate set of utility functions) that it would be nice to add to that public interface. Unfortunately, the class is marked 'sealed', or it is the base of a large hierarchy of classes that you simply can't add your functionality to (you can't make the classes in a vendor's library derive from your 'new' version of their base class).

Extension methods to the rescue: all you need to do is declare a static class in your library (which you probably already have, called 'StringUtils' or something like that :)), and give it some static methods that put the 'this' keyword, in a new role, on their first parameter. Magically, the compiler will then let you call such a method as though it were an instance method on any value whose type is compatible with the type of the 'this'-marked parameter (for example, a class that derives from it or implements it), as long as the static class's namespace is in scope through a using directive.

For example:

public static class StringUtils
{
    // 'this' on the first parameter makes RemoveAll an extension method on string
    public static string RemoveAll(this string s, params string[] args)
    {
        string ret = s;
        foreach (string sremove in args)
            ret = ret.Replace(sremove, string.Empty);
        return ret;
    }
}

By the way, of course I know this is the most horrible way to implement this function. It's just an example, so don't tell me how crappy my code is, or that I should be using StringBuilder, or yada yada yada...!

The point of this is that after declaring such a function, all objects of type 'string' syntactically receive a member called 'RemoveAll' that has a single 'params' argument.  This is VERY cool.
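To make the 'syntactically' part concrete, here's a small sketch (the demo class and the sample strings are made up for illustration). It shows that the instance-style call is just sugar: the compiler turns text.RemoveAll(...) into an ordinary call to the static method, passing text as the first argument.

```csharp
using System;

public static class StringUtils
{
    // 'this' on the first parameter lets callers write s.RemoveAll(...).
    public static string RemoveAll(this string s, params string[] args)
    {
        string ret = s;
        foreach (string sremove in args)
            ret = ret.Replace(sremove, string.Empty);
        return ret;
    }
}

public static class StringUtilsDemo
{
    public static void Main()
    {
        string text = "a-b_c";

        // Instance-style call syntax...
        string viaExtension = text.RemoveAll("-", "_");

        // ...compiles to exactly this static call.
        string viaStatic = StringUtils.RemoveAll(text, "-", "_");

        Console.WriteLine(viaExtension);              // prints "abc"
        Console.WriteLine(viaExtension == viaStatic); // prints "True"
    }
}
```

Nothing is actually added to System.String; the method still lives in StringUtils, which is why it works even on a sealed class.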

The coolest thing about this - you can also do it for interfaces, enums, and various other types that you can't possibly provide "code" for in a more traditional way.
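Here's a rough sketch of the interface and enum cases. The 'Direction' enum and the 'Opposite' and 'CountWhere' methods are hypothetical names invented for this example. CountWhere does the same job as LINQ's own Enumerable.Count overload that takes a predicate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum used only for this sketch.
public enum Direction { North, East, South, West }

public static class TypeExtensions
{
    // Extension on an enum: an enum can't declare methods of its own.
    public static Direction Opposite(this Direction d)
    {
        // Two steps around the four-value cycle gives the opposite direction.
        return (Direction)(((int)d + 2) % 4);
    }

    // Extension on an interface: every IEnumerable<T> (arrays, List<T>, ...)
    // gets this method without the interface or any implementer changing.
    public static int CountWhere<T>(this IEnumerable<T> items, Func<T, bool> predicate)
    {
        int count = 0;
        foreach (T item in items)
            if (predicate(item))
                count++;
        return count;
    }
}

public static class TypeExtensionsDemo
{
    public static void Main()
    {
        Console.WriteLine(Direction.North.Opposite()); // prints "South"

        int[] numbers = { 1, 2, 3, 4, 5 };
        List<string> words = new List<string> { "tree", "node", "ast" };
        Console.WriteLine(numbers.CountWhere(n => n % 2 == 0));  // prints "2"
        Console.WriteLine(words.CountWhere(w => w.Length == 4)); // prints "2"
    }
}
```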

What else?

Much of the code that I write on a day-to-day basis works with tree-based data structures. Some of these structures can get very complicated, and much of my unit test code needs to do asserts over a large part of a tree (after performing some complex operation). For instance, consider an expression parser. Such a parser would presumably build an AST for the expression it's given and return that AST for further processing. ASTs for all but the simplest expressions can get very tedious to 'check' for validity when writing a parser.

I've recently begun using extension methods for my base node class to help with my unit testing. I put the unit test 'asserts' directly into the extension methods, and these extension methods are in NO WAY suitable to exist in the library that is being tested (why on earth would I want all this extra junk in my library just to support unit tests?). As a matter of fact, my libraries even target .NET 3.0 (C# 2.0) rather than .NET 3.5 (C# 3.0). However, that doesn't stop me from using extension methods in my unit testing code. That code doesn't get deployed to my clients, so they don't need 3.5; I only need it on my dev machine and build machine.

Here's a simple example of how some of my unit testing code looks:

[Test]
public void FormulaTests2()
{
    PrimaryLexer l = new PrimaryLexer();
    StringReaderAdapter sra = new StringReaderAdapter("a / (b + c)", 0);
    InforceScriptLexerFilter lf = new InforceScriptLexerFilter(sra, l);
    InforceScriptSemanticParser sp = new InforceScriptSemanticParser(lf);

    RootFormula rf = sp.Parse();

    // look for 'a' and '/'
    rf.Body.Is<BinaryOp>()
        .OperatorIs(InforceScriptTokenId.Slash)
        .Left.Is<InvokeOp>()
        .IdRef.Is<IdReference>()
        .NameIs("a");
    // look for 'b' and '+'
    rf.Body.Is<BinaryOp>()
        .Right.Is<BinaryOp>()
        .OperatorIs(InforceScriptTokenId.Plus)
        .Left.Is<InvokeOp>()
        .IdRef.Is<IdReference>()
        .NameIs("b");
    // look for 'c'
    rf.Body.Is<BinaryOp>()
        .Right.Is<BinaryOp>()
        .Right.Is<InvokeOp>()
        .IdRef.Is<IdReference>()
        .NameIs("c");
}

In order to make all this possible, I defined a few extension methods:

internal static class TreeAssertions
{
    // Asserts that the node's runtime type is T, then returns it as T so checks can continue
    public static T Is<T>(this ExpressionBase node) where T : ExpressionBase
    {
        Assert.IsInstanceOfType(typeof(T), node, "wrong node type");
        return (T)node;
    }

    // Asserts the referenced identifier's name, then returns the same node
    public static IdReference NameIs(this IdReference node, string name)
    {
        Assert.AreEqual(name, node.Id.Name, "names don't match");
        return node;
    }

    // Asserts the binary operator's token id, then returns the same node
    public static BinaryOp OperatorIs(this BinaryOp node, InforceScriptTokenId op)
    {
        Assert.AreEqual(op, node.OperatorTokenId);
        return node;
    }
}

As you can see, the 'Is' test checks the type of a node and then returns the node, cast to that type, so I can continue checking other things on the same node (assuming it 'passed' the check). The same is true for the 'NameIs' and 'OperatorIs' checks. I had been thinking of this sort of programming as 'Literate Programming', but that term, for which the venerable D. Knuth is given the credit, actually means interleaving program code with prose documentation. The style here, where each call returns the object so the next call can chain off it, is what's now being referred to as a 'fluent interface' in programming circles. In the past, to do this sort of thing I'd have needed to put all these methods on my base class for the tree nodes, something that would absolutely have been the 'wrong' thing to do, since this test code should not be part of the library proper.
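If you want to try the chaining idea without my parser, here's a self-contained sketch. The tiny AST ('Node', 'Identifier', 'Binary') is hypothetical and much simpler than the real node classes. It also throws InvalidOperationException instead of calling a test framework's Assert, so it runs on its own.

```csharp
using System;

// Hypothetical minimal AST standing in for the parser's real node classes.
public abstract class Node { }

public sealed class Identifier : Node
{
    public readonly string Name;
    public Identifier(string name) { Name = name; }
}

public sealed class Binary : Node
{
    public readonly char Operator;
    public readonly Node Left, Right;
    public Binary(char op, Node left, Node right) { Operator = op; Left = left; Right = right; }
}

// Test-only checks: each verifies one fact and returns the node, so calls chain.
public static class NodeAssertions
{
    public static T Is<T>(this Node node) where T : Node
    {
        T typed = node as T;
        if (typed == null)
            throw new InvalidOperationException("expected " + typeof(T).Name + ", got " +
                (node == null ? "null" : node.GetType().Name));
        return typed;
    }

    public static Binary OperatorIs(this Binary node, char op)
    {
        if (node.Operator != op)
            throw new InvalidOperationException("expected operator " + op + ", got " + node.Operator);
        return node;
    }

    public static Identifier NameIs(this Identifier node, string name)
    {
        if (node.Name != name)
            throw new InvalidOperationException("expected name " + name + ", got " + node.Name);
        return node;
    }
}

public static class FluentDemo
{
    public static void Main()
    {
        // Hand-built tree for "a / (b + c)".
        Node tree = new Binary('/', new Identifier("a"),
            new Binary('+', new Identifier("b"), new Identifier("c")));

        tree.Is<Binary>().OperatorIs('/').Left.Is<Identifier>().NameIs("a");
        tree.Is<Binary>().Right.Is<Binary>().OperatorIs('+').Right.Is<Identifier>().NameIs("c");
        Console.WriteLine("all checks passed");

        try
        {
            // The left child is an Identifier, so this check fails.
            tree.Is<Binary>().Left.Is<Binary>();
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message); // prints "expected Binary, got Identifier"
        }
    }
}
```

Each check returns the node it was given, typed as narrowly as possible, so the next link in the chain only offers members that make sense for that node type. Because the checks live in their own static class, they can stay in the test assembly and never touch the library.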

I can't wait to see what else I can find to use these methods for.  I've already found it to be an amazing benefit to my productivity and the readability of my tests.
