Thursday, June 5, 2008

CI / TeamCity is Seriously COOL!

OK... I'll be the first to admit that I'm just getting into Agile processes and I'm still a bit skeptical.  At first, I thought (CI = Continuous Integration) 'CI builds - is it really worthwhile?'.  Now, I've got a TeamCity site & build agent up and going, and I'm totally SOLD!

Here are the benefits as I see them for our situation:

  1. We know almost immediately when someone broke the build (they know too!)
  2. We have better checkin quality now that people are tired of getting those 'compilation failed' emails.
  3. We always have a source to go to for a 'current' build - no need to get the sources and build on your own machine, or go ask a 'build master' to get you a build.
  4. We have other 'automation' points that we can hook into when we're ready to move on to bigger & better methods.

As an example of #4, I hope to soon have our NUnit tests running as part of an automated build.  I also think we can have automated installer builds going if we wanted to.  And, best of all, by virtue of TeamCity's ability to 'watch' our source control server for updates, and it's ability to run any arbitrary command line, NAnt, or MSBuild (or many more) task in response to those updates, the sky is the limit!

I can't wait to get more 'good stuff' implemented on TeamCity.

Wednesday, June 4, 2008

The Day of Bugs

Ok, I can honestly say that today was one of the weirder days that I've had in a long while.  I don't know about others, but I can say with confidence that I've never personally identified a bug in Visual Studio in my career.  I've seen plenty of them mentioned by other folks, I've seen 'features' that I'd be inclined to call a bug (but could be interpreted either way), but I've never really found a bug myself.

Today, I found two.  I guess it's a case of 'when it rains, it pours'.  One of them was known long before I 'found' it, but obviously not known to me.  The other, I'm pretty confident, is still unknown to 'everyone'.

Bug #1 - Dynamic Version vs. BAML.

Ok...  So we've done a fair bit of playing with WPF on my project, and we've done some custom control development (user controls) in WPF for use in our application.  WPF is very convenient for being able to prototype and design the UX/UI of something without being bogged down by all the crap you have to do to customize WinForms (our users seem to never like the 'way it is').  I can say that I feel pretty comfortable that my skills with WPF, while not the best on the block, are probably up there along with most of the folks currently doing WPF development.  I've done a ton of data binding work, and feel pretty confident that I know most of the tricks there - especially thanks to the wonderful work of Bea Costa!

So, I was very puzzled when one of our developers started having issues with running our application's forms that use one of my user controls.  The user control was pretty simple - it was a list of names that had alternating highlights for the rows (one row was white, one was gray, etc.).  It did a few other things, but mostly that was the gist of it.  This is one of the simpler controls we have.  Anyway, the weird thing was that the 'bug' that we kept seeing only appeared when running the debug build of our application, and it only appeared when our UI was being used from the APL application.  It never appeared in release builds, and it never appeared when running debug mode in our UI test harness.

So, naturally, I looked first at the APL runtime, thinking it was a bad install on this dev's machine.  We then took his build and his APL workspace and ran it on my machine.  To my surprise, it crashed on my machine too.  Then, we tried running one of my builds on his machine - it worked (also to my surprise!).  So then, I concluded it was a problem with his machine.

Two days later, after he had gotten some other work done and managed to uninstall all of .NET 2.0 through 3.5, VS2005 and VS2008, and then reinstall all of them (carefully in order), he tried it again.  BOOM!  It still didn't work.  I brought him my old laptop, and I had IT set it up for him to be able to use it instead of his desktop, thinking we'd be rebuilding his desktop from scratch.  All the while, still being puzzled by the fact that the behavior ran around to different machines and environments and was so skittish.

Later that day, he came over to my desk and told me the problem started appearing on his release builds too.  I thought, "oh great - a viral bug!".  He then said that the problem also started appearing on my builds.  At this point, I thought - "ok, there's gotta be something else going on here".

The bad part about this bug was this - whenever you ran the application, it would look like it wanted to pop up an error dialog, in fact it would show the thread exception dialog (System.Windows.Forms.ThreadExceptionDialog) briefly (actually several of them on top of each other), but then the application would disappear before you could do anything.  Apparently, looking back, the problem was on one of WPF's "special" threads and APL apparently doesn't react very nicely to the .NET AppDomain having threads other than the main UI thread throw exceptions.

Finally recognizing that I might be able to do something about this, I went into the code and added an exception handler with a plain-old message box in it (e.ToString()).  Looking at the exception text, I saw that it said something about a XAML parse error and that my ValueConverter couldn't be loaded (I had a ValueConverter as a static resource in my XAML for getting the backcolor brush for doing the highlighting).  This error message pointed me to: Rob Relyea's blog post (along with several MSDN forums posts).  The only thing was that his post didn't apply completely to my issue.  But, the workarounds did.  It turned out that I was using AssemblyVersion(1.0.*) in my files (which I really like for our 'in dev' work), but it was causing problems.  It seems that the reason the bug was so 'fleeting' was that there must have been a timing issue on the fourth bit of the version number (the revision), since it's based on a timestamp.

Apparently, my computer is too fast (most of the time), so I didn't see this bug on my builds, just on my colleagues'!  As I said, this bug has been known about for a long time, and while I'm not thrilled with the workaround, it's there and working, so I'll live with it.

BTW, I spent nearly 4 days chasing this bug, off and on (my colleague did most of the legwork).  It wasn't much fun, being that we couldn't get an error message for the first 3 of those.

Bug #2 - VS2008 STL vs. NUnit, C++/CLI, and ME!

So, I spent the last 4 days chasing what I though must have been a bug in our code, only to figure out that I'm pretty sure it's a bug in MS's STL implementation (for Debug builds).  However, this bug is really hard to find (though it, at least, is VERY consistent - i.e. happens every time and is very reproducible).

First, some background on our design.  Our application consists of two major pieces, a UI, and a backend calculation engine.  The UI is written in .NET 3.0/3.5 (C#), and the backend is written in C++ (native).  However, these systems need to be able to share a common file format.  To meet this need, we developed a generic file library in native C++ code that can be used to read/write files for our system.  The files are basically like MS's structured storage, but with the features and interfaces that we desired for our system (along with a design oriented towards meeting our required performance characteristics).

All of our file structures are build upon this file library, along with several other native libraries that support it and some of the other 'shared' features.  We also have a generic 'key' library (this is a domain concept for us - you can think of it as a property bag with some special 'matching' features).

In order to support using these libraries from both C++ and C# code, we decided we'd write the implementations in native C++ using traditional object-oriented design principles (many of which were lifted from my C#/Java experiences), and then write a thin C++/CLI (the managed C++ language) wrapper around this library using the IJW (it just works) interop supported by C++/CLI.  Then, our C# clients would call into the C++/CLI managed library (not even knowing that it's implemented in C++) just as they would call into managed C# code, but be able to use the underlying data structures and implementation of the native libraries.  I think this design is pretty elegant, and we solved quite a few interesting issues when developing it.  We've been using it now for some time and it's working quite well.

So...  now, the bug.  Just recently we needed to add a new feature to the 'key' library.  This library is very simple and the feature was also quite simple.  I added it to the native code, added it in the managed C++/CLI library, and added the C# unit test code to exercise it.  By convention, we only run our unit tests in Release mode, unless we're debugging them, since we only want to take the time to test in one build environment and it makes most sense to test the bits that are going out the door...

Anyway, so I tested in release mode, the tests all passed, and I checked in.  I was happy and I went home for the day.  The next day, I happened to be compiling in Debug mode and I decided to run what I was working on.  I had forgotten that my startup project in VS2008 was set to the unit tests for the 'key' library, so the unit tests ran instead of what I was intending to test.  To my surprise, the unit tests for the 'key' library 'exploded' (they didn't fail, they caused an AccessViolationException that was caught by VS2008 and popped up in the debugger!).  The AV was showing up on the destructor call for one of our unmanaged C++ objects (native library).

To make matters worse, the bug only showed up when the finalizer was called for the class, and even weirder, only showed up when the variables were not deterministically destroyed (i.e. using IDispose and 'using').  Since my unit test code wasn't using 'using' anywhere, I saw the 'explosion'.  But, I saw it only when the finalizer was called (much later than the offending code, obviously).  I invested some time writing code to track which object was the culprit, and after figuring out which (using 'value numbering', a technique I use a lot in single-threaded debugging of applications with lots of object instances and no unique identifiers in them), I followed the code.  It was not at all obvious why there was a problem.  In fact, I looked at it for several days and couldn't figure out what the problem was.

I then posted here, and called my Arch. Evangelist at MS and talked with him about it, and still couldn't figure out what it is (the spoiler is the last post in the thread).  I finally had to resort to the 'commenting' technique to track down the bug.  I first commented out all unit tests and started adding them back in one by one.  Once I found which unit test failed, I commented out the entire test and started adding code back in block by block.  Once I found the offending block, I looked down into the C++ code and STILL couldn't find anything wrong with it.

At that point, I decided I needed to think outside the box.  I looked at what was different between the various method calls that worked and didn't work, and decided that the throwing of the exception might be a problem.  First, I removed all the code in the offending method.  Now, my test failed, but it didn't explode.  So, I put the code back, and looked at the exception more thoroughly - I decided to move it to the head of the function.  That also caused my test to fail, but it didn't explode.  So then I concentrated on the code before the exception (in the original function).  I kept saying to myself - there's NOTHING wrong with this code!  (you can see the code in the MSDN post).  Finally, I thought - "it has to be this code, so let's assume it's broken and figure out when and why".  I then restructured the code (as described in the MSDN post) and determined that it was absolutely something in the 'begin()' or 'end()' STL vector calls.  There's no way I messed that up - that's their code.  Voilà - bug #2.

For this one, I'm going to have to figure out how to submit it to MS.  I'm sure nobody's seen this one (at least as far as I can tell from searching).

Whew!  Looking forward (hopefully) to not having very many more of those days!

Saturday, May 31, 2008

Redistributing C++ runtime components

Ok.  I've had to look for this several times over the years.  This time, I finally decided to just post it on my blog so I can find it again later :)

Here it is

Hopefully it helps you sometime too...

Thursday, May 29, 2008

StL gave Oakwood 1/2 Million Dollars???

What the hell?

Article in recent STL Business Journal:

"St. Louis development fund invests in Oakwood Systems.Oakwood Systems Group Inc. received a $500,000 investment for expansion from the St. L Business Development Fund, which invests in St. L based companies that are unable to access sufficient senior debt to finance organic growth or acquisition.Creve Coeur MO based technology consulting firm Oakwood Systems is one of the fastest growing private companies in the St. Louis area, according to Business Journal research. The company reported $7.5 million in fiscal 2006 revenue, a 92.3 percent increase from fiscal 2004 revenue of $3.9 million. The firm has 85 employees and an office in Nashville, TN."

If this isn't the funniest thing I've heard in a long time - it's gotta be up there. Of course this will only be funny to those of you who have heard my stories about my time at Oakwood and the goings on there.

Monday, May 26, 2008

Patching from a single .exe

We've been doing some work lately to try to figure out a good way to make our software patches go out the door as a single .exe. We aren't using MSI for our patches because we need to be able to do some things that MSI can't do (i.e. install older patches over newer versions, verify the current system version before installing, always install files, regardless of whether they are have been modified by the user or not, etc.).

So, I was thinking about different approaches. As I saw it, there were:
  1. Package everything in a zip, require users to download the zip and extract it into a temp folder, and run a .exe inside the zip that does the patch. Then require the user to delete the folder on their own.
  2. Build an .EXE that has resources in it for the files that were to be included in the patch - have the .exe extract these resources to disk in the proper locations and perform the other logic necessary.
  3. Build a generic packaging process, package stub, and 'package runner' that can solve the problem as well as other similar problems should we ever want to package anything else as a single .EXE (besides patches).

I didn't like (1) for the UX - I think it's pretty nasty UX-wise. I didn't like (2) because there are several problems - (a) the person who would be building the patches doesn't really work in VS, and I don't want them compiling the code. I also don't want to write code to 'package' the resources - it's a pain. (b) I already know one other case where we will need to package stuff into a single .exe for deployment that isn't the same as our patch process, so I'd have to build a similar project again - I hate duplicate code...

I decided to go down the path of (3). I knew from my Win32 days that the .EXE loader will be perfectly fine with an .EXE with 'junk' concatenated to the end of it. I've done this several times in the past - the PE format doesn't have any problem with 'extra' bits at the end of the file. So, I decided to go with the idea of a stub .EXE with a 'package' tacked on the end.

The next question was - what should my package support? I decided that for me a package would be two things - a list of files, and an entry assembly (that gets run by the package stub upon loading). The package would be a 'manifest' (an XML file that lists the package contents), along with a stream of bytes for the package data. For my needs, it was sufficient if all packages contained (1) a list of files, (2) an entry assembly, (3) an argument to pass to the entry assembly on running it, and (4) a list of assemblies upon which the entry assembly depended. The entry assembly would be loaded and executed by the package stub (.EXE) upon running the .EXE.

So, the file looks like:


Stub .EXE
Package Data Bits
Package Manifest
Package Trailer


Where the data bits are GZip compressed streams of bits corresponding to the items listed in the manifest, and the Package Trailer is a 'header' (trailer actually) that is of fixed length and identifies this file as a valid package (i.e. has a identifying 'magic number') and contains offsets of the various items within the file (i.e. the location of the first byte of data bits, and the length of the manifest).

Since my support code for this functionality is in a class library (I called it 'Packaging'), the class library is embedded in the stub .exe as a resource. The stub performs the following steps when it is run:

  1. load the resource "Packaging" and call Assembly.Load on the byte array.
  2. set up a 'resolver' to resolve this assembly for the other assemblies that might be loaded by this package.
  3. unpackage the dependencies of the entry assembly and load them as well
  4. unpackage the entry assembly
  5. look for a 'Package' class in the global namespace of the entry assembly, and cast it to an IPackage (defined in Packaging assembly)
  6. call 'Execute' method on the IPackage from (5), passing an IPackageHost interface that provides functionality for reading the contents of the package and interacting with the stub.

(2) is an event handler attached to the AppDomain.CurrentDomain.AssemblyResolve event.

The IPackageHost is an interface that allows the IPackage.Execute implementation to gain access to the entries (and bits) in the manifest. This host interface is provided by the stub, but can also be implemented by other code in order to do testing, extract packages, run them separately, etc. In fact, my Packaging library provides the implementation of IPackageHost via a PackageHost class that is used by the stub .EXE to provide the necessary support to the entry assembly's IPackage implementation.

If you want more details on how this all works, give me a call, or shoot me an email - I'll be happy to provide more details.

Saturday, May 24, 2008

Another great ALT.NET event!

Yet another great ALT.NET event today. Thanks Glenn for the conversation in the car on the way there (and congrats / good luck on your new job on MEF), thanks Justin for the space, and thanks everyone else for the great conversations.

Hopefully people are interested in coming to my place in July or early Aug for one.

I'll try to be more diligent about blogging some of my recent dev work. I think it's pretty interesting and I haven't found similar content elsewhere.

Monday, April 21, 2008

Coolest weekend ever.

Ok... the weather sucked - it was snowing here - but otherwise this weekend was awesome. I got to meet a lot of great people, and some real top minds in the field. Who did I meet? Among others - Brad Abrams (.NETfx team, coauthor of framework guidelines book), Jeremy Miller (the StructureMap guy), Glenn Block (P&P guy), Scott Hanselman (goes without introduction), Martin Fowler (the 'Refactoring' book guy), Charlie Poole (the NUnit guy) ... the list goes on (no offense please if I met you and didn't include you in this list).

I had a wonderful lunch conversation with Brad for about 20-30 minutes over lunch. It's amazing to have someone like Brad come sit down next to you, introduce himself, and then share a one-on-one conversation with you. It'd be like Fisher Black coming and sitting down next to an actuary and introducing himself and asking about your work. It's very hard to not feel intimidated in such a situation, but not only was Brad not intimidating, but he's a really genuinely nice guy to talk to. Hopefully I'll have the luck of meeting him again and having more conversations. hehehe... maybe he'll ask me if I want a job someday :) I can dream, right?

All in all, this is the best work-related event I've ever been to, and it's probably the best weekend of my working life thus far. I definitely consider the 30 minutes I spend with Brad to be the best 30 minutes of my career to this point.

If you ever hear of an 'open spaces' event, I highly recommend you jump on the opportunity to attend. Also, if you ever want to go to ALT.NET events, hopefully I'll see you there.