Friday, January 30, 2009

IDisposable and Garbage Collection

I've had several discussions recently with friends and associates about garbage collection, IDisposable, and the disposal pattern recommended by Microsoft.  These discussions have brought to light many misconceptions, a few incorrect statements, and some good and bad advice.  In particular, the 'final straw' that led me to write this blog post is the post by Jeff Atwood (@CodingHorror) on his blog, and the response by Jeff Tucker on his blog (Agilology).

I tend to lean more towards agreeing with Jeff Atwood (@CodingHorror)'s post, in that I believe that calling Dispose is absolutely an optimization.  That said, in some cases it's an optimization that should be treated as mandatory by everyone except those who understand the precise implications of skipping it.  SqlConnection is one such case: it's absolutely best practice to dispose of a SqlConnection as soon as you can in applications that might use connection pooling.  On the other hand, there are resources such as FileStream, DataSet, and the many WaitHandle-derived types that you may want to dispose of early, but doing so is an optimization (either in terms of memory footprint or resource contention), not a requirement.

Among the things that can be said about IDisposable and GC, there are a few that I want to get out of the way first.  First of all, the GC does not care about IDisposable at all!  It is simply not aware of whether your class implements IDisposable or not.  That said, many classes that implement IDisposable also implement a finalizer (discussed later), of which the GC is intimately aware.

Garbage Collection

First, before I go too far, let me make a brief description (simplified, of course) of the garbage collector and the process it follows to collect your unused objects.  Basically put, you go along happily creating new objects from the managed heap, and the GC follows behind you and computes which of those objects are no longer 'reachable' from your program and cleans up after you.

The GC groups the objects on the managed heap by age, into "generations".  All objects start their lifetime in generation 0.  Objects that "survive" a GC in generation 0 move to generation 1.  Those that survive generation 1 move to generation 2.  Earlier generations are collected more frequently than later generations.
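
If you want to watch promotion happen, here's a minimal sketch using GC.GetGeneration.  The class and variable names are mine, and the forced GC.Collect() calls are there purely for demonstration.  The exact generation numbers printed depend on the runtime and GC mode, so treat the output as indicative rather than guaranteed.

```csharp
using System;

static class GenerationDemo
{
    static void Main()
    {
        // a freshly allocated small object normally starts in generation 0.
        var survivor = new object();
        Console.WriteLine("max generation: " + GC.MaxGeneration);
        Console.WriteLine("before any GC:  gen " + GC.GetGeneration(survivor));

        // force a collection; 'survivor' is still reachable, so it survives
        // and is typically promoted to the next generation.
        GC.Collect();
        Console.WriteLine("after one GC:   gen " + GC.GetGeneration(survivor));

        GC.Collect();
        Console.WriteLine("after two GCs:  gen " + GC.GetGeneration(survivor));

        // keep the object reachable up to this point, so an optimized build
        // cannot treat the local as dead before the collections run.
        GC.KeepAlive(survivor);
    }
}
```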

A collection is actually two passes - the marking pass, and the reclamation (collection) pass.  The marking pass is where the GC goes through all GC roots in the application and marks every object reachable from them, directly or indirectly, as "live".  The collection pass is where the GC goes through all objects in the generation(s) being collected and frees those objects that weren't marked (it may also relocate objects to compact memory if it decides this is useful).

There's a bit more complexity to it when finalizers are involved.  When an 'unmarked' object to be collected has a finalizer that hasn't been suppressed using GC.SuppressFinalize(this) (presumably in IDisposable.Dispose), the object is not freed, but rather is moved from the finalization queue to the freachable queue - and is marked (and all objects reachable from it are also marked).  Obviously the computation is done in a way that makes sense, not iteratively as it sounds from my description, but that complexity is not important to our discussion.

For more information on the GC, I recommend Jeffrey Richter's CLR via C#, an excellent book on this and many other topics related to advanced .NET programming.  However, I would caution you against reading his "Weak Events" example in that chapter as it is wrong (my next blog post will describe why and how to detect the invalidity of his approach).  Also, there are several blogs that are good for GC and other debugging bits, such as Maoni's or Tess's blogs.  A good start might be Tess's post here. (special thanks to my buddy Rich Lang for his tips on resources to recommend).

IDisposable and its uses

There are several different reasons people might use IDisposable.  Of them, there are two that are the most popular and probably the only ones that "normal" developers should ever put into action.  The first, most obvious, use of IDisposable is when your object needs to own 'unmanaged' resources, either directly or indirectly.  For instance, you might be writing a .NET class that manages some resource obtained via a P/Invoke call to some unmanaged library, and you need to "free" or "release" that resource when your object is no longer being used.  In this case (direct ownership of unmanaged resources), IDisposable is not strictly necessary, but a finalizer is absolutely necessary.  If the object you're trying to manage is a Win32 handle (closed by CloseHandle), you should probably look at the SafeWaitHandle and SafeFileHandle classes, as well as the SafeHandleMinusOneIsInvalid and SafeHandleZeroOrMinusOneIsInvalid classes.

The other popular use of IDisposable is for RAII (a C++ concept - "resource acquisition is initialization", whereby a resource is acquired in a constructor call, and released when the object goes out of scope - in C#, at the end of a using statement body).  An example of this usage is the TransactionScope object, where you acquire the transaction by "newing up" a TransactionScope object (in a C# "using"), and you release it when you exit the using statement.

I'll take these two uses in turn.

Resource Ownership

I call the first of the two use cases for IDisposable "Resource Ownership" as your object is the consumer of some resource either directly or indirectly and should free those resources when applicable.  There are two forms of resource ownership, direct and indirect.  Direct is, as it sounds, when your object has direct ownership over a resource.  If the resource is unmanaged (that's really what we're talking about here), you must implement a finalizer for your object, and in that finalizer you should dispose of the resource.  Also, since your finalizer is only executed when your object is collected, it's generally a good idea to give users of your object the opportunity to release the resource 'early'.  For this reason, you implement IDisposable and the disposal pattern (described below).

If, on the other hand, you only have indirect ownership of unmanaged resources, you don't need a finalizer.  Instead, you should only provide the IDisposable interface and implement the disposal pattern.  If you provide a finalizer when it isn't needed, you will, in effect, be delaying the GC cleanup of your object unnecessarily, since an object with a finalizer survives the collection that first finds it unreachable and can only be freed by a later collection, after its finalizer has run (so it takes at least two collections to reclaim it, if not more).
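
To make the indirect case concrete, here's a minimal sketch of a class that owns a StreamWriter (which in turn owns the file handle).  LogWriter and its members are hypothetical names of mine, not anything from the FCL.  Note that there is no finalizer: the StreamWriter's underlying stream already takes care of the handle if nobody calls Dispose.

```csharp
using System;
using System.IO;

// Hypothetical example type: indirect ownership of an unmanaged resource
// (the file handle lives inside the StreamWriter's underlying FileStream).
sealed class LogWriter : IDisposable
{
    private readonly StreamWriter _writer;
    private bool _isDisposed;

    public LogWriter(string path)
    {
        // second argument: append to the file if it already exists.
        _writer = new StreamWriter(path, true);
    }

    public void Write(string message)
    {
        if (_isDisposed)
            throw new ObjectDisposedException(GetType().Name);
        _writer.WriteLine(message);
    }

    public void Dispose()
    {
        // safe to call more than once.
        if (_isDisposed)
            return;
        _writer.Dispose();
        _isDisposed = true;
    }

    // deliberately no finalizer: we own nothing unmanaged directly.
}

static class LogWriterDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        using (var log = new LogWriter(path))
        {
            log.Write("hello");
        }
        Console.WriteLine(File.ReadAllText(path).Trim());
        File.Delete(path);
    }
}
```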

RAII in C#

I refer to the second of the two cases for IDisposable as "Resource Acquisition".  The canonical example for this is TransactionScope, in my mind, but another example is a Mutex acquisition class (not to be confused with the FCL's Mutex class) or any other similar class.  These classes can make your code much easier to maintain and read if used properly, but can introduce bugs that are very difficult to detect if used improperly, so use them with caution.  For this use of IDisposable, you aren't really using IDisposable because you own resources, but rather because you're building a class that should have acquire/release semantics and the syntax for doing so with C#'s "using" statement is very nice and clean.
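
Here's a minimal sketch of what such an acquire/release class might look like.  LockScope is a hypothetical name of mine, and it wraps a plain Monitor lock only because that keeps the example small (C#'s built-in lock statement already does this job, so this is illustrative, not something you'd ship).  Monitor.IsEntered assumes .NET 4.5 or later.

```csharp
using System;
using System.Threading;

// Hypothetical acquire/release helper: the lock is taken in the constructor
// and released in Dispose, so a using block brackets the critical section.
sealed class LockScope : IDisposable
{
    private readonly object _target;
    private bool _released;

    public LockScope(object target)
    {
        if (target == null)
            throw new ArgumentNullException("target");
        Monitor.Enter(target);
        _target = target;
    }

    public void Dispose()
    {
        // release at most once, even if Dispose is called again.
        if (_released)
            return;
        _released = true;
        Monitor.Exit(_target);
    }
}

static class LockScopeDemo
{
    private static readonly object Gate = new object();

    static void Main()
    {
        using (new LockScope(Gate))
        {
            Console.WriteLine("inside using, lock held: " + Monitor.IsEntered(Gate));
        }
        Console.WriteLine("after using, lock held:  " + Monitor.IsEntered(Gate));
    }
}
```

If you forget the using block here, nothing releases the lock for you, which is exactly the kind of hard-to-find bug these classes can introduce.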

There are several places where this pattern is used.  TransactionScope is the one that comes to mind for me in the FCL, but Oren (Ayende Rahien) uses this pattern in Rhino Mocks and you see it in several other frameworks.  Jeffrey Richter describes it in his book in the chapter on memory management, and makes the recommendation that if you are building a library to be used by others outside of your production code, you should make your RAII (my name, not his) objects reference types, but if you are using them only internally, they can be made very efficient through the use of value types that implement IDisposable.  If you are writing these types for libraries, you should take great care to ensure that they free the resources they own at most once, and should think very carefully about whether these types need a finalizer (I believe the jury is out on this one, but I'd say they do and should implement the disposal pattern just as if they were managing an unmanaged resource).

It should be noted that while I call this "RAII in C#" it really isn't quite the same as RAII in C++, since in C++ destructors are guaranteed to be called when the object goes out of scope, whereas there's really nothing in the C# language / compiler that requires you put these objects in a "using" block, and thus there's nothing that guarantees that their Dispose() method gets called automatically if you choose not to use the using construct.

The disposal pattern and its canonical implementation

Many resources describe the canonical implementation of IDisposable via the disposal pattern, so I'm not going to go into excruciating detail here.  I'll describe the basics of the pattern and refer the reader to other sources for exact details.  The core idea of the disposal pattern is that there are two ways in which you might want to clean up after your object: finalization and explicit disposal via IDisposable.Dispose.  If your object is one that should have a finalizer, then the disposal pattern should absolutely be followed.  If you don't have a finalizer, then you don't strictly need to follow the disposal pattern (often this is the case for RAII applications that don't acquire resources that will deadlock the application if they aren't released, or those that are guaranteed to be properly used - because you're writing both the object and all code that uses it).  Even so, you're probably best to follow the disposal pattern every time you implement IDisposable and just leave the parts of the pattern empty that don't apply to your particular application.

The basic rules of the disposal pattern are:

  1. finalizers should not refer to other managed objects during finalization, since those objects may have already had their finalizers called.
  2. IDisposable.Dispose() should call IDisposable.Dispose on any objects owned by the implementing object that are IDisposable.
  3. IDisposable.Dispose() and the finalizer should BOTH free any unmanaged resources owned by the object.
  4. IDisposable.Dispose() should call GC.SuppressFinalize(this) to mark the work of the finalizer as being already done.
  5. if your objects have shared state, then the finalizers should have code to guarantee that two finalizers being called at the same time is thread-safe.
  6. finalizers should not assume they are being called on any particular thread - therefore they cannot access TLS (thread-local storage) in any way, shape, or form!
  7. calling IDisposable.Dispose() shouldn't throw an exception if called more than once.
  8. method calls to any methods other than disposal methods (or the finalizer) should throw ObjectDisposedException if Dispose (or the finalizer) has already been called.

Microsoft's recommended approach for implementing the disposal pattern is to have a non-public (protected) virtual (unless your class is sealed) method called Dispose on your object that takes a boolean argument called "disposing".  This method should be called both by the finalizer and by IDisposable.Dispose, and if you implement a "Close" convenience method or some other method that does the same thing as IDisposable.Dispose, that should also call this single-argument version of Dispose.  In this Dispose method, your class should free any unmanaged resources, and if "disposing" is true, should also call Dispose on any IDisposable members of your class.  It should also set a flag so you know to throw ObjectDisposedException when any of your other methods or properties are accessed.  Finally, the Dispose method should call GC.SuppressFinalize(this) to notify the GC that the finalizer need not be called.  Then, the object should implement the finalizer as a call to Dispose with disposing = false, and IDisposable.Dispose as a call to Dispose with disposing = true.

Sample code is as follows:

using System;
using System.IO;
using System.Runtime.InteropServices;

class MyClass: IDisposable
{
  [DllImport(...)] // assume this is correctly specified.
  private static extern void Free(IntPtr handle);

  private IntPtr _UnmanagedThing;
  private FileStream _LogFile;

  private bool _IsDisposed;

  public MyClass(FileStream logFile, IntPtr unmanagedThing)
  {
    // check arguments and don't allow finalizer if they aren't valid.
    GC.SuppressFinalize(this);
    if(logFile == null)
      throw new ArgumentNullException("logFile");
    if(unmanagedThing == IntPtr.Zero)
      throw new ArgumentException("unmanaged thing is invalid!",
                                  "unmanagedThing");
    GC.ReRegisterForFinalize(this);

    _UnmanagedThing = unmanagedThing;
    _LogFile = logFile;
  }

  protected virtual void Dispose(bool disposing)
  {
    // we can skip doing anything if it's already been done.
    if(_IsDisposed)
      return;

    if(disposing)
    {
      // dispose of managed resources here, since we
      //   were called from IDisposable.Dispose()
      _LogFile.Dispose();
    }

    // free unmanaged resources in either case (IDisposable or
    //   Finalize) and make sure the finalizer doesn't get called
    //   later by the GC.
    Free(_UnmanagedThing);
    GC.SuppressFinalize(this);

    // make sure we know that we're disposed for other calls
    //   (set on both paths so the unmanaged free runs only once).
    _IsDisposed = true;
  }

  public void Dispose() { Dispose(true); }
  ~MyClass() { Dispose(false); }

  public void DoSomething()
  {
    // some function not related to disposal of the object,
    //   but requiring valid state...
    if(_IsDisposed)
      throw new ObjectDisposedException(GetType().FullName);
  }
}

Again, if you are working with a handle that is typical of Win32, you should look at the classes mentioned above (SafeHandle and its family of derived classes) as they do much of the work for you.  You should also probably have a look at CriticalFinalizerObject, and the MSDN topic "Safe Handles and Critical Finalization", especially if you expect your code to be run in a hosted environment other than traditional .NET applications (e.g. IIS7, COM+, SQL Server, etc.).
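
For a rough idea of how little code the SafeHandle route needs, here's a minimal sketch of a derived handle class.  SafeThingHandle is a hypothetical name of mine, and I'm assuming the handle is one that gets closed with the Win32 CloseHandle function (so the release path is Windows-only).

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper for a Win32 handle that is closed with CloseHandle.
// The base class supplies the finalizer, the IDisposable implementation and
// the "is this handle valid?" check (zero and -1 are treated as invalid).
sealed class SafeThingHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    // parameterless constructor so the P/Invoke marshaler can create
    // instances when a native function returns this handle type.
    private SafeThingHandle() : base(true) { }

    // ownsHandle = true: ReleaseHandle runs on Dispose or finalization.
    public SafeThingHandle(IntPtr existingHandle) : base(true)
    {
        SetHandle(existingHandle);
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool CloseHandle(IntPtr handle);

    // called by the runtime at most once, and only for a valid handle.
    protected override bool ReleaseHandle()
    {
        return CloseHandle(handle);
    }
}
```

A class that holds one of these as a field only has indirect ownership, so it needs Dispose but no finalizer of its own.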

So what's the point?

So, now, as I reread the beginning of my blog post, I wonder - what was the point I was trying to make?  Well, it's basically this - the use of IDisposable to dispose of objects early is an optimization, assuming the disposal pattern was correctly implemented by the author of the objects you are calling.  That is, unless you are calling RAII-style objects, in which case forgetting to call Dispose and not using these objects in a using block could be disastrous to your program.  There are also several cases where it's extremely important to dispose of objects as soon as you are done with them, for instance when dealing with SqlConnection.

On the other hand, there are plenty of objects that are fine to allow GC to collect them and "finalize" them, and there is little or no perf impact to doing so (possibly even a positive impact of not forcing early cleanup).  As a for-instance, consider a managed class that wraps an unmanaged resource that is not subject to contention (like some sort of unmanaged object in a library that is using the C/C++ heap to allocate these objects).  If you create a large number of these objects, but are not at risk of running out of memory, it can be much faster to allow the GC to collect these objects (through finalization) than having your code call Dispose on all of them and forcing early cleanup (forcing the application to incur the cost of freeing this unmanaged memory on your user threads instead of the finalizer thread).

As with any of my posts and most of the advice on the CLR in general, the most important takeaway from this blog post should be "learn the details and use your own judgement".  Happy coding.

Friday, January 23, 2009

Detecting if other instances of your app are running.

I got a question from a friend of mine the other day about why Process.GetProcesses() was returning an error when used under Terminal Services by a user that doesn't have the Debug Programs privilege.  It was because Process.GetProcesses() was trying to return the list of ALL processes running on the machine (including those not started by the current user).  This worked just fine when a single user was logged into the TS server, but when more than one user was logged in it failed.  My friend was very confused by this, until I told him that non-admin users need the Debug Programs privilege in order to open the process token of a process not created in their session.

Of course, his next question was "how do I get Process.GetProcesses() to return only the processes accessible to the current user's session?".  At that point I took a step back and asked "why?".

In fact, what he was trying to do was to detect whether another instance of his application was running, in order to gracefully tell the user that only one instance of the program can run at a time.  I asked him why he wasn't using a Mutex and the answer surprised me, just a bit.  He said "we were worried about what would happen if the process unexpectedly exited".  My next question was yet another "why?".

Apparently there's some confusion about how Mutexes work in Windows, as well as how the Mutex method that I've used for about 10 years to detect multiple instances of an app works.  I'm not sure where I first learned this technique from, but I've been using it since my Win32 programming days and it's worked like a charm ever since then.  I've now used it in several incarnations (VB6, Delphi, C++, and now .NET) and it's never failed me.

The first misconception that almost everyone has when hearing about this approach is that it has something to do with Mutexes (at least in the way they are used in multithreaded programs).  In fact, any named kernel object could be used for this approach, including even a "delete on close" file.  However, I've always used mutexes because that's what was recommended to me, and I don't think it makes much difference: you never actually 'wait' on the mutex, so it's not really important what type of object it is.  The idea of this technique is not to use a mutex, but to use the 'kernel namespace'.  Essentially, by creating a mutex in the naming scope appropriate for your purposes, you are 'registering' that name with the OS.  Then, if you were the first one to register the name, you continue, otherwise you show your message and quit.
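
You can see the 'registration' idea in miniature without even starting a second process.  This is a minimal sketch, assuming Windows; the mutex name is made up for the demo.  I'd expect the first creation to report createdNew as true and the second as false, provided nothing else on the machine already holds that name.

```csharp
using System;
using System.Threading;

static class NameRegistrationDemo
{
    static void Main()
    {
        // a made-up name for this demo; Local\ scopes it to the current session.
        const string name = @"Local\MyApp-NameRegistrationDemo";

        bool firstCreated;
        bool secondCreated;

        // initiallyOwned is false: we never wait on or own the mutex,
        // we only care whether the name was already taken.
        using (var first = new Mutex(false, name, out firstCreated))
        using (var second = new Mutex(false, name, out secondCreated))
        {
            Console.WriteLine("first createdNew:  " + firstCreated);
            Console.WriteLine("second createdNew: " + secondCreated);
        }
    }
}
```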

The pattern looks like this:

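bool createdNew;
MutexSecurity ms = new MutexSecurity();
MutexAccessRule mar = new MutexAccessRule(
    new SecurityIdentifier(WellKnownSidType.WorldSid, null),
    MutexRights.FullControl, AccessControlType.Allow);
ms.AddAccessRule(mar);
// the name is a placeholder; initiallyOwned is false since we never wait on it.
Mutex m = new Mutex(false, @"Global\MyApp", out createdNew, ms);
if (!createdNew)
{
    // show a message that only one instance is allowed and exit
}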
The first part (the MutexSecurity stuff) guarantees that anybody can access the mutex.  The rule probably doesn't need to grant FullControl, but I didn't want to think about the minimal rights it really needs, so I left it at that.  There's no real security risk here: the only thing that could happen is other apps could grab hold of this mutex and use it for their own purposes, and that's not really going to do much.  The point of opening up the security is that the other instances of your app may not be started by the same user.  Together with the Global namespace in the mutex name, this guarantees that only one instance per machine is startable.  If, on the other hand, you want a per-session limit of a single instance (rather than per-machine), you could use the local namespace (change Global\ to Local\), which is local to each TS session.
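
Since the isolation scope comes down to how the name is built, here's a minimal sketch of a helper that builds it.  IsolationScope and IsolationNames are hypothetical names of mine, and the per-user format (appending domain and user name) is just one possible mangling, not an established convention.

```csharp
using System;

// Hypothetical scopes: one instance per machine, per TS session,
// or per user across the whole machine.
enum IsolationScope { PerMachine, PerSession, PerMachinePerUser }

static class IsolationNames
{
    public static string Build(string appName, IsolationScope scope)
    {
        switch (scope)
        {
            case IsolationScope.PerMachine:
                return @"Global\" + appName;
            case IsolationScope.PerSession:
                return @"Local\" + appName;
            case IsolationScope.PerMachinePerUser:
                // still in the Global namespace, but the name itself is
                // different for each user (assumed mangling format).
                return @"Global\" + appName + "-"
                    + Environment.UserDomainName + "-" + Environment.UserName;
            default:
                throw new ArgumentOutOfRangeException("scope");
        }
    }
}

static class IsolationNamesDemo
{
    static void Main()
    {
        Console.WriteLine(IsolationNames.Build("MyApp", IsolationScope.PerMachine));
        Console.WriteLine(IsolationNames.Build("MyApp", IsolationScope.PerSession));
        Console.WriteLine(IsolationNames.Build("MyApp", IsolationScope.PerMachinePerUser));
    }
}
```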

The second part of that mess is the mutex creation code.  This code (as written) will create the mutex if it doesn't exist or return the existing one if one already exists with that name (and you have permissions to open it).  If you created it (because it didn't exist), then 'createdNew' will be set to true.  Otherwise, it will be false.  Either way, you get a valid Mutex object.

Then, you should keep this Mutex object alive for the lifetime of your application (i.e. save "m" in a variable that has the same lifetime as your application so that it doesn't get GCed and disappear on you).  A good way to do this is to make it a local variable in the Main method of your Program.cs, as long as it is still referenced at the end of Main (a using block or GC.KeepAlive does this), since an optimized build may otherwise treat the local as dead early.  When your application is shutting down, you can either close the mutex yourself as soon as you think it's ok for other instances to start, or let Windows reclaim it for you when the process exits.

Generally, I put this code in a separate library function that I can call from everywhere.  That function generally looks something like the following (in .NET):

/// <summary>
/// Attempts to create an application isolation mutex, and
/// return it to the caller. If the caller isn't the first
/// to create the mutex (i.e. it already exists) then we
/// return null to indicate that the caller "lost".
/// </summary>
/// <param name="objectName">
/// The name to use for the mutex,
/// should begin with Global\ if you want per-machine
/// isolation, or Local\ if you want per-session isolation.
/// If you want per-machine/per-user isolation (slightly
/// different from per-session) then you should mangle the
/// mutex name by putting the username in it somewhere.</param>
/// <returns>
/// Null if the mutex already existed, or the mutex
/// if it was created by this function. You should keep the
/// mutex in scope somewhere until you are ready to release
/// the isolation. You shouldn't use this mutex for locking
/// or anything else - just forget it's a mutex altogether.
/// </returns>
public static IDisposable GetAppIsolationHandle(string objectName)
{
    // setup the mutex security settings.
    var ms = new MutexSecurity();
    var sidWorld =
        new SecurityIdentifier(WellKnownSidType.WorldSid, null);
    var mar = new MutexAccessRule(
        sidWorld, MutexRights.FullControl, AccessControlType.Allow);
    ms.AddAccessRule(mar);

    // create the mutex and return it if it's "ours".
    bool createdNew;
    var mutex = new Mutex(
        false, objectName, out createdNew, ms);
    if (createdNew)
        return mutex;

    // return null if the mutex isn't "ours".
    return null;
}

Then, you can call this code in your Program.cs as follows:

using (var isolationHandle
    = GetAppIsolationHandle(@"Global\MyApp"))
{
    if (isolationHandle == null)
    {
        MessageBox.Show("Sorry, only one at a time!");
        return;
    }
    Application.Run(new Form());
}

which, of course, looks really nice (at least in my opinion).  If you had special "shutdown" stuff to do after Run returns, you can put that outside the using block (assuming it doesn't need to be isolated).