Friday, January 30, 2009

IDisposable and Garbage Collection

I've had several discussions recently with friends and associates about garbage collection, IDisposable, and the disposal pattern recommended by Microsoft.  These discussions have brought to light many misconceptions, a few incorrect statements, and some good and bad advice.  In particular, the 'final straw' that led me to write this blog post is the post by Jeff Atwood (@CodingHorror) on his blog, and the response by Jeff Tucker on his blog (Agilology).

I lean towards agreeing with Jeff Atwood (@CodingHorror) here, in that I believe that calling Dispose is absolutely an optimization.  That said, it's an optimization that, in some cases, should only be skipped by people who understand the precise implications of skipping it.  SqlConnection is one example: it's absolutely best practice to dispose of a SqlConnection as soon as you can in applications that might use connection pooling, since the connection only goes back to the pool once it is closed or disposed.  On the other hand, there are resources such as FileStream, DataSet, and the many WaitHandle-derived types that you may want to dispose of early, but it's absolutely an optimization (either in terms of memory footprint or resource contention) to do so early, not a requirement.

Among the things that can be said about IDisposable and GC, there are a few that I want to get out of the way first.  First of all, the GC does not care about IDisposable at all!  It is simply not aware of whether your class implements IDisposable or not.  That said, many classes that implement IDisposable also implement a finalizer (discussed later), of which the GC is intimately aware.

Garbage Collection

First, before I go too far, let me give a brief (simplified, of course) description of the garbage collector and the process it follows to collect your unused objects.  Basically put, you go along happily creating new objects on the managed heap, and the GC follows behind you, computes which of those objects are no longer 'reachable' from your program, and cleans up after you.

The GC groups objects by age into "generations", and a given collection only looks at some of them.  All objects start their lifetime in generation 0.  Objects that "survive" a GC in generation 0 move to generation 1.  Those that survive a GC in generation 1 move to generation 2.  Earlier generations are collected more frequently than later generations.
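To watch promotion happen, here is a small sketch using GC.GetGeneration.  The class and method names (GenerationDemo, Observe) are mine, the forced GC.Collect() calls are for demonstration only, and the exact numbers you see can vary by runtime and GC configuration, so treat it as an illustration rather than a guarantee.

```csharp
using System;

static class GenerationDemo
{
    // Returns the generation of one object right after allocation
    // and after each of two forced collections.
    public static int[] Observe()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();                       // forcing a GC: demo only
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);             // keep it reachable until this point
        return seen;
    }

    static void Main()
    {
        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
        int[] seen = Observe();
        for (int i = 0; i < seen.Length; i++)
            Console.WriteLine("After " + i + " collection(s): generation " + seen[i]);
    }
}
```

Because the object stays reachable the whole time, each collection it survives is a chance for the GC to promote it.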

A collection is actually two passes - the marking pass, and the reclamation (collection) pass.  The marking pass is where the GC starts from all GC roots in the application and marks the objects they reference, and all objects reachable from those, as "live".  The collection pass is where the GC goes through all objects in the generation(s) being collected and frees those that weren't marked (it may also relocate the survivors to compact memory if it decides this is useful).

There's a bit more complexity to it when finalizers are involved.  When an 'unmarked' object to be collected has a finalizer that hasn't been suppressed using GC.SuppressFinalize(this) (presumably in IDisposable.Dispose), the object is not freed.  Instead, it is moved from the finalization queue to the freachable queue, and it is marked (as are all objects reachable from it).  The finalizer thread later runs the finalizer, and the object's memory can only be reclaimed by a subsequent collection.  Obviously the computation is done in a way that makes sense, not iteratively as it sounds from my description, but that complexity is not important to our discussion.
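Here's a minimal sketch of the effect of GC.SuppressFinalize.  Noisy and FinalizerDemo are made-up names, and forcing a collection like this is something I'd only do in a demo.  Finalization timing is never guaranteed, so the only thing I'd rely on is that the suppressed object never contributes to the count.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

class Noisy
{
    public static int FinalizedCount;

    // Runs on the finalizer thread, some time after a GC finds the object unreachable.
    ~Noisy() { Interlocked.Increment(ref FinalizedCount); }
}

static class FinalizerDemo
{
    // Allocate in a separate, non-inlined method so that no local
    // in Run() keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate(bool suppress)
    {
        Noisy n = new Noisy();
        if (suppress)
            GC.SuppressFinalize(n);      // what Dispose() would normally do
    }

    // Returns how many finalizers ran as a result of this call.
    public static int Run()
    {
        int before = Noisy.FinalizedCount;
        Allocate(false);                 // finalizer left armed
        Allocate(true);                  // finalizer suppressed
        GC.Collect();                    // demo only
        GC.WaitForPendingFinalizers();
        return Noisy.FinalizedCount - before;
    }

    static void Main()
    {
        Console.WriteLine("Finalizers run: " + Run());
    }
}
```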

For more information on the GC, I recommend Jeffrey Richter's CLR via C#, an excellent book on this and many other topics related to advanced .NET programming.  However, I would caution you against reading his "Weak Events" example in that chapter as it is wrong (my next blog post will describe why and how to detect the invalidity of his approach).  Also, there are several blogs that are good for GC and other debugging bits, such as Maoni's or Tess's blogs.  A good start might be Tess's post here. (special thanks to my buddy Rich Lang for his tips on resources to recommend).

IDisposable and its uses

There are several different reasons people might use IDisposable.  Of them, there are two that are the most popular and probably the only ones that "normal" developers should ever put into action.  The first, most obvious, use of IDisposable is when your object needs to own 'unmanaged' resources, either directly or indirectly.  For instance, you might be writing a .NET class that manages some resource obtained via a P/Invoke call to an unmanaged library, and you need to "free" or "release" that resource when your object is no longer being used.  In this case (direct ownership of unmanaged resources), IDisposable is not strictly necessary, but a finalizer is absolutely necessary.  If the object you're trying to manage is a Win32 handle (closed by CloseHandle), you should probably look at the SafeWaitHandle and SafeFileHandle classes, as well as the SafeHandleMinusOneIsInvalid and SafeHandleZeroOrMinusOneIsInvalid base classes.

The other popular use of IDisposable is for RAII (a C++ concept - "resource acquisition is initialization", whereby a resource is acquired in a constructor call and released when the object goes out of scope; in C#, the "scope" is the body of a using statement).  An example of this usage is the TransactionScope object, where you acquire the transaction by "newing up" a TransactionScope object (in a C# "using"), and you release it when you exit the using statement.

I'll take these two uses in turn.

Resource Ownership

I call the first of the two use cases for IDisposable "Resource Ownership" as your object is the consumer of some resource either directly or indirectly and should free those resources when applicable.  There are two forms of resource ownership, direct and indirect.  Direct is, as it sounds, when your object has direct ownership over a resource.  If the resource is unmanaged (that's really what we're talking about here), you must implement a finalizer for your object, and in that finalizer you should dispose of the resource.  Also, since your finalizer is only executed when your object is collected, it's generally a good idea to give users of your object the opportunity to release the resource 'early'.  For this reason, you implement IDisposable and the disposal pattern (described below).

If, on the other hand, you only have indirect ownership of unmanaged resources, you don't need a finalizer.  Instead, you should only provide the IDisposable interface and implement the disposal pattern.  If you provide a finalizer when it isn't needed, you will, in effect, be delaying the GC cleanup of your object unnecessarily, since an object with a finalizer takes at least two collections to reclaim once it becomes unreachable (the first only queues it for finalization), if not more.
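As a rough sketch of the indirect case, here's a hypothetical LogWriter (my name, not a framework class) that owns a StreamWriter.  The underlying stream may hold an unmanaged handle, but it is responsible for its own finalization, so this class only forwards Dispose and has no finalizer.  I've made it sealed to keep the sketch short.

```csharp
using System;
using System.IO;

// Hypothetical class: owns a StreamWriter, which in turn owns the stream.
sealed class LogWriter : IDisposable
{
    private readonly StreamWriter _writer;
    private bool _isDisposed;

    public LogWriter(Stream destination)
    {
        if (destination == null)
            throw new ArgumentNullException("destination");
        _writer = new StreamWriter(destination);
    }

    public void Write(string message)
    {
        if (_isDisposed)
            throw new ObjectDisposedException("LogWriter");
        _writer.WriteLine(message);
    }

    // No finalizer: anything unmanaged is owned (and finalized) further down.
    public void Dispose()
    {
        if (_isDisposed)
            return;               // calling Dispose twice is harmless
        _writer.Dispose();        // flushes, then closes the underlying stream
        _isDisposed = true;
    }
}

static class LogWriterDemo
{
    static void Main()
    {
        using (LogWriter log = new LogWriter(new MemoryStream()))
        {
            log.Write("hello");
        }
    }
}
```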

RAII in C#

I refer to the second of the two cases for IDisposable as "Resource Acquisition".  The canonical example for this is TransactionScope, in my mind, but another example is a Mutex acquisition class (not to be confused with the FCL's Mutex class) or any other similar class.  These classes can make your code much easier to maintain and read if used properly, but can introduce some very difficult to detect bugs if used improperly, so use them with caution.  For this use of IDisposable, you aren't really using IDisposable because you own resources, but rather because you're building a class that should have acquire/release semantics and the syntax for doing so with C#'s "using" statement is very nice and clean.
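To make the acquire/release idea concrete, here's a minimal sketch of such a class built on Monitor.  LockScope and LockScopeDemo are hypothetical names, and in real code the C# "lock" statement already does this job; the point is only to show the shape.

```csharp
using System;
using System.Threading;

// Hypothetical acquire/release helper: acquires in the constructor,
// releases in Dispose.
sealed class LockScope : IDisposable
{
    private readonly object _gate;
    private bool _released;

    public LockScope(object gate)
    {
        if (gate == null)
            throw new ArgumentNullException("gate");
        Monitor.Enter(gate);      // acquire
        _gate = gate;
    }

    public void Dispose()
    {
        if (_released)
            return;               // release at most once
        _released = true;
        Monitor.Exit(_gate);      // release
    }
}

static class LockScopeDemo
{
    private static readonly object Gate = new object();
    private static int _counter;

    public static int Increment()
    {
        using (new LockScope(Gate))
        {
            return ++_counter;    // runs while the lock is held
        }
    }

    static void Main()
    {
        Console.WriteLine(Increment());
    }
}
```

Note that Dispose must run on the thread that constructed the scope, since Monitor.Exit throws if the calling thread doesn't hold the lock.  That's one example of the hard-to-detect bugs this style can introduce.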

There are several places where this pattern is used.  TransactionScope is the one that comes to mind for me in the FCL, but Oren (Ayende Rahien) uses this pattern in Rhino Mocks and you see it in several other frameworks.  Jeffrey Richter describes it in his book in the chapter on memory management.  He recommends that if you are building a library to be used by others outside of your production code, you should make your RAII (my name, not his) objects reference types, but that if you are using them only internally, they can be made very efficient through the use of value types that implement IDisposable.  If you are writing these types for libraries, you should take great care to ensure that they free the resources they own at most once, and should think very carefully about whether these types need a finalizer (I believe the jury is out on this one, but I'd say they do, and that they should implement the disposal pattern just as if they were managing an unmanaged resource).
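For the internal-use, value-type flavor, a sketch might look like the following.  DepthTracker and DepthScope are invented for illustration.  Because a struct can be copied freely, nothing stops a caller from disposing two copies, which is one reason I'd keep this form out of public APIs.

```csharp
using System;

sealed class DepthTracker
{
    public int Depth;
}

// Hypothetical value-type scope: no heap allocation for the scope itself.
struct DepthScope : IDisposable
{
    private readonly DepthTracker _tracker;

    public DepthScope(DepthTracker tracker)
    {
        if (tracker == null)
            throw new ArgumentNullException("tracker");
        _tracker = tracker;
        _tracker.Depth++;         // "acquire"
    }

    public void Dispose()
    {
        // default(DepthScope) has no tracker, so guard against null.
        if (_tracker != null)
            _tracker.Depth--;     // "release"
    }
}

static class DepthScopeDemo
{
    static void Main()
    {
        DepthTracker tracker = new DepthTracker();
        using (new DepthScope(tracker))
        {
            Console.WriteLine(tracker.Depth);   // prints 1
        }
        Console.WriteLine(tracker.Depth);       // prints 0
    }
}
```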

It should be noted that while I call this "RAII in C#" it really isn't quite the same as RAII in C++, since in C++ destructors are guaranteed to be called when the object goes out of scope, whereas there's really nothing in the C# language / compiler that requires you put these objects in a "using" block, and thus there's nothing that guarantees that their Dispose() method gets called automatically if you choose not to use the using construct.
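To see the difference, it helps to look at roughly what the compiler does with a using statement, and at what happens when you leave it out.  Tracer and the three methods below are made up for this sketch.

```csharp
using System;

sealed class Tracer : IDisposable
{
    public bool Disposed;
    public void Dispose() { Disposed = true; }
}

static class UsingExpansionDemo
{
    public static Tracer WithUsing()
    {
        Tracer t = new Tracer();
        using (t)
        {
            // work with t here
        }
        return t;
    }

    // Roughly what the using statement above turns into.
    public static Tracer Expanded()
    {
        Tracer t = new Tracer();
        try
        {
            // work with t here
        }
        finally
        {
            if (t != null)
                t.Dispose();
        }
        return t;
    }

    // Compiles without complaint; Dispose is simply never called.
    public static Tracer Forgotten()
    {
        return new Tracer();
    }

    static void Main()
    {
        Console.WriteLine(WithUsing().Disposed);   // prints True
        Console.WriteLine(Expanded().Disposed);    // prints True
        Console.WriteLine(Forgotten().Disposed);   // prints False
    }
}
```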

The disposal pattern and its canonical implementation

Many resources describe the canonical implementation of IDisposable via the disposal pattern, so I'm not going to go into excruciating detail here.  I'll describe the basics of the pattern and refer the reader to other sources for exact details.  The core idea of the disposal pattern is that there are two ways in which you might want to clean up after your object: finalization and explicit disposal via IDisposable.Dispose.  If your object is one that should have a finalizer, then the disposal pattern should absolutely be followed.  If you don't have a finalizer, then you don't strictly need to follow the disposal pattern (often this is the case for RAII applications that don't acquire resources that will deadlock the application if they aren't released, or those that are guaranteed to be properly used - because you're writing both the object and all code that uses it).  Even so, you're probably best off following the disposal pattern every time you implement IDisposable and just leaving empty the parts of the pattern that don't apply to your particular application.

The basic rules of the disposal pattern are:

  1. finalizers should not refer to other managed objects during finalization, since those objects may have already had their finalizers called.
  2. IDisposable.Dispose() should call IDisposable.Dispose on any objects owned by the implementing object that are IDisposable.
  3. IDisposable.Dispose() and the finalizer should BOTH free any unmanaged resources owned by the object.
  4. IDisposable.Dispose() should call GC.SuppressFinalize(this) to mark the work of the finalizer as being already done.
  5. if your objects have shared state, then the finalizers should have code to guarantee that two finalizers being called at the same time is thread-safe.
  6. finalizers should not assume they are being called on any particular thread - therefore they cannot access TLS (thread-local storage) in any way, shape, or form!
  7. calling IDisposable.Dispose() shouldn't throw an exception if called more than once.
  8. method calls to any methods other than disposal methods (or the finalizer) should throw ObjectDisposedException if Dispose (or the finalizer) has already been called.

Microsoft's recommended approach for implementing the disposal pattern is to have a non-public (protected) virtual (unless your class is sealed) method called Dispose on your object that takes a boolean argument called "disposing".  This method should be called both by the finalizer and by IDisposable.Dispose, and if you implement a "Close" convenience method or some other method that does the same thing as IDisposable.Dispose, that should also call this single-argument version of Dispose.  In this Dispose method, your class should free any unmanaged resources, and if "disposing" is true, should also call Dispose on any IDisposable members of your class.  It should also set a flag so you know to throw ObjectDisposedException when any of your other methods or properties are accessed.  Finally, the Dispose method should call GC.SuppressFinalize(this) to notify the GC that the finalizer need not be called.  Then, the object should implement the finalizer as a call to Dispose with disposing = false, and IDisposable.Dispose as a call to Dispose with disposing = true.

Sample code is as follows:

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    class MyClass : IDisposable
    {
      [DllImport(...)] // assume this is correctly specified.
      private static extern void Free(IntPtr handle); // P/Invoke methods must be static

      private IntPtr _UnmanagedThing;
      private FileStream _LogFile;

      private bool _IsDisposed;

      public MyClass(FileStream logFile, IntPtr unmanagedThing)
      {
        // check arguments and don't allow finalizer if they aren't valid.
        GC.SuppressFinalize(this);
        if(logFile == null)
          throw new ArgumentNullException("logFile");
        if(unmanagedThing == IntPtr.Zero)
          throw new ArgumentException("unmanaged thing is invalid!",
                                      "unmanagedThing");
        GC.ReRegisterForFinalize(this);

        _UnmanagedThing = unmanagedThing;
        _LogFile = logFile;
      }

      protected virtual void Dispose(bool disposing)
      {
        // we can skip doing anything if it's already been done.
        if(_IsDisposed)
          return;

        if(disposing)
        {
          // dispose of managed resources here, since we
          //   were called from IDisposable.Dispose()
          _LogFile.Dispose();
        }

        // free unmanaged resources in either case (IDisposable or
        //   Finalize) and make sure the finalizer doesn't get called
        //   later by the GC.
        Free(_UnmanagedThing);
        GC.SuppressFinalize(this);

        // make sure we know that we're disposed for other calls,
        //   whichever path got us here.
        _IsDisposed = true;
      }

      public void Dispose() { Dispose(true); }
      ~MyClass() { Dispose(false); }

      public void DoSomething()
      {
        // some function not related to disposal of the object,
        //   but requiring valid state...
        if(_IsDisposed)
          throw new ObjectDisposedException(GetType().Name);
      }
    }

Again, if you are working with a handle that is typical of Win32, you should look at the classes mentioned above (SafeHandle and its family of derived classes) as they do much of the work for you.  You should probably also have a look at CriticalFinalizerObject and the MSDN topic "Safe Handles and Critical Finalization", especially if you expect your code to be run in a hosted environment other than traditional .NET applications (e.g. IIS7, COM+, SQL Server, etc.).
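As a sketch of what deriving from that family looks like, here is a hypothetical HGlobalBuffer built on SafeHandleZeroOrMinusOneIsInvalid.  I'm using Marshal.AllocHGlobal and Marshal.FreeHGlobal as a stand-in for whatever unmanaged library you actually call, purely so the example is self-contained.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper: the SafeHandle base class supplies the finalizer,
// IDisposable, and the "release exactly once" bookkeeping.
sealed class HGlobalBuffer : SafeHandleZeroOrMinusOneIsInvalid
{
    public HGlobalBuffer(int byteCount)
        : base(true)              // true: this instance owns the handle
    {
        SetHandle(Marshal.AllocHGlobal(byteCount));
    }

    // Called by the base class, from Dispose or from the finalizer,
    // and only if the handle is valid.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

static class SafeHandleDemo
{
    static void Main()
    {
        HGlobalBuffer buffer = new HGlobalBuffer(16);
        using (buffer)
        {
            IntPtr p = buffer.DangerousGetHandle();
            Marshal.WriteInt32(p, 0, 42);
            Console.WriteLine(Marshal.ReadInt32(p, 0));   // prints 42
        }
        Console.WriteLine(buffer.IsClosed);               // prints True
    }
}
```

The class itself contains no finalizer and no disposed flag; all of that comes from the base class, which is the main reason to prefer it over hand-rolling the pattern around a raw IntPtr.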

So what's the point?

So, now, as I reread the beginning of my blog post, I wonder - what was the point I was trying to make?  Well, it's basically this - the use of IDisposable to dispose of objects early is an optimization, assuming the disposal pattern was correctly implemented by the author of the objects you are calling.  That is, unless you are calling RAII-style objects, in which case forgetting to call Dispose and not using these objects in a using block could be disastrous to your program.  There are also several cases where it's extremely important to dispose of objects as soon as you are done with them, for instance when dealing with SqlConnection.

On the other hand, there are plenty of objects that it's fine to let the GC collect and "finalize", with little or no perf impact from doing so (possibly even a positive impact from not forcing early cleanup).  As a for-instance, consider a managed class that wraps an unmanaged resource that is not subject to contention (like some sort of unmanaged object in a library that is using the C/C++ heap to allocate these objects).  If you create a large number of these objects, but are not at risk of running out of memory, it can be much faster to allow the GC to collect these objects (through finalization) than to have your code call Dispose on all of them and force early cleanup (which makes the application incur the cost of freeing this unmanaged memory on your user threads instead of on the finalizer thread).

As with any of my posts and most of the advice on the CLR in general, the most important takeaway from this blog post should be "learn the details and use your own judgement".  Happy coding.
