
Using, Lambda, and RAII

One of the things I like most about C++ for production coding is RAII.  I can make an object on the stack clean up resources by putting the cleanup code in the object’s destructor.  When the object goes out of scope, its destructor is called and the resources get cleaned up.  It’s a nice way to prevent memory leaks, for one.

C# doesn’t have this.  The closest thing to it is try/finally, and I don’t like that as a solution.  You inevitably have to write code like this:

SomeResource r = null;
SomeOtherResource k = null;
try {
    // code for resources;
}
finally {
    if (r != null) r.Dispose();
    r = null;
    if (k != null) k.Dispose();
    k = null;
}

This feels clumsy and is largely due to the try block and the finally block having completely different scopes.  I’d prefer that they shared the same scope, even though it violates POLA.  Maybe if finally was set off with -{ and }- it would signify a continuation of the previous scope.  Hell, catch should have that option too.

The real cool thing is that C# already has syntactic sugar to do something like this in the using() { } block, so we can use that to clean up.  Let’s start with a common task as a problem.  I’m working with a Stream and I want to do some processing on it which may fail.  My code insists that the stream position get restored on the way out.  So I’ll start off with this code:

void Process(Stream stm)
{
    long savePos = stm.Position;
    // do work
    stm.Seek(savePos, SeekOrigin.Begin);
}

Ok, that works.  But as the do work section grows, it gets easier to forget about that Seek at the end, and hey – what happens if you return or throw?  So you learn and refactor your code to look like this:

void Process(Stream stm)
{
    long savePos = stm.Position;
    try {
        // do work
    }
    finally {
        stm.Seek(savePos, SeekOrigin.Begin);
    }
}

That’s better.  It’s more resilient and does what we want, but I still don’t like it.  I don’t like that savePos’s declaration, assignment, and use are so far apart.  Further, since savePos is a normal local variable, it can get written over – maybe not intentionally.

So now let’s use using to help us out.  To do this, we’re going to start with a generic helper class that implements IDisposable (see my earlier blog IDisposable Made E-Z):

public class ResourceReleaser<T> : IDisposable
{
    private Action<T> _action;
    private bool _disposed;
    private T _val;
    public ResourceReleaser(T val, Action<T> action)
    {
        _action = action;
        _val = val;
    }
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
    // Finalizers cannot carry an access modifier in C#.
    ~ResourceReleaser()
    {
        Dispose(false);
    }
    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
            return;
        if (disposing)
        {
            _disposed = true;
            _action(_val);
        }
    }
}

this now lets me implement the same stream code like this:

using (ResourceReleaser<long> r = new ResourceReleaser<long>(stm.Position, pos => stm.Seek(pos, SeekOrigin.Begin))) {
    // do work
}

Basically, ResourceReleaser encapsulates a single generic value at construction and passes it to its Action delegate at disposal.  This will restore the stream position as expected.  I still don’t like it as there is a lot of cruft before the meat.  I can get around this to a certain extent by writing the following code:

public class StreamPositionRestorer : ResourceReleaser<long>
{
    public StreamPositionRestorer(Stream s) : base(s.Position, x => s.Seek(x, SeekOrigin.Begin)) { }
}

which defines a special purpose class to do the work we want.  Now my using block looks like this:

using (StreamPositionRestorer s = new StreamPositionRestorer(stm)) {
    // do work
}

and now I’ve got a chunk of code with which I’m happy.  Well, almost.  There is one other problem that can come up.  I can shoot myself in the foot badly by creating a StreamPositionRestorer outside of the using block.  If I do that, the restore no longer happens at a well-defined point: as written, the finalizer path skips the action, so the position is only restored if and when somebody remembers to call Dispose.  So, given that’s an issue, I really want this:

long savePos = stm.Position;
using (() => stm.Seek(savePos, SeekOrigin.Begin)) {
   // do work
}

so now the using block takes either an expression of type IDisposable or an Action and the action is executed when scope is done.  Written in C# without using, it would look like this:

long savePos = stm.Position;
Action action = () => stm.Seek(savePos, SeekOrigin.Begin);
try {
   // do work
}
finally {
   action();
}

note that I’ve taken the generic parameter away.  It doesn’t help us since it will be evaluated at action execution time and we need early binding.  It still has the problem of savePos being writable in the lifetime of the try block.  So how about it Anders?  using with lambdas?
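Until the language grows something like that, the idea can be approximated with a tiny wrapper.  The following is only a sketch: DisposableAction and StreamProcessing are hypothetical names made up for illustration, not framework types.  Because the wrapper has no finalizer, the garbage collector can never run the action behind your back; it only runs when Dispose is called, which the using block guarantees.

```csharp
using System;
using System.IO;

// Hypothetical helper (an assumption, not a framework type):
// runs the supplied Action at most once, when Dispose is called.
public sealed class DisposableAction : IDisposable
{
    private Action _action;

    public DisposableAction(Action action)
    {
        if (action == null)
            throw new ArgumentNullException("action");
        _action = action;
    }

    public void Dispose()
    {
        // Clear the field first so a second Dispose() is a no-op.
        Action action = _action;
        _action = null;
        if (action != null)
            action();
    }
}

public static class StreamProcessing
{
    public static void Process(Stream stm)
    {
        long savePos = stm.Position;
        using (new DisposableAction(() => stm.Seek(savePos, SeekOrigin.Begin)))
        {
            // "do work": here we just consume a couple of bytes
            stm.ReadByte();
            stm.ReadByte();
        }
        // The position is back at savePos here, even if the body had thrown.
    }
}
```

This still leaves savePos writable inside the block, so it only addresses the cruft, not that last complaint.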

Posted by Steve Hawley | 0 Comments

Fixing Future Mistakes

There is a process in hardware and OS management that I like to call Configuration Jeopardy.  It’s when you’re working on a particular task and you spend hours or days trying to solve a problem that you know should be easy to solve but neither Google nor in-built documentation are any help because you simply don’t know the right vocabulary.  I’ll take a single line to add in an obscure file in a subdirectory /etc for $500, Alex.

Whenever I manage to solve a problem like this, I want to find a way to amortize the cost of finding the answer for the poor sot who has to do it again in the future.

Here’s an example – I had upgraded an OCR engine as part of my tasks for DotImage 8.0.  All OCR engines that I’ve worked with can be a little quirky in terms of how to get them going and how to find their resources.  I’ve put in infrastructure in our code to make this as easy as possible, including code that will use an Atalasoft registry entry, if present, to help find OCR resources.  In the upgrade, I had unit tests that ran fine on my machine and ran fine in the continuous integration build, but were failing the nightly build.  Eventually, I narrowed down the difference to the fact that my machine and the continuous integration build had no problems finding the resources, but the nightly couldn’t.

I put in debug checks and eventually found that the issue was that the build server was 64 bit but since the OCR engine is 32 bit, it was looking in the WoW 32 bit registry.  That cost me 3 days of fiddling with debug scaffolding and waiting for nightly builds and unit tests – and would’ve cost more if I hadn’t been bouncing this problem off of one of our other engineers.

So how do you prevent this in the future?  One way is to have a clear process in version upgrading that includes this (note to self – we need this).  Another is to put a signpost at the point of failure making it clear what to do.  To address the second part, I added a registry subkey in the area I had looked at that was named something like “WhenYouAddASubKeyHereAlsoAddItToXX”, where XX was the location in the WoW node.

I like to think of it like the Boy Scout adage of always leaving a place in better shape than when you found it.

Even More IEnumerable<T> Fun

This post is going to cover how to use (and abuse) extension methods to make it easier to write compilers and interpreters or to write code metrics tools.

Right now, it’s straightforward to loop over a set of paths, load assemblies (that can be loaded) and then loop over the types and then the methods.  Your code ends up fairly ugly though.  I know because I’ve written that code for unit testing.

I have a chunk of code that, from a set of assemblies, gives me a list of classes that inherit from ImageCommand and are themselves concrete.  This code is a static method called ImageCommandForEachExcept – essentially, it runs a delegate on each class.  The code is straightforward, but it is ugly.  I wrote it to the .NET 1.1 framework and was limited by the set of tools that I had at the time.

Here is what I wanted to write (had it been available at the time):

 
public static IEnumerable<ImageCommand> GetImageCommands(IEnumerable<Type> types)
{
    var imageCommands = from t in types
                        where
                            t.IsPublic && !t.IsAbstract && !t.IsInterface && t.IsSubclassOf(typeof(ImageCommand))
                        select t;
    Type[] emptyTypes = new Type[0];
    foreach (Type t in imageCommands)
    {
        ConstructorInfo ci = t.GetConstructor(emptyTypes);
        if (ci == null)
            continue;
        ImageCommand command = ci.Invoke(null) as ImageCommand;
        if (command == null)
            continue;
        yield return command;
    }
}

from here, I can do:

foreach (ImageCommand command in GetImageCommands(myAssembly.GetTypes())) { … }

and that’s fairly beautiful, as far as code goes, but I want more – I’d like to be able to, given a folder, get all the types from all the assemblies within that folder, so I created the following class:

public class AssemblyEnum : IEnumerable<Assembly>
{
    private IEnumerable<string> _paths;
    private static bool IsDll(string path)
    {
        string ext = Path.GetExtension(path);
        return ext != null && ext.ToLower().EndsWith("dll");
    }
    public AssemblyEnum(string path)
    {
        if (IsDll(path))
        {
            _paths = new string[] { path };
        }
        else
        {
            _paths = Directory.GetFiles(path, "*.dll");
        }
    }
    public AssemblyEnum(IEnumerable<string> paths)
    {
        _paths = paths;
    }
    public IEnumerator<Assembly> GetEnumerator()
    {
        foreach (string path in _paths)
        {
            if (!IsDll(path))
                continue;
            Assembly assem = null;
            try
            {
                assem = Assembly.LoadFile(path);
            }
            catch
            {
            }
            if (assem != null)
                yield return assem;
        }
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

AssemblyEnum, given a path or a set of paths enumerates all the dll’s and attempts to load an assembly from the path.

Given that, I create the following extension methods:

public static IEnumerable<Assembly> Assemblies(this IEnumerable<string> paths)
{
    return new AssemblyEnum(paths);
}
public static IEnumerable<Assembly> Assemblies(this string path)
{
    return new AssemblyEnum(path);
}

which let me do this:

var assemblies = @"path\to\my\folder".Assemblies();

now, given an Assembly, I can get all the types out of it with the following extension methods:

public static IEnumerable<Type> Types(this Assembly assem)
{
    Type[] types = null;
    try
    {
        types = assem.GetTypes();
    }
    catch
    {
        types = new Type[0];
    }
    return types;
}
public static IEnumerable<Type> Types(this IEnumerable<Assembly> assemblies)
{
    return assemblies.SelectMany(x => x.Types());
}

This lets me do this:

var types = @"path\to\my\folder".Assemblies().Types();

which gives me an IEnumerable<Type> for all types in all the assemblies.  It also avoids a problem that Assembly.GetTypes() has, which is that it throws (typically a ReflectionTypeLoadException) if any of the assembly’s types can’t be loaded.  Honestly, though, I don’t like the .Assemblies().Types() – I’d like to shorten that up a little bit, so let’s create a few more extension methods:

public static IEnumerable<Type> Types(this string path)
{
    return path.Assemblies().SelectMany(x => x.Types());
}
public static IEnumerable<Type> Types(this IEnumerable<string> paths)
{
    return paths.Assemblies().SelectMany(x => x.Types());
}

Now, I can simply do this:

var types = @"path\to\my\folder".Types();

which is a heinous abuse of extension methods – a string going to a list of types?!  Yikes – the problem really is that a path can be represented by a string, but a string is not really a path.

Now, before you think that I have gone far enough, let me throw in four more extension methods:

public static IEnumerable<MethodInfo> Methods(this string path)
{
    return path.Types().SelectMany(x => x.GetMethods());
}
public static IEnumerable<MethodInfo> Methods(this IEnumerable<string> paths)
{
    return paths.SelectMany(x => x.Methods());
}
public static IEnumerable<MethodInfo> Methods(this IEnumerable<Assembly> assemblies)
{
    return assemblies.Types().SelectMany(x => x.GetMethods());
}
public static IEnumerable<MethodInfo> Methods(this Type type)
{
    return type.GetMethods();
}
These let me get all the Methods from a path to a dll (or path to a folder of dlls), a collection of paths to dlls, a collection of assemblies, or a type.

So now, I can do this:

int publicCount = @"path\to\my\folder".Types().Where(x => x.IsPublic).Count();

which gives me the count of all the public classes in all the assemblies in the folder.  I can also get a list of all the public methods in public classes by doing this:

var methods = @"path\to\my\folder".Methods().Where(x => x.IsPublic && x.DeclaringType.IsPublic);

Because Methods() is built using nested calls to SelectMany and so on, it is lazy – so the up-front cost of getting the methods is cheap, and the real work only happens as you enumerate.  So consider this: I’m writing a compiler or interpreter with type inference.  If I have an IEnumerable<ObjectsICanCallWithinThisScope>, then finding a matching set of methods is a LINQ operation.  In fact, writing a linker is a LINQ operation.
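To make the laziness concrete, here is a small sketch that stands in for the real thing.  LazyDemo, Expand, and Members are made-up names; Expand plays the role of an expensive step such as pulling the types out of an assembly, and the counter records how often it actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many times the inner selector has actually run.
    public static int Expansions;

    // Hypothetical stand-in for an expensive step (e.g. loading an assembly's types).
    private static IEnumerable<string> Expand(string source)
    {
        Expansions++;
        return new[] { source + ".A", source + ".B" };
    }

    // Built the same way as the Methods() extension methods: one SelectMany over the sources.
    public static IEnumerable<string> Members(IEnumerable<string> sources)
    {
        return sources.SelectMany(s => Expand(s));
    }
}
```

Calling Members(...) by itself runs Expand zero times.  Asking the result for just its first element expands only the first source, and the remaining sources are never touched unless the enumeration continues.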

Given earlier work here on making streams or tokens enumerable, I’m thinking that a compiler should start to look like this under the hood:

IEnumerable<AstNode> BuildAst(Stream stm, Grammar g)
{
    IEnumerable<Token> scanner = new Tokenizer(stm, g);
    Parser p = new Parser(scanner, g);
    return p.Parse();
}

then you can write a code generator that operates only on IEnumerable<AstNode>.  You can write optimizers that operate on IEnumerable<AstNode> – nanopass becomes a closer reality because enumeration of the tree is so easy comparatively.  You can find nodes that are candidates for strength reduction via a LINQ query:

from p in nodes where p is AstArithmeticNode && IsConstantExpr(p.LeftChild) && IsConstantExpr(p.RightChild) select p;
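As written, that query is shorthand: if nodes is an IEnumerable<AstNode>, then p.LeftChild only compiles when the base class exposes children.  One way to make it type-check is to filter with OfType<T>() first.  Everything in this sketch (the node classes, IsConstantExpr, ConstantOperandNodes) is a hypothetical stand-in for a real AST, assumed purely for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical AST types, assumed only for this sketch.
public abstract class AstNode { }

public sealed class AstConstantNode : AstNode
{
    public readonly int Value;
    public AstConstantNode(int value) { Value = value; }
}

public sealed class AstVariableNode : AstNode
{
    public readonly string Name;
    public AstVariableNode(string name) { Name = name; }
}

public sealed class AstArithmeticNode : AstNode
{
    public readonly char Op;
    public readonly AstNode LeftChild;
    public readonly AstNode RightChild;

    public AstArithmeticNode(char op, AstNode left, AstNode right)
    {
        Op = op;
        LeftChild = left;
        RightChild = right;
    }
}

public static class AstQueries
{
    // Deliberately simplistic: only a literal counts as constant here.
    public static bool IsConstantExpr(AstNode node)
    {
        return node is AstConstantNode;
    }

    // Arithmetic nodes whose operands are both constant.
    public static IEnumerable<AstArithmeticNode> ConstantOperandNodes(IEnumerable<AstNode> nodes)
    {
        return from p in nodes.OfType<AstArithmeticNode>()
               where IsConstantExpr(p.LeftChild) && IsConstantExpr(p.RightChild)
               select p;
    }
}
```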

Tools I Use –or– Shameless Plug Day

Sure, every developer uses his/her compiler, editor, debugger, IDE, etc. but does it stop there?  Heck no.

Here is the set of tools I use in addition, in no particular order:

  • Visual Studio 2008 with the following additions
    • TestDriven.Net – lets me run my NUnit unit tests in the IDE
    • Home Grown comment macro:
Imports System
Imports EnvDTE
Imports EnvDTE80
Imports EnvDTE90
Imports System.Diagnostics
Public Module Module1
    Private Function GetUserName() As String
        GetUserName = System.Environment.UserName
    End Function
    Sub InjectChangeComment()
        ActiveDocument().Selection().Text = "// " + System.DateTime.Now.ToString("MM-dd-yy") + " " + GetUserName() + vbTab + vbTab + vbTab
    End Sub
End Module
  • AQTime – I use this for performance measurement and finding leaks.
  • Araxis Merge – this lets me merge changes – the folder merge is a huge time saver when I get a new drop of a library.
  • Adobe Acrobat – Still the best tool for document interchange for anything that needs specific formatting.
  • Adobe Photoshop – I use Photoshop for doing mockups, getting a second read on “unusual” files, etc.
  • AsTiffTagViewer – I use this for disassembling TIFFs.  I should really write my own using DotImage’s TIFF Tag routines, but AsTiffTagViewer works so well already…
  • .NET Reflector – I used to use ILDasm, but Reflector has been terrific for looking at code I’ve written to see what the compiler generated.
  • Pidgin – my current IM client.  It appears to suck the least.  Remember, communication is as important as writing code.
  • FxCop – Microsoft’s static analysis tool for .NET.
  • NUnit – I use TestDriven.Net far more often, but there are times when NUnit is a better choice.

More IEnumerable<T> Fun

This blog post will be about a practical example of using IEnumerable<T> to make solving common problems easier.

Here’s a common abstract problem – walk a tree of nodes, visiting each node and possibly performing an operation on a node’s contents.  A concrete version of that is finding one or more files within a file system.

The typical way to do that is with recursion.  And to you I say, that way madness lies.  The implementation, however, is simple:

public class RecursiveDirectoryWalker
{
    public static List<string> WalkPath(string path, Func<string, bool> directoryFilter, Func<string, bool> fileFilter)
    {
        if (path == null)
            throw new ArgumentNullException("path");
        List<string> paths = new List<string>();
        WalkPath(path, directoryFilter, fileFilter, paths);
        return paths;
    }
    private static void WalkPath(string path, Func<string, bool> directoryFilter, Func<string, bool> fileFilter, List<string> paths)
    {
        string[] files = Directory.GetFiles(path);
        foreach (string file in files)
        {
            if (fileFilter == null || fileFilter(file))
            {
                paths.Add(file);
            }
        }
        string[] dirs = Directory.GetDirectories(path);
        foreach (string dir in dirs)
        {
            if (directoryFilter == null || directoryFilter(dir))
                WalkPath(dir, directoryFilter, fileFilter, paths);
        }
    }
}

In this code, I use a helper routine to pass a list of “found paths”.  A found path is a path for which the fileFilter returns true (or always if fileFilter is null).  The helper routine takes a list of strings to which it will add found files, rather than returning them.  This makes the recursion somewhat easier to write.

The problem here is the recursion.  If you have any kind of realistic stack limit, I will deliver you a file system which will overflow your stack.  Plain and simple, you can never ship production code based on this routine.  It is a time bomb waiting to happen.

Instead, routines like this should be either heap based or tail recursive.  Since C#, at present, doesn’t guarantee tail-call elimination, I’ll cover heap-based.  To do heap based, the typical approach is to use a heap-based stack rather than the VM’s stack.  This is usually thought of as a stack of current state (or the deltas to return you to your state).  There’s still another approach, which works well for this problem: a queue of work to do.  The task is this: get the head of the work queue, process the files found there, then for each directory found enqueue it as work to do later.
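For comparison, the heap-based stack flavor can be sketched generically.  This is an illustration under assumptions, not code from the library: DepthFirst.Walk is a made-up name, and the children delegate stands in for whatever produces sub-nodes (Directory.GetDirectories, for a file system).

```csharp
using System;
using System.Collections.Generic;

public static class DepthFirst
{
    // Visits every node reachable from root without recursion.
    // The pending nodes live in a Stack<T> on the heap, so tree depth
    // is limited by available memory rather than by the call stack.
    public static List<T> Walk<T>(T root, Func<T, IEnumerable<T>> children)
    {
        List<T> visited = new List<T>();
        Stack<T> pending = new Stack<T>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            T node = pending.Pop();
            visited.Add(node);
            foreach (T child in children(node))
                pending.Push(child);
        }
        return visited;
    }
}
```

Swapping the Stack<T> for a Queue<T> turns the same loop into the work-queue approach, visiting nodes breadth-first instead of depth-first.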

I’m going to go one step further still and implement this directory walker non-recursively and as IEnumerable<T>:

public class DirectoryWalker : IEnumerable<string>
{
    private string _seedPath;
    Func<string, bool> _directoryFilter, _fileFilter;
    public DirectoryWalker(string seedPath) : this(seedPath, null, null)
    {
    }
    public DirectoryWalker(string seedPath, Func<string, bool> directoryFilter, Func<string, bool> fileFilter)
    {
        if (seedPath == null)
            throw new ArgumentNullException("seedPath");
        _seedPath = seedPath;
        _directoryFilter = directoryFilter;
        _fileFilter = fileFilter;
    }
    public IEnumerator<string> GetEnumerator()
    {
        Queue<string> directories = new Queue<string>();
        directories.Enqueue(_seedPath);
        Queue<string> files = new Queue<string>();
        while (files.Count > 0 || directories.Count > 0)
        {
            if (files.Count > 0)
            {
                yield return files.Dequeue();
            }
            else if (directories.Count > 0)
            {
                string dir = directories.Dequeue();
                string[] newDirectories = Directory.GetDirectories(dir);
                string[] newFiles = Directory.GetFiles(dir);
                foreach (string path in newDirectories)
                {
                    if (_directoryFilter == null || _directoryFilter(path))
                        directories.Enqueue(path);
                }
                foreach (string path in newFiles)
                {
                    if (_fileFilter == null || _fileFilter(path))
                        files.Enqueue(path);
                }
            }
        }
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Instead of using static methods, I now have a class which implements IEnumerable<string>.  In the class, I also include a seed path for where to start searching and the filters.

GetEnumerator is simplicity itself – there are two queues, a queue of files and a queue of directories.  While the queue of files is not empty, we yield return the head.  Otherwise, if the queue of directories is non-empty, we dequeue a directory and use that as a seed for finding more files and more directories.

The surprising thing is that both solutions are pretty close to the same length.  The non-recursive version is also quite readable, thanks to the use of yield return.  I especially like how it transforms the client code, in that I can now write something like this:

foreach (string s in new DirectoryWalker(@"C:\tfsroot\ISLib 8.0", null, (x => x.EndsWith(".obj"))))
{
    Console.WriteLine(s);
}

Which is nice, short and sweet.  Also, the recursive version accumulates the results into a list and returns it whereas this version works piecemeal.  That isn’t to say that we couldn’t build the recursive version around IEnumerable<T> – we can.  It just fits nicely into the state-machine style of the non-recursive version.  The other nice thing about using IEnumerable<T> is that many directory search routines carry in a predicate to ask whether or not you should cancel the search and to provide UI feedback/IO coverup – this is not necessary here, just break out of the foreach loop.
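The cancel-by-breaking point is easiest to see on a tree that never ends.  The sketch below is a generic version of the same work-queue idea; TreeWalker.Walk is a hypothetical name, and the children delegate is an assumption standing in for the directory listing.

```csharp
using System;
using System.Collections.Generic;

public static class TreeWalker
{
    // Lazily yields nodes breadth-first, using a work queue on the heap.
    // A node's children are only requested when the caller asks for
    // another item after that node has been yielded.
    public static IEnumerable<T> Walk<T>(T root, Func<T, IEnumerable<T>> children)
    {
        Queue<T> work = new Queue<T>();
        work.Enqueue(root);
        while (work.Count > 0)
        {
            T node = work.Dequeue();
            yield return node;
            foreach (T child in children(node))
                work.Enqueue(child);
        }
    }
}
```

Walking an infinite binary tree of integers, where the children of n are 2n and 2n + 1, terminates just fine as long as the consuming foreach breaks out once it has found what it wants; no cancellation callback has to be threaded through the walker.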

One thing I brushed over is the filter functions.  I’m using Func<T, TResult> to shorthand up a delegate definition that I can use for asking the question “do I want this directory/file”.  This type of function is called a predicate.  In my initial version, I made specific delegates, but I decided to use the generic Func<> instead.  For the usage example, I put in a lambda expression for the predicate “does the path end with .obj”.  In practice, it might be better to use Path.GetExtension() to check how it ends and use a case insensitive comparison.
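A sketch of that sturdier predicate follows.  FileFilters.HasExtension is a name invented for this example; it returns a Func<string, bool> that can be handed to the walker as the file filter.

```csharp
using System;
using System.IO;

public static class FileFilters
{
    // Returns a predicate that is true when the path's extension equals
    // `extension` (including the leading dot, e.g. ".obj"), ignoring case.
    public static Func<string, bool> HasExtension(string extension)
    {
        return path => string.Equals(
            Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase);
    }
}
```

Passing FileFilters.HasExtension(".obj") in place of the lambda also picks up files named like FOO.OBJ, which the plain EndsWith(".obj") check would skip.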

Making JIT Code Easier

Check out Nanojit – this is a cross-platform, cross-target JIT compiler.  Given a set of abstract instructions and an appropriate back end, you can go from your abstract syntax tree to callable code in short order.

How fast can you implement a compiler?  Depends on your source language, but given a decent parser/scanner generator (say ANTLR), it’s a short trip to an AST.  Given an AST, it’s a short trip to LIR, the pseudo assembly language used for Nanojit.

I’m disappointed in the type system being so basic, but in that regard it is very similar to C--.

Here’s a small sample using nanojit written by Chris Double.  Nifty.


Pizza As UI Part II

In Part I, I wrote about the effect of information overload, poor communication, and misplaced convenience on user experience.  I’m going to discuss information overload in more detail and speak in a more meta sense about tools.

Once again, I’m going to pick on the pizza box from Antonio’s (although I could choose any number of other pizza places):

Ok, here we see a set of radio buttons that are used to mark a box to indicate its contents.  There are several things wrong with this, so I’ll hit them one at a time.

First, radio buttons are the wrong UI element.  Radio buttons are supposed to be used to set off a group of “select one of many”.  Maybe these are really check boxes with unusual skinning, but even so it is the wrong UI element.  What if I want double mushrooms?  How do I indicate that in a way that is obvious and unambiguous?  A better choice would be an entry blank followed by a times symbol then the topping (i.e., __ × Peppers).  In real UI terms, this could be a numeric entry box or better yet a pop-up which includes a reasonable limit to pizza toppings.  Side note: what is the reasonable limit?  For pizza, it might be 5 or 6, but realize that whenever you try to create a reasonable limit, a customer will try to do something crazy, like this 100 patty burger requested at an