It's time to leave the secondary, external structure of our programs behind. If you can treat the reflected code from a programming language like an abstract data structure, why can’t you just keep the source itself in a similarly abstracted data structure? Isn’t the structure of a program more similar to a graph than to a list? Besides the momentum of the past, what is keeping us tied to having our code in flat files?



Recently Patrick Smacchia, lead developer of the NDepend project, contacted me.  He offered me a free pro NDepend license. I have wanted to explore NDepend for a good while and so I accepted his offer.  While I haven’t yet had a chance to install or play with it, I have read some of the documentation. This post is not about NDepend but has come about in part as a result of my exposure to it.


The Problem

Does the order of functions or classes in a file necessarily have any meaning? Only the meaning you assign when you make that file. Every person who modifies that file will most likely have their own meaning in mind. Even slightly different ideas about code structure will lead to divergent designs within that code. After a few contributors, even with impeccable programming style, it becomes very difficult to move around in a file.  The file becomes a soupy mess of functions and very difficult to penetrate.

A partial solution is to separate things out into as many files as possible. This has its own issues. Even at the optimal balancing point between file size and number of files, as long as multiple people are working in the same space, divergent designs will appear. It is also difficult to work in many files at the same time. Here at Atalasoft, we keep things fairly well separated out. Because of this, the file tabs in my copy of Visual Studio are always overflowing to the point where it’s hard to keep track of what is where. It’s a huge pain to move from file to file to make minor changes, and tracking down exactly which file contains what you are looking for can be excruciatingly slow.

The very same thing applies to the folder structure of a program. Even slightly different ideas between engineers about how files should be kept quickly lead to a folder structure full of divergent designs. This file structure often evolves to be significantly different from the internal structure of the code itself. Having the program contained in two separate structures (filesystem and class/namespace) is a recipe for madness in any case.

Many methodologies have been invented to help with this problem. None of them work completely.  The problem is not in the methodologies except in that they are anchored to flat files. For instance, Microsoft has added the ability to mark out regions of code in its editors. They did this in order to help with this very problem, but regions are just a hack. They don’t address the underlying issue of what causes that unnecessary code to pollute your view in the first place.

This is also the main issue holding back the functional programming movement. Large functional programs are extremely hard to organize, as there is very little structure built into most functional languages. This leaves each programmer on the team to structure things in his or her own way.

All of this brings us to the root cause of the problem: no two programmers think alike. It follows that they will always build structures, both file and programmatic, in different ways. Many people talk of the problem being a lack of training or learning, but there is no way to completely impose the exact same structure on the output of all programmers. We are creative creatures, not machines. It is simply impossible to constrain a human being in such a way.


The Freedom of Leaving Files Behind

If we cannot make different programmers use the same structure, doesn’t it follow then that we should try to make as much of this structure as possible unnecessary? The less the structure matters the less enforcement will be needed.

Is that really a good enough reason? We have always kept our code in files and got along fine, so it's good enough, right?

Every time you make a change is it necessary to have all of the file’s content there, where you can see it? Of course not, most of it is clutter that just gets in your way. What if a different ordering/meaning would be more useful for a particular task? Anyone would admit that it would be a huge waste of time to reorganize every file for each separate task.

What if we could just change how we are querying our code structure instead?

If we kept our code in queryable data structures, it would be easy to lay out our environment in any way we chose. Hiding the clutter and reordering code to suit the task at hand would be not only possible but easy. You could also, for instance, show a method and everything which references it. The possibilities for code visualization are limitless.
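As a rough illustration of the "show a method and everything which references it" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Definition` and `CodeStore` names, the in-memory dictionary, and the hand-written reference sets are hypothetical, and a real tool would derive references from a parser or compiler rather than have them typed in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Definition:
    """One function or method, stored as a node rather than as text in a file."""
    name: str
    source: str
    # Names this definition references (hand-written here; hypothetical).
    calls: set[str] = field(default_factory=set)


class CodeStore:
    """Hypothetical in-memory code store keyed by definition name."""

    def __init__(self) -> None:
        self._defs: dict[str, Definition] = {}

    def add(self, definition: Definition) -> None:
        self._defs[definition.name] = definition

    def referencing(self, name: str) -> list[str]:
        """Return the sorted names of all definitions that reference `name`."""
        return sorted(d.name for d in self._defs.values() if name in d.calls)

    def view(self, name: str) -> str:
        """Build a view: the definition, then everything that references it."""
        names = [name] + self.referencing(name)
        return "\n\n".join(self._defs[n].source for n in names)


store = CodeStore()
store.add(Definition("load", "def load(path): ..."))
store.add(Definition("parse", "def parse(path): return load(path)", {"load"}))
store.add(Definition("main", "def main(): parse('a.txt')", {"parse"}))
store.add(Definition("reload", "def reload(path): return load(path)", {"load"}))

print(store.view("load"))
```

The view is built by a query, not by the order in which anything was written, so a different task could ask for a different grouping without anyone reorganizing a file.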

Keeping our code in flat files is also limiting how well we can understand and visualize more complex structures. As long as we are viewing our code in flat files, it will be extremely difficult to think about it in any other terms.  The real boon of moving on is the power and understanding we will gain from being able to visualize the structure of our programs in any way we choose.


Momentum of the Past

It’s obvious. The reason we still keep our code in flat files is because that is what we have always done. All of our tools have been written in this context. There would be almost no infrastructure for someone writing code which was to be kept in a database much less a custom designed data structure.  Almost all compilers need a complete program, or at least a very large and well defined modular section of a program, in order to generate machine code.

Interpreted languages have been leading the way in making smaller and smaller independent chunks of programs compilable, and some newer compiled languages are following. Microsoft’s C# has a snippet compiler. F# and Clojure can be compiled one function at a time.

A language which is not tied to traditional compiling and linking would be ideal for research into keeping code in abstracted data structures. Walk the dependency tree of the pieces you need, compile them in order, and you've got a compiled program.
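To make the dependency-tree step concrete, here is a small Python sketch that orders independently compilable pieces so that each one comes after everything it depends on. The `depends_on` table and the function names in it are invented for the example, and the "compile" step is left out entirely; the ordering itself uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 and later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: each definition maps to the set of
# definitions it needs compiled first.
depends_on = {
    "main": {"parse", "report"},
    "parse": {"load"},
    "report": {"load"},
    "load": set(),
}


def build_order(deps):
    """Return the definitions in an order where dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(depends_on)
print(order)
```

The relative order of `parse` and `report` is not significant, since neither depends on the other; only the constraint that `load` precedes both and `main` comes last matters.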


First Steps

A good first step would be an IDE/Editor that can manage all of the code in a database and allow the programmer to dynamically construct queries to build views and otherwise manipulate the code. The environment could then generate flat files in order to be compatible with current compilers.
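A toy version of that first step might look like the following Python sketch, which keeps definitions in an in-memory SQLite database and generates flat-file text on demand. The table layout, the `namespace` column, and the `export_flat_file` helper are all assumptions made for illustration, not a description of any existing tool.

```python
import sqlite3

# Hypothetical schema: one row per definition instead of one file per module.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE definitions ("
    "name TEXT PRIMARY KEY, namespace TEXT NOT NULL, source TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO definitions VALUES (?, ?, ?)",
    [
        ("load", "io", "def load(path): ..."),
        ("save", "io", "def save(path, data): ..."),
        ("render", "ui", "def render(view): ..."),
    ],
)


def export_flat_file(conn, namespace):
    """Generate flat-file text for one namespace, for a conventional compiler."""
    rows = conn.execute(
        "SELECT source FROM definitions WHERE namespace = ? ORDER BY name",
        (namespace,),
    ).fetchall()
    return "\n\n".join(source for (source,) in rows) + "\n"


print(export_flat_file(conn, "io"))
```

The flat file here is a build artifact, not the place where the code lives. Changing the `WHERE` or `ORDER BY` clause gives a different view of the same code without touching the stored definitions.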

What will eventually be needed is a whole host of tools designed around the idea.  This would be a great opportunity for any large software company. It’s a whole new world of tools to build for which there would be little initial competition.


JetBrains and Language Oriented Programming

The Language Oriented Programming movement has been playing with similar ideas for years but they seem to have gone almost nowhere. At the end of 2005 Martin Fowler wrote an article entitled “Language Workbenches: The Killer-App for Domain Specific Languages?”, but where is my interactive language workbench? There have been a few articles from JetBrains which contain similar ideas. It seems as though they have given their focus to metaprogramming and generating domain specific languages. Hopefully they haven’t left the Language Workbench idea by the wayside.


Many thanks to Lou Franco for the info on Language Oriented Programming and Language Workbenches.

Steve Hawley has posted an interesting response with ideas about LINQ and flat files here.