Tuesday, December 23, 2008 4:23 PM
by
RickM
How will you parallelize your existing codebase? Try R.A.S.P.
There has been much talk of how we will write all of our new code with parallelization in mind. But what of our existing code? It’s unlikely that everyone will suddenly dump decades of existing code and rewrite everything from scratch. In this article I’m going to provide a simple methodology for dealing with the ever-growing problem of parallelizing our existing mountains of code. Comments and contributions are welcome.
Methodologies of the Past
From the STL to .NET, the frameworks we have built our applications around have been heavily dependent on the idea of an application having a single thread. Given that the foundation we have all been using for a very long time was constructed around this preconception, it’s unreasonable to expect that much of our existing code will ever be fully parallel. Even those who wrote code on top of a thread-safe framework may find that years of patches and poor design decisions make ground-up parallelization impossible.
If we set our expectations reasonably, we see that we should instead focus on leveraging parallelism to improve the performance of the slowest parts of our software. From this viewpoint, parallelization is an optimization problem. Like all optimization, the difficulty of parallelizing code will have much to do with the methodologies which were used to write it.
Of course, object-oriented code written with modern S.O.L.I.D. principles will be easier to parallelize than older procedural code. At the same time, a poorly organized codebase or poorly written code will always make change difficult, and so will also be hard to parallelize. This is why well-written code is worth the investment, and we will see that investment paying off in spades for the companies that have bothered to care about code quality. Others who find spaghetti code under the hood will need to deeply segregate and modularize before parallelization is possible.
A Methodology for Revisiting the Past
In most cases it would be a poor choice to implement your own threading API. Efficient, easy-to-use parallelization APIs are coming to (or are already part of) every commonly used language and framework. Most of these APIs are not only built on top of years of research; they have also been written and debugged by a large number of people with specific expertise. These APIs are a godsend because they will allow most developers to parallelize existing software with a minimum amount of pain. The parallelization of existing codebases will be much the same as any other kind of performance tuning.
The key will be using a profiler to identify places in the code that would be sped up by parallelization and leveraging these new APIs to take advantage of the available hardware. The exciting part is that this can be done with any existing profiler and many existing APIs. The unfortunate part is that because memory sharing is such a big issue, parallelization requires a degree of separation beyond other types of optimization and so is likely to require some amount of refactoring.
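To make the memory-sharing point concrete, here is a minimal Python sketch of the kind of refactoring I have in mind. The function names and the word-length example are my own inventions for illustration, not taken from any real codebase. The "before" version writes into a dictionary owned by the caller; the "after" version returns a fresh result per chunk and merges the results in one place.

```python
from collections import Counter

# Before: every iteration writes into a dictionary owned by the caller,
# so two threads running this loop would race on the same entries.
def tally_lengths_shared(words, totals):
    for word in words:
        totals[len(word)] = totals.get(len(word), 0) + 1

# After: each call reads only its own argument and returns a fresh result,
# so separate chunks can be processed without touching shared memory.
def tally_lengths(chunk):
    return Counter(len(word) for word in chunk)

# The only place results meet is this single-threaded merge step.
def merge_tallies(partials):
    combined = Counter()
    for partial in partials:
        combined.update(partial)
    return combined

words = ["to", "be", "or", "not", "to", "be"]
chunks = [words[:3], words[3:]]
separated = merge_tallies(tally_lengths(chunk) for chunk in chunks)

shared = {}
tally_lengths_shared(words, shared)
assert dict(separated) == shared  # same answer, but no shared writes
```

Nothing here runs in parallel yet. The point is only that once the work is expressed as independent chunks plus a merge, handing the chunks to an existing parallel API becomes a small change rather than a rewrite.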
Not all types of performance problems are conducive to being solved by parallelization; careful evaluation of the problem at hand is required. Also, as with anything that requires significant code change, building a solid test fixture is key to introducing as few bugs as possible. By avoiding premature optimization, unit testing pragmatically, using existing APIs, and refactoring mindfully, it will be possible to introduce parallelization into many existing projects with a manageable amount of risk.
What is RASP?
While not included in the acronym, the first step in any kind of optimization is profiling. Before you can begin to parallelize your code, you must determine where the bottlenecks are. A broadly defined list of parallelizable things to look for would be, to quote Rich Hickey, “independent data/work, moderate-to-coarse-grained work units and/or complex coordination logic that would be simplified with threads”. A couple of quick examples of low-hanging fruit to be on the lookout for would be slow iterative loops and blocking I/O. It is important to note that, as a general guideline, it would be wrong to parallelize anything if doing so would not significantly increase the speed of your software.
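As a rough illustration of that profiling step, the sketch below uses Python's standard cProfile and pstats modules to rank functions by cumulative time. The workload functions are hypothetical stand-ins I made up for this example; in practice you would point whatever profiler you already use at your real entry point.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # Stand-in for a slow iterative loop, one of the usual suspects.
    total = 0
    for i in range(n):
        total += i * i
    return total

def cheap_setup():
    # Stand-in for work that is not worth parallelizing.
    return list(range(10))

def run_workload():
    cheap_setup()
    return slow_sum_of_squares(200_000)

def profile(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile(run_workload)
print(report)
```

In a report like this, the functions that dominate cumulative time are the ones worth carrying into the Review step; anything that barely registers is best left alone.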
For each of the bottlenecks found while profiling, parallelization is best separated into four steps:
| Review: | Review code to determine if it is a good candidate for parallelization. |
| Anchor: | Create a unit test fixture to ensure that the behavior of the code to be parallelized does not change. |
| Separate: | Ensure that the code to be parallelized has no shared memory constraints. |
| Parallelize: | Minimally refactor for parallelization while leveraging an existing API to do the heavy lifting. |
As the specifics of what each of these would entail depend greatly on exactly which platform and language is in use, I will not go into them deeply now. Overall, it’s a simple methodology, but I think it is both sufficient for the task at hand and broadly applicable.
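Purely as one illustration on one platform, here is how the Anchor and Parallelize steps might look in Python, using the standard library's concurrent.futures as the existing API. The fetch_record function is a hypothetical stand-in for a blocking I/O call, and the other names are likewise assumptions of mine. The sketch also assumes that Review has already picked this loop and that Separate has confirmed each call is independent.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_record(record_id):
    """Hypothetical blocking call; the sleep simulates waiting on I/O."""
    time.sleep(0.01)
    return {"id": record_id, "value": record_id * 2}

def load_all_sequential(ids):
    # The original code path, kept as the reference behavior.
    return [fetch_record(i) for i in ids]

def load_all_parallel(ids, max_workers=8):
    # Parallelize: the thread pool does the heavy lifting, and
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_record, ids))

def test_parallel_matches_sequential():
    # Anchor: the parallel version must return exactly what the
    # sequential version returned before the change.
    ids = list(range(20))
    assert load_all_parallel(ids) == load_all_sequential(ids)

test_parallel_matches_sequential()
```

Threads suit this example because the simulated work is waiting rather than computing. For CPU-bound loops in Python a process pool is usually the better fit, and on other platforms the equivalent API will differ, which is exactly why the four steps are stated independently of any one library.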
Conclusion
Review, Anchor, Separate, Parallelize. It’s not intended to be a difficult concept but instead to provide a simple path to parallelization. I would be very interested in hearing any opinions on what RASP might be missing or how it could be better clarified. While I didn’t have time to discuss them deeply in this post, parallelization patterns are also a key concept in using RASP: if you can’t easily identify what can be parallelized, then it would be impossible to use any parallelization methodology. In the future I hope to flesh out RASP further as well as discuss parallelization patterns in depth.