Monday, April 27, 2009 8:21 PM
Image Processing as Sets of Transformations
In image processing, as in most computational problems, we often think of our work as composed of only two basic ideas: representation and transformation. Of course, there may be many layers of both representations of transformations and transformations of representations, which can make things appear quite complex at times.
However, the problem is much simpler than it appears, because a representation can itself be considered a transformation from a zero or identity state. Thus, in writing a symbolic language for image processing, we are left with only a single idea to consider: transformations. By composing layers of transformations we can apply image processing techniques in a way that is not only bidirectional and platform agnostic but also brings a host of other benefits.
Let us consider a simplified example of processing an image:
1) We read in a file (representation) and use a codec (transformation) to convert it into a format understood by our API (representation).
2) We then perform some type of algorithm on that data (transformation) which results in some type of output (representation).
3) Finally, via another codec (transformation), another file is saved to disk (representation).
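The three steps above can be sketched as a single composed transformation. This is only a toy illustration in Python (chosen for brevity, not the F# mentioned later), and every name in it (`compose`, `decode`, `invert`, `encode`, `from_file`) is hypothetical. The "file" is a string of space-separated gray values, and the initial representation is itself modeled as a transformation from an empty state.

```python
from functools import reduce

# A "representation" here is plain data (text or a grid of ints); everything
# else is a function from one representation to the next.

def compose(*steps):
    """Chain transformations left to right into a single transformation."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def from_file(text):
    """A representation viewed as a transformation from the empty state."""
    return lambda _empty: text

def decode(text):
    """Toy codec: 'file' text -> grid of ints, one row per line."""
    return [[int(p) for p in line.split()] for line in text.splitlines()]

def invert(pixels):
    """Toy algorithm: invert 8-bit grayscale values."""
    return [[255 - p for p in row] for row in pixels]

def encode(pixels):
    """Toy codec: grid of ints -> 'file' text."""
    return "\n".join(" ".join(str(p) for p in row) for row in pixels)

# Read, process, and save expressed as one transformation of the empty state.
pipeline = compose(from_file("0 128\n255 64"), decode, invert, encode)
result = pipeline(None)
print(result)
```

Nothing is computed until `pipeline` is called, so the composed value could just as well be stored or handed to another machine before it is run.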
In most cases there are a great number of intermediate representations. Each is a full copy of the previous one with whatever changes have been applied so far. Essentially, the same information is copied over and over again in memory. We do allow for some kinds of in-place processing, but this has a cost: once the operation has completed, the previous representation has been destroyed.
Instead, what if we batched up sets of transformations? This could have many benefits:
1) The most obvious benefit is that of parallelization. Even at the simplest level of functional composition, these transformations could be handed off to a cluster for asynchronous processing or saved for a later batch processing job.
2) With an intermediate symbolic transformation language, processing algorithms could potentially be combined and reduced to produce a single transformation out of many. This would significantly reduce the processing overhead as well as the number of intermediate memory representations.
3) An intermediate symbolic language that encompassed both codec and processing might make it possible to push the processing transformation through the codec transformation and, in so doing, eliminate the intermediate memory representation altogether. This could provide significant savings in both memory and processing time.
4) The intermediate symbolic language could be saved into the files themselves, thus removing the need for the codec to be present on the end machine. Admittedly, the user would then need the image language interpreter instead.
5) Instead of applying simple image processing algorithms to the image data directly, their symbolic representation could be appended to the end of the file. In practice this would be quite similar to layers, and it would make it possible to view the image at every stage of transformation.
6) For large or proprietary transformations, the representation could be kept on the internet and either be downloaded or, in the case where the owner did not want to expose their algorithm, a flattened representation could be sent out and a processing delta could be sent back.
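To make benefits 2 and 5 a little more concrete, here is a minimal sketch in Python, assuming a made-up symbolic vocabulary of two point operations, `("scale", k)` and `("offset", k)`. The helper names `fuse`, `apply_ops`, and `stage` are hypothetical. Because the operations are kept as data rather than applied immediately, a whole list of them can be reduced to one affine map and applied in a single pass, and any earlier stage can be viewed by replaying a prefix of the list. Clamping and rounding are deliberately ignored; with real 8-bit pixels they would keep such a reduction from being exact.

```python
# Symbolic transformations are plain data: (name, parameter) pairs.

def fuse(ops):
    """Reduce a list of point operations to one affine map p -> a*p + b."""
    a, b = 1, 0
    for name, k in ops:
        if name == "scale":
            a, b = a * k, b * k
        elif name == "offset":
            b = b + k
        else:
            raise ValueError(f"unknown op: {name}")
    return a, b

def apply_ops(pixels, ops):
    """Apply the fused transformation in one pass over the pixels."""
    a, b = fuse(ops)
    return [[a * p + b for p in row] for row in pixels]

def stage(pixels, ops, n):
    """View the image after only the first n recorded operations."""
    return apply_ops(pixels, ops[:n])

image = [[1, 2], [3, 4]]
log = [("scale", 2), ("offset", 3), ("scale", 10)]

print(fuse(log))              # the three operations as a single (a, b) pair
print(stage(image, log, 2))   # the image as it stood after two operations
print(apply_ops(image, log))  # the final image, computed in one pass
```

The original `image` is never modified, so the list of operations behaves much like the appended layers described in point 5.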
Of course, when I speak of data I don't mean only the image itself. This technique could also be applied to many other classes of data or algorithm, most notably for us, image metadata.
My initial goal is to build a basic codec representation along with some simple transformations. Currently, I am researching bidirectional, reversible and declarative languages as examples. With F# as a base language I believe it will be possible to build something portable to other ML variants.