Language Underbelly
For the past week and a half, I’ve been implementing a scripting language as part of some R&D work. It’s always exciting to make the puppet dance, as it were. Implementing the language was far easier than in the past as this time around I have generics and lambda expressions at my fingertips.
Languages are funny beasts because there are three goals that are often at odds with each other:
- Language Purity – how self-consistent and semantically pure is the language?
- Language Pragmatism – how does the language fare when it is used for practical problems?
- Language Reality – how does the language fare in the reality of oddball hardware and interoperating with legacy systems?
Here is an article that talks about some of the hidden features of C that are in gcc and are leveraged by the Linux kernel.
In most cases, these features are wrapped in macros that can be redefined as no-ops, or as some equivalent mechanism, when the underlying extension is unavailable. Even so, these are scary changes to me, as they represent a serious break in language portability.
Where these cases happen there is a conflict between Language Reality and Language Purity. The use of, for example, the likely() and unlikely() macros is a hack to tell the optimizer which way a branch is expected to go, which can help significantly in optimization. That’s fairly benign as these things go.
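As a rough sketch of how such macros can degrade gracefully, here is one way to wrap gcc's `__builtin_expect` so that other compilers see a plain no-op. The function `sum_checked` and the exact macro definitions are my own illustration, not copied from the kernel sources.

```c
#include <stddef.h>

/* Illustrative wrappers: use the gcc/clang builtin when it exists,
 * otherwise fall back to a definition that only normalizes to 0 or 1. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (!!(x))
#  define unlikely(x) (!!(x))
#endif

/* Hypothetical example function: sums an array, treating a NULL
 * pointer as the rare error path and returning -1 for it. */
long sum_checked(const int *values, size_t count)
{
    long total = 0;

    if (unlikely(values == NULL))
        return -1;

    for (size_t i = 0; i < count; i++)
        total += values[i];

    return total;
}
```

The hint changes nothing about what the code computes; it only nudges how the compiler lays out the branch, which is why the fallback can be a no-op.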
Worse are things like ranges in case expressions in switch statements, which will not compile on any compiler that lacks this gcc extension. This is dangerous because it inexorably ties Linux to the gcc feature set and not the C feature set.
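To make the case-range problem concrete, here is a small sketch with a hypothetical function, `classify_digit`, written both ways. The first branch uses the gcc `case low ... high:` extension; the second is the equivalent that any C compiler should accept.

```c
/* Hypothetical example: returns 1 for '0'..'9', otherwise 0. */
int classify_digit(int c)
{
#if defined(__GNUC__)
    /* gcc extension: a range of values in a single case label.
     * The spaces around "..." keep it from being lexed as part of a number. */
    switch (c) {
    case '0' ... '9':
        return 1;
    default:
        return 0;
    }
#else
    /* Portable equivalent using an ordinary comparison. */
    return (c >= '0' && c <= '9') ? 1 : 0;
#endif
}
```

Unlike the branch hints, this one cannot be hidden behind a no-op macro: the extension changes the syntax of the switch itself, so a portable version has to be written out separately.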
Historically, the keyword register is a conflict between Language Purity and Language Pragmatism. Register declarations are a way to hint to the compiler that a particular variable should end up in a register (hopefully for performance). It has come to pass, however, that the register keyword is pretty much ignored by current compilers. This is a lesson in shortsightedness: avoid adding keywords to your language to solve a short-term problem when they adversely affect the language in the long term.
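For anyone who has not run into it, this is roughly what register looks like in use. The function `sum_array` is a made-up example. The one effect the keyword reliably still has in C is that you may not take the address of a variable declared with it.

```c
#include <stddef.h>

/* Hypothetical example: sums an array using register-qualified locals. */
long sum_array(const int *values, size_t count)
{
    /* Only a hint: the compiler is free to ignore it and do its own
     * register allocation. */
    register long total = 0;

    for (register size_t i = 0; i < count; i++)
        total += values[i];

    /* long *p = &total;   not allowed in C: address of a register variable */

    return total;
}
```

Dropping both register keywords leaves the behavior unchanged, which is rather the point.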
The rub is that while Language Purity is important, languages that are not pragmatic don’t get used and die an early death.
Perl is the most prominent example of a language that gave itself over to pragmatics and reality and has nearly no purity. That is the main reason why I feel dirty after I’ve written in Perl.