Thursday, June 11, 2009 1:42 PM
A Short History of Programming Languages
Recently, I was reading David R. Tribble’s annotated version of Dijkstra’s famous letter “Go To Statement Considered Harmful”. While reading, it occurred to me that I did not really understand the history of language abstraction. To remedy this, I’ve done some research and put together the following post. I hope you find it as educational to read as I found it to write.
Programming languages are often described in terms of their level of abstraction, and there is a somewhat official classification system for this. In that system, each generation in the hierarchy represents another level of abstraction away from the machine hardware.
First-generation programming language (1GL) – Binary
“I think there is a world market for maybe five computers.”
- Attributed to Thomas J. Watson
It makes sense that Watson would say this, since the earliest computers were programmed entirely in binary, with no abstraction at all. I, for one, do not envy our forefathers this task. While the programs were small, all operations, data, and memory had to be managed by hand in binary.
- Introduced in the 1940s
- Instructions/Data entered directly in binary
- Memory must be managed manually
- Very difficult to edit/debug
- Simple programs only
Some examples: architecture-specific binary entered via switches, patch panels, and/or tape.
Second-generation programming language (2GL) – Assembly
“He who hasn't hacked assembly language as a youth has no heart. He who does as an adult has no brain.”
Assembly languages were introduced to mitigate the error-prone and excessively difficult nature of binary programming. While still used today for embedded systems and optimization, they have mostly been supplanted by 3GLs because program flow is difficult to control in assembly.
- Introduced in the late 1940s and early 1950s
- Written by a programmer using symbolic instruction mnemonics, which an assembler later translates into binary instructions
- Specific to platform architecture
- Designed to support logical structure and debugging
- Defined by three language elements: Opcodes (CPU Instructions), Data Sections (Variable Definitions) and Directives (Macros)
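Almost every CPU architecture has a companion assembly language. RISC and CISC are design philosophies rather than languages. The assembly languages most commonly in use today are those for x86, a CISC architecture found in desktop computers, and for RISC architectures such as ARM, found in many embedded systems.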
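To make the step from assembly to binary concrete, here is a minimal sketch of a toy assembler. The instruction set is hypothetical and invented purely for illustration (a 4-bit opcode followed by a 4-bit operand); it does not correspond to any real CPU.

```python
# Hypothetical 8-bit instruction set, invented for illustration only.
# Each instruction is one byte: a 4-bit opcode followed by a 4-bit operand.
OPCODES = {
    "LOAD": 0b0001,   # copy memory[operand] into the accumulator
    "ADD": 0b0010,    # add memory[operand] to the accumulator
    "STORE": 0b0011,  # copy the accumulator into memory[operand]
    "HALT": 0b1111,   # stop the machine (operand ignored)
}


def assemble(source: str) -> list[str]:
    """Translate mnemonic source lines into 8-bit binary strings."""
    words = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        if not 0 <= operand <= 0b1111:
            raise ValueError(f"operand out of range: {operand}")
        words.append(f"{OPCODES[mnemonic]:04b}{operand:04b}")
    return words


program = """
LOAD 2     ; accumulator = memory[2]
ADD 3      ; accumulator += memory[3]
STORE 4    ; memory[4] = accumulator
HALT
"""

print(assemble(program))
# ['00010010', '00100011', '00110100', '11110000']
```

The output list is what a 1GL programmer would have had to enter by hand; the mnemonics on the left are the 2GL equivalent.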
Third-generation programming language (3GL) – Modern
“Real programmers can write assembly code in any language.”
- Larry Wall
Third-generation languages are the primary languages used in general-purpose programming today. They vary widely in their particular abstractions and syntax. However, they all share great improvements in logical structure over assembly language.
- Introduced in the 1950s
- Designed around ease of use for the programmer
- Driven by a desire for fewer bugs and more code reuse
- Based on natural language
- Often designed with structured programming in mind
Some examples: C, C++, C#, Java, BASIC, COBOL, Lisp and ML.
Fourth-generation programming language (4GL) – Application Specific
"A programming language is low level when its programs require attention to the irrelevant."
- Alan J. Perlis
A fourth-generation language is designed to make problems in a specific domain simple to solve. This has the advantage of greatly reducing development time and cost. The disadvantage is that developers must learn a new language for each domain.
- Introduced in the 1970s; term coined by James Martin
- Driven by the need to enhance developer productivity
- Further from the machine
- Closer to the domain
Some examples: SQL, SAS, R, MATLAB's GUIDE, ColdFusion, CSS
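As a rough illustration of "further from the machine, closer to the domain", the sketch below computes the same totals twice: once with an explicit 3GL-style loop and once with a SQL query run through Python's built-in sqlite3 module. The table name and sales figures are hypothetical, made up for this example.

```python
import sqlite3

# Hypothetical sales data, invented for illustration.
sales = [("north", 120), ("south", 80), ("north", 50), ("south", 30)]


def totals_imperative(rows):
    """3GL style: spell out how each total is accumulated."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_sql(rows):
    """4GL style: state what result is wanted; the database decides how."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


print(totals_imperative(sales))
print(totals_sql(sales))
```

Both functions return the same totals. The SQL version never mentions loops or dictionaries, which is the productivity gain 4GLs aim for, at the price of learning SQL itself.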
Fifth-generation programming language (5GL) – Constraint Oriented
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
- Tony Hoare
It has been argued that there is no such thing as a 5GL. This seems ridiculous to me, as working with domain-specific syntax is hardly the end of the road for abstraction. The cynicism is likely a result of the many false claims of 5GL status made for the sake of marketing.
Many researchers speak of 5GLs as constraint systems. The programmer supplies a set of logical constraints, with no specified algorithm, and the AI-based compiler builds the program from these constraints.
- Term came into use in the 1980s
- Constraint-based instead of algorithmic
- Used for AI Research, Theorem Proving, Logical Inference
- Not in common use
Some examples: Prolog, Mercury
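The constraint-based idea can be sketched in a few lines. The example below is not how Prolog or Mercury work internally (they rely on unification and backtracking); it is a deliberately naive generate-and-test search, and the puzzle itself is hypothetical. The point is only that the caller states the constraints and never writes the search algorithm.

```python
from itertools import product


def solve(variables, domain, constraints):
    """Yield every assignment of domain values that satisfies all constraints."""
    for values in product(domain, repeat=len(variables)):
        candidate = dict(zip(variables, values))
        if all(check(candidate) for check in constraints):
            yield candidate


# Hypothetical puzzle: find digits x, y, z such that
# x + y == z, x < y, and x * y == 6.
constraints = [
    lambda a: a["x"] + a["y"] == a["z"],
    lambda a: a["x"] < a["y"],
    lambda a: a["x"] * a["y"] == 6,
]

solutions = list(solve(["x", "y", "z"], range(10), constraints))
print(solutions)
# [{'x': 1, 'y': 6, 'z': 7}, {'x': 2, 'y': 3, 'z': 5}]
```

Adding or removing a constraint changes the answers without touching `solve`, which is the separation between "what" and "how" that the 5GL label is meant to capture.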
This was an interesting history lesson, although I can’t help but feel that the categories beyond 3GL are somewhat arbitrarily defined. I do agree that 4GL is an abstraction on 3GL. Perhaps, however, there are other directions that are equally abstract in relation to 3GL. Perhaps, after concrete logic-based systems, free-form natural language should have been fourth. This could be followed by thought-based programming, which I feel would be the ultimate level of abstraction for human interaction.
Also, to my great disappointment, I was unable to find out who coined most of the “# generation language” terms. In computer science it is usually possible to gain insight into a concept by examining the author’s other works, but in this case that option seems unavailable.