When you are handed a string, an integer, or any other value type, can you know what it really represents? Can you define the range of appropriate behaviors for that data? Can you tell whether it is formatted correctly? In all of these cases, you can't. You can't be sure of its meaning, its format, or even how to treat it. This is why ambiguous types break the object-oriented programming paradigm and should be avoided whenever possible.
It's laughable to think that a physical engineer or physicist would work without the aid of units. When units are undefined, you can't verify that what you get at the end of a computation is defined in a consistent way.
In computing, the counterpart to the physical engineer is the software engineer. Why is the software engineer not held to the same standard of design integrity? The use of ambiguously defined variables is one of the largest sources of bugs in modern software development, and yet we ignore it. In fact, with many dynamically typed languages we are moving toward more and more ambiguity.
In C# and Java, strings and value types are defined only in terms of their lexical and mathematical operations. Apart from the context in which they are used, they carry no additional information about the meaning of their content. To perform any non-trivial operation on that data, you have to make assumptions about its meaning, which in turn leads to bugs.
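To illustrate, here is a minimal Java sketch (the method and its parameters are invented for this example): nothing in a signature built from raw strings and scalars tells the caller which number is which, so a transposed argument compiles and runs without complaint.

```java
// A hypothetical API built from ambiguous types.
// Nothing in the signature says which int is the port
// and which is the timeout; the compiler cannot help.
public class Ambiguity {
    static String connect(String host, int port, int timeoutMs) {
        return host + ":" + port + " (timeout " + timeoutMs + "ms)";
    }

    public static void main(String[] args) {
        // Intended call:
        System.out.println(connect("example.com", 8080, 5000));
        // Transposed scalars are accepted silently:
        System.out.println(connect("example.com", 5000, 8080));
    }
}
```

Both calls type-check; only the first is correct, and nothing will flag the second until it causes havoc at runtime.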
Is what you are writing really object oriented?
This may seem like a silly question to ask in the modern programming era. Almost everyone is using an object-oriented language these days, and it's mostly taken for granted. However, as any beginning programmer knows, it is very easy to program in an object-oriented language while completely ignoring the underlying paradigm.
We don't use OO because it's an agreed-upon standard for implementing a programming language; we use it to solve a specific set of problems. Those problems are directly related to the modeling of data and data manipulation. Groups of data subtypes are classified into a larger object, and operations on that data are defined so as to model behavior in terms of that classification. When the paradigm is ignored, these issues manifest as ambiguity and disorganization.
It follows that using ambiguous data types in a public API goes directly against not only the object-oriented paradigm but also the broader goals of data classification and program integrity.
Why not pass strings and value types as arguments to public methods?
Using an ambiguously defined data type instead of a well-defined object means that the domain and formatting of that data are left open to question. It also means that if a user formats that data inappropriately, it may find its way into an operation meant for another type of data and cause havoc.
For a programmer familiar with object-oriented programming, it often comes down to a choice of encapsulation versus convenience. I know as well as anyone that having to define an object to carry your data, when you could just pass in a string and a couple of integers, can feel tedious. However, by taking the easy way out, you are setting things up to fail down the road.
Consider: is it possible that sometime in the future someone will be using the API you are defining? Will the person looking at that code know the specific format you chose for that string? What about the range of valid values for those integers? What assumptions about that data will be made that you may not have considered? If they mess up the format or get the range wrong, at what point will it become obvious?
There are many advantages to using objects instead. An object designed to carry that same data can verify that it is well defined when it is constructed. It can hand back that data formatted or manipulated in many different ways. It can answer well-defined questions about that data, which makes program control flow more obvious. Best of all, the code for all of these things is centralized and left to the discretion of the object's designer. With a well-defined object, two people with different assumptions about the data but asking the same question will get the same result.
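As a sketch of this idea in Java (the class name and the five-digit format rule are my own, chosen purely for illustration), here is a wrapper that validates a US-style ZIP code once, at construction, and then answers well-defined questions about it:

```java
// A minimal encapsulated, validated type.
// The format rule (exactly 5 digits) is an assumption for this example.
public final class ZipCode {
    private final String value;

    public ZipCode(String value) {
        // Validation happens once, at the boundary.
        if (value == null || !value.matches("\\d{5}")) {
            throw new IllegalArgumentException("Invalid ZIP code: " + value);
        }
        this.value = value;
    }

    // Well-defined questions about the data live in one place.
    public boolean isInRegion(int firstDigit) {
        return value.charAt(0) - '0' == firstDigit;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

A method that accepts a `ZipCode` rather than a `String` can no longer receive a phone number by accident; bad data fails loudly at the boundary instead of deep inside some unrelated operation.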
Of course, at the lowest level a CPU will be operating on basic value
types and so our code will always reflect that to some degree.
However, because ambiguous types are extremely dangerous, they should
almost always be encapsulated and well defined.
What tools are available to me in .NET?
If you are a .NET programmer, the best thing you can do right now is become familiar with the existing classes available for making your strings better defined, such as System.Uri and System.IO.FileInfo. I've put up a question on Stack Overflow to try to build a list of available container classes.
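The same pattern exists outside .NET; as an analogue to System.Uri, Java's standard `java.net.URI` validates its input at construction and then answers well-defined questions about it:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriDemo {
    public static void main(String[] args) throws URISyntaxException {
        // Validation happens once, when the object is constructed.
        URI uri = new URI("https://example.com:8080/docs?page=2");
        System.out.println(uri.getHost());   // a well-defined question
        System.out.println(uri.getPort());

        // Malformed input fails loudly instead of flowing onward.
        try {
            new URI("http://exa mple.com");  // space is illegal in a URI
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getReason());
        }
    }
}
```

Every consumer of a `URI` can trust its format without re-checking it, which is exactly the property a raw path-as-string lacks.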
Beyond using predefined classes, it's best to make your own encapsulation objects with heavy up-front validation. You can then use extension methods to make native .NET classes accept your new validated types. It would be worth putting together a library of free encapsulation classes, structs, and extension methods to allow for easy interoperability.
Beware, the .NET Framework encourages ambiguous types.
Unfortunately, as well designed as it is in many respects, the .NET Framework is pretty bad about ambiguity. Judging by classes such as System.IO.Path and System.IO.FileStream, which for some reason take paths represented as strings, you might even say that ambiguity is encouraged. Consider the vast number of methods in .NET classes that take unencapsulated strings and scalar types.
The most unfortunate side effect of this design is that users of the .NET API may come to believe that this is how things should be in a proper object-oriented API. As a consequence, if you want your own product's API to be easily understood by .NET users, you have to follow the same destructive conventions.
F# helps to solve this problem with Units of Measure.
In most object-oriented languages, encapsulating every single scalar value passed into a method would mean quite a lot of extra coding. Microsoft's newest programming language, F#, has a feature called Units of Measure that lets a programmer optionally specify both meaning and behavior for classes of scalar types.
A scalar with a unit of measure is a real type, enforced by the compiler. When an operation is performed, the resulting type carries the combined units used in the calculation, just as it would in physics or engineering. This is because F# is designed in part for engineers and scientists; as a side effect, the rest of us get to reap the same benefit.
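A minimal F# sketch (the units here are chosen arbitrarily for illustration) shows the compiler doing the bookkeeping:

```fsharp
[<Measure>] type m   // metres
[<Measure>] type s   // seconds

let distance = 100.0<m>
let time = 9.58<s>

// The result carries the combined unit: float<m/s>
let speed = distance / time

// This line would not compile: the units don't match.
// let nonsense = distance + time
```

The mistake that would silently produce garbage with plain floats becomes a compile-time error, with no runtime cost.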
This type of scalar type classification has long been missing from object oriented languages. It's a huge step forward and I hope other programming languages move to adopt it quickly.
@Mark and Jon:
I think I may not have fully expressed what I meant by "handed a string or integer". The context I am talking about is post-compilation, when variable names no longer have real meaning. In this context (discounting reflection on the variable name), all you can tell about an integer is where it is coming from and its value. An executing program does not have access to documentation.
It's also important to consider reflective programming. It is generally agreed that reflective programming is where the object-oriented world is headed. However, when you have a scalar value, you can tell very little about it through reflection. While it is possible to retrieve a variable's name using reflection, using that name to carry type information is a methodology extremely prone to errors from mistyping: errors which will not be caught at compile time and which may lead your program down incorrect paths of execution.
Also, I want to note that while a string is not a scalar value, it suffers from the same kind of ambiguity. For this reason it, and other "base" types, should be handled similarly.