My daughter is learning how to read right now. As I was thinking about this blog post, I just walked past my wife and her working on some very basic reading skills. It is quite a bit of work to teach her everything she needs to know to read and write the English language.
In fact, it will be years of hard work before she’ll actually be able to read and write with any measure of competence—at least by our adult standards. We tend to take language for granted, but spoken and written languages are difficult—exceptionally difficult.
Even as an adult, writing this post is difficult. The words don’t flow perfectly from my mind. I strain to phrase things in the proper way and to use the proper punctuation.
But, even though it is difficult to learn a written language, we make sure our kids do it, because of the high value it gives them in life. Without the skills to read and write a language, most children’s future would be rather bleak.
The more I thought about this idea, the more I realized how simple programming languages are compared to the complexity of a written or spoken language.
The argument for more complexity
The irony of me arguing for more complexity and not less doesn’t escape me, but even though I strive to make the complex simple, sometimes we do actually need to make things more complicated to achieve the best results possible.
I’ve thought about this for a long time and I believe this is the case with programming languages. Let me explain.
Before I get into programming languages specifically, let’s start off by talking about human languages.
I speak and write English. English is considered to be the language with the largest total vocabulary and also one of the most difficult languages to learn, because of the flexibility in the ways in which you can compose sentences with it.
It is very difficult to learn English. I am fortunate that I am a native English speaker and grew up learning English, but for many non-native English speakers, the language continues to be a challenge, even years after they are “fluent” in the language.
There is a huge benefit, though, to being fluent in the English language: expressiveness. I don’t profess to be an expert in foreign languages (I only know a little bit of Spanish, Brazilian Portuguese, and Japanese myself), but I do know that English is one of the most expressive languages in existence today. If you want to say something in English, there is most likely a word for it. If you want to convey a tone or feeling with the language, even a pace of dialog, like I just did now, you can do it in English.
As I said, I can’t speak for other languages. But, having lived in Hawaii, I can tell you that Hawaiian is a very small language and it is difficult to express yourself in that language. Sign language is another example of a very small language which is fairly easy to learn, but is limited in what it can convey and the way it can convey it.
I say all this to illustrate a simple point. The larger the vocabulary of a language and the more grammatical rules, the more difficult it is to learn the language, but the greater power of expressiveness you have with that language.
Breaking things down even smaller
I promise I’ll get to programming languages in a little bit, but before I do, I want to talk about one more human language concept—alphabets or symbols.
The English alphabet has 26 letters in it. These 26 letters represent most of the sounds we use to make up words. 26 letters is not a small number of characters, but it is not a large amount either. It is a pretty easy task for most children to learn all the letters of the alphabet and the sounds they make.
The text you are reading right now is made up of these letters, but have you ever considered what would happen if we had more letters in the alphabet? For example, suppose instead of 26 letters, there were 500 letters. Suppose that we made actual symbols for “th”, “sh”, “oo” and so forth. Suppose we made the word “the” into a symbol of its own.
If we added more letters to the alphabet, it would take you much longer to learn the alphabet, but once you learned it you could read and write much more efficiently. (Although, I’d hate to see what the 500-letter keyboard would look like.)
My point is that we are trading some potential in the expressiveness we can pack into a limited number of symbols for some ease in learning a useful set of symbols.
As you were reading this, you might have thought that this is exactly what languages like Chinese and Japanese do—they use a large number of symbols instead of a small alphabet. I don’t know enough about these languages to know the answer for sure, but I’d bet that it is much easier to read a Chinese or Japanese newspaper than it is to read an English one—or at least faster.
We could take the same exercise and apply it to the number system. Instead of using base 10, or having 10 symbols in our number system, we could have 100 or even 1000. It would take a long time to learn all our numbers, but we’d be able to perform mathematical operations much more efficiently. (A smaller scale example of this would be memorizing your times tables up to 99 x 99. Imagine what you could do with that power.)
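To put rough numbers on the base idea, here is a small Go sketch. The function name `digitsInBase` and the sample value of one million are just my own picks for illustration. It counts how many symbols you need to write the same number in base 10, base 100, and base 1000.

```go
package main

import "fmt"

// digitsInBase returns how many symbols it takes to write n in the given base.
// Illustrative helper; it assumes n >= 0 and base >= 2.
func digitsInBase(n, base int) int {
	if n == 0 {
		return 1 // zero still needs one symbol
	}
	count := 0
	for n > 0 {
		n /= base // strip the lowest-order symbol
		count++
	}
	return count
}

func main() {
	n := 1000000
	for _, base := range []int{10, 100, 1000} {
		fmt.Printf("base %d: %d symbols\n", base, digitsInBase(n, base))
	}
}
```

One million takes 7 symbols in base 10, 4 in base 100, and 3 in base 1000. The writing gets shorter, but only because there are far more symbols to memorize first.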
What does all this have to do with programming languages?
You really are impatient, aren’t you? But, I suppose you are right. I should be getting to my real point by now.
So, the reason I brought up those two examples before talking about programming languages is that I wanted you to see two things: the vocabulary and grammar of a language greatly influence its expressiveness, and the basic constructs of a written language greatly influence its density, that is, its ability to express things concisely.
Obviously, we can’t directly map human written languages to programming languages, but we can draw some pretty powerful parallels when thinking about language design.
I’ve often pondered the question of whether it is better to have a programming language that has many keywords or few keywords. But, I realized today that this was an oversimplification of the issue.
Keywords alone don’t determine the expressiveness of a language. I’d argue that the expressiveness of a language is determined by:
- Number of keywords
- Complexity of statements and constructs in the language
- Size of the standard library
All of these things combined work together to make a language more expressive, but also more complicated. If we crank up the dial on any one of these factors, we’ll be able to do more with the language with less code, but we’ll also increase the difficulty of learning the language and reading code written in that language.
Notice, I didn’t say in writing the language. That is because, assuming you’ve mastered the language, the language actually becomes easier to write when it has more constructs. If you’ve ever run across someone who is a master of Perl, you know this to be true. I’ve seen some Perl masters who could write Perl faster than I thought possible, yet when they came back to their own code months later, even they couldn’t understand it.
Looking at some real examples
To make what I am saying a little more concrete, let’s look at a few examples. I’ll start with C#, since it is a language I am very familiar with. C# is a very expressive language. It didn’t start out that way, but with all the keywords that have been added to the language and the massive size of the base class libraries, C# has become very, very large.
C# is an evolving language. But, right now it has about 79 keywords. (Feel free to correct me if I am wrong here.) As far as languages go, this is pretty large. In addition to keywords, C# has some complex statements. Lambda expressions and LINQ expressions immediately come to mind. For someone learning C#, the task can be rather difficult, but the reward is that they can be pretty productive and write some fairly concise code. (At least compared to a more verbose language like C or C++.) Java is pretty close in most of those regards as well.
But, take a language like Go. Go is a language with only 25 keywords. It makes up for this by having some fairly complex language constructs and having a pretty robust standard library. When I first learned Go, it took me perhaps a week to feel like I had a pretty good grasp of the language. But, it took much longer to learn how to use Go properly. (And I still have plenty to learn.)
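If you want to check the keyword count yourself, here is a minimal sketch that asks Go's own standard library. It relies on the `go/token` package and its `IsKeyword` method. The helper name `goKeywords` and the upper bound of 256 are my own assumptions for this example, not anything the package prescribes.

```go
package main

import (
	"fmt"
	"go/token"
)

// goKeywords collects every token that the standard go/token package
// classifies as a keyword.
func goKeywords() []string {
	var keywords []string
	// 256 is an arbitrary bound picked for this sketch; it only needs to be
	// larger than the number of token kinds the package defines.
	for tok := token.Token(0); tok < 256; tok++ {
		if tok.IsKeyword() {
			keywords = append(keywords, tok.String())
		}
	}
	return keywords
}

func main() {
	keywords := goKeywords()
	fmt.Println(len(keywords), "keywords:", keywords)
}
```

The keyword list is only one of the three dials, though. The constructs and the standard library carry a lot of the weight in Go.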
At the far end of the spectrum, we have languages like BASIC. Different BASIC implementations have different keyword counts, but most of them are pretty low and the constructs of the language are very simple. BASIC is a very easy language to learn. But, because it is so easy to learn BASIC and BASIC is so simple, the average programmer quickly outgrows the capabilities of the language. BASIC isn’t very expressive and it takes many more lines of code to write the same thing you could write in a few lines of C# or Go.
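Here is a rough sketch of that tradeoff, written entirely in Go rather than as a true BASIC comparison. The first function sorts a list using only loops, comparisons, and assignments, roughly what a very small language gives you. The second leans on the standard library's `sort.Ints`. The function names `sortByHand` and `sortWithLibrary` are made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortByHand is an insertion sort built from nothing but loops,
// comparisons, and assignments.
func sortByHand(values []int) {
	for i := 1; i < len(values); i++ {
		current := values[i]
		j := i - 1
		// Shift larger elements one slot to the right.
		for j >= 0 && values[j] > current {
			values[j+1] = values[j]
			j--
		}
		values[j+1] = current
	}
}

// sortWithLibrary does the same job in one line, but only if you
// already know that sort.Ints exists.
func sortWithLibrary(values []int) {
	sort.Ints(values)
}

func main() {
	a := []int{5, 2, 9, 1}
	b := []int{5, 2, 9, 1}
	sortByHand(a)
	sortWithLibrary(b)
	fmt.Println(a, b)
}
```

Both versions produce the same sorted list. The difference is how much you have to write versus how much you have to already know.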
For a much more comprehensive overview of differences between programming languages, I’d recommend Programming Language Pragmatics. It goes into detail about many different languages and the differences between them.
What more complex programming languages buy us
It feels really weird to be arguing for something to be more complex, since the mission of this blog is to make the complex simple, but in the case of programming languages, I think the tradeoff of increased complexity is worth the cost of extra learning time.
Consider how much more complicated the English language is than any programming language. To be able to read the very words you are reading now, you have to understand a vocabulary of several thousand words, recognize most of those words on sight, and understand a very complicated set of mechanics which govern the grammar of the language. There aren’t even concrete rules; much of what is “right” or “wrong” is based on context.
Yet, even with all this complexity, you are able to do it—our brains are amazing.
Now, imagine what would happen if we decided that English was too difficult a language and that we needed to dumb it down. What if we dropped the vocabulary down to, say, 200 words and got rid of the complex rules? What you would have is basically a Dr. Seuss book or some other early reader type of children’s book. It would be very difficult for me to convey the kinds of thoughts I am conveying to you right now with those restrictions.
When you compare even the most complex programming language to the English language, it is no contest. The English language is far more complex than any programming language we have ever conceived of. I don’t know of a programming language that the average person couldn’t learn reasonably well in a year’s worth of time. But, if you were to try and teach someone written English in a year—well, good luck to you.
If we created much more complex programming languages, we would have a much larger learning curve. But, in exchange, we’d have a language—that once mastered—would allow us to express algorithmic intent at a level we can’t even imagine now.
Not only would we be able to express our intent more clearly and more concisely, but we’d also greatly reduce the total lines of code and potential for bugs in our software. Less code equals fewer bugs.
Now, I’m just playing around mentally here. I “half” believe what I am saying, because I am just exploring ideas and possibilities. But, even in this mental exercise of thinking about what would happen if we created programming languages as complex as written languages, I can’t ignore the drawbacks.
Obviously, the biggest drawback would be the learning curve required to learn how to program. Learning how to program—at least how to do it well—is pretty difficult now. I still think people make it more complicated than it needs to be, but software development is a much more difficult vocation to pick up than many other career choices.
If we created more complex programming languages, we’d have to count on many more years of learning before someone could even really write code or understand the code that is already written. It might take 4 or 5 years just to understand and memorize enough of the language to be able to use it effectively.
We could of course combat this to some degree by starting beginners on easier languages and advancing them up the chain to more complex ones. (In fact, writing this article has convinced me that would be the best way to learn today. We shouldn’t be starting developers with C# or Java, but instead should teach them very simple languages.)
We would probably also be forced down a narrower path of innovation, as far as programming languages go. The world can support hundreds of simple programming languages, but it can’t easily support that many complex languages. We might end up with one universal language that all programmers used. A language of this size would be very unwieldy and hard to advance or change. It would also take a massive effort to create it in the first place, since written languages developed naturally over hundreds of years.
That’s enough fun for now
After writing this article my brain is hurting. I’ve been considering writing this post for a while, but I wasn’t sure exactly where I stood on the issue. To be completely honest with you, I still don’t. I do think that more complex programming languages would offer us certain benefits that current programming languages do not, but I’m not sure if the drawbacks would be worth it in the end or even what a significantly more complex programming language would look like.
What about you, what do you think? Am I just crazy? Is there something significant I missed here?