People believe that long descriptive variable names and function names always make code easier to read. Frequently the opposite is true.
I have a passion for both user interface design and clean code. Interestingly, there is a lot of overlap between the two in how you think. UX theory gives some insights which are instructive for how you should write your code to be readable.
Consider a web page with a list of things you can do:
I want to customize tools...
I want to have custom shows...
I want to do custom animations...
This is bad design, and something people often do. They put the same words and phrases in the beginning of every option. It makes it hard for the user to scan available choices. It is better to factor out the repetition:
I want to:
However, this is not perfect. It is important to consider that humans don’t really read words. As children we read each letter to determine the word. But grownups identify a word by its overall shape. So ironically reading words in western languages is quite similar to reading Chinese.
To make choices easy to tell apart, the words and sentences ought to have different shapes. In the example above, each choice has a similar shape.
It becomes easier to scan the choices if we instead arrange the words such that each sentence has a unique shape. The first word of each sentence is most important in distinguishing the choices. Hence we should attempt to give each first word a unique shape:
Shape is also something you have to consider with variable names.
index1, index2, index3, indexi, indexj, indexk
number1, number2, number3
These are a lot harder to distinguish from each other, because their dominant shape is almost identical, than:
i, j, k
x, y, z
Which brings me to an important point often overlooked.
Variable names should not be considered in isolation. Make them distinguishable, not just descriptive. Understanding code relies on seeing things in combination. You don’t read a variable name in pure isolation.
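To make this concrete, here is a minimal sketch in Python of the same matrix multiplication written twice. The function names matmul_short and matmul_long are made up for this illustration. Both do exactly the same thing; only the index names differ.

```python
def matmul_short(A, B):
    """Multiply two matrices given as lists of rows."""
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):             # row of A
        for j in range(len(B[0])):      # column of B
            for k in range(len(B)):     # shared dimension
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_long(A, B):
    """Same computation, but with index names that share one shape."""
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for index1 in range(len(A)):
        for index2 in range(len(B[0])):
            for index3 in range(len(B)):
                C[index1][index2] += A[index1][index3] * B[index3][index2]
    return C
```

In the innermost line of the second version you have to read each name to the last character to see which index is which. In the first version the three names are told apart at a glance.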
Keep It Short
People don’t like to read. That is why good explanations attempt to cut down the length of sentences and supplement explanations with pictures and illustrations. In fact this whole article ought to have been shorter and used more illustrations to make the point. However, writing shorter text is actually more time consuming. You have to spend more time thinking about how to shorten it and retain the meaning. As Blaise Pascal said:
If I had more time, I would have written a shorter letter.
Almost all code is written with this in mind. That is why we use white space and indentation to visually show relationships rather than writing explicitly what it is. In most text editors we also use colors to communicate important parts of the code such as keywords, beginning and end of blocks etc.
Hence long variable names and function names are a burden. We should always strive to cut down on the length of the code, where it does not obscure the meaning.
The way we deal with this both in code and in user interfaces is by hiding details, and only showing them as needed. Rather than showing the full code at all times, we split the code into multiple files. The files are further subdivided into functions, modules or classes.
Likewise in a user interface, not all functionality is visible on one page. Instead functionality is placed in separate dialogs and functionality is hidden behind multiple tabs, so you can focus on one thing at a time.
In user interfaces we have the concept of progressive disclosure, where further details are progressively disclosed as needed. One could argue a function is a form of progressive disclosure. At first the information is condensed to just the function name. If that is not descriptive enough, you may jump to the function definition to read the documentation or code.
Humans prefer to read abstract or condensed forms of something and only when certain terms are not understood do we continue to look at the details. We don’t want all the details dumped on us immediately. Let me illustrate this with an example. What is easier to read, this:
2 + 4*3
Or a full description such as:
Multiply four with three and add the result to two
Let me give you another example with the ideal gas law from thermodynamics. If I expressed it with descriptive names on variables and operators, then we would describe it as:
The product of gas pressure and gas volume is equal to the number of gas molecules measured in moles times the gas constant, times the temperature of the gas in Kelvin.
Do you find that easy to read? Probably you will find it easier to read the relationship expressed in this way:
PV = nRT
P = pressure
V = volume
n = number of gas molecules measured in moles
R = the gas constant
T = temperature, measured in Kelvin
This presents the relationship as a sort of progressive disclosure. You can see quickly the full relationship, but should you be uncertain about any of the variables, you can reveal more details by looking at the description of each variable below.
According to the current philosophy of long descriptive variable names it should have been written more like:
gas_pressure * gas_volume = number_of_molecules * gas_constant * gas_temperature
The problem with this description as pointed out earlier is that it affords poor scanning. Human short term memory is affected by the length of words. Writing out the full descriptions requires maintaining more in memory at the same time to see the relationship.
That is why we split programs into files which are split into classes, which are split into methods, to keep the number of elements we are relating to at any given time down. It is the human ability to abstract and remove details which allows us to maintain an understanding of the world around us.
It is then curious that people believe long function names and variable names are automatically better. Rob Pike has the best rules for variable and function naming that I’ve come across. What he states is that the length of variable and function names should be context dependent.
Inside a short function, local variables exist in a clear context. If the function is about gas pressure and temperature, then it would be more obvious what variables named P and T represent. If you see these variables in the middle of some other huge function or they are global variables, it would be next to impossible to decipher what they mean.
When deciding on a variable or function name consider the context. Will the function or variable only be used locally within a function or a file, or is it global?
Are the variables related to a specific domain, which the reader of the code needs to be familiar with? E.g. there is no point in long descriptive variable names in physics code, if the reader of the code has no clue about physics. Instead variable names should reflect the established conventions of that domain.
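As a rough sketch of what this looks like in practice, here is the ideal gas law as a short Python function. The function name pressure, the argument order and the choice of SI units are my assumptions for this example. The names follow the physics convention, and the docstring plays the role of the legend under the formula.

```python
R = 8.314462618  # the gas constant, in J/(mol*K)


def pressure(n, T, V):
    """Pressure of an ideal gas in pascals, from PV = nRT.

    n -- number of gas molecules, measured in moles
    T -- temperature, measured in Kelvin
    V -- volume, measured in cubic metres
    """
    return n * R * T / V
```

Anybody who knows the formula recognises the last line immediately. Anybody who doesn't can read the docstring. Note that R is global here, which goes against my own rule about single letters in a global context. I make the exception only because R for the gas constant is about as established as a convention can get.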
Engheim’s Variable and Function Naming Rules
I do in fact use very long descriptive variable names on occasion. One case is when factoring out functions used once. Then I often prefer long descriptive names. Likewise for any seldom used function with global scope.
However I try to use established conventions and abbreviations to shorten things. I feel a bit bad saying this, because I think people use abbreviations and acronyms far too frequently. Just listen to a guy in the military for 5 minutes. So my rule is to only use the most common ones. Abbreviations I use:
- func function
- ptr pointer
- no number of
- num number
- cmd command
- col column. Mostly because it matches well with row. If it was alone, without row I would spell it out as column. Again context is important.
For very local context and small functions I use a lot of single letters. However I NEVER use these short names in a global context.
- i, j, k for loop indices. But only locally. I don’t store indices inside structs or classes which are just named i, j or k.
- f to refer to file in the context of file operations.
- x, y for math operations, such as arguments to sine, cosine, logarithm etc, or to denote x, y coordinates.
- xs, ys for vectors/arrays of numbers. This is common in Haskell code, so I am careful with it elsewhere.
- p, q for points in a coordinate system.
- v, u for 2D/3D vectors in relation to geometry.
- s, str for strings in short string manipulation functions or very local context.
- T, T1, T2 to refer to a type when dealing with type parameters in templated code.
When defining a larger composite type I will avoid most of these. E.g. I would not refer to a string as simply s or a file as f. However for coordinates it makes sense to say x, y even in larger classes.
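A small Python sketch of the difference, with made-up names (Label, shout and move are not from any real library):

```python
from dataclasses import dataclass


@dataclass
class Label:
    # Fields are seen far away from their definition, so spell them out...
    text: str
    font_name: str
    # ...except coordinates, where x and y are the established convention.
    x: float
    y: float


def shout(s):
    """Return s in upper case with an exclamation mark appended."""
    return s.upper() + "!"   # s is fine: the whole scope is one line


def move(label, dx, dy):
    """Return a copy of label shifted by dx and dy."""
    return Label(label.text, label.font_name, label.x + dx, label.y + dy)
```

Inside shout the single letter s is never more than one line away from its explanation. A field called s inside Label would turn up in code that has no such context.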
While I do use a lot of short variable names, I am probably a lot more active in commenting my code than what I see as normal. People might argue, why not just write longer function and variable names instead and avoid comments?
The simple answer is that I follow the principle of progressive disclosure as used in GUIs. A comment doesn’t have to be read, while a function and variable name has to. A comment is hence an aid when you need it.
For this reason I try to avoid writing many comments inside the code itself, since that breaks the flow. I prefer to write longer descriptions in front of types and functions.
Inside functions I avoid writing what the code does, focusing instead on why it does that. This is important since programming languages do not have constructs to convey intent. But they tend to show very well what they do.
In front of types and functions I try to write what they are for rather than what they are.
Comments also allow you to create a context for your code, so that short variable names make sense. If you have made clear that the context is computational geometry, then p, q, x, y, z, u and v variables make a lot more sense.
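Here is a minimal sketch of that style in Python. The helper names distance and is_left_turn are invented for this example, as is the choice to represent points as tuples. The comment at the top sets the context, the docstrings say what each function is for, and the one comment inside a function explains why rather than what.

```python
import math

# Computational geometry helpers. Points are (x, y) tuples.
# p, q and r denote points; u and v denote vectors.


def distance(p, q):
    """Euclidean distance between the points p and q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def is_left_turn(p, q, r):
    """Tell whether walking from p via q to r bends counter-clockwise."""
    u = (q[0] - p[0], q[1] - p[1])
    v = (r[0] - q[0], r[1] - q[1])
    # Strict inequality on purpose: three points on a straight line
    # should not count as a turn.
    return u[0] * v[1] - u[1] * v[0] > 0
```

With the context stated once at the top, p, q, u and v read naturally in every function below it.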