Test-Driven vs REPL-Driven Development
Reflections on the advantages of a TDD and REPL based approach to software development

Few things get software developers as passionate as discussions about best engineering practices. Should you do test-driven development, behavior-driven development or even REPL-driven development?
While I am quite opinionated on software development, I don’t want to claim there is one true path. The style of development that will work best for you depends on your traits, the language you use and the problem you are trying to solve.
Here I will try to make the case for REPL-driven development over TDD, based on my experiences. I would like to build my case around a made-up example, a nonsense function that assembles some parts into a larger whole:
function assemble(t::Thingy, d::Doodad, s::Stuff)
    bolts = get_bolts(HEX, 4)
    goo = get_glue(SUPER)
    gizmo = put_together(flip(t), s, bolts)
    tighten!(gizmo)
    doohickey = glue(gizmo, d, goo)
    dry!(doohickey)
    if isfaulty(doohickey)
        throw(AssemblyFailed("Faulty parts, buy new ones!"))
    end
    return doohickey
end
We imagine that this is the correct function implementation. It is the goal of our development efforts. At the end we should have a finished, assembled doohickey object which we can return to the caller of the assemble function.
But as with any software development, you don’t know the types, objects and problem very well when you start. You may have misunderstood the API you are using. Let us imagine that your initial coding attempt looks something like this:
function assemble(t::Thingy, d::Doodad, s::Stuff)
    bolts = get_bolts(FLANGED, 10)
    goo = get_glue(EXPOXY)
    gizmo = glue(t, s, goo) # wrong assumption on input/output
    dry!(gizmo)
    doohickey = put_together(gizmo, d, bolts)
    tighten!(doohickey)
    return doohickey
end
You have gotten both the type of bolts and the type of glue wrong. You are also gluing together parts which should have been screwed together with bolts, and vice versa. And you forgot that the thingy part t needs to be flipped before it is put together with the stuff s. You may be wrong about the type of object you get back when you glue or put something together. In short, one gets lots of assumptions wrong when doing software development.
The trick is to find an effective process to take you from something that is wrong to something that is correct.
The Test-Driven Approach
The test-driven approach to this is to write a unit test for the assemble function. In fact with TDD, you write the test before you even define an empty function.
using Test

const thing = Thingy()
const dud = Doodad("qwerty")
const stuff = Stuff(42)

@testset "Test if valid object created" begin
    gizmo = assemble(thing, dud, stuff)
    @test !isnothing(gizmo)
end
Please note I am using Julia in my code examples. Hopefully this doesn’t matter much to the case I am making. It is worth noting that in Julia tests are organized into test sets, which roughly correspond to test functions in other testing frameworks. The test above checks that we got a valid object as output. thing, dud and stuff are objects defined outside the test set so they can be reused in multiple tests.
This test gets written before assemble exists, so that we can run it and confirm that the test itself actually runs (and fails). That is an important TDD principle.
Next a TDD practitioner makes a minimal change to make the simplest test pass, such as writing an implementation like this:
function assemble(t::Thingy, d::Doodad, s::Stuff)
    return Gizmo()
end
Once this works, the TDD practitioner will add another test which is expected to fail. The idea is basically to go through each requirement for the function and test them in turn. Each time, we modify the function so that the new test, which failed on its first run, succeeds on the second try.
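To make the next turn of the cycle concrete, here is a minimal sketch of what a second iteration might look like. The struct definitions and the `parts` field are stand-ins invented purely so the example runs on its own; the types in this article are never actually defined, and a real assemble would do far more.

```julia
using Test

# Hypothetical stand-in types, invented so the sketch is self-contained.
struct Thingy end
struct Doodad
    name::String
end
struct Stuff
    amount::Int
end
struct Doohickey
    parts::Vector{Any}
end

# Just enough implementation to satisfy the tests written so far.
assemble(t::Thingy, d::Doodad, s::Stuff) = Doohickey(Any[t, d, s])

const thing = Thingy()
const dud = Doodad("qwerty")
const stuff = Stuff(42)

@testset "assemble returns a doohickey" begin
    result = assemble(thing, dud, stuff)
    @test !isnothing(result)     # the first test, still passing
    @test result isa Doohickey   # the new test; it fails against a stub returning Gizmo()
end
```

The point is only the rhythm: add one test that fails, change the function until it passes, repeat.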
A Problem with Test-Driven Development
With test-driven development we run the tests over and over again, making changes repeatedly until all our tests succeed. Here is an example from an actual Julia package I wrote, where I deliberately introduced a failure:
(LittleManComputer) pkg> test
Testing LittleManComputer
Status `/private/var/folders/n7/v31m12_x6lj1qpqrbsg7ln300000gn/T/jl_73BcOb/Project.toml`
[c742fd3c] LittleManComputer v0.1.0 `~/Development/Julia/LittleManComputer`
[8dfed614] Test
Status `/private/var/folders/n7/v31m12_x6lj1qpqrbsg7ln300000gn/T/jl_73BcOb/Manifest.toml`
[c742fd3c] LittleManComputer v0.1.0 `~/Development/Julia/LittleManComputer`
[2a0f44e3] Base64
[8ba89e20] Distributed
[b77e0a4c] InteractiveUtils
[56ddb016] Logging
[d6f4376e] Markdown
[9a3f8284] Random
[9e88b42a] Serialization
[6462fe0b] Sockets
[8dfed614] Test
Assemble mnemonic: Test Failed at /Users/erikengheim/Development/Julia/LittleManComputer/test/assem_tests.jl:45
Expression: assemble_mnemonic(["ADD", "12"]) == 111
Evaluated: 112 == 111
Stacktrace:
[1] top-level scope at /Users/erikengheim/Development/Julia/LittleManComputer/test/assem_tests.jl:45
[2] top-level scope at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[3] top-level scope at /Users/erikengheim/Development/Julia/LittleManComputer/test/assem_tests.jl:41
[4] top-level scope at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /Users/erikengheim/Development/Julia/LittleManComputer/test/assem_tests.jl:5
Test Summary: | Pass Fail Total
All Tests | 56 1 57
Assembler tests | 35 1 36
Symbol table | 8 8
Instruction set | 10 10
Regression test | 9 9
Assemble mnemonic | 8 1 9
Disassembler tests | 11 11
Simulator tests | 10 10
ERROR: LoadError: Some tests did not pass: 56 passed, 1 failed, 0 errored, 0 broken.
in expression starting at /Users/erikengheim/Development/Julia/LittleManComputer/test/runtests.jl:4
ERROR: Package LittleManComputer errored during testing
Try glancing at that and quickly figuring out what went wrong and where. Not every test framework produces this much output, but the point is that there will always be a lot of output not directly related to what you are doing: visual noise getting in your way.
If the tests of our assemble function fail, they only tell us that the output is wrong. They don’t tell us where in our multiline function we made the mistake. The tests only see the outside. We can attempt to make modifications internally to see if we can make the test succeed. However, this can require numerous iterations. If the problem is not immediately clear, I hope to clarify it later when I go into more detail about REPL-based development.
REPL Driven Development
REPL is short for read-evaluate-print loop. It means an interactive terminal, such as your Bash shell or the DOS command-line interface, where you type a command and see an immediate response. A command or expression is read and then evaluated. The result is then printed to the screen.
Here is a simple example of a REPL session in the Julia programming language. This is not unique to Julia, though. The approach was popularized by the LISP programming language, and Python and Ruby are even better-known languages that allow this style of development.
julia> 2 + 3
5

julia> length("hello")
5

julia> map(sqrt, [9, 16])
2-element Array{Float64,1}:
 3.0
 4.0
With REPL-based development, testing and coding merge into one interactive task. The way you go about it is not as strictly defined as it is in TDD. It is more of a loosely defined practice and philosophy. Thus, when developing the assemble function using a REPL-based approach, our interactions may look more like this:
julia> t = Thingy()
Thingy()

julia> d = Doodad("qwerty")
Doodad("qwerty")

julia> s = Stuff(42)
Stuff(42)
We begin by setting some sensible values for what would be the inputs to our assemble function. Next we start to type in the lines of code we would normally have typed to implement that assemble function. Each time we type a line and hit enter we see some output.
julia> bolts = get_bolts(FLANGED, 10)
10-element Array{FlangedBolt,1}:
 FlangedBolt()
 FlangedBolt()
 ⋮
 FlangedBolt()

julia> goo = get_glue(EXPOXY)
EpoxyGlue()
There is no absolute rule for how you do this. You could write an initial version in your editor and then copy and paste each line into the REPL. Some editors even integrate with a REPL, allowing you to send the line at your current cursor position to the REPL for evaluation.
Personally, I actually do the typing in the REPL, for a simple reason: you get a live environment, which allows you to complete function, variable and type names with tab. This often goes beyond what you can do in an IDE. For example, in Julia you get completions on the keys in a dictionary. A static analysis of code cannot tell you what the keys of a dictionary are.
What I see as a benefit over a TDD approach is that you are continuously given feedback as you type. For instance, I may call the glue function and see an error right away:
julia> gizmo = glue(t, s, goo)
ERROR: MethodError: no method matching glue(::Thingy, ::Stuff, ::Glue)
Closest candidates are:
glue(::Gizmo, ::Doodad, ::Glue) at sticky-stuff.jl:538
It tells me that I made some wrong assumptions about the inputs to the glue function. I thought it took a thingy, some stuff and then the glue to put them together. In reality glue is for gluing together gizmos and doodads. Who knew!?
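If you would rather check a signature before making the call, Julia's `hasmethod` can answer the question directly from the REPL. The sketch below is only an illustration: the empty structs and the single `glue` method are stand-ins invented to mirror the error message above, not the real definitions.

```julia
# Hypothetical stand-ins mirroring the signature shown in the error message.
abstract type Glue end
struct EpoxyGlue <: Glue end
struct Thingy end
struct Stuff end
struct Gizmo end
struct Doodad end
struct Doohickey end

glue(g::Gizmo, d::Doodad, adhesive::Glue) = Doohickey()

# Ask whether a method exists for a given tuple of argument types.
wrong_ok = hasmethod(glue, Tuple{Thingy, Stuff, EpoxyGlue})  # false
right_ok = hasmethod(glue, Tuple{Gizmo, Doodad, EpoxyGlue})  # true

# Calling with the wrong types raises a MethodError, which is what the REPL printed.
err = try
    glue(Thingy(), Stuff(), EpoxyGlue())
catch e
    e
end

println(wrong_ok, " ", right_ok, " ", err isa MethodError)
```

In practice simply trying the call, as above, is usually quicker; `hasmethod` is mostly handy when the call itself would be slow or have side effects.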
In a statically typed language, a compiler could catch this. But in a dynamically typed language you would have to run a REPL, a test or some kind of linter or static analysis tool.
Some problems can only be caught at runtime. That is where a REPL is beneficial. You see the problem right there and then as you execute a particular line. Perhaps you ran another version of the glue function and got the following output:
julia> gizmo = glue(d, goo)
Doohickey(1331)
You thought you were supposed to get a Gizmo object back, but actually you got a Doohickey object. Or maybe you got the right type of object, but its value is not quite what you expected.
For instance perhaps you thought this would give you the remainder:
julia> 5 ÷ 2
2
But actually it was an integer division and you needed to use another operator:
julia> 5 % 2
1
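For reference, here is a short sketch of how Julia's integer division and remainder functions relate, using only functions from Base. Negative operands are where the differences tend to surprise people, so they are included.

```julia
# ÷ is truncating integer division (same as div); % is the remainder (same as rem).
q = 5 ÷ 2             # 2
r = 5 % 2             # 1

# divrem returns the quotient and the remainder at once.
both = divrem(5, 2)   # (2, 1)

# With a negative dividend, rem keeps the sign of the dividend,
# while mod keeps the sign of the divisor.
rem_neg = rem(-5, 2)  # -1
mod_neg = mod(-5, 2)  # 1

# div rounds toward zero, fld rounds toward negative infinity.
div_neg = div(-5, 2)  # -2
fld_neg = fld(-5, 2)  # -3

println((q, r, both, rem_neg, mod_neg, div_neg, fld_neg))
```

These are exactly the kinds of details that are quicker to discover by typing a line into the REPL than by reading about them.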
With a REPL approach you are continuously running code and looking at outputs. Every time you type a line of code you get to verify that you are doing a sensible thing. If you get the wrong result, you can quickly bring the previous line of code back from history with the ↑ arrow key and modify it.
In the final correct version I gave for this code you had to flip the thingy t. With a REPL you can try doing it wrong once and then make corrections until the output looks good.
gizmo = put_together(flip(t), s, bolts)
Rapid Iteration in Context
This is what I see as the major advantage of a REPL approach. You are rapidly iterating in context. With a TDD approach you don’t really know which line of code is the culprit, only the test that failed and in which function. Yes, sometimes you actually made a coding mistake which produces an error on the correct line. But that will require either scanning through the output from running the test or clicking on some error output that jumps you to the right spot in the code. Either way you need to deal with a lot of line noise.
With a REPL approach you are focused on one line of code at a time: you look at the output it gives and whether that makes sense.
A REPL also lends itself to a bottom-up approach. As you get things working in the REPL, you naturally start stuffing things into functions to avoid repetition. Here is a simple example of a one-line function definition for turning snake case into camel case:
julia> camel_case(s) = join(uppercasefirst.(split(s, '_')))
camel_case (generic function with 1 method)

julia> camel_case("hello_how_are_you")
"HelloHowAreYou"
As you build larger expressions in the REPL through trial and error, you naturally develop a habit of stuffing these longer single-line expressions into functions. Thus a REPL-driven approach naturally leads to the creation of lots of smaller functions. These functions then tend to be combined into other small functions. Thus you grow things from the bottom up.
Each time you get a sensible definition of a function, such as this camel_case(s) example, you copy and paste it into your editor.
How Does This Scale?
If this was all there was to it, then this would of course not scale. You cannot manually test functions in a REPL each time you make code changes in a larger project.
However REPL based development does not mean an absence of testing. Rather it typically means you write tests after you have developed some functionality properly.
Unlike TDD, you don’t write the tests ahead of time. Instead, as you develop the code interactively in the REPL, you are exploring both how to create the code and how to test it. This new insight may help you define sensible tests to avoid future regressions in the code you have written.
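As an illustration, the camel_case function from the REPL session earlier could be captured in a test set once it behaves the way you want. This is only a sketch: the test set name and the extra inputs are arbitrary choices, and the expected values simply record what the one-liner returns rather than any specification.

```julia
using Test

# The one-liner developed interactively in the REPL.
camel_case(s) = join(uppercasefirst.(split(s, '_')))

@testset "camel_case" begin
    # The case tried by hand in the REPL.
    @test camel_case("hello_how_are_you") == "HelloHowAreYou"
    # Input without underscores is just capitalized.
    @test camel_case("hello") == "Hello"
    # An empty string stays empty.
    @test camel_case("") == ""
end
```

The values you already eyeballed in the REPL become the assertions, so writing the test costs very little at this point.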
Objections to non-TDD Based Development
There are some obvious objections to this. The idea of TDD is that it forces you to think about how a function should behave first, so that you don’t accidentally end up testing only the things you already know work. The idea is also that by forcing you to think of tests first you are exposed to the edge cases and other things you need to consider as you develop your code.
It is also worth keeping in mind that nothing prevents you from using a mixed approach. You could write tests first, run them and then use a REPL based approach to develop the function being tested. Although I would argue this goes somewhat against the philosophy of TDD. It means you are essentially performing testing as you develop which is not stored in a formal unit test.
But to get back to the central point. In my limited experience with TDD, one of the problems I have kept seeing is that a TDD practitioner is not necessarily certain about some new API. Say you are developing some functionality for Bitcoin trading. You have never worked on this before, and you are quite unfamiliar with the API.
I would argue that a REPL-based approach has the advantage here, in that it allows you to experiment with the API. You can try out different inputs and look at the outputs.
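Here is a rough sketch of what that kind of exploration can look like in Julia, using reflection functions from Base. The `Order` type and the `total` function are hypothetical placeholders standing in for an unfamiliar API; they are not taken from any real trading library.

```julia
# Hypothetical placeholder for a type from an unfamiliar API.
struct Order
    symbol::String
    quantity::Int
    price::Float64
end

# Hypothetical placeholder for a function from that API.
total(o::Order) = o.quantity * o.price

# What fields does the type have, and of what types?
field_names = fieldnames(Order)   # (:symbol, :quantity, :price)
field_types = fieldtypes(Order)   # (String, Int, Float64)

# How many methods does the function have?
n_methods = length(methods(total))  # 1

# Try an input and look at the output.
o = Order("BTC", 2, 10.5)
println(field_names, " ", field_types, " ", n_methods, " ", total(o))
```

In a live session you would type these one at a time and let each answer decide what to ask next.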
Fundamental Issue — Reading is Easier than Writing
I think an analogy is useful in expressing what I see as the fundamental problem of TDD. Anyone who has ever tried learning a foreign language should know that reading a language is substantially easier than writing it. Likewise listening to someone speak is easier than speaking yourself.
Why is that? We humans are pattern recognizers. We can look at images, listen to sound and identify patterns. We are significantly worse at producing these patterns ourselves.
It is not without reason that a lot of data science and scientific computing is really about taking a ton of data and producing a visual representation. Why is that so important? Why not just look at the raw numbers?
It is because humans are exceptionally good at analyzing pictures. We have brains tuned to recognizing complicated patterns in complex images. REPL-based development does, in my opinion, utilize this same superior human ability. It relies on constantly showing output from the functions we run. Recognizing what this output is and means is substantially easier than trying to formulate ahead of time what that output should be.
While there are merits to writing tests first, we are also handicapping ourselves by doing so. We deprive ourselves of one of the most powerful tools at our disposal, which is our ability to recognize and reason about patterns. In an idealized world where we have a firm understanding of the APIs we are using and the system we are building so that we can create highly detailed and accurate specifications, we have a good shot at writing tests first.
This is the equivalent of being asked to write an essay in your mother tongue. It is not hard. You can write in your mother tongue effortlessly. But in reality we are seldom in that situation. Frequently we don’t even properly understand what we are supposed to make. Nor do we really comprehend the APIs we need to use.
The Power of Prototyping and Iteration
This applies to numerous human endeavors. It is not without reason that rapid iteration and prototyping have been so successful in many fields. Experimenting and making simplified versions of what you intend to be the final product is a good way of developing a better understanding of what you are supposed to make.
Through this process you may create a lot of throwaway code: code that really just served as a springboard to the actual solution. Here there is certainly a key aspect of your personality at play. Frequently when I write a Medium article, I realize that I have to completely change the title and intro, because I ended up going in a very different direction than I thought the writing would take me.
Writing is the thought process. As you write a story or code, you are also developing your own understanding. The story or the coded solution becomes clearer in your mind.
Thus sometimes I don’t even write in a REPL. I approach code more like an essayist would. I simply start writing to see how the structure develops. At that point it is not that important that it compiles or is entirely correct. Rather I am trying to develop a sense of the contours or shape of the software.
This could go through several modifications without anything actually being tested. The reason is that a lot of it is simply throwaway stuff. The code only gets written to clarify my own thoughts. It is scaffolding. It is when I am pleased with the general shape of the code that I might start putting parts into a REPL and go through the whole iterative approach of creating a working function.
It is also part of the reason why I am not all that fond of IDEs. IDEs tend to constantly harass you and tell you that your syntax or types are wrong. But when you are in essay mode and just trying to hash out your thoughts, these are needless distractions.
Imagine a novelist being constantly alerted by his or her word processor that they have not yet defined or introduced the character they are writing a scene for. That is seldom the most helpful advice.
Of course not everybody works like this. Everybody has their own style. Some like to get the code they write down as accurate and correct as possible the first time. For me it depends on what mode I am in. Am I actually trying to create a final working function or am I just in exploratory thinking mode?
My objection to TDD is that it feels like a straitjacket. It forces you to spend a lot of time writing tests which may not exist tomorrow because you found a better way of structuring your code.
The Intentions of My Advice
My intention here is not to tell you to follow another fad. If you are successfully doing TDD, then there is no need to care all that much about what I am writing here. If it works for you, that is great. My intention is simply to explain why it may not work for everyone. Indeed, maybe you are one of those who cannot seem to get it to work satisfactorily. Hopefully I have been able to put into words some of the same things you have experienced.
Another aspect I did not make sufficiently clear from the outset is that my analysis here is heavily based on using a REPL-friendly language such as LISP or Julia. I cannot effectively use a REPL in C++, so REPL-based development is kind of out of the question there.
What do I do then? I still generally don’t do TDD. Instead I make small prototypes, which are quick to run. If working on a larger system, I might prototype new functionality in a separate little program I can iterate on faster. The outcome of that becomes the starting point for adding functionality to the larger system.
But I think that in a statically typed language such as Java or C++, a TDD approach would likely have a stronger appeal to me, since you don’t have the option of keeping a system live which you continuously iterate on. Also, with static type checking, running a tool (the compiler) to analyze all the code you wrote and give you a report is generally how the whole process works. You are never just compiling single functions. And so the way the compiler works and the way a test suite works are not all that different. Most IDEs also have convenient ways of jumping to the code with errors or failing tests.
Anyway perhaps I have inspired you to experiment with REPL based development. Let me know your experience!