How to Organize Large Code Bases

4 min read · Nov 15, 2020

The key to organizing large code bases is to think in terms of libraries and layers. I don’t see what specifically goes into one file as all that important. However, by bundling code that is often modified together into the same file, you can simplify your workflow.

Java approach is very well suited for large, long living projects. It might sound like rigidity, but such a code organization prevents mess even if you have gigabytes of code.

That is a bit of a strange metric. I am used to talking about lines of code. Most of my career I have worked on code with millions of lines. That is considered large. My last was about 3–4 million lines.

I realize that such code bases are not an everyday norm for most developers, so I understand why they don’t like such a rigidity.

No, this is just something Java developers have internalized. Java swallowed the OO hype of the 90s completely and has gone as far as enforcing this rigid OO thinking even in how files are organized. Suggesting this is the only sensible way of organizing a large project is essentially to insist that object-oriented programming is the only way to write large software.

Neither I nor my colleagues have ever had problems working in languages which don’t enforce this object-orientation orthodoxy. Sure, large software projects present lots of problems. But this was the last of our problems. Proper structuring of layers has usually been more challenging.

The OO thinking of the 90s, which Java is a product of, is IMHO the source of a lot of terrible choices with respect to large software development. I saw that clearly in our own code base, e.g. in excessive inheritance hierarchies. I think Go is a much more sensible approach to large-scale software development.

Results were really impressive: tens of extension methods for many classes were scattered through the whole codebase, many of them repeated each other because there were no observability of what methods actually exists.

I don’t know Kotlin well enough to really comment on this. But this feature is not all that different from Swift class extensions which again is similar to Objective-C categories. I have never in the many years I did Objective-C/Swift development had any problems like this. But perhaps that is partly because Apple had long ago established sensible conventions in how to deal with this, that people tended to follow.

Seems to me more like Kotlin developers are not used to conventions. But it is a young language without the kind of maturity Objective-C had.

Adding extensions in very different places is one of the important abstraction mechanisms at your disposal. E.g. if you have a data structure to which you want to add a category for visualization, then you want that category added in the visualization layer and not together with the class where it is defined, as that would be a non-GUI layer.
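To make the layering point concrete, here is a minimal sketch in Python. Python has no categories or extension syntax, so the sketch uses functools.singledispatch from the standard library as a stand-in: the rendering behavior is registered for the model types from the outside, in what would be the visualization layer. The names Point, Segment and to_svg, and the module split into model.py and viz.py, are hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from functools import singledispatch


# --- model layer (hypothetical module model.py): knows nothing about rendering ---
@dataclass
class Point:
    x: float
    y: float


@dataclass
class Segment:
    start: Point
    end: Point


# --- visualization layer (hypothetical module viz.py) ---
@singledispatch
def to_svg(shape) -> str:
    """Render a model object as an SVG fragment."""
    raise TypeError(f"no SVG rendering registered for {type(shape).__name__}")


@to_svg.register
def _(shape: Point) -> str:
    # Rendering for Point is added here, not in the model layer.
    return f'<circle cx="{shape.x}" cy="{shape.y}" r="1"/>'


@to_svg.register
def _(shape: Segment) -> str:
    return (f'<line x1="{shape.start.x}" y1="{shape.start.y}" '
            f'x2="{shape.end.x}" y2="{shape.end.y}"/>')
```

The model classes never import anything from the visualization side, so the non-GUI layer stays free of GUI concerns, which is the same effect a category or extension gives you in Objective-C or Swift.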

Finding the file where some particular class resided was a quite painful effort, because developers named files arbitrarily, placed several classes into one file and, of course, not renamed files even if classes were renamed and the file name completely lost connection to classes contained inside.

I find it rather strange that you would actively search for filenames in a large project. When you have thousands of files, that is a pretty slow way of finding anything. You use tools to jump to definitions, or you do source code search or grepping. The name of the file should be the least of your concerns.

One of the key reasons I abandoned Windows as a development platform was that searching 3–4 million lines of code was just too slow on Windows. Since it is something I would frequently do, I saved a lot of time using Linux instead.

From this point of view worse results in number crunching for Java are not so important. In the vast majority of real Java apps number crunching is not necessary at all.

It was just something I remarked on because I got into speaking of monads in relation to error handling. It reminded me how ironic it is that a language such as Java, whose adherents are so obsessed with being object-oriented, cannot actually treat numbers as first-class objects.

So the issue here isn’t that Java is slow at number crunching. The irony is that Java deliberately chose not to let numbers be objects so it could do numeric calculations faster. Thus Java broke with the beautiful object-oriented uniformity of Smalltalk, where even boolean values are objects, and got pretty much nothing back for it. Instead you end up with a kludgy language where numbers have to be boxed. Languages like Julia, Swift, Python, Ruby and I believe even Go manage fine without such kludgy solutions.
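As a small illustration of what that uniformity looks like in practice, here is a sketch in Python, one of the languages mentioned above. It only shows the uniformity side of the argument (numbers going anywhere objects go, with no separate wrapper type) and says nothing about performance. The names describe and Celsius are made up for the example.

```python
from fractions import Fraction


def describe(value) -> str:
    """Accepts any object; numbers need no wrapper type to be passed in."""
    return f"{type(value).__name__}: {value!r}"


# Integers, floats, booleans and fractions all go into one ordinary list.
values = [42, 3.5, True, Fraction(1, 3)]
descriptions = [describe(v) for v in values]

# Numbers have methods like any other object.
bit_count = (255).bit_length()
is_whole = (2.0).is_integer()


# Numeric types can be subclassed like any other class.
class Celsius(float):
    def to_fahrenheit(self) -> float:
        return self * 9 / 5 + 32


boiling = Celsius(100).to_fahrenheit()
```

In Java, the equivalent list would have to hold boxed Integer and Double values rather than int and double, which is the split between primitives and objects the paragraph above is complaining about.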

So, while talking about languages and their features it always worth to keep in mind where main areas of application of language. From this point of view worse results in number crunching for Java are not so important.

Absolutely true, although I cannot actually think of anything where I would deliberately pick Java if it could be avoided. If I had to be on the JVM I’d go with Clojure or Kotlin. Otherwise I’d pick something like Go or Python. If possible I would of course always go for Julia 😜

Written by Erik Engheim