Programming with Functions #5: Immutability

One very cool thing about programming with functions is that the data we work on does not have to be mutable nearly as often. In theory — and in languages like Haskell — it’s possible to never use mutability, but the alternative solutions can sometimes be pretty complicated, or at least very weird-looking to an average coder. Fortunately, there’s a growing set of programming languages that started as mundane, imperative languages and are now getting more and more functional while maintaining backward compatibility with their paleolithic versions, which among other things means that we can still use mutable data if we need to.

And, of course, there’s Scala, which from the very beginning was designed to connect both worlds.

So instead of trying to avoid mutability at all costs, I’d like to point you to the talk on “Effective Scala” from Scala Days 2018 in Berlin, by Bill Venners and Frank Sommers, and their concept of the Berliner architectural pattern. In short, we can think of every application as having three layers: the external interface, the internal interface, and the space in-between. The external interface lets our application communicate with other programs, be it the backend, other web services, or the operating system (and through the OS with the user). Those other programs may work in a way that makes it difficult for us to write the external interface in a purely functional way. Note that I don’t say it’s never possible, but quite often it’s much simpler or faster to just have a temporary mutable data structure here and there instead.

Our application will also have its own data storage in the form of a database, configuration files, and so on (mostly it’s just a database and files). Those are also not under our full jurisdiction, so we need the internal interface which lets us communicate with the operating system or the SQL server. But access to files and the database is slow, so for better performance, we usually build some caches in the memory, and shortcuts which make searching quicker, and almost by definition they usually include some mutable components. Again, we may try to go full-FP, and sometimes it makes sense, but in other cases, it’s just not worth it.

The Berliner pattern tells us that we don’t need to go full-FP. Instead, we should accept that mutability is a part of our application, but we can push it away: outside and inside. We start from the middle part where all our business logic resides, and where functional programming really shines, and slowly work in both directions. In the end, we will be left with mutability only in the thin layers of the external and internal interfaces, in places where it makes sense to have it.
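
To make the layering concrete, here is a minimal sketch of the idea. All names here are illustrative (they are not from the talk): the pure middle layer holds the business logic, while parsing and serialization, the things that touch the outside world, stay at the edge.

```scala
// Middle layer: pure business logic, trivially testable in isolation.
def applyDiscount(total: BigDecimal): BigDecimal =
  if (total > BigDecimal(100)) total * BigDecimal("0.9") else total

// External interface: parses input from the outside world and serializes
// the result back out; this thin layer is where side effects would live.
def handleRequest(rawTotal: String): String =
  applyDiscount(BigDecimal(rawTotal)).toString
```

The point is the direction of the dependency: the impure edge calls into the pure core, never the other way around, so the core can be tested without mocking anything.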

I especially like that this pattern works both for new applications and when refactoring old ones. Back in the first episode I gave this example of code written in the imperative style:

val bagOfBlackCats = mutable.ListBuffer.empty[Cat] // empty mutable collection
for (cat <- bagOfCats) {
  if (cat.colour == Black) bagOfBlackCats += cat // initialization
}
bagOfBlackCats // it will always be mutable,
               // even if you never want to modify it again

In the imperative style of programming you will often find the following pattern: a variable is initially set to some default value, an empty collection, an empty string, a zero, or null. Then some initialization code is run in a loop to create the actual useful value step by step. After this process, the value assigned to the variable does not change anymore — or if it does, it’s done in a way that could be replaced with resetting the variable to its default value and running the initialization again. But the possibility of modifying it stays, even though it is no longer needed. For the whole life of the program, it hangs like a loose end of an electric cable, inviting everyone to touch it.

Functional programming gives us ways to build a useful value without the need for initial default values and temporary mutability. The data structure, even a very complex one, can be computed through extensive use of higher-order functions and then assigned to a constant, preventing future modifications. If we need an updated version of it, we can simply create a new data structure instead of modifying the old one.

val bagOfBlackCats = bagOfCats.filter(_.colour == Black)

Why does it matter?

First, the more restricted our data is, the safer it is for future development. By saying that this data structure should not be modified, we give more information not only to the compiler but also to future developers working on the same piece of code, leaving them in no doubt about what they can do with it. That results in fewer bugs down the line.
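
That compiler-level protection is easy to demonstrate. A small sketch, assuming `Cat` is a case class as in the earlier episodes (the field names here are illustrative):

```scala
final case class Cat(name: String, colour: String)

val cat = Cat("Filemon", "White")
// cat.colour = "Black"   // does not compile: case class fields are vals
val blackCat = cat.copy(colour = "Black") // a new value; the old one is untouched
```

Anyone who tries to mutate `cat` gets a compile error instead of a subtle runtime bug, and `copy` makes the “new value instead of modification” idiom a one-liner.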

Second, race conditions. If two or more threads access the same data and can modify it, very quickly we may end up in an invalid state and the exact reason for it will be difficult to debug. We try to prevent this using mutexes, monitors, and other patterns. Making the data immutable does not solve all our problems, but it certainly makes our work easier.
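
Here is a sketch of the classic lost-update race on a shared mutable counter. The exact final value is nondeterministic, which is precisely the problem; the only safe bet is that it will not exceed the ideal total, and it will very likely fall short of it.

```scala
object RaceDemo {
  var counter = 0 // shared mutable state, no synchronization

  def run(): Int = {
    counter = 0
    val threads = (1 to 4).map { _ =>
      new Thread(() => {
        var i = 0
        while (i < 100000) { counter += 1; i += 1 } // read-modify-write, not atomic
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    counter // very likely less than 400000: some increments were lost
  }
}
```

Two threads can read the same old value, both add one, and both write back the same result, silently dropping an increment. With immutable data there is nothing to write back into, so this whole class of bug disappears.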

Third, initial default values in variables are evil. Programmers use them all the time and then forget to change them to something meaningful. They’re the source of many bugs, most infamously the “null” reference.
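
A hedged sketch of the alternative: instead of a null default that you promise to fill in later, make the absence explicit with `Option`, so the compiler forces you to handle both cases (the names here are illustrative).

```scala
var ownerName: String = null // the evil default: easy to forget to fill in

// With Option, "not set yet" is visible in the type, not hidden in a null:
val maybeOwner: Option[String] = None
val greeting = maybeOwner match {
  case Some(name) => s"Hello, $name"
  case None       => "Hello, stranger" // the compiler made us think about this
}
```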

The concept of immutability gets a bit more complicated when we talk about data structures, and in particular, collections. You may have already seen a chart of the Scala collections hierarchy before:

[Chart: the Scala collections hierarchy, with immutable and mutable collections side by side]

In Scala, the collections library is divided into two parts: mutable and immutable, with immutable being the default one. This is a different distinction from the one between val (values) and var (variables). Let me quickly explain, so we will have common ground for the rest of the chapter.

An immutable collection is one that cannot be changed after it’s created. It serves as a source of data for any subsequent operation, but the result of that operation will be another collection or some other data structure; the original collection stays unchanged. On one hand, this may sometimes lead to suboptimal performance — what if you just want to replace one element in the collection, do you need to copy all the other elements to the new one? That’s why you can also use mutable counterparts of each collection, such as mutable.ListBuffer for List, mutable.HashMap for Map, or mutable.ArrayBuffer for… err... Vector [link]. They are what you see in the bottom-left corner of the chart.

You keep such a mutable collection as a val, because you don’t want to replace the reference to it after you assign it, but its contents are mutable. Think of it as a builder: create it at the beginning of a sequence of CPU-heavy operations, do those operations, and discard it afterward. If you need the result, copy it to an immutable collection. Keep the mutable collection local — create it only inside a method. Never keep a reference to it as a field of a class, so it will never be possible for another thread to try to access it.
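
A minimal sketch of this “local builder” idea (the method and its logic are made up for illustration): the mutable buffer never leaves the method, and the caller only ever sees an immutable Vector.

```scala
import scala.collection.mutable

def evenSquares(upTo: Int): Vector[Int] = {
  val buffer = mutable.ArrayBuffer.empty[Int] // local builder, never escapes
  for (n <- 1 to upTo if n % 2 == 0)
    buffer += n * n                            // cheap in-place appends
  buffer.toVector // copy into an immutable collection before returning
}
```

From the outside, `evenSquares` is a pure function; the mutability is an invisible implementation detail, which is exactly where the Berliner pattern says it belongs.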

Instead, if you must expose data that can be updated, use an immutable collection, but keep the reference to it in a var. Code which is supposed to make an update will take that reference, copy the data, make some changes (perhaps using mutable collections), and then — in one atomic operation — replace the original reference with a reference to the new immutable collection. In the meantime, other places in the code will use the old reference which, after the moment of the update, will point to obsolete data, but at least it will be consistent — you can be sure that the data will not change until you explicitly access the source again.
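
A sketch of that copy-and-swap idea (the class and field names are illustrative): `@volatile` guarantees that readers always see the freshest reference, and the `synchronized` block serializes writers so that no update gets lost.

```scala
class CatRegistry {
  @volatile private var cats: Vector[String] = Vector.empty

  def snapshot: Vector[String] = cats // a consistent, never-changing view

  def add(cat: String): Unit = synchronized {
    cats = cats :+ cat // build a new collection, swap in one reference assignment
  }
}
```

Readers that took a snapshot before an update keep working with consistent, if slightly stale, data; they never observe a half-finished modification.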

But immutability is not there just to save us from unsafe operations. Consider this: If you have variables and you want to combine them and return a result, you need to use a function. Every time you call the function, the variables can have different values, so every time you have to access those values again. You can’t assume that you know them already. And every time the result can be different.

But if instead of variables you have constants, and if the function is pure — we will come back to what it means for a function to be pure in a moment — then the result will be always the same, and since the constants don’t change, it’s enough if you access them once, compute the result, remember it, and the next time you can just return what you remembered. Essentially, the function turns into another constant, only… its computation is lazy, meaning it’s not done when the constant is declared, but when it’s accessed for the first time.

val cat = Cat(name = ..., age = ..., colour = ...)

def changeCatColour(cat: Cat): Cat = async {
  val bestColour: CatColour = await { chooseTheBestColour() }
  // waits for the UI thread, each time the response can be different
  cat.copy(colour = bestColour)
}

but:

val cat = Cat(name = ..., age = ..., colour = ...)
val bestColour = Black // it is known
...
// here we always know which `cat` and `bestColour` we use
lazy val bestColourCat: Cat = cat.copy(colour = bestColour)

There’s no magic here. Under the hood, a lazy val is a wrapper over an uninitialized (or initialized to None or null) private variable and a piece of initialization code. The first time the lazy val is accessed, the code is run and the result is assigned to the variable. On every subsequent access, the stored value is returned instead of running the code again.

This is not the actual code but it should give you some idea how it works:

private var _bestColourCat = Option.empty[Cat]

def bestColourCat: Cat = _bestColourCat match {
  case Some(_cat) => _cat
  case None =>
    val _cat = cat.copy(colour = bestColour) // the actual lazy val initialization
    _bestColourCat = Some(_cat)
    _cat
}

Let’s consider the implications. One is of course that we save the computer some work. Instead of calling the function over and over, we can call it once and just use the computed result forever after. That’s already good for performance, but that’s not all. Laziness means that we compute the result only if it is needed; if we never need it, we never compute it at all. No more computations producing data structures that are subsequently ignored because the program’s logic chooses another path. The fastest code, after all, is the code that is never called.
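
A small demonstration of both properties: the initializer of a lazy val runs only on first access, and not at all if the value is never needed. The `computed` flag exists purely so we can observe when the initialization happens.

```scala
var computed = false

lazy val expensive: Int = {
  computed = true        // side effect only to observe when the body runs
  (1 to 1000).sum
}

val notYet    = computed // still false: declaring the lazy val ran nothing
val result    = expensive // first access triggers the computation
val afterward = computed  // now true; further accesses reuse the stored value
```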

It is a truth universally acknowledged that a low-level language is more performant than a higher-level, more abstract, functional one. When we compare the same algorithm, for example sorting, written in two languages, the one written in C or C++ will run faster than the one written in Scala or Haskell. But the keyword here is “same”. Sorting, like many other operations, may have simpler and more complex implementations. Simpler ones are easier to write, but they are often slower, or fine in the common case and horrible in corner cases. Functional programming helps us write the complex code, and laziness plays an important part in it. Combining laziness and immutability with programming with functions, we can divide complex code into simpler pieces and use them only when necessary. It’s safer and quicker to write, and the code is more concise. In the real world, it means that if we have two programmers, one writing in a low-level and the other in a high-level language, and we give them both the same task and the same deadline, the first will probably go for a simpler implementation, while the other will be more eager to try a more complex version and tinker with corner-case handling.

I think that’s enough for today. In the next chapter we will deal with thread safety and how immutability helps with it. Thanks for reading.

Previous: Programming with Functions #4: unapply and the newtype pattern

Next: Programming with Functions #6: Thread safety
