Scala collections performance

TL;DR - The mutable ArrayBuffer offers brilliant performance for tail additions, sequential and random access; beating everything else in my tests. Immutable collections are not so clear cut - for sequential access Lists are a good option, for random access Vectors are best

Recently I've been working with Scala. Scala's collections api is much richer than Java's and offers mutable and immutable implementations for most of the common collection types. I've always been interested in algorithm and data structure performance so I decided to run some benchmarks to see how the collections performed. I was most interested in the relationship between mutable and immutable collections.

Note: Scala does list the expected performance characteristics of the collections but it doesn't show the relative performance of the types.

Two very common scenarios are appending to a collection (e.g. reading from a file or database cursor) and then iterating through the collection. It's widely known that prepending to a list then reversing it can yield greater performance (for some implementations) so I also tried this.

As you can see in the source code, all I am doing is iterating through a list of 1000 Ints and appending each element to a collection of a given type. It's trivial but it's also very common. Here are the results:

Appending

Appending 1000 elements to the given collection. Benchmarks marked with M use mutable collections

Benchmark Throughput (ops/s)
M ArrayBuffer 173,677
M ArrayBuilder 117,886
M ListBuffer 69,692
List (prepend) 60,677
M Array * 49,790
M ListBuffer (prepend) 39,799
Queue 38,482
M MutableQueue 30,996
M MutableList 30,839
Stack (prepend) 29,370
Vector (prepend) 26,910
Vector 26,882
M MutableQueue (prepend) 17,007
M ArrayBuffer (prepend) 16,634
M MutableList (prepend) 16,047
Array (using a view) ** 14,604
Queue (prepend) 353
List 307
Stack 98

* As you can see in the source, I actually allocated a 1000 element array then copied each element into the slot so it's not strictly an append operation. Whether this is a valid benchmark is subject to interpretation (see below also)

** For the Array benchmarks I need to get the current index of the source List. I've done this using zipWithIndex. A common optimisation when using zipWithIndex is to call it on a view however this yields worse performance! As I've said many times before - don't assume A will be quicker than B - prove it! The overhead of the optimisation quite often outweighs the performance gains.

Mutable vs Immutable collections

We can get very good performance out of the standard immutable List. Don't assume that immutable collections are inherently slow - they're not.

Lists

Immutable lists offer good performance and as you'll see in the traversal benchmarks, traversal is also performant. Prepending to an immutable list then reversing it is much faster than simply appending. This is to be expected but the magnitude of the difference surprised me.

Prepending to a mutable list and reversing it is slower than simply appending. This makes perfect sense when you read this comment in the scala documentation "A MutableList consists of a single linked list together with a pointer that refers to the terminal empty node of that list". The expensive part of the append operation is getting to the end of the list, the mutable list maintains a pointer so we don't need to traverse the list to get to the end.

Queues and Stacks

Admittedly Queues and Stacks are suited to specific use cases which are not represented here. However in these tests they are slower than Lists, not by a significant margin but enough to make me think that it may be better to use Lists in their place. Scala's collection api makes it trivial to implement our own Stacks and Queues on top of lists.

Vectors

Prepending and appending to Vectors yields similar performance. This is to be expected as Vectors are optimised for random access

Arrays, ArrayBuilder and ArrayBuffer

The mutable ArrayBuffer offers amazing performance and surprisingly it's faster than the ArrayBuilder (a lower level api) and raw Array access. Assuming a mutable data structure is suitable for my use case, ArrayBuffer would be my go to for performance focussed code blocks.

My comparison with raw array access is a little questionable because it's very hard to separate out the actual overhead of the array access from the overhead of running the benchmark. In particular the way I've run all these benchmarks is to pass in a list, traverse it and append/set each element on the collection under test. The traversal of the original list will incur an overhead but it should be consistent throughout the benchmarks. I could have simply performed System.arrayCopy which would likely be very fast but this would then be an apples with oranges comparison and I wanted to benchmark incremental appends.

Iterating

Iterating a collection is also very common so I iterated through 1000 elements in a given collection and summed the results. I could have used Scala's sum() helper function but I wanted to avoid any optimisations that it may introduce. I just wanted to iterate the collection but I need to do something with the elements or the JIT will drop the dead code.

Iterating over 1000 elements of the given collection. Benchmarks marked with M use mutable collections

Benchmark Throughput (ops/s)
M ArrayBuffer 1,226,833
M Vector 402,363
M ListBuffer 412,568
List 396,586
Queue 363,200
Array 321,700
ArrayBuilder 163,464
Stack 54,152
M MutableList 37,536
M MutableQueue 33,620

Again the ArrayBuffer trumps everything else. The ArrayBuilder benchmark is pretty pointless as it simply calls result() to get an array then does exactly the same thing as the Array benchmark. I've included it in case someone asks "what about the ArrayBuilder?" Why ArrayBuffer yields such good performance deserves further investigation.

Random access

I find random access to indexed and linear sequences less common but I decided to benchmark them anyway. I simply created a collection of Ints and then accessed them randomly using the apply method. The results were quite predictable:

Benchmark Throughput (ops/s)
M ArrayBuffer 72,734,680
Vector 62,082,283
M Array 60,777,210
M ListBuffer 1,107,126
M MutableList 936,839
M MutableQueue 907,515
List 313,253
Queue 161,754
Stack 45,474

Once again ArrayBuffer is the winner followed closely by Vectors and Arrays. Predictably Linear Seqs offer much lower performance for random access due to the need to traverse the collection to find the element. What is surprising is the performance increase of a mutable List/ListBuffer vs the immutable List (3x faster). A mutable collection should not be inherently faster for random access to elements.

How useful are micro benchmarks?

To some people micro benchmarks are useless. They will argue that any difference in collection performance will be vastly outweighed by network calls, disk IO etc in the real world. It's certainly true that SQL queries, IO etc will have a big impact on performance but not all applications and use cases are IO/DB bound.

Even for IO bound use cases we may still be able to squeeze a little bit more performance from good data structure and algorithm selection. It could be enough to hit that elusive NFR.

Write your own benchmarks

Micro benchmarks are very sensitive to the parameters used. Your own use case will be quite different to the trivial use case I have simulated. Collection size, the type of objects in the collection and the operations you perform on the elements will all affect performance and may render these benchmarks invalid. Feel free to clone my github repo and use it as a template for your own benchmarks and let me know your results!

Write your own collections

Scala's collection Api is great, coming from Java it was a revelation and I'm impressed with how well thought out it is. The api also makes it easy to write our own collections and Programming in Scala (a superb book) includes examples of this. Don't be afraid of writing your own use case specific collections. In another next post I'll show how we can do this and try to beat the ArrayBuffer benchmark.