Google Summer of Code 2018 - Gonçalo Tomás

In this post I go over the work I did in this year’s edition of Google Summer of Code (GSoC) in detail. I had to hand in a slightly rushed technical report as the final deliverable for my project, where I documented the work I did with the bare minimum of detail. Now I’ve had a couple more days to recall the whole experience and write down a proper blog entry.

Preface

While adding support for Riak in the benchmark that I’m implementing, I needed to fork the Riak Erlang client to be able to work with modern Erlang versions. Since I was getting to know the database more and more, it made sense to me to reach out to Russell Brown in order to get a project for this year’s GSoC. To make a long story short, he agreed to mentor me in a project. We had some difficulties in getting Riak accepted in time for GSoC, but in the end I got in. Sweet.

The project

I offered to do some work in places where I knew it was needed, but Russell told me there is a community of people experienced with Riak who are slowly making contributions, and apparently multiple pain points are already being ironed out, which led to discussing other ideas. Russell proposed reworking Riak Test since it was infamous for being tough to set up, to the point where people wouldn’t even try to use it. The idea seemed challenging, and I thought that looking at some Riak test suites would be a good way to figure out how some of the internal components work, so I said yes.

Opening the hood on Riak Test

The code within the Riak Test repository is composed of three main components: a custom test runner, a code intercept library very useful for the types of tests that are actually performed and lastly some useful modules for startup and teardown of Riak clusters along with extra set up of the testing environment.

The test runner

The easiest way to understand how the old test runner worked is to look at some of the test suites. Fortunately there are some trivial examples which I’m going to show you. Here is a test suite that is supposed to fail:

%% @doc A test that always returns `fail'.
-module(always_fail_test).
-export([confirm/0]).

confirm() ->
    fail.

And here we see an example of how to build a passing test:

%% @doc A test that always returns `pass'.
-module(always_pass_test).
-behavior(riak_test).
-export([confirm/0]).

confirm() ->
    pass.

There are several things to unpack here. The first is that confirm/0 is the callback that the test runner calls, and the test result is an atom with the logical value of the test success. You are supposed to add the setup, test and the teardown code inside this callback, effectively making these test modules contain logic for a single test. At least that might have been the idea, but…

— In theory there is no difference between theory and practice. In practice there is.

There were some test suites with hundreds of lines of code that were all packed into the confirm/0 callback. To make things worse, there was no detailed reporting of any sort, which made failing test cases extremely frustrating. You had a test consisting of a bunch of code you might not be familiar with that was passing one day and failing the next, and the only bit of information you’d get was the fail result: sometimes there wasn’t even a function name or line number to refer to, and you’d be on your own.
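To make the confirm/0 contract concrete, here is a minimal sketch of how a runner could invoke the callback and boil everything down to a single atom. This is not Riak Test’s actual runner: the module name runner_sketch and the function run/1 are made up for illustration, and the callback is passed in as a fun so the example stands on its own.

```erlang
-module(runner_sketch).
-export([run/1]).

%% Hypothetical helper, not Riak Test's real runner: call a suite's
%% confirm/0 (passed in as a fun) and reduce the outcome to pass | fail.
-spec run(fun(() -> term())) -> pass | fail.
run(Confirm) when is_function(Confirm, 0) ->
    try Confirm() of
        pass -> pass;
        _Other -> fail
    catch
        %% A failed assertion raises an exception. If the runner does not
        %% report Class and Reason, all the caller learns is `fail'.
        _Class:_Reason -> fail
    end.
```

It could be called as runner_sketch:run(fun always_pass_test:confirm/0). Notice how both an explicit fail and a crash deep inside hundreds of lines of test code end up as the same atom, which is the reporting problem described above.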

Today we have Common Test, which is excellent for building the sorts of intricate test cases that you can probably imagine Riak has, but I thought that things might not have been the same at the time Riak Test was made. While I was looking into the source code Russell brought Gordon Guthrie and Bryan Hunt on board, and none of them was aware of any real reason why Basho would try to reinvent the wheel in this way, especially taking into account what looked to be a botched end result. Bryan described it as “a classic example of technical debt”, a description I found intriguing. As a student I’m usually racing to jump from one project to the next, and I generally don’t stop to think about code that I have already written; technical debt for me was a buzzword that gets thrown around in software engineering classes, and yet there it was. Experiencing it first hand¹ gave me a clear impression of why it was such an important problem to fix, as it was restricting contributions to the fairly small number of people who could successfully operate Riak Test.

Using Riak Test involves booking an entire morning

Getting Riak Test to run wasn’t exactly difficult once you knew how it worked, but it still took ages to run² the simplest of test cases: the project build script would download and generate releases for multiple Riak versions, which isn’t exactly ideal if you wish to get something running very quickly. Although this problem was at its worst on the first run, it became obvious that we needed to improve in this dimension as well.

Test suites are just one big test with lots of nested cases

Let’s look at a slightly more complex example of a test suite and its confirm/0 callback:

-module(basic_command_line).
%% Needed for the ?assertEqual macro used below
-include_lib("eunit/include/eunit.hrl").

-behavior(riak_test).
-export([confirm/0]).

confirm() ->
    %% Deploy a node to test against
    lager:info("Deploy node to test command line"),
    [Node] = rt:deploy_nodes(1),
    ?assertEqual(ok, rt:wait_until_nodes_ready([Node])),
    %% Verify node-up behavior
    ping_up_test(Node),
    attach_direct_up_test(Node),
    status_up_test(Node),
    console_up_test(Node),
    start_up_test(Node),
    getpid_up_test(Node),
    %% Stop the node, Verify node-down behavior
    stop_test(Node),
    ping_down_test(Node),
    attach_down_test(Node),
    attach_direct_down_test(Node),
    status_down_test(Node),
    console_test(Node),
    start_test(Node),
    getpid_down_test(Node),
    pass.

This looks like my code when I started messing around with EUnit 😬

Since we have what is conceptually equivalent to set up code, this is where it would make sense to use Common Test. Instead of having all of the test code clumped up in a single place, you could just add a proper init_per_suite and end_per_suite and then have each one of these tests be actual test cases.
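As a rough sketch of what that split could look like, here is a small Common Test suite with the setup in init_per_suite, the teardown in end_per_suite and two of the checks as separate test cases. The suite name and the test bodies are illustrative only. To keep the example self-contained, deploy_node/0 and stop_node/1 are local stand-ins that I made up; in a real suite they would be replaced by the rt calls from the confirm/0 above (rt:deploy_nodes(1) and rt:wait_until_nodes_ready/1) and by whatever stops the node.

```erlang
-module(basic_command_line_SUITE).

-export([all/0, init_per_suite/1, end_per_suite/1]).
-export([ping_up_test/1, status_up_test/1]).

%% Common Test runs the cases listed here, each reported separately.
all() ->
    [ping_up_test, status_up_test].

%% Runs once before the first case. The returned Config is handed to
%% every test case, so the deployed node travels with it.
init_per_suite(Config) ->
    Node = deploy_node(),
    [{node, Node} | Config].

%% Runs once after the last case, so teardown lives in one place.
end_per_suite(Config) ->
    Node = proplists:get_value(node, Config),
    ok = stop_node(Node).

%% Placeholder bodies: the real cases would run the command-line checks.
ping_up_test(Config) ->
    Node = proplists:get_value(node, Config),
    true = is_atom(Node).

status_up_test(Config) ->
    Node = proplists:get_value(node, Config),
    true = (Node =/= undefined).

%% Stand-ins (assumptions) for the Riak Test library calls.
deploy_node() -> 'dev1@127.0.0.1'.
stop_node(_Node) -> ok.
```

With this layout a failing check shows up under its own name in the Common Test report, instead of being buried somewhere inside a single confirm/0 call.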

There was a distinct pattern among some test files I looked at where tests were factored out into functions that mainly consisted of assertions³, but there’s a lot happening behind the scenes. For starters, the test runner would load code required for most test cases, which you can’t do trivially in either EUnit or Common Test⁴. Riak Test was also terminating clusters automagically: regardless of the test suite, you’d see setup code but usually not a single line of teardown code.

Code Intercepts

Another constituent of Riak Test is a code intercept library. I hadn’t really seen anything like it before, but I guess you could think of it as a mocking library, although it differs from one in intent⁵. I believe it works by recompiling modules with the mocked functions instead of the original ones, loading them into the BEAM VM and keeping the original compiled module handy in case the user wishes to clean up any intercepts.

Simply put, this allows you to wreak havoc on any system, a crucial mechanism for studying and testing the resilience properties of Riak: if you claim your database can perform read and write operations with 2 working nodes in a cluster of 5, then you had better have a good way to prove it. In practice this is done by purposely making certain parts malfunction or simply stop, and then testing that Riak is still able to operate or recover from those failures. I find it conceptually similar to Chaos Monkey.

I did not work on this code specifically, but we saw potential use for it outside of Riak so we set an additional goal of making it available as a standalone package.
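I have not verified how the intercepts library does it internally, so take the following only as a sketch of the general recompile-and-reload idea described above, not as the library’s API. It builds a tiny module called target in memory, swaps in a version whose ping/0 misbehaves, and then restores the saved original. The module names and functions are all made up for the example; the only real calls are compile:forms/2, code:purge/1 and code:load_binary/3.

```erlang
-module(intercept_sketch).
-export([demo/0]).

%% Abstract forms for a tiny module `target' whose ping/0 returns Result.
forms(Result) ->
    [{attribute, 1, module, target},
     {attribute, 2, export, [{ping, 0}]},
     {function, 3, ping, 0,
      [{clause, 3, [], [], [{atom, 3, Result}]}]}].

%% Compile the forms in memory and return the BEAM binary.
build(Result) ->
    {ok, target, Beam} = compile:forms(forms(Result), []),
    Beam.

%% Make Beam the current version of `target', dropping any old version first.
load(Beam) ->
    _ = code:purge(target),
    {module, target} = code:load_binary(target, "target.erl", Beam),
    ok.

demo() ->
    Original = build(pong),
    ok = load(Original),
    Before = target:ping(),
    ok = load(build(pang)),   %% "intercept": swap in the altered code
    During = target:ping(),
    ok = load(Original),      %% clean up: restore the saved original
    After = target:ping(),
    {Before, During, After}.
```

Calling intercept_sketch:demo() returns {pong, pang, pong}: the caller never changes, yet the behaviour of target:ping/0 is different while the altered code is loaded.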

Riak specific test library

The remaining component of Riak Test is the code that works in tandem with the test runner, responsible for setting up the test environment, starting and stopping nodes, joining nodes into a cluster and other miscellaneous behaviour. This was one of my main focuses during GSoC.

Starting out

At this point you can probably understand why Russell suggested that I work on Riak Test and why it desperately needed some attention. Now that we know we need to make Riak Test better, we need to find a good way to do it.

It was fairly obvious that we needed to use Common Test as it is the tool for complex testing in Erlang. Seeing how modern Erlang applications are built as rebar3 projects, there was also little sense in reworking Riak Test without using it as a build tool. Now to figure out a way to do it.

Laying the rebars

In order to make a rebar3 project that is responsible for running Riak tests, the trivial approach is to have it list riak as a dependency and organize the project roughly as follows:

riak_test
├── include             (headers go here)
├── intercepts          (code overrides for riak modules go here)
├── src                 (empty, just app file)
└── test                (test suites go here)

This is a fairly standard rebar3 structure apart from the intercepts folder, but it’s a great starting point. Now to configure the project to list Riak as a dependency⁶:

{deps, [
    {riak, {git, "https://github.com/basho/riak", {branch, "develop"}}}
]}.

Unsurprisingly, this didn’t work at the first attempt. Let’s quickly glance over some of the issues I had:

rebar3 has dropped support for R16, the latest Erlang version supported across Basho’s codebase. I decided to leave a binary of the last R16 compatible version in the repo, and as Riak is brought back from the distant Erlang past we can just delete the binary and use the latest available version.
Some of the dependencies didn’t compile with rebar3’s slightly different compile step, so I needed to make some changes to them.
Other old dependencies could be overridden with hex equivalents, but this caused a whole class of version mismatch errors that I had never seen before.

All of these issues took me a fair bit of time to push through, and I only ended up having a working prototype by late July!

The chamber of 32 doors

Once the compile step didn’t fill my terminal with bright red text it was time to move on to the test suites themselves and get them to work with Common Test. This would be boring were it not for all the interesting ways that the Basho team found to test Riak. It was a matter of picking one out of dozens of test suites and trying to port it to Common Test.

I had to break up all of the code into test cases with appropriate names, separate the setup code into init_per_suite and make sure nothing broke in the process. There was consensus in the idea that I should first convert an easy test suite, ensure it ran correctly⁷ and only then move to all the others, and so I started with the verify_build_cluster suite.

Converging to the end result

At this stage I was able to do a clean git clone of my repository and have it compile, and I had one test suite ported over to Common Test. It was time to plug in the wires and start everything up. There probably won’t be any big surprises in the current dependencies list:

{deps, [
    %% override setup which is incompatible with rebar3
    {setup, "2.0.2"},
    {lucene_parser, {git, "https://github.com/goncalotomas/lucene_parser", {branch, "master"}}},
    {riak_search, {git, "https://github.com/goncalotomas/riak_search", {branch, "develop"}}},
    {riak, {raw, {git, "https://github.com/basho/riak", {branch, "develop"}}}},
    {rt, {git, "https://github.com/goncalotomas/rt_lib", {branch, "master"}}},
    {intercepts, {git, "https://github.com/goncalotomas/intercepts", {branch, "master"}}}
]}.

Both lucene_parser and riak_search are deprecated now so in time we should be able to make them go away.

Gentlemen, start your Riaks

Making everything come together was the last and most difficult task! Rebar3 didn’t do everything we needed in order to get a proper testing environment, since we not only need Riak as a dependency, we actually need to generate a release of it. Despite this limitation, rebar3 demonstrated quite a bit of flexibility, as proved by the hooks provider. All I needed to do was add a script that generated a Riak release manually and register it as a post_compile hook, and rebar3 would gladly do everything for me⁸.
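For reference, registering a script to run after the compile step looks roughly like the following rebar.config fragment. The script path is a placeholder I made up for illustration, not the name used in the repository.

```erlang
%% rebar.config (fragment)
%% scripts/generate_riak_release.sh is a hypothetical path; point it at
%% whatever script builds the Riak release.
{post_hooks, [
    {compile, "./scripts/generate_riak_release.sh"}
]}.
```

rebar3 runs the command as a shell command once its own compile provider has finished, so the script can rely on the dependencies already having been fetched and built.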

Other applications that involve NIFs and more complex operations also needed to have their releases generated. It seems obvious, but actually determining the origin of the cryptic error messages I was getting was not easy.

Avoiding Release Apocalypse

Generating a Riak release is a complicated step that takes a long time, and it was one of the bad side effects of the old Riak Test. To make sure we didn’t end up with the same issue in the new version, I added an extra step to the post_compile hook that checked whether some key files existed and only carried on with the release generation if the files were not there. No time wasted!
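The check itself is simple. Here is a minimal sketch of the idea written in Erlang; the real hook is a script, and the module name release_check, the function needs_release/2 and any marker file names you pass to it are assumptions made for this example.

```erlang
-module(release_check).
-export([needs_release/2]).

%% Return true when at least one of the marker files is missing under
%% RelDir, meaning the release has to be (re)generated.
-spec needs_release(file:filename(), [file:filename()]) -> boolean().
needs_release(RelDir, KeyFiles) ->
    not lists:all(
          fun(File) -> filelib:is_regular(filename:join(RelDir, File)) end,
          KeyFiles).
```

A hook would call something like needs_release(RelDir, ["bin/riak"]) and skip the expensive release generation whenever it returns false. The choice of marker files matters: they should only exist once the release has been generated completely, otherwise an interrupted build could be mistaken for a finished one.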

Pressing the big red button

The last step of the way was making sure that the tests actually ran. I made a big effort to reuse the old Riak Test code that concerned the setup and teardown of nodes and was eventually successful in making things work, almost putting an end to this story.

What remained was the teardown code, which was missing from a surprising number of test suites. I assume it was missing because the old test runner was responsible for automatically taking the nodes down. So in the suite conversion process I had to add teardown code to the end_per_suite function.

Multiple suites…?

Things were going great right after my first successful attempt with the verify_build_cluster suite, until I stumbled on an unforeseen issue: there was metadata being left behind that was preventing the suite from being run multiple times in succession, but it would cause

Fixing this final issue involved adding another script that is called from init_per_suite in each test suite. The reason for cleaning the environment before running the suite is that a failing test suite would sometimes skip end_per_suite, potentially leaving metadata from the previous run behind. Making sure that we use a clean devrel before running the tests should be enough to guarantee determinism.
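The actual cleanup is done by a script, but the idea can be sketched in plain Erlang as follows. Everything here is hypothetical: the module name, the devrel_data_dir key in Config, and the assumption that wiping a single data directory is enough to reset a node.

```erlang
-module(clean_env_sketch).
-export([init_per_suite/1, clean_dir/1]).

%% Clean leftovers from a previous (possibly crashed) run before any
%% test case executes. `devrel_data_dir' is a made-up Config key.
init_per_suite(Config) ->
    {devrel_data_dir, Dir} = lists:keyfind(devrel_data_dir, 1, Config),
    ok = clean_dir(Dir),
    Config.

%% Delete everything below Dir but keep Dir itself.
%% A directory that does not exist is already clean.
clean_dir(Dir) ->
    case file:list_dir(Dir) of
        {ok, Names} ->
            lists:foreach(
              fun(Name) -> delete_path(filename:join(Dir, Name)) end,
              Names);
        {error, enoent} ->
            ok
    end.

%% Remove a file, or a directory together with its contents.
delete_path(Path) ->
    case filelib:is_dir(Path) of
        true ->
            ok = clean_dir(Path),
            ok = file:del_dir(Path);
        false ->
            ok = file:delete(Path)
    end.
```

Doing the cleanup at the start of the suite rather than at the end is the important design choice: it still works when the previous run never reached end_per_suite.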

From this point onwards the remaining time was spent converting as many suites as we could, eventually ending up with a bunch of them. And that pretty much wraps up GSoC!

The End

There are still test suites that need to be migrated to Common Test sometime in the next few months, but the majority of the work is complete 🎉. There were plenty of times I facepalmed over obvious things I had missed, and countless hours spent trying to debug extraordinarily cryptic error messages. Those moments did not make it into the post itself in the interest of time, but they will hopefully remain with me as experience that will prevent me from making similar mistakes in the future.

Parting remarks

I feel humbled to have received help from several people I look up to, and I’m most thankful for the opportunity that I was given. Firstly, to my mentors Bryan, Gordon and Russell, who despite having decades of experience were very considerate and patient with me, even when I messed up an obvious part. I leave GSoC with an exceptional sense of accomplishment for making an important contribution that will hopefully spark future ones.

Thomas Arts of Quviq deserves special thanks for graciously providing me with an EQC license during Google Summer of Code.

There is also a bunch of people in the Erlang Slack channels that have my gratitude for their help in better understanding how rebar3 and Common Test work. Answers to some tricky questions came from Fred Hebert (@ferd), Tristan Sloughter (@tsloughter), Bryan Paxton (@starbelly), Craig Everett (@zxq9) and Brujo Benavides (@elbrujohalcon).

¹ One of my first tasks was to download Riak Test and actually make it work, an adequate task to perform while I was skimming through the code trying to figure out how it all worked.
² A fresh clone of Riak Test takes more than an hour to finish the compile step on my 2016 MacBook Pro.
³ This is how you’d get reporting abilities, through the exceptions that were generated if the assertions weren’t true.
⁴ In this case I’m talking about loading the same set of required modules for multiple test suites without running into a maintainability nightmare.
⁵ I will need to dive deeper into the technique behind the intercepts code to determine if or how it is different from something like meck, a popular Erlang mocking library.
⁶ The old Riak Test project does not list Riak as a dependency, opting for multiple ad-hoc scripts to do that instead: https://github.com/basho/riak_test/blob/develop/rebar.config#L13-L24
⁷ Easier said than done.
⁸ Well, it did take a long time before I realised how to generate a rebar2 release from within a rebar3 hook, but in the end what matters is that it was actually possible.