Song recommendations as a C# Impureim Sandwich
A refactoring example. This is an article in a larger series about functional programming design alternatives. I'm assuming that you've read the previous articles, but briefly told, I'm considering an example presented by Oleksii Holub in the article Pure-Impure Segregation Principle. The example gathers song recommendations for a user in a long-running process. In the previous article I argued that while the memory requirements for this problem seem so vast that an Impureim Sandwich appears out of the question, it's nonetheless worthwhile to at least try it out. The refactoring isn't that difficult, and it turns out that it does simplify the code. Enumeration API # The data access API is a web service: "I don't own the database, those are requests to an external API service (think Spotify API) that provides the data." Tweet, Oleksii Holub, 2022 In order to read all data, we'll have to assume that there's a way to enumerate all songs and all users. With that assumption, I add the GetAllSongs and GetAllUsers methods to the SongService interface: public interface SongService { Task GetAllSongs(); Task GetAllUsers(); Task GetTopListenersAsync(int songId); Task GetTopScrobblesAsync(string userName); } It is, of course, a crucial assumption, and it's possible that no such API exists. On the other hand, a REST API could expose such functionality as a paged feed. Leafing through potentially hundreds (or thousands) such pages is bound to take some time, so it's good to know that this is a background process. As I briefly mentioned in the previous article, we could imagine that we have a dedicated indexing server for this kind purpose. While we may rightly expect the initial data load to take some time (hours, even), once it's in memory, we should be able to reuse it to calculate song recommendations for all users, instead of just one user. In the previous article I estimated that it should be possible to keep all songs in memory with less than a gigabyte. Users, without scrobbles, also take up surprisingly little space, to the degree that a million users fit in a few dozen megabytes. Even if, eventually, we may be concerned about memory, we don't have to be concerned about this part. In any case, the addition of these two new methods doesn't break the existing example code, although I did have to implement the method in the FakeSongService class that I introduced in the article Characterising song recommendations: public Task GetAllSongs() { return Task.FromResult(songs.Values); } public Task GetAllUsers() { return Task.FromResult(users.Select(kvp => new User(kvp.Key, kvp.Value.Values.Sum()))); } With those additions, we can load all data as the first layer (phase, really) of the sandwich. Front-loading the data # Loading all the data is the responsibility of this DataCollector module: public static class DataCollector { public static async Task CollectAllTopListeners(this SongService songService) { var dict = new Dictionary(); foreach (var song in await songService.GetAllSongs()) { var topListeners = await songService.GetTopListenersAsync(song.Id); dict.Add(song.Id, topListeners); } return dict; } public static async Task CollectAllTopScrobbles(this SongService songService) { var dict = new Dictionary(); foreach (var user in await songService.GetAllUsers()) { var topScrobbles = await songService.GetTopScrobblesAsync(user.UserName); dict.Add(user.UserName, topScrobbles); } return dict; } } These two methods work with any SongService implementation, so while the code base will work with FakeSongService, real 'production code' might as well use an HTTP-based implementation that pages through the implied web API. The dictionaries returned by the methods are likely to be huge. That's a major point of this exercise. Once the change is implemented and Characterisation Tests show that it still works, it makes sense to generate data to get a sense of the memory footprint. Table-driven methods # Perhaps you wonder why the above CollectAllTopListeners and CollectAllTopScrobbles methods return dictionaries of exactly that shape. Code Complete describes a programming technique called table-driven methods. The idea is to replace branching instructions such as if, else, and switch with a lookup table. The overall point, however, is that you can replace function calls with table lookups. Consider the GetTopListenersAsync me

A refactoring example.
This is an article in a larger series about functional programming design alternatives. I'm assuming that you've read the previous articles, but briefly told, I'm considering an example presented by Oleksii Holub in the article Pure-Impure Segregation Principle. The example gathers song recommendations for a user in a long-running process.
In the previous article I argued that while the memory requirements for this problem seem so vast that an Impureim Sandwich appears out of the question, it's nonetheless worthwhile to at least try it out. The refactoring isn't that difficult, and it turns out that it does simplify the code.
Enumeration API #
The data access API is a web service:
"I don't own the database, those are requests to an external API service (think Spotify API) that provides the data."
In order to read all data, we'll have to assume that there's a way to enumerate all songs and all users. With that assumption, I add the GetAllSongs
and GetAllUsers
methods to the SongService
interface:
public interface SongService { Task> GetAllSongs(); Task > GetAllUsers(); Task > GetTopListenersAsync(int songId); Task > GetTopScrobblesAsync(string userName); }
It is, of course, a crucial assumption, and it's possible that no such API exists. On the other hand, a REST API could expose such functionality as a paged feed. Leafing through potentially hundreds (or thousands) such pages is bound to take some time, so it's good to know that this is a background process. As I briefly mentioned in the previous article, we could imagine that we have a dedicated indexing server for this kind purpose. While we may rightly expect the initial data load to take some time (hours, even), once it's in memory, we should be able to reuse it to calculate song recommendations for all users, instead of just one user.
In the previous article I estimated that it should be possible to keep all songs in memory with less than a gigabyte. Users, without scrobbles, also take up surprisingly little space, to the degree that a million users fit in a few dozen megabytes. Even if, eventually, we may be concerned about memory, we don't have to be concerned about this part.
In any case, the addition of these two new methods doesn't break the existing example code, although I did have to implement the method in the FakeSongService
class that I introduced in the article Characterising song recommendations:
public Task> GetAllSongs() { return Task.FromResult >(songs.Values); } public Task > GetAllUsers() { return Task.FromResult(users.Select(kvp => new User(kvp.Key, kvp.Value.Values.Sum()))); }
With those additions, we can load all data as the first layer (phase, really) of the sandwich.
Front-loading the data #
Loading all the data is the responsibility of this DataCollector
module:
public static class DataCollector { public static async Taskint, IReadOnlyCollection >> CollectAllTopListeners(this SongService songService) { var dict = new Dictionary<int, IReadOnlyCollection >(); foreach (var song in await songService.GetAllSongs()) { var topListeners = await songService.GetTopListenersAsync(song.Id); dict.Add(song.Id, topListeners); } return dict; } public static async Task string, IReadOnlyCollection >> CollectAllTopScrobbles(this SongService songService) { var dict = new Dictionary<string, IReadOnlyCollection >(); foreach (var user in await songService.GetAllUsers()) { var topScrobbles = await songService.GetTopScrobblesAsync(user.UserName); dict.Add(user.UserName, topScrobbles); } return dict; } }
These two methods work with any SongService
implementation, so while the code base will work with FakeSongService
, real 'production code' might as well use an HTTP-based implementation that pages through the implied web API.
The dictionaries returned by the methods are likely to be huge. That's a major point of this exercise. Once the change is implemented and Characterisation Tests show that it still works, it makes sense to generate data to get a sense of the memory footprint.
Table-driven methods #
Perhaps you wonder why the above CollectAllTopListeners
and CollectAllTopScrobbles
methods return dictionaries of exactly that shape.
Code Complete describes a programming technique called table-driven methods. The idea is to replace branching instructions such as if
, else
, and switch
with a lookup table. The overall point, however, is that you can replace function calls with table lookups.
Consider the GetTopListenersAsync
method. It takes an int
as input, and returns a Task
as output. If you ignore the Task
, that's an IReadOnlyCollection
. In other words, you can exchange an int
for an IReadOnlyCollection
.
If you have an IReadOnlyDictionary<int, IReadOnlyCollection
you can also exchange an int
for an IReadOnlyCollection
. These two APIs are functionally equivalent - although, of course, they have very different memory and run-time profiles.
The same goes for the GetTopScrobblesAsync
method: It takes a string
as input and returns an IReadOnlyCollection
as output (if you ignore the Task
). An IReadOnlyDictionary<string, IReadOnlyCollection
is equivalent.
To make it practical, it turns out that we also need a little helper method to deal with the case where the dictionary has no entry for a given key:
internal static IReadOnlyCollectionGetOrEmpty<T, TKey>( this IReadOnlyDictionary > dict, TKey key) { if (dict.TryGetValue(key, out var result)) return result; return Array.Empty (); }
If there's no entry for a key, this function instead returns an empty array.
That should make it as easy as possible to replace calls to GetTopListenersAsync
and GetTopScrobblesAsync
with dictionary lookups.
Adding method parameters #
When refactoring, it's a good idea to proceed in small, controlled steps. You can see each of my micro-commits in the Git repository's refactor-to-function branch. Here, I'll give an overview.
First, I added two dictionaries as parameters to the GetRecommendationsAsync
method. You may recall that the method used to look like this:
public async Task> GetRecommendationsAsync(string userName)
After I added the two dictionaries, it looks like this:
public async Task> GetRecommendationsAsync( string userName, IReadOnlyDictionary<string, IReadOnlyCollection > topScrobbles, IReadOnlyDictionary<int, IReadOnlyCollection > topListeners)
At this point, the GetRecommendationsAsync
method uses neither the topScrobbles
nor the topListeners
parameter. Still, I consider this a distinct checkpoint that I commit to Git. As I've outlined in my book Code That Fits in Your Head, it's safest to either refactor production code while keeping test code untouched, or refactor test code without editing the production code. An API change like the current is an example of a situation where that separation is impossible. This is the reason I want to keep it small and isolated. While the change does touch both production code and test code, I'm not editing the behaviour of the System Under Test.
Tests now look like this:
[] let ``No data`` () = Gen.userName |> Arb.fromGen |> Prop.forAll <| fun userName -> task { let srvc = FakeSongService () let sut = RecommendationsProvider srvc let! topScrobbles = DataCollector.CollectAllTopScrobbles srvc let! topListeners = DataCollector.CollectAllTopListeners srvc let! actual = sut.GetRecommendationsAsync (userName, topScrobbles, topListeners) Assert.Empty actual } :> Task
The test now uses the DataCollector
to front-load data into dictionaries that it then passes to GetRecommendationsAsync
. Keep in mind that GetRecommendationsAsync
doesn't yet use that data, but it's available to it once I make a change to that effect.
You may wish to compare this version of the No data
test with the previous version shown in the article Characterising song recommendations.
Refactoring to function #
The code is now ready for refactoring from dependency injection to dependency rejection. It's even possible to do it one method call at a time, because the data in the FakeSongService
is the same as the data in the two dictionaries. It's just two different representations of the same data.
Since, as described above, the dictionaries are equivalent to the SongService
queries, each is easily replaced with the other. The first impure action in GetRecommendationsAsync
, for example, is this one:
var scrobbles = await _songService.GetTopScrobblesAsync(userName);
The equivalent dictionary lookup enables us to change that line of code to this:
var scrobbles = topScrobbles.GetOrEmpty(userName);
Notice that the dictionary lookup is a pure function that the method need not await
.
Even though the rest of GetRecommendationsAsync
still queries the injected SongService
, all tests pass, and I can commit this small, isolated change to Git.
Proceeding in a similar fashion enables us to eliminate the SongService
queries one by one. There are only three method calls, so this can be done in three controlled steps. Once the last impure query has been replaced, the C# compiler complains about the async
keyword in the declaration of the GetRecommendationsAsync
method.
Not only is the async
keyword no longer required, the method is no longer asynchronous. There's no reason to return a Task
, and the Async
method name suffix is also misleading.
public IReadOnlyListGetRecommendations( string userName, IReadOnlyDictionary<string, IReadOnlyCollection > topScrobbles, IReadOnlyDictionary<int, IReadOnlyCollection > topListeners)
The GetRecommendations
method no longer uses the injected SongService
, and since it's is the only method of the RecommendationsProvider
class, we can now (r)eject the dependency.
This furthermore means that the class no longer has any class fields; we might as well make it (and the GetRecommendations
function) static
. Here's the final function in its entirety:
public static IReadOnlyListGetRecommendations( string userName, IReadOnlyDictionary<string, IReadOnlyCollection > topScrobbles, IReadOnlyDictionary<int, IReadOnlyCollection > topListeners) { // 1. Get user's own top scrobbles // 2. Get other users who listened to the same songs // 3. Get top scrobbles of those users // 4. Aggregate the songs into recommendations var scrobbles = topScrobbles.GetOrEmpty(userName); var scrobblesSnapshot = scrobbles .OrderByDescending(s => s.ScrobbleCount) .Take(100) .ToArray(); var recommendationCandidates = new List (); foreach (var scrobble in scrobblesSnapshot) { var otherListeners = topListeners.GetOrEmpty(scrobble.Song.Id); var otherListenersSnapshot = otherListeners .Where(u => u.TotalScrobbleCount >= 10_000) .OrderByDescending(u => u.TotalScrobbleCount) .Take(20) .ToArray(); foreach (var otherListener in otherListenersSnapshot) { var otherScrobbles = topScrobbles.GetOrEmpty(otherListener.UserName); var otherScrobblesSnapshot = otherScrobbles .Where(s => s.Song.IsVerifiedArtist) .OrderByDescending(s => s.Song.Rating) .Take(10) .ToArray(); recommendationCandidates.AddRange( otherScrobblesSnapshot.Select(s => s.Song) ); } } var recommendations = recommendationCandidates .OrderByDescending(s => s.Rating) .Take(200) .ToArray(); return recommendations; }
The overall structure is similar to the original version. Now that the code is simpler (because there's no longer any asynchrony) you could keep refactoring. With this C# code example, I'm going to stop here, but when I port it to F# I'm going to refactor more aggressively.
Sandwich #
One point of the whole exercise is to demonstrate how to refactor to an Impureim Sandwich. The GetRecommendations
method shown above constitutes the pure filling of the sandwich, but what does the entire sandwich look like?
In this code base, the sandwiches only exist as unit tests, the simplest of which is still the No data
test:
[] let ``No data`` () = Gen.userName |> Arb.fromGen |> Prop.forAll <| fun user -> task { let srvc = FakeSongService () let! topScrobbles = DataCollector.CollectAllTopScrobbles srvc let! topListeners = DataCollector.CollectAllTopListeners srvc let actual = RecommendationsProvider.GetRecommendations (user, topScrobbles, topListeners) Assert.Empty actual } :> Task
In the above code snippet, I've coloured in the relevant part of the test. I admit that it's a stretch to colour the last line red, since Assert.Empty
is, at least, deterministic. One could argue that since it throws an exception on failure, it's not strictly free of side effects, but that's really a weak argument. It would be easy to refactor assertions to pure functions.
Instead, you may consider the bottom layer of the sandwich as a placeholder where something impure might happen. The background service that updates the song recommendations may, for example, save the result as a (CQRS-style) materialised view.
The above test snippet, then, is more of a sketch of how the Impureim Sandwich may look: First, front-load data using the DataCollector
methods; second, call GetRecommendations
; third, do something with the result.
Conclusion #
The changes demonstrated in this article serve two purposes. One is to show how to refactor an impure action to a pure function, pursuing the notion of an Impureim Sandwich. The second is to evaluate a proof-of-concept: If we do, indeed, front-load all of the data, is it realistic that all data fits in RAM?
We have yet to address that question, but since the present article is already substantial, I'll address that in a separate article. Next: Song recommendations proof-of-concept memory measurements.
This blog is totally free, but if you like it, please consider supporting it.