Building a (Java-) Redis Clone, naively
Oren Eini (Ayende) has a wonderful blog series where he builds a small Redis clone in C#, and then investigates performance issues and improves on it. I got tempted to do a Java version, out of curiosity. I probably won’t go as far as Oren Eini with the profiling and optimization, but I’ll try to do at least some iterations.
Anyway, the start is a very naive implementation, without any performance considerations. The performance is tested with memtier_benchmark, a part of Redis. I’m using the benchmark at revision 422252cd6960a5ac88dbec047741973297ce1da9.
I ran the benchmark on these AWS machine types:
The memtier_benchmark runs on a
c6g.2xlarge
, which is an ARM Graviton instance with 8 cores and 16GByte memoryThe Redic clone runs on a
c6g.4xlarge
, which is an ARM Graviton instance with 16 cores and 32GByte memory
I’ve built and ran the memtier_benchmark in Docker:
sudo docker build --tag memtier_benchmark . sudo docker run --name redisb --rm --network host memtier_benchmark \ --server=$SERVER_IP --port=16379 \ -t 8 -c 32 --test-time=30 --distinct-client-seed -d 256 --pipeline=30
The benchmark parameters are otherwise the same as in Oren Eini blog: 8 Threads, one for each core of the client, 16 connections per thread, with 20% writes, 80% reads and a data size of 256. The test runs for 30 seconds.
The App runs on the latest Java JDK 18, with no tuning flags:
java -jar redis-experiment.jar
The server gets loaded while the benchmark runs:
Here is the summary of the naive implementation’s result:
============================================================================================================================ Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec ---------------------------------------------------------------------------------------------------------------------------- Sets 331686.25 --- --- 2.09981 1.75900 8.09500 48.63900 98433.33 Gets 3316816.37 1215996.13 2100820.23 2.09701 1.75900 8.09500 48.38300 436764.91 Waits 0.00 --- --- --- --- --- --- --- Totals 3648502.62 1215996.13 2100820.23 2.09727 1.75900 8.09500 48.38300 535198.25
Well, I am impressed by these numbers: We got over 3 million, GET’s per second. I repeat, over 3'000'000 GET’s per second!! The latency is not super consistent, but still the p99% stays under 10ms and p99.9 under 50ms. I hope I didn’t make a mistake resulting in these good numbers.
Keep in mind, that this is an artificial benchmark with a very simplified implementation and test setup.
Anyway, here is the code of the Redis clone, which is a 1:1 translation of Oren Eini’s C# code:
package info.gamlor.redis;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
public class RedisMain {
public static void main(String[] args) throws IOException {
var scheduler = Executors.newCachedThreadPool();
var socket = new ServerSocket();
socket.bind(new InetSocketAddress("0.0.0.0", 16379));
System.out.println("App is listening on 0.0.0.0:16379");
var clone = new RedisClone();
while (true) {
var client = socket.accept();
scheduler.execute(() -> {
try (client) {
clone.handleConnection(client);
} catch (Exception e) {
e.printStackTrace();
}
});
}
}
}
class RedisClone {
private final ConcurrentHashMap<String, String> state = new ConcurrentHashMap<>();
public void handleConnection(Socket socket) throws Exception {
var args = new ArrayList<String>();
var reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
var writer = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
while (true) {
args.clear();
var line = reader.readLine();
if (line == null)
break;
if (line.charAt(0) != '*')
throw new RuntimeException("Cannot understand arg batch: " + line);
var argsv = Integer.parseInt(line.substring(1));
for (int i = 0; i < argsv; i++) {
line = reader.readLine();
if (line == null || line.charAt(0) != '$')
throw new RuntimeException("Cannot understand arg length: " + line);
var argLen = Integer.parseInt(line.substring(1));
line = reader.readLine();
if (line == null || line.length() != argLen)
throw new RuntimeException("Wrong arg length expected " + argLen + " got: " + line);
args.add(line);
}
var reply = executeCommand(args);
if (reply == null) {
writer.write("$-1\r\n");
} else {
writer.write("$" + reply.length() + "\r\n" + reply + "\r\n");
}
writer.flush();
}
}
String executeCommand(List<String> args) {
switch (args.get(0)) {
case "GET":
return state.get(args.get(1));
case "SET":
state.put(args.get(1), args.get(2));
return null;
default:
throw new IllegalArgumentException("Unknown command: " + args.get(1));
}
}
}
A few notes: Again, the implementation is straight forward, with not consideration for performance, logging or error handling. Also, it uses classic blocking IO with a thread per client. If you connect multiple thousands of clients, all these thread will become an issue. For that reason virtual threads are commint to Java, which will be a great solution for many apps. Further, there is also no memory pressure during the benchmark, giving the GC more breathing room.
Anyway, in the next blog post we do some light initial performance profiling and see what we can improve.