Using Compression for NATS Streams

NATS 2.10 is finally available and it comes with a lot of exciting new features. In this blog post, we’re looking at the new compression feature for NATS Streams.

Starting with NATS 2.10, you can configure compression individually for every stream that uses the file storage type. Currently, S2 is the only supported compression format, though other formats may be supported in the future. S2 is an extension of Google’s Snappy compression format, designed for high throughput.

Let’s take a look at the storage savings when using the S2 compression.

Creating the streams

To test the compression efficiency, we use the NATS CLI to create two streams whose configurations are identical except that compression is turned on for one of them.

nats stream add NO_COMPRESSION \
--subjects NO_COMPRESSION \
--replicas 1 \
--storage=file \
--defaults

nats stream add WITH_COMPRESSION \
--subjects WITH_COMPRESSION \
--replicas 1 \
--storage=file \
--compression s2 \
--defaults

Please note that you need at least version 0.1.0 of the NATS CLI, which is the first version that has the compression option.
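If you script the setup, a small guard can fail early on an older CLI. This is a minimal sketch: the helper name version_ge is hypothetical, it relies on sort -V (GNU coreutils and recent BSD sort), and it assumes nats --version prints a bare version string such as 0.1.0, possibly with a leading v.

```shell
#!/usr/bin/env bash
# version_ge A B (hypothetical helper): succeeds if version A >= version B.
# Relies on `sort -V` for version-aware ordering; a leading "v" is stripped.
version_ge() {
  local a=${1#v} b=${2#v}
  # The smaller of the two versions sorts first; if that is B, then A >= B.
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}
```

Usage: `version_ge "$(nats --version 2>&1)" 0.1.0 || echo "NATS CLI too old for --compression" >&2`. The `2>&1` is there because it is not assumed here whether the CLI prints its version to stdout or stderr.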

Generating Test Data

To compare the sizes of the two streams, we use the CLI to publish the same messages (with a 32-byte payload) to both streams. After each batch of a fixed number of messages, we check the disk space used by each stream with du.

nats pub NO_COMPRESSION --count 10000 "4KJZE2Nx6IbTsEo8lvGJIg1XDz2njdLF"
nats pub WITH_COMPRESSION --count 10000 "4KJZE2Nx6IbTsEo8lvGJIg1XDz2njdLF"

You can see my full script for generating the test data/results here.
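As a rough sketch of such a measurement loop (not the script linked above), the functions below publish one batch to each stream per step and print the disk usage of both stream directories. Assumptions: STORE_DIR is the nats-server store directory ([nats-folder] in this post; the default below is a placeholder), streams live under $STORE_DIR/jetstream/$G/streams/<name> (default on-disk layout assumed), and NATS_CMD can be overridden for a dry run. Note the quoting: $G is a literal directory name, so the shell must not expand it.

```shell
#!/usr/bin/env bash
# Sketch of a measurement loop (hypothetical; not the author's script).
# STORE_DIR: the nats-server store directory ([nats-folder] in the text); placeholder default.
# NATS_CMD: the CLI to call; override it (e.g. with a stub function) for a dry run.
STORE_DIR=${STORE_DIR:-./nats-folder}
NATS_CMD=${NATS_CMD:-nats}
PAYLOAD="4KJZE2Nx6IbTsEo8lvGJIg1XDz2njdLF"   # the same 32-byte payload as above

# stream_kib NAME: disk usage of one stream directory in KiB.
# '$G' is single-quoted because it is a literal directory name, not a shell variable.
stream_kib() {
  du -sk "$STORE_DIR/jetstream/"'$G'"/streams/$1" | cut -f1
}

# run_batches STEPS BATCH: publish BATCH messages to each stream STEPS times and
# print "total_messages no_compression_kib with_compression_kib" after each step.
run_batches() {
  local steps=$1 batch=$2 i
  for ((i = 1; i <= steps; i++)); do
    "$NATS_CMD" pub NO_COMPRESSION --count "$batch" "$PAYLOAD" >/dev/null
    "$NATS_CMD" pub WITH_COMPRESSION --count "$batch" "$PAYLOAD" >/dev/null
    echo "$((i * batch)) $(stream_kib NO_COMPRESSION) $(stream_kib WITH_COMPRESSION)"
  done
}
```

For example, `run_batches 50 10000 > sizes.txt` would record 50 data points, up to 500,000 messages per stream.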

Test Results

Stream Size Comparison

Saved Storage

We can see that in this test setup, there are no storage savings until ~100,000 messages. In fact, there is a slight storage overhead (which I can’t explain so far). Beyond that number of messages, compression kicks in and brings the storage savings up to 86 %. This behavior initially puzzled me when I looked at the results, but after examining the inner workings of nats-server, it all became much clearer. If you want to understand the compression logic better, read the Compression Logic in nats-server section of this blog post.
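The savings percentage is simply 1 - compressed / uncompressed, expressed in percent. A minimal sketch as an awk filter; the helper name add_savings and its input format (one line per measurement: message count, uncompressed size, compressed size, both sizes in the same unit) are assumptions. A negative value corresponds to the slight overhead seen below ~100,000 messages.

```shell
#!/usr/bin/env bash
# add_savings (hypothetical helper): reads lines of
#   "messages uncompressed_size compressed_size"  (both sizes in the same unit)
# and appends the storage saved in percent: (1 - compressed / uncompressed) * 100.
add_savings() {
  awk '{
    if ($2 > 0) printf "%s %s %s %.1f\n", $1, $2, $3, (1 - $3 / $2) * 100
    else        printf "%s %s %s n/a\n", $1, $2, $3
  }'
}
```

Usage: `add_savings < sizes.txt`, where sizes.txt is a hypothetical file of measurements in the format above.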

NATS CLI Stream Size

With nats stream info […], we can check the stream size via the CLI. Interestingly, as the graph below shows, the reported size of the compressed stream is slightly higher than that of the uncompressed stream. In other words, the nats-server API currently doesn’t report the actual disk space used when compression is turned on.
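To put the reported size next to the du numbers in a script, the state.bytes field can be extracted from the JSON output of nats stream info. A minimal sketch; the helper name reported_bytes is hypothetical, and it assumes the CLI prints pretty-printed JSON with one key per line. If jq is available, `jq '.state.bytes'` is the sturdier choice.

```shell
#!/usr/bin/env bash
# reported_bytes (hypothetical helper): reads `nats stream info NAME --json` on stdin
# and prints the stream size the server reports (the "bytes" key inside "state").
# Assumes one key per line; keys such as "max_bytes" are deliberately not matched.
reported_bytes() {
  sed -n 's/^[[:space:]]*"bytes":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: `nats stream info WITH_COMPRESSION --json | reported_bytes`.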

NATS CLI reported stream sizes

Is it possible to turn on compression for existing streams?

Yes, it’s possible (which is really great), but simply turning it on doesn’t give you storage savings for the existing messages. Let’s take our NO_COMPRESSION stream and turn on S2 compression via the CLI:

nats stream edit NO_COMPRESSION --compression s2

After enabling it, the used disk space (checked with du -c [nats-folder]/jetstream/$G/streams/NO_COMPRESSION) is exactly the same as before. Even after adding a few more messages to the stream, the compression doesn’t affect the existing messages. So let’s take a deeper look at how the compression logic works in nats-server.

Compression Logic in nats-server

NATS stores stream messages on disk in blocks. Each block is stored as a separate [index].blk file in the JetStream storage directory, specifically in the [nats-folder]/jetstream/$G/streams/[stream-name]/msgs folder. The block size depends on various factors, such as encryption, the retention policy, and the maximum bytes (MaxBytes) configured for the stream. For a limits-based stream with no MaxBytes limit and no encryption, the block size is 8 MB.

Before writing each message, the server checks whether the current block has reached its maximum size. If it has, the full block gets compressed. This means incoming messages are not compressed one by one as they are written, which helps maintain good write throughput.
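The rotation rule above can be modeled with a little arithmetic, which is consistent with savings only appearing after roughly 100,000 messages. This is a simplified sketch, not nats-server’s code: it assumes fixed-size records, an 8 MB block of 8 × 1024 × 1024 bytes, and that a block is sealed (and compressed) only when the next record would not fit. The 78-byte record size used below (32-byte payload, 16-byte subject WITH_COMPRESSION, and an assumed 30 bytes of per-record framing) is an estimate.

```shell
#!/usr/bin/env bash
# block_state N RECORD_SIZE BLOCK_SIZE (hypothetical helper, simplified model):
# prints "sealed_blocks active_block_bytes" after N equally sized records.
# Model: before each write, if the record would not fit into the current block,
# that block is sealed (the point where it gets compressed) and a new one starts.
block_state() {
  local n=$1 rec=$2 blk=$3
  local per_block=$((blk / rec))   # records that fit into one block
  if ((n == 0)); then
    echo "0 0"
    return
  fi
  # A block is only sealed when the *next* record arrives, so a block holding
  # exactly per_block records still counts as the active, uncompressed one.
  local sealed=$(((n - 1) / per_block))
  local active=$((n - sealed * per_block))
  echo "$sealed $((active * rec))"
}
```

Under these assumptions, `block_state 107546 78 8388608` prints `0 8388588` (one full but still uncompressed block), and a single additional message seals it: `block_state 107547 78 8388608` prints `1 78`.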

To compress a full block, the server first reads the block’s entire contents into memory; the compression itself is performed solely in memory. Once compression is complete, the block is encrypted (if encryption is enabled). To ensure safety, the compressed (and possibly encrypted) block is first written to a new temporary file, which is renamed to the original block name only after the write succeeds. After that, a new (uncompressed) block is created, and the incoming message is written to this new block.
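Writing to a temporary file and then renaming it is a general pattern for replacing a file safely: because a rename within one directory replaces the target atomically on POSIX file systems, readers see either the old block or the complete new one, never a half-written file. A minimal sketch of that pattern; gzip stands in for S2 purely for illustration, and the helper name compress_block is hypothetical.

```shell
#!/usr/bin/env bash
# compress_block FILE (hypothetical helper): replace FILE with a compressed copy,
# mirroring the temp-file-then-rename step described above. gzip stands in for S2.
# A production implementation would also flush the temp file to disk before renaming.
compress_block() {
  local blk=$1 tmp
  # Create the temporary file next to the block so the rename stays on one file system.
  tmp=$(mktemp "$blk.tmp.XXXXXX") || return 1
  if gzip -c -- "$blk" > "$tmp"; then
    mv -f -- "$tmp" "$blk"   # rename only after the compressed copy was written successfully
  else
    rm -f -- "$tmp"          # leave the original block untouched on failure
    return 1
  fi
}
```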

Wrap Up

I hope you enjoyed reading about the new compression feature of NATS streams. Compression can save a significant amount of storage, and thus money, especially in cloud environments. Appending messages to a stream adds no immediate per-message overhead, as compression only runs when a block reaches full capacity.