The first release candidate of NATS 2.14 is out, and one of its standout JetStream additions is fast-ingest batch publishing: a new batch-publishing mode built purely for throughput, complementing the atomic batch publishing that shipped with 2.12.
This post compares fast-ingest to the existing atomic batches, shows how to use it via the orbit.go library, and benchmarks the three publishing paths (plain publish, atomic batch, and fast-ingest) on both memory and file storage.
Both modes batch multiple messages under a single logical operation, but they trade off differently:
| | Atomic batch (2.12+) | Fast-ingest batch (2.14+) |
|---|---|---|
| Guarantee | All-or-nothing: either every message persists or none do | Best-effort: messages can be lost; gap behavior is configurable (ok or fail) |
| Batch size limit | 1,000 messages | Unlimited |
| Server behavior | Stages messages until commit | Persists messages as they arrive |
| Flow control | None | Server-driven, with tunable ack frequency |
| Leader change mid-batch | Staged state lost, so batch fails | Continues if gap mode is ok; abandoned if fail |
| Intended for | Consistency across related messages | High-throughput ingest |
Reach for atomic batching when a group of messages must succeed or fail as a unit. Think “these three events belong to one transaction”. Reach for fast-ingest when you’re pumping telemetry, IoT samples, or log lines into JetStream as fast as possible and can tolerate occasional gaps. ADR-50 defines both.
One key protocol difference: atomic batches carry control information in headers (Nats-Batch-Id, Nats-Batch-Sequence, Nats-Batch-Commit). Fast-ingest encodes control information in the reply subject (<prefix>.<uuid>.<flow>.<gap-mode>.<seq>.<op>.$FI). That tiny shift gives fast-ingest its throughput edge: no per-message header parsing, and the server uses lightweight flow control instead of staging.
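To make that subject layout concrete, here is a sketch that splits such a reply subject into its fields. The struct and helper are hypothetical, purely for illustration (orbit.go and the server have their own internal parsers), and the example prefix and uuid values are made up:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fastIngestReply holds the control fields that fast-ingest packs into the
// reply subject instead of headers, per the layout quoted above.
type fastIngestReply struct {
	Prefix  string // inbox prefix (may itself contain dots)
	BatchID string // uuid identifying the batch
	Flow    int    // client-requested flow window
	GapMode string // "ok" or "fail"
	Seq     uint64 // per-batch message sequence
	Op      string // operation marker
}

// parseFastIngestReply splits the subject on '.' and checks the trailing
// $FI marker. Hypothetical helper, not part of any client library.
func parseFastIngestReply(subject string) (fastIngestReply, error) {
	tok := strings.Split(subject, ".")
	n := len(tok)
	if n < 7 || tok[n-1] != "$FI" {
		return fastIngestReply{}, fmt.Errorf("not a fast-ingest reply: %q", subject)
	}
	flow, err := strconv.Atoi(tok[n-5])
	if err != nil {
		return fastIngestReply{}, err
	}
	seq, err := strconv.ParseUint(tok[n-3], 10, 64)
	if err != nil {
		return fastIngestReply{}, err
	}
	return fastIngestReply{
		Prefix:  strings.Join(tok[:n-6], "."),
		BatchID: tok[n-6],
		Flow:    flow,
		GapMode: tok[n-4],
		Seq:     seq,
		Op:      tok[n-2],
	}, nil
}

func main() {
	r, err := parseFastIngestReply("_INBOX.abc123.100.ok.42.msg.$FI")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```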
Fast-ingest, like atomic batching, is opt-in per stream:
```go
cfg := jetstream.StreamConfig{
    Name:               "EVENTS",
    Subjects:           []string{"events.>"},
    AllowAtomicPublish: true, // atomic batches (allow_atomic)
    AllowBatchPublish:  true, // fast-ingest (allow_batched)
}
```

The server added AllowBatchPublish in 2.14. The nats.go field already lives in the 2.14-dev branch but hasn’t landed on main or in a release yet (the latest tagged version, v1.51.0, still only has AllowAtomicPublish). Until it ships, set allow_batched: true manually when creating the stream via the JSON API, or pull from the 2.14-dev branch.
## FastPublisher

The orbit.go API mirrors the existing BatchPublisher, but drops the atomicity guarantee:
```go
import "github.com/synadia-io/orbit.go/jetstreamext"

fp, err := jetstreamext.NewFastPublisher(js)
if err != nil {
    log.Fatal(err)
}

for i := 0; i < 100_000; i++ {
    if _, err := fp.Add("events.sensor", payload(i)); err != nil {
        log.Fatal(err)
    }
}

// Commit returns a final ack that includes the last persisted sequence.
ack, err := fp.Commit(ctx, "events.sensor", payload(100_000))
if err != nil {
    log.Fatal(err)
}
fmt.Printf("last sequence: %d\n", ack.Sequence)
```

A few things worth calling out:
- FastPublisher ties to a single goroutine. For parallel producers, create one publisher per goroutine.
- Close() instead of Commit() works when you don’t want to persist a final message. You just want a clean end-of-batch signal and ack.
- Flow control is configurable:

```go
fp, err := jetstreamext.NewFastPublisher(js, jetstreamext.FastPublishFlowControl{
    Flow:               200, // client-side max; server caps at min(500/N, Flow) where N = concurrent fast publishers on the stream
    MaxOutstandingAcks: 3,   // client-side back-pressure threshold
    AckTimeout:         10 * time.Second,
})
```

Gap handling: by default, if the server notices a gap (lost message), it abandons the batch. Pair WithFastPublisherContinueOnGap(true) with WithFastPublisherErrorHandler(...) to keep the batch going and still hear about each dropped message (gaps are only surfaced through the error handler; there is no automatic logging):
```go
fp, err := jetstreamext.NewFastPublisher(js,
    jetstreamext.WithFastPublisherContinueOnGap(true),
    jetstreamext.WithFastPublisherErrorHandler(func(err error) {
        log.Printf("fast-ingest: %v", err) // fires on BatchFlowGap and flow errors
    }),
)
```

Enough theory. Each cell below is the median of 5 timed runs of 100,000 messages (plus a 5,000-message warm-up), on an Apple M3 with 16 GB RAM and an APFS SSD. Single nats-server (v2.12.7 or v2.14.0-RC.1), R1 stream on loopback, no clustering. Payloads of 64 B, 256 B, and 4 KiB; batch sizes 10/100/1000 for both batch methods, plus 2000/5000/10000 for fast-ingest. Fast-ingest uses orbit.go defaults (Flow=100, MaxOutstandingAcks=2); tuning comes further down.
On file storage, every write still goes through a syscall path and lands in NATS’s on-disk message blocks, even though the server only fsyncs periodically in the background (sync_interval, default 2 minutes; sync_always off by default). That per-write cost plus OS page-cache flushing narrows the spread between methods. Plain sync publish tops out around 32k msgs/s, and PublishAsync pulls that up to ~370k by not waiting for each ack.
At batch size 10, atomic batching runs slower than sync publish. You pay the batch setup cost without many messages’ worth of amortization. The crossover happens at batch size ≥ 100, where both atomic and fast-ingest surge past sync.
At batch size 1000, fast-ingest delivers ~360k msgs/s versus ~220k msgs/s for atomic, about +64 %. Larger fast-ingest batches don’t help further on file storage (2000, 5000, 10000 all hover in the ~340-365k msgs/s range); the disk saturates well before the protocol does, which puts fast-ingest roughly at parity with PublishAsync here.
One data point worth noting: atomic-batch throughput is identical between 2.12.7 and 2.14.0-RC.1 (~220k msgs/s at batch 1000 on both). Fast-ingest moves the needle in 2.14, not atomic-batch tuning.
Memory storage is where fast-ingest pulls ahead. With no disk in the path, per-message overhead becomes the bottleneck, and the lighter-weight control protocol matters:
- publish-sync: ~35k msgs/s (dominated by the RPC round trip, about the same as file storage).
- publish-async: ~650k msgs/s.
- atomic-batch at batch 1000: ~544k msgs/s on 2.14 (and ~511k on 2.12).
- fast-ingest at batch 1000: ~922k msgs/s.
- fast-ingest at batch 2000 / 5000 / 10000: ~941k / 960k / 981k msgs/s.

That’s roughly +69% over atomic at batch 1000 and +42% over the fastest plain publish path. Fast-ingest keeps gaining modestly past batch 1000, whereas atomic can’t go there.
The numbers above use the FastPublisher defaults (Flow=100, MaxOutstandingAcks=2); the shipped defaults prioritize stable flow control over absolute peak throughput. The FastPublishFlowControl knobs visibly change throughput, so I ran a small sweep at batch=1000, 256 B (5 iterations per combo, median reported):
Flow is an upper bound the client gives the server; the server picks the actual cadence. Looking at the 2.14.0-RC.1 implementation (fastBatchInit and checkFlowControl), a solo publisher starts at min(500, Flow) and targets min(500 / N, Flow) where N is the concurrent fast publishers on the stream, ramping by doubling or halving until it matches. When a second publisher joins, the server drops the initial rate to 1 so it can coordinate.
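That cadence rule reads more clearly as code. A small model of the steady-state target follows; it mirrors the description of checkFlowControl above and is an illustration, not the server source:

```go
package main

import "fmt"

// targetAckCadence models the scheduler described above: the server acks
// every min(500/N, Flow) messages, where N is the number of concurrent
// fast publishers on the stream and Flow is the client's requested cap.
func targetAckCadence(flow, publishers int) int {
	target := 500 / publishers
	if flow < target {
		target = flow
	}
	if target < 1 {
		target = 1 // the server never acks less often than every message
	}
	return target
}

func main() {
	fmt.Println(targetAckCadence(100, 1))  // solo publisher, Flow=100 -> 100
	fmt.Println(targetAckCadence(1000, 1)) // solo publisher hits the 500 ceiling -> 500
	fmt.Println(targetAckCadence(500, 4))  // four publishers split the budget -> 125
}
```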
Two consequences for tuning:
- At Flow=500, a solo publisher sees almost no change. The server’s internal target is already capped at 500 (411k vs 416k on file, 1.20M vs 1.14M on memory is the same cadence with slightly different MaxOutstandingAcks).
- Raising MaxOutstandingAcks from 2 to 5 helps when the publisher outpaces the ack loop, giving the client more room to keep adding while earlier batches settle.
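For intuition on what MaxOutstandingAcks gates, here is a toy model of the client-side back-pressure (names are illustrative; orbit.go implements this internally): the publisher blocks before sending a new flow window once max windows are unacknowledged, and each arriving BatchFlowAck frees a slot.

```go
package main

import (
	"fmt"
	"sync"
)

// outstandingAcks models the MaxOutstandingAcks gate described above.
type outstandingAcks struct {
	mu      sync.Mutex
	cond    *sync.Cond
	pending int // flow windows sent but not yet acked
	max     int // MaxOutstandingAcks
}

func newOutstandingAcks(max int) *outstandingAcks {
	o := &outstandingAcks{max: max}
	o.cond = sync.NewCond(&o.mu)
	return o
}

// beforeSend blocks while too many flow windows are outstanding.
func (o *outstandingAcks) beforeSend() {
	o.mu.Lock()
	defer o.mu.Unlock()
	for o.pending >= o.max {
		o.cond.Wait()
	}
	o.pending++
}

// onFlowAck is called when a BatchFlowAck arrives and frees one slot.
func (o *outstandingAcks) onFlowAck() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.pending--
	o.cond.Signal()
}

func main() {
	o := newOutstandingAcks(2) // MaxOutstandingAcks=2
	for window := 0; window < 5; window++ {
		o.beforeSend()   // blocks once 2 flow windows are unacknowledged
		go o.onFlowAck() // a BatchFlowAck arriving asynchronously frees a slot
	}
	fmt.Println("all windows sent")
}
```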
Expect these numbers to change. The server source still carries TODO markers on both halves of the scheduler:
- Initial cadence: min(500, Flow) for the first publisher on a stream, 1 for anyone joining after. (fastBatchInit, L174)
- Steady-state target: 500 / N where N is the number of concurrent fast publishers on the stream, ramped via doubling/halving. The comment in the code says it should eventually weigh by average message size, RAFT in-flight pressure, and each publisher’s contribution. (checkFlowControl, L286-289)

Once the scheduler gets smarter, the Flow=500 ceiling for a solo publisher and the linear 500/N split are likely to move.
ADR-50 notes that streams configured with persist_mode: async are compatible with fast-ingest. In that mode the server batches file writes asynchronously instead of flushing in place, so disk I/O leaves the publish hot path.
Stream requirement: persist_mode: async is mutually exclusive with allow_atomic — the server rejects stream creation with err_code 10052 if you set both. Atomic batching depends on synchronous staging, which is exactly what async flushing removes. So this mode is fast-ingest-only.
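Putting the two constraints together, a fast-ingest-friendly stream config might look like this sketch (TELEMETRY and its subject are placeholder names; note that allow_atomic must stay unset, or the server rejects the creation with err_code 10052):

```json
{
  "name": "TELEMETRY",
  "subjects": ["telemetry.>"],
  "storage": "file",
  "persist_mode": "async",
  "allow_batched": true
}
```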
At 256 B on file storage, fast-ingest alone, default vs async persist mode (100k messages, median of 5):
| Batch | persist_mode: default | persist_mode: async | Delta |
|---|---|---|---|
| 100 | ~252k msgs/s | ~588k msgs/s | +133 % |
| 1000 | ~377k msgs/s | ~1.03M msgs/s | +172 % |
| 10000 | ~382k msgs/s | ~1.09M msgs/s | +185 % |
At batch=10000 the file-backed stream in async mode pushes ~1.09M msgs/s, slightly above the memory-storage peak we saw earlier (~981k msgs/s). Setting persist_mode: async effectively removes the “file storage is the bottleneck” narrative for fast-ingest.
The tradeoff sits in the durability model: writes that haven’t been flushed yet can be lost on an unclean shutdown. That’s fine for the workloads fast-ingest targets (telemetry, logs, sensor data with tolerable gaps) and already aligned with fast-ingest’s best-effort semantics, but it’s not a free lunch for durability-sensitive streams.
Pulling the best-case numbers for each method on file storage:
| Method | Server | Best throughput |
|---|---|---|
| publish-sync | 2.14.0-RC.1 | ~32k msgs/s |
| publish-async | 2.14.0-RC.1 | ~370k msgs/s |
| atomic-batch (b=1000) | 2.14.0-RC.1 | ~220k msgs/s |
| fast-ingest (b=1000) | 2.14.0-RC.1 | ~360k msgs/s |
| fast-ingest (b=2000) | 2.14.0-RC.1 | ~364k msgs/s |
The takeaway: on realistic file-backed streams, fast-ingest matches PublishAsync in raw speed while giving you explicit batching semantics, flow control, and the ability to commit on a final message. On memory-backed streams, it leaves everything else behind.
Looking at the file numbers above, PublishAsync sits at ~370k msgs/s and fast-ingest at ~360k msgs/s, near parity. That raises a fair question: PublishAsync already skips the per-message wait and lets messages fly, so why a new mode?
The difference is what the server sees. PublishAsync is still an independent per-message publish: each message gets its own async reply token under a single wildcard reply subscription, and the server emits a full PubAck per message. The client hides the round-trip behind a PubAckFuture and a pending-ack map (default MaxPending = 4000; the client stalls for up to stallWait when that fills up). Every per-message cost stays; only the wait is hidden.
There’s also a correctness catch: PublishAsync does not guarantee the message landed in the stream until you observe the ack. It returns a PubAckFuture and moves on. If you never read paf.Err() (or set an error callback) and only wait on PublishAsyncComplete(), you’ll see the “all outstanding publishes resolved” signal but can still miss individual failures. On a reconnect, nats.go resolves every in-flight future with nats.ErrDisconnected; that error still has to be surfaced via those futures or the configured error handler. The benchmark numbers in this post wait on PublishAsyncComplete(), which in the no-error path corresponds to all acks arriving, but the code does not inspect each PubAckFuture individually. In real code, “I called PublishAsync” still differs from “it’s persisted”.
Fast-ingest uses a different protocol. The client opens a batched session, streams messages under one batch id, and the server emits one BatchFlowAck every ackMessages messages (starts at min(500, Flow) for the first publisher on the stream and adjusts toward 500 / N where N is the concurrent fast publishers). That means:
- No PubAck per message.
- No PubAckFuture map to maintain; control information is carried in the batch reply subjects and periodic BatchFlowAck messages.
- Gaps surface as BatchFlowGap, so the client sees dropped messages as a first-class event. With PublishAsync, individual failures still reach you, but only if you look at each PubAckFuture or the error callback.
- MaxOutstandingAcks limits how far ahead the client can run relative to the latest BatchFlowAck; the server drives the actual cadence via BatchFlowAck messages, so there’s no 100k-entry PubAckFuture map to walk.

On file storage, disk I/O buries those protocol savings, so the two look similar. On memory storage, where protocol overhead drives throughput, fast-ingest pulls ahead cleanly (~922k vs ~650k msgs/s at batch 1000) and scales further with batch size.
Use PublishAsync for independent messages with standard per-message acks. Reach for fast-ingest when you want a dedicated ingest path, explicit batches, and lightweight flow control.
The charts above fix the payload at 256 bytes. To see how much the payload size matters, I also ran the matrix at 64 B (think: sensor sample, tight structured log line) and 4 KiB (a decent-sized JSON event).
On file storage, per-message throughput drops as the payload grows. Expected: writing more bytes per message increases both the syscall cost and the page-cache pressure. At batch size 1000 on v2.14.0-RC.1:
| Payload | PublishAsync | Atomic batch | Fast-ingest |
|---|---|---|---|
| 64 B | 381k msgs/s | 231k msgs/s | 373k msgs/s |
| 256 B | 370k msgs/s | 220k msgs/s | 360k msgs/s |
| 4 KiB | 156k msgs/s | 91k msgs/s | 155k msgs/s |
On file storage, fast-ingest runs neck-and-neck with PublishAsync, with slightly lower raw throughput at smaller payloads but still dramatically ahead of atomic-batch. At 4 KiB its bandwidth hits ~620 MB/s. The disk, not the protocol, is the bottleneck.
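The bandwidth figure is just message rate times payload size. A quick back-of-envelope (payload bytes only, ignoring subject and protocol framing; the ~620 MB/s quoted above presumably uses the exact measured rate rather than the table’s rounded 155k, so this lands in the same ballpark):

```go
package main

import "fmt"

// msgsToBandwidth converts a message rate and payload size into bytes/s.
func msgsToBandwidth(msgsPerSec float64, payloadBytes int) float64 {
	return msgsPerSec * float64(payloadBytes)
}

func main() {
	// 4 KiB payloads at the file-storage fast-ingest rate from the table.
	bw := msgsToBandwidth(155_000, 4096)
	fmt.Printf("%.2f GB/s\n", bw/1e9) // prints "0.63 GB/s"
}
```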
On memory storage the story shifts: the absolute numbers jump dramatically, and fast-ingest’s advantage over atomic-batch grows at larger payloads. At 4 KiB on memory, fast-ingest at batch=1000 hits ~464k msgs/s ≈ 1.81 GB/s of ingest, compared to ~256k msgs/s (~1.0 GB/s) for atomic-batch.
Two practical takeaways:
- On file storage, fast-ingest keeps pace with PublishAsync, while giving you batch commit semantics.
- On memory storage (and with persist_mode: async on file), fast-ingest is the clear throughput winner.

A few things to keep in mind before extrapolating:
- Changing sync_interval (including sync_always) or the underlying storage will move the absolute numbers.

Fast-ingest is one of many improvements in the RC. A pile of other JetStream additions deserve attention. Briefly, based on the release notes:
- Nats-Schedule: @every 5m or crontab syntax on publish.
- The Nats-Schedule-Source header lets you resample the last message for a subject at a different rate.
- $JS.API.CONSUMER.RESET.stream.consumer rewinds a consumer without deleting it.
- End-of-batch sentinel (eob) for atomic batches: the commit marker can now be a non-persisted sentinel instead of an extra payload.

Plenty to cover in future posts. Keep an eye on the NATS blog for the 2.14 GA announcement.
If you prefer a GUI for managing the streams you’re pumping data into, check out Qaze.