Performance Benchmarks

This page shows the latest dispatch-throughput and cross-library comparison numbers for DxMessaging. The tables are auto-generated by CI: every pull request and push re-runs the benchmark suite with the .NET Standard 2.1 API profile and a Release code-optimization build, then renders the results into the AUTOGENERATED region below. Throughput is measured in a built Standalone player under IL2CPP in the Release configuration (no development build, Release C++ code generation) -- the ahead-of-time backend and build shape shipped games actually run. GC allocation counts come from an in-editor PlayMode (Mono) leg, because a Release player strips the profiler the allocation recorder needs; the counts are build-config-independent, so they represent the shipped player.

These numbers are for orientation, not a leaderboard. Real-world performance depends on what your handlers actually do; the benchmarks measure raw dispatch cost with minimal handler work. For the full methodology, CI mechanics, baseline capture, the regression smoke gate, and how to add or bump a comparison library, see the Perf Benchmark Methodology runbook.

See also: Performance optimizations for design details.

How to read these tables

  • Scopes. Each dispatch table is labeled by execution scope and backend. Standalone (IL2CPP) -- a Release player on the ahead-of-time backend shipped games run -- is the throughput headline. CI also publishes an in-editor PlayMode (Mono) leg, because a Release player strips the profiler and so cannot measure allocations; the Mono leg supplies the real GC-allocation numbers (see Allocations below). The renderer also understands EditMode rows, so additional scopes render automatically if a future workflow publishes them; backends differ by design, so read each scope against its own backend.
  • Throughput. Reported as emits per second. Higher is better. Registration scenarios report wall-clock time instead, where lower is better. The published throughput numbers come from the Standalone (IL2CPP) leg.
  • Allocations. Reported as the COUNT of managed GC allocations (and a companion byte total) observed over one measurement batch (lower is better; 0 is best for hot-path dispatch). Both come from Unity's GC.Alloc profiler recorder, which is only available where the profiler is present. A Release IL2CPP player strips that recorder, so the Standalone tables would have nothing to put in their memory columns -- rather than publish a column of n/a, those tables omit the memory columns entirely and show only throughput. The allocation and byte numbers come from the in-editor PlayMode (Mono) leg, where the recorder is functional; counts are build-config-independent (the same code paths box and allocate under Mono and IL2CPP), so the Mono numbers represent the shipped IL2CPP player. n/a appears only as an individual cell -- when a metric is measured for a scope or library in general but missing for that one row -- never as a whole vacuous column or matrix; it is never rendered as a misleading 0, and a measured 0 is a real zero-allocation result. (This replaces an earlier byte counter built on GC.GetAllocatedBytesForCurrentThread(), which returns 0 for every allocation under Unity's Boehm GC and so reported a vacuous 0 for every technology -- see the runbook.)
  • Comparison matrix N/A. The cross-library matrix has a column per scenario and a row per library. A cell shows N/A when that library does not idiomatically support that capability -- it is a capability gap, not a failure, and the value is never faked.
  • Comparison matrix winners. In the throughput matrix the fastest technology per scenario column is rendered in bold (ties are all bolded; N/A never wins). The GC-allocations and GC-allocated-bytes matrices are not bolded: an allocation count or byte total is a property to read, not a race.
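
The winner rule for the throughput matrix can be sketched as below. This is an illustrative model only, not the code in scripts/unity/render-perf-doc.js: the function names, the use of None for an N/A cell, and the `**` bold markup are all assumptions made for the example, and the sample column uses made-up values.

```python
from typing import Optional, Sequence, Set


def winners(column: Sequence[Optional[float]]) -> Set[int]:
    """Return the row indices to bold for one scenario column.

    None stands for an N/A cell (a capability gap) and can never win.
    Every row that ties the maximum is returned.
    """
    measured = [v for v in column if v is not None]
    if not measured:
        return set()  # an all-N/A column has no winner
    best = max(measured)
    return {i for i, v in enumerate(column) if v is not None and v == best}


def render_throughput_cell(value: Optional[float], bold: bool) -> str:
    """Render one cell; the '**' bold markup is an assumption of this sketch."""
    if value is None:
        return "N/A"
    text = f"{value:.2f} M emits/sec"
    return f"**{text}**" if bold else text


# Hypothetical column: rows 0 and 2 tie for fastest, row 1 is N/A.
column = [12.0, None, 12.0, 7.5]
bolded = winners(column)
cells = [render_throughput_cell(v, i in bolded) for i, v in enumerate(column)]
```

Higher is better in the throughput matrix, so the sketch takes the maximum; the allocation matrices skip this step entirely because they are not bolded.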

Latest CI dispatch throughput

The block below is regenerated by the Performance Numbers workflow (.github/workflows/perf-numbers.yml) via scripts/unity/render-perf-doc.js. Do not edit the block by hand.

The block contains:

  • One dispatch-throughput table per execution scope present in the run: Standalone IL2CPP for throughput, plus an in-editor PlayMode Mono leg that supplies the real GC-allocation counts (and byte totals once a run measures them).
  • The cross-library comparison matrices: one for throughput (sourced from the Standalone leg), and one each for GC allocations and GC allocated bytes per batch, sourced from whichever leg could measure them and omitted when none could.
  • A privacy-safe provenance line describing the runner hardware (CPU, cores, clock, RAM, GPU, OS), never a hostname or runner name.

A scope that cannot measure a metric (the profiler-stripped Standalone leg) omits that column rather than filling it with n/a.

On a pull request the refreshed numbers are posted as a non-blocking sticky comment. After the pull request merges, the same workflow commits the refreshed tables, along with the sibling baseline perf-baseline.csv that the regression gate compares against, directly to the default branch, provided the auto-commit App is provisioned and the branch has not advanced past the measured commit. See the perf-numbers auto-commit runbook for the repo-settings prerequisite that lets CI push to the default branch.
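
The column-omission rule and the difference between n/a and a measured 0 can be modeled in a few lines. This is a hedged sketch, not the renderer's actual implementation: the row shape, the `gc_allocs` / `gc_bytes` keys, and both function names are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def visible_columns(rows: List[Row], columns: List[str]) -> List[str]:
    """Keep a metric column only if at least one row in the scope measured it."""
    return [c for c in columns if any(r.get(c) is not None for r in rows)]


def format_metric(value: Optional[int]) -> str:
    """A missing measurement renders as 'n/a'; a measured 0 stays '0'."""
    return "n/a" if value is None else f"{value:,}"


metrics = ["gc_allocs", "gc_bytes"]

# Profiler-stripped scope: nothing measured, so both columns are dropped.
standalone_rows = [{"gc_allocs": None, "gc_bytes": None}]

# Scope with a working recorder: one row is missing its byte total,
# so that single cell renders as n/a while the column itself stays.
mono_rows = [
    {"gc_allocs": 0, "gc_bytes": 0},
    {"gc_allocs": 44, "gc_bytes": None},
]

standalone_cols = visible_columns(standalone_rows, metrics)
mono_cols = visible_columns(mono_rows, metrics)
```

Keeping "not measured" as a distinct value from 0 all the way to rendering is what prevents a vacuous 0 from appearing in the tables.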

Latest CI benchmark run: Unity 6000.3.16f1, commit f7af651cea0ecddea5c71f5828f483fcde37a67f.

Runner: 13th Gen Intel(R) Core(TM) i9-13900KF, 24C/32T @ 3000MHz; 64GB DDR5@4200; NVIDIA GeForce RTX 3060; Microsoft Windows 11 Pro N (10.0.26200)

Dispatch throughput - Standalone (IL2CPP)

Platform: Standalone IL2CPP x64 Release (WindowsPlayer; Unity 6000.3.16f1).

| Scenario | Throughput / Wall clock |
| --- | --- |
| Untargeted Flood (One Handler) | 38.92 M emits/sec |
| Untargeted Flood (Four Handlers, One Priority) | 22.28 M emits/sec |
| Untargeted Flood (Four Handlers, Four Priorities) | 22.48 M emits/sec |
| Untargeted First Dispatch (Cold, Distinct Types) | 0.289 ms |
| Targeted Flood (One Listener) | 8.44 M emits/sec |
| Targeted Flood (Sixteen Listeners) | 6.43 M emits/sec |
| Targeted First Dispatch (Cold, Distinct Types) | 0.212 ms |
| Broadcast Flood (One Handler) | 17.77 M emits/sec |
| Broadcast First Dispatch (Cold, Distinct Types) | 0.328 ms |
| Interceptor Heavy (Four Interceptors) | 3.52 M emits/sec |
| Post-Processing Heavy (Four Post-Processors) | 11.24 M emits/sec |
| Registration Flood (1000 Types, Cold Bus) | 640.284 ms |
| Registration Flood (1000 Types, Warm JIT) | 5.767 ms |
| Untargeted Registration (Marginal, 1000 Same-Type) | 1.665 ms |
| Targeted Registration (Marginal, 1000 Same-Type) | 0.539 ms |
| Broadcast Registration (Marginal, 1000 Same-Type) | 0.810 ms |
| Deregistration Flood (1000 Types, Cold) | 2.078 ms |
| Deregistration Flood (1000 Types, Warm JIT) | 1.929 ms |
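
A throughput figure is often easier to reason about as a per-emit cost. The conversion is plain arithmetic; the helper name below is made up for this example, and the three rates are copied from the Standalone (IL2CPP) table above.

```python
def ns_per_emit(million_emits_per_sec: float) -> float:
    """Convert millions of emits per second to nanoseconds per emit."""
    # 1 s = 1e9 ns and 1 M emits = 1e6 emits, so ns/emit = 1e9 / (M * 1e6) = 1000 / M.
    return 1_000.0 / million_emits_per_sec


standalone = {
    "Untargeted Flood (One Handler)": 38.92,
    "Targeted Flood (One Listener)": 8.44,
    "Interceptor Heavy (Four Interceptors)": 3.52,
}

for scenario, rate in standalone.items():
    print(f"{scenario}: {ns_per_emit(rate):.1f} ns per emit")
```

On this runner that works out to roughly 26 ns, 118 ns, and 284 ns per emit respectively. These are raw dispatch costs with minimal handler work, so treat them as a floor rather than a prediction for real handlers.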

Dispatch throughput - PlayMode (Mono)

Platform: Editor PlayMode Mono x64 Release (WindowsEditor; Unity 6000.3.16f1).

| Scenario | Throughput / Wall clock | GC allocs | GC bytes |
| --- | --- | --- | --- |
| Untargeted Flood (One Handler) | 21.36 M emits/sec | 0 | 0 |
| Untargeted Flood (Four Handlers, One Priority) | 17.83 M emits/sec | 0 | 0 |
| Untargeted Flood (Four Handlers, Four Priorities) | 16.92 M emits/sec | 0 | 0 |
| Untargeted First Dispatch (Cold, Distinct Types) | 3.429 ms | 44 | 19,886 |
| Targeted Flood (One Listener) | 14.28 M emits/sec | 0 | 0 |
| Targeted Flood (Sixteen Listeners) | 7.62 M emits/sec | 0 | 0 |
| Targeted First Dispatch (Cold, Distinct Types) | 7.839 ms | 46 | 36,668 |
| Broadcast Flood (One Handler) | 15.15 M emits/sec | 0 | 0 |
| Broadcast First Dispatch (Cold, Distinct Types) | 3.758 ms | 45 | 36,326 |
| Interceptor Heavy (Four Interceptors) | 2.90 M emits/sec | 0 | 0 |
| Post-Processing Heavy (Four Post-Processors) | 9.92 M emits/sec | 0 | 0 |
| Registration Flood (1000 Types, Cold Bus) | 4504.289 ms | 56,233 | 4,490,468 |
| Registration Flood (1000 Types, Warm JIT) | 13.271 ms | 35,132 | 3,247,276 |
| Untargeted Registration (Marginal, 1000 Same-Type) | 3.710 ms | 3,111 | 1,125,300 |
| Targeted Registration (Marginal, 1000 Same-Type) | 1.333 ms | 3,111 | 1,157,300 |
| Broadcast Registration (Marginal, 1000 Same-Type) | 1.063 ms | 3,111 | 1,157,300 |
| Deregistration Flood (1000 Types, Cold) | 125.774 ms | 34 | 83,528 |
| Deregistration Flood (1000 Types, Warm JIT) | 1.760 ms | 36 | 83,624 |

Library comparison - throughput (Standalone (IL2CPP))

| Technology | Global -> 1 subscriber | Global -> 16 subscribers | Keyed/targeted -> 1 of many | Priority-ordered dispatch | Filtered/intercepted dispatch | Post-processing dispatch | Subscribe/unsubscribe churn | Struct message (no boxing) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DxMessaging | 27.52 M emits/sec | 12.52 M emits/sec | 9.63 M emits/sec | **24.22 M emits/sec** | 7.16 M emits/sec | **11.81 M emits/sec** | 0.70 M emits/sec | 29.34 M emits/sec |
| MessagePipe | 75.10 M emits/sec | 14.67 M emits/sec | 9.49 M emits/sec | N/A | **64.92 M emits/sec** | N/A | 2.01 M emits/sec | 80.78 M emits/sec |
| UniRx MessageBroker | 4.09 M emits/sec | 2.35 M emits/sec | N/A | N/A | N/A | N/A | 0.78 M emits/sec | 4.26 M emits/sec |
| Zenject SignalBus | 2.03 M emits/sec | 1.13 M emits/sec | N/A | N/A | N/A | N/A | 1.64 M emits/sec | 2.25 M emits/sec |
| Unity Atoms | 101.88 M emits/sec | 36.80 M emits/sec | 86.92 M emits/sec | N/A | N/A | N/A | 9.53 M emits/sec | N/A |
| ScriptableObject channel | 120.44 M emits/sec | 20.76 M emits/sec | **145.53 M emits/sec** | N/A | N/A | N/A | **28.36 M emits/sec** | 138.08 M emits/sec |
| UnityEvent | 78.19 M emits/sec | 8.56 M emits/sec | 85.40 M emits/sec | N/A | N/A | N/A | 3.61 M emits/sec | 81.43 M emits/sec |
| C# event | **240.60 M emits/sec** | **45.42 M emits/sec** | 56.48 M emits/sec | N/A | N/A | N/A | 9.62 M emits/sec | **264.63 M emits/sec** |
| Unity SendMessage | 7.93 M emits/sec | 0.97 M emits/sec | 7.79 M emits/sec | N/A | N/A | N/A | N/A | N/A |

Library comparison - GC allocations per 10k ops (PlayMode (Mono))

| Technology | Global -> 1 subscriber | Global -> 16 subscribers | Keyed/targeted -> 1 of many | Priority-ordered dispatch | Filtered/intercepted dispatch | Post-processing dispatch | Subscribe/unsubscribe churn | Struct message (no boxing) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DxMessaging | 0 | 0 | 0 | 0 | 0 | 0 | 100,000 | 0 |
| MessagePipe | 0 | 0 | 0 | N/A | 0 | N/A | 20,000 | 0 |
| UniRx MessageBroker | 0 | 0 | N/A | N/A | N/A | N/A | 150,000 | 0 |
| Zenject SignalBus | 20,000 | 20,000 | N/A | N/A | N/A | N/A | 70,000 | 20,000 |
| Unity Atoms | 0 | 0 | 0 | N/A | N/A | N/A | 0 | N/A |
| ScriptableObject channel | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| UnityEvent | 0 | 0 | 0 | N/A | N/A | N/A | 50,000 | 0 |
| C# event | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| Unity SendMessage | 10,000 | 10,000 | 10,000 | N/A | N/A | N/A | N/A | N/A |

Library comparison - GC allocated bytes per 10k ops (PlayMode (Mono))

| Technology | Global -> 1 subscriber | Global -> 16 subscribers | Keyed/targeted -> 1 of many | Priority-ordered dispatch | Filtered/intercepted dispatch | Post-processing dispatch | Subscribe/unsubscribe churn | Struct message (no boxing) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DxMessaging | 0 | 0 | 0 | 0 | 0 | 0 | 8,120,000 | 0 |
| MessagePipe | 0 | 0 | 0 | N/A | 0 | N/A | 560,000 | 0 |
| UniRx MessageBroker | 0 | 0 | N/A | N/A | N/A | N/A | 6,650,000 | 0 |
| Zenject SignalBus | 600,000 | 600,000 | N/A | N/A | N/A | N/A | 3,200,000 | 600,000 |
| Unity Atoms | 0 | 0 | 0 | N/A | N/A | N/A | 0 | N/A |
| ScriptableObject channel | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| UnityEvent | 0 | 0 | 0 | N/A | N/A | N/A | 2,960,000 | 0 |
| C# event | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| Unity SendMessage | 200,000 | 200,000 | 200,000 | N/A | N/A | N/A | N/A | N/A |
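
Because both allocation matrices are reported per 10k ops, dividing by 10,000 gives a per-operation figure that is easier to compare. The sketch below does that for two cells from the matrices above; the helper name is invented for the example, and what counts as one "op" in each scenario is defined by the comparison suite, not by this snippet.

```python
OPS_PER_BATCH = 10_000  # both matrices above are reported per 10k ops


def per_op(batch_total: int, ops: int = OPS_PER_BATCH) -> float:
    """Normalize a per-batch allocation count or byte total to a single op."""
    return batch_total / ops


# DxMessaging, Subscribe/unsubscribe churn (PlayMode Mono)
dx_churn_allocs = per_op(100_000)    # allocations per op
dx_churn_bytes = per_op(8_120_000)   # bytes per op

# Zenject SignalBus, Global -> 1 subscriber (PlayMode Mono)
zenject_allocs = per_op(20_000)
zenject_bytes = per_op(600_000)

print(f"DxMessaging churn: {dx_churn_allocs:.0f} allocs, {dx_churn_bytes:.0f} bytes per op")
print(f"Zenject global emit: {zenject_allocs:.0f} allocs, {zenject_bytes:.0f} bytes per op")
```

Read this way, DxMessaging's churn cell is 10 allocations and 812 bytes per op, while its dispatch cells stay at zero; Zenject SignalBus's global emit is 2 allocations and 60 bytes per op.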

Comparison libraries

The cross-library comparison matrices above measure DxMessaging against other common Unity messaging and eventing approaches, running every library through the same scenarios:

  • External libraries: MessagePipe, UniRx MessageBroker, Zenject SignalBus, and Unity Atoms.
  • Zero-dependency baselines: plain C# event, UnityEvent, a ScriptableObject event channel, and Unity SendMessage.

Each library implements only the scenarios it idiomatically supports; unsupported cells render N/A. The comparison suite source lives in Tests/Runtime/Comparisons/. For a feature-by-feature discussion of when each approach wins, see the Comparisons guide.

Memory footprint and reclamation

Dispatch state is stored per message type and, for targeted and broadcast paths, per InstanceId. Long-running sessions accumulate slots for every type or entity ever touched unless something reclaims them. The memory reclamation system caps that growth without changing dispatch semantics or allocating during emit.

Reclamation runs on two paths:

  • An idle sweep that runs from emit-time clock samples and the Unity PlayerLoop, gated by DxMessagingRuntimeSettings.EvictionEnabled and EvictionTickIntervalSeconds. Empty slots become eligible only after remaining empty for at least IdleEvictionSeconds of wall time.
  • An explicit IMessageBus.Trim(force) and MessageHandler.TrimAll(force) pair that runs synchronously at scene boundaries, in tests, or in maintenance windows. The master switch EnableTrimApi controls whether the explicit calls perform work; idle sweeps remain controlled by EvictionEnabled independently.

Active registrations are never reclaimed. Only empty slots and shared pool entries are touched. Sweep work runs outside the hot handler loop, so it does not slow handler dispatch; the only per-emit overhead is one branch that samples the wall clock.
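
The idle-sweep rules above can be illustrated with a small model. This is a conceptual sketch only, not DxMessaging's implementation or API: the class, its fields, and the message names are hypothetical, and the setting names are snake_case stand-ins for EvictionEnabled, EvictionTickIntervalSeconds, and IdleEvictionSeconds. For simplicity the model runs the sweep inline from the emit-time clock check and does not model the explicit Trim path or the shared pools.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Slot:
    handler_count: int = 0
    empty_since: Optional[float] = None  # wall-clock time the slot became empty


@dataclass
class ReclaimModel:
    eviction_enabled: bool = True
    tick_interval_seconds: float = 1.0
    idle_eviction_seconds: float = 30.0
    slots: Dict[str, Slot] = field(default_factory=dict)
    last_sweep: float = 0.0

    def register(self, key: str) -> None:
        slot = self.slots.setdefault(key, Slot())
        slot.handler_count += 1
        slot.empty_since = None  # an active slot is never eligible

    def deregister(self, key: str, now: float) -> None:
        slot = self.slots[key]
        slot.handler_count -= 1
        if slot.handler_count == 0:
            slot.empty_since = now  # start the idle clock

    def on_emit(self, now: float) -> int:
        """The single per-emit branch: has a tick interval elapsed?"""
        if self.eviction_enabled and now - self.last_sweep >= self.tick_interval_seconds:
            self.last_sweep = now
            return self.sweep(now)
        return 0

    def sweep(self, now: float) -> int:
        """Evict slots that have stayed empty for at least the idle threshold."""
        stale = [
            key for key, slot in self.slots.items()
            if slot.handler_count == 0
            and slot.empty_since is not None
            and now - slot.empty_since >= self.idle_eviction_seconds
        ]
        for key in stale:
            del self.slots[key]
        return len(stale)


model = ReclaimModel()
model.register("PlayerDied")      # stays registered for the whole run
model.register("EnemySpawned")
model.deregister("EnemySpawned", now=5.0)

first = model.on_emit(now=10.0)   # empty for only 5 s: nothing evicted
second = model.on_emit(now=40.0)  # empty for 35 s: the slot is evicted
```

In this run the first sweep evicts nothing because the empty slot has not yet been idle for 30 seconds, and the second sweep evicts exactly one slot while the actively registered one is left alone.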

For tuning recommendations, the public Trim and diagnostic-counter API surface, and worked examples (scene transitions, leak diagnosis, mobile caps, shipped-title configurations), see the Memory Reclamation guide. For the parameter reference, see the Runtime Settings reference.