Performance Benchmarks
This page shows the latest dispatch-throughput and cross-library comparison numbers for DxMessaging. The tables are auto-generated by CI: every pull request and push re-runs the benchmark suite with the .NET Standard 2.1 API profile and a Release code-optimization build, then renders the results into the AUTOGENERATED region below. Throughput is measured in a built Standalone player under IL2CPP in the Release configuration (no development build, Release C++ code generation) -- the ahead-of-time backend and build shape shipped games actually run. GC allocation counts come from an in-editor PlayMode (Mono) leg, because a Release player strips the profiler the allocation recorder needs; the counts are build-config-independent, so they represent the shipped player.
These numbers are for orientation, not a leaderboard. Real-world performance depends on what your handlers actually do; the benchmarks measure raw dispatch cost with minimal handler work. For the full methodology, CI mechanics, baseline capture, the regression smoke gate, and how to add or bump a comparison library, see the Perf Benchmark Methodology runbook.
See also: Performance optimizations for design details.
How to read these tables
- Scopes. Each dispatch table is labeled by execution scope and backend. Standalone (IL2CPP) -- a Release player on the ahead-of-time backend shipped games run -- is the throughput headline. CI also publishes an in-editor PlayMode (Mono) leg, because a Release player strips the profiler and so cannot measure allocations; the Mono leg supplies the real GC-allocation numbers (see Allocations below). The renderer also understands EditMode rows, so additional scopes render automatically if a future workflow publishes them; backends differ by design, so read each scope against its own backend.
- Throughput. Reported as emits per second. Higher is better. Registration scenarios report wall-clock time instead, where lower is better. The published throughput numbers come from the Standalone (IL2CPP) leg.
- Allocations. Reported as the COUNT of managed GC allocations (and a companion byte total) observed over one measurement batch (lower is better; 0 is best for hot-path dispatch). Both come from Unity's GC.Alloc profiler recorder, which is only available where the profiler is present. A Release IL2CPP player strips that recorder, so the Standalone tables would have nothing to put in their memory columns -- rather than publish a column of n/a, those tables omit the memory columns entirely and show only throughput. The allocation and byte numbers come from the in-editor PlayMode (Mono) leg, where the recorder is functional; counts are build-config-independent (the same code paths box and allocate under Mono and IL2CPP), so the Mono numbers represent the shipped IL2CPP player. n/a appears only as an individual cell -- when a metric is measured for a scope or library in general but missing for that one row -- never as a whole vacuous column or matrix; it is never rendered as a misleading 0, and a measured 0 is a real zero-allocation result. (This replaces an earlier byte counter built on GC.GetAllocatedBytesForCurrentThread(), which returns 0 for every allocation under Unity's Boehm GC and so reported a vacuous 0 for every technology -- see the runbook.)
- Comparison matrix N/A. The cross-library matrix has a column per scenario and a row per library. A cell shows N/A when that library does not idiomatically support that capability -- it is a capability gap, not a failure, and the value is never faked.
- Comparison matrix winners. In the throughput matrix the fastest technology per scenario column is rendered in bold (ties are all bolded; N/A never wins). The GC-allocations and GC-allocated-bytes matrices are not bolded: an allocation count or byte total is a property to read, not a race.
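The winner rule for the throughput matrix can be sketched in a few lines. This is a minimal illustration, not the code in scripts/unity/render-perf-doc.js: the function name boldWinners, the row shape, and the use of null for N/A are all assumptions made for the example.

```javascript
// Hypothetical helper, NOT the real renderer code.
// Each cell is a number (M emits/sec) or null when the library has no
// idiomatic support for that scenario (rendered as N/A).
function boldWinners(rows, columnCount) {
  const rendered = rows.map((row) => ({ name: row.name, cells: [] }));
  for (let col = 0; col < columnCount; col++) {
    // Only measured values compete; N/A (null) can never win.
    const measured = rows
      .map((row) => row.cells[col])
      .filter((value) => value !== null);
    const best = measured.length > 0 ? Math.max(...measured) : null;
    rows.forEach((row, i) => {
      const value = row.cells[col];
      if (value === null) {
        rendered[i].cells.push("N/A");
        return;
      }
      const text = `${value.toFixed(2)} M emits/sec`;
      // Every cell equal to the column maximum is bolded, so ties all win.
      rendered[i].cells.push(value === best ? `**${text}**` : text);
    });
  }
  return rendered;
}

// Two columns taken from the matrix on this page:
// "Global -> 1 subscriber" and "Priority-ordered dispatch".
const sample = boldWinners(
  [
    { name: "DxMessaging", cells: [27.52, 24.22] },
    { name: "MessagePipe", cells: [75.1, null] },
    { name: "C# event", cells: [240.6, null] },
  ],
  2
);
console.log(
  sample.map((r) => `| ${r.name} | ${r.cells.join(" | ")} |`).join("\n")
);
```

In this sketch a column with a single measured value bolds that value, because it is trivially the fastest among the libraries that support the scenario. Whether the real renderer compares raw numbers or rounded text is not stated here, so treat the equality check as an assumption.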
Latest CI dispatch throughput
The block below is regenerated by the Performance Numbers workflow (.github/workflows/perf-numbers.yml) via scripts/unity/render-perf-doc.js. Do not edit the block by hand. It contains:
- One dispatch-throughput table per execution scope present in the run: Standalone IL2CPP for throughput, plus an in-editor PlayMode Mono leg that supplies the real GC-allocation counts (and byte totals once a run measures them).
- The cross-library comparison matrices: one for throughput (sourced from the Standalone leg), and one each for GC allocations and GC allocated bytes per batch, sourced from whichever leg could measure them and omitted when none could.
- A privacy-safe provenance line describing the runner hardware (CPU, cores, clock, RAM, GPU, OS), never a hostname or runner name.
A scope that cannot measure a metric (the profiler-stripped Standalone leg) omits that column rather than filling it with n/a.
On a pull request the refreshed numbers are posted as a non-blocking sticky comment. After the pull request merges, the same workflow commits the refreshed tables -- and the sibling baseline perf-baseline.csv that the regression gate compares against -- directly to the default branch, provided the auto-commit App is provisioned and the branch has not advanced past the measured commit. See the perf-numbers auto-commit runbook for the repo-settings prerequisite that lets CI push to the default branch.
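The difference between an omitted column and an individual n/a cell can be sketched as follows. This is an illustrative model under stated assumptions, not the actual renderer: renderScopeTable, the METRICS list, the row shape, and the use of undefined for "not measured" are hypothetical, and the second Mono row is invented to show a missing cell.

```javascript
// Hypothetical sketch of the column and cell rules; the real renderer may differ.
// A metric is a string or number when measured and undefined when not measured.
const METRICS = [
  { key: "throughput", header: "Throughput / Wall clock" },
  { key: "gcAllocs", header: "GC allocs" },
  { key: "gcBytes", header: "GC bytes" },
];

function renderScopeTable(rows) {
  // Keep a column only if at least one row in this scope measured it,
  // so a profiler-stripped scope never shows a whole column of n/a.
  const columns = METRICS.filter((metric) =>
    rows.some((row) => row[metric.key] !== undefined)
  );
  const lines = [
    `| Scenario | ${columns.map((c) => c.header).join(" | ")} |`,
    `|${"---|".repeat(columns.length + 1)}`,
  ];
  for (const row of rows) {
    const cells = columns.map((c) => {
      const value = row[c.key];
      // A gap inside a measured column is n/a, never a fake 0.
      // A measured 0 passes through unchanged as a real zero.
      return value === undefined ? "n/a" : String(value);
    });
    lines.push(`| ${row.scenario} | ${cells.join(" | ")} |`);
  }
  return lines.join("\n");
}

// Standalone leg: no allocation data at all, so both memory columns vanish.
const standalone = renderScopeTable([
  { scenario: "Untargeted Flood (One Handler)", throughput: "38.92 M emits/sec" },
]);

// Mono leg: allocations measured; the second row is a made-up example
// of one missing byte total inside an otherwise measured column.
const mono = renderScopeTable([
  { scenario: "Untargeted Flood (One Handler)", throughput: "21.36 M emits/sec", gcAllocs: 0, gcBytes: 0 },
  { scenario: "Hypothetical row", throughput: "1.000 ms", gcAllocs: 44 },
]);

console.log(standalone);
console.log(mono);
```

The design point is that column presence is decided per scope, while n/a is decided per cell, which keeps a measured 0 distinguishable from a value that was never recorded.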
Latest CI benchmark run: Unity 6000.3.16f1, commit f7af651cea0ecddea5c71f5828f483fcde37a67f.
Runner: 13th Gen Intel(R) Core(TM) i9-13900KF, 24C/32T @ 3000MHz; 64GB DDR5@4200; NVIDIA GeForce RTX 3060; Microsoft Windows 11 Pro N (10.0.26200)
Dispatch throughput - Standalone (IL2CPP)
Platform: Standalone IL2CPP x64 Release (WindowsPlayer; Unity 6000.3.16f1).
| Scenario | Throughput / Wall clock |
|---|---|
| Untargeted Flood (One Handler) | 38.92 M emits/sec |
| Untargeted Flood (Four Handlers, One Priority) | 22.28 M emits/sec |
| Untargeted Flood (Four Handlers, Four Priorities) | 22.48 M emits/sec |
| Untargeted First Dispatch (Cold, Distinct Types) | 0.289 ms |
| Targeted Flood (One Listener) | 8.44 M emits/sec |
| Targeted Flood (Sixteen Listeners) | 6.43 M emits/sec |
| Targeted First Dispatch (Cold, Distinct Types) | 0.212 ms |
| Broadcast Flood (One Handler) | 17.77 M emits/sec |
| Broadcast First Dispatch (Cold, Distinct Types) | 0.328 ms |
| Interceptor Heavy (Four Interceptors) | 3.52 M emits/sec |
| Post-Processing Heavy (Four Post-Processors) | 11.24 M emits/sec |
| Registration Flood (1000 Types, Cold Bus) | 640.284 ms |
| Registration Flood (1000 Types, Warm JIT) | 5.767 ms |
| Untargeted Registration (Marginal, 1000 Same-Type) | 1.665 ms |
| Targeted Registration (Marginal, 1000 Same-Type) | 0.539 ms |
| Broadcast Registration (Marginal, 1000 Same-Type) | 0.810 ms |
| Deregistration Flood (1000 Types, Cold) | 2.078 ms |
| Deregistration Flood (1000 Types, Warm JIT) | 1.929 ms |
Dispatch throughput - PlayMode (Mono)
Platform: Editor PlayMode Mono x64 Release (WindowsEditor; Unity 6000.3.16f1).
| Scenario | Throughput / Wall clock | GC allocs | GC bytes |
|---|---|---|---|
| Untargeted Flood (One Handler) | 21.36 M emits/sec | 0 | 0 |
| Untargeted Flood (Four Handlers, One Priority) | 17.83 M emits/sec | 0 | 0 |
| Untargeted Flood (Four Handlers, Four Priorities) | 16.92 M emits/sec | 0 | 0 |
| Untargeted First Dispatch (Cold, Distinct Types) | 3.429 ms | 44 | 19,886 |
| Targeted Flood (One Listener) | 14.28 M emits/sec | 0 | 0 |
| Targeted Flood (Sixteen Listeners) | 7.62 M emits/sec | 0 | 0 |
| Targeted First Dispatch (Cold, Distinct Types) | 7.839 ms | 46 | 36,668 |
| Broadcast Flood (One Handler) | 15.15 M emits/sec | 0 | 0 |
| Broadcast First Dispatch (Cold, Distinct Types) | 3.758 ms | 45 | 36,326 |
| Interceptor Heavy (Four Interceptors) | 2.90 M emits/sec | 0 | 0 |
| Post-Processing Heavy (Four Post-Processors) | 9.92 M emits/sec | 0 | 0 |
| Registration Flood (1000 Types, Cold Bus) | 4504.289 ms | 56,233 | 4,490,468 |
| Registration Flood (1000 Types, Warm JIT) | 13.271 ms | 35,132 | 3,247,276 |
| Untargeted Registration (Marginal, 1000 Same-Type) | 3.710 ms | 3,111 | 1,125,300 |
| Targeted Registration (Marginal, 1000 Same-Type) | 1.333 ms | 3,111 | 1,157,300 |
| Broadcast Registration (Marginal, 1000 Same-Type) | 1.063 ms | 3,111 | 1,157,300 |
| Deregistration Flood (1000 Types, Cold) | 125.774 ms | 34 | 83,528 |
| Deregistration Flood (1000 Types, Warm JIT) | 1.760 ms | 36 | 83,624 |
Library comparison - throughput (Standalone (IL2CPP))
| Technology | Global -> 1 subscriber | Global -> 16 subscribers | Keyed/targeted -> 1 of many | Priority-ordered dispatch | Filtered/intercepted dispatch | Post-processing dispatch | Subscribe/unsubscribe churn | Struct message (no boxing) |
|---|---|---|---|---|---|---|---|---|
| DxMessaging | 27.52 M emits/sec | 12.52 M emits/sec | 9.63 M emits/sec | 24.22 M emits/sec | 7.16 M emits/sec | 11.81 M emits/sec | 0.70 M emits/sec | 29.34 M emits/sec |
| MessagePipe | 75.10 M emits/sec | 14.67 M emits/sec | 9.49 M emits/sec | N/A | 64.92 M emits/sec | N/A | 2.01 M emits/sec | 80.78 M emits/sec |
| UniRx MessageBroker | 4.09 M emits/sec | 2.35 M emits/sec | N/A | N/A | N/A | N/A | 0.78 M emits/sec | 4.26 M emits/sec |
| Zenject SignalBus | 2.03 M emits/sec | 1.13 M emits/sec | N/A | N/A | N/A | N/A | 1.64 M emits/sec | 2.25 M emits/sec |
| Unity Atoms | 101.88 M emits/sec | 36.80 M emits/sec | 86.92 M emits/sec | N/A | N/A | N/A | 9.53 M emits/sec | N/A |
| ScriptableObject channel | 120.44 M emits/sec | 20.76 M emits/sec | 145.53 M emits/sec | N/A | N/A | N/A | 28.36 M emits/sec | 138.08 M emits/sec |
| UnityEvent | 78.19 M emits/sec | 8.56 M emits/sec | 85.40 M emits/sec | N/A | N/A | N/A | 3.61 M emits/sec | 81.43 M emits/sec |
| C# event | 240.60 M emits/sec | 45.42 M emits/sec | 56.48 M emits/sec | N/A | N/A | N/A | 9.62 M emits/sec | 264.63 M emits/sec |
| Unity SendMessage | 7.93 M emits/sec | 0.97 M emits/sec | 7.79 M emits/sec | N/A | N/A | N/A | N/A | N/A |
Library comparison - GC allocations per 10k ops (PlayMode (Mono))
| Technology | Global -> 1 subscriber | Global -> 16 subscribers | Keyed/targeted -> 1 of many | Priority-ordered dispatch | Filtered/intercepted dispatch | Post-processing dispatch | Subscribe/unsubscribe churn | Struct message (no boxing) |
|---|---|---|---|---|---|---|---|---|
| DxMessaging | 0 | 0 | 0 | 0 | 0 | 0 | 100,000 | 0 |
| MessagePipe | 0 | 0 | 0 | N/A | 0 | N/A | 20,000 | 0 |
| UniRx MessageBroker | 0 | 0 | N/A | N/A | N/A | N/A | 150,000 | 0 |
| Zenject SignalBus | 20,000 | 20,000 | N/A | N/A | N/A | N/A | 70,000 | 20,000 |
| Unity Atoms | 0 | 0 | 0 | N/A | N/A | N/A | 0 | N/A |
| ScriptableObject channel | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| UnityEvent | 0 | 0 | 0 | N/A | N/A | N/A | 50,000 | 0 |
| C# event | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| Unity SendMessage | 10,000 | 10,000 | 10,000 | N/A | N/A | N/A | N/A | N/A |
Library comparison - GC allocated bytes per 10k ops (PlayMode (Mono))
| Technology | Global -> 1 subscriber | Global -> 16 subscribers | Keyed/targeted -> 1 of many | Priority-ordered dispatch | Filtered/intercepted dispatch | Post-processing dispatch | Subscribe/unsubscribe churn | Struct message (no boxing) |
|---|---|---|---|---|---|---|---|---|
| DxMessaging | 0 | 0 | 0 | 0 | 0 | 0 | 8,120,000 | 0 |
| MessagePipe | 0 | 0 | 0 | N/A | 0 | N/A | 560,000 | 0 |
| UniRx MessageBroker | 0 | 0 | N/A | N/A | N/A | N/A | 6,650,000 | 0 |
| Zenject SignalBus | 600,000 | 600,000 | N/A | N/A | N/A | N/A | 3,200,000 | 600,000 |
| Unity Atoms | 0 | 0 | 0 | N/A | N/A | N/A | 0 | N/A |
| ScriptableObject channel | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| UnityEvent | 0 | 0 | 0 | N/A | N/A | N/A | 2,960,000 | 0 |
| C# event | 0 | 0 | 0 | N/A | N/A | N/A | 0 | 0 |
| Unity SendMessage | 200,000 | 200,000 | 200,000 | N/A | N/A | N/A | N/A | N/A |
Comparison libraries
The cross-library comparison matrices above measure DxMessaging against other common Unity messaging and eventing approaches on the same scenarios:
- External libraries: MessagePipe, UniRx MessageBroker, Zenject SignalBus, and Unity Atoms.
- Zero-dependency baselines: plain C# event, UnityEvent, a ScriptableObject event channel, and Unity SendMessage.
Each library implements only the scenarios it idiomatically supports; unsupported cells render N/A. The comparison suite source lives in Tests/Runtime/Comparisons/. For a feature-by-feature discussion of when each approach wins, see the Comparisons guide.
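When reading the matrices it can help to convert the published units into per-operation figures. The two helpers below are a small arithmetic sketch; their names are hypothetical, and the inputs are copied from the DxMessaging row of the matrices on this page. What counts as one "op" depends on the scenario (for churn it is presumably one subscribe/unsubscribe cycle, which is an assumption here).

```javascript
// Convert a throughput in millions of emits per second
// into the average cost of one emit in nanoseconds.
function nanosPerEmit(millionEmitsPerSec) {
  return 1e9 / (millionEmitsPerSec * 1e6);
}

// Normalize a "per 10k ops" matrix cell to a per-operation figure.
function perOp(totalPer10kOps) {
  return totalPer10kOps / 10000;
}

// DxMessaging, "Global -> 1 subscriber": 27.52 M emits/sec.
const dispatchNanos = nanosPerEmit(27.52);
// DxMessaging, "Subscribe/unsubscribe churn": 100,000 allocations
// and 8,120,000 bytes per 10k ops.
const churnAllocs = perOp(100000);
const churnBytes = perOp(8120000);

console.log(dispatchNanos.toFixed(1)); // about 36.3 ns per emit
console.log(churnAllocs); // 10 allocations per op
console.log(churnBytes); // 812 bytes per op
```

Read this way, the zeros in the dispatch columns and the nonzero churn column say the same thing as the prose above: steady-state emits do not allocate in these runs, while registration work does.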
Memory footprint and reclamation
Dispatch state is stored per message type and, for targeted and broadcast paths, per InstanceId. Long-running sessions accumulate slots for every type or entity ever touched unless something reclaims them. The memory reclamation system caps that growth without changing dispatch semantics or allocating during emit.
Reclamation runs on two paths:
- An idle sweep that runs from emit-time clock samples and the Unity PlayerLoop, gated by DxMessagingRuntimeSettings.EvictionEnabled and EvictionTickIntervalSeconds. Empty slots become eligible only after remaining empty for at least IdleEvictionSeconds of wall time.
- An explicit IMessageBus.Trim(force) and MessageHandler.TrimAll(force) pair that runs synchronously at scene boundaries, in tests, or in maintenance windows. The master switch EnableTrimApi controls whether the explicit calls perform work; idle sweeps remain controlled by EvictionEnabled independently.
Active registrations are never reclaimed. Only empty slots and shared pool entries are touched. Sweep work runs outside the hot handler loop, so emit throughput is unaffected; the per-emit overhead is one branch that samples the wall clock.
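The idle-sweep eligibility rule can be modeled in a few lines. This is a language-neutral sketch written in JavaScript, not DxMessaging's C# implementation: the idleSweep function, the slot shape, the state object, and the numeric values are all assumptions, and only the setting names are taken from the text above. It also assumes that EvictionTickIntervalSeconds rate-limits how often a sweep runs. The explicit Trim path is not modeled, because the meaning of the force argument is not described on this page.

```javascript
// Hypothetical model of the idle sweep; illustrative only.
function idleSweep(slots, state, settings, nowSeconds) {
  if (!settings.evictionEnabled) return [];
  // Assumption: the tick interval limits how often a sweep may run.
  if (nowSeconds - state.lastSweepSeconds < settings.evictionTickIntervalSeconds) {
    return [];
  }
  state.lastSweepSeconds = nowSeconds;
  const evicted = [];
  for (const [key, slot] of slots) {
    // Active registrations are never reclaimed.
    if (slot.handlerCount > 0) continue;
    // An empty slot must stay empty for the full idle window first.
    if (nowSeconds - slot.emptySinceSeconds >= settings.idleEvictionSeconds) {
      slots.delete(key); // deleting during Map iteration is safe in JavaScript
      evicted.push(key);
    }
  }
  return evicted;
}

const settings = {
  evictionEnabled: true,
  evictionTickIntervalSeconds: 5, // assumed value for illustration
  idleEvictionSeconds: 30, // assumed value for illustration
};
const slots = new Map([
  ["ActiveMessage", { handlerCount: 2, emptySinceSeconds: 0 }],
  ["StaleMessage", { handlerCount: 0, emptySinceSeconds: 0 }],
  ["RecentlyEmptied", { handlerCount: 0, emptySinceSeconds: 25 }],
]);
const state = { lastSweepSeconds: 0 };

const first = idleSweep(slots, state, settings, 31); // evicts StaleMessage
const second = idleSweep(slots, state, settings, 33); // tick interval not elapsed
const third = idleSweep(slots, state, settings, 61); // evicts RecentlyEmptied
console.log(first, second, third, [...slots.keys()]);
```

In this model the slot with live handlers survives every sweep regardless of age, which mirrors the guarantee that only empty slots are touched.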
For tuning recommendations, the public Trim and diagnostic-counter API surface, and worked examples (scene transitions, leak diagnosis, mobile caps, shipped-title configurations), see the Memory Reclamation guide. For the parameter reference, see the Runtime Settings reference.