Boost.Corosio Performance Benchmarks
Executive Summary
This report presents comprehensive performance benchmarks comparing Boost.Corosio against Boost.Asio (with coroutines) on Windows using the IOCP (I/O Completion Ports) backend. The benchmarks cover handler dispatch, socket throughput, socket latency, and HTTP server workloads.
Bottom Line
Corosio outperforms Asio in handler dispatch (16-61% higher throughput across the four scenarios below) and performs within a few percent of Asio in socket I/O and HTTP server workloads. Asio has an edge in ping-pong tail latency for small messages (p99 of 13.90 μs vs 16.70 μs at 64 bytes).
Where Corosio Excels
- Single-threaded handler post: 61% faster (1.36 Mops/s vs 847 Kops/s)
- Concurrent post and run: 61% faster (2.32 Mops/s vs 1.44 Mops/s)
- Interleaved post/run: 37% faster (2.35 Mops/s vs 1.71 Mops/s)
- Multi-threaded handler dispatch: 16% faster at 8 threads (3.47 Mops/s vs 3.00 Mops/s)
Detailed Results
Handler Dispatch Summary
| Scenario | Corosio | Asio | Winner |
|---|---|---|---|
| Single-threaded post | 1.36 Mops/s | 847 Kops/s | Corosio (+61%) |
| Multi-threaded (8 threads) | 3.47 Mops/s | 3.00 Mops/s | Corosio (+16%) |
| Interleaved post/run | 2.35 Mops/s | 1.71 Mops/s | Corosio (+37%) |
| Concurrent post/run | 2.32 Mops/s | 1.44 Mops/s | Corosio (+61%) |
Socket Throughput Summary
| Scenario | Corosio | Asio | Winner |
|---|---|---|---|
| Unidirectional 1KB buffer | 215 MB/s | 206 MB/s | Corosio (+4%) |
| Unidirectional 64KB buffer | 6.29 GB/s | 6.34 GB/s | Tie |
| Bidirectional 64KB buffer | 6.24 GB/s | 6.25 GB/s | Tie |
Test Environment
| Item | Details |
|---|---|
| Platform | Windows (IOCP backend) |
| Benchmarks | Handler dispatch, socket throughput, socket latency, HTTP server |
| Comparison | Asio coroutines (co_spawn/use_awaitable) |
| Measurement | Client-side latency and throughput |
Handler Dispatch Benchmarks
These benchmarks measure raw handler posting and execution throughput, isolating the scheduler from I/O completion overhead.
Single-Threaded Handler Post
Posting 5,000,000 handlers from a single thread.
| Metric | Corosio | Asio | Difference |
|---|---|---|---|
| Handlers | 5,000,000 | 5,000,000 | n/a |
| Elapsed | 3.687 s | 5.903 s | -38% |
| Throughput | 1.36 Mops/s | 847 Kops/s | +61% |
Key finding: Corosio’s single-threaded handler dispatch is 61% faster than Asio.
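The throughput and difference figures in these tables can be reproduced from the reported numbers. The sketch below assumes that throughput is the handler count divided by elapsed time, and that each difference is Corosio's value relative to Asio's. That convention is inferred from the numbers in the report and is not stated in it. The function names are illustrative.

```cpp
#include <cassert>

// Throughput in operations per second for `ops` operations
// completed in `elapsed_s` seconds.
double throughput_ops_per_s(double ops, double elapsed_s) {
    assert(elapsed_s > 0.0);
    return ops / elapsed_s;
}

// Signed percentage difference of `value` relative to `reference`.
// Positive means `value` is larger. Assumed convention: value = Corosio,
// reference = Asio.
double percent_diff(double value, double reference) {
    assert(reference != 0.0);
    return (value - reference) / reference * 100.0;
}
```

With the figures above, `throughput_ops_per_s(5000000.0, 3.687)` is about 1.36 million handlers per second, `percent_diff(1.36e6, 847e3)` rounds to +61, and `percent_diff(3.687, 5.903)` rounds to -38. A negative difference is an improvement for elapsed time and latency, while a positive one is an improvement for throughput.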
Multi-Threaded Scaling
Multiple threads running handlers concurrently (5,000,000 handlers total).
| Threads | Corosio | Asio | Corosio Speedup | Asio Speedup |
|---|---|---|---|---|
| 1 | 2.95 Mops/s | 1.49 Mops/s | (baseline) | (baseline) |
| 2 | 2.84 Mops/s | 2.13 Mops/s | 0.96× | 1.43× |
| 4 | 3.87 Mops/s | 2.95 Mops/s | 1.31× | 1.98× |
| 8 | 3.47 Mops/s | 3.00 Mops/s | 1.17× | 2.01× |
Scaling Analysis
Throughput vs Thread Count:
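| Threads | Corosio | Asio | Winner |
|---|---|---|---|
| 1 | 2.95 Mops/s | 1.49 Mops/s | Corosio +98% |
| 2 | 2.84 Mops/s | 2.13 Mops/s | Corosio +33% |
| 4 | 3.87 Mops/s | 2.95 Mops/s | Corosio +31% |
| 8 | 3.47 Mops/s | 3.00 Mops/s | Corosio +16% |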
Notable observations:
- Corosio is faster at every thread count tested
- Corosio peaks at 4 threads (3.87 Mops/s); Asio plateaus between 4 and 8 threads (2.95 to 3.00 Mops/s)
- Asio scales better (2× at 8 threads) but starts from a lower baseline
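The speedup columns divide each implementation's throughput by its own single-thread result, so they are not comparable across implementations in absolute terms. A minimal sketch of that calculation follows; the function name `speedups` is illustrative.

```cpp
#include <cassert>
#include <vector>

// Speedup of each measurement relative to the first entry,
// which is taken to be the single-thread baseline.
std::vector<double> speedups(const std::vector<double>& throughput) {
    assert(!throughput.empty() && throughput.front() > 0.0);
    std::vector<double> out;
    out.reserve(throughput.size());
    for (double t : throughput) {
        out.push_back(t / throughput.front());
    }
    return out;
}
```

Feeding in the Mops/s figures `{1.49, 2.13, 2.95, 3.00}` for Asio gives a final speedup of about 2.01, and `{2.95, 2.84, 3.87, 3.47}` for Corosio gives about 1.18 from these rounded inputs (the report lists 1.17×). Asio's larger speedup reflects its lower baseline; its absolute throughput at 8 threads is still below Corosio's.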
Interleaved Post/Run
Alternating between posting batches and running them (50,000 iterations × 100 handlers).
| Metric | Corosio | Asio | Difference |
|---|---|---|---|
| Total handlers | 5,000,000 | 5,000,000 | n/a |
| Elapsed | 2.128 s | 2.921 s | -27% |
| Throughput | 2.35 Mops/s | 1.71 Mops/s | +37% |
Key finding: Corosio is 37% faster at interleaved post/run, a pattern common in real applications.
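The shape of an interleaved post/run loop can be sketched with a stand-in scheduler. The `toy_queue` type below is hypothetical: it is a plain FIFO of `std::function` objects, not the Corosio or Asio API, and the benchmark source is not reproduced here. It only illustrates alternating between posting a batch and draining it while timing the whole loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a scheduler: a single-threaded FIFO of handlers.
struct toy_queue {
    std::deque<std::function<void()>> handlers;

    void post(std::function<void()> h) { handlers.push_back(std::move(h)); }

    // Run every queued handler; returns how many were executed.
    std::size_t run() {
        std::size_t n = 0;
        while (!handlers.empty()) {
            auto h = std::move(handlers.front());
            handlers.pop_front();
            h();
            ++n;
        }
        return n;
    }
};

struct run_stats {
    std::size_t executed;  // handlers actually run
    double ops_per_s;      // executed handlers per second of wall time
};

// Alternate between posting `batch` handlers and draining the queue.
run_stats interleaved_post_run(std::size_t iterations, std::size_t batch) {
    toy_queue q;
    std::size_t counter = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        for (std::size_t j = 0; j < batch; ++j) {
            q.post([&counter] { ++counter; });
        }
        q.run();
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    assert(counter == iterations * batch);
    return {counter, static_cast<double>(counter) / elapsed.count()};
}
```

Calling `interleaved_post_run(50000, 100)` mirrors the iteration and batch counts used above. The throughput of this toy queue says nothing about either library, since it has no locking, no completion port, and no coroutine frames.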
Socket Throughput Benchmarks
Unidirectional Throughput
Single direction transfer of 4096 MB with varying buffer sizes.
| Buffer Size | Corosio | Asio | Difference |
|---|---|---|---|
| 1024 bytes | 215.26 MB/s | 206.19 MB/s | +4% |
| 4096 bytes | 736.99 MB/s | 710.17 MB/s | +4% |
| 16384 bytes | 2.52 GB/s | 2.52 GB/s | 0% |
| 65536 bytes | 6.29 GB/s | 6.34 GB/s | -1% |
Observation: Throughput is within 4% at every buffer size. Corosio has a slight edge at 1024-byte and 4096-byte buffers (+4%); the two are level at 16384 and 65536 bytes.
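One way to read the buffer-size trend is to convert byte throughput into buffer-sized transfers per second. The sketch below rests on two assumptions that the report does not confirm: that "MB" and "GB" mean 2^20 and 2^30 bytes, and that every I/O operation moves exactly one full buffer. The names are illustrative.

```cpp
#include <cassert>

// Number of buffer-sized transfers per second implied by a byte throughput.
// Assumes each I/O operation moves exactly one full buffer.
double transfers_per_s(double bytes_per_s, double buffer_bytes) {
    assert(buffer_bytes > 0.0);
    return bytes_per_s / buffer_bytes;
}

// Assumed unit definitions; the report does not say whether its
// MB and GB are binary or decimal.
constexpr double MiB = 1024.0 * 1024.0;
constexpr double GiB = 1024.0 * MiB;
```

Under those assumptions, `transfers_per_s(215.26 * MiB, 1024.0)` is roughly 220,000 transfers per second and `transfers_per_s(6.29 * GiB, 65536.0)` is roughly 103,000. A 64-fold larger buffer roughly halves the operation rate while moving about 30 times more data. This suggests that per-operation cost dominates at small buffers, which is where any difference between the two libraries would be most visible.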
Bidirectional Throughput
Simultaneous transfer of 2048 MB in each direction (4096 MB total).
| Buffer Size | Corosio | Asio | Difference |
|---|---|---|---|
| 1024 bytes | 211.41 MB/s | 209.36 MB/s | +1% |
| 4096 bytes | 737.69 MB/s | 722.13 MB/s | +2% |
| 16384 bytes | 2.43 GB/s | 2.50 GB/s | -3% |
| 65536 bytes | 6.24 GB/s | 6.25 GB/s | 0% |
Observation: Bidirectional throughput is within 3% at every buffer size, with neither implementation consistently ahead.
Socket Latency Benchmarks
Ping-Pong Round-Trip Latency
Single socket pair exchanging messages (1,000,000 iterations each).
| Message Size | Corosio Mean | Asio Mean | Difference | Corosio p99 | Asio p99 |
|---|---|---|---|---|---|
| 1 byte | 9.56 μs | 9.74 μs | -2% | 15.40 μs | 13.60 μs |
| 64 bytes | 9.62 μs | 9.68 μs | -1% | 16.70 μs | 13.90 μs |
| 1024 bytes | 9.71 μs | 10.03 μs | -3% | 14.20 μs | 19.10 μs |
Latency Distribution (64-byte messages)
| Percentile | Corosio | Asio | Difference |
|---|---|---|---|
| p50 | 9.00 μs | 9.20 μs | -2% |
| p90 | 9.50 μs | 9.70 μs | -2% |
| p99 | 16.70 μs | 13.90 μs | +20% |
| p99.9 | 119.20 μs | 80.60 μs | +48% |
| min | 8.10 μs | 8.20 μs | -1% |
| max | 2.58 ms | 2.67 ms | -3% |
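Observation: Mean latency is essentially identical (Corosio is 1-3% lower). Asio has better tail latency at 64 bytes (p99 and p99.9) and a lower p99 at 1 byte; at 1024 bytes Corosio's p99 is lower (14.20 μs vs 19.10 μs).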
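The percentile rows summarize a list of per-iteration round-trip times. The report does not say which percentile definition the benchmark uses; the sketch below uses the nearest-rank method as an assumption, and the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
// `p` must be in (0, 100]. The input is taken by value and sorted locally.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    if (rank < 1) rank = 1;  // defensive; cannot occur for p > 0
    return samples[rank - 1];
}
```

With 1,000,000 samples, p99.9 depends only on the slowest 1,000 or so measurements, so it is much more sensitive to scheduling noise than the mean or p50. That is worth keeping in mind when reading the p99 and p99.9 gaps.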
Concurrent Socket Pairs
Multiple socket pairs operating concurrently (64-byte messages).
| Pairs | Iterations | Corosio Mean | Asio Mean | Corosio p99 | Asio p99 |
|---|---|---|---|---|---|
| 1 | 1,000,000 | 9.57 μs | 9.89 μs | 16.60 μs | 17.50 μs |
| 4 | 500,000 | 40.03 μs | 39.79 μs | 84.40 μs | 73.85 μs |
| 16 | 250,000 | 162.44 μs | 165.59 μs | 354.57 μs | 369.66 μs |
Observation: Both implementations scale similarly. Mean latencies are within about 3% at every pair count, and neither implementation is consistently ahead on p99.
HTTP Server Benchmarks
Single Connection (Sequential Requests)
| Metric | Corosio | Asio | Difference |
|---|---|---|---|
| Requests | 1,000,000 | 1,000,000 | n/a |
| Elapsed | 10.615 s | 10.935 s | -3% |
| Throughput | 94.21 Kops/s | 91.45 Kops/s | +3% |
| Mean latency | 10.59 μs | 10.90 μs | -3% |
| p99 latency | 19.50 μs | 23.00 μs | -15% |
Observation: Single-connection HTTP performance is comparable with Corosio having a slight edge.
Concurrent Connections (Single Thread)
| Connections | Corosio Throughput | Asio Throughput | Corosio Mean | Asio Mean | Throughput Gap |
|---|---|---|---|---|---|
| 1 | 91.33 Kops/s | 92.29 Kops/s | 10.92 μs | 10.80 μs | -1% |
| 4 | 91.88 Kops/s | 92.12 Kops/s | 43.50 μs | 43.39 μs | 0% |
| 16 | 90.39 Kops/s | 89.94 Kops/s | 176.98 μs | 177.87 μs | 0% |
| 32 | 87.96 Kops/s | 90.61 Kops/s | 363.77 μs | 353.12 μs | -3% |
Observation: Single-threaded concurrent connection performance is essentially identical.
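The near-linear growth of mean latency with connection count is what Little's law predicts when throughput stays flat. The sketch below assumes a closed-loop client in which each connection keeps exactly one request in flight. The report does not state this, although the numbers are consistent with it. The function name is illustrative.

```cpp
#include <cassert>

// Little's law for a closed loop: with `in_flight` requests always
// outstanding and a measured throughput in requests per second, the mean
// time each request spends in the system is in_flight / throughput.
// The result is returned in microseconds.
double expected_mean_latency_us(double in_flight, double requests_per_s) {
    assert(requests_per_s > 0.0);
    return in_flight / requests_per_s * 1e6;
}
```

For Corosio's 32-connection row, `expected_mean_latency_us(32, 87.96e3)` is about 363.8 μs against a measured mean of 363.77 μs. The 16-connection row gives about 177.0 μs against 176.98 μs. The rising latency in this table therefore reflects queueing behind a saturated single thread rather than a per-request slowdown in either library.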
Multi-Threaded HTTP (32 Connections)
| Threads | Corosio Throughput | Asio Throughput | Throughput Gap | Scaling Factor (Corosio / Asio) |
|---|---|---|---|---|
| 1 | 89.02 Kops/s | 89.25 Kops/s | 0% | (baseline) |
| 2 | 124.65 Kops/s | 124.91 Kops/s | 0% | 1.40× / 1.40× |
| 4 | 200.29 Kops/s | 210.46 Kops/s | -5% | 2.25× / 2.36× |
| 8 | 342.00 Kops/s | 334.71 Kops/s | +2% | 3.84× / 3.75× |
| 16 | 430.51 Kops/s | 434.07 Kops/s | -1% | 4.84× / 4.86× |
Multi-Threaded Latency
| Threads | Corosio Mean | Asio Mean | Corosio p99 | Asio p99 |
|---|---|---|---|---|
| 1 | 359.41 μs | 358.52 μs | 720.81 μs | 742.29 μs |
| 2 | 256.63 μs | 256.10 μs | 416.91 μs | 439.69 μs |
| 4 | 159.66 μs | 151.93 μs | 279.01 μs | 205.49 μs |
| 8 | 93.35 μs | 95.35 μs | 117.70 μs | 121.33 μs |
| 16 | 73.64 μs | 73.13 μs | 90.10 μs | 88.80 μs |
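Key finding: Both implementations scale to roughly 4.8× their single-thread throughput at 16 threads, with throughput within 5% of each other at every thread count. Mean latencies are nearly identical; p99 differs most at 4 threads (279.01 μs for Corosio vs 205.49 μs for Asio).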
Analysis
Performance Characteristics
Handler Dispatch
Corosio has a clear advantage in handler dispatch:
| Scenario | Corosio Advantage | Notes |
|---|---|---|
| Single-threaded | +61% | Significantly faster |
| 8 threads | +16% | Maintains advantage at scale |
| Interleaved | +37% | Common real-world pattern |
| Concurrent | +61% | Multi-producer scenario |
Conclusions
Summary
Corosio delivers equivalent or better performance compared to Asio coroutines:
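- Handler dispatch: Corosio is 16-61% faster
- Socket I/O: Throughput within 4% and mean latency within 3% of Asio
- HTTP server: Equivalent throughput and latency
- Tail latency: Asio's p99 is about 17% lower in the 64-byte ping-pong test (13.90 μs vs 16.70 μs); p99 results are mixed in the other tests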
Appendix: Raw Data
Corosio Results
Backend: iocp
=== Single-threaded Handler Post ===
Handlers: 5000000
Elapsed: 3.687 s
Throughput: 1.36 Mops/s
=== Multi-threaded Scaling ===
Handlers per test: 5000000
1 thread(s): 2.95 Mops/s
2 thread(s): 2.84 Mops/s (speedup: 0.96x)
4 thread(s): 3.87 Mops/s (speedup: 1.31x)
8 thread(s): 3.47 Mops/s (speedup: 1.17x)
=== Interleaved Post/Run ===
Iterations: 50000
Handlers/iter: 100
Total handlers: 5000000
Elapsed: 2.128 s
Throughput: 2.35 Mops/s
=== Concurrent Post and Run ===
Threads: 4
Handlers/thread: 1250000
Total handlers: 5000000
Elapsed: 2.159 s
Throughput: 2.32 Mops/s
=== Unidirectional Throughput ===
Buffer size: 1024 bytes, Transfer: 4096 MB
Throughput: 215.26 MB/s
Buffer size: 4096 bytes, Transfer: 4096 MB
Throughput: 736.99 MB/s
Buffer size: 16384 bytes, Transfer: 4096 MB
Throughput: 2.52 GB/s
Buffer size: 65536 bytes, Transfer: 4096 MB
Throughput: 6.29 GB/s
=== Bidirectional Throughput ===
Buffer size: 1024 bytes: 211.41 MB/s (combined)
Buffer size: 4096 bytes: 737.69 MB/s (combined)
Buffer size: 16384 bytes: 2.43 GB/s (combined)
Buffer size: 65536 bytes: 6.24 GB/s (combined)
=== Ping-Pong Round-Trip Latency ===
1 byte: mean=9.56 us, p50=8.90 us, p99=15.40 us
64 bytes: mean=9.62 us, p50=9.00 us, p99=16.70 us
1024 bytes: mean=9.71 us, p50=9.10 us, p99=14.20 us
=== Concurrent Socket Pairs Latency ===
1 pair: mean=9.57 us, p99=16.60 us
4 pairs: mean=40.03 us, p99=84.40 us
16 pairs: mean=162.44 us, p99=354.57 us
=== HTTP Single Connection ===
Throughput: 94.21 Kops/s
Latency: mean=10.59 us, p99=19.50 us
=== HTTP Concurrent Connections (single thread) ===
1 conn: 91.33 Kops/s, mean=10.92 us, p99=25.70 us
4 conns: 91.88 Kops/s, mean=43.50 us, p99=97.05 us
16 conns: 90.39 Kops/s, mean=176.98 us, p99=377.09 us
32 conns: 87.96 Kops/s, mean=363.77 us, p99=858.13 us
=== HTTP Multi-threaded (32 connections) ===
1 thread: 89.02 Kops/s, mean=359.41 us, p99=720.81 us
2 threads: 124.65 Kops/s, mean=256.63 us, p99=416.91 us
4 threads: 200.29 Kops/s, mean=159.66 us, p99=279.01 us
8 threads: 342.00 Kops/s, mean=93.35 us, p99=117.70 us
16 threads: 430.51 Kops/s, mean=73.64 us, p99=90.10 us
Asio Results
=== Single-threaded Handler Post (Asio) ===
Handlers: 5000000
Elapsed: 5.903 s
Throughput: 847.04 Kops/s
=== Multi-threaded Scaling (Asio Coroutines) ===
Handlers per test: 5000000
1 thread(s): 1.49 Mops/s
2 thread(s): 2.13 Mops/s (speedup: 1.43x)
4 thread(s): 2.95 Mops/s (speedup: 1.98x)
8 thread(s): 3.00 Mops/s (speedup: 2.01x)
=== Interleaved Post/Run (Asio Coroutines) ===
Iterations: 50000
Handlers/iter: 100
Total handlers: 5000000
Elapsed: 2.921 s
Throughput: 1.71 Mops/s
=== Concurrent Post and Run (Asio Coroutines) ===
Threads: 4
Handlers/thread: 1250000
Total handlers: 5000000
Elapsed: 3.475 s
Throughput: 1.44 Mops/s
=== Unidirectional Throughput (Asio) ===
Buffer size: 1024 bytes: 206.19 MB/s
Buffer size: 4096 bytes: 710.17 MB/s
Buffer size: 16384 bytes: 2.52 GB/s
Buffer size: 65536 bytes: 6.34 GB/s
=== Bidirectional Throughput (Asio) ===
Buffer size: 1024 bytes: 209.36 MB/s (combined)
Buffer size: 4096 bytes: 722.13 MB/s (combined)
Buffer size: 16384 bytes: 2.50 GB/s (combined)
Buffer size: 65536 bytes: 6.25 GB/s (combined)
=== Ping-Pong Round-Trip Latency (Asio) ===
1 byte: mean=9.74 us, p50=9.20 us, p99=13.60 us
64 bytes: mean=9.68 us, p50=9.20 us, p99=13.90 us
1024 bytes: mean=10.03 us, p50=9.50 us, p99=19.10 us
=== Concurrent Socket Pairs Latency (Asio) ===
1 pair: mean=9.89 us, p99=17.50 us
4 pairs: mean=39.79 us, p99=73.85 us
16 pairs: mean=165.59 us, p99=369.66 us
=== HTTP Single Connection ===
Throughput: 91.45 Kops/s
Latency: mean=10.90 us, p99=23.00 us
=== HTTP Multi-threaded (32 connections) ===
1 thread: 89.25 Kops/s, mean=358.52 us, p99=742.29 us
2 threads: 124.91 Kops/s, mean=256.10 us, p99=439.69 us
4 threads: 210.46 Kops/s, mean=151.93 us, p99=205.49 us
8 threads: 334.71 Kops/s, mean=95.35 us, p99=121.33 us
16 threads: 434.07 Kops/s, mean=73.13 us, p99=88.80 us