Boost.Corosio Performance Benchmarks

Executive Summary

This report presents comprehensive performance benchmarks comparing Boost.Corosio against Boost.Asio (with coroutines) on Windows using the IOCP (I/O Completion Ports) backend. The benchmarks cover handler dispatch, socket throughput, socket latency, and HTTP server workloads.

Bottom Line

Corosio significantly outperforms Asio in handler dispatch (16-61% faster) while delivering equivalent performance in socket I/O and HTTP server workloads. Asio has an edge in socket tail latency (p99 and p99.9).

Where Corosio Excels

  • Single-threaded handler post: 61% faster (1.36 Mops/s vs 847 Kops/s)

  • Concurrent post and run: 61% faster (2.32 Mops/s vs 1.44 Mops/s)

  • Interleaved post/run: 37% faster (2.35 Mops/s vs 1.71 Mops/s)

  • Multi-threaded handler dispatch: 16% faster at 8 threads (3.47 Mops/s vs 3.00 Mops/s)

Where Asio Has an Edge

  • Tail latency: 17% lower ping-pong p99 at 64 bytes (13.90 μs vs 16.70 μs)

Where They’re Equal

  • Socket throughput: Essentially identical (6.29 GB/s vs 6.34 GB/s at 64KB)

  • Socket latency (mean): Identical (9.62 μs vs 9.68 μs)

  • HTTP server throughput: Comparable (within 2% at 1, 2, 8, and 16 threads; Asio 5% faster at 4 threads)

Key Insights

Component           Assessment
Handler Dispatch    Corosio 16-61% faster across all patterns
Socket Throughput   Equivalent performance
Socket Latency      Equivalent mean, Asio better p99
HTTP Server         Equivalent performance


Detailed Results

Handler Dispatch Summary

Scenario                      Corosio       Asio          Winner
Single-threaded post          1.36 Mops/s   847 Kops/s    Corosio (+61%)
Multi-threaded (8 threads)    3.47 Mops/s   3.00 Mops/s   Corosio (+16%)
Interleaved post/run          2.35 Mops/s   1.71 Mops/s   Corosio (+37%)
Concurrent post/run           2.32 Mops/s   1.44 Mops/s   Corosio (+61%)

Socket Throughput Summary

Scenario                      Corosio       Asio          Winner
Unidirectional 1KB buffer     215 MB/s      206 MB/s      Corosio (+4%)
Unidirectional 64KB buffer    6.29 GB/s     6.34 GB/s     Tie
Bidirectional 64KB buffer     6.24 GB/s     6.25 GB/s     Tie

Socket Latency Summary

Scenario                      Corosio       Asio          Winner
Ping-pong mean (64B)          9.62 μs       9.68 μs       Tie
Ping-pong p99 (64B)           16.70 μs      13.90 μs      Asio (-17%)
16 concurrent pairs (mean)    162.44 μs     165.59 μs     Tie

HTTP Server Summary

Scenario                      Corosio          Asio             Winner
Single connection             94.21 Kops/s     91.45 Kops/s     Corosio (+3%)
32 connections, 4 threads     200.29 Kops/s    210.46 Kops/s    Asio (+5%)
32 connections, 8 threads     342.00 Kops/s    334.71 Kops/s    Corosio (+2%)
32 connections, 16 threads    430.51 Kops/s    434.07 Kops/s    Tie

Test Environment

Platform      Windows (IOCP backend)
Benchmarks    Handler dispatch, socket throughput, socket latency, HTTP server
Comparison    Asio coroutines (co_spawn/use_awaitable)
Measurement   Client-side latency and throughput

Handler Dispatch Benchmarks

These benchmarks measure raw handler posting and execution throughput, isolating the scheduler from I/O completion overhead.
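As a rough illustration of what such a measurement involves, the sketch below posts N trivial handlers (each increments a counter) to a queue from one thread, drains the queue, and divides N by the wall-clock time. `toy_scheduler`, `dispatch_result`, and `measure_single_threaded_post` are hypothetical names invented for this sketch; the queue is a plain `std::deque` of `std::function`, not Corosio's or Asio's scheduler, and the actual benchmark source may structure its timed region differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for an executor's ready queue (not Corosio or Asio code).
class toy_scheduler {
public:
    // Enqueue a handler; nothing runs until run() is called.
    void post(std::function<void()> handler) { queue_.push_back(std::move(handler)); }

    // Run handlers in FIFO order until the queue is empty; return how many ran.
    std::size_t run() {
        std::size_t ran = 0;
        while (!queue_.empty()) {
            auto handler = std::move(queue_.front());
            queue_.pop_front();
            handler();
            ++ran;
        }
        return ran;
    }

private:
    std::deque<std::function<void()>> queue_;
};

struct dispatch_result {
    std::size_t handlers;   // handlers executed
    double seconds;         // wall-clock time for post + run
    double ops_per_second;  // handlers / seconds
};

// Hypothetical harness: post `n` handlers from one thread, run them all,
// and report throughput over the whole timed region.
dispatch_result measure_single_threaded_post(std::size_t n) {
    toy_scheduler sched;
    std::size_t counter = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        sched.post([&counter] { ++counter; });
    const std::size_t ran = sched.run();
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    assert(ran == n && counter == n);
    return {ran, seconds, seconds > 0.0 ? static_cast<double>(ran) / seconds : 0.0};
}
```

Calling `measure_single_threaded_post(5'000'000)` mirrors the handler count used below; absolute numbers from this toy queue are not comparable to either library.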

Single-Threaded Handler Post

Posting 5,000,000 handlers from a single thread.

Metric        Corosio        Asio           Difference
Handlers      5,000,000      5,000,000
Elapsed       3.687 s        5.903 s        -38%
Throughput    1.36 Mops/s    847 Kops/s     +61%

Key finding: Corosio’s single-threaded handler dispatch is 61% faster than Asio.
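Throughput and elapsed time are reciprocal views of the same run, and restating them as a per-handler cost can make the gap easier to reason about. The helper below (`cost_from_run` is a hypothetical name, not part of either library) computes both from the handler count and elapsed seconds reported above: about 737 ns of wall-clock time per handler for Corosio versus about 1,181 ns for Asio, a difference of roughly 443 ns per handler. What that time covers (posting, running, or both) depends on the benchmark's timed region, which the report does not spell out.

```cpp
#include <cassert>

// Two equivalent views of one benchmark run (hypothetical helper, not library code).
struct dispatch_cost {
    double mops_per_second;  // handlers / elapsed, in millions per second
    double ns_per_handler;   // elapsed / handlers, in nanoseconds of wall-clock time
};

dispatch_cost cost_from_run(double handlers, double elapsed_s) {
    assert(handlers > 0.0 && elapsed_s > 0.0);
    return {handlers / elapsed_s / 1e6, elapsed_s / handlers * 1e9};
}
```

For example, `cost_from_run(5e6, 3.687)` and `cost_from_run(5e6, 5.903)` reproduce the Corosio and Asio rows of the table.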

Multi-Threaded Scaling

Multiple threads running handlers concurrently (5,000,000 handlers total).

Threads   Corosio       Asio          Corosio speedup   Asio speedup
1         2.95 Mops/s   1.49 Mops/s   (baseline)        (baseline)
2         2.84 Mops/s   2.13 Mops/s   0.96×             1.43×
4         3.87 Mops/s   2.95 Mops/s   1.31×             1.98×
8         3.47 Mops/s   3.00 Mops/s   1.17×             2.01×

Speedup is each library's throughput relative to its own 1-thread result.

Scaling Analysis

Throughput vs Thread Count:

Threads    Corosio    Asio       Winner
   1       2.95 M     1.49 M     Corosio +98%
   2       2.84 M     2.13 M     Corosio +33%
   4       3.87 M     2.95 M     Corosio +31%
   8       3.47 M     3.00 M     Corosio +16%

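Notable observations:

  • Corosio is faster at all thread counts, by 16% to 98%

  • Corosio peaks at 4 threads (3.87 Mops/s) and is slightly slower at 2 threads than at 1; Asio levels off at about 3 Mops/s from 4 threads onward

  • Asio scales better (2.01× at 8 threads vs Corosio’s 1.17×) but starts from a lower single-thread baseline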
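The speedup columns divide each library's throughput at n threads by its own 1-thread throughput. Dividing speedup by the thread count gives parallel efficiency, a metric the report does not use but which makes "scales better" concrete: at 8 threads Asio reaches about 25% efficiency and Corosio about 15%, even though Corosio's absolute throughput stays higher. The function below is a minimal sketch of that calculation; `scaling` and `scaling_point` are hypothetical names, and it assumes the first entry is the 1-thread run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct scaling_point {
    int threads;
    double speedup;     // throughput(n) / throughput(1) for the same library
    double efficiency;  // speedup / n; 1.0 would be perfect linear scaling
};

// Hypothetical helper: `throughput[0]` must be the 1-thread run.
std::vector<scaling_point> scaling(const std::vector<int>& threads,
                                   const std::vector<double>& throughput) {
    assert(!threads.empty() && threads.size() == throughput.size() && threads[0] == 1);
    std::vector<scaling_point> out;
    for (std::size_t i = 0; i < threads.size(); ++i) {
        const double s = throughput[i] / throughput[0];
        out.push_back({threads[i], s, s / threads[i]});
    }
    return out;
}
```

Feeding it the Asio column, `scaling({1, 2, 4, 8}, {1.49, 2.13, 2.95, 3.00})`, reproduces the 1.43×, 1.98×, and 2.01× speedups above.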

Interleaved Post/Run

Alternating between posting batches and running them (50,000 iterations × 100 handlers).

Metric           Corosio        Asio           Difference
Total handlers   5,000,000      5,000,000
Elapsed          2.128 s        2.921 s        -27%
Throughput       2.35 Mops/s    1.71 Mops/s    +37%

Key finding: Corosio is 37% faster at interleaved post/run, a pattern common in real applications.
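The interleaved pattern alternates between filling the queue and draining it, so the scheduler repeatedly moves between "has work" and "idle". The sketch below reproduces that shape with a hypothetical `ready_queue` (a plain `std::deque`, not either library's implementation): each iteration posts a batch of handlers and then runs until the batch is drained.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded ready queue (illustration only, not Corosio or Asio).
struct ready_queue {
    std::deque<std::function<void()>> q;

    void post(std::function<void()> h) { q.push_back(std::move(h)); }

    // Drain the queue; returns the number of handlers executed.
    std::size_t run() {
        std::size_t ran = 0;
        while (!q.empty()) {
            auto h = std::move(q.front());
            q.pop_front();
            h();
            ++ran;
        }
        return ran;
    }
};

// Interleaved post/run: post a batch, drain it completely, then start the next batch.
std::size_t interleaved_post_run(std::size_t iterations, std::size_t per_iter) {
    ready_queue rq;
    std::size_t executed = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        for (std::size_t j = 0; j < per_iter; ++j)
            rq.post([&executed] { ++executed; });
        const std::size_t ran = rq.run();  // queue goes idle again here
        assert(ran == per_iter);
        (void)ran;
    }
    return executed;
}
```

`interleaved_post_run(50'000, 100)` matches the iteration and batch counts above. In Asio itself, `io_context::run()` returns once it runs out of work, and `restart()` must be called before the next `run()`; the report does not show how either benchmark performs this step.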

Concurrent Post and Run

Four threads simultaneously post and run handlers (1,250,000 handlers per thread).

Metric           Corosio        Asio           Difference
Threads          4              4
Total handlers   5,000,000      5,000,000
Elapsed          2.159 s        3.475 s        -38%
Throughput       2.32 Mops/s    1.44 Mops/s    +61%
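In the concurrent scenario every thread is both a producer and a consumer, so the queue sees contention from both sides at once. The sketch below approximates that access pattern with `std::thread` workers sharing a hypothetical mutex-protected `shared_queue`; each worker posts its share of handlers, runs one handler after each post, and then helps drain whatever remains. It illustrates the workload only; the report does not describe how either library's scheduler synchronizes its queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mutex-protected queue shared by all workers (illustration only).
class shared_queue {
public:
    void post(std::function<void()> h) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(std::move(h));
    }

    // Pops one handler into `out`; returns false if the queue was empty.
    bool try_pop(std::function<void()>& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::deque<std::function<void()>> q_;
};

// Every worker posts `per_thread` handlers, runs one handler after each post,
// then keeps running handlers until all `threads * per_thread` have executed.
std::size_t concurrent_post_and_run(std::size_t threads, std::size_t per_thread) {
    shared_queue q;
    std::atomic<std::size_t> executed{0};
    const std::size_t total = threads * per_thread;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            std::function<void()> h;
            for (std::size_t i = 0; i < per_thread; ++i) {
                q.post([&executed] { executed.fetch_add(1, std::memory_order_relaxed); });
                if (q.try_pop(h)) h();  // producer is also a consumer
            }
            while (executed.load() < total) {  // help drain what other workers posted
                if (q.try_pop(h)) h();
                else std::this_thread::yield();
            }
        });
    }
    for (auto& w : workers) w.join();
    return executed.load();
}
```

`concurrent_post_and_run(4, 1'250'000)` corresponds to the thread and per-thread counts in the raw data. A single mutex keeps the sketch obviously correct at the cost of serializing every post and pop.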

Socket Throughput Benchmarks

Unidirectional Throughput

Single-direction transfer of 4096 MB with varying buffer sizes.

Buffer size   Corosio        Asio           Difference
1024 bytes    215.26 MB/s    206.19 MB/s    +4%
4096 bytes    736.99 MB/s    710.17 MB/s    +4%
16384 bytes   2.52 GB/s      2.52 GB/s      0%
65536 bytes   6.29 GB/s      6.34 GB/s      -1%

Observation: Throughput is essentially identical. Corosio has a slight edge at smaller buffers.
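Converting throughput into operations per second suggests why small buffers are where implementation differences show up. Assuming one read or write per full buffer and 1 MB = 1,048,576 bytes (neither is stated in the report), 215.26 MB/s with 1024-byte buffers is about 220,000 operations per second, or roughly 4.5 μs per operation, while 6.29 GB/s with 65536-byte buffers (taking 1 GB = 1024 MB) is about 103,000 operations per second, roughly 9.7 μs each. Under a simple fixed-plus-per-byte cost model, most of a 1 KB operation's time is fixed per-operation cost, which is where scheduling and completion overhead matter; at 64 KB, data movement dominates. A minimal sketch of the conversion, with `ops_from_throughput` as a hypothetical name:

```cpp
#include <cassert>

// Operations per second and average time per operation implied by a throughput figure.
// Assumptions (not stated in the report): each operation moves one full buffer,
// and 1 MB = 1,048,576 bytes.
struct op_rate {
    double ops_per_second;
    double microseconds_per_op;
};

op_rate ops_from_throughput(double mb_per_s, double buffer_bytes) {
    assert(mb_per_s > 0.0 && buffer_bytes > 0.0);
    const double bytes_per_s = mb_per_s * 1048576.0;
    const double ops = bytes_per_s / buffer_bytes;
    return {ops, 1e6 / ops};
}
```

`ops_from_throughput(6.29 * 1024, 65536)` gives the 64 KB figure under the same assumptions.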

Bidirectional Throughput

Simultaneous transfer of 2048 MB in each direction (4096 MB total). Throughput figures are the combined rate for both directions.

Buffer size   Corosio        Asio           Difference
1024 bytes    211.41 MB/s    209.36 MB/s    +1%
4096 bytes    737.69 MB/s    722.13 MB/s    +2%
16384 bytes   2.43 GB/s      2.50 GB/s      -3%
65536 bytes   6.24 GB/s      6.25 GB/s      0%

Observation: Bidirectional throughput is essentially identical between implementations (within 3% at every buffer size).

Socket Latency Benchmarks

Ping-Pong Round-Trip Latency

Single socket pair exchanging messages (1,000,000 iterations per message size).

Message size   Corosio mean   Asio mean   Mean difference   Corosio p99   Asio p99
1 byte         9.56 μs        9.74 μs     -2%               15.40 μs      13.60 μs
64 bytes       9.62 μs        9.68 μs     -1%               16.70 μs      13.90 μs
1024 bytes     9.71 μs        10.03 μs    -3%               14.20 μs      19.10 μs

Latency Distribution (64-byte messages)

Percentile   Corosio     Asio        Difference (Corosio vs Asio)
p50          9.00 μs     9.20 μs     -2%
p90          9.50 μs     9.70 μs     -2%
p99          16.70 μs    13.90 μs    +20%
p99.9        119.20 μs   80.60 μs    +48%
min          8.10 μs     8.20 μs     -1%
max          2.58 ms     2.67 ms     -3%

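Observation: Mean latency is essentially identical (Corosio slightly faster). Asio has better tail latency at 64 bytes (p99, p99.9), but the p99 advantage is not uniform: at 1024 bytes Corosio's p99 is lower (14.20 μs vs 19.10 μs).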
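Tail percentiles depend on very few samples: with 1,000,000 round trips, p99.9 is set by roughly the 1,000 slowest, so a handful of scheduling or OS interruptions can move it substantially. The report does not say which percentile definition the benchmarks use; the function below implements the common nearest-rank definition as one plausible choice, purely as an illustration (`percentile_nearest_rank` is a hypothetical name).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it. Takes the samples by value and sorts the copy.
// One common definition; the benchmarks may use a different one.
double percentile_nearest_rank(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    // 1-based rank; computing p * n first keeps values such as 99 * 100 exact.
    const auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(samples.size()) / 100.0));
    return samples[rank - 1];
}
```

Applied to a vector of per-iteration round-trip times, `percentile_nearest_rank(samples, 99.9)` would yield a p99.9 figure comparable in spirit to the table above.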

Concurrent Socket Pairs

Multiple socket pairs operating concurrently (64-byte messages).

Pairs   Iterations   Corosio mean   Asio mean    Corosio p99   Asio p99
1       1,000,000    9.57 μs        9.89 μs      16.60 μs      17.50 μs
4       500,000      40.03 μs       39.79 μs     84.40 μs      73.85 μs
16      250,000      162.44 μs      165.59 μs    354.57 μs     369.66 μs

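Observation: Both implementations scale similarly, and mean latencies are nearly identical at every pair count. Tail latency is mixed: Asio's p99 is lower with 4 pairs (73.85 μs vs 84.40 μs), while Corosio's is lower with 1 and 16 pairs.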

HTTP Server Benchmarks

Single Connection (Sequential Requests)

Metric         Corosio        Asio           Difference
Requests       1,000,000      1,000,000
Elapsed        10.615 s       10.935 s       -3%
Throughput     94.21 Kops/s   91.45 Kops/s   +3%
Mean latency   10.59 μs       10.90 μs       -3%
p99 latency    19.50 μs       23.00 μs       -15%

Observation: Single-connection HTTP performance is comparable, with Corosio having a slight edge (3% higher throughput, 15% lower p99).

Concurrent Connections (Single Thread)

Connections   Corosio throughput   Asio throughput   Corosio mean   Asio mean    Throughput gap
1             91.33 Kops/s         92.29 Kops/s      10.92 μs       10.80 μs     -1%
4             91.88 Kops/s         92.12 Kops/s      43.50 μs       43.39 μs     0%
16            90.39 Kops/s         89.94 Kops/s      176.98 μs      177.87 μs    0%
32            87.96 Kops/s         90.61 Kops/s      363.77 μs      353.12 μs    -3%

Observation: Single-threaded concurrent connection performance is essentially identical.
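The throughput and mean-latency columns are linked by Little's law. Assuming each connection keeps exactly one request in flight (a closed-loop client, which is our assumption rather than something the report states), throughput ≈ connections / mean latency: 32 / 363.77 μs ≈ 87.97 Kops/s against the measured 87.96 Kops/s for Corosio. In other words, with a single thread the server completes about 90 K requests per second regardless of connection count, and extra connections mostly add queueing delay. A minimal sketch of the check, with `littles_law_ops_per_second` as a hypothetical name:

```cpp
#include <cassert>

// Little's law for a closed-loop load generator: if `in_flight` clients each keep
// exactly one request outstanding, throughput ~= in_flight / mean latency.
// The one-request-per-connection assumption is ours, not stated in the report.
double littles_law_ops_per_second(double in_flight, double mean_latency_us) {
    assert(in_flight > 0.0 && mean_latency_us > 0.0);
    return in_flight / (mean_latency_us * 1e-6);
}
```

The same check for 16 connections, `littles_law_ops_per_second(16, 176.98)`, gives about 90.4 Kops/s, close to the measured 90.39 Kops/s.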

Multi-Threaded HTTP (32 Connections)

Threads   Corosio throughput   Asio throughput   Gap    Scaling (Corosio / Asio)
1         89.02 Kops/s         89.25 Kops/s      0%     (baseline)
2         124.65 Kops/s        124.91 Kops/s     0%     1.40× / 1.40×
4         200.29 Kops/s        210.46 Kops/s     -5%    2.25× / 2.36×
8         342.00 Kops/s        334.71 Kops/s     +2%    3.84× / 3.75×
16        430.51 Kops/s        434.07 Kops/s     -1%    4.84× / 4.86×

Multi-Threaded Latency

Threads   Corosio mean   Asio mean    Corosio p99   Asio p99
1         359.41 μs      358.52 μs    720.81 μs     742.29 μs
2         256.63 μs      256.10 μs    416.91 μs     439.69 μs
4         159.66 μs      151.93 μs    279.01 μs     205.49 μs
8         93.35 μs       95.35 μs     117.70 μs     121.33 μs
16        73.64 μs       73.13 μs     90.10 μs      88.80 μs

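Key finding: Both implementations scale similarly up to 16 threads (about 4.8× their 1-thread throughput) with nearly identical throughput and latency. The main exception is 4 threads, where Asio leads by 5% in throughput and has a lower p99 (205.49 μs vs 279.01 μs).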

Analysis

Performance Characteristics

Handler Dispatch

Corosio has a clear advantage in handler dispatch:

Scenario          Corosio advantage   Notes
Single-threaded   +61%                Significantly faster
8 threads         +16%                Advantage narrows but persists
Interleaved       +37%                Common real-world pattern
Concurrent        +61%                Multi-producer scenario

Socket I/O

Socket throughput and mean latency are essentially identical; tail latency favors Asio:

Metric              Comparison                             Corosio vs Asio
Throughput (64KB)   Identical                              6.29 vs 6.34 GB/s
Latency (mean)      Identical                              9.62 vs 9.68 μs
Latency (p99)       Asio 17% lower (Corosio 20% higher)    16.70 vs 13.90 μs
Latency (p99.9)     Asio 32% lower (Corosio 48% higher)    119.20 vs 80.60 μs
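Relative differences depend on which value is the baseline, which is why the same p99 gap can be quoted as 17% or 20%: Asio's 13.90 μs is 17% lower than Corosio's 16.70 μs, while Corosio's value is 20% higher than Asio's. At p99.9 the two framings diverge further (32% lower versus 48% higher). The two helpers below (hypothetical names) make the baseline explicit:

```cpp
#include <cassert>

// Percentage by which `a` exceeds baseline `b` (positive when a > b).
double percent_higher(double a, double b) {
    assert(b != 0.0);
    return (a / b - 1.0) * 100.0;
}

// Percentage by which `a` falls below baseline `b` (positive when a < b).
double percent_lower(double a, double b) {
    assert(b != 0.0);
    return (1.0 - a / b) * 100.0;
}
```

For example, `percent_lower(80.60, 119.20)` gives about 32.4 and `percent_higher(119.20, 80.60)` about 47.9, the two descriptions of the same p99.9 gap.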

HTTP Server

HTTP performance is nearly identical:

Multi-threaded HTTP Throughput:

Threads    Corosio      Asio        Winner
   1       89.0 K       89.3 K      Tie
   2       124.7 K      124.9 K     Tie
   4       200.3 K      210.5 K     Asio +5%
   8       342.0 K      334.7 K     Corosio +2%
  16       430.5 K      434.1 K     Tie

Summary

Component               Assessment
Handler Dispatch        Corosio 16-61% faster
Socket Throughput       Equivalent
Socket Latency (mean)   Equivalent
Socket Latency (tail)   Asio 17% lower p99, 32% lower p99.9
HTTP Throughput         Equivalent
HTTP Latency            Equivalent

Conclusions

Summary

Corosio delivers equivalent or better performance compared to Asio coroutines in most scenarios, with tail latency the main exception:

  • Handler dispatch: Corosio is 16-61% faster

  • Socket I/O: Essentially identical throughput and mean latency

  • HTTP server: Equivalent throughput and latency

  • Tail latency: Asio has about 17% lower p99 in the 64-byte ping-pong test

Recommendations

Workload                        Recommendation
Handler-intensive workloads     Corosio (16-61% faster dispatch)
Socket I/O                      Either (equivalent)
HTTP servers                    Either (equivalent)
Low tail latency requirements   Asio (lower ping-pong p99 and p99.9)

Key Takeaway

For coroutine-based async programming on Windows (IOCP), Corosio provides equivalent socket I/O performance while delivering significantly faster handler dispatch. The choice between the two may come down to API preference and ecosystem considerations rather than raw performance.

Appendix: Raw Data

Corosio Results

Backend: iocp

=== Single-threaded Handler Post ===
  Handlers:    5000000
  Elapsed:     3.687 s
  Throughput:  1.36 Mops/s

=== Multi-threaded Scaling ===
  Handlers per test: 5000000

  1 thread(s): 2.95 Mops/s
  2 thread(s): 2.84 Mops/s (speedup: 0.96x)
  4 thread(s): 3.87 Mops/s (speedup: 1.31x)
  8 thread(s): 3.47 Mops/s (speedup: 1.17x)

=== Interleaved Post/Run ===
  Iterations:        50000
  Handlers/iter:     100
  Total handlers:    5000000
  Elapsed:           2.128 s
  Throughput:        2.35 Mops/s

=== Concurrent Post and Run ===
  Threads:           4
  Handlers/thread:   1250000
  Total handlers:    5000000
  Elapsed:           2.159 s
  Throughput:        2.32 Mops/s

=== Unidirectional Throughput ===
  Buffer size: 1024 bytes, Transfer: 4096 MB
    Throughput: 215.26 MB/s

  Buffer size: 4096 bytes, Transfer: 4096 MB
    Throughput: 736.99 MB/s

  Buffer size: 16384 bytes, Transfer: 4096 MB
    Throughput: 2.52 GB/s

  Buffer size: 65536 bytes, Transfer: 4096 MB
    Throughput: 6.29 GB/s

=== Bidirectional Throughput ===
  Buffer size: 1024 bytes: 211.41 MB/s (combined)
  Buffer size: 4096 bytes: 737.69 MB/s (combined)
  Buffer size: 16384 bytes: 2.43 GB/s (combined)
  Buffer size: 65536 bytes: 6.24 GB/s (combined)

=== Ping-Pong Round-Trip Latency ===
  1 byte:    mean=9.56 us, p50=8.90 us, p99=15.40 us
  64 bytes:  mean=9.62 us, p50=9.00 us, p99=16.70 us
  1024 bytes: mean=9.71 us, p50=9.10 us, p99=14.20 us

=== Concurrent Socket Pairs Latency ===
  1 pair:   mean=9.57 us, p99=16.60 us
  4 pairs:  mean=40.03 us, p99=84.40 us
  16 pairs: mean=162.44 us, p99=354.57 us

=== HTTP Single Connection ===
  Throughput: 94.21 Kops/s
  Latency: mean=10.59 us, p99=19.50 us

=== HTTP Concurrent Connections (single thread) ===
  1 conn:   91.33 Kops/s, mean=10.92 us, p99=25.70 us
  4 conns:  91.88 Kops/s, mean=43.50 us, p99=97.05 us
  16 conns: 90.39 Kops/s, mean=176.98 us, p99=377.09 us
  32 conns: 87.96 Kops/s, mean=363.77 us, p99=858.13 us

=== HTTP Multi-threaded (32 connections) ===
  1 thread:  89.02 Kops/s, mean=359.41 us, p99=720.81 us
  2 threads: 124.65 Kops/s, mean=256.63 us, p99=416.91 us
  4 threads: 200.29 Kops/s, mean=159.66 us, p99=279.01 us
  8 threads: 342.00 Kops/s, mean=93.35 us, p99=117.70 us
  16 threads: 430.51 Kops/s, mean=73.64 us, p99=90.10 us

Asio Results

=== Single-threaded Handler Post (Asio) ===
  Handlers:    5000000
  Elapsed:     5.903 s
  Throughput:  847.04 Kops/s

=== Multi-threaded Scaling (Asio Coroutines) ===
  Handlers per test: 5000000

  1 thread(s): 1.49 Mops/s
  2 thread(s): 2.13 Mops/s (speedup: 1.43x)
  4 thread(s): 2.95 Mops/s (speedup: 1.98x)
  8 thread(s): 3.00 Mops/s (speedup: 2.01x)

=== Interleaved Post/Run (Asio Coroutines) ===
  Iterations:        50000
  Handlers/iter:     100
  Total handlers:    5000000
  Elapsed:           2.921 s
  Throughput:        1.71 Mops/s

=== Concurrent Post and Run (Asio Coroutines) ===
  Threads:           4
  Handlers/thread:   1250000
  Total handlers:    5000000
  Elapsed:           3.475 s
  Throughput:        1.44 Mops/s

=== Unidirectional Throughput (Asio) ===
  Buffer size: 1024 bytes: 206.19 MB/s
  Buffer size: 4096 bytes: 710.17 MB/s
  Buffer size: 16384 bytes: 2.52 GB/s
  Buffer size: 65536 bytes: 6.34 GB/s

=== Bidirectional Throughput (Asio) ===
  Buffer size: 1024 bytes: 209.36 MB/s (combined)
  Buffer size: 4096 bytes: 722.13 MB/s (combined)
  Buffer size: 16384 bytes: 2.50 GB/s (combined)
  Buffer size: 65536 bytes: 6.25 GB/s (combined)

=== Ping-Pong Round-Trip Latency (Asio) ===
  1 byte:    mean=9.74 us, p50=9.20 us, p99=13.60 us
  64 bytes:  mean=9.68 us, p50=9.20 us, p99=13.90 us
  1024 bytes: mean=10.03 us, p50=9.50 us, p99=19.10 us

=== Concurrent Socket Pairs Latency (Asio) ===
  1 pair:   mean=9.89 us, p99=17.50 us
  4 pairs:  mean=39.79 us, p99=73.85 us
  16 pairs: mean=165.59 us, p99=369.66 us

=== HTTP Single Connection ===
  Throughput: 91.45 Kops/s
  Latency: mean=10.90 us, p99=23.00 us

=== HTTP Multi-threaded (32 connections) ===
  1 thread:  89.25 Kops/s, mean=358.52 us, p99=742.29 us
  2 threads: 124.91 Kops/s, mean=256.10 us, p99=439.69 us
  4 threads: 210.46 Kops/s, mean=151.93 us, p99=205.49 us
  8 threads: 334.71 Kops/s, mean=95.35 us, p99=121.33 us
  16 threads: 434.07 Kops/s, mean=73.13 us, p99=88.80 us