To recap, usched is an experimental (and very simple, 120LOC) coroutine implementation different from stackful and stackless models: coroutines are executed on the native stack of the caller and when the coroutine is about to block its stack is copied into a separately allocated (e.g., in the heap) buffer. The buffer is copied back onto the native stack when the coroutine is ready to resume.
I added a new scheduler ll.c that distributes coroutines across multiple native threads and then does lockless scheduling within each thread. In the benchmark (the same as in the previous post), each coroutine in the communicating cycle belongs to the same thread.
Results are amazing: usched actually beats compiler-assisted C++ coroutines by a large margin. The horizontal axis is the number of coroutines in the test (logarithmic) and the vertical axis is coroutine wakeup-wait operations per second (1 == 1e8 op/sec).
16 | 32 | 64 | 400 | 800 | 4000 | 8000 | 40000 | 80000 | 400000 | 800000 | 4M | 8M | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GO | 0.077 | 0.127 | 0.199 | 0.326 | 0.323 | 0.285 | 0.228 | 0.142 | 0.199 | 0.305 | 0.303 | 0.286 | 0.268 |
C++ | 1.089 | 1.234 | 1.344 | 1.262 | 1.201 | 1.159 | 1.141 | 1.135 | 1.163 | 1.168 | 1.138 | 1.076 | 1.051 |
UL | 0.560 | 0.955 | 1.515 | 2.047 | 2.095 | 2.127 | 2.148 | 2.160 | 2.154 | 2.020 | 1.932 | 1.819 | 1.811 |
I only kept the most efficient implementation from every competing class: C++ for stackless, GO for stackful and usched for stackswap. See the full results in results.darwin
No comments:
Post a Comment