Executor Concept Design
Overview
This document describes the design of the Executor concept: the
interface through which coroutines are scheduled for execution. It
explains the relationship to Asio’s executor model, why dispatch
returns void, why defer was dropped, how executor_ref
achieves zero-allocation type erasure, and the I/O completion
pattern that motivates the entire design.
The Executor concept exists to answer one question: when a
coroutine is ready to run, where does it run? The concept
captures the rules for scheduling coroutine resumption, tracking
outstanding work for graceful shutdown, and accessing the
execution context that owns the executor. Every I/O awaitable in
Corosio — sockets, acceptors, timers, resolvers — depends on
this concept to dispatch completions back to the correct executor.
Definition
template<class E>
concept Executor =
    std::is_nothrow_copy_constructible_v<E> &&
    std::is_nothrow_move_constructible_v<E> &&
    requires(E& e, E const& ce, E const& ce2,
        std::coroutine_handle<> h)
    {
        { ce == ce2 } noexcept -> std::convertible_to<bool>;
        { ce.context() } noexcept;
        requires std::is_lvalue_reference_v<
            decltype(ce.context())> &&
            std::derived_from<
                std::remove_reference_t<
                    decltype(ce.context())>,
                execution_context>;
        { ce.on_work_started() } noexcept;
        { ce.on_work_finished() } noexcept;
        { ce.dispatch(h) };
        { ce.post(h) };
    };
An Executor provides exactly two operations on a coroutine
handle:
dispatch(h) — Execute If Safe
If the executor determines it is safe (e.g., the current thread
is already associated with the executor’s context), resumes the
coroutine inline via h.resume(). Otherwise, posts the coroutine
for later execution. Returns void.
post(h) — Always Queue
Queues the coroutine for later execution without ever executing it inline. Never blocks. Use when guaranteed asynchrony is required.
The remaining operations support context access, lifecycle management, and identity:
context() — Access the Execution Context
Returns an lvalue reference to the execution_context that
created this executor. The context provides service
infrastructure, frame allocators, and shutdown coordination.
Relationship to Asio
Kohlhoff’s Asio library established the executor-as-policy model that Capy inherits. As described in P0113R0:
An executor is to function execution as an allocator is to allocation.
Capy retains the core elements of this model:
- Work tracking. on_work_started/on_work_finished for graceful shutdown.
- dispatch/post duality. Execute-if-safe versus always-queue.
- execution_context base class. Service infrastructure and context lifetime management.
- Equality comparison. Same-executor optimization.
Capy removes or changes:
- defer. Dropped entirely. See Why Two Operations, Not Three.
- Function object submission. Asio executors accept arbitrary callables. Capy executors accept only std::coroutine_handle<>. This removes the need for allocator-aware function erasure and enables a simpler, cheaper type-erased wrapper (executor_ref).
- dispatch return type. Asio’s dispatch returns void for the same reason Capy’s does, but Capy also considered and rejected a coroutine_handle<> return for symmetric transfer. See Why dispatch Returns void.
The result is a concept that preserves Asio’s proven execution model while removing the machinery that a coroutine-native library does not need.
Why dispatch Returns void
An earlier design had dispatch return
std::coroutine_handle<> so that callers could use it for
symmetric transfer from await_suspend. This was rejected
because it violates a fundamental constraint of the I/O layer.
The Problem: Synchronous Completion During await_suspend
When an I/O awaitable initiates an operation inside
await_suspend, the I/O might complete immediately. If it does,
the completion path would call dispatch(h) while the caller’s
await_suspend is still on the call stack. If dispatch
resumed the coroutine inline via h.resume(), the coroutine
would execute while await_suspend has not yet returned — resuming a coroutine from inside await_suspend before the
suspension machinery completes risks undefined behavior.
The C++ standard describes the sequencing in [expr.await/5.1]:
If the result of await-ready is false, the coroutine is considered suspended. Then, await-suspend is evaluated.
Although the standard considers the coroutine suspended before
await_suspend is called, resuming it from within
await_suspend creates a nested resumption on the same call
stack. The resumed coroutine runs, potentially suspends again or
completes, and then control returns into the middle of
await_suspend. If the coroutine was destroyed during
resumption, await_suspend returns into a destroyed frame.
Why I/O Awaitables Return void or std::noop_coroutine()
To avoid this, all I/O awaitables return void or
std::noop_coroutine() from await_suspend. Both forms
guarantee that the caller is fully suspended and the call stack
has unwound before any completion handler can resume the
coroutine. The I/O operation is initiated during await_suspend,
but the completion is dispatched later — from the event loop,
after await_suspend has returned.
P0913R1
introduced the coroutine_handle<Z> return type for symmetric
transfer, which is the correct mechanism for coroutine-to-coroutine
control transfer (as used by task<T> internally). But I/O
awaitables cannot use it because the I/O completion is
asynchronous relative to await_suspend — it comes from the
reactor or proactor, not from the awaitable itself.
Consequence for dispatch
Since the primary consumer of dispatch is I/O completion — called after the coroutine is suspended, from the event loop — dispatch does not need to participate in symmetric transfer.
It calls h.resume() inline when safe and returns void. A
conforming implementation looks like:
void dispatch(std::coroutine_handle<> h) const
{
    if(ctx_.running_in_this_thread())
        h.resume();
    else
        post(h);
}
After dispatch returns, the state of h is unspecified. The
coroutine may have completed, been destroyed, or suspended at a
different point. Callers must not use h after calling
dispatch.
Why Two Operations, Not Three
Asio provides three submission methods: dispatch, post, and
defer. Capy provides only dispatch and post.
What defer Does
P0113R0
describes defer:
A defer operation is similar to a post operation… However, a defer operation also implies a relationship between the caller and the function object being submitted. It is intended for use when submitting a function object that represents a continuation of the caller.
The optimization this enables is thread-local queuing. When the
caller is already executing within the executor’s context,
defer saves the continuation to a thread-local queue instead
of the shared work queue. From P0113R0:
If the caller is executing within the thread pool, saves the function object to a thread-local queue. Once control returns to the thread pool, the function object is scheduled for execution as soon as possible.
Why Coroutines Make defer Redundant
In a callback-based library, when an asynchronous operation
completes, the completion handler must be submitted to the
executor as a function object. If the handler is the caller’s
continuation, defer tells the executor "this is my next step;
optimize accordingly."
In a coroutine-native library, this optimization is provided by the language itself. P0913R1 introduced symmetric transfer specifically to eliminate the need for queues and schedulers in coroutine-to-coroutine control transfer:
Currently Coroutines TS only supports asymmetric control transfer where suspend always returns control back to the current coroutine caller or resumer. In order to emulate symmetric coroutine to coroutine control transfer, one needs to build a queue and a scheduler.
When task<T>::await_suspend returns the parent’s coroutine
handle, the compiler performs a tail-call-like transfer directly
to the parent. No queue, no executor submission, no defer.
The optimization that defer provides through a runtime hint,
symmetric transfer provides through a compile-time guarantee.
Corosio confirms this in practice: its entire I/O layer — sockets, acceptors, timers, resolvers, signals — across all
three backends (epoll, IOCP, select) uses only dispatch and
post. No code path requires defer.
Why std::coroutine_handle<>, Not Typed Handles
The executor accepts std::coroutine_handle<> — the type-erased
handle — rather than std::coroutine_handle<P> for a specific
promise type P.
This decision has three consequences:
- Type erasure is possible. executor_ref wraps any executor behind a uniform interface. If dispatch and post were templated on the promise type, the vtable would need to be generic over all promise types, making type erasure impractical.
- Executor implementations are independent of coroutine internals. An executor schedules resumption. It does not need to know what the coroutine’s promise type is, what value it produces, or how it handles exceptions. The type-erased handle provides exactly the right interface: resume() and nothing else.
- I/O operation structures stay simple. Every pending I/O operation in Corosio stores two fields: std::coroutine_handle<> h and capy::executor_ref ex. Both are type-erased. The operation structure does not need to be templated on the coroutine’s promise type, which keeps the I/O backend code non-generic and out of headers.
Why Nothrow Copy and Move
The concept requires std::is_nothrow_copy_constructible_v<E>
and std::is_nothrow_move_constructible_v<E>.
Executors propagate through coroutine machinery at points where
exceptions cannot be handled: inside await_suspend, during
promise construction, and through type-erased wrappers like
executor_ref. An exception thrown from an executor copy at any
of these points would leave the coroutine in an unrecoverable
state — suspended but with no executor to resume it through.
The nothrow requirement eliminates this failure mode entirely. In practice, executors are lightweight handles — a pointer to the execution context and perhaps a strand pointer or a priority value. Nothrow copy and move are natural for such types. The requirement does not impose a burden; it documents what is already true of every reasonable executor implementation.
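For instance, a handle-like executor shape (illustrative, not a Capy type) satisfies the requirements automatically:

```cpp
#include <type_traits>

// Illustrative handle-like executor layout: a pointer to the context
// plus a small scheduling hint. Such types are trivially copyable,
// so the nothrow requirements hold by construction.
struct pool_executor_like
{
    void* ctx;    // would point at the execution context
    int priority; // e.g. a scheduling hint
};

static_assert(std::is_nothrow_copy_constructible_v<pool_executor_like>);
static_assert(std::is_nothrow_move_constructible_v<pool_executor_like>);
```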
Work Tracking, Shutdown, and Executor Validity
The on_work_started and on_work_finished operations serve
three roles.
Event Loop Lifetime
Work tracking is the mechanism by which the event loop knows
when to stop. When outstanding work reaches zero, run()
returns. This is not bookkeeping — it is the event loop’s
termination signal.
In Corosio, on_work_finished triggers stop() when the
atomic work count reaches zero:
void on_work_finished() noexcept
{
    if(outstanding_work_.fetch_sub(
        1, std::memory_order_acq_rel) == 1)
        stop();
}
Every run_async call increments the count. When the launched
task completes, the count decrements. When no tasks remain,
run() returns. Without work tracking in the executor, the
event loop would need a separate signaling mechanism or would
spin indefinitely.
Public Visibility
These operations are public, not private with friendship. The
reason is extensibility: work_guard is the library’s RAII
wrapper for work tracking, but users may define their own guards
with additional behaviors (logging, metrics, timeout detection).
Making work tracking private would require the library to grant
friendship to types it cannot anticipate.
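A user-defined guard built only on the public operations might look like this (a hypothetical type; the library ships its own work_guard separately):

```cpp
#include <cassert>

// Hypothetical RAII guard using only the public
// on_work_started/on_work_finished operations. Ex must satisfy the
// work-tracking portion of the Executor concept.
template<class Ex>
class scoped_work
{
    Ex ex_; // executors are cheap, nothrow-copyable handles

public:
    explicit scoped_work(Ex const& ex) noexcept
        : ex_(ex)
    {
        ex_.on_work_started(); // count this scope as outstanding work
    }

    ~scoped_work()
    {
        ex_.on_work_finished(); // release it on scope exit
    }

    scoped_work(scoped_work const&) = delete;
    scoped_work& operator=(scoped_work const&) = delete;
};

// Toy executor for demonstration: shares a counter through a pointer.
struct counting_executor
{
    int* count;
    void on_work_started() const noexcept { ++*count; }
    void on_work_finished() const noexcept { --*count; }
};
```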
Executor Validity
An executor becomes invalid when its context’s shutdown()
returns. After shutdown:
- dispatch, post, on_work_started, on_work_finished: undefined behavior.
- Copy, comparison, context(): valid until the context is destroyed.
This two-phase model exists because shutdown drains outstanding work. During the drain, executors must still be copyable (they are stored in pending operations) and comparable (for same-executor checks). Only the work-submission operations become invalid, because the context has stopped accepting new work.
Why context() Returns execution_context&
The context() operation returns a reference to the
execution_context base class, not the concrete derived type.
This serves two purposes:
- Type erasure. executor_ref can wrap any executor without knowing its context type. If context() returned a concrete type, the vtable would need a different return type per executor type.
- Service lookup. The execution_context base class provides use_service<T>() and make_service<T>(), which is sufficient for all runtime service discovery. I/O objects do not need the concrete context type to find their services.
Corosio demonstrates this pattern throughout its public API. I/O objects accept any executor and extract the context via the base class reference:
template<class Ex>
    requires capy::Executor<Ex>
explicit tcp_socket(Ex const& ex)
    : tcp_socket(ex.context())
{
}
The socket constructor receives execution_context& and looks
up the socket service. The concrete context type — epoll_context,
iocp_context, select_context — is irrelevant to the socket.
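The lookup mechanism itself can be modeled in a few lines (a toy sketch of the idea, not Capy's implementation): services are located by type through the base class alone, so the concrete context type never matters to I/O objects.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>

// Common base for all services stored in the context.
struct service
{
    virtual ~service() = default;
};

// Toy context: a type-keyed registry of services.
class execution_context
{
    std::map<std::type_index, std::unique_ptr<service>> services_;

public:
    template<class T>
    T& use_service()
    {
        auto& slot = services_[typeid(T)];
        if(!slot)
            slot = std::make_unique<T>(); // create on first use
        return static_cast<T&>(*slot);    // same instance thereafter
    }
};
```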
The executor_ref Design
executor_ref is a non-owning, type-erased wrapper for any
executor satisfying the Executor concept. It is the mechanism
by which I/O operations store and use executors without templates.
Two Pointers
The entire object is two pointers:
class executor_ref
{
    void const* ex_;                    // pointer to the executor
    detail::executor_vtable const* vt_; // pointer to the vtable
};
Two pointers fit in two registers. executor_ref can be passed
by value as cheaply as passing a pointer. No heap allocation, no
small-buffer optimization, no reference counting.
Why Not std::function or std::any
std::function and small-buffer-optimized type erasure wrappers
have overhead that executor usage cannot tolerate:
- Heap allocation. std::function may allocate when the callable exceeds the SBO threshold. Executor dispatch happens on every I/O completion; allocation on the hot path is unacceptable.
- Reference counting. std::shared_ptr-based wrappers add atomic reference count operations on every copy. Executors are copied frequently as they propagate through coroutine chains.
- Indirection. SBO wrappers store either inline data or a heap pointer, adding a branch on every operation.
executor_ref avoids all three. The vtable pointer goes directly
to a static constexpr structure in .rodata. One indirection,
no branches, no allocation.
Why Not C++ Virtual Functions
C++ virtual dispatch places the vtable pointer inside each
heap-allocated object. Every virtual call chases a pointer from
the object to its vtable, which may reside at an unpredictable
address in memory. When objects of different types are
interleaved on the heap, their vtable pointers point to
different locations in .rodata, defeating spatial prefetch and
polluting the instruction cache.
executor_ref separates the vtable from the object. The vtable
is a static constexpr structure — one per executor type,
shared by all instances of that type. Because most programs use
only one or two executor types (a thread pool executor and
perhaps a strand), the vtable stays hot in L1 cache. The
executor pointer and the vtable pointer sit adjacent in the
executor_ref object, so both are loaded in a single cache line.
Reference Semantics
executor_ref stores a pointer to the executor, not a copy. The
executor must outlive the executor_ref. This matches how
executors propagate through coroutine chains: the executor is
owned by the execution context (which outlives all coroutines
running on it), and executor_ref is a lightweight handle
passed through await_suspend and stored in I/O operation
structures.
The I/O Completion Pattern
The executor concept is designed around a single use case: I/O completion dispatch. This pattern is the reason the concept exists.
Capture at Initiation
When a coroutine co_awaits an I/O awaitable, the awaitable’s
await_suspend receives the caller’s executor and stores it
as executor_ref:
template<typename Ex>
auto await_suspend(
    std::coroutine_handle<> h,
    Ex const& ex) -> std::coroutine_handle<>
{
    // ex is captured as executor_ref in the operation
    return socket_.connect(h, ex, endpoint_, token_, &ec_);
}
The operation structure stores both the coroutine handle and the executor reference:
struct io_op : scheduler_op
{
    std::coroutine_handle<> h;
    capy::executor_ref ex;
    // ... error codes, buffers, etc.
};
Dispatch at Completion
When the I/O completes (from the reactor thread for epoll, the completion port for IOCP, or the select loop), the operation uses the stored executor to resume the coroutine:
void operator()() override
{
    // ... set error codes ...
    capy::executor_ref saved_ex(std::move(ex));
    std::coroutine_handle<> saved_h(std::move(h));
    impl_ptr.reset();
    saved_ex.dispatch(saved_h);
}
dispatch checks whether the current thread is already running
on the executor’s context. If so, the coroutine resumes inline.
If not, the coroutine is posted for later execution on the
correct context.
Platform Independence
This pattern is identical across all three Corosio backends:
epoll (Linux), IOCP (Windows), and select (POSIX fallback). The
executor concept and executor_ref provide the abstraction that
makes this possible. The backend-specific code deals with I/O
readiness or completion notification. The executor-specific code
deals with coroutine scheduling. The two concerns are cleanly
separated.
Why Not std::execution (P2300)
P2300 defines a sender/receiver model
where execution context flows backward from receiver to
sender via queries after connect():
task<int> async_work();
auto sndr = async_work();           // Frame allocated NOW
auto op = connect(sndr, receiver);  // Allocator available NOW -- too late
start(op);
For coroutines, this ordering is fatal. Coroutine frame
allocation happens before the coroutine body executes. The
compiler calls operator new first, then constructs the
promise, then begins execution. Any mechanism that provides the
allocator after the coroutine call — receiver queries,
await_transform, explicit method calls — arrives after the
frame is already allocated with the wrong (or default)
allocator.
Capy’s model flows context forward from launcher to task.
The run_async(ex, alloc)(my_task()) two-phase invocation sets
the thread-local allocator before the task expression is
evaluated, so operator new reads it in time. This is
described in detail in Run API.
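The ordering guarantee can be demonstrated without coroutines at all (a toy model with hypothetical names): since C++17, in f(args) the postfix expression f is sequenced before the arguments, so the first call of the two-phase form runs before the task expression is evaluated.

```cpp
#include <cassert>

// Toy model of forward-flowing context. The launcher's first call sets
// a thread_local BEFORE the task argument is evaluated, modeling how a
// coroutine's operator new would read the allocator in time.
thread_local int current_alloc_id = 0;

int observed_at_allocation = -1;

int my_task() // stands in for evaluating the task expression
{
    // a coroutine's operator new would read the thread_local here
    observed_at_allocation = current_alloc_id;
    return 0;
}

struct launcher
{
    void operator()(int /*task*/) const {} // phase two: receives the task
};

launcher run_async_like(int alloc_id) // phase one: bind context first
{
    current_alloc_id = alloc_id;
    return {};
}
```

Invoking run_async_like(42)(my_task()) evaluates run_async_like(42) before my_task(), so the "frame allocation" observes the bound value.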
The same forward-flowing model applies to executors. The
launcher binds the executor before the task runs. The task’s
promise stores the executor and propagates it to nested
awaitables via await_transform. Context flows from caller to
callee at every level, never backward.
Conforming Signatures
A minimal executor implementation:
class my_executor
{
public:
    execution_context& context() const noexcept;
    void on_work_started() const noexcept;
    void on_work_finished() const noexcept;
    void dispatch(std::coroutine_handle<> h) const;
    void post(std::coroutine_handle<> h) const;
    bool operator==(my_executor const&) const noexcept;
};
Summary
The Executor concept provides dispatch and post for
coroutine scheduling, work tracking for event loop lifetime, and
context() for service access. The design descends from Asio’s
executor model but is adapted for coroutines: defer is
replaced by symmetric transfer, function objects are replaced by
std::coroutine_handle<>, and dispatch returns void
because I/O completions are dispatched after suspension, not
during it.
executor_ref type-erases any executor into two pointers,
enabling platform-independent I/O completion dispatch with zero
allocation and predictable cache behavior. The
capture-at-initiation / dispatch-at-completion pattern is the
fundamental use case the concept serves.