Executor Concept Design

Overview

This document describes the design of the Executor concept: the interface through which coroutines are scheduled for execution. It explains the relationship to Asio’s executor model, why dispatch returns void, why defer was dropped, how executor_ref achieves zero-allocation type erasure, and the I/O completion pattern that motivates the entire design.

The Executor concept exists to answer one question: when a coroutine is ready to run, where does it run? The concept captures the rules for scheduling coroutine resumption, tracking outstanding work for graceful shutdown, and accessing the execution context that owns the executor. Every I/O awaitable in Corosio — sockets, acceptors, timers, resolvers — depends on this concept to dispatch completions back to the correct executor.

Definition

template<class E>
concept Executor =
    std::is_nothrow_copy_constructible_v<E> &&
    std::is_nothrow_move_constructible_v<E> &&
    requires(E& e, E const& ce, E const& ce2,
             std::coroutine_handle<> h)
    {
        { ce == ce2 } noexcept -> std::convertible_to<bool>;
        { ce.context() } noexcept;
        requires std::is_lvalue_reference_v<
            decltype(ce.context())> &&
            std::derived_from<
                std::remove_reference_t<
                    decltype(ce.context())>,
                execution_context>;
        { ce.on_work_started() } noexcept;
        { ce.on_work_finished() } noexcept;

        { ce.dispatch(h) };
        { ce.post(h) };
    };

An Executor provides exactly two operations on a coroutine handle:

dispatch(h) — Execute If Safe

If the executor determines it is safe (e.g., the current thread is already associated with the executor’s context), resumes the coroutine inline via h.resume(). Otherwise, posts the coroutine for later execution. Returns void.

post(h) — Always Queue

Queues the coroutine for later execution without ever executing it inline. Never blocks. Use when guaranteed asynchrony is required.

The remaining operations support context access, lifecycle management, and identity:

context() — Access the Execution Context

Returns an lvalue reference to the execution_context that created this executor. The context provides service infrastructure, frame allocators, and shutdown coordination.

on_work_started() / on_work_finished() — Track Work

Paired calls that track outstanding work. When the count reaches zero, the context’s event loop (run()) returns. These calls must be balanced: each on_work_started must have a matching on_work_finished.

operator== — Equality Comparison

Two executors are equal if they submit work to the same destination. This enables the same-executor optimization: when a completion’s executor matches the caller’s, the dispatch can skip the indirection.

Relationship to Asio

Kohlhoff’s Asio library established the executor-as-policy model that Capy inherits. As described in P0113R0:

An executor is to function execution as an allocator is to allocation.

Capy retains the core elements of this model:

  • Work tracking. on_work_started / on_work_finished for graceful shutdown.

  • dispatch / post duality. Execute-if-safe versus always-queue.

  • execution_context base class. Service infrastructure and context lifetime management.

  • Equality comparison. Same-executor optimization.

Capy removes or changes:

  • defer. Dropped entirely. See Why Two Operations, Not Three.

  • Function object submission. Asio executors accept arbitrary callables. Capy executors accept only std::coroutine_handle<>. This removes the need for allocator-aware function erasure and enables a simpler, cheaper type-erased wrapper (executor_ref).

  • dispatch return type. Capy considered and rejected a coroutine_handle<> return for symmetric transfer, retaining void as in Asio. See Why dispatch Returns void.

The result is a concept that preserves Asio’s proven execution model while removing the machinery that a coroutine-native library does not need.

Why dispatch Returns void

An earlier design had dispatch return std::coroutine_handle<> so that callers could use it for symmetric transfer from await_suspend. This was rejected because it violates a fundamental constraint of the I/O layer.

The Problem: Synchronous Completion During await_suspend

When an I/O awaitable initiates an operation inside await_suspend, the I/O might complete immediately. If it does, the completion path would call dispatch(h) while the caller’s await_suspend is still on the call stack. If dispatch resumed the coroutine inline via h.resume(), the coroutine would execute while await_suspend has not yet returned — resuming a coroutine from inside await_suspend before the suspension machinery completes risks undefined behavior.

The C++ standard describes the sequencing in [expr.await/5.1]:

If the result of await-ready is false, the coroutine is considered suspended. Then, await-suspend is evaluated.

Although the standard considers the coroutine suspended before await_suspend is called, resuming it from within await_suspend creates a nested resumption on the same call stack. The resumed coroutine runs, potentially suspends again or completes, and then control returns into the middle of await_suspend. If the coroutine was destroyed during resumption, await_suspend returns into a destroyed frame.

Why I/O Awaitables Return void or std::noop_coroutine()

To avoid this, all I/O awaitables return void or std::noop_coroutine() from await_suspend. Both forms guarantee that the caller is fully suspended and the call stack has unwound before any completion handler can resume the coroutine. The I/O operation is initiated during await_suspend, but the completion is dispatched later — from the event loop, after await_suspend has returned.

P0913R1 introduced the coroutine_handle return type for await_suspend, which is the correct mechanism for coroutine-to-coroutine control transfer (as used by task<T> internally). But I/O awaitables cannot use it because the I/O completion is asynchronous relative to await_suspend — it comes from the reactor or proactor, not from the awaitable itself.

Consequence for dispatch

Since the primary consumer of dispatch is I/O completion — called after the coroutine is suspended, from the event loop — dispatch does not need to participate in symmetric transfer. It calls h.resume() inline when safe and returns void. A conforming implementation looks like:

void dispatch(std::coroutine_handle<> h) const
{
    if(ctx_.running_in_this_thread())
        h.resume();
    else
        post(h);
}

After dispatch returns, the state of h is unspecified. The coroutine may have completed, been destroyed, or suspended at a different point. Callers must not use h after calling dispatch.

Why Two Operations, Not Three

Asio provides three submission methods: dispatch, post, and defer. Capy provides only dispatch and post.

What defer Does

P0113R0 describes defer:

A defer operation is similar to a post operation… However, a defer operation also implies a relationship between the caller and the function object being submitted. It is intended for use when submitting a function object that represents a continuation of the caller.

The optimization this enables is thread-local queuing. When the caller is already executing within the executor’s context, defer saves the continuation to a thread-local queue instead of the shared work queue. From P0113R0:

If the caller is executing within the thread pool, saves the function object to a thread-local queue. Once control returns to the thread pool, the function object is scheduled for execution as soon as possible.

Why Coroutines Make defer Redundant

In a callback-based library, when an asynchronous operation completes, the completion handler must be submitted to the executor as a function object. If the handler is the caller’s continuation, defer tells the executor "this is my next step; optimize accordingly."

In a coroutine-native library, this optimization is provided by the language itself. P0913R1 introduced symmetric transfer specifically to eliminate the need for queues and schedulers in coroutine-to-coroutine control transfer:

Currently Coroutines TS only supports asymmetric control transfer where suspend always returns control back to the current coroutine caller or resumer. In order to emulate symmetric coroutine to coroutine control transfer, one needs to build a queue and a scheduler.

When task<T>::await_suspend returns the parent’s coroutine handle, the compiler performs a tail-call-like transfer directly to the parent. No queue, no executor submission, no defer. The optimization that defer provides through a runtime hint, symmetric transfer provides through a compile-time guarantee.

Corosio confirms this in practice: its entire I/O layer — sockets, acceptors, timers, resolvers, signals — across all three backends (epoll, IOCP, select) uses only dispatch and post. No code path requires defer.

Why std::coroutine_handle<>, Not Typed Handles

The executor accepts std::coroutine_handle<> — the type-erased handle — rather than std::coroutine_handle<P> for a specific promise type P.

This decision has three consequences:

  • Type erasure is possible. executor_ref wraps any executor behind a uniform interface. If dispatch and post were templated on the promise type, the vtable would need to be generic over all promise types, making type erasure impractical.

  • Executor implementations are independent of coroutine internals. An executor schedules resumption. It does not need to know what the coroutine’s promise type is, what value it produces, or how it handles exceptions. The type-erased handle provides exactly the right interface: resume() and nothing else.

  • I/O operation structures stay simple. Every pending I/O operation in Corosio stores two fields: std::coroutine_handle<> h and capy::executor_ref ex. Both are type-erased. The operation structure does not need to be templated on the coroutine’s promise type, which keeps the I/O backend code non-generic and out of headers.

Why Nothrow Copy and Move

The concept requires std::is_nothrow_copy_constructible_v<E> and std::is_nothrow_move_constructible_v<E>.

Executors propagate through coroutine machinery at points where exceptions cannot be handled: inside await_suspend, during promise construction, and through type-erased wrappers like executor_ref. An exception thrown from an executor copy at any of these points would leave the coroutine in an unrecoverable state — suspended but with no executor to resume it through.

The nothrow requirement eliminates this failure mode entirely. In practice, executors are lightweight handles — a pointer to the execution context and perhaps a strand pointer or a priority value. Nothrow copy and move are natural for such types. The requirement does not impose a burden; it documents what is already true of every reasonable executor implementation.

Work Tracking, Shutdown, and Executor Validity

The on_work_started and on_work_finished operations serve three roles.

Event Loop Lifetime

Work tracking is the mechanism by which the event loop knows when to stop. When outstanding work reaches zero, run() returns. This is not bookkeeping — it is the event loop’s termination signal.

In Corosio, on_work_finished triggers stop() when the atomic work count reaches zero:

void on_work_finished() noexcept
{
    if(outstanding_work_.fetch_sub(
        1, std::memory_order_acq_rel) == 1)
        stop();
}

Every run_async call increments the count. When the launched task completes, the count decrements. When no tasks remain, run() returns. Without work tracking in the executor, the event loop would need a separate signaling mechanism or would spin indefinitely.
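The increment side is the mirror image. A sketch assuming the same atomic counter as the on_work_finished shown above (the stopped_ flag here is a stand-in for the event loop's stop()):

```cpp
#include <atomic>

// Sketch of the paired counter, assuming the shape shown above.
struct work_count
{
    std::atomic<long> outstanding_work_{0};
    bool stopped_ = false;

    void on_work_started() noexcept
    {
        // Relaxed suffices: only atomicity is needed on the way up;
        // ordering is established by the acq_rel decrement below.
        outstanding_work_.fetch_add(1, std::memory_order_relaxed);
    }

    void on_work_finished() noexcept
    {
        if(outstanding_work_.fetch_sub(
            1, std::memory_order_acq_rel) == 1)
            stop();
    }

    void stop() noexcept { stopped_ = true; }  // stand-in for run() returning
};
```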

Public Visibility

These operations are public, not private with friendship. The reason is extensibility: work_guard is the library’s RAII wrapper for work tracking, but users may define their own guards with additional behaviors (logging, metrics, timeout detection). Making work tracking private would require the library to grant friendship to types it cannot anticipate.
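Because the operations are public, a user-defined guard needs nothing from the library. A minimal sketch of such a guard (hypothetical shape; Capy's work_guard may differ), usable with any type exposing the paired calls:

```cpp
// RAII guard: balances on_work_started with on_work_finished.
template<class Ex>
class scoped_work
{
    Ex ex_;

public:
    explicit scoped_work(Ex const& ex) noexcept
        : ex_(ex)
    {
        ex_.on_work_started();     // count work on acquisition
    }

    ~scoped_work()
    {
        ex_.on_work_finished();    // guaranteed matching call
    }

    scoped_work(scoped_work const&) = delete;
    scoped_work& operator=(scoped_work const&) = delete;
};
```

A user's guard could add logging or metrics around the same two calls without any friendship grant.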

Executor Validity

An executor becomes invalid when its context’s shutdown() returns. After shutdown:

  • dispatch, post, on_work_started, on_work_finished: undefined behavior.

  • Copy, comparison, context(): valid until the context is destroyed.

This two-phase model exists because shutdown drains outstanding work. During the drain, executors must still be copyable (they are stored in pending operations) and comparable (for same-executor checks). Only the work-submission operations become invalid, because the context has stopped accepting new work.

Why context() Returns execution_context&

The context() operation returns a reference to the execution_context base class, not the concrete derived type.

This serves two purposes:

  • Type erasure. executor_ref can wrap any executor without knowing its context type. If context() returned a concrete type, the vtable would need a different return type per executor type.

  • Service lookup. The execution_context base class provides use_service<T>() and make_service<T>(), which is sufficient for all runtime service discovery. I/O objects do not need the concrete context type to find their services.

Corosio demonstrates this pattern throughout its public API. I/O objects accept any executor and extract the context via the base class reference:

template<class Ex>
    requires capy::Executor<Ex>
explicit tcp_socket(Ex const& ex)
    : tcp_socket(ex.context())
{
}

The socket constructor receives execution_context& and looks up the socket service. The concrete context type — epoll_context, iocp_context, select_context — is irrelevant to the socket.

The executor_ref Design

executor_ref is a non-owning, type-erased wrapper for any executor satisfying the Executor concept. It is the mechanism by which I/O operations store and use executors without templates.

Two Pointers

The entire object is two pointers:

class executor_ref
{
    void const* ex_;                       // pointer to the executor
    detail::executor_vtable const* vt_;    // pointer to the vtable
};

Two pointers fit in two registers. executor_ref can be passed by value as cheaply as passing a pointer. No heap allocation, no small-buffer optimization, no reference counting.

Why Not std::function or std::any

std::function and small-buffer-optimized type erasure wrappers have overhead that executor usage cannot tolerate:

  • Heap allocation. std::function may allocate when the callable exceeds the SBO threshold. Executor dispatch happens on every I/O completion — allocation on the hot path is unacceptable.

  • Reference counting. std::shared_ptr-based wrappers add atomic reference count operations on every copy. Executors are copied frequently as they propagate through coroutine chains.

  • Indirection. SBO wrappers store either inline data or a heap pointer, adding a branch on every operation.

executor_ref avoids all three. The vtable pointer goes directly to a static constexpr structure in .rodata. One indirection, no branches, no allocation.

Why Not C++ Virtual Functions

C++ virtual dispatch places the vtable pointer inside each heap-allocated object. Every virtual call chases a pointer from the object to its vtable, which may reside at an unpredictable address in memory. When objects of different types are interleaved on the heap, their vtable pointers point to different locations in .rodata, defeating spatial prefetch and polluting the instruction cache.

executor_ref separates the vtable from the object. The vtable is a static constexpr structure — one per executor type, shared by all instances of that type. Because most programs use only one or two executor types (a thread pool executor and perhaps a strand), the vtable stays hot in L1 cache. The executor pointer and the vtable pointer sit adjacent in the executor_ref object, so both are loaded in a single cache line.

Reference Semantics

executor_ref stores a pointer to the executor, not a copy. The executor must outlive the executor_ref. This matches how executors propagate through coroutine chains: the executor is owned by the execution context (which outlives all coroutines running on it), and executor_ref is a lightweight handle passed through await_suspend and stored in I/O operation structures.

The I/O Completion Pattern

The executor concept is designed around a single use case: I/O completion dispatch. This pattern is the reason the concept exists.

Capture at Initiation

When a coroutine co_awaits an I/O awaitable, the awaitable’s await_suspend receives the caller’s executor and stores it as executor_ref:

template<typename Ex>
auto await_suspend(
    std::coroutine_handle<> h,
    Ex const& ex) -> std::coroutine_handle<>
{
    // ex is captured as executor_ref in the operation
    return socket_.connect(h, ex, endpoint_, token_, &ec_);
}

The operation structure stores both the coroutine handle and the executor reference:

struct io_op : scheduler_op
{
    std::coroutine_handle<> h;
    capy::executor_ref ex;
    // ... error codes, buffers, etc.
};

Dispatch at Completion

When the I/O completes (from the reactor thread for epoll, the completion port for IOCP, or the select loop), the operation uses the stored executor to resume the coroutine:

void operator()() override
{
    // ... set error codes ...
    capy::executor_ref saved_ex(std::move(ex));
    std::coroutine_handle<> saved_h(std::move(h));
    impl_ptr.reset();
    saved_ex.dispatch(saved_h);
}

dispatch checks whether the current thread is already running on the executor’s context. If so, the coroutine resumes inline. If not, the coroutine is posted for later execution on the correct context.

Platform Independence

This pattern is identical across all three Corosio backends: epoll (Linux), IOCP (Windows), and select (POSIX fallback). The executor concept and executor_ref provide the abstraction that makes this possible. The backend-specific code deals with I/O readiness or completion notification. The executor-specific code deals with coroutine scheduling. The two concerns are cleanly separated.

Why Not std::execution (P2300)

P2300 defines a sender/receiver model where execution context flows backward from receiver to sender via queries after connect():

task<int> async_work();              // declaration only
auto sndr = async_work();            // frame allocated NOW
auto op = connect(sndr, receiver);   // allocator available NOW
start(op);                           //   -- too late

For coroutines, this ordering is fatal. Coroutine frame allocation happens before the coroutine body executes. The compiler calls operator new first, then constructs the promise, then begins execution. Any mechanism that provides the allocator after the coroutine call — receiver queries, await_transform, explicit method calls — arrives after the frame is already allocated with the wrong (or default) allocator.

Capy’s model flows context forward from launcher to task. The run_async(ex, alloc)(my_task()) two-phase invocation sets the thread-local allocator before the task expression is evaluated, so operator new reads it in time. This is described in detail in Run API.
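The ordering constraint can be demonstrated with a stand-alone analogy (hypothetical names; this is not Capy's run_async): phase one publishes the allocator in a thread_local before the "task" expression is evaluated, so its operator new reads it in time.

```cpp
#include <cstddef>
#include <new>

thread_local int* current_arena = nullptr;  // stand-in for the bound allocator
int default_allocs = 0;
int arena_allocs = 0;

// Stand-in for a coroutine frame: its operator new runs before any
// "body" could receive context, just as with a real coroutine.
struct frame
{
    static void* operator new(std::size_t n)
    {
        if(current_arena) ++arena_allocs;
        else ++default_allocs;
        return ::operator new(n);
    }
    static void operator delete(void* p) { ::operator delete(p); }
};

auto run_with(int* arena)
{
    current_arena = arena;            // phase one: bind BEFORE evaluation
    return [](frame* f)
    {
        current_arena = nullptr;      // phase two: frame already allocated
        delete f;
    };
}
```

Because the postfix expression run_with(&arena) is sequenced before its argument, the binding is visible when `new frame` runs, mirroring how run_async(ex, alloc)(my_task()) sets context before my_task() is evaluated.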

The same forward-flowing model applies to executors. The launcher binds the executor before the task runs. The task’s promise stores the executor and propagates it to nested awaitables via await_transform. Context flows from caller to callee at every level, never backward.

Conforming Signatures

A minimal executor implementation:

class my_executor
{
public:
    execution_context& context() const noexcept;

    void on_work_started() const noexcept;
    void on_work_finished() const noexcept;

    void dispatch(std::coroutine_handle<> h) const;
    void post(std::coroutine_handle<> h) const;

    bool operator==(my_executor const&) const noexcept;
};

Summary

The Executor concept provides dispatch and post for coroutine scheduling, work tracking for event loop lifetime, and context() for service access. The design descends from Asio’s executor model but is adapted for coroutines: defer is replaced by symmetric transfer, function objects are replaced by std::coroutine_handle<>, and dispatch returns void because I/O completions are dispatched after suspension, not during it.

executor_ref type-erases any executor into two pointers, enabling platform-independent I/O completion dispatch with zero allocation and predictable cache behavior. The capture-at-initiation / dispatch-at-completion pattern is the fundamental use case the concept serves.