This document explains the internal design of libtealet, focusing on the stack-slicing technique and memory optimization strategies.
Stack Direction Terminology
⚠️ Important Concept: libtealet uses platform-agnostic "near" and "far" terminology for stack boundaries.
The Problem
C stacks grow in different directions on different platforms:
- Descending stacks (most common: x86, ARM): Stack pointer decreases as functions are called
- Stack "bottom" = high memory address
- Stack "top" = low memory address (where SP currently points)
- Ascending stacks (rare): Stack pointer increases
- Stack "bottom" = low memory address
- Stack "top" = high memory address
The traditional "top/bottom" metaphor is ambiguous:
- In the stack-of-cards metaphor: "bottom" = oldest (first), "top" = newest (last)
- In memory addresses: depends on whether stack grows up or down
The Solution: Near and Far
libtealet uses direction-neutral terminology:
- Near boundary: The current stack position (where the stack pointer is now)
- Closest to the active function's frame
- Changes as the stack grows/shrinks
- In a descending stack: lower address
- In an ascending stack: higher address
- Far boundary: The stack limit (where the stack started or cannot grow beyond)
- Furthest from the current position
- Fixed reference point for each tealet
- In a descending stack: higher address (original entry point)
- In an ascending stack: lower address
Mnemonic: Think of a telescope extending:
- Near = the eyepiece you're looking through (current position)
- Far = the distant objective lens (fixed reference point)
Example on x86-64 (descending stack):
High addresses → 0x7fff8000 ← Far boundary (main entry, stack "bottom")
0x7fff7f00 ← Function A's frame
0x7fff7e00 ← Function B's frame
Low addresses → 0x7fff7d00 ← Near boundary (current SP, stack "top")
↑
Stack grows downward
Same logical stack on ascending platform:
Low addresses → 0x1000 ← Far boundary (main entry)
0x1100 ← Function A's frame
0x1200 ← Function B's frame
High addresses → 0x1300 ← Near boundary (current SP)
↑
Stack grows upward
In both cases:
- Near = current execution point
- Far = original entry point
- Saved stack = memory between near and far
This abstraction allows the same algorithm to work regardless of platform stack direction.
Overview
libtealet implements symmetric coroutines through stack-slicing: saving portions of the C execution stack to the heap and restoring them later. The innovation lies in incremental stack saving that minimizes memory usage.
Core Data Structures
tealet_chunk_t - Stack Segment
typedef struct tealet_chunk_t {
struct tealet_chunk_t *next;
char *stack_near;
size_t size;
char data[1];
} tealet_chunk_t;
A chunk represents a contiguous saved stack segment. Multiple chunks can be chained together.
tealet_stack_t - Complete Saved Stack
typedef struct tealet_stack_t {
int refcount;
struct tealet_stack_t **prev;
struct tealet_stack_t *next;
char *stack_far;
size_t saved;
struct tealet_chunk_t chunk;
} tealet_stack_t;
Reference counting allows stack sharing when duplicating tealets.
Doubly-linked list (prev/next) tracks partially-saved stacks in the g_prev chain.
tealet_sub_t - Tealet Instance
typedef struct tealet_sub_t {
char *stack_far;
tealet_stack_t *stack;
int id;
} tealet_sub_t;
Stack states:
NULL: Tealet is currently running (on the C stack)
(tealet_stack_t*)-1: Defunct (corrupted, unusable)
tealet_stack_t*: Saved stack data
stack_far indicates:
- Valid pointer: Normal tealet (bounded stack extent)
NULL: Tealet is exiting
STACKMAN_SP_FURTHEST: Unbounded stack (main tealet)
The main tealet uses STACKMAN_SP_FURTHEST because it runs on the process's original C stack, which could theoretically extend all the way to the start of the process. This special value means "save as much as needed" without a predefined limit. When computing stack statistics, tealets with STACKMAN_SP_FURTHEST have their naive size calculated from actual saved chunks rather than the full theoretical extent (which would be the entire address space).
tealet_main_t - Master Tealet
typedef struct tealet_main_t {
tealet_sub_t base;
void *g_user;
tealet_sub_t *g_current;
tealet_sub_t *g_previous;
tealet_sub_t *g_target;
void *g_arg;
tealet_stack_t *g_prev;
tealet_sr_e g_sw;
int g_flags;
int g_tealets, g_counter;
size_t g_extrasize;
} tealet_main_t;
The g_prev field is the linchpin of the memory optimization strategy.
Stack-Chaining Optimization
The Problem
When tealet A switches to tealet B, A's stack must be saved. But how much?
Naive approach: Save A's entire stack
- Wastes memory (most of stack may be unused)
- Slow (unnecessary copying)
Challenge: A might later switch to tealet C that's even deeper on the stack.
The Solution: Incremental Saving with Chaining
libtealet saves stacks incrementally based on need:
- When A switches to B, save A's stack only up to B's far boundary
- Link A's partially-saved stack into the
g_prev chain
- If A later switches to C (deeper than B), grow A's saved stack
Memory savings: Only the differential portions are saved, not complete stack copies.
Visual Example
High Memory (stack bottom)
↑
| STACKMAN_SP_FURTHEST (main's far boundary)
| ┌─────────────────┐
| │ Main stack │
| └─────────────────┘
├─ A's stack_far
| ┌─────────────────┐
| │ A's stack │ ← Created tealet A here
| └─────────────────┘
├─ B's stack_far
| ┌─────────────────┐
| │ B's stack │ ← Created tealet B deeper
| └─────────────────┘
├─ C's stack_far
| ┌─────────────────┐
| │ C's stack │ ← Created tealet C even deeper
| └─────────────────┘
↓
Low Memory (stack top)
Switch Sequence Example
Step 1: Main → A
Action: Main switches to A
Saved: Nothing (A is new, becomes active)
g_prev: Empty
Step 2: A → B
Action: A switches to B
Saved: A's stack from current position up to B's stack_far
g_prev: [A] (partially saved)
A's tealet_stack_t:
chunk.size = distance(A's position, B's stack_far)
saved = chunk.size
Step 3: B → A
Action: B switches back to A
Saved: B's stack up to A's stack_far
g_prev: [B, A] (both partially saved)
Result: A's stack restored, A continues
Step 4: A → C
Action: A switches to C (C is deeper than B)
Saved: A's stack GROWS from B's limit to C's limit
g_prev: [A] (grown), B removed if fully saved
A's tealet_stack_t grows:
new_chunk = malloc(...)
new_chunk.size = distance(B's stack_far, C's stack_far)
new_chunk.next = chunk.next
chunk.next = new_chunk
saved += new_chunk.size
The g_prev Chain
The g_main->g_prev chain tracks partially-saved stacks:
g_main->g_prev → stackA → stackB → stackC → NULL
↑ ↑ ↑
(A) (B) (C)
partial partial partial
Walk the chain during switch to grow stacks as needed:
static int tealet_stack_grow_list(tealet_main_t *main,
tealet_stack_t *list, char *saveto, tealet_stack_t *target,
int fail_ok)
{
while (list) {
if (list == target) {
tealet_stack_unlink(list);
return 0;
}
int full;
int fail = tealet_stack_growto(main, list, saveto, &full, fail_ok);
if (fail) return fail;
if (full)
tealet_stack_unlink(list);
list = list->next;
}
return 0;
}
Unlink when fully saved: Once a stack is saved entirely (up to its stack_far), remove it from the chain.
Stack Growth Algorithm
static int tealet_stack_grow(tealet_main_t *main,
tealet_stack_t *stack, size_t size)
{
size_t diff = size - stack->saved;
tealet_chunk_t *chunk = malloc(sizeof(tealet_chunk_t) + diff);
memcpy(&chunk->data[0], stack_near + stack->saved, diff);
chunk->next = stack->chunk.next;
stack->chunk.next = chunk;
stack->saved = size;
return 0;
}
Stacks are chains of chunks, grown on demand.
Stack Restore
static void tealet_stack_restore(tealet_stack_t *stack)
{
tealet_chunk_t *chunk = &stack->chunk;
do {
memcpy(chunk->stack_near, &chunk->data[0], chunk->size);
chunk = chunk->next;
} while (chunk);
}
Simple iteration through chunks, copying each back to the C stack.
State Transitions
A tealet progresses through these states:
[CREATED]
↓
tealet_create() / tealet_new()
↓
[ACTIVE] ←──────┐
↓ │
tealet_switch() │ (restore)
↓ │
[SAVED] ────────┘
↓
run() returns
↓
[EXITED] or [DEFUNCT]
State Indicators
stack pointer:
NULL: Active (currently running)
tealet_stack_t*: Saved (suspended)
(tealet_stack_t*)-1: Defunct (corrupt)
stack_far pointer:
- Valid address: Normal bounded stack
NULL: Exiting (run function returning)
STACKMAN_SP_FURTHEST: Unbounded stack (typically main tealet)
The special value STACKMAN_SP_FURTHEST represents an unbounded stack extent, used by the main tealet since it runs on the original process stack with no predetermined limit.
Defunct State
A tealet becomes defunct if:
- Stack couldn't be saved due to memory shortage during exit
- Explicitly marked after corruption
Recovery: None. Defunct tealets cannot be used and should be deleted.
if (tealet_status(t) == TEALET_STATUS_DEFUNCT) {
}
TEALET_API void tealet_delete(tealet_t *target)
Deallocate a non-main tealet.
Reference Counting
Stacks use reference counting for sharing:
static tealet_stack_t *tealet_stack_dup(tealet_stack_t *stack) {
stack->refcount += 1;
return stack;
}
static void tealet_stack_decref(tealet_main_t *main,
tealet_stack_t *stack)
{
if (stack == NULL || --stack->refcount > 0)
return;
tealet_chunk_t *chunk = stack->chunk.next;
free(stack);
while (chunk) {
tealet_chunk_t *next = chunk->next;
free(chunk);
chunk = next;
}
}
Why reference counting?
tealet_duplicate() creates a copy sharing the same saved stack:
tealet_sub_t *copy = tealet_alloc(g_main);
copy->stack_far = tealet->stack_far;
copy->stack = tealet_stack_dup(tealet->stack);
}
TEALET_API tealet_t * tealet_duplicate(tealet_t *tealet)
Duplicate a suspended tealet and its saved stack state.
Multiple tealets can share a stack snapshot, useful for the stub pattern (template tealets).
In the current implementation, this sharing is introduced by duplication paths (for example tealet_duplicate() and helpers built on it). tealet_fork() creates a new tealet by saving stack state through switch/save logic and does not directly use tealet_stack_dup().
The Switch Operation
High-Level Flow
g_main->g_target = target;
g_main->g_arg = *parg;
stackman_switch(tealet_save_restore_cb, g_main);
*parg = g_main->g_arg;
return result;
}
TEALET_API int tealet_switch(tealet_t *target, void **parg)
Suspend current tealet and resume target.
Save/Restore Callback State Machine
typedef enum tealet_sr_e {
SW_NOP,
SW_RESTORE,
SW_ERR,
} tealet_sr_e;
The callback is invoked twice by stackman_switch():
Call 1: STACKMAN_OP_SAVE
- Save current stack
- Walk
g_prev chain, growing stacks as needed
- Return new stack pointer (or current if no restore)
Call 2: STACKMAN_OP_RESTORE
- Restore target stack from heap to C stack
- Update state
Memory Management
All allocations go through tealet_alloc_t:
tealet_free_t free_p;
void *context;
void *(* tealet_malloc_t)(size_t size, void *context)
Definition tealet.h:38
This allows:
- Custom allocators (memory pools, arenas)
- Instrumentation (leak detection, profiling)
- Platform-specific allocation strategies
Example with statistics:
tealet_statsalloc_init(&stats_alloc, &base);
printf("Allocated: %zu bytes in %zu calls\n",
stats_alloc.s_allocs, stats_alloc.n_allocs);
Definition tealet_extras.h:13
#define TEALET_ALLOC_INIT_MALLOC
Definition tealet.h:50
TEALET_API tealet_t * tealet_initialize(tealet_alloc_t *alloc, size_t extrasize)
Initialize libtealet and create the main tealet for the current thread.
Platform Abstraction
Platform-specific code is isolated in the stackman library:
void *stackman_switch(stackman_cb_t callback, void *context);
stackman handles:
- Register saving/restoring
- Stack pointer manipulation
- Calling conventions (CDECL, STDCALL, etc.)
- Architecture differences (x86, ARM, RISC-V, etc.)
libtealet provides the policy (when/what to save), stackman provides the mechanism (how to switch).
Performance Characteristics
Time Complexity
Space Complexity
Per tealet:
- Structure: ~64-128 bytes (platform-dependent)
- Saved stack: Variable, proportional to actual stack usage
- Extra data: User-specified
Chunk overhead:
- ~24 bytes per chunk
- Typical: 1-3 chunks per tealet
Optimization: Stack Chaining Benefit
Consider 10 nested tealets at different stack depths:
Without chaining (naive):
- Each tealet saves its entire stack: 10 × stack_size
With chaining:
- Each tealet saves only the differential: sum of differentials ≈ stack_size
Memory savings: ~10× in nested scenarios
Thread Safety
libtealet is not thread-safe across threads.
Rules:
- Each thread has its own
main tealet
- Tealets with different
main cannot interact
- No synchronization primitives included
Rationale: Co-routines are user-space scheduling, not parallelism. For parallel execution, use OS threads + libtealet within each thread.
Debugging
Debug Builds
Compile with -UNDEBUG to enable assertions:
Assertions check:
- Stack state consistency
- Reference count validity
- Chain integrity
- Switch preconditions
Tealet IDs
In debug builds, each tealet gets an ID:
#ifndef NDEBUG
tealet_sub_t->id = g_main->g_counter;
#endif
Useful for tracking which tealet is which during debugging.
Common Issues
Defunct tealet errors:
- Cause: Memory allocation failure during stack save
- Fix: Check available memory, use smaller stacks
Segfaults during switch:
- Cause: Passing stack-allocated data between tealets
- Fix: Use
tealet_malloc() for shared data
Assertion failures in g_prev chain:
- Cause: Corruption of chain pointers
- Fix: Check for buffer overruns, use valgrind
Design Rationale
Why Stack-Slicing vs. Separate Stacks?
Separate stacks (like OS threads):
- ✅ Simple: Each coroutine has fixed-size stack
- ❌ Memory waste: Pre-allocate large stacks
- ❌ Stack overflow: Fixed size can be too small
Stack-slicing (libtealet):
- ✅ Memory efficient: Grows on demand
- ✅ No stack overflow: Limited only by heap
- ❌ Complex: Requires saving/restoring
Why Symmetric vs. Asymmetric Coroutines?
Asymmetric (yield/resume):
- ✅ Simpler mental model
- ✅ Natural for generators
- ❌ Limited: Can only yield to caller
Symmetric (explicit switch):
- ✅ Flexible: Switch to any coroutine
- ✅ Powerful: Build any scheduling strategy
- ❌ More complex: Must manage transitions
libtealet chooses symmetric for maximum flexibility.
Why Reference Counting vs. Garbage Collection?
Reference counting:
- ✅ Deterministic: Resources freed immediately
- ✅ No runtime: No GC pauses
- ❌ Manual: User must avoid cycles
Garbage collection:
- ✅ Automatic: No manual management
- ❌ Pauses: Stop-the-world collection
- ❌ Non-deterministic: Unclear when freed
For a low-level C library, reference counting fits better.
Further Reading