
Agent Trajectory Data for Coding-Agent Training
Coding agents can look fine on a single-step benchmark and still break on multi-step real work. The fix is to train and evaluate on trajectories that look like real software engineering, with full context: terminal, browser, IDE, the engineer's reasoning out loud, and the eventual outcome. Beyond Labs captures and delivers that data.
What a Trace Contains
Full-resolution screen recording with timestamped click, keystroke, and scroll events. Replayable frame-by-frame.
Terminal session with every command, output, and exit code.
Browser session with URLs, page content, and form interactions.
IDE activity including file diffs, test runs, and lint output.
Verbalized reasoning: first-person audio or text captured live, describing what the engineer is trying, why, and how they decide what to do next.
Outcome: task completion, success criteria met or not, time-to-completion, blockers encountered.
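As a rough illustration of how the streams listed above could sit side by side in one record, here is a minimal Python sketch. Every class and field name (`Trace`, `InputEvent`, `t_ms`, and so on) is a hypothetical chosen for this example and is not the published schema; browser and IDE streams are omitted for brevity and would follow the same timestamped pattern.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputEvent:
    """One timestamped click, keystroke, or scroll (hypothetical shape)."""
    t_ms: int    # milliseconds since the start of the recording
    kind: str    # "click", "keystroke", or "scroll"
    detail: str  # e.g. the key pressed or the click coordinates


@dataclass
class TerminalCommand:
    """One command with its output and exit code (hypothetical shape)."""
    t_ms: int
    command: str
    output: str
    exit_code: int


@dataclass
class Verbalization:
    """One first-person utterance: what is being tried and why."""
    t_ms: int
    text: str


@dataclass
class Outcome:
    completed: bool
    success_criteria_met: bool
    time_to_completion_s: float
    blockers: List[str] = field(default_factory=list)


@dataclass
class Trace:
    trace_id: str
    screen_recording_path: str
    input_events: List[InputEvent] = field(default_factory=list)
    terminal: List[TerminalCommand] = field(default_factory=list)
    verbalizations: List[Verbalization] = field(default_factory=list)
    outcome: Optional[Outcome] = None


def events_between(trace, start_ms, end_ms):
    """Return all timestamped items in [start_ms, end_ms), sorted by time."""
    items = [*trace.input_events, *trace.terminal, *trace.verbalizations]
    window = [x for x in items if start_ms <= x.t_ms < end_ms]
    return sorted(window, key=lambda x: x.t_ms)
```

Keeping a shared timestamp on every stream is what would let a consumer line up what the engineer said with what they typed or clicked at that moment, for example by calling `events_between(trace, 0, 2000)` to pull the first two seconds of activity.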

Task Families We Cover
Bug fixes
Single-repo and cross-repo. Root-cause isolation, fix, regression test - full trace from repro to green CI.
Feature implementation
Multi-file, multi-step. Spec reading, planning verbalization, implementation, and review cycle captured end-to-end.
Test authoring & TDD workflows
Writing tests before code, running red-green-refactor cycles, and documenting the reasoning at each pivot.
Refactoring & migration tasks
Systematic transitions - legacy to modern, messy to clean - with rationale verbalized at every structural decision.
Code review & reviewer-feedback cycles
Reviewer perspective: reading, assessing, annotating, and iterating on AI-generated or peer-written code.
Debugging with observability
Using logs, metrics, and distributed traces to locate and fix issues - the full real-world debugging loop.
Multi-tool / multi-agent orchestration tasks
Tasks where the engineer coordinates across tools, APIs, or agents - capturing the orchestration reasoning that single-step benchmarks miss entirely.


Data Schema & Delivery
Delivered as structured JSON with bundled artifacts. The schema is published and version-controlled. We work with clients to adapt deliveries to their internal trace format; after the first project, there is no further setup overhead.
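To make the idea of a versioned JSON delivery and a client-format adapter concrete, here is a minimal Python sketch. It assumes a top-level `schema_version` string and uses invented field names (`trace_id`, `episode_id`, `shell_steps`, and so on); none of these come from the published schema, and the mapping stands in for whatever a given client's internal format requires.

```python
import json

SUPPORTED_MAJOR = 1  # hypothetical: major schema version this loader understands

# Hypothetical mapping from delivered field names to a client's internal names.
CLIENT_FIELD_MAP = {
    "trace_id": "episode_id",
    "terminal": "shell_steps",
    "outcome": "result",
}


def load_trace(raw_json):
    """Parse a delivered trace and reject unknown major schema versions."""
    doc = json.loads(raw_json)
    major = int(doc["schema_version"].split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema version {doc['schema_version']}")
    return doc


def to_client_format(doc, field_map=CLIENT_FIELD_MAP):
    """Rename top-level keys; keys not in the map pass through unchanged."""
    return {field_map.get(key, key): value for key, value in doc.items()}


# Example delivery, with made-up values for illustration only.
raw = json.dumps({
    "schema_version": "1.2.0",
    "trace_id": "t-001",
    "terminal": [{"command": "pytest", "exit_code": 1}],
    "outcome": {"completed": True},
})
client_doc = to_client_format(load_trace(raw))
```

Checking the version before adapting means a schema change surfaces as an explicit error at load time instead of as silently missing fields downstream.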
Quality Controls
Trace completeness check: every artifact present, every step accounted for.
Verbalization coverage: every meaningful decision verbalized; gaps flagged for re-capture.
Outcome verification: every "success" trace independently re-run against the success criteria.
Senior calibrator sign-off per batch.
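The first two controls above lend themselves to automated checks. The Python sketch below shows one way they might look, under stated assumptions: the artifact key names, the idea of pre-labeled decision timestamps, and the 10-second matching window are all hypothetical choices for illustration, not the actual QC pipeline.

```python
# Hypothetical artifact keys; a real schema may name or group these differently.
REQUIRED_ARTIFACTS = (
    "screen_recording",
    "terminal",
    "browser",
    "ide",
    "verbalizations",
    "outcome",
)


def missing_artifacts(trace):
    """Return required artifact keys that are absent or empty in the trace dict."""
    return [key for key in REQUIRED_ARTIFACTS if not trace.get(key)]


def verbalization_gaps(decision_times_ms, verbalization_times_ms, window_ms=10_000):
    """Return decision timestamps with no verbalization within window_ms.

    Assumes decisions have already been labeled with timestamps; the window
    size is an arbitrary illustrative default.
    """
    return [
        d for d in decision_times_ms
        if not any(abs(d - v) <= window_ms for v in verbalization_times_ms)
    ]


# Example trace with made-up values: the browser artifact is absent.
trace = {
    "screen_recording": "rec.mp4",
    "terminal": [{"command": "pytest", "exit_code": 0}],
    "ide": [{"file": "app.py"}],
    "verbalizations": [{"t_ms": 4000, "text": "Checking the failing test."}],
    "outcome": {"completed": True},
}
incomplete = missing_artifacts(trace)
gaps = verbalization_gaps([5000, 60000], [4000, 30000])
```

A trace with a non-empty result from either function would be a candidate for re-capture; outcome verification and calibrator sign-off involve re-running tasks and human review, so they are not sketched here.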


