Agent Trajectory Data for Coding-Agent Training

Coding agents often look fine on a single-step benchmark but break down in multi-step real work. The fix is to train and evaluate on trajectories that look like real software engineering, with full context: terminal, browser, IDE, the engineer's reasoning out loud, and the eventual outcome. Beyond Labs captures and delivers that data.

What a Trace Contains

Layer List:

Screen recording

Full-resolution screen recording with timestamped click, keystroke, and scroll events. Replayable frame-by-frame.

Terminal session log

Every command, output, and exit code.

Browser activity

URLs, page content, and form interactions.

IDE state

File diffs, test runs, and lint output.

Verbalization

First-person audio or text captured live, describing what the engineer is trying, why, and how they decide what to do next.

Outcome

Task completion, whether the success criteria were met, time-to-completion, and blockers encountered.
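To make the layers concrete, here is a minimal Python sketch of how one trace might be held in memory. It is an illustration only, not the published schema: every class and field name (Trace, TerminalCommand, Outcome, seconds_to_completion, and so on) is an assumption chosen to mirror the layer list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TerminalCommand:
    """One terminal step: the command, its output, and its exit code."""
    command: str
    output: str
    exit_code: int


@dataclass
class Outcome:
    """Task result (hypothetical field names)."""
    completed: bool
    success_criteria_met: bool
    seconds_to_completion: float
    blockers: List[str] = field(default_factory=list)


@dataclass
class Trace:
    """One captured trajectory, with one attribute per layer."""
    screen_recording_path: str      # full-resolution recording file
    input_events: List[dict]        # timestamped click, keystroke, scroll events
    terminal: List[TerminalCommand]
    browser_events: List[dict]      # URLs, page content, form interactions
    ide_events: List[dict]          # file diffs, test runs, lint output
    verbalization: List[dict]       # timestamped first-person text or audio references
    outcome: Optional[Outcome] = None


# Illustrative values only; none of this is real capture data.
trace = Trace(
    screen_recording_path="recording.mp4",
    input_events=[{"t": 0.0, "type": "click"}],
    terminal=[
        TerminalCommand("pytest -q", "1 failed", 1),
        TerminalCommand("pytest -q", "1 passed", 0),
    ],
    browser_events=[],
    ide_events=[{"t": 42.0, "type": "file_diff", "path": "app.py"}],
    verbalization=[{"t": 40.0, "text": "The test fails on empty input, so I will guard for it."}],
    outcome=Outcome(completed=True, success_criteria_met=True, seconds_to_completion=310.0),
)

# Example query: which commands exited non-zero during the session?
failed_commands = [c.command for c in trace.terminal if c.exit_code != 0]
```

Keeping one attribute per layer makes it easy to line up events from different layers by timestamp, for example pairing a failing test run with what the engineer said just before it.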

Task Families We Cover

Bug fixes

Single-repo and cross-repo. Root-cause isolation, fix, regression test - full trace from repro to green CI.

Feature implementation

Multi-file, multi-step. Spec reading, planning verbalization, implementation, and review cycle captured end-to-end.

Test authoring & TDD workflows

Writing tests before code, running red-green-refactor cycles, and documenting the reasoning at each pivot.

Refactoring & migration tasks

Systematic transitions - legacy to modern, messy to clean - with rationale verbalized at every structural decision.

Code review & reviewer-feedback cycles

Reviewer perspective: reading, assessing, annotating, and iterating on AI-generated or peer-written code.

Debug with observability

Using logs, metrics, and distributed traces to locate and fix issues - the full real-world debugging loop.

Multi-tool / multi-agent orchestration tasks

Tasks where the engineer coordinates across tools, APIs, or agents - capturing the orchestration reasoning that single-step benchmarks miss entirely.

Capture Environment

We run captures in a controlled sandbox environment with consistent tooling so traces are clean, replayable, and easy to ingest. Custom capture configurations are available on request, including air-gapped environments for sensitive work.

trace_schema_v2.json - structured JSON delivery format

Data Schema & Delivery

Traces are delivered as structured JSON with bundled artifacts. The schema is published and version-controlled. We work with clients to adapt delivery to their internal trace format, with no setup overhead after the first project.
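Because the schema is versioned, an ingestion script can refuse records it does not understand before doing any work. The Python sketch below shows one way that check might look. The field names (schema_version, task_family, artifacts) and the rule of matching on the major version are assumptions for illustration; the published schema is the authority on the actual layout.

```python
import json

# Assumption: the major version mirrors the "v2" in trace_schema_v2.json.
SUPPORTED_MAJOR = 2


def load_trace(text: str) -> dict:
    """Parse one trace record and reject unsupported schema versions."""
    record = json.loads(text)
    version = str(record.get("schema_version", ""))
    major = version.split(".")[0]
    if not major.isdigit() or int(major) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return record


# Hypothetical record; bundled artifacts are referenced by relative path.
sample = json.dumps({
    "schema_version": "2.0",
    "task_family": "bug_fix",
    "artifacts": {"screen_recording": "artifacts/recording.mp4"},
})

record = load_trace(sample)
```

Failing fast on the version keeps a schema change from silently producing half-parsed training examples.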

Quality Controls

Trace completeness check

Every artifact present, every step accounted for.

Verbalization coverage

Every meaningful decision verbalized; gaps flagged for re-capture.

Outcome verification

Every "success" trace independently re-run against the success criteria.

Senior calibrator sign-off

Per batch.
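The first two checks lend themselves to automation. Below is a minimal Python sketch of what such checks could look like on the receiving side. It is not a description of our internal tooling: the layer names, the idea of representing decisions as timestamps, and the 10-second matching window are all assumptions made for the example.

```python
# Hypothetical layer names, one per layer a complete trace should carry.
REQUIRED_LAYERS = (
    "screen_recording", "terminal", "browser", "ide", "verbalization", "outcome",
)


def missing_layers(trace: dict) -> list:
    """Completeness check: list required layers that are absent or empty."""
    return [name for name in REQUIRED_LAYERS if not trace.get(name)]


def verbalization_gaps(decision_times, verbalization_times, window_s=10.0):
    """Return decision timestamps with no verbalization within window_s seconds."""
    return [
        t for t in decision_times
        if not any(abs(t - v) <= window_s for v in verbalization_times)
    ]


def verbalization_coverage(decision_times, verbalization_times, window_s=10.0):
    """Fraction of decisions that have a nearby verbalization."""
    if not decision_times:
        return 1.0
    gaps = verbalization_gaps(decision_times, verbalization_times, window_s)
    return 1.0 - len(gaps) / len(decision_times)


# Illustrative values only.
trace = {
    "screen_recording": "recording.mp4",
    "terminal": [{"command": "pytest -q", "exit_code": 0}],
    "browser": [],  # empty layer, so it is reported as missing
    "ide": [{"type": "file_diff"}],
    "verbalization": [{"t": 10.5}, {"t": 99.0}],
    "outcome": {"completed": True},
}

missing = missing_layers(trace)
gaps = verbalization_gaps([12.0, 95.0, 240.0], [10.5, 99.0])
coverage = verbalization_coverage([12.0, 95.0, 240.0], [10.5, 99.0])
```

In this example the decision at 240 seconds has no verbalization nearby, so it would be the one flagged for re-capture. Outcome verification and calibrator sign-off involve re-running the task and human judgment, so they are left out of the sketch.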

See a sample trace

Tell us the task family you're training on and we'll deliver one fully-captured trace, free, within 7 business days.
