Member of Technical Staff - Agent Systems

Short version

Make AI employees do more real work, more reliably.

You will own the systems layer underneath configurable AI employees: runtimes, tools, memory, evaluation, orchestration, recovery, and the loop where the system checks its own work.

If you have never built or modified an agent harness, runtime, tool interface, automation loop, or serious AI-assisted development setup, this is probably not the role for you.

Reality

The agent is not the demo. The operating system around it is the product.

A customer should be able to define a useful employee, connect it to live work, and trust that it can act inside real boundaries. That only works if execution, state, permissions, tool access, context, QA, and recovery behave as one system.

The loop we care about is simple: configure useful work, run it in the customer’s environment, verify the result, learn from failure, and make the next employee stronger without making the system harder to trust.

What you will own

The harness beneath the AI employee.

You will build the shared layer for long-running AI work: isolated execution, reproducible environments, durable state, tool orchestration, observability, recovery, permission boundaries, and quality automation.

This means working across the control plane that schedules work, the data plane that runs isolated sandboxes and persistent Docker-style runtimes, and the network ingress that keeps live connections reliable.

The job is not to make one impressive demo. It is to make the thousandth action as reliable as the first.

Problems

Mostly systems, sometimes product, always real.

Make new employee environments faster to create and safer to upgrade.
Improve cold-start paths for sandboxed work and keep persistent runtimes fast, inspectable, and recoverable.
Turn live failures into reproducible checks, fixtures, and runbooks.
Improve tool access, isolation, state preservation, and human handoff behavior.
Build verification loops that catch false success before users do.
Move across runtime code, product surfaces, deployment scripts, and debugging tools when the work requires it.

Bar

Speed matters, but evidence wins.

You should ship quickly by directing coding agents, reviewing their work, and proving the result from code, logs, state, traces, tests, browser checks, or focused reproductions. We do not count “the agent said it worked” as evidence.

Week one should not be onboarding theater. The goal is to ship something real, understand one live surface, and start owning the consequences.

Signals

What we look for.

You have built or seriously modified an agent system, harness, tool interface, or automation loop.
You use coding agents daily and verify behavior from code, state, logs, requests, or focused repros.
You can reason about failure modes across execution, permissions, state, dependencies, and user-visible behavior.
You care about infrastructure that compounds: faster builds, better fixtures, safer syncs, and clearer debugging paths.
You can move from runtime code to product surface to deployment script when the problem demands it.

Location

San Francisco first. Remote possible for exceptional candidates.

We want the early team spending meaningful time in the same room. The default is San Francisco, with visa and relocation support where useful. Remote can work for people who are already unusually effective in ambiguous, high-ownership environments.

Compensation

Top-percentile compensation and real ownership.

This is an early role with a large surface area. Compensation is designed for people who can own infrastructure the company depends on, not for people who work through a queue of tickets.

Apply

Tell us what you have built.

Send the shortest note that proves you understand the work. Include a system, harness, infrastructure surface, or debugging story you owned end to end.
