The Gym Where AI Models Get Stronger
Training and evaluation infrastructure for AI models and agents. Code, vision, and tool use: run real tasks in sandboxed environments and get rewards back.


Most teams build this from scratch
Code model teams often spend months building sandbox infrastructure, test harnesses, and evaluation pipelines. Then they maintain it indefinitely. We handle the infra so you can focus on training.
Sandbox Setup
Weeks of Docker config, security hardening, and resource management before you can run a single episode.
Test Harness Maintenance
Test suites drift, edge cases multiply, and reward signals degrade. Keeping evaluation reliable is a full-time job.
Scaling Bottleneck
Running millions of parallel episodes requires orchestration, monitoring, and failover. Most teams cap out at hundreds.
Three ways to use DeepGym
From training runs to enterprise evaluation.
Train
Run RL training loops against sandboxed code environments. Real repos, real test suites, pass/fail reward signals. Scale to thousands of parallel episodes.
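The core of an execution-based training loop is simple: the agent submits code, the environment runs it against a real test suite, and a binary pass/fail result becomes the reward. Here is a minimal self-contained sketch of that idea in Python. The `CodeEnv` class and its `step` method are illustrative assumptions, not DeepGym's actual API.

```python
import os
import subprocess
import sys
import tempfile

class CodeEnv:
    """Toy execution-based environment: the agent submits source code,
    we run it against a test in a subprocess, and the reward is a
    binary pass/fail signal. (Illustrative sketch only; the class and
    method names are hypothetical, not DeepGym's API.)"""

    def __init__(self, test_source: str):
        self.test_source = test_source

    def step(self, submission: str) -> float:
        # Write the submission plus the test into an isolated temp dir
        # and execute it in a separate process. A real sandbox would
        # add OS-level isolation, resource limits, and network policy.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "solution.py")
            with open(path, "w") as f:
                f.write(submission + "\n" + self.test_source + "\n")
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=30
            )
            # Exit code 0 means every assertion passed.
            return 1.0 if result.returncode == 0 else 0.0

# Usage: a toy task ("implement add") with a pass/fail test.
env = CodeEnv("assert add(2, 3) == 5")
reward_good = env.step("def add(a, b):\n    return a + b")   # passes -> 1.0
reward_bad = env.step("def add(a, b):\n    return a - b")    # fails  -> 0.0
```

In an RL loop, `step` would be called once per episode with the policy's generated patch, and the returned reward fed back into the update; running thousands of such subprocesses in parallel is exactly the orchestration burden described above.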
Evaluate
Benchmark your code model against curated task sets. Measure pass rates, execution time, and correctness across difficulty levels.
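Aggregating per-episode results into pass rates by difficulty is a small reduction over the episode log. A minimal sketch, assuming results arrive as `(difficulty, passed)` pairs; the function name and record shape are assumptions, not DeepGym's schema.

```python
from collections import defaultdict

def pass_rates(results):
    """Aggregate per-episode pass/fail outcomes into a pass rate per
    difficulty level. `results` is an iterable of (difficulty, passed)
    pairs. (Illustrative sketch; the record shape is an assumption.)"""
    totals = defaultdict(lambda: [0, 0])  # difficulty -> [passes, attempts]
    for difficulty, passed in results:
        totals[difficulty][0] += int(passed)
        totals[difficulty][1] += 1
    return {d: p / n for d, (p, n) in totals.items()}

rates = pass_rates([
    ("easy", True), ("easy", True), ("easy", False),
    ("hard", True), ("hard", False),
])
# rates["easy"] == 2/3, rates["hard"] == 0.5
```

The same reduction extends naturally to the other metrics mentioned above (execution time, correctness) by accumulating additional fields per difficulty bucket.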
Enterprise
Private deployments, custom task libraries, and dedicated infrastructure for teams that need control over their evaluation pipeline.
Code models need better training infrastructure
The number of teams building coding agents is exploding. The infrastructure layer is missing.
Agent Explosion
More teams are building AI agents every month. They all need evaluation infrastructure.
Reproducibility Crisis
Without standardized environments, teams can't compare models or reproduce results.
Infra Gap
Compute is commoditized. Training frameworks exist. The missing piece is managed execution and evaluation.
Built by people who train models
We've spent years in GPU programming, benchmark design, and RL infrastructure.
Execution-First
Every environment runs real code in real containers. No mocks, no simulations, no shortcuts.
Battle-Tested Harnesses
Test suites designed to catch reward hacking. Adversarial probing built into every environment.
Language Agnostic
Python, TypeScript, Go, Rust. Write environments in whatever your team uses.
Secure by Default
Full OS-level isolation. Untrusted agent code can't escape the sandbox.
Observable
Every episode logged. Every reward signal tracked. Debug training runs, not infrastructure.
Fast
Optimized for low-latency environment stepping.
Train, Evaluate, and Benchmark AI Models
Run execution-based RL, benchmark agents, and measure performance across code, vision, and tool-use tasks.