We are releasing SYNTHETIC-2, an open dataset of four million verified reasoning traces spanning the most comprehensive set of complex reinforcement learning tasks and verifiers released to date. The dataset was collaboratively generated by compute contributors across the globe via our pipeline-parallel decentralized inference stack. Over 1,250 GPUs, from consumer 4090s to H200s, joined within three days to generate the data.
Frontier-size models such as DeepSeek-R1-0528 no longer fit on a single GPU, and often not even on a single node. For SYNTHETIC-2, we therefore shard the model across permissionless workers with pipeline parallelism: each device stores only one stage and streams its activations to the next peer. The Prime Intellect protocol groups GPUs with similar throughput and geography on the fly to maximize utilization.
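To make this concrete, here is a minimal sketch of what a single pipeline stage does in such a setup. The class and the recv_from_prev/send_to_next transport helpers are hypothetical stand-ins for the peer-to-peer streaming layer; this is an illustration, not the prime-rl implementation.

import torch
import torch.nn as nn

class PipelineStage:
    """One worker in a pipeline-parallel inference group (illustrative sketch).

    The worker holds only a contiguous slice of the model's layers; activations
    are streamed to the next peer instead of keeping the full model in memory.
    """

    def __init__(self, layers: nn.ModuleList, rank: int, world_size: int, transport):
        self.layers = layers        # only this stage's shard of the model
        self.is_first = rank == 0
        self.is_last = rank == world_size - 1
        self.transport = transport  # hypothetical p2p transport between peers

    @torch.no_grad()
    def step(self, token_embeddings=None):
        # The first stage embeds the prompt tokens; every later stage waits for
        # activations from the previous peer in the pipeline.
        hidden = token_embeddings if self.is_first else self.transport.recv_from_prev()

        for layer in self.layers:
            hidden = layer(hidden)

        if self.is_last:
            return hidden           # the last stage produces the logits
        self.transport.send_to_next(hidden)
        return None

Each worker therefore only needs memory for its own shard of layers plus the corresponding KV cache, and grouping peers by similar throughput and geography keeps the slowest link from stalling the whole pipeline.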
TOPLOC v2 ensured honest computation by the decentralized inference workers in this run by generating verifiable proofs of computation. For SYNTHETIC-2, we extend our locality-sensitive hashing proofs with three key enhancements:
During the 4M-sample SYNTHETIC-2 run, the false positive rate of TOPLOC was just 0.000925% (37 slashes). Averaged across all models, the median cost of verifying a proof was 25x lower than that of redoing the original inference.
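For intuition, the sketch below shows a heavily simplified version of the idea behind such activation-commitment proofs: the worker commits to the top-k entries of its final hidden states, and the verifier recomputes those hidden states in a single prefill pass over the generated tokens and checks them against the commitment within a numerical tolerance. The constants, the encoding, and the comparison rule here are illustrative only and do not reproduce the actual TOPLOC scheme.

import torch

K = 128      # number of committed activation entries (illustrative)
EXP_TOL = 2  # allowed exponent drift between different hardware/kernels (illustrative)

def commit(final_hidden: torch.Tensor) -> dict:
    """Worker side: build a compact commitment from the last hidden states.

    Instead of shipping full activations, keep only the indices and values of
    the k largest entries: a fingerprint that is cheap to store and check but
    hard to forge without running the real model on the real prompt.
    """
    flat = final_hidden.flatten().float()
    values, indices = torch.topk(flat, K)
    return {"indices": indices.tolist(), "values": values.tolist()}

def verify(proof: dict, recomputed_hidden: torch.Tensor) -> bool:
    """Verifier side: recompute the hidden states with a single prefill pass
    over the worker's output tokens and compare against the commitment,
    allowing a small tolerance for cross-GPU nondeterminism."""
    flat = recomputed_hidden.flatten().float()
    ref = flat[torch.tensor(proof["indices"])]
    committed = torch.tensor(proof["values"])
    # Compare floating-point exponents rather than exact values so that honest
    # workers on different hardware or kernel versions are not slashed.
    exp_ref = torch.frexp(ref).exponent
    exp_com = torch.frexp(committed).exponent
    return bool((exp_ref - exp_com).abs().max() <= EXP_TOL)

Verification is cheap because the prefill over already-generated tokens parallelizes across the whole sequence, whereas the original generation had to decode token by token.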
Our infrastructure orchestrates GPU nodes globally through a peer-to-peer network. Each node joins after validation, sends heartbeats with system metrics, and receives task assignments. The system tracks all active nodes and their work submissions on a decentralized ledger, ensuring transparency and accountability. Our orchestrator API enables seamless deployment of diverse workloads across the global compute fabric.
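To make the node lifecycle concrete, the following sketch shows roughly what a worker's heartbeat loop could look like. The orchestrator URL, endpoint path, and payload fields are hypothetical placeholders and do not reflect the actual orchestrator API.

import subprocess
import time

import requests  # assumed HTTP client; the real worker may use a different transport

ORCHESTRATOR = "https://orchestrator.example.com"  # hypothetical endpoint
HEARTBEAT_INTERVAL_S = 30

def gpu_metrics() -> dict:
    """Read utilization and free memory of GPU 0 via nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.free",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    util, mem_free = out.splitlines()[0].split(", ")
    return {"gpu_util_pct": int(util), "gpu_mem_free_mib": int(mem_free)}

def heartbeat_loop(node_id: str, auth_token: str) -> None:
    """Periodically report system metrics and poll for task assignments
    (hypothetical payload and endpoints, for illustration only)."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {auth_token}"
    while True:
        payload = {"node_id": node_id, **gpu_metrics()}
        resp = session.post(f"{ORCHESTRATOR}/heartbeat", json=payload, timeout=10)
        for task in resp.json().get("assigned_tasks", []):
            print("received task assignment:", task)  # a real worker would launch the workload
        time.sleep(HEARTBEAT_INTERVAL_S)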
Tasks are matched to node groups based on compute requirements, from consumer GPUs to multi-node setups with hundreds of gigabytes of GPU memory. When forming multi-node groups, the system optimizes for geographic proximity to minimize latency between collaborating nodes. It automatically handles node failures by dissolving affected groups and rescheduling their work, while an opportunistic scheduling algorithm continuously improves resource utilization by merging compatible nodes into more efficient configurations. For example, the following POST request deploys DeepSeek-R1-0528 inference across two topologies: 1×1128GB (a single 8×H200 node) and 2×640GB (pipeline-parallel inference across two 8×A100/H100 nodes):
{
  "name": "DeepSeek-R1-0528:R1-SFT",
  "image": "primeintellect/prime-rl:commit-df75e4c",
  "env_vars": {
    "HF_HUB_CACHE": "/shared/hf_hub",
    "HF_HUB_ETAG_TIMEOUT": "500"
  },
  "cmd": [
    "@configs/inference/synthetic-2/base.toml",
    "@configs/inference/synthetic-2/deepseek-r1-0528.toml",
    "--data.name",
    "PrimeIntellect/SYNTHETIC-2-Base-R1-SFT",
    "--parallel.pp.rank",
    "${GROUP_INDEX}",
    "--parallel.pp.world-size",
    "${GROUP_SIZE}",
    "--parallel.pp.iroh-seed",
    "${WORKER_P2P_SEED}",
    "--parallel.pp.iroh-peer-id",
    "${NEXT_P2P_ADDRESS}",
    "--group-id",
    "${GROUP_ID}",
    "--task-id",
    "${TASK_ID}"
  ],
  "metadata": {
    "labels": {
      "model": "DeepSeek-R1-0528",
      "data": "R1-SFT"
    }
  },
  "scheduling_config": {
    "plugins": {
      "node_groups": {
        "allowed_topologies": ["1x1128GB", "2x640GB"]
      }
    }
  },
  "storage_config": {
    "file_name_template": "deepseek-ai/DeepSeek-R1-0528/PrimeIntellect/SYNTHETIC-2-Base-R1-SFT/1-${NODE_GROUP_ID}-${NODE_GROUP_SIZE}-${CURRENT_FILE_INDEX}-${NODE_GROUP_INDEX}.parquet"
  },
  "volume_mounts": [
    {
      "host_path": "/group-${GROUP_ID}-state",
      "container_path": "/state"
    }
  ]
}
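Submitting such a deployment is a single authenticated HTTP call against the orchestrator API. The sketch below posts the configuration above using Python's requests library; the base URL, endpoint path, file name, and token handling are illustrative placeholders rather than the real API surface.

import json

import requests  # assumed HTTP client

ORCHESTRATOR = "https://orchestrator.example.com"  # hypothetical base URL
API_TOKEN = "..."                                  # hypothetical credential

# Load the deployment spec shown above (saved locally as JSON) and submit it.
with open("deepseek-r1-0528-r1-sft.json") as f:
    deployment = json.load(f)

resp = requests.post(
    f"{ORCHESTRATOR}/deployments",  # illustrative path
    json=deployment,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print("created deployment:", resp.json())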
The orchestrator handles everything from Docker container management to secure P2P connections (via mTLS), while our validation framework ensures computational integrity through continuous monitoring and specialized TOPLOC servers for model-specific verification.
A total of 1,253 GPUs from around the world participated in the run, including 49 nodes with 8xH200 GPUs, 43 nodes with 8xH100 GPUs, and numerous consumer 3090 and 4090 GPUs.
To generate SYNTHETIC-2, we've collected a diverse set of challenging reasoning tasks spanning math and coding, as well as non-traditional and previously underrepresented reasoning tasks such as puzzles and problems testing precise instruction-following abilities. These tasks are drawn from public datasets such as Skywork Math and open-source libraries such as reasoning-gym, and are to a large extent generated from our internal research.
Concretely, we propose the following new verifiable reasoning tasks:
The full distribution of tasks is shown below.
SYNTHETIC-2 is meant to provide both high-quality SFT and RL data. Hence, we split our tasks into two subsets:
Our SFT subset contains a smaller set of more difficult tasks. For these tasks, we generate responses from DeepSeek-R1-0528, the best open reasoning model available, enabling developers to distill its reasoning capabilities into smaller models.
Our RL subset contains all of our tasks that can be verified using prime-rl. As difficulty filtering has proven crucial for RL training performance, we annotate every task with difficulty information by computing the pass rates of three smaller models: Qwen3-32B, Qwen3-4B, and DeepSeek-R1-0528-Qwen3-8B.
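The snippet below is a minimal sketch of how such pass-rate-based difficulty annotation can be computed, assuming each task ships with a programmatic verifier. The generate and verifier callables, the task schema, and the eight-sample budget are illustrative placeholders, not the actual prime-rl pipeline.

from typing import Callable

# Hypothetical interfaces: generate(model, prompt) samples one completion and
# verifier(completion) returns True if the task is solved. Both stand in for
# real model inference and task verification.
def pass_rate(model: str, prompt: str, verifier: Callable[[str], bool],
              generate: Callable[[str, str], str], n_samples: int = 8) -> float:
    """Fraction of sampled completions accepted by the task's verifier."""
    solved = sum(verifier(generate(model, prompt)) for _ in range(n_samples))
    return solved / n_samples

ANNOTATION_MODELS = ["Qwen3-32B", "Qwen3-4B", "DeepSeek-R1-0528-Qwen3-8B"]

def annotate_difficulty(task: dict, generate: Callable[[str, str], str]) -> dict:
    """Attach one pass rate per annotation model so downstream RL training can
    filter out tasks that are trivially easy or effectively unsolvable."""
    task["pass_rates"] = {
        model: pass_rate(model, task["prompt"], task["verifier"], generate)
        for model in ANNOTATION_MODELS
    }
    return task

Tasks that all three models solve almost always, or never, carry little training signal for RL, so these pass rates make it straightforward to filter for an informative difficulty band.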
We release the following final dataset splits on Hugging Face:
Building on the release of SYNTHETIC-2, our next step is to use its tasks and dataset as the foundation for our next distributed RL run. INTELLECT-2 has already shown that globally distributed reinforcement learning works; now it's time to demonstrate its promise as a novel scaling paradigm, unlocking even more compute and achieving state-of-the-art model performance. Since the INTELLECT-2 release, we've made significant improvements to the stability of asynchronous RL at large scale and are confident these improvements will lead to state-of-the-art reasoning models trained in a decentralized fashion.
To expand the diversity of our RL environment ecosystem in prime-rl, we integrated the verifiers repository as our core library for crowdsourcing complex RL environments from the open-source community. More details on this soon!
Our goal is to introduce additional multi-turn and tool-use environments—especially for coding and autonomous research tasks—to unlock SOTA coding-agent capabilities with our INTELLECT-3 model.