SYNTHETIC-2 Release: Four Million Collaboratively Generated Reasoning Traces
We are releasing SYNTHETIC-2, an open dataset of four million verified reasoning traces spanning the most comprehensive set of complex reinforcement learning tasks and verifiers released to date. The dataset was collaboratively generated by compute contributors across the globe via our pipeline-parallel decentralized inference. Over 1,250 GPUs joined in 3 days — from 4090s to H200s — creating data for complex RL tasks.
Planetary-Scale Inference
Frontier-size models such as DeepSeek-R1-0528 are too large for all but the largest single nodes. For SYNTHETIC-2, we therefore shard the model across permissionless workers with pipeline parallelism: each device stores only one stage of the model and streams activations to the next peer. The Prime Intellect protocol groups GPUs with similar throughput and geography on the fly to maximize utilization.
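Conceptually, a pipeline stage owns one shard of the model's layers and forwards its output downstream. Below is a minimal sketch of that dataflow; the PipelineStage class and the local chaining are illustrative assumptions, not the prime-rl implementation, which streams activations between peers over iroh P2P connections.

import torch
import torch.nn as nn

class PipelineStage:
    """One shard of the model: a stage holds only its own layers."""

    def __init__(self, layers: nn.Sequential, rank: int, world_size: int):
        self.layers = layers          # this stage's layers only
        self.rank = rank              # 0 = first stage, world_size - 1 = last
        self.world_size = world_size

    @torch.no_grad()
    def step(self, activations: torch.Tensor) -> torch.Tensor:
        # Compute this stage's share; the result is what would be
        # streamed to the next peer in the networked setting.
        return self.layers(activations)

# Chained locally for illustration: stage i's output feeds stage i+1.
stages = [PipelineStage(nn.Sequential(nn.Linear(16, 16), nn.ReLU()), i, 4)
          for i in range(4)]
x = torch.randn(2, 16)
for stage in stages:
    x = stage.step(x)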

TOPLOC
TOPLOC v2 ensured honest computation by the decentralized inference workers in this run by generating verifiable proofs of computation. For SYNTHETIC-2, we extend our locality-sensitive hashing proofs with three key enhancements:
- Group‑level accept/reject: Success of the final pipeline stage implies the integrity of all stages.
- Stage‑by‑stage replay on failure: Enables precise identification and slashing of the first faulty node that caused the loss of integrity.
- Reproducible Gumbel sampling proofs: Detects tampering with the sampling logic (paper coming soon).
During the 4M-sample SYNTHETIC-2 run, TOPLOC's false positive rate was just 0.000925% (37 slashes). Averaged across all models, the median proof verification was 25x cheaper than re-running the original inference.
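For illustration, the group-level accept/reject and replay logic reduces to roughly the following sketch; check_proof is a hypothetical stand-in for TOPLOC's locality-sensitive hash comparison, not the actual API.

from typing import Callable, Optional, Sequence

def verify_group(stages: Sequence[object],
                 check_proof: Callable[[object], bool]) -> Optional[int]:
    # check_proof(stage) is a hypothetical hook: it replays a stage's
    # inputs and compares a locality-sensitive hash of the recomputed
    # activations against the proof the stage committed.
    if check_proof(stages[-1]):
        # Fast path: an honest final stage implies every upstream stage
        # was honest too, so one check accepts the whole pipeline group.
        return None
    # Slow path: replay stage by stage and return the index of the
    # first faulty node, which the protocol then slashes.
    for i, stage in enumerate(stages):
        if not check_proof(stage):
            return i
    return None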
Protocol
Our infrastructure orchestrates GPU nodes globally through a peer-to-peer network. Each node joins after validation, sends heartbeats with system metrics, and receives task assignments. The system tracks all active nodes and their work submissions on a decentralized ledger, ensuring transparency and accountability. Our orchestrator API enables seamless deployment of diverse workloads across the global compute fabric.
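A node heartbeat might look roughly like the sketch below; the endpoint URL and payload fields are assumptions for illustration, not the actual protocol schema.

import time
import requests  # third-party: pip install requests

ORCHESTRATOR_URL = "https://orchestrator.example.com"  # hypothetical URL

def heartbeat_loop(node_id: str, interval_s: float = 30.0) -> None:
    # Illustrative payload: field names are assumptions; a real node
    # would read GPU metrics from NVML rather than hardcode them.
    while True:
        payload = {
            "node_id": node_id,
            "timestamp": time.time(),
            "metrics": {
                "gpu_util_pct": 87.5,
                "gpu_mem_free_gb": 12.3,
                "tokens_per_second": 410.0,
            },
        }
        requests.post(f"{ORCHESTRATOR_URL}/heartbeat", json=payload, timeout=10)
        time.sleep(interval_s)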
Tasks are matched to node groups based on compute requirements—from consumer GPUs to multi-node setups with hundreds of GBs of GPU memory. When forming multi-node groups, the system optimizes for geographic proximity to minimize latency between collaborating nodes. The system automatically handles node failures by dissolving affected groups and rescheduling work, while an opportunistic scheduling algorithm continuously optimizes resource utilization by identifying opportunities to merge compatible nodes into more efficient configurations. For example, the following POST request deploys DeepSeek-R1-0528 inference across two topologies: 1×1128GB (a single 8×H200 node) and 2×640GB (pipeline-parallel inference across two 8×A/H100 nodes):
{
  "name": "DeepSeek-R1-0528:R1-SFT",
  "image": "primeintellect/prime-rl:commit-df75e4c",
  "env_vars": {
    "HF_HUB_CACHE": "/shared/hf_hub",
    "HF_HUB_ETAG_TIMEOUT": "500"
  },
  "cmd": [
    "@configs/inference/synthetic-2/base.toml",
    "@configs/inference/synthetic-2/deepseek-r1-0528.toml",
    "--data.name",
    "PrimeIntellect/SYNTHETIC-2-Base-R1-SFT",
    "--parallel.pp.rank",
    "${GROUP_INDEX}",
    "--parallel.pp.world-size",
    "${GROUP_SIZE}",
    "--parallel.pp.iroh-seed",
    "${WORKER_P2P_SEED}",
    "--parallel.pp.iroh-peer-id",
    "${NEXT_P2P_ADDRESS}",
    "--group-id",
    "${GROUP_ID}",
    "--task-id",
    "${TASK_ID}"
  ],
  "metadata": {
    "labels": {
      "model": "DeepSeek-R1-0528",
      "data": "R1-SFT"
    }
  },
  "scheduling_config": {
    "plugins": {
      "node_groups": {
        "allowed_topologies": ["1x1128GB", "2x640GB"]
      }
    }
  },
  "storage_config": {
    "file_name_template": "deepseek-ai/DeepSeek-R1-0528/PrimeIntellect/SYNTHETIC-2-Base-R1-SFT/1-${NODE_GROUP_ID}-${NODE_GROUP_SIZE}-${CURRENT_FILE_INDEX}-${NODE_GROUP_INDEX}.parquet"
  },
  "volume_mounts": [
    {
      "host_path": "/group-${GROUP_ID}-state",
      "container_path": "/state"
    }
  ]
}
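Submitting such a deployment could then be a single authenticated POST; the endpoint path and auth header below are placeholders, not the documented orchestrator API.

import json
import requests  # third-party: pip install requests

# The deployment config above, saved locally; the endpoint path and the
# bearer token are placeholders, not the documented orchestrator API.
with open("deepseek-r1-0528-sft.json") as f:
    deployment = json.load(f)

resp = requests.post(
    "https://orchestrator.example.com/deployments",  # hypothetical URL
    json=deployment,
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. the created deployment's id and status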
The orchestrator handles everything from Docker container management to secure P2P connections (via mTLS), while our validation framework ensures computational integrity through continuous monitoring and specialized TOPLOC servers for model-specific verification.
Compute Contributions
A total of 1,253 GPUs from around the world participated in the run, including 49 nodes with 8xH200 GPUs, 43 nodes with 8xH100 GPUs, and numerous consumer 3090 and 4090 GPUs.

SYNTHETIC-2 Dataset
To generate SYNTHETIC-2, we've collected a diverse set of challenging reasoning tasks spanning math and coding, as well as non-traditional and previously underrepresented reasoning tasks such as puzzles and problems testing precise instruction-following abilities. These tasks are drawn from public datasets such as Skywork Math and open-source libraries such as reasoning-gym, and are largely generated from internal research.
Concretely, we propose the following new verifiable reasoning tasks:
- Code Output Prediction (v2): This task asks an LLM to predict the output of a complex piece of LLM-generated code. Unlike v1 from SYNTHETIC-1, v2 includes real-world libraries and code that more accurately mimics real-world use cases.
- Pydantic Adherence: We ask an LLM to generate a JSON object that adheres to a complex (LLM-generated) pydantic model (a minimal verifier sketch follows this list).
- Complex JSON formatting: This task tests an LLM’s ability to adhere to complex JSON formatting instructions. Its prompts contain a few simple reasoning problems, with complex instructions for how the results should be presented in a JSON format
- Sentence Unscrambling: Requires the model to rearrange randomly ordered text blocks into their original chronological/logical order while maintaining block numbering.
- ASCII Tree formatting: Evaluates ASCII tree structure generation (for file system directories) by comparing generated output against ground truth tree representations.
- Formatask: Tests precise extraction of exact portions of natural text with specific formatting requirements, based on natural-language descriptions of the section to extract ("return just the part where it mentions X topic…").
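To make the verification concrete, here is a minimal sketch of how the Pydantic Adherence task can be scored; the Invoice schema and the verifier function are illustrative, and the actual prime-rl verifier may differ.

import json
from pydantic import BaseModel, ValidationError

# Illustrative target schema; in the dataset, the pydantic model itself
# is LLM-generated and considerably more complex.
class Invoice(BaseModel):
    invoice_id: str
    total_cents: int
    paid: bool

def verify_pydantic_adherence(completion: str, model: type[BaseModel]) -> float:
    # Binary reward: 1.0 if the completion parses as JSON and validates
    # against the target model, else 0.0. A minimal sketch, not the
    # exact prime-rl verifier.
    try:
        model.model_validate(json.loads(completion))
        return 1.0
    except (json.JSONDecodeError, ValidationError):
        return 0.0

# Usage:
assert verify_pydantic_adherence(
    '{"invoice_id": "A1", "total_cents": 995, "paid": true}', Invoice) == 1.0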
The full distribution of tasks is shown below.
[Figure: distribution of tasks in SYNTHETIC-2]
SYNTHETIC-2 is meant to provide both high-quality SFT and RL data. Hence, we split our tasks into two subsets:
- Our SFT subset contains a smaller set of more difficult tasks. For these tasks, we generate responses from DeepSeek-R1-0528, the best open reasoning model available, enabling developers to distill its reasoning capabilities into smaller models.
- Our RL subset contains all of our tasks that can be verified using prime-rl. As difficulty filtering has proven crucial for RL training performance, we produce difficulty annotations for all tasks by computing the pass rates of three smaller models, Qwen3-32B, Qwen3-4B, and DeepSeek-R1-0528-Qwen3-8B, on each task (sketched below).
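As a sketch of the annotation step, a task's difficulty for a model is simply that model's pass rate on it; the rollout helper, the sample count, and the pass threshold below are assumptions for illustration.

from typing import Callable

# `rollout(model, task)` is a hypothetical helper that samples one
# completion and returns its verified reward; n_samples is illustrative.
def pass_rate(model: str, task: dict,
              rollout: Callable[[str, dict], float],
              n_samples: int = 8) -> float:
    passed = sum(rollout(model, task) >= 1.0 for _ in range(n_samples))
    return passed / n_samples

def annotate_difficulty(task: dict,
                        rollout: Callable[[str, dict], float]) -> dict[str, float]:
    annotators = ["Qwen3-32B", "Qwen3-4B", "DeepSeek-R1-0528-Qwen3-8B"]
    return {m: pass_rate(m, task, rollout) for m in annotators}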

We release the following final dataset splits on Huggingface:
- SYNTHETIC-2: The full SYNTHETIC-2 dataset consisting of all prompts and completions along with rewards
- SYNTHETIC-2-SFT-verified: The SFT split of SYNTHETIC-2 with responses from DeepSeek-R1-0528 verified as correct (rewards of 1 for binary rewards and over 0.7 for non-binary rewards)
- SYNTHETIC-2-SFT-unverified: The SFT split of SYNTHETIC-2 with all responses, including those not verified as correct
- SYNTHETIC-2-RL: The RL subset of SYNTHETIC-2 with difficulty annotations from Qwen3-32B, Qwen3-4B and DeepSeek-R1-0528-Qwen3-8B
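All splits can be pulled with the datasets library; the repository IDs below follow the split names above under the PrimeIntellect organization, but verify the exact IDs on the Hugging Face Hub (the pass-rate column name in the filter is an assumption for the sketch).

from datasets import load_dataset

# Repository IDs assumed to follow the split names above under the
# PrimeIntellect org; verify on the Hugging Face Hub before use.
sft = load_dataset("PrimeIntellect/SYNTHETIC-2-SFT-verified", split="train")
rl = load_dataset("PrimeIntellect/SYNTHETIC-2-RL", split="train")

# Example: keep only RL tasks that Qwen3-4B rarely solves, i.e. harder
# tasks (the column name here is an assumption for the sketch).
hard = rl.filter(lambda ex: ex.get("qwen3_4b_pass_rate", 1.0) < 0.25)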
Next Steps
Building on the release of SYNTHETIC-2, our next step is to leverage its tasks and dataset as the foundation for our next distributed RL run. INTELLECT-2 has already shown that globally distributed reinforcement learning works; now it's time to demonstrate its promise as a novel scaling paradigm, unlocking even more compute and achieving state-of-the-art model performance. Since the INTELLECT-2 release, we've made significant improvements to the stability of asynchronous RL at large scale and are confident these improvements will lead to state-of-the-art reasoning models trained in a decentralized fashion.
To expand the diversity of our RL environment ecosystem in prime-rl, we have integrated the verifiers repository as our core library for crowdsourcing complex RL environments from the open-source community. More details on this soon!
Our goal is to introduce additional multi-turn and tool-use environments—especially for coding and autonomous research tasks—to unlock SOTA coding-agent capabilities with our INTELLECT-3 model.




