SYNTHETIC-2 Release: Four Million Collaboratively Generated Reasoning Traces
We are releasing SYNTHETIC-2, an open dataset of four million verified reasoning traces spanning the most comprehensive set of complex reinforcement learning tasks and verifiers released to date. The dataset was collaboratively generated by compute contributors across the globe via our pipeline-parallel decentralized inference. Over 1,250 GPUs joined in 3 days — from 4090s to H200s — creating data for complex RL tasks.
Planetary-Scale Inference
Frontier-size models such as DeepSeek-R1-0528 are too large for all but the largest single nodes. For SYNTHETIC-2, we therefore shard the model across permissionless workers with pipeline parallelism: each device stores only one stage of the model and streams activations to the next peer. The Prime Intellect protocol groups GPUs with similar throughput and geography on the fly to maximize utilization.
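Conceptually, a pipeline stage owns one shard of the model's layers and forwards its output downstream. Below is a minimal sketch of that dataflow; the PipelineStage class and the local chaining are illustrative assumptions, not the prime-rl implementation, which streams activations between peers over iroh P2P connections.

import torch
import torch.nn as nn

class PipelineStage:
    """One shard of the model: a stage holds only its own layers."""

    def __init__(self, layers: nn.Sequential, rank: int, world_size: int):
        self.layers = layers          # this stage's layers only
        self.rank = rank              # 0 = first stage, world_size - 1 = last
        self.world_size = world_size

    @torch.no_grad()
    def step(self, activations: torch.Tensor) -> torch.Tensor:
        # Compute this stage's share; the result is what would be
        # streamed to the next peer in the networked setting.
        return self.layers(activations)

# Chained locally for illustration: stage i's output feeds stage i+1.
stages = [PipelineStage(nn.Sequential(nn.Linear(16, 16), nn.ReLU()), i, 4)
          for i in range(4)]
x = torch.randn(2, 16)
for stage in stages:
    x = stage.step(x)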

TOPLOC
TOPLOC v2 ensured honest computation by the decentralized inference workers in this run by generating verifiable proofs of computation. For SYNTHETIC-2, we extend our locality-sensitive hashing proofs with three key enhancements:
- Group‑level accept/reject: Success of the final pipeline stage implies the integrity of all stages.
- Stage‑by‑stage replay on failure: Enables precise identification and slashing of the first faulty node that caused the loss of integrity.
- Reproducible Gumbel sampling proofs: Detects tampering with the sampling logic (paper coming soon).
During the 4M-sample SYNTHETIC-2 run, TOPLOC's false positive rate was just 0.000925% (37 slashes). Averaged across all models, the median proof verification was 25x cheaper than re-running the original inference.
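For illustration, the group-level accept/reject and replay logic reduces to roughly the following sketch; check_proof is a hypothetical stand-in for TOPLOC's locality-sensitive hash comparison, not the actual API.

from typing import Callable, Optional, Sequence

def verify_group(stages: Sequence[object],
                 check_proof: Callable[[object], bool]) -> Optional[int]:
    # check_proof(stage) is a hypothetical hook: it replays a stage's
    # inputs and compares a locality-sensitive hash of the recomputed
    # activations against the proof the stage committed.
    if check_proof(stages[-1]):
        # Fast path: an honest final stage implies every upstream stage
        # was honest too, so one check accepts the whole pipeline group.
        return None
    # Slow path: replay stage by stage and return the index of the
    # first faulty node, which the protocol then slashes.
    for i, stage in enumerate(stages):
        if not check_proof(stage):
            return i
    return None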
Protocol
Our infrastructure orchestrates GPU nodes globally through a peer-to-peer network. Each node joins after validation, sends heartbeats with system metrics, and receives task assignments. The system tracks all active nodes and their work submissions on a decentralized ledger, ensuring transparency and accountability. Our orchestrator API enables seamless deployment of diverse workloads across the global compute fabric.
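A node heartbeat might look roughly like the sketch below; the endpoint URL and payload fields are assumptions for illustration, not the actual protocol schema.

import time
import requests  # third-party: pip install requests

ORCHESTRATOR_URL = "https://orchestrator.example.com"  # hypothetical URL

def heartbeat_loop(node_id: str, interval_s: float = 30.0) -> None:
    # Illustrative payload: field names are assumptions; a real node
    # would read GPU metrics from NVML rather than hardcode them.
    while True:
        payload = {
            "node_id": node_id,
            "timestamp": time.time(),
            "metrics": {
                "gpu_util_pct": 87.5,
                "gpu_mem_free_gb": 12.3,
                "tokens_per_second": 410.0,
            },
        }
        requests.post(f"{ORCHESTRATOR_URL}/heartbeat", json=payload, timeout=10)
        time.sleep(interval_s)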
Tasks are matched to node groups based on compute requirements—from consumer GPUs to multi-node setups with hundreds of GBs of GPU memory. When forming multi-node groups, the system optimizes for geographic proximity to minimize latency between collaborating nodes. The system automatically handles node failures by dissolving affected groups and rescheduling work, while an opportunistic scheduling algorithm continuously optimizes resource utilization by identifying opportunities to merge compatible nodes into more efficient configurations. For example, the following POST request deploys DeepSeek-R1-0528 inference across two topologies: 1×1128GB (a single 8×H200 node) and 2×640GB (pipeline-parallel inference across two 8×A/H100 nodes):
{
  "name": "DeepSeek-R1-0528:R1-SFT",
  "image": "primeintellect/prime-rl:commit-df75e4c",
  "env_vars": {
    "HF_HUB_CACHE": "/shared/hf_hub",
    "HF_HUB_ETAG_TIMEOUT": "500"
  },
  "cmd": [
    "@configs/inference/synthetic-2/base.toml",
    "@configs/inference/synthetic-2/deepseek-r1-0528.toml",
    "--data.name",
    "PrimeIntellect/SYNTHETIC-2-Base-R1-SFT",
    "--parallel.pp.rank",
    "${GROUP_INDEX}",
    "--parallel.pp.world-size",
    "${GROUP_SIZE}",
    "--parallel.pp.iroh-seed",
    "${WORKER_P2P_SEED}",
    "--parallel.pp.iroh-peer-id",
    "${NEXT_P2P_ADDRESS}",
    "--group-id",
    "${GROUP_ID}",
    "--task-id",
    "${TASK_ID}"
  ],
  "metadata": {
    "labels": {
      "model": "DeepSeek-R1-0528",
      "data": "R1-SFT"
    }
  },
  "scheduling_config": {
    "plugins": {
      "node_groups": {
        "allowed_topologies": ["1x1128GB", "2x640GB"]
      }
    }
  },
  "storage_config": {
    "file_name_template": "deepseek-ai/DeepSeek-R1-0528/PrimeIntellect/SYNTHETIC-2-Base-R1-SFT/1-${NODE_GROUP_ID}-${NODE_GROUP_SIZE}-${CURRENT_FILE_INDEX}-${NODE_GROUP_INDEX}.parquet"
  },
  "volume_mounts": [
    {
      "host_path": "/group-${GROUP_ID}-state",
      "container_path": "/state"
    }
  ]
}
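Submitting such a deployment could then be a single authenticated POST; the endpoint path and auth header below are placeholders, not the documented orchestrator API.

import json
import requests  # third-party: pip install requests

# The deployment config above, saved locally; the endpoint path and the
# bearer token are placeholders, not the documented orchestrator API.
with open("deepseek-r1-0528-sft.json") as f:
    deployment = json.load(f)

resp = requests.post(
    "https://orchestrator.example.com/deployments",  # hypothetical URL
    json=deployment,
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. the created deployment's id and status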
The orchestrator handles everything from Docker container management to secure P2P connections (via mTLS), while our validation framework ensures computational integrity through continuous monitoring and specialized TOPLOC servers for model-specific verification.
Compute Contributions
A total of 1,253 GPUs from around the world participated in the run, including 49 nodes with 8xH200 GPUs, 43 nodes with 8xH100 GPUs, and numerous consumer 3090 and 4090 GPUs.

SYNTHETIC-2 Dataset
To generate SYNTHETIC-2, we've collected a diverse set of challenging reasoning tasks spanning math and coding, as well as non-traditional and previously underrepresented reasoning tasks such as puzzles and problems testing precise instruction-following abilities. These tasks are drawn from public datasets such as Skywork Math and open-source libraries such as reasoning-gym, and are largely generated from internal research.
Concretely, we propose the following new verifiable reasoning tasks:
- Code Output Prediction (v2): This task asks an LLM to predict the output of a complex piece of LLM-generated code. Unlike v1 from SYNTHETIC-1, v2 includes real-world libraries and code that more accurately mimics real-world use cases.
- Pydantic Adherence: We ask an LLM to generate a JSON object that adheres to a complex (LLM-generated) pydantic model (a minimal verifier sketch follows this list).
- Complex JSON formatting: This task tests an LLM’s ability to adhere to complex JSON formatting instructions. Its prompts contain a few simple reasoning problems, with complex instructions for how the results should be presented in a JSON format
- Sentence Unscrambling: Requires the model to rearrange randomly ordered text blocks into their original chronological/logical order while maintaining block numbering.
- ASCII Tree formatting: Evaluates ASCII tree structure generation (for file system directories) by comparing generated output against ground truth tree representations.
- Formatask: Tests precise extraction of exact portions of natural text with specific formatting requirements, based on natural-language descriptions of the section to extract ("return just the part where it mentions X topic…").
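To make the verification concrete, here is a minimal sketch of how the Pydantic Adherence task can be scored; the Invoice schema and the verifier function are illustrative, and the actual prime-rl verifier may differ.

import json
from pydantic import BaseModel, ValidationError

# Illustrative target schema; in the dataset, the pydantic model itself
# is LLM-generated and considerably more complex.
class Invoice(BaseModel):
    invoice_id: str
    total_cents: int
    paid: bool

def verify_pydantic_adherence(completion: str, model: type[BaseModel]) -> float:
    # Binary reward: 1.0 if the completion parses as JSON and validates
    # against the target model, else 0.0. A minimal sketch, not the
    # exact prime-rl verifier.
    try:
        model.model_validate(json.loads(completion))
        return 1.0
    except (json.JSONDecodeError, ValidationError):
        return 0.0

# Usage:
assert verify_pydantic_adherence(
    '{"invoice_id": "A1", "total_cents": 995, "paid": true}', Invoice) == 1.0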
The full distribution of tasks is shown below.
[Figure: distribution of tasks in SYNTHETIC-2]
SYNTHETIC-2 is meant to provide both high-quality SFT and RL data. Hence, we split our tasks into two subsets:
- Our SFT subset contains a smaller set of more difficult tasks. For these tasks, we generate responses from DeepSeek-R1-0528, the best open reasoning model available, enabling developers to distill its reasoning capabilities into smaller models.
- Our RL subset contains all of our tasks that can be verified using prime-rl. As difficulty filtering has proven crucial for RL training performance, we produce difficulty annotations for all tasks by computing the pass rates of three smaller models, Qwen3-32B, Qwen3-4B, and DeepSeek-R1-0528-Qwen3-8B, on each task (sketched below).
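As a sketch of the annotation step, a task's difficulty for a model is simply that model's pass rate on it; the rollout helper, the sample count, and the pass threshold below are assumptions for illustration.

from typing import Callable

# `rollout(model, task)` is a hypothetical helper that samples one
# completion and returns its verified reward; n_samples is illustrative.
def pass_rate(model: str, task: dict,
              rollout: Callable[[str, dict], float],
              n_samples: int = 8) -> float:
    passed = sum(rollout(model, task) >= 1.0 for _ in range(n_samples))
    return passed / n_samples

def annotate_difficulty(task: dict,
                        rollout: Callable[[str, dict], float]) -> dict[str, float]:
    annotators = ["Qwen3-32B", "Qwen3-4B", "DeepSeek-R1-0528-Qwen3-8B"]
    return {m: pass_rate(m, task, rollout) for m in annotators}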

We release the following final dataset splits on Huggingface:
- SYNTHETIC-2: The full SYNTHETIC-2 dataset consisting of all prompts and completions along with rewards
- SYNTHETIC-2-SFT-verified: The SFT split of SYNTHETIC-2 with responses from DeepSeek-R1-0528 verified as correct (rewards of 1 for binary rewards and over 0.7 for non-binary rewards)
- SYNTHETIC-2-SFT-unverified: The SFT split of SYNTHETIC-2 with all responses, including those not verified as correct
- SYNTHETIC-2-RL: The RL subset of SYNTHETIC-2 with difficulty annotations from Qwen3-32B, Qwen3-4B and DeepSeek-R1-0528-Qwen3-8B
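All splits can be pulled with the datasets library; the repository IDs below follow the split names above under the PrimeIntellect organization, but verify the exact IDs on the Hugging Face Hub (the pass-rate column name in the filter is an assumption for the sketch).

from datasets import load_dataset

# Repository IDs assumed to follow the split names above under the
# PrimeIntellect org; verify on the Hugging Face Hub before use.
sft = load_dataset("PrimeIntellect/SYNTHETIC-2-SFT-verified", split="train")
rl = load_dataset("PrimeIntellect/SYNTHETIC-2-RL", split="train")

# Example: keep only RL tasks that Qwen3-4B rarely solves, i.e. harder
# tasks (the column name here is an assumption for the sketch).
hard = rl.filter(lambda ex: ex.get("qwen3_4b_pass_rate", 1.0) < 0.25)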
Next Steps
Building on the release of SYNTHETIC-2, our next step is to leverage its tasks and dataset as the foundation for our next distributed RL run. INTELLECT-2 has already shown that globally distributed reinforcement learning works; now it's time to demonstrate its promise as a novel scaling paradigm, unlocking even more compute and achieving state-of-the-art model performance. Since the INTELLECT-2 release, we've made significant improvements to the stability of asynchronous RL at large scale and are confident these improvements will lead to state-of-the-art reasoning models trained in a decentralized fashion.
To expand the diversity of our RL environment ecosystem in prime-rl, we have integrated the verifiers repository as our core library for crowdsourcing complex RL environments from the open-source community. More details on this soon!
Our goal is to introduce additional multi-turn and tool-use environments—especially for coding and autonomous research tasks—to unlock SOTA coding-agent capabilities with our INTELLECT-3 model.




