INTELLECT-2 Release: The First 32B Parameter Model Trained Through Globally Distributed Reinforcement Learning
We're excited to release INTELLECT-2, the first 32B parameter model trained via globally distributed reinforcement learning. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning language model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors.
To enable a training run on this unique infrastructure, we built various components from scratch. We introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning. It builds on novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers.
Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were crucial to achieve training stability and ensure that our model successfully learned its training objective, thus improving upon QwQ-32B.
We open-source INTELLECT-2 along with our code and data, hoping to enable more open research in the field of decentralized training.
Test-time compute scaling with reinforcement learning has emerged as a new scaling axis for large language models (LLMs), enabling improvements by allowing models to spend more time reasoning.
However, reinforcement learning training is typically centralized, requiring large clusters of co-located GPUs and fast interconnect speeds. With INTELLECT-2, we showcase a paradigm shift: reinforcement learning is inherently more asynchronous and well suited for decentralized, globally distributed compute.
Training Infrastructure
We introduce the following key open-source infrastructure components for training INTELLECT-2:
PRIME-RL: A fully asynchronous reinforcement learning framework designed for decentralized training. It decouples rollout generation, model training, and weight broadcasting, enabling training across heterogeneous, unreliable networks.
The trainer implementation uses PyTorch FSDP2, inference uses vLLM, and the verifiers use the GENESYS schema introduced in SYNTHETIC-1.
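The decoupling can be illustrated with a toy, single-process sketch. It assumes Python threads and a bounded queue stand in for separate inference and training machines. `rollout_worker`, `trainer`, `run`, and `MAX_OFF_POLICY` are hypothetical names, not PRIME-RL's API. Inference keeps producing rollouts tagged with the policy version it used, while the trainer consumes them and discards any rollout that is more than `MAX_OFF_POLICY` versions stale.

```python
import queue
import threading

MAX_OFF_POLICY = 2  # hypothetical: max allowed gap between trainer and rollout policy versions


def rollout_worker(policy_version, rollouts, stop, worker_id):
    """Continuously produce placeholder rollouts tagged with the policy version used."""
    sample_id = 0
    while not stop.is_set():
        item = {"worker": worker_id, "version": policy_version["value"], "sample": sample_id}
        try:
            rollouts.put(item, timeout=0.05)  # bounded queue applies backpressure
            sample_id += 1
        except queue.Full:
            continue  # re-check the stop flag, then retry


def trainer(policy_version, rollouts, num_steps, batch_size):
    """Consume rollouts, discard ones that are too stale, and publish new versions."""
    used, dropped = 0, 0
    for _ in range(num_steps):
        batch = []
        while len(batch) < batch_size:
            item = rollouts.get()
            if policy_version["value"] - item["version"] > MAX_OFF_POLICY:
                dropped += 1
                continue
            batch.append(item)
        used += len(batch)
        # A real trainer would take an optimizer step on `batch` here and hand the
        # new weights to the broadcaster; this sketch only bumps the version counter.
        policy_version["value"] += 1
    return used, dropped


def run(num_workers=2, num_steps=5, batch_size=4):
    policy_version = {"value": 0}
    rollouts = queue.Queue(maxsize=16)
    stop = threading.Event()
    workers = [
        threading.Thread(target=rollout_worker, args=(policy_version, rollouts, stop, i))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    used, dropped = trainer(policy_version, rollouts, num_steps, batch_size)
    stop.set()
    for w in workers:
        w.join()
    return policy_version["value"], used, dropped
```

Calling `run()` trains for five steps on batches of four rollouts. The bounded queue keeps inference from racing arbitrarily far ahead of training, and the staleness check bounds how off-policy the consumed data can be.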
SHARDCAST: A library for distributing large files via an HTTP-based tree-topology network that efficiently propagates updated model weights to the decentralized inference workers.
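SHARDCAST's wire format is not described here. The following minimal sketch assumes checkpoints are split into fixed-size shards, each carrying a SHA-256 checksum, and that relay nodes form a complete tree with a hypothetical `fanout`. All function names are illustrative. Each node downloads from its parent rather than from the origin, so the origin's upload load stays bounded while tree depth grows only logarithmically with the number of workers.

```python
import hashlib


def make_shards(blob: bytes, shard_size: int):
    """Split a serialized checkpoint into shards, each paired with its SHA-256 digest."""
    shards = []
    for offset in range(0, len(blob), shard_size):
        chunk = blob[offset:offset + shard_size]
        shards.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return shards


def reassemble(shards):
    """Verify every shard's digest, then concatenate the shards back into one blob."""
    parts = []
    for digest, chunk in shards:
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("corrupted shard")
        parts.append(chunk)
    return b"".join(parts)


def tree_parents(num_nodes: int, fanout: int):
    """Map each node to the node it downloads from; node 0 is the origin server."""
    return {i: (i - 1) // fanout for i in range(1, num_nodes)}


def hops_to_origin(node: int, fanout: int) -> int:
    """Number of relay hops between a node and the origin in the complete tree."""
    hops = 0
    while node != 0:
        node = (node - 1) // fanout
        hops += 1
    return hops
```

With a fanout of 2, node 6 in a 7-node tree fetches from node 2, which fetches from the origin, so it is two hops away.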
TOPLOC: A locality-sensitive hashing scheme for efficient verifiable inference. It detects tampering or precision changes in model inference and works reliably across nondeterministic GPU hardware.
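The actual TOPLOC encoding is not spelled out in this post. The sketch below only illustrates the underlying idea with a simplified, hypothetical commitment. The inference worker records the top-k activations of a hidden state, and a validator recomputes them and accepts if the two agree within a tolerance. The tolerance is what lets honest results pass despite small nondeterministic numerical differences, while a substituted model (simulated here by perturbing the activations) fails. `commit`, `verify`, and both thresholds are assumptions.

```python
import numpy as np


def commit(hidden: np.ndarray, k: int):
    """Prover side: record indices and values of the k largest-magnitude activations."""
    idx = np.argsort(-np.abs(hidden))[:k]
    return idx, hidden[idx].astype(np.float32)


def verify(commitment, recomputed: np.ndarray, min_overlap=0.9, max_rel_err=1e-2):
    """Validator side: compare the commitment with independently recomputed activations.

    Accepts if most committed indices are still in the recomputed top-k and the
    committed values match the recomputed ones within a relative tolerance.
    """
    idx, vals = commitment
    k = len(idx)
    fresh_idx = set(np.argsort(-np.abs(recomputed))[:k].tolist())
    overlap = len(fresh_idx.intersection(idx.tolist())) / k
    rel_err = np.abs(recomputed[idx] - vals) / (np.abs(vals) + 1e-8)
    return overlap >= min_overlap and float(rel_err.mean()) <= max_rel_err
```

Committing to only k values keeps the proof small, and comparing with a tolerance rather than requiring bit-exact equality is the property that makes this kind of check usable on nondeterministic hardware.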
Inference workers generate rollouts and upload the rollout files via signed URLs. An on-chain event then triggers TOPLOC validators to check them. Accepted files are fed to the trainer, while an invalid file causes the submitting node to be slashed and removed from the pool.
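The accept-or-slash flow can be summarized as a small state machine. This sketch assumes an in-memory pool and a caller-supplied `is_valid` check standing in for the TOPLOC validator. The `Pool` class and its fields are hypothetical, and the sketch does not model signed URLs or the on-chain event itself.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical compute pool tracking active nodes and accepted rollout files."""
    active_nodes: set = field(default_factory=set)
    slashed_nodes: set = field(default_factory=set)
    trainer_inbox: list = field(default_factory=list)

    def on_upload_event(self, node_id, rollout_file, is_valid):
        """Handle one upload notification by validating the file and acting on the result."""
        if node_id not in self.active_nodes:
            return  # ignore submissions from nodes that were already removed
        if is_valid(rollout_file):
            self.trainer_inbox.append(rollout_file)  # accepted: feed the trainer
        else:
            self.active_nodes.discard(node_id)  # rejected: remove the node...
            self.slashed_nodes.add(node_id)  # ...and record that it was slashed
```

Once a node is slashed, later uploads from it are ignored, so a dishonest worker cannot keep feeding the trainer.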
Protocol Testnet: Provides the infrastructure to aggregate and coordinate global compute resources.
A Rust-based orchestrator and discovery service coordinate permissionless workers: nodes auto-register with hardware checks, send heartbeats, and pull Docker-container tasks, while the orchestrator schedules workloads, tracks node health, and records pool ownership and contributions.
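Here is a minimal sketch of the health-tracking part, written in Python for brevity rather than Rust. It assumes liveness is judged by a heartbeat timeout. The `Orchestrator` class, the GPU-memory admission check, and its 24 GB default threshold are all hypothetical. The injectable `clock` exists only to make the example deterministic.

```python
import time


class Orchestrator:
    """Hypothetical orchestrator that admits workers and tracks liveness via heartbeats."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}  # node_id -> timestamp of the most recent heartbeat
        self.contributions = {}  # node_id -> number of completed tasks

    def register(self, node_id, gpu_memory_gb, min_gpu_memory_gb=24):
        """Admit a node only if it passes a (hypothetical) hardware check."""
        if gpu_memory_gb < min_gpu_memory_gb:
            return False
        self.last_seen[node_id] = self.clock()
        self.contributions.setdefault(node_id, 0)
        return True

    def heartbeat(self, node_id):
        """Refresh a registered node's liveness timestamp."""
        if node_id in self.last_seen:
            self.last_seen[node_id] = self.clock()

    def healthy_nodes(self):
        """Nodes whose last heartbeat is within the timeout window."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout_s)

    def record_task_done(self, node_id):
        """Credit a node for a completed task."""
        self.contributions[node_id] = self.contributions.get(node_id, 0) + 1
```

A scheduler built on this would assign new tasks only to `healthy_nodes()`, so workers that silently drop off stop receiving work after one timeout period.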
Training Recipe
Training Data & Rewards:
285k verifiable tasks (math & coding) from NuminaMath-1.5, Deepscaler, and SYNTHETIC-1.
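For the math portion, a verifiable reward can be as simple as comparing the model's final `\boxed{...}` answer, the format requested by the system prompts in the samples at the end of this post, with the reference answer. The sketch below assumes exact string matching after whitespace removal and a binary 0/1 reward. It is not the GENESYS verifier, which this post does not describe, and practical math checkers usually test symbolic equivalence. Coding tasks would instead be rewarded by running test cases, which is not shown here.

```python
def last_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    """Very light normalization: drop all spaces."""
    return answer.replace(" ", "").strip()


def math_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final boxed answer matches the reference, else 0.0."""
    answer = last_boxed(completion)
    if answer is None:
        return 0.0
    return 1.0 if normalize(answer) == normalize(ground_truth) else 0.0
```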
Two-Step Asynchronous RL: The broadcast of new policy weights is fully overlapped with ongoing inference and training, eliminating the communication bottleneck.
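A back-of-the-envelope model shows why overlapping matters. It is an idealized sketch, not a measurement, and makes two assumptions. First, training, weight broadcast, and inference each run on their own resources. Second, the asynchrony level is large enough that inference never waits for the newest weights. Under these assumptions the steady-state cost per step is the slowest stage rather than the sum of all three. The times are in arbitrary units, and the function names are hypothetical.

```python
def sync_time(num_steps, t_infer, t_bcast, t_train):
    """Synchronous loop: each step runs training, broadcast, and inference back to back."""
    return num_steps * (t_infer + t_bcast + t_train)


def pipelined_time(num_steps, t_infer, t_bcast, t_train):
    """Idealized overlap: once the pipeline fills, each step costs only the slowest stage."""
    stages = (t_infer, t_bcast, t_train)
    return sum(stages) + (num_steps - 1) * max(stages)


def rollout_policy_version(step, async_level=2):
    """Policy version whose rollouts the trainer consumes at a given step."""
    return max(0, step - async_level)
```

For example, with 10 steps, inference 5, broadcast 2, and training 3, the synchronous loop takes 100 units and the overlapped pipeline 55. In exchange, with two-step asynchrony the trainer at step 4 learns from rollouts produced by policy version 2.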
Two-Sided GRPO Clipping: Stabilizes training by mitigating gradient spikes with two-sided token probability ratio clipping.
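Here is a NumPy sketch of the objective. It assumes that, for negative advantages, the standard clipped term is additionally bounded below at `delta * adv`, which has the same shape as dual-clip PPO. The values `clip_eps=0.2` and `delta=4.0` are illustrative assumptions. A real trainer would compute this on PyTorch tensors so that autograd can produce gradients. In the bounded branch the objective no longer depends on the ratio, so its gradient there is zero, which is what suppresses the spikes.

```python
import numpy as np


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one prompt's group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def two_sided_clipped_objective(ratio, adv, clip_eps=0.2, delta=4.0):
    """Per-token surrogate objective (to be maximized).

    ratio: pi_new / pi_old per token; adv: advantage broadcast to each token.
    The usual term min(ratio * adv, clip(ratio) * adv) is unbounded below when
    adv < 0 and the ratio grows, so it is also bounded below at delta * adv
    (delta > 1 + clip_eps).
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    ppo = np.minimum(ratio * adv, np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    return np.where(adv < 0, np.maximum(ppo, delta * adv), ppo)


def grpo_loss(logp_new, logp_old, adv, **kwargs):
    """Loss to minimize: negative mean of the clipped objective over tokens."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    return -float(np.mean(two_sided_clipped_objective(ratio, adv, **kwargs)))
```

A token with ratio 10 and advantage -1 contributes -10 under plain one-sided clipping, but only -4 with `delta=4.0`.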
Advanced Data Filtering: Combines offline and online filtering to select challenging tasks, significantly enhancing model learning efficiency.
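The two filtering stages can be sketched as follows. This assumes offline filtering uses a per-task pass rate estimated beforehand by sampling the base model several times, and online filtering drops rollout groups whose samples all received the same reward. Such groups have zero group-normalized advantage and therefore carry no learning signal. The function names and the 0.75 threshold are hypothetical.

```python
def offline_filter(pass_rates: dict, max_pass_rate=0.75):
    """Keep tasks the base model does not already solve reliably.

    pass_rates maps task_id -> fraction of sampled answers that were correct.
    """
    return [task for task, p in pass_rates.items() if p <= max_pass_rate]


def online_filter(groups):
    """Drop rollout groups in which every sample received the same reward."""
    return [g for g in groups if len(set(g["rewards"])) > 1]
```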
Aggressive Gradient Clipping: Addresses escalating gradient norms at scale, providing improved training stability.
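Gradient clipping by global norm rescales all gradients by a single factor whenever their joint L2 norm exceeds a threshold. In PyTorch this is `torch.nn.utils.clip_grad_norm_(parameters, max_norm)`. The NumPy sketch below shows the same computation. "Aggressive" here is taken to mean a small `max_norm`. The 0.1 used in the example is a hypothetical value, not necessarily the threshold used for INTELLECT-2.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradient arrays by one factor so their joint L2 norm is <= max_norm.

    Returns the clipped gradients and the pre-clipping norm, which is useful for
    monitoring escalating gradient norms over the course of training.
    """
    total_norm = float(np.sqrt(sum(float(np.sum(np.square(g))) for g in grads)))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm
```

Because every gradient shares one scale factor, the update direction is preserved and only its magnitude is capped.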
Experiments
We report results from two main experiments: TARGET-SHORT, an experimental run with short target lengths to train an efficient reasoning model, and TARGET-LONG, our main run with longer target lengths.
Compute Utilization: During the two main experiments, we successfully overlapped communication with computation through two-step asynchronous reinforcement learning.
Reward Trajectories:
Throughout training, we saw significant improvements in our task rewards, indicating that the model improved its performance on our mathematics and coding problems. We also saw a reduction in length penalties, but a much slower one than during our ablation experiments.
Benchmark Performance: We were able to increase the performance of QwQ-32B on mathematics and coding benchmarks.
Overall, because QwQ-32B had already been extensively trained with RL, it was difficult to obtain large generalized improvements on benchmarks beyond our improvements on the training dataset. Stronger improvements will likely require better base models, such as the now-available Qwen3, or higher-quality datasets and RL environments.
Future Work
INTELLECT-2 is a first step towards open frontier reasoning models trained in a decentralized fashion. Over the coming months, we’ll work on:
Increasing the Ratio of Inference to Training Compute
Inference is embarrassingly parallel and communication-free, so more complex RL environments that devote more FLOPs to inference are a natural fit for decentralized training.
Tool Calls & Multi-Turn RL
To fully leverage inference-time compute for scientific and research progress, we need to give models built-in tools, such as web search and Python interpreters, within their reasoning chain.
Crowdsourcing RL Tasks & Environments
We believe open-source has a unique advantage here. Distributed RL is still in its early days, and with the right community and contributions, open-source AI can outpace the closed labs.
Model Merging & DiLoCo
Fuse independently trained RL models, either at the end of training or continuously via DiLoCo, into a single unified model, scaling decentralized RL to far more compute.
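The simplest form of merging is parameter averaging. A DiLoCo-style outer step generalizes it by treating the gap between the global weights and the averaged local weights as a pseudo-gradient. The sketch assumes parameters are dicts of NumPy arrays and uses plain SGD for the outer update. DiLoCo itself uses Nesterov momentum in the outer optimizer, which is omitted here. Function names are hypothetical. With `outer_lr=1.0` the outer step reduces exactly to averaging.

```python
import numpy as np


def average_params(param_dicts):
    """Uniformly average a list of {name: array} parameter dicts."""
    names = param_dicts[0].keys()
    return {n: np.mean([p[n] for p in param_dicts], axis=0) for n in names}


def diloco_outer_step(global_params, local_params_list, outer_lr=1.0):
    """One simplified outer step along the averaged pseudo-gradient (no momentum)."""
    avg_local = average_params(local_params_list)
    return {
        n: global_params[n] - outer_lr * (global_params[n] - avg_local[n])
        for n in global_params
    }
```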
INTELLECT-2 demonstrates that globally decentralized RL works. Now, we’re focusing on tool-assisted reasoning, crowdsourcing higher-quality data, and optimizing our infrastructure and training recipe to build frontier open models.
Join us if you want to build open-source, decentralized AGI.
Eurus-2-PRIME
System Prompt
When tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process. [ASSESS] [ADVANCE] [VERIFY] [SIMPLIFY] [SYNTHESIZE] [PIVOT] [OUTPUT] You should strictly follow the format below: [ACTION NAME] # Your action step 1 # Your action step 2 # Your action step 3 ... Next action: [NEXT ACTION NAME]
User
Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$
Eurus-2-PRIME
[ASSESS]
We are given two quadratic polynomials $P(x)$ and $Q(x)$ with leading coefficients $2$ and $-2$, respectively. The graphs of both polynomials pass through the points $(16,54)$ and $(20,53)$. We need to find the value of $P(0) + Q(0)$.
[ADVANCE]
Let's express the quadratic polynomials $P(x)$ and $Q(x)$ in the general form:
$ P(x) = 2x^2 + bx + c $
$ Q(x) = -2x^2 + dx + e $
Since the graphs of both polynomials pass through the points $(16,54)$ and $(20,53)$, we can substitute these points into the equations of $P(x)$ and $Q(x)$ to form a system of equations.
For $P(x)$:
$ P(16) = 2(16)^2 + 16b + c = 54 $
$ 512 + 16b + c = 54 $
$ 16b + c = -458 \quad \text{(Equation 1)} $
$ P(20) = 2(20)^2 + 20b + c = 53 $
$ 800 + 20b + c = 53 $
$ 20b + c = -747 \quad \text{(Equation 2)} $
For $Q(x)$:
$ Q(16) = -2(16)^2 + 16d + e = 54 $
$ -512 + 16d + e = 54 $
$ 16d + e = 566 \quad \text{(Equation 3)} $
$ Q(20) = -2(20)^2 + 20d + e = 53 $
$ -800 + 20d + e = 53 $
$ 20d + e = 853 \quad \text{(Equation 4)} $
[ADVANCE]
Now, we can solve the system of equations for $P(x)$: Subtract Equation 1 from Equation 2:
$ (20b + c) - (16b + c) = -747 - (-458) $
$ 4b = -289 $
$ b = -\frac{289}{4} $
Substitute $b = -\frac{289}{4}$ into Equation 1:
$ 16\left(-\frac{289}{4}\right) + c = -458 $
$ -1156 + c = -458 $
$ c = 698 $
So, for $P(x)$, we have:
$ P(x) = 2x^2 - \frac{289}{4}x + 698 $
$ P(0) = 698 $
Next, we can solve the system of equations for $Q(x)$: Subtract Equation 3 from Equation 4:
$ (20d + e) - (16d + e) = 853 - 566 $
$ 4d = 287 $
$ d = \frac{287}{4} $
Substitute $d = \frac{287}{4}$ into Equation 3:
$ 16\left(\frac{287}{4}\right) + e = 566 $
$ 1148 + e = 566 $
$ e = -582 $
So, for $Q(x)$, we have:
$ Q(x) = -2x^2 + \frac{287}{4}x - 582 $
$ Q(0) = -582 $
[ADVANCE]
Finally, we can find $P(0) + Q(0)$:
$ P(0) + Q(0) = 698 + (-582) = 116 $
So, the value of $P(0) + Q(0)$ is $\boxed{116}$.
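The arithmetic in the sample solutions can be double-checked with exact rational arithmetic. The helper name is hypothetical. The second computation relies on an observation the samples do not use: the $x^2$ terms cancel in $P(x) + Q(x)$, so the sum is linear and passes through $(16, 108)$ and $(20, 106)$.

```python
from fractions import Fraction


def fit_quadratic_with_leading(a, p1, p2):
    """Solve for b, c in a*x^2 + b*x + c given the leading coefficient and two points."""
    (x1, y1), (x2, y2) = p1, p2
    # Subtracting the known a*x^2 term leaves the line b*x + c through two points.
    r1 = Fraction(y1 - a * x1 * x1)
    r2 = Fraction(y2 - a * x2 * x2)
    b = (r2 - r1) / (x2 - x1)
    c = r1 - b * x1
    return b, c


b, c = fit_quadratic_with_leading(2, (16, 54), (20, 53))   # P(x)
d, e = fit_quadratic_with_leading(-2, (16, 54), (20, 53))  # Q(x)

# Shortcut: P + Q is linear, so its value at 0 follows from the two summed points.
slope = Fraction(106 - 108, 20 - 16)
intercept = 108 - slope * 16
```

Both routes give $P(0) + Q(0) = 698 + (-582) = 116$, matching the boxed answers.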
INTELLECT-MATH-7B-SFT (ours)
System Prompt
Solve the following math problem efficiently and clearly. Think carefully and step by step about your response and reason before providing a final response. Conclude your response with: \n\nTherefore, the final answer is: $\\boxed{answer}$. I hope it is correct.\n\nWhere [answer] is just the final number or expression that solves the problem.
User
Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$
INTELLECT-MATH-7B-SFT
So I've got this problem here about quadratic polynomials $P(x)$ and $Q(x)$. $P(x)$ has a leading coefficient of $2$, and $Q(x)$ has a leading coefficient of $-2$. Both polynomials pass through the points $(16,54)$ and $(20,53)$. I need to find $P(0) + Q(0)$.
First, since both $P(x)$ and $Q(x)$ are quadratic polynomials, I can express them in the general form:
$P(x) = 2x^2 + bx + c$
$Q(x) = -2x^2 + dx + e$
Wait, actually, the leading coefficient is the coefficient of the highest power of $x$, which for a quadratic is $x^2$. So yes, $P(x) = 2x^2 + bx + c$ and $Q(x) = -2x^2 + dx + e$.
Now, both polynomials pass through the points $(16,54)$ and $(20,53)$. That means when $x=16$, $P(16)=54$, and when $x=20$, $P(20)=53$. Similarly for $Q(x)$.
So, I can set up equations using these points.
First, for $P(x)$:
$P(16) = 2(16)^2 + 16b + c = 54$
Similarly, $P(20) = 2(20)^2 + 20b + c = 53$
Let me calculate $16^2$ and $20^2$:
$16^2 = 256$
$20^2 = 400$
So,
$P(16): 2 \cdot 256 + 16b + c = 54 \Rightarrow 512 + 16b + c = 54 \Rightarrow 16b + c = 54 - 512 \Rightarrow 16b + c = -458$
$P(20): 2 \cdot 400 + 20b + c = 53 \Rightarrow 800 + 20b + c = 53 \Rightarrow 20b + c = 53 - 800 \Rightarrow 20b + c = -747$
Now, I have two equations:
1) $16b + c = -458$
2) $20b + c = -747$
I can subtract equation 1 from equation 2 to eliminate $c$:
$(20b + c) - (16b + c) = -747 - (-458)$
$20b + c - 16b - c = -747 + 458$
$4b = -289$
So, $b = -289 / 4 = -72.25$
Wait, but coefficients are usually integers, but maybe not. Let me check.
Now, plug $b$ back into one of the equations to find $c$. Let's use equation 1:
$16(-72.25) + c = -458$
$-1156 + c = -458$
$c = -458 + 1156 = 698$
So, $P(x) = 2x^2 - 72.25x + 698$
Hmm, $P(0)$ would be $c$, which is $698$.
Now, let's do the same for $Q(x)$.
$Q(16) = -2(16)^2 + 16d + e = 54 \Rightarrow -2 \cdot 256 + 16d + e = 54 \Rightarrow -512 + 16d + e = 54 \Rightarrow 16d + e = 54 + 512 \Rightarrow 16d + e = 566$
$Q(20) = -2(20)^2 + 20d + e = 53 \Rightarrow -2 \cdot 400 + 20d + e = 53 \Rightarrow -800 + 20d + e = 53 \Rightarrow 20d + e = 53 + 800 \Rightarrow 20d + e = 853$
Now, equations for $Q(x)$:
1) $16d + e = 566$
2) $20d + e = 853$
Subtract equation 1 from equation 2:
$(20d + e) - (16d + e) = 853 - 566$
$4d = 287$
$d = 287 / 4 = 71.75$
Now, plug $d$ back into equation 1:
$16 \cdot 71.75 + e = 566 \Rightarrow 1148 + e = 566 \Rightarrow e = 566 - 1148 = -582$
So, $Q(x) = -2x^2 + 71.75x - 582$
Therefore, $Q(0) = e = -582$
Now, $P(0) + Q(0) = 698 + (-582) = 116$
Wait, but the problem says "present the answer in LaTeX format: \boxed{Your answer}"
So, I should write $\boxed{116}$
But let me double-check my calculations to make sure I didn't make any mistakes.
First, for $P(x)$:
$P(16) = 2(256) + 16b + c = 512 + 16b + c = 54 \Rightarrow 16b + c = -458$
$P(20) = 2 \cdot 400 + 20b + c = 800 + 20b + c = 53 \Rightarrow 20b + c = -747$
$Q(16) = -2 \cdot 256 + 16d + e = -512 + 16d + e = 54 \Rightarrow 16d + e = 566$
$Q(20) = -2 \cdot 400 + 20d + e = -800 + 20d + e = 53 \Rightarrow 20d + e = 853$
Subtracting: $4d = 287 \Rightarrow d = 71.75$
Then, $e = 566 - 16 \cdot 71.75 = 566 - 1148 = -582$
So, $P(0) = 698$, $Q(0) = -582$, sum is $116$.
Seems correct.
Final Answer
$\boxed{116}$
INTELLECT-MATH-7B (ours)
System Prompt
Solve the following math problem efficiently and clearly. Think carefully and step by step about your response and reason before providing a final response. Conclude your response with: \n\nTherefore, the final answer is: $\\boxed{answer}$. I hope it is correct.\n\nWhere [answer] is just the final number or expression that solves the problem.
User
Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$
INTELLECT-MATH-7B
So I have two quadratic polynomials, $P(x)$ and $Q(x)$, with leading coefficients 2 and -2, respectively. Both pass through the points $(16,54)$ and $(20,53)$. I need to find $P(0) + Q(0)$.
First, since both are quadratic polynomials, I can express them in the general form:
$P(x) = 2x^2 + bx + c$
$Q(x) = -2x^2 + dx + e$
But since both pass through the points $(16,54)$ and $(20,53)$, I can set up equations using these points.
For $P(x)$:
$P(16) = 2(16)^2 + b \cdot 16 + c = 54$
$P(20) = 2(20)^2 + b \cdot 20 + c = 53$
Similarly for $Q(x)$:
$Q(16) = -2(16)^2 + d \cdot 16 + e = 54$
$Q(20) = -2(20)^2 + d \cdot 20 + e = 53$
Let's compute the squares:
$16^2 = 256$
$20^2 = 400$
So for $P(x)$:
$2 \cdot 256 + 16b + c = 54 \Rightarrow 512 + 16b + c = 54 \Rightarrow 16b + c = 54 - 512 = -458 \quad ...(1)$
$2 \cdot 400 + 20b + c = 53 \Rightarrow 800 + 20b + c = 53 \Rightarrow 20b + c = 53 - 800 = -747 \quad ...(2)$
For $Q(x)$:
$-2 \cdot 256 + 16d + e = 54 \Rightarrow -512 + 16d + e = 54 \Rightarrow 16d + e = 54 + 512 = 566 \quad ...(3)$
$-2 \cdot 400 + 20d + e = 53 \Rightarrow -800 + 20d + e = 53 \Rightarrow 20d + e = 53 + 800 = 853 \quad ...(4)$
Now, I have two systems of linear equations:
For $P(x)$:
$16b + c = -458 \quad ...(1)$
$20b + c = -747 \quad ...(2)$
Subtract equation (1) from equation (2):
$(20b + c) - (16b + c) = -747 - (-458)$
$4b = -289 \Rightarrow b = -\frac{289}{4}$
Now plug $b$ back into equation (1):
$16\left(-\frac{289}{4}\right) + c = -458 \Rightarrow -1156 + c = -458 \Rightarrow c = -458 + 1156 = 698$
So $P(x) = 2x^2 - \frac{289}{4}x + 698$
For $Q(x)$:
$16d + e = 566 \quad ...(3)$
$20d + e = 853 \quad ...(4)$
Subtract equation (3) from equation (4):
$(20d + e) - (16d + e) = 853 - 566$
$4d = 287 \Rightarrow d = \frac{287}{4}$
Now plug $d$ back into equation (3):
$16\left(\frac{287}{4}\right) + e = 566 \Rightarrow 1148 + e = 566 \Rightarrow e = 566 -1148= -582$
Prime Intellect Research Team: Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Matthew Di Ferrante, Kushal Thaman, Felix Gabriel, Fares Obeid, Kemal Erdem, Michael Keiblinger, Johannes Hagemann