The last update was over a year ago, so I hope "(2025)" gets added to the title:
> [2025/05/26] (Step 1 completed!) We release Mixture-of-Thoughts--a curated reasoning dataset of 350k verified traces distilled from R1. The dataset spans tasks in mathematics, coding, and science, and is designed to teach language models to reason step-by-step. We also provide a recipe to train OpenR1-Distill-7B, which replicates the reasoning capabilities of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and marks the completion of step 1 in the Open R1 project.
Doesn't look like they actually managed to reproduce R1; they stopped at Step 1 of their 3-step plan.
One of my favorite code comments of all time is still in the src:
"# TODO: implement a proper validator to compare against ground truth. For now we just check for exact string match on each line of stdout." [1]
This was one of my chief complaints about the entire R1 news cycle: it felt like no one actually read the technical report. DeepSeek was heralded for its openness, but the report left out the most meaningful details you'd need to reproduce the work.
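To make the open-r1 TODO concrete: below is a minimal Python sketch of what "exact string match on each line of stdout" amounts to, next to one possible looser alternative (whitespace-insensitive, numbers compared with a tolerance). The function names and the tolerance rule are hypothetical illustrations, not open-r1's actual code and not necessarily what a "proper validator" would look like.

```python
import math


def exact_line_match(stdout: str, expected: str) -> bool:
    """Strict check: each line of stdout must equal the expected line character for character."""
    # splitlines() drops a final newline, so "3\n" and "3" compare equal,
    # but trailing spaces or different number formatting do not.
    return stdout.splitlines() == expected.splitlines()


def _token_match(got: str, want: str, rel_tol: float) -> bool:
    """Tokens match if identical, or if both parse as floats that are close."""
    if got == want:
        return True
    try:
        return math.isclose(float(got), float(want), rel_tol=rel_tol)
    except ValueError:
        return False


def tolerant_line_match(stdout: str, expected: str, rel_tol: float = 1e-6) -> bool:
    """Hypothetical looser check: ignore extra whitespace, compare numbers with a tolerance."""
    got_lines = [line.split() for line in stdout.strip().splitlines()]
    want_lines = [line.split() for line in expected.strip().splitlines()]
    if len(got_lines) != len(want_lines):
        return False
    for got_toks, want_toks in zip(got_lines, want_lines):
        if len(got_toks) != len(want_toks):
            return False
        if not all(_token_match(g, w, rel_tol) for g, w in zip(got_toks, want_toks)):
            return False
    return True


if __name__ == "__main__":
    # A numerically correct answer printed with extra precision and a trailing space.
    print(exact_line_match("0.50 \n", "0.5\n"))     # strict check rejects it
    print(tolerant_line_match("0.50 \n", "0.5\n"))  # tolerant check accepts it
```

Exact matching never accepts output that differs from the reference, but it rejects correct solutions that differ only in formatting; a tolerance-based check trades some of that strictness for fewer false negatives.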
I'm not deeply familiar with either, though I know Olmo better. My impression is that Nemotron is newer, so why is it less applicable? Is it not fully open like Olmo?
> Open-source data coverage: The released datasets cover an estimated 8–10T tokens (~40–50% of the internal 25T blend). Missing categories include code (~14% of blend), nemotron-cc-code (~2%), crawl++ (~2%), and academic text (~2%). Users should supplement with their own data for these categories and adjust train_iters accordingly.
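On "adjust train_iters accordingly": a rough Python sketch of the arithmetic, assuming Megatron-LM-style accounting where each iteration consumes global_batch_size sequences of seq_length tokens. That accounting, the function name, and the batch size and sequence length values below are assumptions for illustration, not taken from the Nemotron docs or config.

```python
def train_iters_for_tokens(total_tokens: int, global_batch_size: int, seq_length: int) -> int:
    """Iterations needed to consume total_tokens, rounded up to a whole step.

    Assumes (hypothetically) that each iteration processes global_batch_size
    sequences of seq_length tokens, as in Megatron-LM-style training loops.
    """
    tokens_per_iter = global_batch_size * seq_length
    # Integer ceiling division avoids float rounding on trillion-token budgets.
    return -(-total_tokens // tokens_per_iter)


if __name__ == "__main__":
    # Hypothetical settings for illustration only, not Nemotron's actual config.
    gbs, seq_len = 1024, 8192
    full_blend = 25 * 10**12     # the internal 25T-token blend
    released_only = 10 * 10**12  # upper end of the 8-10T released estimate
    print("iters for full blend:   ", train_iters_for_tokens(full_blend, gbs, seq_len))
    print("iters for released only:", train_iters_for_tokens(released_only, gbs, seq_len))
```

Under these assumptions, keeping the full-blend iteration count while training only on ~10T released tokens would mean cycling through that smaller set about 2.5 times (25T / 10T).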
K2 Think V2 is another fully open model like Olmo, with full datasets released.
Note that the Nemotron models are generally stronger than Olmo and K2 Think V2 (according to Artificial Analysis benchmarks), and there is a lot of overlap in their datasets: many are built from the same sources with different filtering, and both Olmo and K2 Think V2 have used some Nemotron datasets.
Check out OpenThoughts. It has a widely used dataset, a model that beats DeepSeek's smaller reasoning models, and a paper that describes its data curation methodology in detail.
[1] https://github.com/huggingface/open-r1/blob/1416fa0cf21595d2...
https://github.com/allenai/OLMo
https://github.com/NVIDIA-NeMo/Nemotron
Nemotron only releases portions of some of their datasets, like the source code dataset that they pretrain on.
For example, from https://docs.nvidia.com/nemotron/latest/nemotron/super3/pret... :
https://www.open-thoughts.ai/