
Students at UC Berkeley turned an open-source large language model (LLM) without reasoning capabilities into a “reasoning” one. The change dramatically improved the model's performance and cost only around $450 in compute. However, the process relied on knowledge borrowed from a much stronger LLM.
NovaSky, a student-led initiative at UC Berkeley’s Sky Computing Lab, claims its improved open-source reasoning model is on par with o1-preview, OpenAI’s reasoning-capable model.
The students fine-tuned Qwen2.5-32B-Instruct, a popular open-source model developed by Alibaba Cloud, and openly shared their code.
“It is possible to replicate high-level reasoning capabilities affordably and efficiently,” the team said in its post.
Training took only 19 hours on 8 H100 GPUs, roughly 152 GPU-hours, which works out to around $450 at Lambda Cloud pricing of about $3 per GPU-hour.
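For readers curious what such a fine-tuning run looks like in code, below is a minimal sketch of supervised fine-tuning with Hugging Face’s TRL library. The dataset file name, hyperparameters, and training setup here are illustrative assumptions, not NovaSky’s published recipe, which is available in the team’s repository.

```python
# Minimal sketch of supervised fine-tuning (SFT) on distilled reasoning data.
# The dataset path and hyperparameters are illustrative placeholders, not
# NovaSky's actual training configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each record holds a chat-style "messages" list (user prompt plus the
# teacher model's long reasoning answer); recent TRL versions apply the
# model's chat template to this format automatically.
train_data = load_dataset("json", data_files="distilled_traces.jsonl", split="train")

config = SFTConfig(
    output_dir="sky-t1-sft",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the base model being fine-tuned
    train_dataset=train_data,
    args=config,
)
trainer.train()
```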
The released ‘reasoning’ model is called Sky-T1-32B-Preview. The team says it significantly improves scores on various math, coding, and science benchmarks over the base model, often by double-digit percentage points.
For example, the team’s model achieved 56.8% on the GPQA-Diamond benchmark, 43.3% on AIME 2024, and 86.4% on MATH500, compared with the base Qwen model’s 45.5%, 16.7%, and 76.2%, respectively.

However, the model was trained on knowledge borrowed from an existing, more capable model.
The team used Alibaba’s reasoning model, QwQ-32B-Preview, to generate training data. QwQ already performs on par with o1-preview.
The team said it curated the data mixture to cover diverse domains and then rewrote it with another AI model to improve quality and make it easier to parse. Still, Sky-T1-32B-Preview did not beat QwQ, the model it borrowed training data from.
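The data-generation step is essentially distillation: prompt a stronger “teacher” model and save its answers as training examples. Below is a minimal sketch of that idea, assuming QwQ-32B-Preview is served behind an OpenAI-compatible API (for example, a local vLLM server). The endpoint URL, prompt list, and output file are hypothetical placeholders, and the team’s actual curation and rewriting pipeline is more involved.

```python
# Sketch of distillation data generation from a teacher reasoning model.
# Assumes QwQ-32B-Preview is served behind an OpenAI-compatible endpoint;
# the URL, prompts, and output path below are illustrative only.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompts = [
    "Prove that the sum of two even integers is even.",
    "Write a Python function that returns the n-th Fibonacci number.",
]

with open("distilled_traces.jsonl", "w") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="Qwen/QwQ-32B-Preview",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=4096,
        )
        answer = response.choices[0].message.content
        # Each line becomes one supervised fine-tuning example in chat format.
        f.write(json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer},
            ]
        }) + "\n")
```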
“Sky-T1-32B-Preview marks the start of our journey to develop open-sourced models with advanced reasoning capabilities,” the NovaSky team wrote.
The team collaborated with the Qwen and Still-2 Teams and received compute support from Lambda Labs and Anyscale.
Moving forward, the team plans to focus on developing “more efficient models that maintain strong reasoning performance and exploring advanced techniques that further enhance the models’ efficiency and accuracy at test time.”