Meta has introduced the next generation of its open-source large language models (LLMs) under the name Llama 3. “We believe these are the best open source models of their class, period,” Meta said.
The new models come in two sizes: the smaller has eight billion parameters, and the larger has 70 billion. The company said it is still training a 400-billion-parameter model. For comparison, some estimates put OpenAI’s GPT-4 at around 1.76 trillion parameters.
However, Meta believes that all of its new models punch above their weight class.
“This next generation of Llama demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning,” Meta said in a blog post.
The comparison Meta provided shows the 70B model trading blows with Google’s Gemini Pro 1.5 and Anthropic’s Claude 3 Sonnet. Claude 3 Sonnet had previously been shown to outperform OpenAI’s GPT-3.5.
“Thanks to improvements in pre-training and post-training, our pre-trained and instruction-fine-tuned models are the best models existing today at the 8B and 70B parameter scale,” Meta claims.
Meta tested its models with industry benchmarks and human evaluation across 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization.
Meta says it pre-trained Llama 3 on over 15T tokens, all collected “from publicly available sources.”
“Our training dataset is seven times larger than that used for Llama 2, and it includes four times more code. To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pre-training dataset consists of high-quality non-English data that covers over 30 languages. However, we do not expect the same level of performance in these languages as in English,” the post reads.
Llama 3 will be offered on all major platforms, including cloud providers, model API providers, and “everywhere” else. Llama 3 is currently available on Amazon SageMaker. The models will soon be available on Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm, Meta said.
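For readers who want to try the smaller model once they have access to it on Hugging Face, a minimal sketch using the transformers library might look like the following. The repo id meta-llama/Meta-Llama-3-8B-Instruct and the gated-access flow are assumptions for illustration, not details from Meta’s announcement.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed gated repo id; requires accepting Meta's license and logging in with a Hugging Face token.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so the 8B model fits on a single modern GPU
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize what Llama 3 is in one sentence."},
]

# Build the prompt with the model's chat template and generate a short reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```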
“Over the coming months, we’ll release multiple models with new capabilities, including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities. We will also publish a detailed research paper once we are done training Llama 3.”
In the sneak peek Meta provided of the upcoming larger model, which is still training, the 400B+ parameter Llama 3 already posts scores comparable to GPT-4 and Claude 3 Opus, the leading LLMs, at least on the benchmarks Meta selected.
The powerful new open-source models are making waves on Hacker News, where the announcement has drawn almost 800 comments and nearly 2,000 upvotes.
Andrej Karpathy, a widely recognized computer scientist and former director of AI at Tesla, praised Llama 3 as being very capable. He noted that the smallest Llama 3, the 8B, is “somewhere in the territory of Llama 2 70B, depending on where you look.”
“Super welcome, Llama 3 is a very capable-looking model release from Meta. Sticking to fundamentals, spending a lot of quality time on solid systems and data work, exploring the limits of long-training models. Also very excited for the 400B model, which could be the first GPT-4 grade open-source release. I think many people will ask for more context length,” Karpathy posted on X.