The AI wars between tech giants have seen them compete to build ever-larger language models, but an unexpected new trend has emerged: small is the new big. As progress in large language models (LLMs) shows signs of plateauing, researchers and developers are increasingly turning their attention to small language models (SLMs). These compact, efficient and highly adaptable models challenge the notion that bigger is always better, and they promise to change how we approach AI development altogether.
Are LLMs starting to plateau? Recent performance comparisons published by Vellum and HuggingFace suggest that the performance gap between top LLMs is rapidly narrowing, especially on multiple-choice questions, reasoning tasks and math problems. On multiple-choice questions, top models such as Claude 3 Opus, GPT-4 and Gemini Ultra all scored above 83% accuracy, while on reasoning tasks their accuracy exceeded 92%.
Smaller models such as Mixtral 8x7B and Llama 2 70B have also demonstrated promising results in certain areas, such as reasoning and multiple-choice questions, where they outperformed some of their larger counterparts. This suggests that size alone may not determine performance; architecture, training data and fine-tuning techniques may play an integral part.
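One family of fine-tuning techniques often credited with letting smaller models punch above their weight is parameter-efficient fine-tuning, such as LoRA. The sketch below is a minimal illustration, not something drawn from the comparisons above: the checkpoint name, adapter rank and target modules are all assumptions, and it relies on the HuggingFace transformers and peft libraries.

```python
# Minimal sketch: parameter-efficient fine-tuning (LoRA) of a smaller model.
# All settings here are illustrative assumptions, not configurations reported
# by Vellum, HuggingFace or the benchmarks cited above.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "mistralai/Mistral-7B-v0.1"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA trains small low-rank adapter matrices instead of the full weights,
# which is one reason compact, specialized models can be adapted cheaply.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

A standard training loop or the Trainer API could then fine-tune only the adapter weights on a narrow task, leaving the base model untouched.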
According to Gary Marcus, former head of Uber AI and author of Rebooting AI, a book on building trustworthy AI systems, the research papers announcing new LLMs all point in the same direction. Marcus spoke with VentureBeat on Thursday.
“Some are slightly better than GPT-4, but there’s no quantum leap. Most would consider GPT-4 an improvement over GPT-3.5; however, no such leap has taken place for over a year,” according to Marcus.
As the performance gap continues to narrow and more models post competitive results, the question arises of whether LLMs have indeed begun to plateau. If the trend holds, it could have significant implications for how these models are developed and deployed in the coming years, with the focus potentially shifting away from simply increasing model size and towards more efficient, specialized architectures.
Lessons Learned from the LLM Approach
LLMs may be powerful tools, but they come with significant drawbacks. Training an LLM requires enormous datasets and involves billions, or even trillions, of parameters. This makes the process extremely resource-intensive: the computational power and energy consumption required are immense, and the resulting costs put core LLM development out of reach for smaller organizations and individuals. At an MIT event last year, OpenAI CEO Sam Altman said that training GPT-4 cost more than $100 million.
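A back-of-envelope estimate shows why training costs reach that order of magnitude. The sketch below uses the common approximation of roughly 6 × parameters × tokens for training FLOPs; every input (parameter count, token count, per-GPU throughput, utilization and hourly price) is an assumption chosen for illustration, not a disclosed detail of GPT-4.

```python
# Back-of-envelope training cost estimate.
# All inputs are illustrative assumptions, not disclosed GPT-4 figures.
params = 1.0e12          # assumed parameter count: 1 trillion
tokens = 1.0e13          # assumed training tokens: 10 trillion

# Common approximation: training compute ~ 6 * N * D floating-point operations.
train_flops = 6 * params * tokens

peak_flops_per_gpu = 312e12   # assumed A100 bf16 peak throughput (FLOP/s)
utilization = 0.4             # assumed effective hardware utilization
gpu_price_per_hour = 2.0      # assumed cloud price in USD per GPU-hour

gpu_seconds = train_flops / (peak_flops_per_gpu * utilization)
gpu_hours = gpu_seconds / 3600
cost = gpu_hours * gpu_price_per_hour

print(f"Training FLOPs: {train_flops:.1e}")
print(f"GPU-hours:      {gpu_hours:,.0f}")
print(f"Cost (USD):     ${cost:,.0f}")  # lands in the hundreds of millions
```

Even with generous assumptions, the estimate lands in the hundreds of millions of dollars, consistent with the scale Altman described.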
The complex tools and techniques required to work with LLMs also impose a steep learning curve on developers, further restricting accessibility. Developers typically face long cycle times from training through building and deploying models, which slows development and experimentation considerably; according to a paper from Cambridge University, companies can spend 90 days or longer deploying a single machine learning model.