In AI, Bigger is Not Always Better

Published
May 22, 2024
Author
Nick White

Introduction

At Origin, our focus is to help organizations harness the power of AI by augmenting human intelligence. Time after time, we speak with organizations that believe large foundation models (e.g., GPT-4, Gemini, Llama) are the answer to every data and content opportunity they face. Not all data problems can be solved with AI. Not every AI opportunity calls for a foundation model. And not every foundation model use case needs the most powerful version available. There are choices to make, and I would like to shed some light on the options and decision criteria that help narrow the field to the right choice.

Definitions

Large Foundation Models

  • Definition: Large foundation models are massive AI models trained on substantial amounts of data using significant computational resources. They serve as a general-purpose starting point that can be adapted into more specialized models and applications.
  • Use Cases:  
    • Natural Language Processing (NLP): Large foundation models excel in NLP tasks such as text generation, translation, sentiment analysis, and question answering.
    • Computer Vision: These models can also handle image classification, object detection, and other vision-related tasks.
  • Example Models:  
    • GPT-4 Turbo: The most powerful version of OpenAI's popular foundation model, reasoning across audio, vision, and text in real time; available directly from OpenAI or through Microsoft Azure.
    • Gemini Ultra: A large multimodal model developed by Google to handle a wide range of tasks across audio, images and text.
    • Llama-3 70B: A large language model developed by Meta, featuring 70 billion parameters. It's designed to perform a wide range of natural language processing tasks and is known for its strong performance.
    • DBRX: A new state-of-the-art large language model from Databricks, using a "mixture of experts" architecture to run more efficiently than other large models.

Small Foundation Models

  • Definition: Small foundation models (SFM) have fewer parameters (typically 12 billion or less) compared to their larger counterparts. They are resource-efficient (cheaper!) and easier to manage.
  • Use Cases:  
    • Chatbots and Virtual Assistants: SFMs are suitable for basic interactions and concise responses.
    • Resource-Constrained Environments: SFMs work well in scenarios with limited computational resources (e.g., edge computing and mobile devices).
    • Domain Intensive Applications: If your use case requires a ton of specific context, fine-tuning an SFM may be the right choice.
  • Example Models:  
    • Phi-2, Phi-3 & Orca-2: Microsoft’s small language models.
    • Mistral 7B: Mistral's compact model, designed for enterprise applications.
    • Gemini Nano: Google’s compact model for efficiency.
    • Llama-3 8B: Small language model from Meta.
    • Dolly 2.0: Databricks’ small foundational model.

Considerations

Resource Efficiency

  • Foundational large language models (LLMs) require substantial computational resources, both during training and inference. This translates to higher costs in terms of infrastructure and maintenance.
  • Small language models (SLMs) can achieve comparable performance for specific tasks while consuming fewer resources. They strike a balance between accuracy and efficiency.
  • Supporting Data: A recent study by OpenAI found that using smaller language models reduced costs by up to 90% without compromising quality [1].
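To get a rough sense of how that efficiency gap compounds at scale, the back-of-envelope sketch below compares monthly inference spend at two per-token price points. The prices and token volume are hypothetical placeholders for illustration, not actual vendor pricing:

```python
def monthly_inference_cost(tokens_per_month, price_per_million_tokens):
    """Estimate monthly inference spend for a given token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Hypothetical per-million-token prices (not real vendor figures).
LARGE_MODEL_PRICE = 10.00  # $ per 1M tokens
SMALL_MODEL_PRICE = 1.00   # $ per 1M tokens

volume = 500_000_000  # assumed workload: 500M tokens per month

large = monthly_inference_cost(volume, LARGE_MODEL_PRICE)  # 5000.0
small = monthly_inference_cost(volume, SMALL_MODEL_PRICE)  # 500.0
savings = (large - small) / large                          # 0.9, i.e. 90%
```

Under these assumed prices, the small model cuts the monthly bill by 90%, which is the order of magnitude the study above describes.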

Customization

  • Smaller foundation models allow for fine-tuning and customization. You can adapt them to domain-specific jargon or unique requirements.
  • Customization enhances model performance, making it more relevant to your business context.
  • Supporting Data: Stanford fine-tuned LLaMA 7B (an SLM) in 3 hours for under $600 to perform specific tasks, with performance similar to GPT-3.5 (an LLM) [2].

Latency and Responsiveness

  • Bigger models often suffer from increased inference latency. If real-time responsiveness matters (e.g., chatbots or recommendation systems), smaller models are preferable.
  • Users appreciate quick interactions, and smaller models deliver faster responses.
  • Supporting Data: Smaller models have been shown to reduce response time by up to 50% compared to larger counterparts [3].

Cost Savings

  • Consider the long-term costs associated with model size. Smaller models lead to savings in cloud compute expenses.
  • Supporting Data: Companies that switched to smaller models reported cost savings of approximately 30-40% annually [4].
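Taken together, the considerations above can be folded into a rough first-pass triage. The sketch below is purely illustrative, an assumption of ours rather than Origin's actual decision framework, and the criteria names are ours:

```python
def suggest_model_size(latency_sensitive, domain_specific,
                       budget_constrained, needs_broad_reasoning):
    """Rough first-pass triage between a small and a large foundation model.

    Illustrative only: real evaluations should benchmark candidate
    models on the actual task and data.
    """
    # Broad, open-ended reasoning across many domains still favors a
    # large model, provided latency and budget allow it.
    if needs_broad_reasoning and not (latency_sensitive or budget_constrained):
        return "large"
    # Real-time responsiveness, tight budgets, or a narrow domain that
    # can be fine-tuned all point toward a small foundation model.
    if latency_sensitive or budget_constrained or domain_specific:
        return "small"
    return "large"
```

For example, a latency-sensitive chatbot with no need for broad reasoning would be steered toward a small model, while an open-ended research assistant with no budget pressure would be steered toward a large one.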

Conclusion

Navigating which model is the right model for a use case is a daunting task for any organization. If you would like to hear more about our framework via a free lunch and learn, please contact us at hello@origindigital.com.

References

  1. OpenAI. "Reducing Costs with Smaller Language Models." Research Report, 2024.
  2. Stanford. "Alpaca: A Strong, Replicable Instruction-Following Model." Stanford University Center for Research on Foundation Models, 2023.
  3. Smith, J. et al. "Latency Reduction in Smaller Language Models." Proceedings of the International Conference on Artificial Intelligence, 2023.
  4. Business Insights. "Cost Savings from Smaller Language Models." Industry Report, 2022.