Small Language Models

In this blog, we will explore: 

  • What Is a Small Language Model?
  • What Are Parameters?
  • How Small Language Models Are Used
  • Key Benefits of Small Language Models
  • Limitations of Small Language Models
  • Why Are Large Language Models So Expensive to Build and Maintain?
  • Why Is Prompt Engineering Important for Large Language Models?
  • Practical Use Cases of Small Language Models
  • Hardware Requirements for Local Models
  • The Future of Small Language Models
  • Which Model Should You Use?
  • Conclusion
  • FAQs about Large vs Small Language Models

What Is a Small Language Model?

It is important to know that the definition of a small language model keeps changing. As large language models get bigger, the goalposts for what counts as "small" keep moving, and there is no single definition that everyone adheres to. Something that might have been considered a large language model two years ago might be called a small model now.

Previously, a small language model was generally considered to be one with at most a few hundred million parameters, but that threshold is shifting. As large language models like GPT-4 get bigger and bigger, models with several billion parameters are now being called small by comparison. In practice, the dividing line is judged by parameter count.


What Are Parameters?

The key difference between large and small language models is parameters. A parameter in a language model is a numeric value, a weight, that the model uses to make predictions. Each parameter can change or adapt based on the data the model is trained on. That is an important piece, because all of these models, and all of the parameters they contain, have to be trained.

That is why large language models are much more expensive to create, more expensive to maintain, and more difficult to train. They are so much more complex. A large language model can be asked about essentially anything, and someone who knows how to work with it can probably get a pretty decent answer. That is not necessarily the case with small models, which are generally trained on specific categories of work or specific types of outcomes.
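Since everything hinges on parameter counts, it can help to see where those numbers come from. Below is a rough, back-of-the-envelope sketch; the layer sizes are invented for illustration and the formula is a simplification, not the exact architecture of any real model:

```python
# A parameter is a learned number (a weight or bias). A single dense layer
# mapping d_in inputs to d_out outputs has d_in * d_out weights + d_out biases.

def dense_params(d_in: int, d_out: int) -> int:
    """Parameter count of one fully connected layer (weights + biases)."""
    return d_in * d_out + d_out

def toy_transformer_params(vocab: int, d_model: int, layers: int) -> int:
    """Very rough parameter count for a simplified transformer:
    token embeddings, plus per-layer attention (4 projections) and a
    feed-forward block that expands to 4 * d_model."""
    embeddings = vocab * d_model
    attention = 4 * dense_params(d_model, d_model)  # Q, K, V, output projections
    feed_forward = dense_params(d_model, 4 * d_model) + dense_params(4 * d_model, d_model)
    return embeddings + layers * (attention + feed_forward)

if __name__ == "__main__":
    # A toy configuration -- tiny compared to any real model.
    print(toy_transformer_params(vocab=32_000, d_model=512, layers=8))
    # → 41586688 (~42 million parameters)
```

Scaling `d_model` into the tens of thousands and `layers` into the hundreds is, loosely speaking, how counts climb into the billions and trillions.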

Small Language Models vs Large Language Models

Large language models have billions to trillions of parameters, allowing them to handle complex tasks of all varieties. GPT-4 and Gemini Ultra are examples of very large models that can perform many kinds of tasks and cover an enormous breadth of knowledge. Small language models have fewer parameters, making them more efficient, and they are built for specific tasks or for use locally on devices with more limited resources.

Large language models are the behemoths that can literally do anything and everything. Most people use large language models like ChatGPT, Google Gemini, and Anthropic Claude more than they use small models. Small models are more for specific tasks. They are not something that gives the best results when being asked a hundred different questions from a hundred different walks of life. They are fine-tuned for specific tasks.

Some examples make the difference clearer. Large language models include GPT-4 from OpenAI, reportedly about 1.8 trillion parameters, and Gemini Ultra from Google, reportedly around 1.5 to 1.6 trillion parameters. Small language models include Phi-2 from Microsoft at about 2.7 billion parameters and Meta's LLaMA family, whose smallest variant has about 7 billion parameters. These are some big names in the space: OpenAI, Google, Meta, and Microsoft. There are smaller, more focused models like Phi-2 and LLaMA, and then there are the big ones like GPT-4 and Gemini Ultra.

How Small Language Models Are Used

A small language model may be trained or built specifically for a type of customer service to handle inquiries from customers. It might be fine-tuned for that specific use case. If a small model is built specifically and tailored and fine-tuned for customer service to respond to customer inquiries, it is not going to be used to code, develop a website, or spit out images. Large language models do those things. That is the multimodality of large language models: being able to input photos as prompts, output different types of code, and work across multiple languages. Small language models are often not like that. They are built for one very specific purpose or for smaller purposes.

A small language model might excel at creative writing, but that does not mean it is well suited to outlining how the stock market has changed over the last 30 years. Different use cases, different types of training, and different parameter counts mean a complete difference in what a large language model and a small language model should be used for.

Key Benefits of Small Language Models

There are some key things to know about small language models.

 Lower Computational Requirements

Small language models require less computational power, making them more accessible for users with limited hardware resources. That is one of the most important things. These small models can live locally on devices. The new Samsung phones ship with Gemini Nano, which is technically a small language model, and it lives on the device's own hardware. That requires far less compute.

Large language models are extremely resource heavy. Every query carries a compute cost, and at scale that adds up to a real environmental toll. Small language models require far less, because they can live locally. They are not sending every query off to be computed in the cloud, which can be very expensive and resource heavy.

Faster Training and Inference

Small language models are faster at training and inference due to their smaller size compared to large language models. They are faster because there are way fewer parameters, especially when they are being used for what they are good at. They are on-device, and they have fewer parameters to look through.
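As a rough illustration of why fewer parameters mean faster inference, a common rule of thumb is that generating one token costs about 2 floating-point operations per parameter. This is a simplification that ignores details such as mixture-of-experts routing, and the GPT-4 figure is only the reported estimate cited above:

```python
# Rule-of-thumb estimate: generating one token costs roughly 2 FLOPs per
# model parameter (one multiply + one add per weight). A simplification,
# but it shows why fewer parameters mean faster, cheaper inference.

def flops_per_token(params: float) -> float:
    return 2 * params

phi2 = flops_per_token(2.7e9)   # Phi-2, ~2.7B parameters
gpt4 = flops_per_token(1.8e12)  # GPT-4, reportedly ~1.8T parameters

print(f"Phi-2: {phi2:.1e} FLOPs/token")
print(f"GPT-4: {gpt4:.1e} FLOPs/token")
print(f"GPT-4 needs ~{gpt4 / phi2:.0f}x more compute per token")  # ~667x
```

By this estimate, a phone-class chip that keeps up with a small model would need hundreds of times more compute to serve the large one at the same speed.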

Greater Energy Efficiency

They are also more energy efficient, reducing the carbon footprint associated with the training and running of AI models. It is not just the running of large language models that is expensive, it is the ongoing training. There is so much compute required. That is why there is such a race for GPUs and new classes of chips. All generative AI models run off these very hard-to-get, very expensive GPU chips. Computing power is scarce, expensive, and resource heavy, taking a toll on the environment. Small models are important to keep an eye on in that regard.

Ability to Run Locally on Devices

Small language models can be deployed on mobile devices and embedded systems, unlike most large language models. That is edge AI or edge computing, bringing these language models locally to small devices. This is already being seen in phones, such as the Samsung S24 with Gemini Nano. Apple is also expected to announce a generative AI offering, and it is presumed that some sort of small language model may appear in a future iPhone, MacBook, or iMac.

This is also being seen with Nvidia’s Chat with RTX. It looks like a pretty solid small language model setup that can run locally. A compatible Nvidia RTX GPU is required to run it, but it is part of a big shift.

Privacy 

One of the biggest advantages is privacy. That is one of the biggest things people are concerned about with large language models. Not just data sharing, but training data: how companies are using any data uploaded into their systems to train their models. Smaller models that run locally are not sending information back and forth. It is the concept of running generative AI locally on a device with much more privacy and much more security.

Limitations of Small Language Models

Small language models are well suited to real-time applications such as on-device language processing, where quick responses are crucial. The trade-off is that they have a lower capacity for understanding complex language nuances compared to large language models.

There is really not much in the world that cannot be done with a large language model. Large language models can translate languages, build advanced web applications, and even technically help build generative AI with generative AI. They are extremely complex and are only going to get more powerful and more robust as new models are released.

Large language models are going to get even more powerful and even more robust, with better reasoning, better rationale, and more multimodal capabilities. On benchmarks like MMLU, the current versions of GPT-4 and presumably Gemini Ultra score far above the average human baseline. For the most part, a large language model covers more breadth of knowledge than any one human. That is a big difference.

Why Are Large Language Models So Expensive to Build and Maintain?

Small language models are often used in applications where speed and efficiency are more critical than deep language understanding. They can also be fine-tuned more quickly and cheaply for specific tasks versus large language models.

Large language models are extremely expensive to create, train, and maintain. Comparing the two is like comparing an ocean liner to a jet ski. A jet ski cannot be used for everything, but for a specific task, a jet ski is often much better than a huge cruise ship. Different applications require different vessels.

Small language models are much easier to maintain and update due to their simpler architecture. They can also be more easily integrated into software and web applications without needing extensive infrastructure.

Early on, many web applications and pieces of software jumped on OpenAI models because the API was good and has been getting cheaper and faster. But there may be a shift where software and web applications instead use small language models.

For example, if a large company wants to build its own model for customer support, it might not need something as big as Gemini Ultra or GPT-4. It might be better off with a model like Mistral or LLaMA, something more limited and more fine-tuned.

Why Is Prompt Engineering Important for Large Language Models?

Large language models are often misunderstood. So many prompts shared online do not really work, because that is not how large language models work.

If someone tells GPT-4, “you’re a copywriter with 20 years of experience,” that means nothing to a large language model. It has gobbled up all of the information on the open web, closed web, works of art, and essentially the history of humankind in its data set and trillions of parameters. It has also gobbled up bad information.

That is why copy-and-paste prompts do not necessarily get great outputs from large language models. But a small language model specifically trained for copywriting or creative writing might give better outputs in that domain.

This is why many individuals and businesses early on wrote off powerful technologies such as ChatGPT, GPT-4, and Gemini Ultra. They put one big prompt in and it was not fantastic. That is because it is a large language model with trillions of parameters. It is too big and not fine-tuned for a very specific task.

If someone is working with large language models, they need to understand the basics of prompt engineering. They need to essentially train the chat they are working with. Most people are using large language models incorrectly. They are using them like they are small language models, and that is not how it works.

Practical Use Cases of Small Language Models

Small language models offer a balance between performance and resource usage and are ideal for many practical applications. Good examples are chatbots, search engines, and voice assistants. Large language models, by contrast, are the advanced, general-purpose option reached for across every kind of task.

Downloadable and Local AI

Small language models can be used either through cloud-based services or by downloading them. That is a big thing to keep in mind. A large language model cannot really be downloaded in full onto one physical device. There are people who have forked them and created smaller versions, but for the most part, it is small language models that can be both downloaded and run in the cloud.

There are great resources for working with and downloading small language models, and they can be run locally on a machine. It does not require being a tech expert to experiment with, download, and install small language models. That is what they are for: on-device use for very specific use cases.

Hardware Requirements for Local Models

For tinkering with local models, newer devices are needed. Laptops introduced in the last three to six months are more likely to have the GPUs and processing power necessary to really leverage local small language models.

Microsoft has released a new version of the Surface laptop that can run models locally, and there are also phones that can run small models locally. Newer devices with newer GPUs are going to be needed.

The Future of Small Language Models

The future of small language models is still not 100% certain, but there are strong signs. Large language models are discussed every day, but the future of small language models may rely heavily on the successes or failures of the first few large-scale commercial rollouts.

There are already a handful of highly visible small language models. Meta’s LLaMA models are very popular for people to run locally. Gemini Nano is another example. Nvidia’s Chat with RTX is another.

It is also important to consider the power of RAG, retrieval-augmented generation, when combined with small language models. That means bringing in a personal database of information and combining it with small language models.

That can bypass many of the security and privacy concerns. A small model that works on a device is faster, more efficient, and cheaper, and when it can bring in private data and work with it in a secure fashion, the future becomes extremely promising.
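A minimal sketch of that RAG idea, using simple word overlap in place of real vector embeddings; the documents and helper names here are invented for illustration:

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve the most
# relevant local document for a query, then prepend it to the prompt a
# small on-device model would receive. Real systems use vector embeddings;
# crude word overlap stands in for them here.

def score(query: str, doc: str) -> float:
    """Crude relevance score: fraction of query words found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document that best matches the query."""
    return max(docs, key=lambda doc: score(query, doc))

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user's question with retrieved private context."""
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}"

# A private, on-device "database" that never leaves the machine.
private_docs = [
    "Refunds are processed within 5 business days of approval.",
    "Our office is open Monday through Friday, 9am to 5pm.",
]

print(build_prompt("How long do refunds take?", private_docs))
```

The key point is that both the retrieval step and the model inference can happen on the device, so the private documents are never sent to a cloud service.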

Large language models are kind of like the Trojan horse. They have infiltrated daily life, shown how powerful they are, and led hundreds of millions of people to use them every day. But over the last 18 months, the downside has also become visible, and people are becoming more cognizant of privacy and trust.

Small language models are still in a wait-and-see phase, but they are gaining popularity. Gemini Nano in the new S24, Meta’s open-source local models, and Apple’s likely move toward edge AI all point toward a future where working with small language models is going to be increasingly important.

Which Model Should You Use?

For everyday use like emails and productivity, ChatGPT is still a strong choice, especially for small teams and businesses not working heavily with confidential documents. If there were more confidential documents involved, then small language models might be worth looking at more seriously.

OpenAI’s ecosystem still offers a lot of flexibility through plugins, plugin packs, GPT mentions, and GPTs from the GPT Store. That flexibility is not really matched by other large language models at the moment.

Conclusion

Small language models and large language models are very different. The differences come down to parameters, use cases, infrastructure, privacy, speed, and scale. Small language models are faster, cheaper, more efficient, more private, and more focused. Large language models are broader, more powerful, more complex, and more capable across a wide variety of tasks.

FAQs about Large vs Small Language Models

1. Why does the definition of Small Language Models keep on changing?

It is important to know that the definition of a small language model keeps changing. As large language models get bigger, the goalposts for what counts as "small" keep moving, and there is no single definition that everyone adheres to. Something that might have been considered a large language model two years ago might be called a small model now.

2. Why do copy-and-paste prompts not necessarily get great outputs from large language models?

If someone tells GPT-4, “you’re a copywriter with 20 years of experience,” that means nothing to a large language model. It has gobbled up all of the information on the open web, closed web, works of art, and essentially the history of humankind in its data set and trillions of parameters. It has also gobbled up bad information.

That is why copy-and-paste prompts do not necessarily get great outputs from large language models. But a small language model specifically trained for copywriting or creative writing might give better outputs in that domain.

3. What is the main reason behind running Gen AI locally?

Smaller models that run locally are not sending information back and forth. It is the concept of running generative AI locally on a device with much more privacy and much more security.

4. What is the key difference between Large and Small Language Models?

The key difference between large and small language models is parameters. A parameter in a language model is a numeric value, a weight, that the model uses to make predictions. Each parameter can change or adapt based on the data the model is trained on. That is an important piece, because all of these models, and all of the parameters they contain, have to be trained.