The Rise of Small Language Models

Small but Mighty

In the world of GenAI, large language models (LLMs) attract most of the marketing and media attention. Yet consider our daily tasks. Most highly valued actions involve only a small subset of language constructions. After all, few of us go from writing a classical novel to programming to tax preparation within a single workday. Most office workdays require deep domain knowledge and a focus on a few specialized tasks. Even if our tasks revolve around general communication, as in sales, marketing, or coaching, the underlying thought process is often distinct and technically challenging, underscoring the need for specialized tools like small language models (SLMs).

Often overshadowed by their larger counterparts, small language models are crucial in assisting with domain-specific tasks. Their unique capabilities make them better suited for these tasks, a fact that deserves more attention and recognition. Let’s explore why SLMs should be part of your AI strategy.

What are Small Language Models?

Language models fall into two groups according to their number of parameters and, with it, the size of their training data. Large language models comprise billions of parameters, with training data ranging from the latest technical documents to Shakespeare’s works. Small language models involve far fewer parameters, often in the millions rather than billions. Their training data is, consequently, far more limited and focuses on a small domain.

While this restricts the SLM to answering domain-specific questions, it also ensures that the system doesn’t have to infer the topic area from the context. Thus, it can often answer specific questions more accurately than an LLM. For example, detailed architectural descriptions and optimizations would be critical when building an SLM to assist in construction. Yet, the same model would create a bland and boring read when writing a novel.

However, the narrower focus isn’t the only advantage. SLMs also benefit from reduced training and run-time costs. Because the training data is limited, an SLM can often be trained and optimized on an average consumer PC. This low cost is valuable for companies building models based on internal data. If you don’t want to share your data with a service provider or invest heavily in computational resources, an SLM might be the only choice for building a fully custom model.
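To make the cost difference more tangible, here is a rough back-of-the-envelope sketch of the memory needed just to store a model's weights. The two model sizes are illustrative assumptions, not figures for any particular model, and the function name is made up for this example.

```python
# Rough estimate of the memory needed to hold a model's weights.
# The parameter counts below are illustrative assumptions only.

# Bytes used per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


slm_params = 5_000_000        # hypothetical SLM with 5 million parameters
llm_params = 7_000_000_000    # hypothetical LLM with 7 billion parameters

print(f"SLM weights: {weight_memory_mb(slm_params):,.0f} MB")
print(f"LLM weights: {weight_memory_mb(llm_params):,.0f} MB")
```

Under these assumptions, the small model's weights fit in about 10 MB, while the large one needs about 14,000 MB. Training requires considerably more memory than this, since gradients and optimizer state must be stored as well, but the ratio between the two models gives a sense of scale.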

Combination and Focus

However, one area where SLMs can experience problems is with end-user interactions. Limited general knowledge can lead to difficulties in understanding and parsing inputs and outputs, especially when dealing with a diverse user base. For example, doctors have a particular way of communicating with each other about medical issues that is both concise and precise. Thus, they would interact well with an SLM with similar language patterns.

On the other hand, building a mentoring SLM would quickly run into issues, as mentees from all walks of life express themselves in very different ways.

A combination of LLM and SLM, or of human and SLM, is thus an option for diverse user groups. The LLM or human parses the input language and reduces it to a problem statement the SLM can understand. The SLM, in turn, applies its optimized domain-specific knowledge to the reduced input, and the LLM then expands the output into more natural language.

This combination allows developers to focus on domain-specific knowledge and harness the power of small language models without requiring users to deal with their restrictions. Likewise, it can speed up response times in human-driven conversations, for example, when a concierge uses AI to fulfill a customer’s request.
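The reduce, solve, and expand flow described above can be sketched in a few lines. This is only a minimal illustration: the class name, the function names, and the toy stand-ins are all hypothetical, and in a real system each stage would wrap an actual model call (or a human, for the first stage).

```python
from dataclasses import dataclass
from typing import Callable

# Each stage maps text to text. These are hypothetical stand-ins for
# real model calls, not the API of any particular library.
TextFn = Callable[[str], str]


@dataclass
class CombinedAssistant:
    reduce_input: TextFn   # LLM or human: free-form request -> domain terms
    solve: TextFn          # SLM: applies domain-specific knowledge
    expand_output: TextFn  # LLM: terse result -> natural language

    def answer(self, user_text: str) -> str:
        problem = self.reduce_input(user_text)
        result = self.solve(problem)
        return self.expand_output(result)


# Toy stand-ins so the sketch runs without any model.
def toy_reduce(text: str) -> str:
    return "BOOK_TABLE" if "table" in text.lower() else "UNKNOWN"


def toy_solve(problem: str) -> str:
    return {"BOOK_TABLE": "CONFIRMED"}.get(problem, "ESCALATE")


def toy_expand(result: str) -> str:
    return f"Status of your request: {result.lower()}."


assistant = CombinedAssistant(toy_reduce, toy_solve, toy_expand)
print(assistant.answer("Could you get us a table for tonight?"))
```

Keeping the three stages as separate, swappable functions mirrors the division of labor in the text: the domain logic lives entirely in the middle stage, so the SLM never has to cope with free-form user language.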

Think Small for Going Big

Small language models receive only a fraction of their larger cousins’ attention. The reason could be the impressiveness of large numbers or our human curiosity to give an LLM all kinds of input. Yet, the domain-specific power of SLMs should make them the primary choice for many applications. When building and evaluating AI strategies, we should ask whether a small language model is the better way to reach our goals. Their adaptability and low cost should put them at the forefront of assisting in many real-life tasks.
