When it comes to generative AI, the open source community has embraced Meta AI’s LLaMA (Large Language Model Meta AI), which was released in February. Meta made LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters), but at first access was restricted to approved researchers and organizations. However, after the weights were leaked online in early March, anyone could download them, and in practice the model became openly available, even though its license remained non-commercial.
Developers are attracted to Meta’s LLaMA because, unlike with GPT and other popular LLMs, LLaMA’s weights are available and can be fine-tuned. This allows devs to create more advanced and natural language interactions with users, in applications such as chatbots and virtual assistants. LLaMA isn’t that different from OpenAI’s GPT-3 model, except that Meta has shared the weights. The developers of the other major LLMs have not done that.
In the context of AI models, “weights” refers to the parameters learned by a model during the training process. These parameters are stored in a file and used during the inference or prediction phase. What Meta did, specifically, was release LLaMA’s model weights to the research community under a non-commercial license. Other powerful LLMs, such as GPT, are typically only accessible through limited APIs.
With GPT, you have to go through OpenAI’s API: you cannot download the model, run it on your own computer, or customize it in any meaningful way. In other words, LLaMA is much more adaptable for developers. This is potentially very disruptive to the current leaders in LLMs, such as OpenAI and Google.
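To make the idea of weights concrete, here is a minimal sketch in plain Python. It is a toy linear model, not LLaMA: the names `save_weights`, `load_weights`, and `predict`, the JSON file format, and the numbers are all assumptions chosen for illustration. It shows the same pattern at a tiny scale: parameters are learned once, stored in a file, and loaded again at inference time.

```python
import json
import os
import tempfile


def save_weights(weights, path):
    """Write the learned parameters to a file (JSON is used here for simplicity)."""
    with open(path, "w") as f:
        json.dump(weights, f)


def load_weights(path):
    """Read previously saved parameters back from disk."""
    with open(path) as f:
        return json.load(f)


def predict(weights, features):
    """Inference for a toy linear model: bias + sum(coefficient * feature)."""
    return weights["bias"] + sum(
        w * x for w, x in zip(weights["coefficients"], features)
    )


# Pretend these values came out of a training run (hypothetical numbers).
trained = {"coefficients": [0.5, -1.0], "bias": 2.0}

# Store the weights in a file, then load them as an inference process would.
path = os.path.join(tempfile.mkdtemp(), "weights.json")
save_weights(trained, path)
restored = load_weights(path)

prediction = predict(restored, [4.0, 1.0])
```

Having the file is what makes local hosting and fine-tuning possible. With an API-only model, only the equivalent of `predict` is exposed, and the weights never leave the provider.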
Finance and legal use cases are good candidates for fine-tuning and local hosting. Some larger companies may want to go beyond just fine-tuning and instead pre-train the entire model using their own data. Classification tasks are also popular so far — such as toxicity prediction, spam classification, and customer satisfaction ranking.
One of the tools developers can use to fine-tune LLaMA is LoRA (Low-Rank Adaptation of Large Language Models). Adapter methods are attractive because they train only a small set of added parameters while keeping the rest of the transformer frozen, which results in far fewer trainable parameters and faster training. LoRA is one type of adapter method: it uses a mathematical trick to represent the update to a large weight matrix as the product of two much smaller matrices, resulting in fewer parameters and more storage efficiency. In effect, this means you can do the fine-tuning much more quickly.
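The low-rank idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a real LoRA implementation: the helper names are hypothetical, and the layer size (4096 x 4096) and rank (8) are example values. The frozen weight matrix `W` is left untouched, and only two small matrices `A` (rank x inputs) and `B` (outputs x rank) would be trained, so the effective weight is `W + B·A`.

```python
def full_param_count(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters for B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W stays frozen, only A and B would train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Parameter comparison for one square layer (illustrative sizes).
full = full_param_count(4096, 4096)     # 16,777,216
lora = lora_param_count(4096, 4096, 8)  # 65,536

# Tiny forward pass: 2x2 frozen weights with a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # rank x d_in
B = [[1.0], [2.0]]  # d_out x rank
y = lora_forward(W, A, B, [1.0, 2.0])
```

With these example sizes, the adapter trains 256 times fewer parameters for that layer than a full update would. Only `A` and `B` need to be saved after fine-tuning, which is where the storage savings come from.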
Devs and Fine-Tuning
Understanding how to use language models will be a useful skill for developers, but they do not need to be in charge of fine-tuning the models at their company unless they have very specific needs. A small company without sensitive information can use a general tool like GPT, and a larger company will typically have a team member who is in charge of fine-tuning the models.
Conclusion
LLaMA does seem like a great option for developers wanting more flexibility in using large language models. While fine-tuning is becoming increasingly accessible, it is still a specialized skill that may not be necessary for every developer to learn. Regardless of whether or not they do the fine-tuning, developers increasingly need to understand how to use LLMs to improve certain tasks and workflows in their applications. So LLaMA is worth checking out, especially since it’s more open than GPT and other popular LLMs.
James Huang
June 10, 2022