Contemporary Large Language Models (LLMs) undergo pre-training on extensive self-supervised textual data and are then fine-tuned to align with human preferences using techniques like reinforcement learning from human feedback (RLHF).
The evolution of LLMs has been swift over the past several years, notably since the introduction of GPT (Generative Pre-trained Transformer) in 2018. Google’s BERT, also released in 2018, marked a substantial advancement in capability and architecture. OpenAI subsequently released GPT-3 in 2020 and GPT-4 this year.
At the same time, there has been a surge of open-source alternatives in recent months, even though the open-sourcing of AI models raises concerns about potential misuse, including spam generation and the dissemination of disinformation. Meta’s recently released Llama 2 is one such example.
Use Cases for LLMs
As we navigate the novelty of this technology, the capabilities of Large Language Models (LLMs) are undeniably impressive, offering a wide array of potential applications in the business domain. These applications span from serving as chatbots in customer support scenarios to generating code for developers and even business users. LLMs exhibit utility in audio transcription, summarization, and paraphrasing, as well as in translation and content creation.
Consider a scenario where customer meetings are transcribed and summarized in near real-time by a suitably trained LLM, with the results shared across sales, marketing, and product teams. Similarly, an organization’s web pages could be automatically translated into different languages, with the option for quick review and correction by human reviewers.
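To make the summarization step concrete, here is a minimal Python sketch using OpenAI’s pre-1.0 SDK; the transcript string, system prompt and model choice are all placeholders, and a production pipeline would first run a speech-to-text step and chunk long transcripts to fit the model’s context window.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; usually loaded from the environment

# Placeholder: in practice this would come from a speech-to-text step.
transcript = "..."

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Summarize this meeting transcript into key decisions, "
                    "action items and their owners."},
        {"role": "user", "content": transcript},
    ],
)
print(response["choices"][0]["message"]["content"])
```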
In the coding realm, several popular integrated development environments (IDEs) now support AI-powered code completion, with GitHub Copilot and Amazon CodeWhisperer leading the way. Additionally, LLMs show promise in applications such as natural language database querying and the generation of developer documentation from source code.
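Natural language database querying, for instance, typically works by handing the model a schema and asking it to draft SQL. The sketch below is illustrative only: the schema and question are invented, and any generated SQL should be reviewed (or run against a read-only replica) before being trusted.

```python
import openai

schema = """
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    total DECIMAL(10, 2),
    created_at DATE
);
"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": f"Translate the user's question into SQL for this schema:\n{schema}"},
        {"role": "user", "content": "What was the total order value per month this year?"},
    ],
)
print(response["choices"][0]["message"]["content"])  # candidate SQL, not yet validated
```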
In industries dealing with large volumes of unstructured data, LLMs prove particularly beneficial. Madhukar Kumar, CMO of SingleStore, a relational database company, cites the example of wealth management, where clients use LLMs to query vast amounts of unstructured data, such as legal documents stored as PDFs. These applications involve both deterministic and non-deterministic querying, allowing users to extract insights like the income statements of individuals aged 45 to 55 who recently left their jobs.
Large language models are also applied in sentiment analysis, helping organizations collect data to improve customer satisfaction and identify prevalent themes and trends in large text datasets for more informed decision-making.
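Sentiment analysis can be as simple as a classification prompt. A minimal sketch, with invented reviews and no validation of the model’s labels:

```python
import openai

reviews = [
    "Onboarding was painless and support replied within minutes.",
    "Billing double-charged me and nobody has responded for a week.",
]

for review in reviews:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the review as positive, "
                        "negative or neutral. Reply with the label only."},
            {"role": "user", "content": review},
        ],
    )
    print(review, "->", response["choices"][0]["message"]["content"])
```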
It’s crucial to note, however, that LLMs lack factual reliability and should be employed with human oversight in settings where accuracy is paramount. Training an LLM from scratch is a substantial undertaking, making it more practical to build upon existing models. As this field rapidly evolves, we’ve collaborated with Madhukar Kumar to highlight what we believe are the top five LLMs currently, offering a valuable starting point for those exploring potential uses for LLMs.
The Top 5 LLMs
1. GPT-4
At present, GPT-4 stands out as a leading model, and OpenAI has crafted a remarkable product centered around it, featuring an efficient ecosystem that facilitates the creation of plugins and the execution of code and functions. GPT-4 excels, especially in tasks related to text generation and summarization.
“If you look at GPT-4,” Kumar said, “it is a little bit more conservative but it is far more accurate than 3.5 was, particularly around code generation.”
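Much of that plugin ecosystem rests on function calling, where the model returns structured arguments for functions you define and your own code performs the actual execution. A minimal sketch follows; the get_account_status function and its schema are hypothetical.

```python
import json
import openai

functions = [{
    "name": "get_account_status",  # hypothetical function, for illustration
    "description": "Look up the status of a customer account.",
    "parameters": {
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Is account 42-A still active?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model does not run anything itself; it proposes a call for your code to make.
    args = json.loads(message["function_call"]["arguments"])
    print("Model requested:", message["function_call"]["name"], args)
```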
2. Claude 2
Anthropic’s Claude 2, introduced in July this year, is accessible through an API and a new public beta website, claude.ai.
Claude’s primary advantage lies in the size of its context window, recently expanded from 9K to 100K tokens. This is significantly larger than the maximum 32K tokens supported by GPT-4 at the time of this writing. The expanded context window equates to approximately 75,000 words, enabling businesses to submit extensive material for Claude to process.
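In practice, that means an entire report or contract can go into a single prompt. A minimal sketch using Anthropic’s Python SDK, with the file path and question as placeholders:

```python
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic(api_key="YOUR_API_KEY")  # placeholder

# Tens of thousands of words fit within the 100K-token window.
long_document = open("annual_report.txt").read()

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=1000,
    prompt=f"{HUMAN_PROMPT} Here is a document:\n\n{long_document}\n\n"
           f"List the three biggest risks it identifies.{AI_PROMPT}",
)
print(completion.completion)
```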
3. Llama 2
Llama 2, recently unveiled by Meta, marks the first open-source model on our list, although some industry observers challenge Meta’s designation of Llama 2 as “open source.”
It is freely available for both research and commercial purposes, with certain oddly specific restrictions in the license. For instance, if the technology is applied in an application or service with over 700 million monthly users, a special license is mandated from Meta. Additionally, the community agreement prohibits the use of Llama 2 for training other language models.
While open source offers advantages, particularly for research purposes, the substantial cost involved in training and fine-tuning models means that, at least for now, commercial Large Language Models (LLMs) generally exhibit superior performance.
As described in the Llama 2 whitepaper, “[C]losed product LLMs are heavily fine-tuned to align with human preferences, significantly enhancing their usability and safety. This process may involve substantial costs in compute and human annotation and is often not transparent or easily reproducible, limiting advancements in AI alignment research within the community.”
In February, Meta released LLaMA, the precursor to Llama 2, as source-available with a non-commercial license. The model weights subsequently leaked and gave rise to various fine-tuned derivatives, including Alpaca from Stanford University and Vicuna, developed by a team drawn from UC Berkeley, Carnegie Mellon University, Stanford and UC San Diego.
Although these models adopted a unique approach involving training with synthetic instructions and exhibited promise, the Llama 2 paper suggested that “they fall short of the bar set by their closed-source counterparts.”
Despite this, open-source models come at no cost. Therefore, while determining the utility of this technology in a specific use case, Llama 2 could serve as a valuable starting point.
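As a starting point, Llama 2 can be run locally through Hugging Face transformers, assuming you have accepted Meta’s license to access the gated checkpoints and have a GPU with enough memory (device_map="auto" relies on the accelerate package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated behind Meta's license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain vector embeddings in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```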
4. Orca
Orca, developed by Microsoft Research, stands out as the most experimental model in our selection. Its intriguing aspect lies in being a smaller open-source model that employs a distinct technique called progressive learning to self-train from large foundation models.
Essentially, Orca has the ability to learn from models such as GPT-4 through imitation, enhancing its own reasoning capabilities. This approach may suggest a potential avenue for open-source models to rival their closed-source counterparts in the future, making Orca a model worth monitoring.
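This is not Microsoft’s actual training pipeline, but the general imitation idea can be sketched as collecting step-by-step explanations from a teacher model and saving them as fine-tuning data for a smaller student; note that provider terms of service (and, as above, the Llama 2 license) may restrict using model outputs to train other models.

```python
import json
import openai

instructions = ["Why does ice float on water?"]  # placeholder task list

with open("imitation_data.jsonl", "w") as f:
    for instruction in instructions:
        response = openai.ChatCompletion.create(
            model="gpt-4",  # the "teacher" model
            messages=[
                {"role": "system", "content": "Think step by step and justify your answer."},
                {"role": "user", "content": instruction},
            ],
        )
        explanation = response["choices"][0]["message"]["content"]
        # Each record becomes a training example for the smaller "student" model.
        f.write(json.dumps({"instruction": instruction, "output": explanation}) + "\n")
```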
5. Cohere
Cohere, a commercial offering, is developed by a company co-founded by Aidan Gomez, a co-author of the influential transformer research paper titled ‘Attention Is All You Need.’ Positioned as a cloud-neutral vendor, Cohere is strategically aiming at enterprises, evident from its recent partnership announcement with McKinsey.
Picking an LLM
After creating a shortlist of LLMs and pinpointing a couple of low-risk use cases for experimentation, you can conduct multiple tests using different models to determine the most effective one, akin to evaluating an observability tool or a similar solution.
Additionally, it’s worth exploring the possibility of employing multiple LLMs simultaneously. According to Kumar, “I think that the future is not just picking one but an ensemble of LLMs that are good at different things.”
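A bare-bones comparison harness can be as simple as sending the same prompt to each candidate and inspecting the outputs side by side; the models and prompt below are examples, and a real evaluation would score responses systematically.

```python
import openai
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

prompt = "Summarize the key terms of this contract clause: ..."

gpt_answer = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)["choices"][0]["message"]["content"]

claude_answer = Anthropic().completions.create(
    model="claude-2",
    max_tokens_to_sample=500,
    prompt=f"{HUMAN_PROMPT} {prompt}{AI_PROMPT}",
).completion

for name, answer in [("gpt-4", gpt_answer), ("claude-2", claude_answer)]:
    print(f"--- {name} ---\n{answer}\n")
```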
However, all these considerations are only valuable if you have timely access to data. Kumar highlighted that contextual databases like SingleStore play a crucial role in harnessing the power of LLMs. He noted, “To truly use the power of LLMs, you need the ability to do both lexical and semantic search, manage structured and unstructured data, handle both metadata and the vectorized data, and handle all of that in milliseconds, as you are now sitting between the end user and the LLM’s response.”
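To illustrate the idea (and only the idea, not SingleStore’s implementation), the sketch below blends a naive keyword score with cosine similarity over embeddings; the documents, query and weighting are all invented.

```python
import numpy as np
import openai

docs = [
    "Client left employer in March; income statement attached.",
    "Quarterly portfolio rebalancing notes.",
]

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

doc_vecs = embed(docs)
query = "customers who recently left their jobs"
q_vec = embed([query])[0]

# Semantic signal: cosine similarity between query and document embeddings.
semantic = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
# Lexical signal: naive count of query words appearing in each document.
lexical = np.array([sum(w in d.lower() for w in query.lower().split()) for d in docs])

# Invented blend of the two signals; production systems tune this carefully.
scores = 0.8 * semantic + 0.2 * (lexical / max(lexical.max(), 1))
print(docs[int(scores.argmax())])
```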