Large language models (LLMs) are the newest in a series of rapid developments in the field of generative AI. They make it possible for machines to write, code, draw and create in ways that are starting to come close to human-level execution. ChatGPT is grabbing headlines, but it’s worth noting that it is only the latest in an already long line of innovations.
We want to take this opportunity to summarize some of the pros, cons and question marks surrounding this emerging technology. Frankly speaking, we are still in the early days and much is unclear. But with the huge influx of attention, predominantly generated by ChatGPT, companies want to know if this technology can bring value to their business, now or in the future.
Commercial use
Let’s start with some practical information. Commercially available LLMs are quite new. They are generally sold as a ‘Model as a Service’, accessible through an API. Some of today’s providers are OpenAI, Cohere, Goose.ai and AI21 Labs; BLOOM is an open-source alternative.
These foundation models, as they are also called, can serve as the basis for building applications. They can be fine-tuned with custom data in order to: 1) improve the model’s performance on a specific problem or task, and 2) decrease the model’s size and cost.
As things stand right now, a handful of bot platform providers have started to implement this technology in their existing solutions. Since we are in uncharted territory, it’s hard to tell how fast they will move and what that innovation will look like. That doesn’t mean your business can’t start experimenting with this technology, but caution is advised.
The pros of large language models
Let’s start with the positive case for large language models. There’s a lot of noise when it comes to their potential. At CDI Services we like to get to the bottom of things, so let’s break down what makes them so powerful and revolutionary.
- Natural Language Understanding (NLU)
In existing bot platforms, AI trainers need to provide the model with so-called training phrases. Getting an AI assistant to accurately distinguish between 200-300 intents and to handle context is considered a monumental task. Managing the cognition of any enterprise AI assistant is therefore hard work, and a lot of time and resources are spent on it. Compare that with large language models. Anyone who has interacted with ChatGPT can’t help but acknowledge that it has an astonishing ability to interpret language. It never says ‘I don’t get that’. While much of the attention around large language models goes to their ability to generate language, the leap in natural language understanding is what really blows existing bot platforms out of the water. Leveraging that power will reduce reliance on AI trainers and significantly improve customer experiences with AI assistants.
- Natural Language Generation (NLG)
Large language models are unmatched when it comes to computer-generated text. They can produce language that is compelling and grammatically correct, in any tone of voice you like. All of this is achieved by telling the model, in natural language, what you want the output to be. In the short term, LLMs can assist conversation designers in the design process, most likely resulting in increased productivity and higher-quality output. They can help you write training phrases and entities, write variations on responses, summarize knowledge articles or translate content. In the long term, it could very well be that we no longer need pre-written answers at all and generate everything in real time instead.
- Information retrieval
Another exceptional ability of LLMs is finding and extracting relevant information from other text sources, such as knowledge articles. This is also known as semantic search: you query text sources by meaning rather than by exact keywords, and receive precisely the piece of information you were looking for, in language that is easy to understand. It could be used to augment AI assistants or become a first step towards leveraging existing knowledge bases more efficiently. Performance can be enhanced even further through fine-tuning and prompt engineering. It will undoubtedly lead to the emergence of all kinds of hybrid solutions that combine the power of conversational interfaces with the information retrieval capabilities of LLMs.
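To make the retrieval step a little more concrete, here is a minimal sketch of ranking knowledge-base passages against a customer question. It is an illustration under loud assumptions: the passages are made up, and the `embed` function is a toy word-count stand-in for a real LLM embedding model, so it matches on shared words rather than on meaning. A real semantic search setup would swap in embeddings from a provider of your choice; the ranking logic around it would look similar.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts.

    Assumption: a real system would call an LLM embedding endpoint here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:top_k]


# Hypothetical knowledge-base snippets, invented for this example.
PASSAGES = [
    "You can return a product within 30 days of delivery for a full refund.",
    "Our support desk is open on weekdays from 9:00 to 17:00.",
    "Shipping to other countries takes five to seven business days.",
]

best = search("When is the support desk open?", PASSAGES)[0]
print(best)
```

In a hybrid solution, the retrieved passage would then be handed to the LLM together with the question, so that the answer is phrased naturally but grounded in your own content.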
The cons of large language models
So much for the good news. We have to remain critical, too. Large language models like ChatGPT have some clear limitations, most of which stem from the nature of the technology. Overcoming them is crucial if we want to leverage this technology for enterprise conversational AI.
- Not suitable for goal-oriented conversations (yet)
Most enterprise AI assistants aren’t purely information-retrieval machines. They can also take action on behalf of the customer or employee via integrations with the company’s back-end systems. This is currently not something you can do with LLMs, which is a clear barrier to enterprise application of this technology. This might change in the future.
- Lack of control
Large language models are phenomenal at predicting answers, but they aren’t foolproof. They occasionally generate partially incorrect information or fabricate information altogether. Another downside is that most models tend to be ‘text-in, text-out’, which makes it difficult to leverage visual components like buttons, video, carousels, and cards. Existing conversational AI (CAI) solutions, on the other hand, give you full control over how you structure your intents and what the matching responses should look like. This can be considered a major advantage.
- Lack of explainability
Another big difference between existing CAI solutions and large language models is that with an LLM it’s hard to see what’s going on under the hood. When the model produces an unsatisfactory answer, it’s virtually impossible to find out why and how that happened. It’s basically a black box, which might raise compliance issues for industries like healthcare, financial services, and insurance.
Open questions
Finally, there are some question marks that we need answers for, but do not have yet. Some things are uncertain, other things simply unknown.
- Unclear ROI
Even if the aforementioned problems are resolved, it’s unclear whether large language models are a viable solution from a cost standpoint. LLM providers are still refining their pricing, which ranges from a charge per API call to a charge based on the number of characters or words. You will have to think about how to integrate an LLM into your stack and how to optimize the length of your prompts in order to minimize costs. As an interesting comparison, Google Dialogflow CX currently charges $0.007 per API call for a text input, which covers intent detection, entity extraction, next action and response. Using GPT-3 may require one model for entity extraction, a separate model for intent classification, and a third to manage responses. A single API call to GPT-3 text-davinci-003 costs $0.02 per 1,000 tokens (approximately 750 words), and that count includes the input prompt, any training phrases, and the output.
- Privacy and data governance
Another key consideration before you jump into using an LLM is data privacy. How will you maintain compliance with GDPR? Is the data you feed into the LLM going to be used for improvements or incorporated into the LLM’s training data? How safe is it to integrate a large language model into your tech stack? Can it be exploited as part of a cyber attack? Lots of question marks here.
- Truthfulness and robustness
Lastly, it’s important to stress the issue of truthfulness, although that is a somewhat misleading term. It’s more precise to talk about accuracy. As mentioned before, LLMs sometimes generate partially incorrect information or make things up outright. It is unclear exactly how reliable these models are and whether they are suitable for enterprise deployment. Even if you’re a company with a higher risk tolerance, be mindful.
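To put the pricing figures quoted under ‘Unclear ROI’ side by side, here is a rough back-of-the-envelope sketch. The two prices come from the comparison above and should be treated as a snapshot rather than current pricing. The monthly volume, the words per call and the number of model calls per turn are purely hypothetical inputs, and the function names are our own.

```python
# Prices as quoted in this article; check current provider pricing before relying on them.
DIALOGFLOW_CX_USD_PER_CALL = 0.007   # per text request
DAVINCI_USD_PER_750_WORDS = 0.02     # roughly 1,000 tokens: prompt plus output


def dialogflow_cost(calls: int) -> float:
    """Total cost when every user turn is one Dialogflow CX text request."""
    return calls * DIALOGFLOW_CX_USD_PER_CALL


def davinci_cost(calls: int, words_per_call: float, models_per_turn: int = 1) -> float:
    """Total cost when each user turn triggers one or more davinci calls.

    words_per_call counts everything billed: prompt, examples and output.
    """
    total_words = calls * models_per_turn * words_per_call
    return total_words / 750 * DAVINCI_USD_PER_750_WORDS


def break_even_words(models_per_turn: int = 1) -> float:
    """Words per call at which both options cost the same per user turn."""
    per_turn = 750 * DIALOGFLOW_CX_USD_PER_CALL / DAVINCI_USD_PER_750_WORDS
    return per_turn / models_per_turn


if __name__ == "__main__":
    calls = 100_000   # hypothetical monthly volume
    words = 300       # hypothetical prompt + output size per call
    print(f"Dialogflow CX:        ${dialogflow_cost(calls):,.2f}")
    print(f"davinci, one model:   ${davinci_cost(calls, words):,.2f}")
    print(f"davinci, three models: ${davinci_cost(calls, words, 3):,.2f}")
    print(f"Break-even words per call: {break_even_words():.1f}")
```

With these figures, a single davinci call is only cheaper than a Dialogflow CX request while prompt and output together stay below roughly 260 words. If every turn needs three separate model calls, that budget shrinks to under 90 words per call, which is why prompt length deserves attention early on.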
In conclusion
Large language models like ChatGPT are challenging our thinking about the way we currently design and build AI assistants. Already they are being used to support people creatively, from generating training phrases to first drafts of dialogues, making the day-to-day work of conversation designers and AI trainers easier.
It looks like LLMs won’t replace the current-day bot platforms yet. Instead, the technology will most likely be used to augment these platforms’ language understanding capabilities, which in turn helps us build better AI assistants. All kinds of hybrid applications will emerge over time, combining the strengths of LLMs whilst simultaneously mitigating some of the risks and weaknesses, by embedding them in solutions that offer more control and transparency.
Rest assured, we will be on top of these developments. Our experts can help you explore the possibilities, now and in the future, so you can derive real business value from LLMs. If you are looking for guidance on the topic, feel free to reach out.
Editor’s note: This article was written completely without using ChatGPT.