Large language models (LLMs) are becoming the bread and butter of modern NLP applications and have, in many ways, replaced a variety of more specialized tools such as named entity recognition models, question-answering models, and text classifiers. As a result, it’s difficult to imagine an NLP product that doesn’t use an LLM in at least some fashion. While LLMs bring a host of benefits, such as increased personalization and creative dialogue generation, it’s important to understand their pitfalls and how to address them when integrating these models into a software product that serves end users. As it turns out, monitoring is well-positioned to address many of these challenges and is an essential part of the toolbox for any business working with LLMs.
Data, Privacy, and Prompt Injection
Data and privacy
Privacy and data usage are among the primary concerns of the modern-day consumer, and in the wake of well-known data-sharing scandals such as Cambridge Analytica, consumers are becoming less and less likely to use services and products that put their personal privacy at risk. While LLMs provide users with an incredible degree of personalization, it’s important to understand the risks they pose. As with all machine learning models, LLMs are vulnerable to targeted attacks designed to reveal training data. Their generative nature puts them particularly at risk: they can even leak data accidentally while performing free-form generation. For example, in a 2020 blog post, Nicholas Carlini, a research scientist at Google Brain, discussed how LLMs such as GPT can be prompted in a way that leads them to reveal personally identifiable information, such as names, addresses, and email addresses, contained in the model’s training data. This suggests that businesses that fine-tune LLMs on their customers’ data are likely to incur these same sorts of privacy risks. Similarly, a paper from researchers at Microsoft corroborates these claims and suggests specific mitigation strategies that use techniques from differential privacy to train LLMs while reducing data leakage concerns. Unfortunately, many businesses cannot leverage these techniques because they use LLM APIs that do not give them control over the fine-tuning process. The solution for these companies lies in inserting a monitoring step that validates and constrains a model’s outputs before the results are returned to an end user. In this way, businesses can identify and flag potential instances of training data leakage before a privacy violation actually occurs.
For example, a monitoring tool can apply techniques such as named entity recognition and regex filtering to identify names of persons, addresses, emails, and other sensitive information generated by a model before it gets into the wrong hands. This is particularly essential for organizations working in a privacy-restricted space such as healthcare or finance, where strict regulations such as HIPAA and rules enforced by the FTC and FDIC come into play. Even businesses that simply work internationally are at risk of violating complex location-specific regulations such as the EU’s GDPR.
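To make the regex-filtering idea concrete, here is a minimal sketch of an output check that runs before a response is returned to the user. The pattern set, the `find_pii` and `redact_pii` helper names, and the `[REDACTED]` placeholder are all assumptions made for illustration. The patterns cover only emails and US-style phone and Social Security numbers; detecting names and street addresses would typically require a named entity recognition model on top of this.

```python
import re

# Illustrative patterns only (assumed for this sketch). Real PII detection
# needs far broader coverage, plus an NER model for names and addresses.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a list of (label, matched_text) pairs found in the text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def redact_pii(text, placeholder="[REDACTED]"):
    """Replace every match of every pattern with the placeholder."""
    for pattern in PII_PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    response = "Contact jane.doe@example.com or 555-123-4567."
    print(find_pii(response))    # flagged spans, e.g. for logging or alerting
    print(redact_pii(response))  # sanitized text that is safe to return
```

Whether to redact, block the response entirely, or only log the event is a product decision. The point is that the check sits between the model and the end user.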
Prompt injection refers to the (often malicious) process of designing LLM prompts that “trick” or confuse the system into providing harmful outputs. For example, a recent article showed how well-designed prompt injection attacks make it possible to subvert OpenAI’s GPT-4 model and have it provide factually false information and even promote conspiracy theories. One can imagine even more nefarious scenarios in which a user prompts an LLM to provide advice on how to build a bomb, to give details on how to commit suicide, or to generate code that can be used to infect other computers. Vulnerability to prompt injection attacks is an unfortunate side effect of how LLMs are trained, and it’s difficult to do anything on the front end that will prevent every possible attack. Even the most robust and recent LLMs, such as OpenAI’s ChatGPT, which was aligned specifically for safety, have proven vulnerable to prompt injections.
Because prompt injection can manifest in myriad ways, it’s nearly impossible to guard against all possibilities. Monitoring of LLM-generated outputs is therefore crucial, as it provides a mechanism for identifying and flagging misleading information as well as outright harmful generations. Monitoring can use simple NLP heuristics or additional ML classifiers to flag responses from the model that contain harmful content and intercept them before they are returned to the user. Similarly, monitoring the prompts themselves can catch some harmful ones before they are passed to the model.
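The two checkpoints described above, one on the prompt and one on the response, can be sketched as a thin wrapper around the model call. Everything here is hypothetical: the pattern list, the blocked-term list, the function names, and `model_fn`, which stands in for whatever LLM API call a product actually makes. Simple heuristics like these are easy to evade, so in practice they would more likely sit alongside a trained classifier than replace one.

```python
import re

# Hypothetical heuristics for illustration; a real deployment would maintain
# a much larger list and likely pair it with an ML classifier.
SUSPICIOUS_PROMPT_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (your|the) (rules|guidelines|system prompt)", re.I),
    re.compile(r"pretend (you are|to be)", re.I),
]

# Placeholder phrases; matching is a plain lowercase substring check.
BLOCKED_OUTPUT_TERMS = {"build a bomb", "malware payload"}


def screen_prompt(prompt):
    """Return the patterns that the prompt matches (empty list if none)."""
    return [p.pattern for p in SUSPICIOUS_PROMPT_PATTERNS if p.search(prompt)]


def screen_output(output):
    """Return the blocked terms that appear in the model output."""
    lowered = output.lower()
    return sorted(term for term in BLOCKED_OUTPUT_TERMS if term in lowered)


def guarded_call(prompt, model_fn, fallback="Sorry, I can't help with that."):
    """Check the prompt, call the model, then check the response."""
    if screen_prompt(prompt):
        return fallback  # intercepted before reaching the model
    output = model_fn(prompt)
    if screen_output(output):
        return fallback  # intercepted before reaching the user
    return output
```

A production version would also log each interception so that flagged prompts and responses can be reviewed and the heuristics refined over time.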
The term hallucination refers to the propensity of an LLM to occasionally “dream up” outputs that are not actually grounded in reality. Prompt injection and hallucinations can manifest as two sides of the same coin, although with prompt injection the generation of falsehoods is a deliberate intention of the user, whereas hallucinations are an unintended side effect of an LLM’s training objective. Because LLMs are trained to predict, at each time step, the most likely next word in a sequence, they are able to generate highly realistic text. Hallucinations are a simple consequence of the fact that what is most likely is not always true.
The latest generation of LLMs, such as ChatGPT and GPT-4, are optimized using an algorithm called Reinforcement Learning from Human Feedback (RLHF) in order to match a human’s subjective opinion of what makes a good response to a prompt. While this has allowed LLMs to reach higher levels of conversational fluency, it also sometimes leads them to speak too confidently when issuing their responses. For example, it is not uncommon to ask ChatGPT a question and have it confidently give a reply that seems plausible at first glance, yet upon further examination turns out to be objectively false. Giving LLMs the ability to quantify their own uncertainty is still very much an active research problem and is not likely to be solved anytime soon. Thus, developers of LLM-based products should consider monitoring and analyzing outputs in an attempt to detect hallucinations and yield more nuanced responses than what LLMs provide out of the box. This is especially vital in contexts where the outputs of an LLM might be guiding some downstream process. For example, if an LLM chatbot is assisting a user by providing product recommendations and helping to place an order on a retailer’s website, monitoring procedures should be in effect to ensure that the model does not suggest purchasing a product that is not actually sold on that retailer’s website.
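For the retail example, one simple grounding check is to compare every product the model recommends against the retailer’s actual catalog before the recommendation is shown or an order is placed. The sketch below assumes the product names have already been extracted from the model’s response (that extraction step is not shown) and that a case-insensitive exact match is good enough. The function name and the sample catalog are hypothetical.

```python
def validate_recommendations(recommended, catalog):
    """Split recommended product names into (in_catalog, not_in_catalog).

    Matching is a case-insensitive exact match after trimming whitespace,
    which is an assumption of this sketch; real catalogs may need SKU
    lookups or fuzzy matching instead.
    """
    normalized = {name.strip().lower() for name in catalog}
    valid, unknown = [], []
    for name in recommended:
        if name.strip().lower() in normalized:
            valid.append(name)
        else:
            unknown.append(name)
    return valid, unknown


if __name__ == "__main__":
    catalog = ["Trail Runner 2", "City Sneaker"]
    valid, unknown = validate_recommendations(["trail runner 2", "Sky Boot"], catalog)
    print(valid)    # safe to show to the user
    print(unknown)  # likely hallucinated; drop, flag, or ask the model to retry
```

Tracking how often the `unknown` list is non-empty also gives a rough hallucination rate that can be monitored over time.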
Because LLMs are becoming increasingly commoditized via APIs, it’s important that businesses integrating these models into their products have a plan in place to prevent unbounded increases in costs. Without safeguards, it can be easy for users of a product to generate thousands of API calls and issue prompts with thousands of tokens (think of the case where a user copy-pastes an extremely long document into the input and asks the LLM to analyze it). Because LLM APIs are usually metered by number of calls and token counts (both in the prompt and in the model’s response), it’s not difficult to see how costs can rapidly spiral out of control. Businesses therefore need to be mindful of how they design their pricing structures in order to offset these costs. Furthermore, businesses should have monitoring procedures in place that allow them to understand how surges in usage impact costs and to mitigate these surges by imposing usage caps or taking other remedial measures.
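A usage cap of the kind described above can be sketched as a small per-user tracker that is consulted before each API call and updated after it. The class name, the limits, and the flat price per 1,000 tokens are assumptions for illustration. Real providers often price prompt and response tokens differently, and a real tracker would persist its counters and reset them each billing period, which this sketch omits.

```python
from collections import defaultdict


class UsageTracker:
    """Tracks token usage per user and enforces simple caps (illustrative)."""

    def __init__(self, price_per_1k_tokens, max_prompt_tokens, token_cap):
        self.price_per_1k_tokens = price_per_1k_tokens  # assumed flat rate
        self.max_prompt_tokens = max_prompt_tokens      # per-request limit
        self.token_cap = token_cap                      # per-user limit per period
        self.tokens_used = defaultdict(int)             # in memory, never reset here

    def check_request(self, user_id, prompt_tokens):
        """Return (allowed, reason) for a request before it is sent."""
        if prompt_tokens > self.max_prompt_tokens:
            return False, "prompt too long"
        if self.tokens_used[user_id] + prompt_tokens > self.token_cap:
            return False, "usage cap reached"
        return True, "ok"

    def record(self, user_id, prompt_tokens, completion_tokens):
        """Record actual usage after the API call completes."""
        self.tokens_used[user_id] += prompt_tokens + completion_tokens

    def estimated_cost(self, user_id):
        """Estimate spend for one user under the assumed flat rate."""
        return self.tokens_used[user_id] / 1000 * self.price_per_1k_tokens
```

Feeding the recorded counts into a dashboard or alerting system is what turns this from a hard limit into monitoring: a sudden jump in tokens per user or per hour becomes visible before the invoice arrives.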
Every business that uses LLMs in its products should incorporate monitoring into its systems in order to avoid and address the many pitfalls of LLMs. In addition, the monitoring solutions used should be specifically geared towards LLM applications and allow users to identify potential privacy violations, prevent or remediate prompt injections, flag hallucinations, and diagnose rising costs. The best monitoring solutions will address all of these concerns and provide a framework for businesses to ensure that their LLM-based applications are ready to be deployed to the public. Have confidence that your LLM application is fully optimized and performing as intended by booking a demo to see Mona's comprehensive monitoring capabilities.