Retrieval Augmented Generation

What is it?

Retrieval Augmented Generation, or RAG, is a generative AI architecture that improves the responses of GenAI services by consulting an authoritative source of knowledge before producing an answer.

How does it work?

To start, a user inputs a prompt or question into a GenAI service. Next, the RAG architecture begins retrieving the resources most relevant to the user's prompt. After that, the service takes the relevant content, combines it with the user prompt, and sends it all to the large language model (LLM). The LLM uses all of this data to generate a more informed response for the user.
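The steps above can be sketched in a few lines of code. This is a minimal illustration, not a production system: the document store, the word-overlap scoring, and the prompt format are all simplified stand-ins (a real RAG service would use embedding-based similarity search over a vector database and send the prompt to an actual LLM API).

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by simple word overlap with the query — a stand-in
    for real embedding-based similarity search."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved content with the user's prompt before
    sending everything to the LLM."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

# A tiny invented knowledge base for illustration.
documents = [
    "The CS department requires CS 101 before enrolling in CS 235.",
    "The library is open 7am to midnight on weekdays.",
    "Graduate applications for the CS program are due January 15.",
]

question = "When are CS graduate applications due?"
prompt = build_prompt(question, retrieve(question, documents))
# `prompt` now contains the most relevant documents alongside the user's
# question; the LLM would use this combined input to generate its answer.
```

The key design point is that the model never needs the knowledge base in its training data: the relevant content is fetched and attached to the prompt at the moment the question is asked.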

[Flowchart: how data moves through a RAG system]

Consider a theoretical example of RAG at BYU: A college sets up a generative AI chatbot on its website that students can ask about specific programs, classes, and department policies. A general AI model wouldn't know the answers to these questions because they are not in its training data. However, an AI model with RAG would have access to the college's documents and could use them to give a more informed response. When a student asks a question, the model searches through the documents, finds the relevant passages, and uses them to ground its answer.

Why use RAG over other types of AI architectures?

RAG architecture gives a GenAI service access to organization-specific data, ensuring that its answers are based on up-to-date, accurate information. RAG architecture also gives organizations greater control over the data that their models use to respond to questions, which can help the service perform as expected and avoid misleading answers.

What are some downsides of RAG?

Despite the greater accuracy that RAG architecture can provide, hallucinations are still possible. Another downside is that the quality of the answers depends heavily on the knowledge base the service retrieves from. Too little reference data can cause the GenAI service to hallucinate and output incorrect information, while a large, poorly curated knowledge base can surface irrelevant content and lead to unrelated or confusing answers. Lastly, connecting databases to LLMs, a fundamental requirement of RAG, adds complexity to an organization's IT systems.

Fine-Tuning

Fine-tuning, a concept closely related to RAG, is the process of further training a large language model on task-specific examples so that it becomes an expert at completing a specific task. Unlike RAG, which supplies relevant documents to the model at the moment it answers, fine-tuning changes the model itself through additional training. The two approaches complement each other: a fine-tuned model can still use RAG to pull in up-to-date, organization-specific information. OpenAI's ChatGPT is a great example of how models can be trained for specific needs: it has been fine-tuned to act as a conversational digital assistant.

In a university setting, one possible application of fine-tuning would be an executive secretary automating the process of writing charters for every university committee. Existing charters would be used as training data and the GenAI service would be informed that its purpose is to create charters similar in tone and structure. Once trained, the GenAI service would be able to create a new charter any time it is given the necessary information about a committee.
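The charter scenario above comes down to preparing training examples that pair a request with an approved charter. A hedged sketch of that preparation step follows; the JSONL "messages" layout mirrors the chat fine-tuning format used by providers such as OpenAI, and the committee record itself is invented for illustration.

```python
import json

# Hypothetical record drawn from an existing, approved charter.
existing_charters = [
    {
        "committee": "Parking Advisory Committee",
        "charter": "The Parking Advisory Committee reviews campus parking "
                   "policy and recommends changes to university leadership.",
    },
]

def to_training_example(record: dict) -> dict:
    """Pair a request for a charter with the approved charter text, so the
    model learns to reproduce the expected tone and structure."""
    return {
        "messages": [
            {"role": "system", "content": "You write university committee charters."},
            {"role": "user", "content": f"Write a charter for the {record['committee']}."},
            {"role": "assistant", "content": record["charter"]},
        ]
    }

# Write one JSON object per line — the usual fine-tuning dataset format.
with open("charters.jsonl", "w") as f:
    for record in existing_charters:
        f.write(json.dumps(to_training_example(record)) + "\n")
```

Once a dataset like this has been assembled from every existing charter and used to fine-tune a model, the executive secretary could generate a new charter simply by supplying the committee's name and details.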