Introduction

As AI continues to advance, the way we train models also evolves. Modern AI concepts like Retrieval-Augmented Generation, Agentic AI, and multimodal models open up new possibilities while requiring more sophisticated training processes to meet today’s demands. Foundational techniques such as Supervised Fine-Tuning and Reinforcement Learning from Human Feedback can be adapted to align with these advancements.


Retrieval-Augmented Generation (RAG)



What is RAG?

Retrieval-augmented generation, or RAG, makes a model’s outputs more accurate and reliable by combining two steps: retrieval and generation. Fine-tuning a RAG model focuses on both of these steps.


The Retrieval Step:

The retrieval step trains the model to find the most relevant information from external sources such as a database or knowledge base for answering an input query. This combination of retrieving and generating ensures the model’s outputs are grounded in factual information.


For example:

Input: What are the symptoms of diabetes?

Knowledge Base:

• Document 1: “...Symptoms include excessive thirst and fatigue...”

• Document 2: “...Diabetes affects blood sugar regulation...”


Document 1 is more relevant because it contains the information needed to answer the question in the input. Trainers teach the model to prioritize this information by ranking documents based on how well they address the query. (Side note: Ranking, if you recall, is part of Reinforcement Learning from Human Feedback.)
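One way to picture how a trainer's ranking becomes training data is sketched below in Python. It is only an illustration: the function name ranking_to_pairs and the record fields ("query", "preferred", "rejected") are assumptions made for this example, not a format the course prescribes.

```python
from itertools import combinations


def ranking_to_pairs(query: str, ranked_docs: list[str]) -> list[dict]:
    """Convert a best-to-worst ranking into pairwise preference records."""
    # combinations() keeps the input order, so the first item of each pair
    # is always the document the trainer ranked higher.
    return [
        {"query": query, "preferred": better, "rejected": worse}
        for better, worse in combinations(ranked_docs, 2)
    ]


query = "What are the symptoms of diabetes?"
trainer_ranking = ["Document 1", "Document 2"]  # best first, as judged by a trainer

pairs = ranking_to_pairs(query, trainer_ranking)
print(pairs)
```

Each record says which document should be preferred for the query, which is the kind of signal a retriever can be tuned on.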


The Generation Step:

Once the model retrieves relevant information, it generates a response. At this stage, trainers evaluate whether the output aligns with the retrieved content.


For example:

If the model generates the output “The symptoms include thirst and hives...”, it would be labeled as inaccurate and edited, since “hives” wasn’t mentioned in the retrieved content.
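The check a trainer performs here can be roughly approximated in code. The Python sketch below flags words in the output that never appear in the retrieved content. Word matching is a crude stand-in for human judgment, and the stopword list and function names are assumptions made for this example.

```python
import re

# Small, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "include", "includes"}


def content_words(text: str) -> set[str]:
    """Lowercase the text and return its words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def unsupported_words(output: str, retrieved: str) -> list[str]:
    """Words in the output that do not appear in the retrieved content."""
    return sorted(content_words(output) - content_words(retrieved))


retrieved = "...Symptoms include excessive thirst and fatigue..."
output = "The symptoms include thirst and hives..."

flagged = unsupported_words(output, retrieved)
print(flagged)  # ['hives']
```

An output with flagged words would be a candidate for the "inaccurate" label and for editing.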


Agentic AI

Agentic AI models take things a step further by autonomously performing multi-step tasks through interactions with external tools, APIs, or databases. These models act like “agents” that can make decisions and take action to achieve specific goals.



Fine-Tuning Agentic AI Models

Training these models involves these essential elements:


1. Reasoning

Agentic models need to understand the user’s goals and reason through the best actions to achieve them. This involves evaluating options, weighing their relevance, and making informed decisions about how to proceed.


For Example:

Input: Help me adjust my financial plan after unexpected medical expenses.


The model would reason through the situation by assessing the user’s financial context, identifying urgent needs, and deciding whether to suggest a budget adjustment, explore loan options, or propose other solutions based on what would best achieve the user’s goal.

Trainers use scenarios like this to teach the model how to evaluate the context, prioritize the most relevant options, and reason through the optimal action to take.


2. Teaching Tool Use and Workflow Generation

The model learns how to create a relevant workflow by determining which tools to use and what steps are necessary to complete a task.


For Example:

Input: Book a flight to DC tomorrow.

Workflow:

• Extract key details like the location (use location data to determine where the user is and where "DC" is), travel date (from a calendar), and preferences (from flight history using a travel API tool).

• Use a flight-booking API tool to retrieve available flight options.

• Select the best flight based on preferences and criteria such as cost or timing.

• Finalize the booking (using a booking platform API) and send a confirmation email.

Trainers guide the model through these actions, ensuring it understands how to interact with tools and perform each step correctly.
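The workflow above can be sketched as a sequence of tool calls. In the Python sketch below, every function is a hypothetical stub standing in for a real tool or API, and the flight data, origin city, and confirmation format are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    flight_id: str
    price: float
    departure_hour: int  # 24-hour clock


def extract_details(request: str) -> dict:
    # Hypothetical stub: a real agent would call location, calendar,
    # and travel-history tools here. The values are made up.
    return {"origin": "NYC", "destination": "DC", "date": "tomorrow",
            "prefers_morning": True}


def search_flights(details: dict) -> list[Flight]:
    # Hypothetical stub for a flight-booking API; the results are made up.
    return [Flight("F1", 180.0, 6), Flight("F2", 120.0, 9), Flight("F3", 95.0, 21)]


def select_flight(flights: list[Flight], prefers_morning: bool) -> Flight:
    # Apply the preference first, then pick the cheapest remaining option.
    candidates = flights
    if prefers_morning:
        morning = [f for f in flights if f.departure_hour < 12]
        candidates = morning or flights  # fall back if nothing matches
    return min(candidates, key=lambda f: f.price)


def book_flight(flight: Flight) -> str:
    # Hypothetical stub for a booking platform API.
    return f"CONF-{flight.flight_id}"


def send_confirmation_email(confirmation: str) -> str:
    # Hypothetical stub for an email tool.
    return f"Email sent for {confirmation}"


def run_workflow(request: str) -> dict:
    """Run the four workflow steps in order and record each step taken."""
    steps = []
    details = extract_details(request)
    steps.append("extract_details")
    flights = search_flights(details)
    steps.append("search_flights")
    chosen = select_flight(flights, details["prefers_morning"])
    steps.append("select_flight")
    confirmation = book_flight(chosen)
    steps.append("book_flight")
    email_status = send_confirmation_email(confirmation)
    steps.append("send_confirmation_email")
    return {"flight": chosen, "confirmation": confirmation,
            "email": email_status, "steps": steps}


result = run_workflow("Book a flight to DC tomorrow.")
print(result["confirmation"], result["steps"])
```

Recording the steps as the workflow runs gives trainers a trace they can review afterwards.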


3. Evaluating the Workflow

During workflow execution, the model carries out the steps it has planned. Trainers focus on monitoring the model’s performance to ensure it executes each step accurately, efficiently, and in line with the task’s objectives.


For Example:

Input: Create a project plan for launching a new app.

While the model executes the workflow, trainers evaluate:

• Task Completion: Is the model accurately executing each step, such as listing tasks, assigning deadlines, and organizing the plan?

• Consistency: Are all steps logically connected, with no skipped or repeated actions?

• Precision: Does the model follow through with the details and constraints of the task, like aligning deadlines with project goals?


Trainers also monitor for potential errors or inefficiencies, such as unnecessary steps or deviations from the intended workflow. If needed, corrections or adjustments are made in real time to improve the model’s overall performance.
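Some of these checks can be partly automated by comparing the steps the model planned with the steps it actually executed. The Python sketch below is one possible way to do that. The step names and the evaluate_trace function are assumptions made for this example, and it covers only skipped, repeated, unnecessary, and out-of-order steps, not the quality of each step.

```python
from collections import Counter


def evaluate_trace(planned: list[str], executed: list[str]) -> dict:
    """Compare an executed step trace against the planned workflow."""
    counts = Counter(executed)
    # Order in which planned steps were first executed.
    first_seen = list(dict.fromkeys(s for s in executed if s in planned))
    expected_order = [s for s in planned if s in first_seen]
    return {
        "skipped": [s for s in planned if counts[s] == 0],
        "repeated": [s for s in planned if counts[s] > 1],
        "unexpected": sorted(set(executed) - set(planned)),
        "in_order": first_seen == expected_order,
    }


# Hypothetical step names for the project-plan task.
planned = ["list_tasks", "assign_deadlines", "organize_plan"]
executed = ["list_tasks", "list_tasks", "organize_plan", "send_invites"]

report = evaluate_trace(planned, executed)
print(report)
```

Here the report would show one skipped step, one repeated step, and one step that was never part of the plan, which a trainer could then correct.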


Multimodal AI Models

Multimodal models can process and understand multiple types of data, such as text, images, and audio, all at once. This allows them to perform complex analyses or generate richer outputs by combining insights from different sources.


Training Multimodal Models focuses on two key areas:


1. Understanding Relationships Between Data Types

Trainers teach the model how to link different types of data to create a unified understanding.


For Example:

The input includes an image of a broken bone along with a medical report.

The model needs to:

• Identify the fracture in the image.

• Relate it to the correct terminology or diagnosis in the text.


Trainers guide the model to make these connections smoothly and accurately.
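As a rough illustration, a training example of this kind could pair labels for the image with the report text and check whether each label is mentioned in the report. The Python sketch below assumes a hypothetical MultimodalExample record. The report sentence and labels are invented, and simple substring matching stands in for the trainer's judgment.

```python
from dataclasses import dataclass


@dataclass
class MultimodalExample:
    image_findings: list[str]  # labels assigned to the image (hypothetical)
    report_text: str           # accompanying medical report (invented text)


def linked_findings(example: MultimodalExample) -> dict[str, bool]:
    """For each image finding, report whether the text mentions it."""
    report = example.report_text.lower()
    return {finding: finding.lower() in report
            for finding in example.image_findings}


example = MultimodalExample(
    image_findings=["fracture", "dislocation"],
    report_text="X-ray shows a fracture of the left radius.",
)

links = linked_findings(example)
print(links)
```

A finding with no match in the text is a signal that the image and the report may not have been connected correctly and should be reviewed.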


2. Ensuring Accuracy and Context

Trainers also assess whether the model’s outputs are both accurate and appropriate in context.


For Example: If the model generates a response based on both visual and text data, trainers ensure the output reflects the full meaning and intent of the combined input. This involves checking for the correct alignment between the data types and ensuring the output is not misinterpreted or out of context.


Modern AI models like RAG, Agentic AI, and multimodal systems are pushing the boundaries of what AI can achieve. They tackle real-world challenges with precision and versatility, but their success depends on effective training. And with that, we have now come to the end of the AI Data training techniques section of the course.