RAG
Pronunciation: R-A-G
Retrieval-Augmented Generation. A technique that enhances LLM responses by retrieving relevant documents from a knowledge base and including them as context in the prompt. Enables AI to answer questions based on proprietary or up-to-date information not in the model's training data.
What is RAG
RAG (Retrieval-Augmented Generation) solves a fundamental LLM limitation: models only know what was in their training data. RAG connects an LLM to an external knowledge base, allowing it to answer questions from your company documents, policies, or real-time data.
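The core move is simple: at query time, retrieved text is inserted into the prompt so the model answers from it rather than from memory alone. A minimal sketch of that prompt assembly, using a hypothetical retrieved snippet and question (both invented for illustration):

```python
# Hypothetical retrieved snippet and user question, to illustrate how
# RAG injects external knowledge into the prompt at query time.
retrieved = "Remote employees may expense up to $50/month for internet."
question = "Can I expense my home internet?"

# The retrieved text becomes context the LLM is instructed to rely on.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{retrieved}\n\n"
    f"Question: {question}"
)
print(prompt)
```

The instruction "using only the context below" is what grounds the answer: the model is steered toward the retrieved document instead of its training data.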
How RAG Works
1. Documents are split into chunks (commonly 300–500 tokens each)
2. Chunks are converted to vector embeddings and stored in a vector database
3. User's query is also embedded
4. Similar chunks are retrieved via similarity search
5. Retrieved chunks + query are passed to the LLM for answer generation
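The five steps above can be sketched end to end. This toy version uses a bag-of-words count as a stand-in for a neural embedding model and a plain in-memory list as a stand-in for a vector database; the documents and query are invented for illustration, but the chunk-embed-retrieve-augment flow is the same:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. Real systems use a neural
    # embedding model; this stand-in keeps the example self-contained.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: "chunk" documents (here, one sentence each) and index
# their embeddings. The list stands in for a vector database.
docs = [
    "Expense reports must be filed within 30 days of purchase.",
    "The VPN requires two-factor authentication for remote access.",
]
index = [(chunk, embed(chunk)) for chunk in docs]

# Steps 3-4: embed the query and retrieve the most similar chunk.
query = "When do expense reports need to be filed?"
qvec = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(qvec, item[1]))

# Step 5: pass the retrieved chunk plus the query to the LLM.
prompt = f"Context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```

Swapping in a real embedding model and vector store changes the `embed` function and the index, but not the shape of the pipeline.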
When to Use RAG
- Employee Q&A from internal knowledge bases
- Customer support automation using product documentation
- Legal and compliance information retrieval
- Sales assistance (proposal generation from internal data)
RAG vs. Fine-Tuning
RAG is better for frequently changing information and for answers that must be grounded in specific documents. Fine-tuning is better for style adaptation or domain-specific reasoning patterns. The two are complementary and can be combined.