RAG
Pronunciation: R-A-G
Retrieval-Augmented Generation. A technique that enhances LLM responses by retrieving relevant documents from a knowledge base and including them as context in the prompt. Enables AI to answer questions based on proprietary or up-to-date information not in the model's training data.
What is RAG
RAG (Retrieval-Augmented Generation) solves a fundamental LLM limitation: models only know what was in their training data. RAG connects an LLM to an external knowledge base, allowing it to answer questions from your company documents, policies, or real-time data.
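The core move is simple: at query time, retrieved text is inserted into the prompt so the model answers from it rather than from memory alone. A minimal sketch of that prompt assembly, using a hypothetical retrieved snippet and question (both invented for illustration):

```python
# Hypothetical retrieved snippet and user question, to illustrate how
# RAG injects external knowledge into the prompt at query time.
retrieved = "Remote employees may expense up to $50/month for internet."
question = "Can I expense my home internet?"

# The retrieved text becomes context the LLM is instructed to rely on.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{retrieved}\n\n"
    f"Question: {question}"
)
print(prompt)
```

The instruction "using only the context below" is what grounds the answer: the model is steered toward the retrieved document instead of its training data.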
How RAG Works
1. Documents are split into chunks (commonly 300–500 tokens each)
2. Chunks are converted to vector embeddings and stored in a vector database
3. User's query is also embedded
4. Similar chunks are retrieved via similarity search
5. Retrieved chunks + query are passed to the LLM for answer generation
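The five steps above can be sketched end to end. This toy version uses a bag-of-words count as a stand-in for a neural embedding model and a plain in-memory list as a stand-in for a vector database; the documents and query are invented for illustration, but the chunk-embed-retrieve-augment flow is the same:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. Real systems use a neural
    # embedding model; this stand-in keeps the example self-contained.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: "chunk" documents (here, one sentence each) and index
# their embeddings. The list stands in for a vector database.
docs = [
    "Expense reports must be filed within 30 days of purchase.",
    "The VPN requires two-factor authentication for remote access.",
]
index = [(chunk, embed(chunk)) for chunk in docs]

# Steps 3-4: embed the query and retrieve the most similar chunk.
query = "When do expense reports need to be filed?"
qvec = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(qvec, item[1]))

# Step 5: pass the retrieved chunk plus the query to the LLM.
prompt = f"Context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```

Swapping in a real embedding model and vector store changes the `embed` function and the index, but not the shape of the pipeline.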
When to Use RAG
- Employee Q&A from internal knowledge bases
- Customer support automation using product documentation
- Legal and compliance information retrieval
- Sales assistance (proposal generation from internal data)
RAG vs. Fine-Tuning
RAG is better for frequently changing information and for answers that must be grounded in specific documents. Fine-tuning is better for style adaptation or domain-specific reasoning patterns. The two are complementary and can be combined.