AI Glossary: What Do LLM, RAG, and Fine-Tuning Actually Mean? (30 Terms)

LLM (Large Language Model)

A type of AI model trained on massive amounts of text data that can generate, summarize, translate, and reason about language. Examples include GPT-4, Claude, and Gemini.

Prompt

The text input you give to an AI model to get a response. Better prompts lead to better outputs — prompt engineering is the skill of crafting effective instructions.

Hallucination

When an AI model generates information that sounds plausible but is factually incorrect. All current LLMs can hallucinate, which is why fact-checking AI output matters.

Fine-tuning

The process of training a pre-existing AI model on your own specific data to make it better at a particular task or to match your brand's voice.

RAG (Retrieval-Augmented Generation)

A technique where an AI model retrieves relevant documents from a database before generating a response, reducing hallucinations and grounding answers in real data.

Token

The basic unit of text that AI models process. Roughly 1 token = ¾ of a word. When tools mention 'context window' or 'token limits,' they're referring to how much text the model can handle at once.

Context Window

The maximum amount of text an AI model can process in a single conversation. Claude has 200K tokens (~150K words), while GPT-4 Turbo has 128K tokens.

Multimodal

An AI model that can understand and generate multiple types of content — text, images, audio, video — not just text. GPT-4o and Gemini are multimodal models.

Agentic AI

AI systems that can autonomously plan, execute multi-step tasks, use tools, and make decisions without constant human input. Claude Code and Devin are examples of agentic AI.

Vibe Coding

A 2025/2026 trend where people describe what they want to build in plain English, and AI generates working code. Popularized by tools like Cursor, Lovable, and Bolt.new.

API (Application Programming Interface)

A way for developers to access AI capabilities programmatically. Most AI tools offer APIs so you can build custom integrations into your own products.

Embedding

A numerical representation of text (or images/audio) that captures its meaning. Used for semantic search, recommendations, and finding similar content.

Inference

The process of running a trained AI model to generate predictions or outputs. When you chat with ChatGPT, you're running inference on OpenAI's servers.

Transformer

The neural network architecture behind modern AI models like GPT, Claude, and Gemini. Introduced in 2017, transformers use 'attention' mechanisms to process language.

Open Source AI

AI models whose code and weights are publicly available. Examples include Llama (Meta), Mistral, and Stable Diffusion. Anyone can run, modify, or build on them.

MCP (Model Context Protocol)

An open standard created by Anthropic that lets AI assistants connect to external tools and data sources. Think of it as USB-C for AI integrations.

AI Agent

An AI system that can independently perform tasks by planning steps, using tools, browsing the web, or executing code. More autonomous than a chatbot.

Diffusion Model

The AI architecture behind image generators like Midjourney, DALL-E, and Stable Diffusion. Works by learning to remove noise from images, generating new images in reverse.

Text-to-Speech (TTS)

AI technology that converts written text into spoken audio. Modern TTS from ElevenLabs and Play.ht produces voices nearly indistinguishable from humans.

Voice Cloning

Using AI to replicate a specific person's voice from a short audio sample. The cloned voice can then speak any text. Raises both exciting possibilities and ethical concerns.

Prompt Injection

A security vulnerability where malicious instructions are embedded in input to trick an AI model into ignoring its original instructions or revealing private data.

Benchmark

A standardized test used to evaluate and compare AI models. Examples include MMLU (knowledge), HumanEval (coding), and MT-Bench (conversation quality).

Temperature

A setting that controls how creative or random an AI model's output is. Low temperature (0.0) = predictable and focused. High temperature (1.0+) = creative and varied.

Zero-shot / Few-shot

Zero-shot means asking an AI to do a task with no examples. Few-shot means providing a few examples first. Few-shot prompting usually gets much better results.

RLHF (Reinforcement Learning from Human Feedback)

A training technique where humans rate AI outputs, and the model learns to produce responses humans prefer. Key to making ChatGPT and Claude helpful and safe.

Latent Space

The internal representation where AI models store compressed understanding of data. When you ask Midjourney for 'a sunset over mountains,' it navigates latent space to generate the image.

Quantization

A technique to make AI models smaller and faster by reducing the precision of their numerical weights. Allows large models to run on consumer hardware.

AI Guardrails

Safety mechanisms built into AI systems to prevent harmful outputs. Includes content filters, usage policies, and technical constraints on model behavior.

Synthetic Data

Data generated by AI rather than collected from the real world. Used to train other AI models when real data is scarce, expensive, or privacy-sensitive.

Edge AI

Running AI models directly on a device (phone, laptop, car) rather than in the cloud. Enables offline use, lower latency, and better privacy.

📖 AI Glossary — 30 Terms Explained