An Interactive Guide to Foundational Models
Generative AI is a type of AI that can create new content—like text, images, or code. Its engine is a Neural Network, a system inspired by the human brain that learns patterns from data. A Large Language Model (LLM) is a massive neural network trained on vast amounts of text.
At its core, an LLM is a machine built to do one thing incredibly well: predict the next word in a sequence based on probability.
The journey begins with unimaginable amounts of data—a significant portion of the digital world. This is filtered down to trillions of high-quality words.
Text is broken into numerical IDs called tokens. Each token ID points to a high-dimensional vector (an embedding) that represents its meaning.
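The tokenize-then-look-up step can be sketched in a few lines. This is a toy sketch with a made-up three-word vocabulary and 4-dimensional embeddings; real tokenizers learn tens of thousands of sub-word pieces, and real embeddings have thousands of dimensions.

```python
import numpy as np

# Toy vocabulary mapping text pieces to token IDs (real tokenizers
# like BPE learn ~50k-100k such pieces; these three are made up).
vocab = {"the": 0, "cat": 1, "sat": 2}

# Embedding table: one vector per token ID. We use 4 dimensions
# for illustration; production models use thousands.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))

tokens = [vocab[w] for w in "the cat sat".split()]
vectors = embedding_table[tokens]  # look up one vector per token
print(tokens)          # [0, 1, 2]
print(vectors.shape)   # (3, 4): three tokens, each a 4-dim embedding
```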
The Transformer architecture uses a self-attention mechanism to understand context. During pre-training, the model adjusts billions (or even trillions) of parameters over weeks or months by constantly predicting the next word in its vast dataset. This is where it learns grammar, facts, and reasoning.
Each layer in a neural network performs a simple linear algebra operation followed by a non-linear activation. (Click to see calculation)
Input = [0.5, -0.2]
Weights (W) = [[0.7, 0.2], [-0.1, 0.4]]
Bias (b) = [0.1, -0.1]
Result = Input · W + b = [0.47, -0.08]
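The same calculation in code, using the exact numbers above, followed by a ReLU activation (the original numbers stop at the linear step; the ReLU line is added to illustrate the non-linearity the text mentions):

```python
import numpy as np

# The worked example above: a row-vector input times a weight
# matrix, plus a bias.
x = np.array([0.5, -0.2])
W = np.array([[0.7, 0.2],
              [-0.1, 0.4]])
b = np.array([0.1, -0.1])

linear = x @ W + b                 # [0.47, -0.08]
activated = np.maximum(0, linear)  # ReLU, a common non-linear activation
print(linear)                      # [ 0.47 -0.08]
print(activated)                   # [0.47 0.  ]
```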
Self-attention is a special layer that dynamically weighs the importance of other words in a sequence. It projects the input into three matrices: Query, Key, and Value.
X = [[1, 0, 1, 0], [0, 1, 0, 1]]
Q = [[1,1,2],[1,1,1]], K = [[0,2,1],[2,1,1]], V = [[1,2,3],[1,4,0]]
Scores = QK^T = [[4, 5], [3, 4]]  Weights = softmax(Scores / √d_k) = [[0.36, 0.64], [0.36, 0.64]]  (d_k = 3)
Output = Weights · V = [[1.0, 3.28, 1.08], [1.0, 3.28, 1.08]]
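The attention calculation can be reproduced in a few lines of numpy. This sketch skips the projection matrices that produce Q, K, and V from X and starts from the matrices given above; it uses the standard scaled dot-product form, dividing scores by √d_k before the softmax.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

Q = np.array([[1, 1, 2], [1, 1, 1]], dtype=float)
K = np.array([[0, 2, 1], [2, 1, 1]], dtype=float)
V = np.array([[1, 2, 3], [1, 4, 0]], dtype=float)

d_k = Q.shape[-1]                         # key dimension, here 3
scores = Q @ K.T                          # [[4, 5], [3, 4]]
weights = softmax(scores / np.sqrt(d_k))  # ~[[0.36, 0.64], [0.36, 0.64]]
output = weights @ V                      # ~[[1.0, 3.28, 1.08], ...]
print(output.round(2))
```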
The model predicts, we compare it to the truth, and calculate a "loss" score representing the error. (Click for details)
Backpropagation uses the chain rule from calculus to find the gradient of the loss for every single weight. (Click for details)
We adjust each weight by taking a small step in the opposite direction of its gradient. This is called Gradient Descent. (Click for details)
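The three steps above (loss, gradient, update) can be shown end-to-end with a single weight. This is a minimal sketch: one made-up training pair, a squared-error loss, and a gradient computed by hand with the chain rule instead of automatic backpropagation.

```python
# One weight standing in for billions; one made-up training pair.
w = 0.0
x, target = 2.0, 3.0
lr = 0.1                              # learning rate (step size)

for step in range(50):
    pred = w * x                      # forward pass
    loss = (pred - target) ** 2       # squared-error loss
    grad = 2 * (pred - target) * x    # dLoss/dw via the chain rule
    w -= lr * grad                    # gradient descent update

print(round(w, 3))                    # approaches 1.5, since 1.5 * 2.0 = 3.0
```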
Alignment teaches the model to be helpful and safe using human feedback.
Humans rank different model responses to the same prompt. (Click)
A model is trained to predict human preference scores. (Click)
The LLM is tuned to maximize the score from the Reward Model. (Click)
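The link between human rankings and a trainable score can be sketched with the Bradley-Terry preference model, a common choice for reward-model training (the scores here are hypothetical):

```python
import math

# Given reward scores for two responses to the same prompt, the
# Bradley-Terry model says the probability a human prefers A over B
# is sigmoid(score_A - score_B). The reward model's weights are
# adjusted so this probability matches the actual human rankings.
def preference_probability(score_a, score_b):
    return 1 / (1 + math.exp(-(score_a - score_b)))

p = preference_probability(2.0, 0.5)   # hypothetical scores
print(round(p, 3))                     # ~0.818: A predicted preferred
```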
"Inference" is using the trained model. It produces raw scores (logits) for all possible next tokens. The Softmax function converts these scores into probabilities.
An LLM becomes an Agent when it can use tools to accomplish multi-step tasks. To do this reliably, its output must be constrained.
The agent thinks, acts by calling a tool, observes the result, and repeats.
🤔 Thought: I need to find a number first...
🎬 Action: SearchAPI("Messi goals 2023")
🔭 Observation: "Scored 11 goals"
✅ Final Answer: "The square root of 11 is 3.32."
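The think-act-observe loop above can be sketched as code. Here the LLM and the search tool are both faked with scripted stand-ins (`fake_llm` and `search_api` are hypothetical names); a real agent would call a model and a live API at those points.

```python
import math

def search_api(query):
    return "Scored 11 goals"          # canned observation for the sketch

def fake_llm(observation):
    # Scripted policy standing in for the model's "thought" step.
    if observation is None:
        return ("act", 'SearchAPI("Messi goals 2023")')
    goals = 11                        # parsed from the observation
    return ("answer", f"The square root of {goals} is {math.sqrt(goals):.2f}.")

answer, observation = None, None
while answer is None:
    kind, content = fake_llm(observation)
    if kind == "act":
        observation = search_api(content)   # act, then observe
    else:
        answer = content                    # final answer ends the loop
print(answer)
```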
To use tools, the agent must produce perfectly structured output, such as JSON. We can force its output to follow a strict format (a schema).
// Constraint (simplified schema)
{ "name": "string", "age": "integer" }
// Guaranteed Valid Output
{ "name": "John Doe", "age": 34 }
The field of Generative AI is evolving at an incredible pace. What was state-of-the-art yesterday is foundational today. The core principles of data, architecture, and feedback, however, remain central to this progress.