<?xml version="1.0" encoding="UTF-8"?>
<Technologies>
  <Category name="Architectures">
    <Technology name="Transformer">
      <Summary>Attention-based neural architecture that underpins most modern large language models.</Summary>
      <YearIntroduced>2017</YearIntroduced>
      <KeyPaper>Attention Is All You Need</KeyPaper>
      <CommonUses>
        <Use>Language modeling</Use>
        <Use>Machine translation</Use>
        <Use>Text summarization</Use>
      </CommonUses>
    </Technology>
    <Technology name="Mixture of Experts">
      <Summary>Architecture that routes each input through a small subset of specialized sub-networks, enabling much larger total parameter counts at manageable inference cost.</Summary>
      <YearIntroduced>2017</YearIntroduced>
      <CommonUses>
        <Use>Scaling model capacity</Use>
        <Use>Reducing per-token compute</Use>
      </CommonUses>
    </Technology>
  </Category>

  <Category name="Training Techniques">
    <Technology name="Supervised Fine-Tuning">
      <Summary>Training a pre-trained model on labelled input-output pairs to adapt it to a specific task or style.</Summary>
      <CommonUses>
        <Use>Task specialization</Use>
        <Use>Instruction following</Use>
      </CommonUses>
    </Technology>
    <Technology name="Reinforcement Learning from Human Feedback">
      <Summary>Training method that uses human preference data to shape a reward model, then optimizes the language model against it.</Summary>
      <YearIntroduced>2017</YearIntroduced>
      <CommonUses>
        <Use>Alignment</Use>
        <Use>Reducing harmful outputs</Use>
      </CommonUses>
    </Technology>
    <Technology name="Direct Preference Optimization">
      <Summary>An alternative to reward-model-based RLHF that learns directly from preference pairs without training a separate reward model.</Summary>
      <YearIntroduced>2023</YearIntroduced>
    </Technology>
  </Category>

  <Category name="Inference Techniques">
    <Technology name="Retrieval-Augmented Generation">
      <Summary>Pattern that retrieves relevant documents at query time and provides them as context to a language model, improving accuracy on knowledge-intensive tasks.</Summary>
      <YearIntroduced>2020</YearIntroduced>
      <CommonUses>
        <Use>Question answering</Use>
        <Use>Enterprise search</Use>
        <Use>Documentation assistants</Use>
      </CommonUses>
    </Technology>
    <Technology name="Speculative Decoding">
      <Summary>Inference acceleration technique where a small draft model proposes tokens that a larger model then verifies in parallel.</Summary>
      <YearIntroduced>2022</YearIntroduced>
    </Technology>
  </Category>

  <Category name="Representation">
    <Technology name="Embedding">
      <Summary>A dense vector representation of text, images, or other data in which semantic similarity corresponds to geometric proximity (e.g., cosine similarity or Euclidean distance).</Summary>
      <CommonUses>
        <Use>Semantic search</Use>
        <Use>Clustering</Use>
        <Use>Classification</Use>
      </CommonUses>
    </Technology>
    <Technology name="Tokenizer">
      <Summary>A component that splits raw text into discrete units (tokens) that a model can process, typically using byte-pair encoding or similar schemes.</Summary>
      <CommonUses>
        <Use>Input preparation</Use>
        <Use>Vocabulary compression</Use>
      </CommonUses>
    </Technology>
  </Category>
</Technologies>
