| Term | Description | Introduced | Typical uses |
|---|---|---|---|
| Transformer | Attention-based neural architecture that underpins most modern large language models. | 2017 ("Attention Is All You Need") | Language modeling; machine translation; text summarization |
| Mixture of Experts (MoE) | Architecture that routes each input through a small subset of specialized sub-networks, enabling much larger total parameter counts at manageable inference cost. | 2017 | Scaling model capacity; reducing per-token compute |
| Supervised fine-tuning | Training a pre-trained model on labelled input-output pairs to adapt it to a specific task or style. | | Task specialization; instruction following |
| Reinforcement Learning from Human Feedback (RLHF) | Training method that uses human preference data to fit a reward model, then optimizes the language model against it. | 2017 | Alignment; reducing harmful outputs |
| Direct Preference Optimization (DPO) | An alternative to reward-model-based RLHF that learns directly from preference pairs without training a separate reward model. | 2023 | |
| Retrieval-Augmented Generation (RAG) | Pattern that retrieves relevant documents at query time and provides them as context to a language model, improving accuracy on knowledge-intensive tasks. | 2020 | Question answering; enterprise search; documentation assistants |
| Speculative decoding | Inference acceleration technique in which a small draft model proposes tokens that a larger model then verifies in parallel. | 2022 | |
| Embedding | A dense vector representation of text, images, or other data that preserves semantic similarity under geometric operations. | | Semantic search; clustering; classification |
| Tokenizer | A component that splits raw text into discrete units (tokens) that a model can process, typically using byte-pair encoding or a similar scheme. | | Input preparation; vocabulary compression |
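The Transformer's core operation, scaled dot-product attention, can be sketched in a few lines of pure Python. This is a toy single-head version with hand-rolled `softmax` and `attention` helpers (both names are ours, not from any library); a real implementation would be batched, multi-headed, and matrix-based.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    where Q, K, V are lists of equal-length vectors (rows)."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights sum to one, every output component lies between the minimum and maximum of the corresponding value column, which is a handy sanity check.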
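The routing step that makes Mixture of Experts cheap per token can be illustrated with a minimal top-k gate. This sketch (the function name `top_k_route` is ours) picks the k highest-scoring experts and renormalizes their gate weights; real MoE layers add load-balancing losses and run the selected experts' forward passes.

```python
def top_k_route(gate_scores, k=2):
    """Select the k experts with the highest gate score for one token
    and renormalize their weights so they sum to 1. Only these k
    experts run, so per-token compute stays bounded regardless of
    how many experts exist in total."""
    idx = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]
    total = sum(gate_scores[i] for i in idx)
    return [(i, gate_scores[i] / total) for i in idx]
```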
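The DPO objective for a single preference pair is short enough to write out directly. This sketch assumes you already have total log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; the helper name `dpo_loss` and the default `beta` are illustrative.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.
    Inputs are log-probabilities of the chosen/rejected responses under
    the trained policy (pi_*) and the frozen reference model (ref_*).
    Loss = -log(sigmoid(beta * (log-ratio margin))); minimizing it pushes
    the policy to prefer the chosen response relative to the reference,
    with no separately trained reward model."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```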
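The RAG pattern (retrieve, then stuff retrieved text into the prompt) can be sketched end to end with a toy bag-of-words "embedding" in place of a learned model. All helper names here (`embed`, `cosine`, `retrieve`, `rag_prompt`) are illustrative assumptions, not a real library's API.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system uses a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(query, docs, k=2):
    # Provide the retrieved documents as context ahead of the question.
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The same `embed`/`cosine` pair also illustrates the embedding row above: semantic similarity is reduced to a geometric operation on vectors.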
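Speculative decoding can be sketched with a simplified greedy-acceptance variant: the draft proposes k tokens, the target checks them, and the longest agreeing prefix is kept (plus one corrected or bonus token from the target). Production systems use probabilistic acceptance and run all k verifications in one parallel forward pass; `draft_next`/`target_next` are assumed callables mapping a context to the next token.

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding.
    draft_next/target_next: callables taking a token list, returning
    the next token under the small draft and large target model."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) The target verifies them (in practice, in a single parallel pass).
    accepted, ctx = [], list(prefix)
    for t in proposed:
        want = target_next(ctx)
        if want == t:
            accepted.append(t)
            ctx.append(t)
        else:
            # First mismatch: keep the target's token and stop.
            accepted.append(want)
            break
    else:
        # All k accepted: the target contributes one bonus token for free.
        accepted.append(target_next(ctx))
    return accepted
```

Every accepted token is one the target model would have produced itself, which is why the technique accelerates inference without changing the output distribution (in the greedy case shown here, the output sequence is identical).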
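The core move in byte-pair encoding, which the tokenizer row mentions, is repeatedly merging the most frequent adjacent token pair into a new vocabulary entry. This is one training step of that loop, with an illustrative helper name (`bpe_merge_step`); real tokenizers train over a large corpus and also handle bytes, pre-tokenization, and special tokens.

```python
from collections import Counter

def bpe_merge_step(tokens):
    """One BPE training step: find the most frequent adjacent pair in
    the token sequence and merge every occurrence into a single token.
    Returns (new_tokens, merged_token_or_None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)   # the new vocabulary entry
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, a + b
```

Repeating this step grows the vocabulary with common character sequences, which is the "vocabulary compression" the table refers to: frequent substrings become single tokens.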