| Term | Description | Introduced | Typical uses |
|---|---|---|---|
| Transformer | Attention-based neural architecture that underpins most modern large language models. | 2017 ("Attention Is All You Need") | Language modeling; machine translation; text summarization |
| Mixture of Experts (MoE) | Architecture that routes each input through a small subset of specialized sub-networks, enabling much larger total parameter counts at manageable inference cost. | 2017 | Scaling model capacity; reducing per-token compute |
| Supervised fine-tuning | Training a pre-trained model on labelled input-output pairs to adapt it to a specific task or style. | | Task specialization; instruction following |
| Reinforcement Learning from Human Feedback (RLHF) | Training method that uses human preference data to fit a reward model, then optimizes the language model against it. | 2017 | Alignment; reducing harmful outputs |
| Direct Preference Optimization (DPO) | An alternative to reward-model-based RLHF that learns directly from preference pairs without training a separate reward model. | 2023 | |
| Retrieval-Augmented Generation (RAG) | Pattern that retrieves relevant documents at query time and provides them as context to a language model, improving accuracy on knowledge-intensive tasks. | 2020 | Question answering; enterprise search; documentation assistants |
| Speculative decoding | Inference acceleration technique in which a small draft model proposes tokens that a larger model then verifies in parallel. | 2022 | |
| Embedding | A dense vector representation of text, images, or other data that preserves semantic similarity under geometric operations. | | Semantic search; clustering; classification |
| Tokenizer | A component that splits raw text into discrete units (tokens) that a model can process, typically using byte-pair encoding or a similar scheme. | | Input preparation; vocabulary compression |
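The Transformer's core operation, scaled dot-product attention, can be sketched in a few lines of pure Python. This is a toy single-head version with hand-rolled `softmax` and `attention` helpers (both names are ours, not from any library); a real implementation would be batched, multi-headed, and matrix-based.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    where Q, K, V are lists of equal-length vectors (rows)."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights sum to one, every output component lies between the minimum and maximum of the corresponding value column, which is a handy sanity check.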
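The routing step that makes Mixture of Experts cheap per token can be illustrated with a minimal top-k gate. This sketch (the function name `top_k_route` is ours) picks the k highest-scoring experts and renormalizes their gate weights; real MoE layers add load-balancing losses and run the selected experts' forward passes.

```python
def top_k_route(gate_scores, k=2):
    """Select the k experts with the highest gate score for one token
    and renormalize their weights so they sum to 1. Only these k
    experts run, so per-token compute stays bounded regardless of
    how many experts exist in total."""
    idx = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]
    total = sum(gate_scores[i] for i in idx)
    return [(i, gate_scores[i] / total) for i in idx]
```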
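The DPO objective for a single preference pair is short enough to write out directly. This sketch assumes you already have total log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; the helper name `dpo_loss` and the default `beta` are illustrative.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.
    Inputs are log-probabilities of the chosen/rejected responses under
    the trained policy (pi_*) and the frozen reference model (ref_*).
    Loss = -log(sigmoid(beta * (log-ratio margin))); minimizing it pushes
    the policy to prefer the chosen response relative to the reference,
    with no separately trained reward model."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```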
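The RAG pattern (retrieve, then stuff retrieved text into the prompt) can be sketched end to end with a toy bag-of-words "embedding" in place of a learned model. All helper names here (`embed`, `cosine`, `retrieve`, `rag_prompt`) are illustrative assumptions, not a real library's API.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system uses a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(query, docs, k=2):
    # Provide the retrieved documents as context ahead of the question.
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The same `embed`/`cosine` pair also illustrates the embedding row above: semantic similarity is reduced to a geometric operation on vectors.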
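Speculative decoding can be sketched with a simplified greedy-acceptance variant: the draft proposes k tokens, the target checks them, and the longest agreeing prefix is kept (plus one corrected or bonus token from the target). Production systems use probabilistic acceptance and run all k verifications in one parallel forward pass; `draft_next`/`target_next` are assumed callables mapping a context to the next token.

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding.
    draft_next/target_next: callables taking a token list, returning
    the next token under the small draft and large target model."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) The target verifies them (in practice, in a single parallel pass).
    accepted, ctx = [], list(prefix)
    for t in proposed:
        want = target_next(ctx)
        if want == t:
            accepted.append(t)
            ctx.append(t)
        else:
            # First mismatch: keep the target's token and stop.
            accepted.append(want)
            break
    else:
        # All k accepted: the target contributes one bonus token for free.
        accepted.append(target_next(ctx))
    return accepted
```

Every accepted token is one the target model would have produced itself, which is why the technique accelerates inference without changing the output distribution (in the greedy case shown here, the output sequence is identical).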
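The core move in byte-pair encoding, which the tokenizer row mentions, is repeatedly merging the most frequent adjacent token pair into a new vocabulary entry. This is one training step of that loop, with an illustrative helper name (`bpe_merge_step`); real tokenizers train over a large corpus and also handle bytes, pre-tokenization, and special tokens.

```python
from collections import Counter

def bpe_merge_step(tokens):
    """One BPE training step: find the most frequent adjacent pair in
    the token sequence and merge every occurrence into a single token.
    Returns (new_tokens, merged_token_or_None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)   # the new vocabulary entry
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, a + b
```

Repeating this step grows the vocabulary with common character sequences, which is the "vocabulary compression" the table refers to: frequent substrings become single tokens.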