- Top-k sampling: Keeps only the k highest probability tokens, renormalizes the distribution, then samples. Controls diversity by limiting the vocabulary size to the most likely tokens
- Top-p sampling: Filters tokens using cumulative probability threshold (nucleus sampling). Dynamically adjusts vocabulary size based on probability mass, maintaining diversity while avoiding low-probability tokens
- Top-k + Top-p sampling: Combines both filtering methods for fine-grained control over generation quality and diversity
- Shapes:
  - batch_size: variable
  - vocab_size: constant
- probs: probability distributions after softmax [batch_size, vocab_size]
- Sampling-specific parameters:
  - top_k: for top-k sampling [batch_size]
  - top_p: for top-p/nucleus sampling [batch_size]
- samples: sampled token indices [batch_size]
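The combined filtering described above can be sketched as follows. This is a minimal NumPy reference implementation, not the actual kernel: the function name `sample_top_k_top_p` is hypothetical, and it assumes per-row `top_k` and `top_p` arrays matching the shapes listed above.

```python
import numpy as np

def sample_top_k_top_p(probs, top_k, top_p, rng=None):
    """Hypothetical sketch of combined top-k + top-p (nucleus) sampling.

    probs:  [batch_size, vocab_size] softmax probabilities
    top_k:  [batch_size] per-row k
    top_p:  [batch_size] per-row cumulative-probability threshold
    returns samples: [batch_size] sampled token indices
    """
    rng = rng or np.random.default_rng()
    batch_size, vocab_size = probs.shape
    samples = np.empty(batch_size, dtype=np.int64)
    for b in range(batch_size):
        # Sort tokens by descending probability.
        order = np.argsort(probs[b])[::-1]
        sorted_p = probs[b][order]
        # Top-k filter: keep only the k most likely tokens.
        keep = np.arange(vocab_size) < top_k[b]
        # Top-p filter: keep the smallest prefix whose cumulative mass
        # reaches top_p; always keep at least the most probable token.
        cum = np.cumsum(sorted_p)
        nucleus = np.concatenate(([True], cum[:-1] < top_p[b]))
        keep &= nucleus
        # Zero out filtered tokens, renormalize, and sample.
        filtered = np.where(keep, sorted_p, 0.0)
        filtered /= filtered.sum()
        samples[b] = order[rng.choice(vocab_size, p=filtered)]
    return samples
```

A production kernel would vectorize the per-row loop and operate on logits in place, but the filtering logic (sort, truncate by k, truncate by cumulative mass, renormalize, sample) is the same.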

