Speculative Decoding

1 article

We walk through the path of a request to an LLM: the prefill and decode phases, the KV cache, speculative decoding, and optimizations, …