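Unweight: how we compressed an LLM 22% without sacrificing quality
Cloudflare / Apr 17, 2026

BF16 exponents are Huffman-coded
Weights are decompressed in shared memory (SMEM) and passed directly to the tensor cores
About 30% reduction in the MLP, 15–22% for the model

Tags: bf16, huffman, h100, shared-memory, tensor-cores, autotuning, mlp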
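
The core idea named in the summary, Huffman-coding only the exponent field of BF16 weights, can be illustrated with a small CPU-side sketch. This is not Cloudflare's implementation: the summary describes decompression in GPU shared memory feeding the tensor cores, while the code below is plain Python meant only to show why the exponent byte compresses well and that the round trip is lossless. All function names are hypothetical, the weights are synthetic Gaussian values, float32 is truncated to BF16 rather than rounded, and the size estimate ignores the cost of storing the codebook.

```python
import heapq
import random
import struct
from collections import Counter


def to_bf16_bits(x: float) -> int:
    """Truncate a float32 to its top 16 bits (BF16: 1 sign, 8 exponent, 7 mantissa).

    Assumption: truncation is used for simplicity instead of round-to-nearest.
    """
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def split_bf16(bits: int) -> tuple[int, int]:
    """Return (exponent byte, sign+mantissa byte) for one BF16 value."""
    exponent = (bits >> 7) & 0xFF
    sign_mantissa = ((bits >> 15) << 7) | (bits & 0x7F)
    return exponent, sign_mantissa


def join_bf16(exponent: int, sign_mantissa: int) -> int:
    """Inverse of split_bf16: rebuild the 16-bit pattern."""
    sign = sign_mantissa >> 7
    return (sign << 15) | (exponent << 7) | (sign_mantissa & 0x7F)


def huffman_code(symbols) -> dict[int, str]:
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (count, unique tiebreaker, partial code table).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]


def compress(weights):
    """Huffman-code the exponents; keep sign+mantissa bytes uncompressed."""
    pairs = [split_bf16(to_bf16_bits(w)) for w in weights]
    exponents = [e for e, _ in pairs]
    raw = bytes(sm for _, sm in pairs)
    code = huffman_code(exponents)
    bitstream = "".join(code[e] for e in exponents)
    return code, bitstream, raw


def decompress(code, bitstream, raw):
    """Decode the exponent stream and reassemble the BF16 bit patterns."""
    inverse = {b: s for s, b in code.items()}
    exponents, current = [], ""
    for bit in bitstream:
        current += bit
        if current in inverse:
            exponents.append(inverse[current])
            current = ""
    return [join_bf16(e, sm) for e, sm in zip(exponents, raw)]


def demo(n: int = 4096) -> float:
    """Return compressed size as a fraction of the original (codebook excluded)."""
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.02) for _ in range(n)]  # synthetic weights
    code, bitstream, raw = compress(weights)
    assert decompress(code, bitstream, raw) == [to_bf16_bits(w) for w in weights]
    return (len(bitstream) + 8 * len(raw)) / (16 * n)


if __name__ == "__main__":
    print(f"compressed size: {demo():.1%} of original")
```

The sign and mantissa bits look close to random and are left as-is, whereas trained weights tend to cluster in a narrow range of magnitudes, so a handful of exponent values dominate and receive short codes. The ratio printed here comes from synthetic data and should not be read as a reproduction of the figures quoted in the summary.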