MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

reactive:llm-inference-efficiency
