Our paper "MoE²: Optimizing Collaborative Inference for Edge Large Language Models" has been accepted by IEEE/ACM Transactions on Networking
We are glad to share that our paper "MoE²: Optimizing Collaborative Inference for Edge Large Language Models" has been accepted for publication in IEEE/ACM Transactions on Networking (ToN).
The paper optimizes collaborative inference across a mixture of edge large language models, improving response quality while reducing latency and serving cost when LLM requests are handled at the network edge.