Towards Efficient LLMs: Analyzing Computational Bottlenecks and Optimization Strategies
Published in 2025 3rd International Conference on Artificial Intelligence, Database and Machine Learning (AIDML 2025), 2025
Undergraduate thesis produced during the Introduction to Computer Graphics online research project at the USC Viterbi School of Engineering, advised by Prof. Saty (DreamWorks Animation). The paper surveys the principal computational bottlenecks in large language models (attention memory footprint, key–value cache growth, and matrix-multiplication throughput) and compares contemporary optimization strategies such as quantization, structured pruning, mixture-of-experts routing, and FlashAttention-style kernel fusion.
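To give a sense of scale for one of the bottlenecks the paper surveys, the sketch below applies the standard back-of-envelope formula for KV-cache memory during autoregressive decoding (two cached tensors per layer, one for keys and one for values). The hyperparameters are illustrative assumptions roughly matching a 7B-parameter model, not figures taken from the paper.

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, batch: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Total KV-cache size in bytes.

    One K and one V tensor per layer, each of shape
    (batch, heads, seq_len, head_dim); bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * layers * batch * heads * seq_len * head_dim * bytes_per_elem

# Illustrative hyperparameters (roughly 7B-model-sized; an assumption, not from the paper).
size_gib = kv_cache_bytes(layers=32, heads=32, head_dim=128, seq_len=4096) / 1024**3
print(f"KV cache at 4k context: {size_gib:.2f} GiB")  # -> 2.00 GiB in fp16
```

At a 4,096-token context this already consumes about 2 GiB per sequence in fp16, which is why cache-oriented techniques such as quantizing the cache or reducing the number of cached heads figure prominently among the optimizations compared.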
Recommended citation: Qian, T. (2025). "Towards Efficient LLMs: Analyzing Computational Bottlenecks and Optimization Strategies." 2025 3rd International Conference on Artificial Intelligence, Database and Machine Learning (AIDML 2025).