This technical report presents K-EXAONE, a large-scale multilingual language model developed by LG AI Research. K-EXAONE is built on a Mixture-of-Experts architecture with 236B total parameters, activating 23B parameters during inference. It supports a 256K-token context window and covers six languages: Korean, English, Spanish, German, Japanese, and Vietnamese. We evaluate K-EXAONE on a comprehensive benchmark suite spanning reasoning, agentic, general, Korean, and multilingual abilities. Across these evaluations, K-EXAONE demonstrates performance comparable to open-weight models of similar size. K-EXAONE, designed to advance AI for a better life, is positioned as a powerful proprietary AI foundation model for a wide range of industrial and research applications.
Figure 1: The main evaluation results of K-EXAONE across eight categories: world knowledge (MMLU-PRO), math (AIME 2025), coding (LIVECODEBENCH V6), agentic tool use (τ²-BENCH), instruction following (IFBENCH), Korean (KOBALT), multilinguality (MMMLU), and safety (KGC-SAFETY). All models included in the assessment are reasoning models. τ²-BENCH scores are weighted averages.
The global development of large language models (LLMs) is marked by intense competition, with leading countries striving to deploy models of superior performance. In this race, closed-source models currently hold a competitive advantage, while open-weight models are rapidly catching up through aggressive scaling strategies. A major factor behind the momentum of open-weight models is the effectiveness of scaling model size, which has now surpassed hundreds of billions of parameters and is approaching the trillion-parameter scale. This scaling effort has been crucial in narrowing the performance gap between closed-source and open-weight models.
However, the situation in South Korea presents unique challenges. Compared to global leaders, Korea faces relative shortages of AI-specialized data centers and AI chips, which have limited the development of large-scale models. As a result, previous efforts have focused on cost-effective smaller-scale models (on the order of tens of billions of parameters). Despite these challenges, building a robust and reliable foundation for AI transformation fundamentally requires a model that demonstrates top-tier performance on a global scale. To address this infrastructure gap, the Korean government has initiated a strategic program aimed at providing essential resources, such as GPUs, for the development of large-scale AI models.
LG AI Research has actively participated in this initiative, leveraging government support to develop the K-EXAONE foundation model, which is detailed in this technical report. K-EXAONE builds on the hybrid architecture of EXAONE 4.0 [2], combining reasoning and non-reasoning capabilities to enhance both general-purpose and specialized use cases. It also uses a hybrid attention mechanism that integrates global and local attention modules, enabling efficient processing of long-context inputs, a critical feature for real-world applications.
A key architectural innovation that sets K-EXAONE apart is the adoption of the Mixture-of-Experts (MoE) paradigm, a design increasingly used in state-of-the-art models, which allows for scalable and efficient computation. Additionally, while EXAONE 4.0 supports Korean, English, and Spanish, K-EXAONE extends multilingual coverage by enhancing the tokenizer to include German, Japanese, and Vietnamese, thereby broadening its applicability across diverse linguistic contexts.
K-EXAONE is architecturally distinct from the EXAONE series previously released by LG AI Research. While EXAONE adopts a dense modeling paradigm, K-EXAONE is designed with an MoE architecture, which enables resource-efficient scaling of model capacity and has been increasingly adopted for training models at the 100B-parameter scale and beyond.
As illustrated in Figure 2, K-EXAONE employs a fine-grained sparse MoE design inspired by prior work [7], consisting of 128 experts, where the top-8 experts are activated per token together with an additional shared expert, resulting in nine concurrently active experts per routing decision. Although the total number of parameters amounts to 236B, only approximately 23B parameters are activated, enabling high representational diversity and strong performance while maintaining resource-efficient training and inference. To improve routing stability and expert utilization efficiency, sequence-level load balancing is employed in the MoE routing mechanism, and a dropless routing policy [10] is adopted to ensure that all tokens are dispatched to experts without capacity-based dropping, thereby stabilizing gradient flow and improving convergence behavior in large-scale MoE training.
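To make the routing scheme concrete, the following PyTorch sketch illustrates a fine-grained sparse MoE layer of this kind: 128 small routed experts, top-8 selection per token, an always-active shared expert, dropless dispatch, and a Switch-style sequence-level balance term. All layer sizes, module names, and the exact form of the auxiliary loss are illustrative assumptions, not the actual K-EXAONE implementation.

```python
# Illustrative sketch of a fine-grained sparse MoE layer (not the actual
# K-EXAONE code): 128 routed experts, top-8 per token, plus a shared expert.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=4096, d_ff=1024, n_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Fine-grained design: many small expert FFNs instead of a few large ones.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # Shared expert processes every token regardless of the routing decision.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                              # x: [tokens, d_model]
        probs = self.router(x).softmax(dim=-1)         # [tokens, n_experts]
        weights, idx = probs.topk(self.top_k, dim=-1)  # top-8 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = self.shared_expert(x)                    # shared expert is always active
        # Dropless routing: every selected (token, expert) pair is computed;
        # no tokens are dropped due to per-expert capacity limits.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out = out.index_add(0, token_ids,
                                weights[token_ids, slot, None] * expert(x[token_ids]))
        return out, probs, idx                         # router stats feed the balance loss

def seq_balance_loss(probs, idx, n_experts=128):
    # Sequence-level auxiliary balance loss sketch (Switch-style), computed over
    # the tokens of a single sequence so expert usage is balanced within each
    # sequence rather than only across the batch.
    frac = F.one_hot(idx.reshape(-1), n_experts).float().mean(dim=0)  # routed share per expert
    mean_prob = probs.mean(dim=0)                                     # mean router prob per expert
    return n_experts * (frac * mean_prob).sum()
```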
In addition, K-EXAONE integrates a dense-layer-based Multi-Token Prediction (MTP) module [7,11] to enable resource-efficient auxiliary training, minimizing routing overhead and memory consumption while enhancing future-token predictive capability. During inference, K-EXAONE leverages the MTP block for self-drafting, achieving an approximately 1.5× improvement in decoding throughput compared to standard autoregressive decoding. K-EXAONE supports a maximum context length of 256K tokens and incorporates the hybrid attention architecture originally introduced in EXAONE 4.0, which significantly reduces memory consumption and computational overhead compared to full global attention (GA) across all layers, enabling cost-efficient long-context modeling.
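As an illustration of how an MTP block can be used for self-drafting at inference time, the sketch below shows a greedy draft-and-verify loop: the MTP head cheaply proposes a few future tokens, the main model verifies them in a single forward pass, and the longest matching prefix is accepted. The interfaces model.mtp_draft and model.forward are hypothetical placeholders; the actual K-EXAONE decoding path and acceptance rule may differ.

```python
# Simplified sketch of MTP-based self-drafting with greedy verification.
# model.mtp_draft(ids, k) and model.forward(ids) are assumed interfaces,
# not the real K-EXAONE API.
import torch

@torch.no_grad()
def self_draft_decode(model, ids, max_new_tokens=128, k=3):
    # ids: 1D tensor of prompt token ids.
    while max_new_tokens > 0:
        draft = model.mtp_draft(ids, k)                  # [k] cheap draft tokens from the MTP head
        candidate = torch.cat([ids, draft])              # verify all drafts in one main-model pass
        logits = model.forward(candidate)                # [len(candidate), vocab]
        # Tokens the main model would emit at each drafted position (plus one bonus token).
        verify = logits[len(ids) - 1:].argmax(dim=-1)    # [k + 1]
        accepted = 0
        while accepted < k and draft[accepted] == verify[accepted]:
            accepted += 1
        # Keep the accepted drafts plus one token from the main model
        # (the first mismatch, or the bonus token if every draft matched).
        new_tokens = torch.cat([draft[:accepted], verify[accepted:accepted + 1]])
        ids = torch.cat([ids, new_tokens])
        max_new_tokens -= len(new_tokens)
    return ids
```

When the drafts are frequently accepted, each main-model forward pass yields several tokens instead of one, which is where the reported throughput gain over standard autoregressive decoding comes from.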
To enhance training stability and long-context extrapolation, K-EXAONE incorporates two architectural features: QK Norm and SWA-only RoPE, i.e., RoPE (Rotary Positional Embedding) applied only within the sliding window attention (SWA) [5] layers, leaving the global attention layers without explicit positional encoding.
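The sketch below illustrates how these two features could fit together in a hybrid attention stack: queries and keys are RMS-normalized per head before the dot product (QK Norm), RoPE is applied only in sliding-window layers, and global layers receive no positional embedding. Head counts, the window size, the local-to-global layer ratio, and the rope_fn helper are assumptions for illustration only, not the K-EXAONE configuration.

```python
# Minimal sketch of QK Norm plus SWA-only RoPE in a hybrid attention stack
# (illustrative assumptions, not the actual K-EXAONE implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, d_model=4096, n_heads=32, use_rope=True, window=4096):
        super().__init__()
        self.h, self.d = n_heads, d_model // n_heads
        self.use_rope, self.window = use_rope, window   # window=None -> global attention layer
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.o = nn.Linear(d_model, d_model, bias=False)
        self.q_norm = nn.RMSNorm(self.d)                # QK Norm: per-head normalization of queries
        self.k_norm = nn.RMSNorm(self.d)                # and keys stabilizes attention logits

    def forward(self, x, rope_fn):                      # x: [batch, seq, d_model]
        B, T, _ = x.shape
        q, k, v = self.qkv(x).view(B, T, 3, self.h, self.d).unbind(dim=2)
        q, k = self.q_norm(q), self.k_norm(k)
        if self.use_rope:                               # RoPE only on SWA layers; global layers
            q, k = rope_fn(q), rope_fn(k)               # carry no explicit positional signal
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        if self.window is not None:                     # restrict causal mask to a sliding window
            mask &= ~torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device),
                                diagonal=-self.window)
        att = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), attn_mask=mask)
        return self.o(att.transpose(1, 2).reshape(B, T, -1))

# Hybrid layout sketch: e.g., three local (SWA + RoPE) layers per global (no-RoPE)
# layer; the exact ratio here is illustrative.
layers = [Attention(use_rope=(i % 4 != 3), window=4096 if i % 4 != 3 else None)
          for i in range(8)]
```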