The increasing complexity of modern applications demands wireless networks capable of real-time adaptability and efficient resource management. The Open Radio Access Network (O-RAN) architecture, with its RAN Intelligent Controller (RIC) modules, has emerged as a pivotal solution for dynamic resource management and network slicing. While artificial intelligence (AI)-driven methods have shown promise, most approaches struggle to maintain performance under unpredictable and highly dynamic conditions. This paper proposes an adaptive Meta-Hierarchical Reinforcement Learning (Meta-HRL) framework, inspired by Model-Agnostic Meta-Learning (MAML), to jointly optimize resource allocation and network slicing in O-RAN. The framework integrates hierarchical control with meta-learning to enable both global and local adaptation: the high-level controller allocates resources across slices, while low-level agents perform intra-slice scheduling. The adaptive meta-update mechanism weights tasks by temporal-difference error variance, improving stability and prioritizing complex network scenarios. Theoretical analysis establishes sublinear convergence and regret guarantees for the two-level learning process. Simulation results demonstrate a 19.8% improvement in network management efficiency compared with baseline RL and meta-RL approaches, along with faster adaptation and higher QoS satisfaction across eMBB, URLLC, and mMTC slices. Additional ablation and scalability studies confirm the method's robustness, achieving up to 40% faster adaptation and consistent fairness, latency, and throughput performance as network scale increases.
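One possible reading of the temporal-difference (TD) error-variance weighting mentioned above is sketched below in Python. The function and variable names are hypothetical and the toy arrays stand in for real per-task statistics, so this is an illustrative sketch rather than the authors' implementation.

```python
import numpy as np

# Illustrative TD-error-variance-weighted meta-update (hypothetical reading of
# the adaptive mechanism described above, not the authors' implementation).

def weighted_meta_gradient(task_grads, td_errors):
    """Scale each task's meta-gradient by its normalized TD-error variance,
    so volatile (harder) network scenarios contribute more to the update."""
    variances = np.array([np.var(errs) for errs in td_errors])
    weights = variances / (variances.sum() + 1e-12)
    return sum(w * g for w, g in zip(weights, task_grads))

# Toy example: two tasks with per-task meta-gradients and TD-error samples.
grads = [np.array([1.0, 0.5, -0.2]), np.array([0.3, -0.4, 0.8])]
td = [np.array([0.1, -0.1, 0.2]), np.array([1.0, -2.0, 1.5])]
theta = np.zeros(3)
theta -= 0.1 * weighted_meta_gradient(grads, td)   # one weighted meta-step
```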
Next-generation wireless networks built on the Open RAN (O-RAN) architecture are designed for flexibility, enabling operators to dynamically adapt to changing user demands and network conditions. This adaptability is primarily driven by RAN Intelligent Controller (RIC) modules, which enhance network functionality through intelligent resource management and real-time data analysis [1], [2]. These capabilities enable operators to maintain high levels of responsiveness and adaptability, ensuring that the network can effectively address diverse and evolving use cases [2], [3].
Effective network configuration management ensures that modern wireless networks remain adaptable and scalable. The dynamic nature of resource allocation in O-RAN systems allows operators to modify the network architecture on the fly, ensuring seamless operation even under fluctuating conditions. This adaptability enables networks to scale from simple configurations to complex architectures as user demands change. However, significant challenges remain in real-time resource management, particularly in unpredictable environments. These challenges are heightened in densely populated areas or hotspots, where demand surges create corner cases that strain the network's responsiveness. Additionally, the real-time orchestration of virtualized distributed units (DUs) and the deployment of xApps introduce further complexities, requiring advanced strategies for dynamic resource management. Existing works have explored ML-based approaches for real-time resource management [4]-[25], but their applicability in dynamic and complex O-RAN environments is limited: these learning-based methods often struggle with unpredictable traffic patterns, non-stationary environments, and scalability limits.
To address these challenges, fast adaptation in O-RAN slicing and scheduling is essential for managing complexity and maintaining high-quality service. Advanced resource management strategies are indispensable for optimizing resource utilization, ensuring flexibility, and achieving rapid convergence in dynamic conditions. Building on this necessity, the literature has recently explored meta-learning frameworks to enhance the efficiency and adaptability of Deep Reinforcement Learning (DRL) approaches, proving particularly advantageous in few-shot learning scenarios where deriving meaningful insights from small data samples is crucial [15], [26]-[28]. Unlike federated learning (FL), which focuses on decentralized training without data exchange [5], [7], [15], [19], meta-learning optimizes the learning process itself to generalize from past experiences and accelerate adaptation to new tasks. Meta-learning is highly effective at leveraging prior knowledge to speed up learning, as it extracts reusable patterns and skills that support rapid adaptation to new tasks. This capability is particularly advantageous in unpredictable dynamic wireless networks, where fluctuating conditions and diverse application demands create a complex operating environment. Building on this concept, this work introduces a meta-learning-based hierarchical approach for network slicing and resource block scheduling inspired by Model-Agnostic Meta-Learning (MAML) [29]. Designed for advanced wireless O-RAN architectures [16], [30], this solution addresses the challenges of dynamic resource allocation and system adaptability. By minimizing performance disruptions and optimizing service delivery to user equipment (UEs), our strategy establishes a scalable framework for efficient network management in highly dynamic and demanding environments.
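Since the approach is inspired by MAML, a minimal first-order MAML-style loop is sketched below for intuition. The quadratic toy losses stand in for the per-task RL objectives optimized in the paper, and all names and hyperparameters are hypothetical; this is a sketch of the general technique, not the proposed framework.

```python
import numpy as np

# Minimal first-order MAML-style sketch. Each "task" is a toy quadratic loss
# ||theta - target||^2 standing in for a per-scenario RL objective.

def task_loss_grad(theta, task_target):
    """Gradient of the toy quadratic loss for one task."""
    return 2.0 * (theta - task_target)

def maml_meta_update(theta, tasks, inner_lr=0.05, meta_lr=0.1, inner_steps=3):
    """One meta-update: adapt per task, then average post-adaptation gradients."""
    meta_grad = np.zeros_like(theta)
    for target in tasks:
        phi = theta.copy()
        for _ in range(inner_steps):               # inner-loop adaptation
            phi -= inner_lr * task_loss_grad(phi, target)
        meta_grad += task_loss_grad(phi, target)   # first-order outer gradient
    return theta - meta_lr * meta_grad / len(tasks)

rng = np.random.default_rng(0)
theta = rng.normal(size=4)                         # meta-initialization
tasks = [rng.normal(size=4) for _ in range(5)]     # e.g., distinct traffic scenarios
for _ in range(100):
    theta = maml_meta_update(theta, tasks)
```

The key property this loop illustrates is that the meta-initialization is trained to sit a few gradient steps away from good task-specific solutions, which is what enables fast adaptation when a new network condition arrives.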
Building on these foundations, this paper extends our recent prior work [16], which focused on optimizing resource block (RB) allocation for users within the eMBB slice of an O-RAN environment. While that study primarily addressed resource management in a single slice type, this work extends the scope to a more comprehensive and realistic network slicing scenario involving multiple slice types, including eMBB, URLLC, and mMTC. This study introduces a novel adaptive meta-hierarchical reinforcement learning (Meta-HRL) framework that addresses resource management across diverse slice types while incorporating efficient scheduling strategies within each slice. Meta-RL is particularly suited for rapidly changing network conditions, enabling the model to generalize across tasks and adapt quickly to dynamic environments. This capability is essential in O-RAN systems with highly variable traffic patterns and user demands. The proposed hierarchical approach complements this meta-RL solution by decomposing the complex problem into two manageable subproblems: resource allocation across slices and scheduling within slices. By structuring decisions hierarchically, the framework efficiently handles the multi-level resource management challenges inherent in O-RAN. By integrating these approaches, the proposed framework ensures rapid adaptation and efficient resource management in dynamic O-RAN environments.
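To make the two-level decomposition concrete, the following Python sketch splits resource blocks across eMBB, URLLC, and mMTC slices and then schedules one slice's blocks among its users. The function names are hypothetical and the simple demand-proportional and queue-based heuristics stand in for the learned high- and low-level policies described above; it is an illustrative sketch, not the paper's algorithm.

```python
import numpy as np

# Two-level control loop: a high-level step allocates RBs across slices,
# and a low-level step schedules each slice's RBs among its users.

SLICES = ["eMBB", "URLLC", "mMTC"]

def high_level_allocate(total_rbs, slice_demand):
    """High-level controller: split resource blocks across slices by demand share."""
    demand = np.maximum(np.array([slice_demand[s] for s in SLICES]), 1e-9)
    share = demand / demand.sum()
    return {s: int(round(total_rbs * w)) for s, w in zip(SLICES, share)}

def low_level_schedule(rbs, user_queues):
    """Low-level agent: assign one slice's RBs to its users, largest backlog first."""
    order = sorted(user_queues, key=user_queues.get, reverse=True)
    schedule = {u: 0 for u in user_queues}
    for i in range(rbs):
        schedule[order[i % len(order)]] += 1
    return schedule

demand = {"eMBB": 120.0, "URLLC": 30.0, "mMTC": 50.0}   # aggregate load per slice
queues = {"ue1": 4, "ue2": 9}                            # URLLC user backlogs
alloc = high_level_allocate(total_rbs=100, slice_demand=demand)
urllc_schedule = low_level_schedule(alloc["URLLC"], queues)
```

In the learned framework, both decision levels would be replaced by policies trained with the Meta-HRL procedure, but the interface between them (a per-slice RB budget passed from the high level to each low-level scheduler) remains the same.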