Date of Defense
29-4-2026 10:00 AM
Location
Microsoft Teams
Document Type
Dissertation Defense
Degree Name
Doctor of Philosophy in Electrical Engineering
College
COE
Department
Electrical and Communication Engineering
First Advisor
Prof. Hussain Shareef
Keywords
Deep reinforcement learning, Energy management systems, Uncertainty handling, Two-stage stochastic optimization, Robust optimization, Risk management, Microgrids
Abstract
Microgrid technology is essential for facilitating the transition to smart energy grids in developed countries and for mitigating energy poverty in developing countries, particularly in areas where grid extensions are not feasible. Recently, the concept of networked microgrids (NMGs) has garnered considerable attention because interactions among interconnected microgrids can lead to power networks that are more resilient, reliable, and stable. However, because each microgrid has diverse distributed generation resources (renewables and controllable generators) and each microgrid operator (MO) has different objectives, coordinated energy management is required to satisfy local and system-wide goals under significant uncertainty. Existing mathematical formulations for energy management among NMGs are often intractable; when tractable, they tend not to scale to large and dynamic environments because of uncertainties in load demand, electricity prices, solar irradiance, and wind speed.
This study focuses on grid-tied NMGs with multiple electricity retailers (ERs) and multiple microgrids (MGs). The problem is first formulated as a bi-level Stackelberg game in which the upper level maximizes ER profits and the network's available transfer capacity (ATC) while minimizing the carbon emissions associated with power purchases from the main grid. The lower level minimizes the operating costs of the MGs. To make the bi-level formulation tractable, the lower-level problem is replaced by its Karush–Kuhn–Tucker (KKT) conditions, which are appended to the upper-level formulation, yielding a single-level mathematical program with equilibrium constraints (MPEC). To solve the resulting NMG energy management problem, a lean multi-agent deep reinforcement learning (L-MADRL)/deep reinforcement learning in the loop (DRL-ITL) framework was developed. The framework combines a multi-agent deep reinforcement learning algorithm with the single-level MPEC obtained from the KKT-based reformulation and adopts a modular architecture in which constraint handling is delegated to this analytically tractable optimization layer. This ensures that every action proposed by the DRL agents is feasible with respect to network and market constraints, allowing the agents to focus on learning optimal policies over stochastic variables such as solar irradiance, wind speed, and load demand. As a result, training is more stable and computationally efficient: because technical constraints such as power flow limits, generator capacities, and market rules are embedded in the MPEC, the agents do not need to navigate complex feasibility regions. Freeing the DRL agents from constraint enforcement in turn improves the reliability and accuracy of the learned policies.
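As an illustration of the KKT-based reformulation described above, the following is a generic sketch and not the dissertation's actual model. All symbols are assumptions introduced here: x denotes the upper-level (ER) decisions, y the lower-level (MG) decisions, F and f the upper- and lower-level objectives, G, g, and h generic constraint functions, and λ and μ the lower-level multipliers. The replacement is valid when the lower-level problem is convex in y and satisfies a constraint qualification.

```latex
% Generic bi-level problem (symbols are illustrative assumptions)
\begin{align*}
\max_{x}\quad & F(x, y) \\
\text{s.t.}\quad & G(x, y) \le 0, \\
& y \in \arg\min_{y'} \{\, f(x, y') : g(x, y') \le 0,\; h(x, y') = 0 \,\}.
\end{align*}

% Single-level MPEC after replacing the lower level by its KKT conditions
\begin{align*}
\max_{x,\, y,\, \lambda,\, \mu}\quad & F(x, y) \\
\text{s.t.}\quad & G(x, y) \le 0, \\
& \nabla_{y} f(x, y) + \nabla_{y} g(x, y)^{\top} \lambda
  + \nabla_{y} h(x, y)^{\top} \mu = 0, \\
& h(x, y) = 0, \\
& 0 \le \lambda \;\perp\; -g(x, y) \ge 0 .
\end{align*}
```

The complementarity condition in the last line is what makes the single-level program an MPEC rather than an ordinary nonlinear program.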
The L-MADRL framework, which used a deep Q-network (DQN) as the DRL agent, was evaluated on three benchmark systems (PJM 5-bus, IEEE 14-bus, and IEEE 30-bus) and compared against three approaches: a deterministic approach, risk-neutral stochastic optimization (RNSO), and risk-averse stochastic optimization (RASO) with conditional value at risk (CVaR). In the 5-bus case, L-MADRL achieved an overall 10.3% cost reduction for the MGs and increased ER profits by up to 3.7% over the best baseline. In the 14-bus case, total MG costs decreased by 2.6% and ER profits rose by 11.4%. For the 30-bus system, the framework delivered savings of about 2.7% in MG costs and increased ER profits by up to 9.1%. Notably, L-MADRL raised the network's ATC from a starting value of 78 MW to a peak of 94 MW, exceeding the best benchmark by 17.5%. In a second study, the DRL-ITL framework was applied to a modified IEEE 14-bus system, with a double deep Q-network (DDQN) as the DRL agent and carbon emissions taken into account. Compared with RASO and a robust optimization (RO) approach, the framework reduced the aggregate MG operating costs to €128,408 (9.01% and 5.28% lower than RASO and RO, respectively) and increased the total ER profits to €34,466 (10.43% and 15.44% higher, respectively). It further lowered total CO2 emissions to 305.47 kg, a reduction of 16.7% and 11.3% relative to the RASO and RO baselines, respectively. Across the test systems studied, the DRL-ITL/L-MADRL framework outperformed the conventional approaches on both economic and technical metrics while maintaining a runtime below 3 seconds, significantly lower than that of the benchmark methods. This demonstrates the computational efficiency and scalability of the proposed DRL approach.
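For readers unfamiliar with the CVaR measure used in the RASO baseline, the following minimal Python sketch shows how CVaR can be computed over a discrete set of cost scenarios and blended with the expected cost. It is an illustration only and not the dissertation's implementation: the function names, the confidence level `alpha`, the risk weight `beta`, and the scenario costs are all hypothetical.

```python
from typing import Sequence


def cvar(costs: Sequence[float], probs: Sequence[float], alpha: float = 0.95) -> float:
    """Conditional value at risk of a discrete cost distribution.

    Uses the Rockafellar-Uryasev form CVaR = VaR + E[(cost - VaR)^+] / (1 - alpha),
    where VaR is the smallest cost whose cumulative probability reaches alpha.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if len(costs) != len(probs) or len(costs) == 0:
        raise ValueError("costs and probs must be non-empty and of equal length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    # Value at risk: walk the scenarios in order of increasing cost.
    cumulative = 0.0
    var = max(costs)
    for cost, prob in sorted(zip(costs, probs)):
        cumulative += prob
        if cumulative >= alpha - 1e-12:
            var = cost
            break

    # Probability-weighted excess of each scenario cost over VaR.
    excess = sum(p * max(c - var, 0.0) for c, p in zip(costs, probs))
    return var + excess / (1.0 - alpha)


def risk_averse_cost(costs: Sequence[float], probs: Sequence[float],
                     alpha: float = 0.95, beta: float = 0.5) -> float:
    """Blend of expected cost and CVaR; beta = 0 recovers the risk-neutral case."""
    expected = sum(p * c for c, p in zip(costs, probs))
    return (1.0 - beta) * expected + beta * cvar(costs, probs, alpha)


if __name__ == "__main__":
    # Hypothetical operating-cost scenarios (arbitrary units), equally likely.
    scenario_costs = [100.0, 120.0, 150.0, 200.0]
    scenario_probs = [0.25, 0.25, 0.25, 0.25]
    print(cvar(scenario_costs, scenario_probs, alpha=0.75))
    print(risk_averse_cost(scenario_costs, scenario_probs, alpha=0.75, beta=0.5))
```

With `beta = 0` the blended objective reduces to the expected cost, which corresponds to a risk-neutral formulation; larger values of `beta` place more weight on the costly tail scenarios.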
Included in
EFFICIENT ENERGY MANAGEMENT IN NETWORKED MICROGRIDS USING MULTI-AGENT DEEP REINFORCEMENT LEARNING IN THE PRESENCE OF UNCERTAINTIES