UNIST Demonstrates Global Competitiveness in Reinforcement Learning with Three Papers Accepted to ICLR 2026
Three papers by Professor Seungyul Han's team have been accepted for publication at the International Conference on Learning Representations (ICLR 2026).
UNIST has demonstrated strong international competitiveness in reinforcement learning, a core technology for physical AI, with three papers by Professor Seungyul Han’s research group from the Graduate School of Artificial Intelligence accepted to the International Conference on Learning Representations (ICLR 2026), held in Rio de Janeiro, Brazil, from April 23 to 27, 2026.
ICLR, alongside NeurIPS and ICML, is widely regarded as one of the world’s leading artificial intelligence conferences. This year, approximately 5,300 papers—about 27% of more than 19,000 submissions—were accepted, making the selection of three papers from a single research group a notable achievement.
The accepted studies address key challenges in reinforcement learning, where AI systems learn optimal actions through interaction with their environment—an approach essential for applications such as robotics and autonomous systems operating in complex, real-world conditions.
The first study proposes Self-Improving Skill Learning (SISL), a method designed to enable robust learning from noisy offline data. By decomposing long-horizon tasks into reusable skills and refining them through prioritized updates, SISL mitigates the impact of imperfect data and supports stable adaptation across complex tasks.
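The prioritized-update idea behind SISL can be illustrated with a toy sketch: skills whose updates look most promising are refined more often, so noisy, low-value updates are de-emphasized. This is an illustrative assumption-laden sketch, not the paper's actual algorithm; the class name, priority scheme, and skill names are invented for illustration.

```python
import random

class SkillBuffer:
    """Toy prioritized skill buffer (illustrative only, not SISL itself):
    skills with higher priority are sampled for refinement more often."""

    def __init__(self, seed=0):
        self.skills = {}              # skill name -> priority (hypothetical)
        self.rng = random.Random(seed)

    def add(self, name, priority):
        self.skills[name] = priority

    def sample(self):
        # Pick a skill to refine, with probability proportional to priority.
        names = list(self.skills)
        weights = [self.skills[n] for n in names]
        return self.rng.choices(names, weights=weights, k=1)[0]

    def update_priority(self, name, new_priority):
        # After a refinement step, re-estimate how much the skill can still improve.
        self.skills[name] = new_priority

buf = SkillBuffer()
buf.add("reach", 1.0)
buf.add("grasp", 5.0)   # assumed to have the most room to improve
picked = buf.sample()    # "grasp" is sampled far more often than "reach"
```

Over many samples, the high-priority skill dominates the refinement budget, which is the intuition behind prioritizing updates when the offline data is noisy.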
The second study introduces Strict Subgoal Execution (SSE), which improves long-horizon planning by distinguishing feasible subgoals from unreachable ones. By leveraging past failures and partial successes, the method enhances planning efficiency and increases overall task reliability in goal-conditioned environments.
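The core intuition of filtering unreachable subgoals from past failures can be sketched as follows. This is a minimal illustration under assumed bookkeeping (a per-subgoal success/failure count and a fixed failure-rate threshold), not the method described in the paper; all names and thresholds here are hypothetical.

```python
class SubgoalFilter:
    """Toy feasibility filter (illustrative only): subgoals that have failed
    too often relative to their successes are treated as unreachable and
    excluded from the plan."""

    def __init__(self, max_fail_rate=0.8):
        self.stats = {}               # subgoal -> [successes, failures]
        self.max_fail_rate = max_fail_rate

    def record(self, subgoal, success):
        s = self.stats.setdefault(subgoal, [0, 0])
        s[0 if success else 1] += 1

    def feasible(self, subgoal):
        succ, fail = self.stats.get(subgoal, [0, 0])
        total = succ + fail
        if total == 0:
            return True               # no evidence yet: assume reachable
        return fail / total < self.max_fail_rate

f = SubgoalFilter()
for _ in range(9):
    f.record("cross_gap", False)      # repeated failures
f.record("cross_gap", True)
f.record("open_door", True)

# The planner keeps only subgoals it still believes are reachable.
plan = [g for g in ["cross_gap", "open_door"] if f.feasible(g)]
```

Here `cross_gap` fails 9 times out of 10, exceeding the threshold, so the plan retains only `open_door`; pruning such subgoals early is what keeps long-horizon planning efficient.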
The third study presents Successive Sub-value Q-learning (S2Q), a framework for multi-agent reinforcement learning (MARL) that retains multiple high-value action candidates. This approach enables agents to adapt more effectively in dynamic environments where optimal strategies shift over time, improving both coordination and overall performance.
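The idea of retaining multiple high-value action candidates, rather than committing to a single argmax, can be shown in miniature. This sketch is an assumption: it is not the S2Q algorithm, only an illustration of why keeping near-optimal alternatives helps when the optimum shifts.

```python
def top_candidates(q_values, k=2):
    """Return the k highest-value actions instead of a single argmax,
    so the policy can switch quickly when the best action changes."""
    ranked = sorted(range(len(q_values)), key=lambda a: q_values[a], reverse=True)
    return ranked[:k]

q = [0.2, 0.9, 0.85, 0.1]
cands = top_candidates(q, k=2)        # both action 1 and action 2 are retained

# If the environment shifts and action 2 becomes best, it is already among
# the candidates, so the agent adapts without rediscovering it from scratch.
q_shifted = [0.2, 0.5, 0.95, 0.1]
shifted_cands = top_candidates(q_shifted, k=2)
```

A pure argmax policy would have committed to action 1 and needed fresh exploration after the shift; keeping the runner-up in the candidate set is the coordination advantage the article describes.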
The research was led by Sanghyeon Lee, Jaebak Hwang, and Yonghyeon Jo as first authors, respectively, and supported by programs funded by the Ministry of Science and ICT (MSIT) and the National Research Foundation of Korea (NRF).
Professor Han said, “Our research demonstrates that reinforcement learning can be applied more reliably in environments with limited data and uncertainty, with strong potential for applications in autonomous driving, robotics, and smart manufacturing.”
Journal Reference
[1] Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han, "Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning," ICLR 2026.
[2] Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han, "Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning," ICLR 2026.
[3] Yonghyeon Jo, Sunwoo Lee, Seungyul Han, "Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning," ICLR 2026.