Research Overview and Goal
My research focuses on reinforcement learning and
robotics.
My research goal is to develop efficient and robust algorithms for agents, reducing their reliance
on large amounts of data and making them portable. I aim for these algorithms to be deployed in
real-world robots and other decision-making systems, facilitating the acquisition of general,
reliable, and complex operational skills by intelligent systems.
To achieve this goal, I have decided to focus my research in two main areas. Firstly, I aim to
develop efficient reinforcement learning algorithms. Secondly, I plan to deepen my understanding of
robotic systems and apply these algorithms to the control and decision-making processes of robots.
|
|
Efficient Recurrent Off-Policy RL with Varied-Learning-Rate State Space Models
Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu
Conference on Neural Information Processing Systems (NeurIPS), 2024
RESeL
Paper / Code
Abstract
Real-world decision-making tasks are usually partially observable Markov
decision processes (POMDPs), where the state is not fully observable. Recent progress has
demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder
based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer
perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a
robust baseline for POMDP tasks. However, previous recurrent RL methods face training stability
issues due to the gradient instability of RNNs. In this paper, we propose Recurrent Off-policy
RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue. Specifically, RESeL
uses a lower learning rate for context encoder than other MLP layers to ensure the stability of
the former while maintaining the training efficiency of the latter. We integrate this technique
into existing off-policy RL methods, resulting in the RESeL algorithm. We evaluated RESeL in 18
POMDP tasks, including classic, meta-RL, and credit assignment scenarios, as well as five MDP
locomotion tasks. The experiments demonstrate significant improvements in training stability
with RESeL. Comparative results show that RESeL achieves notable performance improvements over
previous recurrent RL baselines in POMDP tasks, and is competitive with or even surpasses
state-of-the-art methods in MDP tasks. Further ablation studies highlight the necessity of
applying a distinct learning rate for the context encoder.
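As a rough illustration of the context-encoder-specific learning rate described in the abstract, the sketch below applies one gradient step with a separate step size per parameter group. It is a toy in plain Python, not the RESeL implementation: the `ParamGroup` class, the `sgd_step` helper, and both learning-rate values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """A named set of parameters that shares one learning rate (hypothetical helper)."""
    name: str
    params: list
    lr: float


def sgd_step(groups, grads):
    """Apply one plain SGD update, using each group's own learning rate."""
    for group in groups:
        group_grads = grads[group.name]
        group.params = [p - group.lr * g for p, g in zip(group.params, group_grads)]


# Assumed values: the encoder gets a much smaller step size than the MLP layers.
encoder = ParamGroup("context_encoder", [1.0, -1.0], lr=1e-5)
mlp = ParamGroup("mlp", [1.0, -1.0], lr=3e-4)

# Give both groups the same gradient so only the learning rate differs.
grads = {"context_encoder": [10.0, 10.0], "mlp": [10.0, 10.0]}
sgd_step([encoder, mlp], grads)

encoder_shift = abs(encoder.params[0] - 1.0)
mlp_shift = abs(mlp.params[0] - 1.0)
print(encoder_shift < mlp_shift)
```

With identical gradients, the encoder parameters move far less than the MLP parameters, which is the stabilizing effect the abstract attributes to the lower rate. In PyTorch, the same idea is usually expressed through optimizer parameter groups with per-group `lr` values.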
|
|
Offline Transition Modeling via Contrastive Energy Learning
Ruifeng Chen∗, Chengxing Jia∗, Zefang Huang, Tian-Shuo Liu, Xu-Hui Liu, Yang Yu
The International Conference on Machine Learning (ICML), 2024
EMPO-AB
Paper / Code
Abstract
Learning a high-quality transition model is of great importance for sequential
decision-making tasks, especially in offline settings. Nevertheless, the complex behaviors of
transition dynamics in real-world environments pose challenges for the standard forward models
because of their inductive bias towards smooth regressors, conflicting with the inherent nature
of transitions such as discontinuity or large curvature. In this work, we propose to model the
transition probability implicitly through a scalar-value energy function, which enables not only
flexible distribution prediction but also capturing complex transition behaviors. The
Energy-based Transition Models (ETM) are shown to accurately fit the discontinuous transition
functions and better generalize to out-of-distribution transition data. Furthermore, we
demonstrate that energy-based transition models improve the evaluation accuracy and
significantly outperform other off-policy evaluation methods on the DOPE benchmark. Finally, we
show that energy-based transition models also benefit reinforcement learning and outperform prior
offline RL algorithms on D4RL Gym-MuJoCo tasks.
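To make the idea of learning a scalar energy contrastively more concrete, here is a minimal sketch of one common contrastive objective (an InfoNCE-style loss) computed over the energies of one observed next state and several sampled negatives. The abstract does not state the exact loss, so this form, the function name `contrastive_energy_loss`, and the example energies are all assumptions.

```python
import math


def contrastive_energy_loss(energies, positive_index=0):
    """InfoNCE-style loss: -log( exp(-E_pos) / sum_j exp(-E_j) ).

    `energies` holds the scalar energy of each candidate next state;
    lower energy means the candidate is considered more likely.
    """
    logits = [-e for e in energies]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_partition = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return -(logits[positive_index] - log_partition)


# The observed next state (index 0) has low energy, the negatives have high energy.
good_model_loss = contrastive_energy_loss([0.0, 5.0, 5.0])
# A poor model assigns the observed next state a higher energy than the negatives.
poor_model_loss = contrastive_energy_loss([5.0, 0.0, 0.0])
print(good_model_loss < poor_model_loss)
```

Minimizing a loss of this kind pushes the energy of observed transitions down relative to the negatives, without forcing the model to regress a single smooth prediction.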
|
|
Denoising-based Contractive Imitation Learning
Macheng Shen, Jishen Peng, Zefang Huang
In Submission
DeCIL
arXiv
Abstract
A fundamental challenge in imitation learning is the covariate shift problem.
Existing methods to mitigate covariate shift often require additional expert interactions,
access to environment dynamics, or complex adversarial training, which may not be practical in
real-world applications. In this paper, we propose a simple yet effective method (DeCIL) to
mitigate covariate shift by incorporating a denoising mechanism that enhances the contraction
properties of the state transition mapping. Our approach involves training two neural networks:
a dynamics model (f) that predicts the next state from the current state, and a joint
state-action denoising policy network (d) that refines this state prediction via denoising and
outputs the corresponding action. We provide theoretical analysis showing that the denoising
network acts as a local contraction mapping, reducing the error propagation of the state
transition and improving stability. Our method is straightforward to implement and can be easily
integrated with existing imitation learning frameworks without requiring additional expert data
or complex modifications to the training procedure. Empirical results demonstrate that our
approach effectively improves the success rate on various imitation learning tasks under noise
perturbation.
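The contraction argument in the abstract can be illustrated with a one-dimensional toy, in which the expert trajectory stays at state 0 and the rollout starts slightly off it. Both maps below are hypothetical stand-ins rather than the paper's networks: `dynamics` plays the role of f and is mildly expansive, while `denoise` plays the role of d and pulls the predicted state halfway back toward the expert state.

```python
def dynamics(state):
    """Hypothetical dynamics model f: slightly expansive, so deviations grow."""
    return 1.2 * state


def denoise(state):
    """Hypothetical denoiser d: pulls the state halfway back to the expert state 0."""
    return 0.5 * state


def deviation_after(steps, initial_deviation, use_denoiser):
    """Roll the state forward and return its final distance from the expert state."""
    state = initial_deviation
    for _ in range(steps):
        state = dynamics(state)
        if use_denoiser:
            state = denoise(state)
    return abs(state)


without_denoiser = deviation_after(10, 0.1, use_denoiser=False)
with_denoiser = deviation_after(10, 0.1, use_denoiser=True)
print(with_denoiser < 0.1 < without_denoiser)
```

In this toy the composed map d(f(x)) scales deviations by 0.6 per step, so the error shrinks geometrically, whereas f alone scales them by 1.2 and the error compounds.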
|
|
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact
Gang Yang, Siyuan Luo, Yunhai Feng, Zhixin Sun, Chenrui Tie, Lin Shao
IEEE International Conference on Robotics and Automation (ICRA), 2024
arXiv / Website / Video
Abstract
We present Jade, a differentiable physics engine for articulated rigid bodies.
Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing
differentiable simulations, Jade offers features including intersection-free collision
simulation and stable LCP solutions for multiple frictional contacts. We use continuous
collision detection to detect the time of impact and adopt the backtracking strategy to prevent
intersection between bodies with complex geometry shapes. We derive the gradient calculation to
ensure the whole simulation process is differentiable under the backtracking mechanism. We
modify the popular Dantzig algorithm to get valid solutions under multiple frictional contacts.
We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics
simulation over a variety of contact-rich tasks.
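For readers unfamiliar with the LCP formulation mentioned in the abstract, the sketch below solves a tiny instance: find z ≥ 0 such that w = Mz + q ≥ 0 and z·w = 0. It uses projected Gauss-Seidel, a simple iterative method, rather than the modified Dantzig algorithm used in Jade, and the matrix `M` and vector `q` are made-up toy values (symmetric positive definite `M`), not contact data from the paper.

```python
def solve_lcp_pgs(M, q, iterations=100):
    """Solve the LCP (M, q) with projected Gauss-Seidel.

    Each sweep solves row i for z[i] with the other entries held fixed,
    then clamps the result to be non-negative.
    """
    n = len(q)
    z = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            residual = q[i] + sum(M[i][j] * z[j] for j in range(n) if j != i)
            z[i] = max(0.0, -residual / M[i][i])
    return z


def lcp_residual(M, q, z):
    """Return w = M z + q for a candidate solution z."""
    n = len(q)
    return [q[i] + sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]


M = [[2.0, 1.0], [1.0, 2.0]]
q = [-5.0, 6.0]
z = solve_lcp_pgs(M, q)
w = lcp_residual(M, q, z)
print(z, w)
```

In this instance the first constraint is active (z[0] > 0, w[0] = 0) and the second is inactive (z[1] = 0, w[1] > 0), which is the complementarity pattern that distinguishes touching from separated contacts.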
I contributed to part of the experimental work for this paper, mainly training the reinforcement
learning algorithm to solve the LCP, but I am not listed as an author.
|
Projects
(Selected; view more on GitHub)
|
 |
Language: C/C++.
This project faithfully recreates the well-known game BOMBIT; it is playable as a Qt application.
|
 |
Language: C/C++.
This project is a simplified recreation of the popular game Overcooked, with two modes of play
that I designed: manual and automatic. It runs directly in the Windows terminal.
|
Zhejiang University, Hangzhou, China
Ph.D. student in Control Science and Engineering • From Sep. 2025 (expected)
|
 |
Nanjing University, Nanjing, China
B.E. in Computer Science and Technology • Sep. 2021 to Jun. 2025 (expected)
|
 |
Before university, I spent ten years training for mathematical Olympiads. During this period I
earned several honors, including a bronze medal in the China Mathematical Olympiad (CMO 2020), and
I consistently placed in the top three in provincial mathematics competitions. This experience
shaped many of my habits, especially thinking independently when I face a difficult problem and
then actively discussing it with teachers and classmates. I am very grateful for this
unforgettable and meaningful experience!
I am passionate about sports, especially volleyball and swimming. I served as a referee for the
men's volleyball Department Cup at Nanjing University in the autumn semester of the 2022-2023
academic year.
|
 |
Language: MicroPython.
Under the guidance of teachers from the Medical School of Nanjing University, I independently
soldered and wired the hardware and wrote the code for a blood oxygen meter. It measures heart
rate (HR), blood oxygen saturation (SpO2), and body temperature (Temp) with reasonable accuracy.
Although the device is rudimentary, I hope it can serve its purpose.
|
|