Zefang Huang 「黄泽方」

I am currently an undergraduate student at Nanjing University (NJU), majoring in Computer Science and Technology (National Elite Program).

I have been a research intern at LAMDA since 2022, advised by Prof. Yang Yu. I was also a research intern at the National University of Singapore (NUS), advised by Prof. Lin Shao.


Email  /  CV  /  GitHub  /  Google Scholar

Research Overview and Goal

My research focuses on reinforcement learning and robotics.

My research goal is to develop efficient and robust algorithms for agents that rely less on large amounts of data and are portable. I aim to deploy these algorithms on real-world robots and other decision-making systems, helping intelligent systems acquire general, reliable, and complex operational skills.

To achieve this goal, I focus my research on two main areas: first, developing efficient reinforcement learning algorithms; second, deepening my understanding of robotic systems and applying these algorithms to robot control and decision-making.

Publications
Efficient Recurrent Off-Policy RL with Varied-Learning-Rate State Space Models
Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu
In submission to the Conference on Neural Information Processing Systems (NeurIPS), 2024

Paper (Coming Soon)  /  Abstract

Real-world decision-making tasks are usually partially observable Markov decision processes (POMDPs), where the state is not fully observable. Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks. However, previous recurrent RL methods face training stability issues due to the gradient instability of RNNs. In this paper, we propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue. Specifically, RESeL uses a lower learning rate for the context encoder than for the other MLP layers, ensuring the stability of the former while maintaining the training efficiency of the latter. We integrate this technique into existing off-policy RL methods, resulting in the RESeL algorithm. We evaluated RESeL on 18 POMDP tasks, including classic, meta-RL, and credit assignment scenarios, as well as five MDP locomotion tasks. The experiments demonstrate significant improvements in training stability with RESeL. Comparative results show that RESeL achieves notable performance improvements over previous recurrent RL baselines in POMDP tasks, and is competitive with or even surpasses state-of-the-art methods in MDP tasks. Further ablation studies highlight the necessity of applying a distinct learning rate for the context encoder.
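The central idea described in this abstract is to give the context encoder its own, smaller learning rate than the MLP layers. A minimal sketch in plain Python, assuming vanilla SGD and made-up learning rates (`1e-5` for the encoder and `3e-4` for the MLP head are hypothetical; the paper's actual optimizer and values are not stated here). `ParamGroup` and `sgd_step` are illustrative names only:

```python
from dataclasses import dataclass, field


@dataclass
class ParamGroup:
    """A named group of parameters that share one learning rate."""
    name: str
    lr: float
    params: list = field(default_factory=list)


def sgd_step(groups, grads):
    """One plain SGD update in which each group uses its own learning rate."""
    for group in groups:
        group.params = [p - group.lr * g for p, g in zip(group.params, grads[group.name])]


# Hypothetical values: the context encoder (e.g. an RNN) gets a much smaller
# learning rate than the MLP layers that sit on top of it.
encoder = ParamGroup("context_encoder", lr=1e-5, params=[0.5, -0.2])
mlp = ParamGroup("mlp_head", lr=3e-4, params=[0.1, 0.3])

# Identical gradients, so any difference in step size comes from the learning rates.
grads = {"context_encoder": [1.0, 1.0], "mlp_head": [1.0, 1.0]}
sgd_step([encoder, mlp], grads)
print(encoder.params, mlp.params)
```

In PyTorch the same effect is typically obtained with optimizer parameter groups, e.g. passing `[{"params": encoder.parameters(), "lr": 1e-5}, {"params": mlp.parameters()}]` together with a default `lr` to `torch.optim.Adam`.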
Offline Transition Modeling via Contrastive Energy Learning
Ruifeng Cheng, Chengxing Jia, Zefang Huang, Tian-Shuo Liu, Xu-Hui Liu, Yang Yu
The International Conference on Machine Learning (ICML), 2024

Paper  /  Abstract

Learning a high-quality transition model is of great importance for sequential decision-making tasks, especially in offline settings. Nevertheless, the complex behaviors of transition dynamics in real-world environments pose challenges for standard forward models because of their inductive bias towards smooth regressors, which conflicts with the inherent nature of transitions such as discontinuity or large curvature. In this work, we propose to model the transition probability implicitly through a scalar-valued energy function, which enables not only flexible distribution prediction but also the capture of complex transition behaviors. Energy-based Transition Models (ETM) are shown to accurately fit discontinuous transition functions and to generalize better to out-of-distribution transition data. Furthermore, we demonstrate that energy-based transition models improve evaluation accuracy and significantly outperform other off-policy evaluation methods on the DOPE benchmark. Finally, we show that energy-based transition models also benefit reinforcement learning and outperform prior offline RL algorithms on D4RL Gym-MuJoCo tasks.
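To make "modeling the transition probability implicitly through an energy function" concrete, here is a toy sketch. It assumes a 1-D state, a hand-written quadratic energy (in ETM the energy would be a learned network), and a generic InfoNCE-style contrastive loss over sampled negative next states; the paper's exact objective and negative-sampling scheme are not described here, and all function names are hypothetical:

```python
import math


def energy(s, a, s_next, w=1.0):
    """Hypothetical scalar energy: low when s_next is close to s + w * a."""
    return (s_next - (s + w * a)) ** 2


def contrastive_loss(s, a, s_pos, s_negs):
    """InfoNCE-style loss: the true next state should have the lowest energy.

    p(s' | s, a) is approximated by a softmax of -energy over the positive
    and negative candidates; the loss is the negative log-probability of
    the positive (true) next state.
    """
    candidates = [s_pos] + list(s_negs)
    logits = [-energy(s, a, c) for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -(logits[0] - log_z)


def predict(s, a, candidates):
    """Pick the candidate next state with the lowest energy."""
    return min(candidates, key=lambda c: energy(s, a, c))


loss = contrastive_loss(s=0.0, a=1.0, s_pos=1.0, s_negs=[0.0, 2.5, -1.0])
best = predict(0.0, 1.0, [0.0, 1.0, 2.5])
print(round(loss, 4), best)
```

Because the model only scores (state, action, next state) triples rather than regressing a single output, nothing forces the predicted transition to be a smooth function of the input, which is the property the abstract appeals to.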
Other Work I Contributed To
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact
Gang Yang, Siyuan Luo, Yunhai Feng, Zhixin Sun, Chenrui Tie, Lin Shao
IEEE International Conference on Robotics and Automation (ICRA), 2024

ArXiv  /  Website  /  Video  /  Abstract

We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as a Linear Complementarity Problem (LCP). Compared to existing differentiable simulators, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt a backtracking strategy to prevent intersections between bodies with complex geometric shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to obtain valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation on a variety of contact-rich tasks.
I contributed to part of the experimental work for this paper, mainly training the reinforcement learning algorithm to solve the LCP, but I am not listed as an author.
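For reference, the standard Linear Complementarity Problem mentioned above asks, given a matrix $M \in \mathbb{R}^{n \times n}$ and a vector $\mathbf{q} \in \mathbb{R}^n$, for a vector $\mathbf{z}$ satisfying the conditions below. In a contact setting, $\mathbf{z}$ typically collects contact impulses and $\mathbf{w}$ the corresponding separating velocities, so complementarity says an impulse can only act where bodies are not separating; how Jade assembles $M$ and $\mathbf{q}$ for frictional contacts is not described here.

```latex
\begin{aligned}
&\text{find } \mathbf{z} \in \mathbb{R}^n \text{ such that} \\
&\mathbf{w} = M\mathbf{z} + \mathbf{q}, \qquad
\mathbf{z} \ge \mathbf{0}, \qquad
\mathbf{w} \ge \mathbf{0}, \qquad
\mathbf{z}^\top \mathbf{w} = 0 .
\end{aligned}
```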
Projects (selected; see more on GitHub)
Game - Bubble and Bubble

Language: C/C++.

It can be played in the Qt editor.

Game - Overcooked

Language: C/C++.

It can be run directly in the Windows terminal.

It can be played by hand, or agents can play it automatically and score points on their own.

Education
Nanjing University, Nanjing, China
B.E. in Computer Science and Technology • Sep. 2021 to Jun. 2025 (expected)
Research Experience (excluding ongoing research)
LAMDA Lab (led by Zhi-Hua Zhou), Nanjing University
Research Intern • Sep. 2022 to Feb. 2024
Advisor: Prof. Yang Yu
Research Topics: Reinforcement Learning
National University of Singapore
Research Intern • Mar. 2023 to Oct. 2023
Advisor: Prof. Lin Shao
Research Topics: Robotics
Invited to Visit
Miscellaneous
Before university, I spent ten years training for mathematical Olympiads. During this period, I received many honors, including a bronze medal in the China Mathematical Olympiad (CMO 2020), and I consistently placed in the top three in all provincial mathematics competitions. This experience greatly shaped my growth and cultivated many habits, especially thinking independently when facing difficult problems and then actively discussing them with teachers and classmates. I am very grateful for this unforgettable and meaningful learning experience!

I am very passionate about sports, especially volleyball and swimming. I served as a referee for the men's volleyball department cup at Nanjing University in the autumn semester of the 2022-2023 academic year.
Homemade Oximeter

Language: MicroPython.

Under the guidance of teachers from the Medical School of Nanjing University, I independently soldered and wired the hardware and wrote the code for the oximeter. It can measure heart rate (HR), blood oxygen saturation (SpO2), and body temperature (Temp) with reasonable accuracy. Although the device is fairly rudimentary, I hope it serves its purpose.