I am a third-year Ph.D. student in Computer Science at the University of California San Diego (UCSD). I am advised by Prof. Kamalika Chaudhuri.

I am broadly interested in developing algorithmic techniques for learning and reasoning tasks in Artificial Intelligence (AI), especially for sequential decision making. Currently, I work on multi-armed and contextual bandits, and reinforcement learning. I have also worked on combinatorial search, planning, and scheduling, among many other subfields of AI.

Before joining UCSD, I obtained my B.S. in Computer Science from the University of Southern California, where I worked with Prof. T. K. Satish Kumar and Prof. Sven Koenig.

Publications

Chicheng Zhang, Zhi Wang.
Provably Efficient Multi-Task Reinforcement Learning with Model Transfer. To appear in the 35th Conference on Neural Information Processing Systems (NeurIPS-2021). A shorter version of this work appeared in the Workshop on Reinforcement Learning Theory at ICML-2021. [arXiv]

We study multi-task reinforcement learning (RL) in tabular episodic Markov decision processes (MDPs). We formulate a heterogeneous multi-player RL problem, in which a group of players concurrently face similar but not necessarily identical MDPs, with a goal of improving their collective performance through inter-player information sharing. We design and analyze an algorithm based on the idea of model transfer, and provide gap-dependent and gap-independent upper and lower bounds that characterize the intrinsic complexity of the problem.

Zhi Wang*, Chicheng Zhang*, Manish Kumar Singh, Laurel D. Riek, and Kamalika Chaudhuri.
Multitask Bandit Learning through Heterogeneous Feedback Aggregation. In Proceedings of the 24th International Conference on
Artificial Intelligence and Statistics (AISTATS-2021). A preliminary version of this work appeared in the Workshop on Real World Experiment Design and Active Learning at ICML-2020. [arXiv]

In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the ε-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm,

RobustAgg

(ε), that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.

*These authors contributed equally.

Zhi Wang, Liron Cohen, Sven Koenig and T. K. Satish Kumar. The Factored Shortest Path Problem and Its Applications in Robotics. In Proceedings of the 28th International Conference on Automated Planning and Scheduling (ICAPS-2018). [pdf]

Many real-world combinatorial problems exhibit structure in the way in which their variables interact. Such structure can be exploited in the form of "factors" for representational as well as computational benefits. Factored representations are extensively used in probabilistic reasoning, constraint satisfaction, planning, and decision theory. In this paper, we formulate the factored shortest path problem (FSPP) on a collection of constraints interpreted as factors of a highdimensional map.We show that the FSPP is not only a generalization of the regular shortest path problem but also particularly relevant to robotics.We develop factored-space heuristics for A* and prove that they are admissible and consistent. We provide experimental results on both random and handcrafted instances as well as on an example robotics domain to show that A* with factored-space heuristics outperforms A* with the Manhattan Distance heuristic in many cases.

@inproceedings{wang2018,
author = {Zhi Wang and Liron Cohen and Sven Koenig and T. K. Satish Kumar},
title = {The Factored Shortest Path Problem and Its Applications in Robotics},
booktitle = {Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {527--531},
year = {2018}
}

T. K. Satish Kumar, Zhi Wang, Anoop Kumar, Craig Milo Rogers and Craig A. Knoblock. Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-2018). [pdf]

We study load scheduling of simple temporal networks (STNs) under dynamic pricing of resources. We are given a set of processes and a set of simple temporal constraints between their execution times, i.e., an STN. Each process uses a certain amount of resource for execution. The unit price of the resource is a function of time, f(t). The goal is to find a schedule of a given STN that trades off makespan minimization against cost minimization within a user-specified suboptimality bound. We provide a polynomial-time algorithm for solving the load scheduling problem when f(t) is piecewise constant. This has important applications in many real-world domains including the smart home and smart grid domains. We then study the dependency of the unit price of the resource on time as well as the total demand at that time. This leads to a further characterization of tractable, NP-hard, and conjectured tractable cases.

@inproceedings{kumar2018a,
author = {Kumar, T. K. Satish and Wang, Zhi and Kumar, Anoop and Rogers, Craig Milo and Knoblock,Craig A.},
title = {Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing},
booktitle = {Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI)},
pages = {6227--6236},
year = {2018}
}

T. K. Satish Kumar, Zhi Wang, Anoop Kumar, Craig Milo Rogers and Craig A. Knoblock. On the Linear Programming Duals of Temporal Reasoning Problems. In Proceedings of the 15th International Symposium on Artificial Intelligence and Mathematics (ISAIM-2018). [pdf]

Temporal reasoning problems occur in many application domains of Artificial Intelligence; therefore, it is important for us to develop algorithms for solving them efficiently. While some problems like Simple Temporal Problems are known to be tractable, some other problems like Disjunctive Temporal Problems are known to be NP-hard. In this paper, we provide a Linear Programming (LP) duality perspective on temporal reasoning problems. In many cases, we show that their LP duals are the commonly-studied flow problems in Graph Theory. Using the general theory of LP duality, we develop novel algorithms for efficiently solving several temporal reasoning problems. We also show that other previously-known efficient methods in temporal reasoning also fit into this perspective of LP duality.

@inproceedings{kumar2018b,
author = {Kumar, T. K. Satish and Wang, Zhi and Kumar, Anoop and Rogers, Craig Milo and Knoblock,Craig A.},
title = {On the Linear Programming Duals of Temporal Reasoning Problems},
booktitle = {Proceedings of the Fifteenth International Symposium on Artificial Intelligence and Mathematics (ISAIM)},
year = {2018},
url = {"http://isaim2018.cs.virginia.edu/papers/ISAIM2018_TK_Satish_Kumar_etal.pdf"}
}

Contact

Office: EBU3B 4154 @ UCSD
Email: zhiwang at eng dot ucsd dot edu Google Scholar dblp