We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes from little data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm.
Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and present natural solutions based on our sampling method. The thesis concludes with example results from applying our algorithm to the domain of automated electronic market-making.
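As a rough illustration of the likelihood-ratio idea mentioned in the abstract, the sketch below estimates the return of one policy from trajectories collected under another. It is a generic textbook-style importance sampling estimator, not the estimator analyzed in the thesis. The function names, the dictionary representation of a reactive policy, and the `normalized` flag are all assumptions made for this example.

```python
def trajectory_ratio(trajectory, target_policy, behavior_policy):
    """Likelihood ratio of one trajectory under the target vs. behavior policy.

    trajectory: list of (observation, action, reward) tuples.
    Policies (hypothetical representation): dict mapping
    observation -> {action: probability}, i.e. reactive policies.
    The environment's transition and observation probabilities appear in
    both numerator and denominator and cancel, so no model is required.
    """
    ratio = 1.0
    for obs, action, _reward in trajectory:
        ratio *= target_policy[obs][action] / behavior_policy[obs][action]
    return ratio


def estimate_return(trajectories, target_policy, behavior_policy, normalized=True):
    """Importance sampling estimate of the target policy's expected return.

    With normalized=True the weights are divided by their sum (weighted
    importance sampling); otherwise by the number of trajectories.
    """
    weights = [trajectory_ratio(t, target_policy, behavior_policy)
               for t in trajectories]
    returns = [sum(reward for _, _, reward in t) for t in trajectories]
    weighted_sum = sum(w * g for w, g in zip(weights, returns))
    denom = sum(weights) if normalized else len(trajectories)
    return weighted_sum / denom if denom > 0 else 0.0
```

Because the estimate can be recomputed for any candidate target policy from the same stored trajectories, evaluating it across policies gives one way to picture the "return surface" the abstract refers to, under the assumptions stated above.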
Christian Robert Shelton (2001). "Importance Sampling for Reinforcement Learning with Multiple Objectives." Technical report. MIT AI Lab, AI Memo 2001-003.
@techreport{She01d,
  author = "Christian Robert Shelton",
  title = "Importance Sampling for Reinforcement Learning with Multiple Objectives",
  institution = "{MIT} {AI} Lab",
  year = 2001,
  type = "AI Memo",
  number = "2001-003",
  month = Aug,
}