Importance Sampling Estimate for Policies with Memory (2001)

by Christian R. Shelton


Abstract: Importance sampling has recently become a popular method for computing off-policy Monte Carlo estimates of returns. It is known that importance sampling ratios can be computed for POMDPs when both the sampled and target policies are reactive (memoryless). We extend that result to show how they can also be computed efficiently for policies with memory state (finite state controllers), without resorting to the standard trick of pretending the memory is part of the environment. This allows for very data-efficient algorithms. We demonstrate the results on simulated problems.
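
The key idea is easiest to see as a forward recursion: because the memory state is internal to the agent, the probability of a trajectory's action sequence given its observation sequence can be computed for a finite state controller by summing over memory states, exactly as in the HMM forward algorithm, and the environment's transition and observation terms cancel in the importance sampling ratio. Below is a minimal sketch of that computation; the parameterization (arrays `pi`, `mu`, `m0`) and function names are illustrative assumptions, not the paper's notation, and the estimator shown is ordinary (unnormalized) importance sampling.

```python
import numpy as np

def fsc_action_likelihood(obs, acts, pi, mu, m0):
    """P(a_1..a_T | o_1..o_T) under a finite state controller (FSC),
    marginalizing the hidden memory state with an HMM-style forward pass.

    pi : array [M, O, A], pi[m, o, a]     = P(action a | memory m, observation o)
    mu : array [M, O, A, M], mu[m, o, a, n] = P(next memory n | m, o, a)
    m0 : array [M], initial memory-state distribution
    """
    alpha = m0.copy()                    # alpha[m] = P(actions so far, memory = m | obs so far)
    for o, a in zip(obs, acts):
        alpha = alpha * pi[:, o, a]      # fold in this step's action probability
        alpha = alpha @ mu[:, o, a, :]   # propagate the memory state forward
    return alpha.sum()                   # marginalize the final memory state

def is_return_estimate(trajectories, target, behavior):
    """Ordinary importance sampling estimate of the target policy's return.

    Each trajectory is (obs, acts, ret); target and behavior are (pi, mu, m0)
    tuples. The environment terms cancel in the ratio, leaving only the two
    FSC action likelihoods.
    """
    weights, returns = [], []
    for obs, acts, ret in trajectories:
        w = (fsc_action_likelihood(obs, acts, *target) /
             fsc_action_likelihood(obs, acts, *behavior))
        weights.append(w)
        returns.append(ret)
    return np.dot(weights, returns) / len(returns)
```

A reactive policy is the special case M = 1, where the recursion collapses to the familiar per-step product of ratios pi(a_t | o_t) / pi'(a_t | o_t). The cost per trajectory is O(T M^2), rather than the exponential blowup a naive sum over memory-state sequences would suggest.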

Download Information

Christian R. Shelton (2001). "Importance Sampling Estimate for Policies with Memory." ICML Workshop on Hierarchy and Memory.

Bibtex citation

@inproceedings{She01cworkshop,
   author = "Christian R. Shelton",
   title = "Importance Sampling Estimate for Policies with Memory",
   booktitle = "{ICML} Workshop on Hierarchy and Memory",
   year = 2001,
}