
SMZ

Code underlying the publication: VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning

6 contributors

Description

VariBASed (Variational Bayes-Adaptive Sequential Monte-Carlo Planning) is the JAX implementation accompanying "VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning" (De Vries, He, Oren, van der Vaart, de Weerdt, Spaan, 2026). It targets approximate Bayes-optimal exploration-exploitation by combining amortized variational belief learning (via S5 state-space models), SMC planning from the trtpi codebase, and meta-RL in a unified expectation-maximization (EM) framework. The E-step runs a particle-filter planner whose nested importance-sampling weights correct for the mismatch between the variational and true beliefs; the M-step distills the SMC-improved policies and jointly fits the belief ELBO. The code is designed for single-GPU throughput, using fixed-size batches and hidden-state caching. Preliminary results on gridworld and continuous function optimization show that sample efficiency and runtime efficiency scale favorably with planning budget compared with a recurrent PPO (RL²) baseline.
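The importance-weight correction mentioned in the description can be illustrated with a generic, self-normalized importance-sampling step. The sketch below is not taken from the repository: it uses plain Python rather than JAX, a single (non-nested) weight level, and hypothetical one-dimensional Gaussians standing in for the variational belief (proposal) and the true belief (target). All function names are assumptions made for illustration.

```python
import math
import random


def normalize_log_weights(log_w):
    """Turn unnormalized log-weights into weights that sum to 1 (log-sum-exp shift)."""
    m = max(log_w)
    shifted = [math.exp(lw - m) for lw in log_w]
    total = sum(shifted)
    return [s / total for s in shifted]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); equals len(weights) when all weights are uniform."""
    return 1.0 / sum(w * w for w in weights)


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def importance_correct(particles, log_target, log_proposal):
    """Weights proportional to target/proposal, correcting for proposal mismatch."""
    log_w = [log_target(x) - log_proposal(x) for x in particles]
    return normalize_log_weights(log_w)


def multinomial_resample(particles, weights, rng):
    """Draw len(particles) particles with replacement, proportional to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def demo(num_particles=5000, seed=0):
    rng = random.Random(seed)
    # Hypothetical stand-ins: proposal q = N(0, 2) plays the variational belief,
    # target p = N(1, 1) plays the true belief.
    q_mean, q_std = 0.0, 2.0
    p_mean, p_std = 1.0, 1.0
    particles = [rng.gauss(q_mean, q_std) for _ in range(num_particles)]
    weights = importance_correct(
        particles,
        lambda x: gaussian_logpdf(x, p_mean, p_std),
        lambda x: gaussian_logpdf(x, q_mean, q_std),
    )
    # Self-normalized estimate of the target mean from proposal samples.
    estimate = sum(w * x for w, x in zip(weights, particles))
    ess = effective_sample_size(weights)
    resampled = multinomial_resample(particles, weights, rng)
    return estimate, ess, weights, resampled


if __name__ == "__main__":
    estimate, ess, _, _ = demo()
    print(f"estimated target mean: {estimate:.3f}, ESS: {ess:.0f}")
```

Working in log-space before normalizing avoids underflow when the proposal and target disagree strongly. The effective sample size gives a rough indication of how much that mismatch has degraded the particle set.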

Programming languages
  • Jupyter Notebook 52%
  • Python 41%
  • YAML 5%
  • Shell 1%
  • Other 1%
License
  • MIT
Source code
4TU.
Member of community

4TU