Code underlying the publication: VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning
Description
VariBASed (Variational Bayes-Adaptive Sequential Monte-Carlo Planning) is the JAX implementation accompanying "VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning" (De Vries, He, Oren, van der Vaart, de Weerdt, Spaan, 2026). The method targets approximately Bayes-optimal exploration-exploitation by combining amortized variational belief learning (via S5 state-space models), SMC planning from the trtpi codebase, and meta-RL in a unified EM framework. The E-step runs a particle-filter planner whose nested importance-sampling weights correct for the mismatch between the variational and true beliefs; the M-step jointly distills the SMC-improved policies and fits the belief ELBO. The implementation is designed for single-GPU throughput, using fixed-size batches and hidden-state caching. Preliminary results on gridworld and continuous function-optimization tasks show favorable sample- and runtime-efficiency scaling with planning budget compared to a recurrent PPO (RL²) baseline.
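To make the E-step idea concrete, here is a minimal, illustrative JAX sketch of the importance-sampling correction and resampling at the core of an SMC planner. This is not the repository's code: the helper names (`smc_reweight`, `systematic_resample`) and the stand-in log-density arrays are hypothetical, and a real planner would compute `log_p` and `log_q` from the environment model and the learned variational belief, respectively.

```python
import jax
import jax.numpy as jnp


def smc_reweight(log_w, log_p, log_q):
    """Importance-sampling correction for particles proposed from q.

    Particles drawn from the variational belief q are reweighted by
    log p - log q so the weighted ensemble targets the true belief p.
    Returns self-normalized log weights.
    """
    log_w = log_w + log_p - log_q
    return log_w - jax.scipy.special.logsumexp(log_w)


def systematic_resample(key, log_w, n):
    """Systematic resampling: one uniform offset, n evenly spaced points."""
    u = (jax.random.uniform(key) + jnp.arange(n)) / n
    cdf = jnp.cumsum(jnp.exp(log_w))
    idx = jnp.searchsorted(cdf, u)
    return jnp.clip(idx, 0, n - 1)  # guard against floating-point overshoot


# Toy usage with stand-in densities (hypothetical values, not the paper's).
n = 8
log_w = jnp.zeros(n) - jnp.log(n)                       # uniform initial weights
log_p = jax.random.normal(jax.random.PRNGKey(0), (n,))  # stand-in true-belief scores
log_q = jnp.zeros(n)                                    # stand-in proposal scores
log_w = smc_reweight(log_w, log_p, log_q)
idx = systematic_resample(jax.random.PRNGKey(1), log_w, n)
```

In a full planner this reweight/resample pair would run inside a `jax.lax.scan` over planning steps, which keeps the particle batch a fixed size as the description above requires.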
- License: MIT