Code underlying the publication: VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning
Description
VariBASed (Variational Bayes-Adaptive Sequential Monte-Carlo Planning) is the JAX implementation accompanying "VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning" (De Vries, He, Oren, van der Vaart, de Weerdt, Spaan, 2026). The method targets approximately Bayes-optimal exploration-exploitation by combining amortized variational belief learning (via S5 state-space models), SMC planning from the trtpi codebase, and meta-RL in a unified EM framework. The E-step runs a particle-filter planner whose nested importance-sampling weights correct for the mismatch between the variational and true beliefs; the M-step jointly distills the SMC-improved policies and fits the belief ELBO. The implementation is designed for single-GPU throughput, using fixed-size batches and hidden-state caching. Preliminary results on gridworld and continuous function-optimization tasks show favorable sample- and runtime-efficiency scaling with planning budget compared to a recurrent PPO (RL²) baseline.
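To make the E-step idea concrete, here is a minimal, illustrative JAX sketch of the importance-sampling correction and resampling at the core of an SMC planner. This is not the repository's code: the helper names (`smc_reweight`, `systematic_resample`) and the stand-in log-density arrays are hypothetical, and a real planner would compute `log_p` and `log_q` from the environment model and the learned variational belief, respectively.

```python
import jax
import jax.numpy as jnp


def smc_reweight(log_w, log_p, log_q):
    """Importance-sampling correction for particles proposed from q.

    Particles drawn from the variational belief q are reweighted by
    log p - log q so the weighted ensemble targets the true belief p.
    Returns self-normalized log weights.
    """
    log_w = log_w + log_p - log_q
    return log_w - jax.scipy.special.logsumexp(log_w)


def systematic_resample(key, log_w, n):
    """Systematic resampling: one uniform offset, n evenly spaced points."""
    u = (jax.random.uniform(key) + jnp.arange(n)) / n
    cdf = jnp.cumsum(jnp.exp(log_w))
    idx = jnp.searchsorted(cdf, u)
    return jnp.clip(idx, 0, n - 1)  # guard against floating-point overshoot


# Toy usage with stand-in densities (hypothetical values, not the paper's).
n = 8
log_w = jnp.zeros(n) - jnp.log(n)                       # uniform initial weights
log_p = jax.random.normal(jax.random.PRNGKey(0), (n,))  # stand-in true-belief scores
log_q = jnp.zeros(n)                                    # stand-in proposal scores
log_w = smc_reweight(log_w, log_p, log_q)
idx = systematic_resample(jax.random.PRNGKey(1), log_w, n)
```

In a full planner this reweight/resample pair would run inside a `jax.lax.scan` over planning steps, which keeps the particle batch a fixed size as the description above requires.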
- License: MIT