trtpi
Code underlying the publication: Trust-Region Twisted Policy Improvement
Description
TRT-SMC: Trust-Region Twisted Sequential Monte Carlo. A JAX implementation accompanying the ICML 2025 paper "Trust-Region Twisted Policy Improvement". It combines MCTS-inspired design choices (trust-region constrained action sampling, explicit terminal handling, and improved policy/value targets) with particle-filter planning to build a GPU-efficient SMC policy improvement operator that outperforms baseline MCTS and SMC planners in runtime and sample efficiency across discrete and continuous RL domains. The smz module exposes a modular SMC planner that accepts custom Proposal, Transition, and Target implementations and can be used as a drop-in policy replacement compatible with jax.jit and jax.vmap.
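To illustrate the plug-in idea, here is a minimal sketch of a generic SMC planning loop with swappable proposal, transition, and target components. The names SMCComponents, make_smc_planner, and the callable signatures are hypothetical and are not the smz API. The sketch deliberately omits the trust-region constraint, terminal handling, and twisting that TRT-SMC adds, and uses plain multinomial resampling at every step. It only shows why such a planner composes with jax.jit and jax.vmap: the whole search is a jax.lax.scan over fixed-size particle arrays.

```python
from typing import Callable, NamedTuple

import jax
import jax.numpy as jnp


class SMCComponents(NamedTuple):
    """Hypothetical plug-in points; names and signatures are illustrative only."""
    proposal: Callable    # (key, state) -> (action, log_q)
    transition: Callable  # (state, action) -> (next_state, reward)
    target: Callable      # (state, action, reward) -> log target increment


def make_smc_planner(comps: SMCComponents, num_actions: int,
                     num_particles: int = 256, depth: int = 5):
    """Return plan(key, root_state) -> probabilities over root actions."""

    def plan(key, root_state):
        # Every particle starts from the same root state with equal weight.
        states = jnp.broadcast_to(root_state, (num_particles,) + jnp.shape(root_state))
        log_w = jnp.zeros(num_particles)
        root_actions = jnp.zeros(num_particles, dtype=jnp.int32)

        def step(carry, xs):
            states, log_w, root_actions = carry
            key_t, t = xs
            k_prop, k_res = jax.random.split(key_t)
            # 1) Sample one action per particle from the proposal.
            actions, log_q = jax.vmap(comps.proposal)(
                jax.random.split(k_prop, num_particles), states)
            # 2) Advance each particle with the (true or learned) model.
            next_states, rewards = jax.vmap(comps.transition)(states, actions)
            # 3) Importance weight: target increment over proposal density.
            log_w = log_w + jax.vmap(comps.target)(states, actions, rewards) - log_q
            # Remember which root action each particle descends from.
            root_actions = jnp.where(t == 0, actions, root_actions)
            # 4) Multinomial resampling, then reset to uniform weights.
            idx = jax.random.categorical(k_res, log_w, shape=(num_particles,))
            return (next_states[idx], jnp.zeros_like(log_w), root_actions[idx]), None

        xs = (jax.random.split(key, depth), jnp.arange(depth))
        (_, log_w, root_actions), _ = jax.lax.scan(
            step, (states, log_w, root_actions), xs)
        # Improved policy: weighted frequency of each root action. After the
        # final resample all weights are equal, so this is an empirical count.
        weights = jax.nn.softmax(log_w)
        return jnp.zeros(num_actions).at[root_actions].add(weights)

    return plan


# Hypothetical toy problem: a 1-D position; actions move it by -1, 0 or +1.
MOVES = jnp.array([-1.0, 0.0, 1.0])


def uniform_proposal(key, state):
    logits = jnp.zeros(MOVES.shape[0])
    action = jax.random.categorical(key, logits)
    return action, jax.nn.log_softmax(logits)[action]


def toy_transition(state, action):
    next_state = state + MOVES[action]
    return next_state, -jnp.abs(next_state)  # reward: stay close to 0


def toy_target(state, action, reward, temperature=1.0):
    # Uniform prior policy times exp(reward / temperature), in log space.
    return jnp.log(1.0 / MOVES.shape[0]) + reward / temperature


planner = make_smc_planner(
    SMCComponents(uniform_proposal, toy_transition, toy_target), num_actions=3)
probs = jax.jit(planner)(jax.random.PRNGKey(0), jnp.array(2.0, dtype=jnp.float32))
batched = jax.jit(jax.vmap(planner))(
    jax.random.split(jax.random.PRNGKey(1), 2),
    jnp.array([2.0, -2.0], dtype=jnp.float32))
```

Because every particle operation is a fixed-shape array op inside lax.scan, the planner can be compiled once with jax.jit and batched over many environments with jax.vmap, which is the property the description highlights for using a planner as a drop-in policy.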
License: MIT