trtpi

Code underlying the publication: Trust-Region Twisted Policy Improvement

4 contributors

Description

TRT-SMC (Trust-Region Twisted Sequential Monte-Carlo) is a JAX implementation accompanying the ICML 2025 paper "Trust-Region Twisted Policy Improvement". It combines MCTS-inspired design choices (trust-region constrained action sampling, explicit terminal handling, improved policy/value targets) with particle filter planning to construct a GPU-efficient SMC policy improvement operator. It outperforms baseline MCTS and SMC planners in runtime and sample efficiency across discrete and continuous RL domains. The smz module exposes a modular SMC planner that accepts custom Proposal, Transition, and Target implementations and can be used as a drop-in policy replacement compatible with jax.jit and jax.vmap.
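To illustrate what a planner built from interchangeable proposal, transition, and target components might look like, here is a minimal sketch of one generic particle-filter step in JAX. It is not the package's API: the names `Particles`, `smc_step`, `proposal`, `transition`, and `target`, along with their signatures, are hypothetical and chosen only for illustration. The sketch uses plain multinomial resampling and omits the trust-region constraint, twisting, and terminal handling described above.

```python
from functools import partial
from typing import NamedTuple

import jax
import jax.numpy as jnp


class Particles(NamedTuple):
    # Hypothetical container, not taken from the trtpi code base.
    state: jax.Array       # (N,) one scalar state per particle
    log_weight: jax.Array  # (N,) unnormalised log importance weights


def smc_step(key, particles, proposal, transition, target):
    """One propose / move / reweight / resample step."""
    n = particles.log_weight.shape[0]
    key_prop, key_res = jax.random.split(key)

    # Propose one action per particle, then move each particle forward.
    prop_keys = jax.random.split(key_prop, n)
    actions = jax.vmap(proposal)(prop_keys, particles.state)
    next_state = jax.vmap(transition)(particles.state, actions)

    # Add the target's log potential to the running log weights.
    log_w = particles.log_weight + jax.vmap(target)(
        particles.state, actions, next_state
    )

    # Multinomial resampling, after which the weights are uniform again.
    idx = jax.random.categorical(key_res, log_w, shape=(n,))
    return Particles(next_state[idx], jnp.zeros(n))


# Toy components (assumptions made purely for this example).
def proposal(key, state):
    return jax.random.normal(key)  # standard-normal action, ignores the state


def transition(state, action):
    return state + action  # deterministic additive dynamics


def target(state, action, next_state):
    return -0.5 * next_state**2  # log potential favouring states near zero


# Close over the components so only arrays are traced by jax.jit.
step = jax.jit(
    partial(smc_step, proposal=proposal, transition=transition, target=target)
)

init = Particles(state=jnp.zeros(16), log_weight=jnp.zeros(16))
out = step(jax.random.PRNGKey(0), init)

# The same step can be batched over independent planners with jax.vmap.
batch_keys = jax.random.split(jax.random.PRNGKey(1), 4)
batch_init = Particles(jnp.zeros((4, 16)), jnp.zeros((4, 16)))
batch_out = jax.vmap(step)(batch_keys, batch_init)
```

Binding the three callables with `functools.partial` before calling `jax.jit` keeps them out of the traced arguments, so swapping in a different proposal or target only requires building a new `step`.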

Programming languages
  • Python 69%
  • Jupyter Notebook 15%
  • YAML 8%
  • Shell 3%
  • Markdown 2%
  • Other 2%
License
  • MIT
Source code
4TU.

Member of community

4TU