- G-Pack: Andrew Kolobov (Microsoft Research)
source code
Gourmand is an LRTDP-based planner described in Kolobov, Mausam, Weld,
"LRTDP vs. UCT for Online Probabilistic Planning", AAAI'12. Its IPPC-2014
version fixes a few bugs in the original Gourmand code available
at http://www.cs.washington.edu/ai/planning/gourmand.html, simplifies
convergence detection in the LRTDP implementation, and performs some simple
action pruning.
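For reference, below is a minimal Python sketch of the classic goal-oriented,
cost-minimizing LRTDP of Bonet and Geffner (2003), whose labeled convergence
check is the part the IPPC-2014 version simplifies. The MDP interface
(actions, successors, cost, goal) and the tolerance epsilon are illustrative
assumptions, not Gourmand's code; Gourmand itself targets finite-horizon
reward maximization.

    import random

    def lrtdp(mdp, s0, epsilon=1e-3):
        # Labeled RTDP: run trials from s0 until s0 is labeled solved.
        # `mdp` is an assumed interface with actions(s),
        # successors(s, a) -> [(s', prob)], cost(s, a), and goal(s).
        V = {}                 # value function, lazily initialized to 0
        solved = set()

        def qvalue(s, a):
            return mdp.cost(s, a) + sum(p * V.get(t, 0.0)
                                        for t, p in mdp.successors(s, a))

        def greedy(s):
            return min(mdp.actions(s), key=lambda a: qvalue(s, a))

        def residual(s):
            return abs(V.get(s, 0.0) - qvalue(s, greedy(s)))

        def check_solved(s):
            # Label s solved iff every state reachable under the greedy
            # policy has a residual below epsilon.
            converged, open_, closed = True, [s], set()
            while open_:
                u = open_.pop()
                closed.add(u)
                if residual(u) > epsilon:
                    converged = False
                    continue        # do not expand non-converged states
                for t, p in mdp.successors(u, greedy(u)):
                    if p > 0 and t not in solved | closed and not mdp.goal(t):
                        open_.append(t)
            if converged:
                solved.update(closed)   # label the whole greedy envelope
            else:
                for u in closed:        # extra backups speed convergence
                    V[u] = qvalue(u, greedy(u))
            return converged

        while s0 not in solved:
            s, visited = s0, []
            while s not in solved and not mdp.goal(s):       # one trial
                visited.append(s)
                V[s] = qvalue(s, greedy(s))                  # Bellman backup
                outcomes = mdp.successors(s, greedy(s))
                s = random.choices([t for t, _ in outcomes],
                                   [p for _, p in outcomes])[0]
            while visited and check_solved(visited.pop()):
                pass                # label states bottom-up along the trial
        return V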
- LRTDP: Leliane Nunes de Barros, Ricardo Hermann, Felipe W. Trevizan,
Karina Valdivia Delgado, University of São Paulo, Brazil
- PROST 2014: Thomas Keller and Florian Geisser (University of Freiburg)
source code
PROST2014 is implemented in the PROST framework which is
available at http://prost.informatik.uni-freiburg.de. It is based
on the UCT* algorithm as described in Keller and
Helmert, "Trial-based Heuristic Tree Search for Finite Horizon
MDPs", ICAPS '13, and uses the search enhancements of unreasonable
action pruning, reward lock detection, and the IDS-based
heuristic (Keller and Eyerich, "PROST: Probabilistic Planning
Based on UCT", ICAPS '12).
The only significant additional enhancement is the timeout
management system: if UCT* is able to solve the root node before
the total timeout is reached, PROST2014 simulates several runs
internally and determines the best 30 consecutive simulations. It
then executes runs until the achieved result matches (or
outperforms) the simulated result. Similarly, if time is left
after 30 runs for a reason other than solving the problem (this
is possible due to reward lock states and states with only a
single reasonable action), PROST2014 estimates the total number
of runs it will be able to perform within the total time limit.
It uses that number to determine a good moment to stop by
treating the problem as an instance of the related secretary
problem: it executes runs until approximately 37% of the
estimated total number of runs have been performed, and then
stops the next time a run matches the best result seen so far.
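This stopping rule is compact enough to sketch. Below is an illustrative
Python reconstruction, not PROST's source; run_episode and estimated_total
are assumed names for one evaluation run and the estimated run budget.

    import math

    def secretary_stopping_rule(run_episode, estimated_total):
        # 1/e ("secretary") rule: spend roughly the first 37% of the
        # estimated run budget only observing, then stop at the first
        # run that matches or beats the best result seen so far.
        # run_episode() is an assumed callback returning one run's reward.
        observe = max(1, math.ceil(estimated_total / math.e))
        best = float("-inf")
        rewards = []
        for i in range(estimated_total):
            r = run_episode()
            rewards.append(r)
            if i < observe:
                best = max(best, r)   # observation phase: never stop here
            elif r >= best:
                break                 # first match/beat after ~37%: stop
        return rewards

The 1/e threshold (about 37%) is the optimal observation fraction in the
classical secretary problem; here it is applied heuristically to run results.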
- PROST 2011: Thomas Keller and Patrick Eyerich (University of Freiburg)
PROST2011 is the UCT-based planner that participated in
IPPC 2011. It is described in Keller and Eyerich, "PROST:
Probabilistic Planning Based on UCT", ICAPS '12. The most
important difference from the IPPC 2011 version is that the timeout
management of PROST2014 is used (this change was made because
the original timeout management code unfortunately no longer
exists). Since UCT cannot label nodes as solved, only the second
part of that scheme (the secretary-problem part) is relevant.
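For reference, the UCB1 rule that drives UCT's action selection, as a generic
Python sketch; the node fields (children, visits, total_reward) are
illustrative assumptions rather than PROST's data structures.

    import math

    def ucb1_select(node, exploration=1.0):
        # Pick the child maximizing average reward plus exploration bonus.
        # `node.children` is an assumed list of objects with `visits`
        # and `total_reward` fields.
        untried = [c for c in node.children if c.visits == 0]
        if untried:
            return untried[0]         # visit every child at least once
        log_n = math.log(sum(c.visits for c in node.children))
        return max(node.children,
                   key=lambda c: c.total_reward / c.visits
                                 + exploration * math.sqrt(log_n / c.visits))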
- PPUDD v1 and PPUDD v2: Florent Teichteil-Königsbuch and Nicolas Drougard (Onera - The
French Aerospace Lab)
Planners PPUDD and ATPPUDD are based on the simple idea that
approximating an MDP with a qualitative model of uncertainty, such as
possibilities, that still preserves the ranking of how likely events
are to occur can reduce computational complexity without sacrificing
too much solution quality. This intuition is reinforced by the
observation that the decision diagrams used in factored MDPs often
blow up due to the prohibitive number of nodes generated by
probabilistic operations.
Starting with a SPUDD representation of the input problem, PPUDD
translates transition and reward decision diagrams to possibilistic
ADDs, then symbolically solves the possibilistic version of Bellman's
equation. Unlike SPUDD, PPUDD incrementally extends the planning
horizon while maintaining a BDD mask representing the states reachable
from the initial state, in order to restrict the current value function
to the reachable states only. While PPUDD is an offline algorithm,
ATPPUDD is an anytime version that learns the computation times of
Bellman backups and dispatches the computational effort accordingly
over the remaining planning horizon, much as GOURMAND does in the
probabilistic setting. The bottleneck of these possibilistic algorithms
lies in the translation from probabilities to possibilities: the naive
automated translation currently used for the competition may lead to
poor policies on benchmarks with complex reward structures. Another
issue is the size of the input SPUDD domain, whose ADD instantiation,
performed before optimization, does not even fit into memory for many
difficult benchmarks.
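To make these two ingredients concrete, here is a toy, explicit-state Python
sketch of a naive order-preserving probability-to-possibility rescaling and
of optimistic possibilistic value iteration in the max-min algebra. PPUDD
performs the backup symbolically on ADDs, and its actual translation and
optimization criterion may differ; all names below are illustrative.

    def to_possibility(probs):
        # Naive order-preserving translation: rescale by the largest
        # probability, so the most likely outcome gets possibility 1.
        # (An assumption; PPUDD's actual translation may differ.)
        top = max(probs.values())
        return {s: p / top for s, p in probs.items()}

    def possibilistic_vi(states, actions, trans, pref, horizon):
        # Optimistic possibilistic value iteration on an explicit state
        # space; PPUDD performs the same backup symbolically on ADDs.
        # The expectation (sum of products) of the probabilistic Bellman
        # equation is replaced by the qualitative max-min algebra:
        #   u_0(s) = pref(s)
        #   u_k(s) = max_a max_{s'} min(pi(s'|s,a), u_{k-1}(s'))
        # trans[s, a] maps successors to possibility degrees in [0, 1],
        # pref(s) is a preference degree in [0, 1] (all illustrative).
        u = {s: pref(s) for s in states}
        for _ in range(horizon):
            u = {s: max(max(min(pi, u[t]) for t, pi in trans[s, a].items())
                        for a in actions(s))
                 for s in states}
        return u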
- KAIST_AIPR_LAB: Sang-Gyu Han, Bowon Nam, Kanghoon Lee, and Kee-Eung Kim (KAIST)
Our planner integrates two algorithms, symbolic heuristic search value
iteration (Symbolic HSVI) [1] and partially observable Monte-Carlo planning
(POMCP) [2].
Symbolic HSVI is an extension of heuristic search value iteration
(HSVI) [3] that aims to handle factored POMDPs. Our Symbolic HSVI is built
on the same foundations as the Symbolic HSVI from IPPC 2011,
but we re-implemented it in C++ rather than Java.
To handle factored structures such as factored transition and observation
models, we use the CUDD library [4], an open-source library for
algebraic decision diagrams (ADDs). Our Symbolic HSVI is designed to handle
finite-horizon POMDP models.
POMCP is an online POMDP planner based on Monte-Carlo simulations.
Because it uses Monte-Carlo simulations, the POMCP algorithm can handle larger
domains for both tree search and (particle-filter based) belief state updates.
For this competition, we use an ensemble strategy in which the Symbolic HSVI
planner first tries to solve the problem; if Symbolic HSVI does not seem
likely to complete the computation in time, we fall back to the POMCP planner,
which is then executed.
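A minimal Python sketch of this fallback scheme, assuming the two planners are
exposed as picklable callables (run_hsvi and run_pomcp are hypothetical names;
this is not the submission's code):

    import multiprocessing
    import queue

    def _hsvi_worker(run_hsvi, result_q):
        result_q.put(run_hsvi())

    def plan_with_fallback(run_hsvi, run_pomcp, time_budget):
        # Give Symbolic HSVI a time budget in a separate process; if no
        # result arrives in time, kill it and run POMCP instead.  The
        # callables must be picklable for multiprocessing to spawn them.
        result_q = multiprocessing.Queue()
        proc = multiprocessing.Process(target=_hsvi_worker,
                                       args=(run_hsvi, result_q))
        proc.start()
        try:
            result = result_q.get(timeout=time_budget)  # HSVI finished
        except queue.Empty:
            result = None                               # HSVI timed out
        proc.terminate()      # no-op if the process already exited
        proc.join()
        return result if result is not None else run_pomcp()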
References
[1] Sim, Hyeong Seop, Kee-Eung Kim, et al. "Symbolic Heuristic Search Value
Iteration for Factored POMDPs." AAAI, 2008.
[2] Silver, David, and Joel Veness. "Monte-Carlo Planning in Large POMDPs."
NIPS, 2010.
[3] Smith, Trey, and Reid Simmons. "Heuristic Search Value Iteration for
POMDPs." UAI, 2004.
[4] CUDD: http://vlsi.colorado.edu/~fabio/CUDD/
- POMDPX_NUS: Nan Ye, Kegui Wu, Meng Zhang, David Hsu and Wee Sun Lee (N.U.S.)
source code
competition domains in the POMDPX format
DESPOT [1] and POMCP [2] are two state-of-the-art anytime algorithms for
large-scale online POMDP planning. DESPOT performs heuristic search
on a sparse belief tree, the DESPOT tree, constructed from a set of randomly
sampled scenarios. DESPOT makes use of regularization to avoid overfitting
to the sampled scenarios. POMCP is a Monte-Carlo algorithm that uses a
UCB-based optimistic action selection strategy to search a belief tree.
Our competition submission leverages both DESPOT and POMCP. POMCP's optimistic
action selection strategy may work well and lead to fast convergence
to an optimal action, but it may also be overly greedy and result in extremely
poor worst-case performance. DESPOT performs heuristic search on a randomly
sampled sparse belief tree, achieving strong performance in practice
and guaranteeing a much better worst-case performance bound. Our submission
runs DESPOT and POMCP each for a few rounds, and completes the remaining
rounds by running the one with better performance.
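A minimal Python sketch of this meta-strategy; the planners mapping and the
evaluate callback are illustrative assumptions, not the submission's interface.

    def pick_planner(planners, trial_rounds, total_rounds, evaluate):
        # Run each candidate for a few trial rounds, then finish the
        # remaining rounds with whichever scored better on average.
        # `planners` maps names to planner objects and evaluate(planner)
        # runs one round and returns its total reward (assumed interface).
        scores = {name: [evaluate(p) for _ in range(trial_rounds)]
                  for name, p in planners.items()}
        winner = max(scores, key=lambda n: sum(scores[n]) / len(scores[n]))
        remaining = max(0, total_rounds - trial_rounds * len(planners))
        return winner, [evaluate(planners[winner]) for _ in range(remaining)]

With trial_rounds=3 and total_rounds=30, for instance, the better of DESPOT
and POMCP would be committed to after six trial rounds.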
[1] Somani, Adhiraj, Nan Ye, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP
Planning with Regularization." NIPS, 2013.
[2] Silver, David, and Joel Veness. "Monte-Carlo Planning in Large POMDPs."
NIPS, 2010.