- G-Pack: Andrew Kolobov (Microsoft Research)
source code
Gourmand is an LRTDP-based planner described in Kolobov, Mausam, Weld,
"LRTDP vs. UCT for Online Probabilistic Planning", AAAI'12. Its IPPC-2014
version fixes a few bugs in the original Gourmand code available
at http://www.cs.washington.edu/ai/planning/gourmand.html, simplifies
convergence detection in the LRTDP implementation, and performs some simple
action pruning.
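For reference, below is a minimal Python sketch of the classic goal-oriented,
cost-minimizing LRTDP of Bonet and Geffner (2003), whose labeled convergence
check is the part the IPPC-2014 version simplifies. The MDP interface
(actions, successors, cost, goal) and the tolerance epsilon are illustrative
assumptions, not Gourmand's code; Gourmand itself targets finite-horizon
reward maximization.

    import random

    def lrtdp(mdp, s0, epsilon=1e-3):
        # Labeled RTDP: run trials from s0 until s0 is labeled solved.
        # `mdp` is an assumed interface with actions(s),
        # successors(s, a) -> [(s', prob)], cost(s, a), and goal(s).
        V = {}                 # value function, lazily initialized to 0
        solved = set()

        def qvalue(s, a):
            return mdp.cost(s, a) + sum(p * V.get(t, 0.0)
                                        for t, p in mdp.successors(s, a))

        def greedy(s):
            return min(mdp.actions(s), key=lambda a: qvalue(s, a))

        def residual(s):
            return abs(V.get(s, 0.0) - qvalue(s, greedy(s)))

        def check_solved(s):
            # Label s solved iff every state reachable under the greedy
            # policy has a residual below epsilon.
            converged, open_, closed = True, [s], set()
            while open_:
                u = open_.pop()
                closed.add(u)
                if residual(u) > epsilon:
                    converged = False
                    continue        # do not expand non-converged states
                for t, p in mdp.successors(u, greedy(u)):
                    if p > 0 and t not in solved | closed and not mdp.goal(t):
                        open_.append(t)
            if converged:
                solved.update(closed)   # label the whole greedy envelope
            else:
                for u in closed:        # extra backups speed convergence
                    V[u] = qvalue(u, greedy(u))
            return converged

        while s0 not in solved:
            s, visited = s0, []
            while s not in solved and not mdp.goal(s):       # one trial
                visited.append(s)
                V[s] = qvalue(s, greedy(s))                  # Bellman backup
                outcomes = mdp.successors(s, greedy(s))
                s = random.choices([t for t, _ in outcomes],
                                   [p for _, p in outcomes])[0]
            while visited and check_solved(visited.pop()):
                pass                # label states bottom-up along the trial
        return V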
- LRTDP: Leliane Nunes de Barros, Ricardo Hermann, Felipe W. Trevizan,
Karina Valdivia Delgado, University of São Paulo, Brazil
- PROST 2014: Thomas Keller and Florian Geisser (University of Freiburg)
source code
PROST2014 is implemented in the PROST framework which is
available at http://prost.informatik.uni-freiburg.de. It is based
on the UCT* algorithm as described in Keller and
Helmert, "Trial-based Heuristic Tree Search for Finite Horizon
MDPs", ICAPS '13, and uses the search enhancements of unreasonable
action pruning, reward lock detection, and the IDS-based
heuristic (Keller and Eyerich, "PROST: Probabilistic Planning
Based on UCT", ICAPS '12).
The only significant additional enhancement is the timeout
management system: if UCT* is able to solve the root node before
the total timeout is reached, PROST2014 simulates several runs
internally and determines the best 30 consecutive simulations. It
then executes runs until the achieved result matches (or
outperforms) the simulated result. Similarly, if time is left
after 30 runs for a reason other than solving the problem (this
is possible due to reward lock states and states with only a
single reasonable action), PROST2014 estimates the total number
of runs it will be able to perform within the total time limit.
It uses that number to determine a good moment to stop by
treating the problem as an instance of the related secretary
problem: it executes runs until approximately 37% of the
estimated total number of runs have been performed, and then
stops the next time a run matches the best result seen so far.
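This stopping rule is compact enough to sketch. Below is an illustrative
Python reconstruction, not PROST's source; run_episode and estimated_total
are assumed names for one evaluation run and the estimated run budget.

    import math

    def secretary_stopping_rule(run_episode, estimated_total):
        # 1/e ("secretary") rule: spend roughly the first 37% of the
        # estimated run budget only observing, then stop at the first
        # run that matches or beats the best result seen so far.
        # run_episode() is an assumed callback returning one run's reward.
        observe = max(1, math.ceil(estimated_total / math.e))
        best = float("-inf")
        rewards = []
        for i in range(estimated_total):
            r = run_episode()
            rewards.append(r)
            if i < observe:
                best = max(best, r)   # observation phase: never stop here
            elif r >= best:
                break                 # first match/beat after ~37%: stop
        return rewards

The 1/e threshold (about 37%) is the optimal observation fraction in the
classical secretary problem; here it is applied heuristically to run results.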
- PROST 2011: Thomas Keller and Patrick Eyerich (University of Freiburg)
PROST2011 is the UCT-based planner that participated in
IPPC 2011. It is described in Keller and Eyerich, "PROST:
Probabilistic Planning Based on UCT", ICAPS '12. The most
important difference from the IPPC 2011 version is that the timeout
management of PROST2014 is used (this change was made because
the original timeout management code unfortunately no longer
exists). Since UCT cannot label nodes as solved, only the second
part of that scheme (the secretary-problem part) is relevant.
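For reference, the UCB1 rule that drives UCT's action selection, as a generic
Python sketch; the node fields (children, visits, total_reward) are
illustrative assumptions rather than PROST's data structures.

    import math

    def ucb1_select(node, exploration=1.0):
        # Pick the child maximizing average reward plus exploration bonus.
        # `node.children` is an assumed list of objects with `visits`
        # and `total_reward` fields.
        untried = [c for c in node.children if c.visits == 0]
        if untried:
            return untried[0]         # visit every child at least once
        log_n = math.log(sum(c.visits for c in node.children))
        return max(node.children,
                   key=lambda c: c.total_reward / c.visits
                                 + exploration * math.sqrt(log_n / c.visits))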
- PPUDD v1 and PPUDD v2: Florent Teichteil-Königsbuch and Nicolas Drougard (Onera - The
French Aerospace Lab)
Planners PPUDD and ATPPUDD are based on the simple idea that
approximating an MDP with a qualitative model of uncertainty, such as
possibilities, that still preserves the ranking of how likely events
are to occur can reduce computational complexity without sacrificing
too much solution quality. This intuition is reinforced by the
observation that the decision diagrams used in factored MDPs often
blow up due to the prohibitive number of nodes generated by
probabilistic operations.
Starting with a SPUDD representation of the input problem, PPUDD
translates transition and reward decision diagrams to possibilistic
ADDs, then symbolically solves the possibilistic version of Bellman's
equation. Unlike SPUDD, PPUDD incrementally extends the planning
horizon while maintaining a BDD mask representing the states reachable
from the initial state, in order to restrict the current value function
to the reachable states only. While PPUDD is an offline algorithm,
ATPPUDD is an anytime version that learns the computation times of
Bellman backups and dispatches the computational effort accordingly
over the remaining planning horizon, much as GOURMAND does in the
probabilistic setting. The bottleneck of these possibilistic algorithms
lies in the translation from probabilities to possibilities: the naive
automated translation currently used for the competition may lead to
poor policies on benchmarks with complex reward structures. Another
issue is the size of the input SPUDD domain, whose ADD instantiation,
performed before optimization, does not even fit into memory for many
difficult benchmarks.
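To make these two ingredients concrete, here is a toy, explicit-state Python
sketch of a naive order-preserving probability-to-possibility rescaling and
of optimistic possibilistic value iteration in the max-min algebra. PPUDD
performs the backup symbolically on ADDs, and its actual translation and
optimization criterion may differ; all names below are illustrative.

    def to_possibility(probs):
        # Naive order-preserving translation: rescale by the largest
        # probability, so the most likely outcome gets possibility 1.
        # (An assumption; PPUDD's actual translation may differ.)
        top = max(probs.values())
        return {s: p / top for s, p in probs.items()}

    def possibilistic_vi(states, actions, trans, pref, horizon):
        # Optimistic possibilistic value iteration on an explicit state
        # space; PPUDD performs the same backup symbolically on ADDs.
        # The expectation (sum of products) of the probabilistic Bellman
        # equation is replaced by the qualitative max-min algebra:
        #   u_0(s) = pref(s)
        #   u_k(s) = max_a max_{s'} min(pi(s'|s,a), u_{k-1}(s'))
        # trans[s, a] maps successors to possibility degrees in [0, 1],
        # pref(s) is a preference degree in [0, 1] (all illustrative).
        u = {s: pref(s) for s in states}
        for _ in range(horizon):
            u = {s: max(max(min(pi, u[t]) for t, pi in trans[s, a].items())
                        for a in actions(s))
                 for s in states}
        return u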
- KAIST_AIPR_LAB: Sang-Gyu Han, Bowon Nam, Kanghoon Lee, and Kee-Eung Kim (KAIST)
Our planner integrates two algorithms, symbolic heuristic search value
iteration (Symbolic HSVI) [1] and partially observable Monte-Carlo planning
(POMCP) [2].
Symbolic HSVI is an extension of heuristic search value iteration
(HSVI) [3] that aims to handle factored POMDPs. Our Symbolic HSVI is built
on the same foundations as the Symbolic HSVI from IPPC 2011,
but we re-implemented it in C++ rather than Java.
To handle factored structures such as factored transition and observation
models, we use the CUDD library [4], an open-source library for
algebraic decision diagrams (ADDs). Our Symbolic HSVI is designed to handle
finite-horizon POMDP models.
POMCP is an online POMDP planner based on Monte-Carlo simulations.
Because it uses Monte-Carlo simulations, the POMCP algorithm can handle larger
domains for both tree search and (particle-filter based) belief state updates.
For this competition, we use an ensemble strategy in which the Symbolic HSVI
planner first tries to solve the problem; if Symbolic HSVI does not seem
likely to complete the computation in time, we fall back to the POMCP planner,
which is then executed.
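A minimal Python sketch of this fallback scheme, assuming the two planners are
exposed as picklable callables (run_hsvi and run_pomcp are hypothetical names;
this is not the submission's code):

    import multiprocessing
    import queue

    def _hsvi_worker(run_hsvi, result_q):
        result_q.put(run_hsvi())

    def plan_with_fallback(run_hsvi, run_pomcp, time_budget):
        # Give Symbolic HSVI a time budget in a separate process; if no
        # result arrives in time, kill it and run POMCP instead.  The
        # callables must be picklable for multiprocessing to spawn them.
        result_q = multiprocessing.Queue()
        proc = multiprocessing.Process(target=_hsvi_worker,
                                       args=(run_hsvi, result_q))
        proc.start()
        try:
            result = result_q.get(timeout=time_budget)  # HSVI finished
        except queue.Empty:
            result = None                               # HSVI timed out
        proc.terminate()      # no-op if the process already exited
        proc.join()
        return result if result is not None else run_pomcp()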
References
[1] Sim, Hyeong Seop, Kee-Eung Kim, et al. "Symbolic Heuristic Search Value
Iteration for Factored POMDPs." AAAI, 2008.
[2] Silver, David, and Joel Veness. "Monte-Carlo Planning in Large POMDPs."
NIPS, 2010.
[3] Smith, Trey, and Reid Simmons. "Heuristic Search Value Iteration for
POMDPs." UAI, 2004.
[4] CUDD: http://vlsi.colorado.edu/~fabio/CUDD/
- POMDPX_NUS: Nan Ye, Kegui Wu, Meng Zhang, David Hsu and Wee Sun Lee (N.U.S.)
source code
competition domains in the POMDPX format
DESPOT [1] and POMCP [2] are two state-of-the-art anytime algorithms for
large-scale online POMDP planning. DESPOT performs heuristic search
on a sparse belief tree, the DESPOT tree, constructed from a set of randomly
sampled scenarios. DESPOT makes use of regularization to avoid overfitting
to the sampled scenarios. POMCP is a Monte-Carlo algorithm that uses a
UCB-based optimistic action selection strategy to search a belief tree.
Our competition submission leverages both DESPOT and POMCP. POMCP's optimistic
action selection strategy may work well and lead to fast convergence
to an optimal action, but it may also be overly greedy and result in extremely
poor worst-case performance. DESPOT performs heuristic search on a randomly
sampled sparse belief tree, achieving strong performance in practice
and guaranteeing a much better worst-case performance bound. Our submission
runs DESPOT and POMCP each for a few rounds, and completes the remaining
rounds by running the one with better performance.
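A minimal Python sketch of this meta-strategy; the planners mapping and the
evaluate callback are illustrative assumptions, not the submission's interface.

    def pick_planner(planners, trial_rounds, total_rounds, evaluate):
        # Run each candidate for a few trial rounds, then finish the
        # remaining rounds with whichever scored better on average.
        # `planners` maps names to planner objects and evaluate(planner)
        # runs one round and returns its total reward (assumed interface).
        scores = {name: [evaluate(p) for _ in range(trial_rounds)]
                  for name, p in planners.items()}
        winner = max(scores, key=lambda n: sum(scores[n]) / len(scores[n]))
        remaining = max(0, total_rounds - trial_rounds * len(planners))
        return winner, [evaluate(planners[winner]) for _ in range(remaining)]

With trial_rounds=3 and total_rounds=30, for instance, the better of DESPOT
and POMCP would be committed to after six trial rounds.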
[1] Somani, Adhiraj, Nan Ye, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP
Planning with Regularization." NIPS, 2013.
[2] Silver, David, and Joel Veness. "Monte-Carlo Planning in Large POMDPs."
NIPS, 2010.