Friday, February 17, 2012

Player 3.2: adding a crashed network

In my earlier tests I could not find a significant benefit of using a crashed network, even following the GNUbg definition of crashed.

But that was using TD learning on self-play, and I wanted to see whether the improved training using supervised learning would find a significant benefit.

It did: the benefit was noticeable, so I created my new best player, Player 3.2, which is Player 3.1 plus a crashed network.

Its benchmark scores are Contact ER 14.0, Crashed ER 12.6, and Race ER 2.08 (the same as Player 3.1). That is starting to get close to GNUbg 0-ply: extrapolating from the regression on Benchmark 2, GNUbg should beat Player 3.2 by only around 0.05ppg. (Benchmark scores updated after a fix to the benchmark calculation.)

Adding the crashed network made the biggest difference to the Crashed ER, with a noticeable but smaller improvement in Contact ER. This is a bit different from the experience of the GNUbg team, who saw a relatively small improvement in Crashed ER but a big improvement in Contact ER after adding a crashed network. In their case the gain came because the contact network could focus its optimization on a smaller set of game layouts.

That said, my player's performance is still reasonably far from GNUbg's. I think there's still some work to do on extra inputs to bridge the gap.

Player 3.2 summary:
  • 120 hidden nodes.
  • Trained using supervised learning on the GNUbg training databases, starting with the Player 3.1 weights (using the contact weights as the initial guess for the crashed weights).
  • Contact, crashed, and race networks.
  • One-sided bearoff database used when both players have all checkers in their home boards.
  • Contact inputs as per Player 2.4, with Berliner prime and hitting shot inputs in addition to the original Tesauro inputs.
  • Crashed inputs are the same as contact inputs.
  • Race inputs are the original Tesauro inputs plus the 14 extra inputs added with Player 3.1.
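
The split between the bearoff database and the three networks listed above can be pictured as a simple dispatch. This is only a sketch: the `PositionClass` flags, the function name, and the order of the checks are assumptions made for illustration, and the code does not show how a position is actually classified as race or crashed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionClass:
    """Hypothetical flags for a position; how they are computed is not shown."""
    both_in_home_boards: bool  # both players have all checkers in their home boards
    is_race: bool              # no further contact is possible
    is_crashed: bool           # position meets the (GNUbg-style) crashed definition


def select_evaluator(pos: PositionClass) -> str:
    """Return the name of the evaluator assumed to handle this position."""
    if pos.both_in_home_boards:
        return "bearoff"   # one-sided bearoff database
    if pos.is_race:
        return "race"      # race network
    if pos.is_crashed:
        return "crashed"   # crashed network
    return "contact"       # contact network handles everything else


if __name__ == "__main__":
    print(select_evaluator(PositionClass(False, False, True)))
```

With this ordering, the contact network only ever sees positions that are neither races nor crashed.
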
I trained it for 128 epochs, with an alpha schedule of 1 until the 8th epoch, then 0.32 until the 20th, 0.1 until the 60th, 0.032 until the 100th, and 0.01 afterward.
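
The alpha schedule can be written as a small lookup. This is a minimal sketch, not the training code itself. It treats each step in the schedule as one pass through the training data, counts passes from 0, and assumes the new alpha takes effect at the boundary itself (so pass 8 already uses 0.32). The function and constant names are made up for the example.

```python
from bisect import bisect_right

# Boundaries and alpha values taken from the schedule described above.
BOUNDARIES = [8, 20, 60, 100]
ALPHAS = [1.0, 0.32, 0.1, 0.032, 0.01]


def alpha_for_epoch(epoch: int) -> float:
    """Learning rate for a 0-based epoch under the assumed boundary convention."""
    # bisect_right counts how many boundaries are <= epoch,
    # which is exactly the index of the alpha to use.
    return ALPHAS[bisect_right(BOUNDARIES, epoch)]


if __name__ == "__main__":
    for epoch in (0, 7, 8, 20, 60, 100, 127):
        print(epoch, alpha_for_epoch(epoch))
```

Each step cuts alpha by roughly a factor of the square root of 10, so the schedule drops two orders of magnitude over the 128 epochs.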

In 100k cubeless money games, Player 3.2 scores +0.021ppg +/- 0.004ppg against Player 3.1. 

In 40k cubeless money games against Benchmark 2 it scores +0.131ppg; the prediction from the multivariate regression is +0.152ppg.

In 40k cubeless money games against PubEval it scores +0.574ppg; the prediction from the multivariate regression is +0.547ppg.
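
As a rough check on how much noise to expect in these match results, the sketch below backs out a per-game standard deviation from the 100k-game result and rescales it to 40k games. It assumes the +/- 0.004ppg figure is one standard error, that games are independent, and that the per-game spread is similar across opponents. None of that is stated above, so treat the output as indicative only.

```python
import math


def implied_per_game_std(std_err: float, n_games: int) -> float:
    """Per-game standard deviation implied by a standard error of the mean."""
    return std_err * math.sqrt(n_games)


def std_err_for(n_games: int, per_game_std: float) -> float:
    """Standard error of the mean score over n_games independent games."""
    return per_game_std / math.sqrt(n_games)


# 100k games vs Player 3.1: +0.021ppg +/- 0.004ppg (assumed to be one standard error).
sigma = implied_per_game_std(0.004, 100_000)   # roughly 1.26 points per game
z_vs_31 = 0.021 / 0.004                        # about 5.25 standard errors from zero

# Rescale to the 40k-game matches against Benchmark 2 and PubEval.
se_40k = std_err_for(40_000, sigma)            # roughly 0.0063ppg

# Gap between measured score and regression prediction, in standard errors.
gap_benchmark2 = abs(0.131 - 0.152) / se_40k
gap_pubeval = abs(0.574 - 0.547) / se_40k

if __name__ == "__main__":
    print(f"sigma={sigma:.3f} se_40k={se_40k:.4f} z={z_vs_31:.2f}")
    print(f"gaps: {gap_benchmark2:.1f}, {gap_pubeval:.1f} standard errors")
```

Under those assumptions the edge over Player 3.1 sits well outside the noise, while the regression predictions miss the measured scores by a few standard errors in each direction.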
