mirror of https://github.com/gryf/coach.git synced 2025-12-17 11:10:20 +01:00
Commit Graph

36 Commits

Author SHA1 Message Date
Gal Leibovich
2807c29f27 fix for measurements in the initial state (fix for DFP) 2018-05-29 16:47:38 +03:00
itaicaspi-intel
7725dabc86 checkpoints bug fix 2018-05-26 17:49:13 +03:00
itaicaspi-intel
462c6e314b bug fix in nec checkpoint saving 2018-05-24 15:15:33 +03:00
Itai Caspi
d302168c8c Parallel agents fixes (#95)
* Parallel-agents-related bug fixes: checkpoint restore, TensorBoard integration.
* Added narrow network support.
* Added reference code for an unlimited number of checkpoints.
2018-05-24 14:24:19 +03:00
Gal Novik
dafdb05a7c bug fixes for clippedppo and checkpoints 2018-04-30 15:13:29 +03:00
Itai Caspi
52eb159f69 multiple bug fixes in dealing with measurements + CartPole_DFP preset (#92) 2018-04-23 10:44:46 +03:00
Itai Caspi
a7206ed702 Multiple improvements and bug fixes (#66)
* Multiple improvements and bug fixes:

    * Using lazy stacking to save on memory when using a replay buffer (see the sketch after this entry)
    * Remove step counting for evaluation episodes
    * Reset game between heatup and training
    * Major bug fixes in NEC (now reproduces the paper results for Pong)
    * Image input rescaling to 0-1 is now optional
    * Change the terminal title to be the experiment name
    * Observation cropping for Atari is now optional
    * Added a random number of no-op actions for Gym, to match the DQN paper
    * Fixed a bug where evaluation episodes wouldn't start with the max possible ALE lives
    * Added a script for plotting the results of an experiment over all the Atari games
2018-02-26 12:29:07 +02:00
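The lazy-stacking item above refers to a common replay-buffer trick: store references to the individual frames and stack them only when an observation is actually read, so overlapping frame stacks share memory instead of duplicating it. A minimal sketch of the idea (the class name and interface are illustrative, not Coach's actual implementation):

    import numpy as np

    class LazyStack(object):
        # Keeps references to the raw frames and materializes the stacked
        # observation only when it is read, so consecutive transitions
        # that share frames also share memory.
        def __init__(self, frames):
            self._frames = frames  # e.g. the last 4 ALE screens, as np.ndarrays

        def __array__(self, dtype=None):
            # np.asarray(lazy_stack) triggers this; stacking happens on demand.
            out = np.stack(self._frames, axis=-1)
            return out if dtype is None else out.astype(dtype)

    # Store LazyStack objects in the replay buffer instead of pre-stacked
    # arrays, and call np.asarray(...) only when sampling a training batch.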
Zach Dwiel
86362683b1 comment 2018-02-21 10:05:57 -05:00
Zach Dwiel
8fc24a2bbe fix bc_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
d8f5a35013 fix qr_dqn_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
e1ad86417f fix n_step_q_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
5cf10e5f52 fix bug in ddpg 2018-02-21 10:05:57 -05:00
Zach Dwiel
8248caf35e fix more agents 2018-02-21 10:05:57 -05:00
Zach Dwiel
98f57a0d87 fix ddpg 2018-02-21 10:05:57 -05:00
Zach Dwiel
943e41ba58 fix nec_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
ee6e0bdc3b fix keep_dims -> keepdims 2018-02-21 10:05:57 -05:00
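This rename tracks TensorFlow's deprecation of the keep_dims argument in favor of keepdims on its reduction ops (TF 1.5 and later). A minimal before/after sketch of the kind of call site this touches, assuming TF 1.x (the tensor shape is illustrative):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 8])

    # Before: deprecated spelling, warns in TF 1.5+ and is removed later.
    # mean = tf.reduce_mean(x, axis=1, keep_dims=True)

    # After: the spelling this commit switches to.
    mean = tf.reduce_mean(x, axis=1, keepdims=True)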
Zach Dwiel
39a28aba95 fix clipped ppo 2018-02-21 10:05:57 -05:00
Zach Dwiel
85afb86893 temp commit 2018-02-21 10:05:57 -05:00
Itai Caspi
55c8c87afc allow visualizing the observation + bug fixes to coach summary 2018-02-15 13:47:14 +02:00
Itai Caspi
ba96e585d2 appending CSVs from the logger instead of rewriting them 2018-02-12 14:52:50 +02:00
Gal Leibovich
7c8962c991 adding support for TensorBoard (#52)
* bug fix in architecture.py where additional fetches would acquire more entries than they should
* change in run_test to allow ignoring some test(s)
2018-02-05 15:21:49 +02:00
Zach Dwiel
fff8c8f568 provide a helpful error message in the event that an exploration policy returns a vector of actions instead of a single action in a value optimization agent 2018-01-20 14:11:24 -05:00
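A minimal sketch of the kind of guard this commit describes; the helper name and message wording are illustrative, not Coach's actual code:

    import numpy as np

    def validate_discrete_action(action):
        # Value optimization agents expect a single discrete action index;
        # raise a descriptive error if the exploration policy returned a vector.
        if np.ndim(action) > 0 and np.size(action) > 1:
            raise ValueError(
                "The exploration policy returned a vector of actions ({!r}) "
                "instead of a single action. Check that the policy matches "
                "a discrete action space.".format(action))
        return action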
Itai Caspi
eeb3ec5497 fixed the LSTM middleware initialization 2018-01-09 10:26:15 +02:00
Zach Dwiel
6c79a442f2 update nec and value optimization agents to work with recurrent middleware 2018-01-05 20:16:51 -05:00
Zach Dwiel
37e317682b allow missing carla environment and missing matplotlib package 2017-12-20 11:47:14 +02:00
Itai Caspi
125c7ee38d Release 0.9
Main changes are detailed below:

New features -
* CARLA 0.7 simulator integration
* Human control of the game play
* Recording of human game play and storing / loading the replay buffer
* Behavioral cloning agent and presets
* Golden tests for several presets
* Selecting between deep / shallow image embedders
* Rendering through pygame (with some boost in performance)

API changes -
* Improved environment wrapper API
* Added an evaluate flag to allow convenient evaluation of existing checkpoints (usage sketch after these notes)
* Improved the frameskip definition in Gym

Bug fixes -
* Fixed loading of checkpoints for agents with more than one network
* Fixed the N-Step Q learning agent's Python 3 compatibility
2017-12-19 19:27:16 +02:00
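A possible invocation of the evaluate flag mentioned above; a sketch only, since the preset name and exact flag spelling are assumptions based on these notes and Coach's CLI conventions, not verified against the v0.9 command line:

    python3 coach.py -p CartPole_DQN --evaluate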
Itai Caspi
11faf19649 QR-DQN bug fix and improvements (#30)
* bug fix - QR-DQN using error instead of abs-error in the quantile huber loss

* improvement - QR-DQN now sorts the quantiles only once instead of batch_size times

* new feature - adding the Breakout QRDQN preset (verified to achieve good results)
2017-11-29 14:01:59 +02:00
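For context, the quantile Huber loss from the QR-DQN paper weights a Huber penalty on the TD error u by the asymmetric factor |tau - 1{u < 0}|, and the Huber term itself must be computed from the absolute error |u|; using the signed error there yields negative losses for negative u, which matches the first fix above. A minimal NumPy sketch (function and variable names are illustrative, not Coach's code):

    import numpy as np

    def quantile_huber_loss(td_errors, taus, kappa=1.0):
        # td_errors: TD errors u per quantile; taus: quantile midpoints.
        abs_u = np.abs(td_errors)
        # The Huber term uses the *absolute* error; using the signed error
        # here was the bug being fixed.
        huber = np.where(abs_u <= kappa,
                         0.5 * td_errors ** 2,
                         kappa * (abs_u - 0.5 * kappa))
        # Asymmetric quantile weighting |tau - 1{u < 0}|.
        weight = np.abs(taus - (td_errors < 0).astype(np.float64))
        return np.mean(weight * huber)

The second fix is consistent with the quantile midpoints being fixed per network head, so any sorting of them can happen once rather than once per sample in the batch.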
galleibo-intel
3c330768f0 Fix for NEC not saving the DND when saving a model 2017-11-09 19:13:23 +02:00
galleibo-intel
f47b8092af fix for intel optimized tensorflow on distributed runs + adding coach_env to .gitignore 2017-11-06 19:41:32 +02:00
Itai Caspi
a8bce9828c new feature - implementation of Quantile Regression DQN (https://arxiv.org/pdf/1710.10044v1.pdf)
API change - Distributional DQN renamed to Categorical DQN
2017-11-01 15:09:07 +02:00
Itai Caspi
913ab75e8a bug fix - preventing crashes when the probability of one of the actions is 0 in the policy head 2017-10-31 10:51:48 +02:00
Itai Caspi
1918f16079 improved API for getting / setting variables within the graph 2017-10-31 10:51:48 +02:00
cxx
f43c951c2d Unify base class using new-style (object). 2017-10-26 12:33:09 +03:00
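For context, a Python 2 class is only new-style when it inherits (directly or indirectly) from object; old-style classes lack descriptors, proper super(), and the unified type system. The change amounts to the following (the class name is illustrative):

    # Old-style under Python 2 (no `object` base):
    class Agent:
        pass

    # New-style, behaving consistently under Python 2 and 3:
    class Agent(object):
        pass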
Itai Caspi
39cf78074c preventing the evaluation agent from getting stuck in bad policies by updating from the global network during episodes 2017-10-25 10:28:45 +03:00
Gal Leibovich
eb0b57d7fa Updating PPO references per issue #11 2017-10-24 16:57:44 +03:00
Gal Leibovich
1d4c3455e7 coach v0.8.0 2017-10-19 13:10:15 +03:00