mirror of https://github.com/gryf/coach.git synced 2025-12-17 11:10:20 +01:00
Commit Graph

36 Commits

Author SHA1 Message Date
Gal Leibovich
2807c29f27 fix for measurements in the initial state (fix for DFP) 2018-05-29 16:47:38 +03:00
itaicaspi-intel
7725dabc86 checkpoints bug fix 2018-05-26 17:49:13 +03:00
itaicaspi-intel
462c6e314b bug fix in nec checkpoint saving 2018-05-24 15:15:33 +03:00
Itai Caspi
d302168c8c Parallel agents fixes (#95)
* Parallel-agents-related bug fixes: checkpoint restore, TensorBoard integration.
* Added narrow network support.
* Added reference code for an unlimited number of checkpoints.
2018-05-24 14:24:19 +03:00
Gal Novik
dafdb05a7c bug fixes for clippedppo and checkpoints 2018-04-30 15:13:29 +03:00
Itai Caspi
52eb159f69 multiple bug fixes in dealing with measurements + CartPole_DFP preset (#92) 2018-04-23 10:44:46 +03:00
Itai Caspi
a7206ed702 Multiple improvements and bug fixes (#66)
* Multiple improvements and bug fixes:

    * Using lazy stacking to save on memory when using a replay buffer (see the sketch after this entry)
    * Remove step counting for evaluation episodes
    * Reset game between heatup and training
    * Major bug fixes in NEC (now reproduces the paper results for Pong)
    * Image input rescaling to 0-1 is now optional
    * Change the terminal title to be the experiment name
    * Observation cropping for Atari is now optional
    * Added a random number of no-op actions for Gym, to match the DQN paper
    * Fixed a bug where evaluation episodes wouldn't start with the max possible ALE lives
    * Added a script for plotting the results of an experiment over all the Atari games
2018-02-26 12:29:07 +02:00
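The lazy-stacking item above refers to a common replay-buffer trick: store references to the individual frames and stack them only when an observation is actually read, so overlapping frame stacks share memory instead of duplicating it. A minimal sketch of the idea (the class name and interface are illustrative, not Coach's actual implementation):

    import numpy as np

    class LazyStack(object):
        # Keeps references to the raw frames and materializes the stacked
        # observation only when it is read, so consecutive transitions
        # that share frames also share memory.
        def __init__(self, frames):
            self._frames = frames  # e.g. the last 4 ALE screens, as np.ndarrays

        def __array__(self, dtype=None):
            # np.asarray(lazy_stack) triggers this; stacking happens on demand.
            out = np.stack(self._frames, axis=-1)
            return out if dtype is None else out.astype(dtype)

    # Store LazyStack objects in the replay buffer instead of pre-stacked
    # arrays, and call np.asarray(...) only when sampling a training batch.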
Zach Dwiel
86362683b1 comment 2018-02-21 10:05:57 -05:00
Zach Dwiel
8fc24a2bbe fix bc_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
d8f5a35013 fix qr_dqn_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
e1ad86417f fix n_step_q_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
5cf10e5f52 fix bug in ddpg 2018-02-21 10:05:57 -05:00
Zach Dwiel
8248caf35e fix more agents 2018-02-21 10:05:57 -05:00
Zach Dwiel
98f57a0d87 fix ddpg 2018-02-21 10:05:57 -05:00
Zach Dwiel
943e41ba58 fix nec_agent 2018-02-21 10:05:57 -05:00
Zach Dwiel
ee6e0bdc3b fix keep_dims -> keepdims 2018-02-21 10:05:57 -05:00
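This rename tracks TensorFlow's deprecation of the keep_dims argument in favor of keepdims on its reduction ops (TF 1.5 and later). A minimal before/after sketch of the kind of call site this touches, assuming TF 1.x (the tensor shape is illustrative):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 8])

    # Before: deprecated spelling, warns in TF 1.5+ and is removed later.
    # mean = tf.reduce_mean(x, axis=1, keep_dims=True)

    # After: the spelling this commit switches to.
    mean = tf.reduce_mean(x, axis=1, keepdims=True)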
Zach Dwiel
39a28aba95 fix clipped ppo 2018-02-21 10:05:57 -05:00
Zach Dwiel
85afb86893 temp commit 2018-02-21 10:05:57 -05:00
Itai Caspi
55c8c87afc allow visualizing the observation + bug fixes to coach summary 2018-02-15 13:47:14 +02:00
Itai Caspi
ba96e585d2 appending CSVs from the logger instead of rewriting them 2018-02-12 14:52:50 +02:00
Gal Leibovich
7c8962c991 adding support for TensorBoard (#52)
* bug fix in architecture.py where additional fetches would acquire more entries than they should
* change in run_test to allow ignoring some test(s)
2018-02-05 15:21:49 +02:00
Zach Dwiel
fff8c8f568 provide a helpful error message in the event that an exploration policy returns a vector of actions instead of a single action in a value optimization agent 2018-01-20 14:11:24 -05:00
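A minimal sketch of the kind of guard this commit describes; the helper name and message wording are illustrative, not Coach's actual code:

    import numpy as np

    def validate_discrete_action(action):
        # Value optimization agents expect a single discrete action index;
        # raise a descriptive error if the exploration policy returned a vector.
        if np.ndim(action) > 0 and np.size(action) > 1:
            raise ValueError(
                "The exploration policy returned a vector of actions ({!r}) "
                "instead of a single action. Check that the policy matches "
                "a discrete action space.".format(action))
        return action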
Itai Caspi
eeb3ec5497 fixed the LSTM middleware initialization 2018-01-09 10:26:15 +02:00
Zach Dwiel
6c79a442f2 update nec and value optimization agents to work with recurrent middleware 2018-01-05 20:16:51 -05:00
Zach Dwiel
37e317682b allow missing carla environment and missing matplotlib package 2017-12-20 11:47:14 +02:00
Itai Caspi
125c7ee38d Release 0.9
Main changes are detailed below:

New features -
* CARLA 0.7 simulator integration
* Human control of the game play
* Recording of human game play and storing / loading the replay buffer
* Behavioral cloning agent and presets
* Golden tests for several presets
* Selecting between deep / shallow image embedders
* Rendering through pygame (with some boost in performance)

API changes -
* Improved environment wrapper API
* Added an evaluate flag to allow convenient evaluation of existing checkpoints (usage sketch after these notes)
* Improved the frameskip definition in Gym

Bug fixes -
* Fixed loading of checkpoints for agents with more than one network
* Fixed the N-Step Q learning agent's Python 3 compatibility
2017-12-19 19:27:16 +02:00
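A possible invocation of the evaluate flag mentioned above; a sketch only, since the preset name and exact flag spelling are assumptions based on these notes and Coach's CLI conventions, not verified against the v0.9 command line:

    python3 coach.py -p CartPole_DQN --evaluate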
Itai Caspi
11faf19649 QR-DQN bug fix and improvements (#30)
* bug fix - QR-DQN using error instead of abs-error in the quantile huber loss

* improvement - QR-DQN now sorts the quantiles only once instead of batch_size times

* new feature - adding the Breakout QRDQN preset (verified to achieve good results)
2017-11-29 14:01:59 +02:00
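For context, the quantile Huber loss from the QR-DQN paper weights a Huber penalty on the TD error u by the asymmetric factor |tau - 1{u < 0}|, and the Huber term itself must be computed from the absolute error |u|; using the signed error there yields negative losses for negative u, which matches the first fix above. A minimal NumPy sketch (function and variable names are illustrative, not Coach's code):

    import numpy as np

    def quantile_huber_loss(td_errors, taus, kappa=1.0):
        # td_errors: TD errors u per quantile; taus: quantile midpoints.
        abs_u = np.abs(td_errors)
        # The Huber term uses the *absolute* error; using the signed error
        # here was the bug being fixed.
        huber = np.where(abs_u <= kappa,
                         0.5 * td_errors ** 2,
                         kappa * (abs_u - 0.5 * kappa))
        # Asymmetric quantile weighting |tau - 1{u < 0}|.
        weight = np.abs(taus - (td_errors < 0).astype(np.float64))
        return np.mean(weight * huber)

The second fix is consistent with the quantile midpoints being fixed per network head, so any sorting of them can happen once rather than once per sample in the batch.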
galleibo-intel
3c330768f0 Fix for NEC not saving the DND when saving a model 2017-11-09 19:13:23 +02:00
galleibo-intel
f47b8092af fix for intel optimized tensorflow on distributed runs + adding coach_env to .gitignore 2017-11-06 19:41:32 +02:00
Itai Caspi
a8bce9828c new feature - implementation of Quantile Regression DQN (https://arxiv.org/pdf/1710.10044v1.pdf)
API change - Distributional DQN renamed to Categorical DQN
2017-11-01 15:09:07 +02:00
Itai Caspi
913ab75e8a bug fix - preventing crashes when the probability of one of the actions is 0 in the policy head 2017-10-31 10:51:48 +02:00
Itai Caspi
1918f16079 improved API for getting / setting variables within the graph 2017-10-31 10:51:48 +02:00
cxx
f43c951c2d Unify base class using new-style (object). 2017-10-26 12:33:09 +03:00
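For context, a Python 2 class is only new-style when it inherits (directly or indirectly) from object; old-style classes lack descriptors, proper super(), and the unified type system. The change amounts to the following (the class name is illustrative):

    # Old-style under Python 2 (no `object` base):
    class Agent:
        pass

    # New-style, behaving consistently under Python 2 and 3:
    class Agent(object):
        pass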
Itai Caspi
39cf78074c preventing the evaluation agent from getting stuck in bad policies by updating from the global network during episodes 2017-10-25 10:28:45 +03:00
Gal Leibovich
eb0b57d7fa Updating PPO references per issue #11 2017-10-24 16:57:44 +03:00
Gal Leibovich
1d4c3455e7 coach v0.8.0 2017-10-19 13:10:15 +03:00