Guy Jacob
f52ff1784d
Fix breaking change from minio update ( #469 )
`ResponseError` replaced by `S3Error` in new minio version
2020-12-15 10:02:16 +02:00
Zach Dwiel
7b0fccb041
Add RedisDataStore ( #295 )
* GraphManager.set_session also sets self.sess
* make sure that GraphManager.fetch_from_worker uses training phase
* remove unnecessary phase setting in training worker
* reorganize rollout worker
* provide default name to GlobalVariableSaver.__init__ since it isn't really used anyway
* allow dividing TrainingSteps and EnvironmentSteps
* add timestamps to the log
* added redis data store
* conflict merge fix
2019-08-28 21:15:58 +03:00
Ajay Deshpande
33dc29ee99
Uploading checkpoint if crd provided ( #191 )
* Uploading checkpoint if crd provided
* Changing the calculation of total steps because of a recent change in core_types
Fixes #195
2019-04-26 12:27:33 -07:00
Gal Novik
fc6604c09c
added missing license headers
2018-11-27 22:43:40 +02:00
Balaji Subramaniam
d06197f663
Add documentation on distributed Coach. ( #158 )
* Added documentation on distributed Coach.
2018-11-27 12:26:15 +02:00
Balaji Subramaniam
bf2036b284
S3 optimization - save only the latest checkpoint. ( #148 )
2018-11-23 22:17:36 -08:00
Balaji Subramaniam
13d2679af4
Sync experiment dir, videos, gifs to S3. ( #147 )
2018-11-23 20:52:12 -08:00
Sina Afrooze
5332013bd1
Implement framework-agnostic rollout and training workers ( #137 )
* Added checkpoint state file to coach checkpointing.
* Removed TF specific code from rollout_worker, training_worker, and s3_data_store
2018-11-23 18:05:44 -08:00
Cody Hsieh
dd18959e53
Don't download when checkpoint files are already present ( #109 )
* add check if checkpoint file present
2018-11-21 15:32:53 -08:00
Gal Leibovich
d4d06aaea6
remove kubernetes dependency ( #117 )
2018-11-18 18:10:22 +02:00
Ajay Deshpande
875d6ef017
Adding target reward and target success ( #58 )
* Adding target reward
* Adding target success
* Addressing comments
* Using custom_reward_threshold and target_success_rate
* Adding exit message
* Moving success rate to environment
* Making target_success_rate optional
2018-11-12 15:03:43 -08:00
Ajay Deshpande
0f46877d7e
Adding steps and waiting for new checkpoint
2018-10-23 16:55:37 -04:00
Ajay Deshpande
5eac0102de
Changing exception type
2018-10-23 16:54:43 -04:00
Ajay Deshpande
a7f5442015
Adding should_train helper and should_train in graph_manager
2018-10-23 16:54:43 -04:00
Ajay Deshpande
a2e57a44f1
Getting only the model_checkpoint_path files
2018-10-23 16:54:43 -04:00
Ajay Deshpande
052bbc8f19
Adding lock in s3
2018-10-23 16:54:43 -04:00
Balaji Subramaniam
844a5af831
Make distributed coach work end-to-end.
- With data store, memory backend and orchestrator interfaces.
2018-10-23 16:54:43 -04:00
Balaji Subramaniam
1c238b4c60
Added data store backend. ( #17 )
* Added data store backend.
* Add NFS implementation for Kubernetes.
* Added S3 data store implementation.
* Addressed review comments.
2018-10-23 16:52:16 -04:00