mirror of https://github.com/gryf/coach.git synced 2025-12-18 11:40:18 +01:00
Commit Graph

339 Commits

Author SHA1 Message Date
Zach Dwiel
517aac163a introduce graph_manager.phase_context; make sure that calls to graph_manager.train automatically set training phase 2018-10-23 16:57:43 -04:00
Zach Dwiel
7382a142bb remove unused steps parameter from GraphManager.train 2018-10-23 16:57:06 -04:00
Zach Dwiel
97f608ee5e reorder failing presets 2018-10-23 16:57:05 -04:00
Zach Dwiel
ad68fa263d remove property GraphManager.training_start_time 2018-10-23 16:57:05 -04:00
Zach Dwiel
bfc320cf83 disable failing tests for now 2018-10-23 16:57:05 -04:00
Zach Dwiel
01f3a0594b remove return values from GraphManager.act 2018-10-23 16:57:05 -04:00
Zach Dwiel
b02f269464 graph_manager:heatup uses total_steps_counters looping mechanism like other loops. graph_manager:act no longer needs to return any values 2018-10-23 16:57:05 -04:00
Balaji Subramaniam
ca9015d8b1 Make NFS work end-to-end. 2018-10-23 16:55:37 -04:00
Ajay Deshpande
fb1039fcb5 Checkpoint and evaluation optimizations 2018-10-23 16:55:37 -04:00
Ajay Deshpande
b285a02023 Adding parameteres, checking transitions before training 2018-10-23 16:55:37 -04:00
Ajay Deshpande
0f46877d7e Adding steps and waiting for new checkpoint 2018-10-23 16:55:37 -04:00
Ajay Deshpande
0e121c5762 Ignoring redis sub if testing 2018-10-23 16:55:37 -04:00
Ajay Deshpande
7f00235ed5 waiting for a new checkpoint if it's available 2018-10-23 16:54:43 -04:00
Ajay Deshpande
5eac0102de Changing exception type 2018-10-23 16:54:43 -04:00
Ajay Deshpande
a7f5442015 Adding should_train helper and should_train in graph_manager 2018-10-23 16:54:43 -04:00
Ajay Deshpande
a2e57a44f1 Getting only the model_checkpoint_path files 2018-10-23 16:54:43 -04:00
Ajay Deshpande
052bbc8f19 Adding lock in s3 2018-10-23 16:54:43 -04:00
Balaji Subramaniam
844a5af831 Make distributed coach work end-to-end.
- With data store, memory backend and orchestrator interfaces.
2018-10-23 16:54:43 -04:00
Zach Dwiel
9f92064e67 cleanup graph_manager:act 2018-10-23 16:53:32 -04:00
Zach Dwiel
b5305bd075 update dockerfile 2018-10-23 16:52:16 -04:00
Zach Dwiel
950f261201 extract method all_presets 2018-10-23 16:52:16 -04:00
Zach Dwiel
ed3a3b39be add comments 2018-10-23 16:52:16 -04:00
Zach Dwiel
04038c9f40 improve integration test output format 2018-10-23 16:52:16 -04:00
Balaji Subramaniam
1c238b4c60 Added data store backend. (#17)
* Added data store backend.
* Add NFS implementation for Kubernetes.
* Added S3 data store implementation.
* Addressed review comments.
2018-10-23 16:52:16 -04:00
Ajay Deshpande
6b2de6ba6d Adding initial interface for backend and redis pubsub (#19)
* Adding initial interface for backend and redis pubsub
* Addressing comments, adding super in all memories
* Removing distributed experience replay
2018-10-23 16:51:48 -04:00
Zach Dwiel
a54ef2757f ignore deprecation warnings in test logging 2018-10-23 16:51:48 -04:00
Zach Dwiel
acc7f70de3 enumerate each preset as its own test 2018-10-23 16:51:48 -04:00
Zach Dwiel
1e83a27bee update dockerfile and makefile 2018-10-23 16:51:48 -04:00
Zach Dwiel
67faa80ea0 allow custom number of training steps 2018-10-23 16:51:48 -04:00
Zach Dwiel
d69332efd4 fixed bug in training worker 2018-10-23 16:51:48 -04:00
Zach Dwiel
cd733b2404 add support for running kubernetes orchestrator from behind proxy 2018-10-23 16:51:48 -04:00
Zach Dwiel
ad4d2c3053 add make stop_kubernetes 2018-10-23 16:51:48 -04:00
Zach Dwiel
5e85a0f972 use the number of heat up steps specified in schedule parameters 2018-10-23 16:51:48 -04:00
Ajay Deshpande
98850464cc Adding nfs pv, pvc, waiting for memory to be full 2018-10-23 16:50:48 -04:00
Zach Dwiel
13d81f65b9 add redis options to training worker 2018-10-23 16:47:46 -04:00
Zach Dwiel
04f32a0f02 add heatup step to training worker 2018-10-23 16:47:46 -04:00
Zach Dwiel
7c1f0dce4f include registry in image name 2018-10-23 16:47:46 -04:00
Zach Dwiel
0812a94fbd first pass at kubernetes 2018-10-23 16:47:46 -04:00
Zach Dwiel
3328b25549 reenable redis; better error message 2018-10-23 16:47:46 -04:00
Zach Dwiel
009cf670f3 fix simple typos; temporarily disable redis in rollout worker 2018-10-23 16:47:46 -04:00
Zach Dwiel
f5b7122d56 weight for checkpoint before trying to start rollout worker 2018-10-23 16:47:46 -04:00
Zach Dwiel
4352d6735d add training worker 2018-10-23 16:47:46 -04:00
Ajay Deshpande
28926bf2a4 Changing parameters 2018-10-23 16:47:46 -04:00
Ajay Deshpande
c2991819b4 Adding right arguments to the agent 2018-10-23 16:46:04 -04:00
Ajay Deshpande
ad7f031031 Adding dockerfile 2018-10-23 16:46:04 -04:00
Ajay Deshpande
ce9838a7d6 Adding kubernetes orchestrator for rollouts, adding requirements for incremental docker builds 2018-10-23 16:46:04 -04:00
Zach Dwiel
6541bc76b9 working checkpoints 2018-10-23 16:41:57 -04:00
Zach Dwiel
433bc3e27b standardizing variable access 2018-10-23 16:40:33 -04:00
Zach Dwiel
e34b9ae9cf allow specifying preset as a commandline parameter to rollout worker 2018-10-23 16:40:33 -04:00
Zach Dwiel
3714d8ec80 extract functions display_all_presets_and_exit, expand_preset 2018-10-23 16:40:33 -04:00