Generate a dataset using an agent (with an option to choose between this and a random dataset)
* kNN-based model for predicting which actions to drop
* fix for seeds with batch RL
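
For reviewers unfamiliar with the kNN item, here is a rough sketch of the general idea only, not the code in this change. It assumes each training example is a feature vector for a (state, action) pair with a boolean "drop" label; the function name `knn_predict_drop`, the feature layout, and `k=3` are all hypothetical.

```python
import math
from collections import Counter


def knn_predict_drop(train_features, train_labels, query, k=3):
    """Hypothetical helper: majority vote of the k nearest labelled examples."""
    # Euclidean distance from the query to every labelled example
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest examples and vote over their "drop" labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): True means "drop this action"
features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [True, True, True, False, False, False]
print(knn_predict_drop(features, labels, (0.5, 0.5)))
```

Using an odd `k` with binary labels avoids tied votes; a real model would likely need feature scaling and a tuned `k`.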