Evaluation Protocols
EvaluationPipeline
-
class
causal_world.evaluation.
EvaluationPipeline
(evaluation_protocols, tracker_path=None, world_params=None, task_params=None, visualize_evaluation=False, initial_seed=0) This class provides functionality to evaluate a trained policy on a set of protocols.
- Parameters
evaluation_protocols – (list) defines the protocols that will be evaluated in this pipeline.
tracker_path – (causal_world.loggers.Tracker) if a tracker was stored during training, it can be passed here.
world_params – (dict) the world_params to set up the environment, including skip_frame, normalization params, etc.
task_params – (dict) the task_params of the Task on which the policy is going to be evaluated.
visualize_evaluation – (bool) whether the evaluation is visualized in the GUI.
initial_seed – (int) the random seed of the evaluation for reproducibility.
-
evaluate_policy
(policy, fraction=1) Runs the evaluation of a policy and returns an evaluation dictionary with the scores for each metric for each protocol.
- Parameters
policy – (func) the policy_fn that takes an observation as argument and returns the inferred action
fraction – (float) fraction of episodes to be evaluated relative to the default number of episodes (can be greater than one).
- Returns
(dict) scores dict for each metric for each protocol.
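As a rough illustration of the two arguments above, the sketch below builds a policy function of the expected shape (observation in, action out) and shows one plausible way `fraction` could scale an episode count. `make_random_policy`, `episodes_to_run`, the action dimension of 9, and the truncation rule are assumptions made for this example; they are not part of the `causal_world` API.

```python
import random


def make_random_policy(action_dim, low=-1.0, high=1.0, seed=0):
    """Return a policy_fn mapping an observation to a random action (hypothetical helper)."""
    rng = random.Random(seed)

    def policy_fn(observation):
        # The observation is ignored here; a trained policy would use it.
        return [rng.uniform(low, high) for _ in range(action_dim)]

    return policy_fn


def episodes_to_run(default_episodes, fraction=1):
    """Scale a default episode count by `fraction` (assumed rule: truncate to int)."""
    return int(default_episodes * fraction)


policy = make_random_policy(action_dim=9)  # 9 is an illustrative action size
action = policy([0.0] * 10)  # dummy observation
```

A function with this shape is what would be handed to `evaluate_policy` as `policy`; a `fraction` below one would shorten the evaluation, and a value above one would lengthen it.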
-
get_metric_scores
() Returns the metric scores of all metrics in the evaluation pipeline.
- Returns
(dict) a score dictionary containing the score for each metric name as key.
-
process_metrics
(episode) Processes an episode to compute all the metrics of the evaluation pipeline.
- Parameters
episode – (causal_world.loggers.Episode) The episode to be processed.
- Returns
(None)
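The relationship between `process_metrics` and `get_metric_scores` can be pictured as accumulate, then report. The following is a minimal sketch with stand-in types: `MeanFinalSuccess`, the representation of an episode as a list of per-step success values, and the two module-level helper functions are all hypothetical and do not mirror the library's own metric classes.

```python
class MeanFinalSuccess:
    """Hypothetical metric: mean of the last success value of each episode."""

    name = "mean_final_success"

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def process_episode(self, episode):
        # `episode` is assumed to be a non-empty list of per-step success values.
        self._total += episode[-1]
        self._count += 1

    def get_metric_score(self):
        return self._total / self._count if self._count else 0.0


def process_metrics(metrics, episode):
    """Feed one episode to every metric; returns None."""
    for metric in metrics:
        metric.process_episode(episode)


def get_metric_scores(metrics):
    """Collect a score dictionary keyed by metric name."""
    return {metric.name: metric.get_metric_score() for metric in metrics}


metrics = [MeanFinalSuccess()]
for episode in ([0.0, 0.5, 1.0], [0.0, 0.25, 0.5]):
    process_metrics(metrics, episode)
scores = get_metric_scores(metrics)
```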
Protocol
-
class
causal_world.evaluation.
ProtocolBase
(name) Base Protocol from which each EvaluationProtocol inherits. The default number of evaluation episodes is 200.
- Parameters
name – (str) name of the protocol
-
get_intervention
(episode, timestep) Returns the interventions that are applied at a given timestep of the episode.
- Parameters
episode – (int) episode number of the protocol
timestep – (int) time step within episode
- Returns
(dict) intervention dictionary
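A custom protocol would typically override `get_intervention`. The sketch below uses a stand-in base class so that it runs without the library installed; `ProtocolBaseStandIn`, `GoalHeightProtocol`, the `goal_height` variable name, and the convention of intervening only at timestep 0 are assumptions for illustration.

```python
class ProtocolBaseStandIn:
    """Stand-in for ProtocolBase so the sketch runs without causal_world."""

    def __init__(self, name):
        self.name = name

    def get_intervention(self, episode, timestep):
        raise NotImplementedError


class GoalHeightProtocol(ProtocolBaseStandIn):
    """Hypothetical protocol: raises the goal a little more each episode."""

    def __init__(self, base_height=0.05, step=0.01):
        super().__init__("goal_height")
        self.base_height = base_height
        self.step = step

    def get_intervention(self, episode, timestep):
        if timestep == 0:
            # Intervene once, at the start of the episode.
            return {"goal_height": self.base_height + self.step * episode}
        # No intervention on later time steps (assumed convention).
        return {}


protocol = GoalHeightProtocol()
first = protocol.get_intervention(episode=0, timestep=0)
later = protocol.get_intervention(episode=3, timestep=5)
```

Keeping the intervention a pure function of `episode` and `timestep` makes a protocol repeatable across evaluated policies, which is what allows their scores to be compared.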
-
ProtocolGenerator
-
class
causal_world.evaluation.protocols.
ProtocolGenerator
(name, first_level_regex, second_level_regex, variable_space='space_a_b')
-
__init__
(name, first_level_regex, second_level_regex, variable_space='space_a_b') This specifies a random protocol, where an intervention is produced on every exposed variable matched by the given regexes by uniformly sampling the intervention space.
- Parameters
name – (str) specifies the name of the protocol to be reported.
first_level_regex – (str) specifies the regex for first level of variables.
second_level_regex – (str) specifies the regex for second level of variables.
variable_space – (str) the intervention space to sample from: “space_a”, “space_b” or “space_a_b”.
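One way to read the two regex parameters is that `first_level_regex` selects top-level variables and `second_level_regex` selects the sub-variables nested under them. The sketch below assumes the intervention space is a nested dict of `(low, high)` bounds; `sample_matching`, the variable names, and the ranges are hypothetical and only illustrate the selection logic.

```python
import random
import re


def sample_matching(space, first_level_regex, second_level_regex, rng):
    """Uniformly sample every variable whose name(s) fully match the regexes."""
    intervention = {}
    for variable, bounds in space.items():
        if not re.fullmatch(first_level_regex, variable):
            continue
        if isinstance(bounds, dict):
            # Nested variable: filter its sub-variables with the second regex.
            selected = {
                sub: rng.uniform(low, high)
                for sub, (low, high) in bounds.items()
                if re.fullmatch(second_level_regex, sub)
            }
            if selected:
                intervention[variable] = selected
        else:
            low, high = bounds
            intervention[variable] = rng.uniform(low, high)
    return intervention


# Illustrative space; names and ranges are made up for this example.
space = {
    "floor_friction": (0.3, 0.8),
    "tool_block": {"mass": (0.05, 0.1), "size": (0.04, 0.08)},
    "goal_block": {"mass": (0.05, 0.1)},
}
sampled = sample_matching(space, r"(tool|goal)_block", r"mass", random.Random(0))
```

Here only the `mass` of the two block variables is sampled; `floor_friction` and `size` are left untouched because they do not match the patterns.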
-
FullyRandomProtocol
-
class
causal_world.evaluation.protocols.
FullyRandomProtocol
(name, variable_space='space_a_b')
-
__init__
(name, variable_space='space_a_b') This specifies a fully random protocol, where an intervention is produced on every exposed variable by uniformly sampling the intervention space.
- Parameters
name – (str) specifies the name of the protocol to be reported.
variable_space – (str) the intervention space to sample from: “space_a”, “space_b” or “space_a_b”.
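To make the role of `variable_space` concrete, the sketch below samples every variable uniformly from the chosen space. It assumes each variable carries one range for space A and one for space B, and that "space_a_b" spans both ranges; these semantics, along with `bounds_for`, `sample_all`, and the variable names, are assumptions for illustration and not the library's implementation.

```python
import random


def bounds_for(variable_bounds, variable_space):
    """Pick the sampling range of one variable (assumed semantics)."""
    low_a, high_a = variable_bounds["space_a"]
    low_b, high_b = variable_bounds["space_b"]
    if variable_space == "space_a":
        return low_a, high_a
    if variable_space == "space_b":
        return low_b, high_b
    if variable_space == "space_a_b":
        # Assumption: the combined space covers both ranges.
        return min(low_a, low_b), max(high_a, high_b)
    raise ValueError(f"unknown variable_space: {variable_space!r}")


def sample_all(space, variable_space="space_a_b", seed=0):
    """Uniformly sample every variable; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return {
        name: rng.uniform(*bounds_for(bounds, variable_space))
        for name, bounds in space.items()
    }


# Illustrative variables and ranges, made up for this example.
space = {
    "floor_friction": {"space_a": (0.3, 0.6), "space_b": (0.6, 0.8)},
    "block_mass": {"space_a": (0.02, 0.06), "space_b": (0.06, 0.08)},
}
intervention = sample_all(space, "space_b", seed=1)
```

Seeding the generator mirrors the purpose of `initial_seed` in the pipeline: the same seed yields the same sequence of interventions.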
-