Evaluation Protocols

EvaluationPipeline

class causal_world.evaluation.EvaluationPipeline(evaluation_protocols, tracker_path=None, world_params=None, task_params=None, visualize_evaluation=False, initial_seed=0)

This class evaluates a trained policy on a set of protocols.

Parameters

evaluation_protocols – (list) defines the protocols that will be evaluated in this pipeline.
tracker_path – (causal_world.loggers.Tracker) if a tracker was stored during training this can be passed here.
world_params – (dict) the world_params to set up the environment, including skip_frame, normalization params, etc.
task_params – (dict) the task_params of the Task on which the policy is going to be evaluated.
visualize_evaluation – (bool) whether the evaluation is visualized in the GUI.
initial_seed – (int) the random seed of the evaluation for reproducibility.

evaluate_policy(policy, fraction=1)

Runs the evaluation of a policy and returns an evaluation dictionary with the scores for each metric of each protocol.

Parameters

policy – (func) the policy_fn that takes an observation as argument and returns the inferred action.
fraction – (float) fraction of episodes to be evaluated relative to the default number (can be greater than one).

Returns

(dict) scores dict for each metric for each protocol.
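
As a hedged illustration of the policy argument, the sketch below defines a policy function that maps an observation to an action and wraps the pipeline call in a helper. The 9-dimensional action, the list-of-floats action type, the `task_generator_id` key and the `skip_frame` value are assumptions not stated in this section; `random_policy` and `evaluate` are hypothetical names. The CausalWorld import sits inside `evaluate` so that the policy itself can be inspected without the package installed.

```python
import random

ACTION_DIM = 9  # assumption: one value per robot joint; adjust to your action space


def random_policy(observation):
    """Policy function: takes an observation and returns an action.

    The observation is ignored here; a trained policy would use it.
    A real environment may expect an array type instead of a list.
    """
    return [random.uniform(-1.0, 1.0) for _ in range(ACTION_DIM)]


def evaluate(policy_fn, fraction=0.1):
    """Evaluate policy_fn on one fully random protocol (requires causal_world)."""
    from causal_world.evaluation import EvaluationPipeline
    from causal_world.evaluation.protocols import FullyRandomProtocol

    pipeline = EvaluationPipeline(
        evaluation_protocols=[
            FullyRandomProtocol(name="random_a", variable_space="space_a")
        ],
        world_params={"skip_frame": 3},  # example value only
        task_params={"task_generator_id": "reaching"},  # assumed key and value
        initial_seed=0,
    )
    # fraction scales the default number of episodes per protocol.
    return pipeline.evaluate_policy(policy_fn, fraction=fraction)
```

Calling `evaluate(random_policy)` would then return the scores dictionary described above, provided the assumed parameters match your installation.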

get_metric_scores()

Returns the scores of all metrics in the evaluation pipeline.

Returns: (dict) a score dictionary containing the score for each metric name as key.

process_metrics(episode)

Processes an episode to compute all the metrics of the evaluation pipeline.

Parameters: episode – (causal_world.loggers.Episode) The episode to be processed.
Returns: (None)

reset_metric_scores()

Resets the metric scores of each metric object.

Returns: (None)

run_episode(policy_fn)

Returns the episode information that is accumulated when running a policy.

Parameters: policy_fn – (func) the policy_fn that takes an observation as argument and returns the inferred action.
Returns: (causal_world.loggers.Episode) the recorded episode.

save_scores(evaluation_path, prefix=None)

Saves the scores dict as JSON.

Parameters

evaluation_path – (str) the path where the scores are saved.
prefix – (str) an optional prefix to the file name.

Returns: (None)
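
A minimal sketch of what saving a scores dictionary as JSON could look like, using only the standard library. The helper `save_scores_json`, the file name pattern and the sample scores are hypothetical; the file name that `save_scores` actually writes is not documented in this section.

```python
import json
import os
import tempfile


def save_scores_json(scores, evaluation_path, prefix=None):
    """Write scores to <evaluation_path>/[<prefix>_]scores.json (hypothetical naming)."""
    os.makedirs(evaluation_path, exist_ok=True)
    file_name = "scores.json" if prefix is None else f"{prefix}_scores.json"
    file_path = os.path.join(evaluation_path, file_name)
    with open(file_path, "w") as f:
        json.dump(scores, f, indent=2)
    return file_path


# Made-up scores, shaped as protocol -> metric -> value.
example_scores = {"protocol_1": {"mean_success": 0.5}}
saved_path = save_scores_json(example_scores, tempfile.mkdtemp(), prefix="demo")

# Read the file back to confirm the round trip.
with open(saved_path) as f:
    reloaded_scores = json.load(f)
```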

Protocol

class causal_world.evaluation.ProtocolBase(name)

Base Protocol from which each EvaluationProtocol inherits. The default number of evaluation episodes is 200.

Parameters: name – (str) name of the protocol.

get_intervention(episode, timestep)

Returns the interventions that are applied at a given timestep of the episode.

Parameters

episode – (int) episode number of the protocol
timestep – (int) time step within episode

Returns

(dict) intervention dictionary

get_name()

Returns the name of the protocol.

Returns: (str) protocol name

get_num_episodes()

Returns the number of evaluation episodes in this protocol.

Returns: (int) number of episodes in protocol

init_protocol(env, tracker, fraction=1)

Initializes the protocol.

Parameters

env – (CausalWorld) environment
tracker – (Tracker)
fraction – (float) fraction of episodes to be evaluated using the protocol (can be higher than one)

Returns: (None)
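
The interface above can be sketched as a custom protocol that overrides get_intervention. To stay runnable without CausalWorld, the example uses a stand-in base class that mirrors the documented methods; it is not the library class. The way `fraction` scales the episode count, the `None` return value for "no intervention", and the variable names `tool_block` and `mass` are all assumptions made for illustration.

```python
class ProtocolBaseStandIn:
    """Stand-in mirroring the documented ProtocolBase interface (not the library class)."""

    def __init__(self, name):
        self.name = name
        self.num_evaluation_episodes = 200  # default stated in the docs

    def get_name(self):
        return self.name

    def get_num_episodes(self):
        return self.num_evaluation_episodes

    def init_protocol(self, env, tracker, fraction=1):
        self.env = env
        self.tracker = tracker
        # Assumption: fraction scales the default episode count.
        self.num_evaluation_episodes = int(200 * fraction)

    def get_intervention(self, episode, timestep):
        raise NotImplementedError


class MassAtStartProtocol(ProtocolBaseStandIn):
    """Hypothetical protocol: set a block mass once, at the first timestep."""

    def get_intervention(self, episode, timestep):
        if timestep == 0:
            # Variable names are illustrative, not taken from the library.
            return {"tool_block": {"mass": 0.02 + 0.001 * episode}}
        return None  # assumption: None means no intervention at this step


protocol = MassAtStartProtocol("mass_at_start")
protocol.init_protocol(env=None, tracker=None, fraction=0.5)
```

With the real base class, only the subclass would be needed, and an instance could be passed in the evaluation_protocols list of the pipeline.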

ProtocolGenerator

class causal_world.evaluation.protocols.ProtocolGenerator(name, first_level_regex, second_level_regex, variable_space='space_a_b')

__init__(name, first_level_regex, second_level_regex, variable_space='space_a_b')

This specifies a fully random protocol, where an intervention is produced on every exposed variable by uniformly sampling the intervention space.

Parameters

name – (str) specifies the name of the protocol to be reported.
first_level_regex – (str) specifies the regex for the first level of variables.
second_level_regex – (str) specifies the regex for the second level of variables.
variable_space – (str) “space_a”, “space_b” or “space_a_b”.

get_intervention(episode, timestep)

Returns the interventions that are applied at a given timestep of the episode.

Parameters

episode – (int) episode number of the protocol
timestep – (int) time step within episode

Returns

(dict) intervention dictionary
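
One plausible reading of the two regex parameters is that the first selects top-level variables and the second selects sub-variables nested under them. The sketch below shows that idea on a made-up nested space. Whether the library uses full matching, and the variable names used here, are assumptions; `select_variables` is a hypothetical helper, not part of CausalWorld.

```python
import re


def select_variables(intervention_space, first_level_regex, second_level_regex):
    """Return {variable: [sub-variables]} whose names fully match the two regexes."""
    selected = {}
    for variable, sub_space in intervention_space.items():
        if not re.fullmatch(first_level_regex, variable):
            continue
        selected[variable] = [
            sub_variable
            for sub_variable in sub_space
            if re.fullmatch(second_level_regex, sub_variable)
        ]
    return selected


# Hypothetical nested space: variable -> sub-variable -> (low, high).
space = {
    "tool_block": {"mass": (0.01, 0.1), "size": (0.05, 0.08)},
    "goal_block": {"size": (0.05, 0.08)},
    "floor_color": {"rgb": (0.0, 1.0)},
}
selected = select_variables(space, r".*_block", r"size")
```

Here `selected` keeps only the `size` entry of the two block variables and drops `floor_color`, since its name does not match the first-level pattern.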

FullyRandomProtocol

class causal_world.evaluation.protocols.FullyRandomProtocol(name, variable_space='space_a_b')

__init__(name, variable_space='space_a_b')

This specifies a fully random protocol, where an intervention is produced on every exposed variable by uniformly sampling the intervention space.

Parameters

name – (str) specifies the name of the protocol to be reported.
variable_space – (str) “space_a”, “space_b” or “space_a_b”.

get_intervention(episode, timestep)

Returns the interventions that are applied at a given timestep of the episode.

Parameters

episode – (int) episode number of the protocol
timestep – (int) time step within episode

Returns

(dict) intervention dictionary
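
Uniform sampling of an intervention space, as described for this protocol, can be sketched as drawing one value per variable between its bounds. The scalar bounds, the variable names and the helper `sample_interventions` are hypothetical; real intervention spaces may hold arrays, and the sketch ignores the variable_space argument.

```python
import random


def sample_interventions(bounds, rng):
    """Draw one value per variable, uniformly between its (low, high) bounds."""
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}


# Hypothetical scalar bounds for two exposed variables.
bounds = {"mass": (0.01, 0.1), "friction": (0.3, 0.9)}

# Seeding the generator makes the sampled interventions reproducible.
first = sample_interventions(bounds, random.Random(0))
second = sample_interventions(bounds, random.Random(0))
```

Using a seeded generator mirrors the role of initial_seed in the pipeline: two runs with the same seed draw the same interventions.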