Evaluation Protocols

EvaluationPipeline

class causal_world.evaluation.EvaluationPipeline(evaluation_protocols, tracker_path=None, world_params=None, task_params=None, visualize_evaluation=False, initial_seed=0)[source]

Evaluates a trained policy on a set of evaluation protocols.

Parameters
  • evaluation_protocols – (list) defines the protocols that will be evaluated in this pipeline.

  • tracker_path – (causal_world.loggers.Tracker) if a tracker was stored during training this can be passed here.

  • world_params – (dict) the world_params used to set up the environment, including skip_frame, normalization params, etc.

  • task_params – (dict) the task_params of the Task on which the policy is going to be evaluated.

  • visualize_evaluation – (bool) if the evaluation is visualized in the GUI.

  • initial_seed – (int) the random seed of the evaluation for reproducibility.

__init__(evaluation_protocols, tracker_path=None, world_params=None, task_params=None, visualize_evaluation=False, initial_seed=0)[source]

Initialize self. See help(type(self)) for accurate signature.

evaluate_policy(policy, fraction=1)[source]

Runs the evaluation of a policy and returns an evaluation dictionary with the score of each metric for each protocol.

Parameters
  • policy – (func) the policy_fn that takes an observation as argument and returns the inferred action

  • fraction – (float) fraction of the default number of episodes to evaluate (can be higher than one).

Returns

(dict) scores dict for each metric for each protocol.
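The two arguments can be illustrated with a minimal sketch. It shows a callable of the documented policy shape and one way `fraction` might scale the per-protocol episode count. The 200-episode default is taken from the ProtocolBase description; the integer truncation, the helper names `scaled_num_episodes` and `make_random_policy`, and the 9-dimensional action are assumptions for illustration, not confirmed library behavior.

```python
import random

DEFAULT_NUM_EPISODES = 200  # default stated in the ProtocolBase description


def scaled_num_episodes(fraction=1, default=DEFAULT_NUM_EPISODES):
    """Hypothetical helper: number of episodes evaluated for a given fraction.

    Assumes the scaled count is truncated to an integer.
    """
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return int(default * fraction)


def make_random_policy(action_dim=9, seed=0):
    """Build a policy_fn that maps an observation to a random action.

    action_dim=9 is an illustrative choice, not a documented value.
    """
    rng = random.Random(seed)

    def policy_fn(observation):
        # The observation is ignored here; a trained policy would use it.
        return [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]

    return policy_fn
```

A callable built this way has the shape expected by the `policy` argument: one observation in, one action out.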

get_metric_scores()[source]

Returns the metric scores of all metrics in the evaluation pipeline

Returns

(dict) a score dictionary containing the score for each metric name as key.

process_metrics(episode)[source]

Processes an episode to compute all the metrics of the evaluation pipeline.

Parameters

episode – (causal_world.loggers.Episode) The episode to be processed.

Returns

(None)

reset_metric_scores()[source]

Resets the metric scores of each metric object

Returns

run_episode(policy_fn)[source]

Returns the episode information that is accumulated when running a policy

Parameters

policy_fn – (func) the policy_fn that takes an observation as argument and returns the inferred action.

Returns

(causal_world.loggers.Episode) returns the recorded episode.
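The rollout described above can be sketched as a plain loop. This is not the library's implementation: `ToyEnv` is a hypothetical stand-in with a Gym-style `reset`/`step` interface, and the returned dict stands in for the `causal_world.loggers.Episode` object.

```python
class ToyEnv:
    """Hypothetical stand-in environment with a Gym-style reset/step API."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0]

    def step(self, action):
        self._t += 1
        observation = [float(self._t)]
        reward = -abs(action)  # toy reward: smaller actions score higher
        done = self._t >= self.horizon
        return observation, reward, done, {}


def run_episode_sketch(env, policy_fn):
    """Roll out policy_fn for one episode and return the recorded data."""
    observation = env.reset()
    record = {"observations": [observation], "actions": [], "rewards": []}
    done = False
    while not done:
        action = policy_fn(observation)
        observation, reward, done, _info = env.step(action)
        record["observations"].append(observation)
        record["actions"].append(action)
        record["rewards"].append(reward)
    return record
```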

save_scores(evaluation_path, prefix=None)[source]

Saves the scores dict as json

Parameters
  • evaluation_path – (str) the path where the scores are saved.

  • prefix – (str) an optional prefix to the file name.

Returns
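As a rough sketch of what saving the scores as JSON could look like: the function name `save_scores_sketch`, the file name `scores.json`, and the way the prefix is joined to it are assumptions, since the documentation above only states that a scores dict is written as JSON under `evaluation_path`.

```python
import json
import os


def save_scores_sketch(scores, evaluation_path, prefix=None):
    """Write a scores dict to a JSON file and return the file path.

    The file name 'scores.json' (or '<prefix>_scores.json') is an assumption.
    """
    os.makedirs(evaluation_path, exist_ok=True)
    file_name = "scores.json" if prefix is None else f"{prefix}_scores.json"
    file_path = os.path.join(evaluation_path, file_name)
    with open(file_path, "w") as handle:
        json.dump(scores, handle, indent=2)
    return file_path
```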

Protocol

class causal_world.evaluation.ProtocolBase(name)[source]

Base protocol from which each evaluation protocol inherits. The default number of evaluation episodes is 200.

Parameters

name – (str) name of the protocol.

__init__(name)[source]

Initialize self. See help(type(self)) for accurate signature.

get_intervention(episode, timestep)[source]

Returns the interventions that are applied at a given timestep of the episode.

Parameters
  • episode – (int) episode number of the protocol

  • timestep – (int) time step within episode

Returns

(dict) intervention dictionary
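A custom protocol would typically override `get_intervention`. The sketch below uses a hypothetical stand-in base class (`ProtocolBaseSketch`) that mirrors only the methods documented in this section, so that it runs without the library installed. The variable names in the intervention dict (`tool_block`, `mass`), the empty dict for "no intervention", and the integer truncation in `init_protocol` are illustrative assumptions.

```python
class ProtocolBaseSketch:
    """Hypothetical stand-in mirroring the documented ProtocolBase methods."""

    def __init__(self, name):
        self.name = name
        self.default_num_episodes = 200
        self.num_episodes = self.default_num_episodes

    def get_name(self):
        return self.name

    def get_num_episodes(self):
        return self.num_episodes

    def init_protocol(self, env, tracker, fraction=1):
        self.env = env
        self.tracker = tracker
        # Assumption: the episode count scales with fraction and is truncated.
        self.num_episodes = int(self.default_num_episodes * fraction)

    def get_intervention(self, episode, timestep):
        raise NotImplementedError


class MassSweepProtocol(ProtocolBaseSketch):
    """Illustrative protocol: sets a block mass at the start of each episode."""

    def __init__(self, masses=(0.02, 0.05, 0.1)):
        super().__init__("mass_sweep")
        self.masses = masses

    def get_intervention(self, episode, timestep):
        if timestep != 0:
            return {}  # assumption: an empty dict means "no intervention"
        # Cycle through the candidate masses, one per episode.
        mass = self.masses[episode % len(self.masses)]
        return {"tool_block": {"mass": mass}}
```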

get_name()[source]

Returns the name of the protocol

Returns

(str) protocol name

get_num_episodes()[source]

Returns the number of evaluation episodes in this protocol

Returns

(int) number of episodes in protocol

init_protocol(env, tracker, fraction=1)[source]

Initializes protocol

Parameters
  • env – (CausalWorld) environment

  • tracker – (Tracker)

  • fraction – (float) fraction of episodes to be evaluated using the protocol (can be higher than one)

Returns

ProtocolGenerator

class causal_world.evaluation.protocols.ProtocolGenerator(name, first_level_regex, second_level_regex, variable_space='space_a_b')[source]

__init__(name, first_level_regex, second_level_regex, variable_space='space_a_b')[source]

This specifies a fully random protocol, where an intervention is produced on every exposed variable by uniformly sampling the intervention space.

Parameters
  • name – (str) specifies the name of the protocol to be reported.

  • first_level_regex – (str) specifies the regex for first level of variables.

  • second_level_regex – (str) specifies the regex for second level of variables.

  • variable_space – (str) 'space_a', 'space_b' or 'space_a_b'.

get_intervention(episode, timestep)[source]

Returns the interventions that are applied at a given timestep of the episode.

Parameters
  • episode – (int) episode number of the protocol

  • timestep – (int) time step within episode

Returns

(dict) intervention dictionary
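One plausible reading of the two regex parameters is that they select which variables receive an intervention: the first regex is matched against top-level variable names and the second against the names nested beneath them. The sketch below illustrates that reading only. The function name, the layout of `space`, the use of `fullmatch`, and the variable names are all assumptions rather than the library's behavior.

```python
import random
import re


def sample_matching_interventions(space, first_level_regex,
                                  second_level_regex, rng):
    """Uniformly sample every variable whose names match the regexes.

    `space` maps a first-level name either to (low, high) bounds or to a
    dict of second-level names with (low, high) bounds.
    """
    first = re.compile(first_level_regex)
    second = re.compile(second_level_regex)
    intervention = {}
    for name, entry in space.items():
        if not first.fullmatch(name):
            continue
        if isinstance(entry, dict):
            chosen = {sub: rng.uniform(low, high)
                      for sub, (low, high) in entry.items()
                      if second.fullmatch(sub)}
            if chosen:  # skip variables with no matching second-level name
                intervention[name] = chosen
        else:
            low, high = entry
            intervention[name] = rng.uniform(low, high)
    return intervention
```

With a first-level regex such as `.*_block` and a second-level regex such as `mass`, only the mass of variables whose names end in `_block` would be sampled.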

FullyRandomProtocol

class causal_world.evaluation.protocols.FullyRandomProtocol(name, variable_space='space_a_b')[source]

__init__(name, variable_space='space_a_b')[source]

This specifies a fully random protocol, where an intervention is produced on every exposed variable by uniformly sampling the intervention space.

Parameters
  • name – (str) specifies the name of the protocol to be reported.

  • variable_space – (str) 'space_a', 'space_b' or 'space_a_b'.

get_intervention(episode, timestep)[source]

Returns the interventions that are applied at a given timestep of the episode.

Parameters
  • episode – (int) episode number of the protocol

  • timestep – (int) time step within episode

Returns

(dict) intervention dictionary
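Uniform sampling over an intervention space can be sketched as follows. The bounds in `SPACES`, the variable names, and the rule that `space_a_b` spans both ranges are hypothetical; in the library the spaces are defined by the task, not by a module-level table.

```python
import random

# Hypothetical (low, high) bounds per variable; real spaces come from the task.
SPACES = {
    "space_a": {"floor_friction": (0.3, 0.6), "tool_block_mass": (0.015, 0.045)},
    "space_b": {"floor_friction": (0.6, 0.8), "tool_block_mass": (0.045, 0.1)},
}


def bounds_for(variable_space):
    """Return per-variable bounds for 'space_a', 'space_b' or 'space_a_b'."""
    if variable_space in SPACES:
        return dict(SPACES[variable_space])
    if variable_space == "space_a_b":
        # Assumption: the combined space spans both individual ranges.
        space_a, space_b = SPACES["space_a"], SPACES["space_b"]
        return {
            name: (min(space_a[name][0], space_b[name][0]),
                   max(space_a[name][1], space_b[name][1]))
            for name in space_a
        }
    raise ValueError(f"unknown variable_space: {variable_space!r}")


def fully_random_intervention(variable_space, rng):
    """Sample every exposed variable uniformly within its bounds."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in bounds_for(variable_space).items()}
```

Passing a seeded `random.Random` instance keeps the sampled interventions reproducible, in the same spirit as the `initial_seed` argument of the pipeline.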