Actors

DummyActorPolicy

class causal_world.actors.DummyActorPolicy[source]

This is a policy wrapper for a dummy actor: it exposes the actor policy interface, but the actions themselves are computed externally and fed to it via add_action.

__init__()[source]

Takes no parameters; actions are supplied later through add_action.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

add_action(action)[source]

Stores an action so that it is returned by the next call to the act function. Use this when actions are computed externally.

Parameters

action – (nd.array) defines the action to be executed at time step t

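A minimal usage sketch (the task id and environment setup below are illustrative assumptions; only DummyActorPolicy, add_action and act are taken from this page):

    import numpy as np

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import DummyActorPolicy

    task = generate_task(task_generator_id='pushing')
    env = CausalWorld(task=task)
    policy = DummyActorPolicy()

    obs = env.reset()
    for _ in range(10):
        # the action is computed externally (here: a zero action) ...
        external_action = np.zeros(env.action_space.shape)
        # ... fed to the dummy policy, and returned unchanged by act
        policy.add_action(external_action)
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()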

BaseActorPolicy

class causal_world.actors.BaseActorPolicy(identifier=None)[source]

This is a policy wrapper for an actor; its methods need to be implemented to load the policy such that it can be used by the robot to act in the environment.

__init__(identifier=None)[source]
Parameters

identifier – (str) defines the name of the actor policy

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

get_identifier()[source]
Returns

(str) defines the name of the actor policy

reset()[source]

The function is called to clear the controller's internal state.

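As a sketch of how a custom actor could be defined, assuming a 9-dimensional joint-position action space (the class below is hypothetical, for illustration only):

    import numpy as np

    from causal_world.actors import BaseActorPolicy

    class ZeroActorPolicy(BaseActorPolicy):
        # hypothetical actor: always commands zero joint positions

        def __init__(self):
            super(ZeroActorPolicy, self).__init__(identifier='zero_policy')

        def act(self, obs):
            # ignore the observation; return a fixed 9-dim action
            return np.zeros(9)

        def reset(self):
            # stateless policy, nothing to clear
            pass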

PushingActorPolicy

class causal_world.actors.PushingActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goals.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t
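A minimal sketch of running this pretrained policy in its environment (the generate_task import path, task id, and constructor arguments below are assumptions; skip_frame=3 on the 250 Hz simulator matches the 83.3 Hz control rate stated above):

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import PushingActorPolicy

    # assumption: skip_frame=3 at 250 Hz simulation gives ~83.3 Hz control
    task = generate_task(task_generator_id='pushing')
    env = CausalWorld(task=task,
                      action_mode='joint_positions',
                      normalize_observations=True,
                      skip_frame=3)
    policy = PushingActorPolicy()

    obs = env.reset()
    for _ in range(1000):
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()

The same pattern applies to the pretrained PickingActorPolicy, PickAndPlaceActorPolicy, Stacking2ActorPolicy and ReacherActorPolicy below, substituting the matching task generator id (e.g. 'picking', 'pick_and_place', 'stacking2', 'reaching').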

PickingActorPolicy

class causal_world.actors.PickingActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goal heights.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

PickAndPlaceActorPolicy

class causal_world.actors.PickAndPlaceActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with one goal position only.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

Stacking2ActorPolicy

class causal_world.actors.Stacking2ActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goal positions.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

ReacherActorPolicy

class causal_world.actors.ReacherActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.33 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goals.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

GraspingPolicy

class causal_world.actors.GraspingPolicy(tool_blocks_order)[source]

This policy is expected to run at 25 Hz. It is a hand-designed policy for picking and placing blocks; the best results were obtained with blocks of a specific size (6.5 cm) weighing 20 grams. The policy outputs desired normalized end_effector_positions.

Description of phases:

  • Phase 0: Move finger-center above the cube center of the current instruction.

  • Phase 1: Lower finger-center down to encircle the target cube, and close grip.

  • Phase 2: Move finger-center up again, keeping the grip tight (lifting the block).

  • Phase 3: Smoothly move the finger-center toward the goal xy, keeping the height constant.

  • Phase 4: Move finger-center vertically toward goal height (keeping the relative difference of finger heights given by h0), at the same time loosening the grip (i.e. increasing the radius of the "grip circle").

  • Phase 5: Move finger-center up again.

Other variables and values:

  • alpha: interpolation value between two positions.

  • ds: distances of finger tips to grip center.

  • t: time between 0 and 1 in the current phase.

  • phase: every instruction has 7 phases (described above).

  • program_counter: the index of the current instruction in the overall program; incremented once the policy has successfully completed all phases.

Hyperparameters:

  • phase_velocity_k: the speed at which phase "k" in the state machine progresses.

  • d0_r, d0_gb: distance of finger tips from grip center while gripping the object.

  • gb_angle_spread: angle between green and blue finger tips along the "grip circle".

  • d1_r, d1_gb: distance of finger tips from grip center while not gripping.

  • h1_r, h1_gb: height of grip center while moving around.

  • h0_r, h0_gb: height of grip center to which it is lowered while grasping.

  • fall_trigger_h: if the box is detected below this height when it is supposed to be gripped, try grasping it again (reset phase to 0).

__init__(tool_blocks_order)[source]
Parameters

tool_blocks_order – (nd.array) specifies the program, where the indices range from 0 to the number of blocks available in the arena.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

reset()[source]

Resets the controller's internal state.
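A minimal usage sketch (the task id, constructor arguments, and block order below are illustrative assumptions; skip_frame=10 at the 250 Hz simulation corresponds to the 25 Hz this policy expects):

    import numpy as np

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import GraspingPolicy

    # assumption: skip_frame=10 at 250 Hz simulation gives 25 Hz control
    task = generate_task(task_generator_id='picking')
    env = CausalWorld(task=task,
                      action_mode='end_effector_positions',
                      normalize_observations=True,
                      skip_frame=10)
    # program: grasp block 0 only (indices range over the blocks in the arena)
    policy = GraspingPolicy(tool_blocks_order=np.array([0]))

    obs = env.reset()
    policy.reset()
    for _ in range(2000):
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()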

RandomActorPolicy

class causal_world.actors.RandomActorPolicy(low_bound, upper_bound)[source]

This is a policy wrapper for a random actor.

__init__(low_bound, upper_bound)[source]
Parameters

low_bound – (nd.array) the lower bound of the actions to sample from

upper_bound – (nd.array) the upper bound of the actions to sample from

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t
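A minimal usage sketch (the task id is an illustrative assumption; the bounds are taken here from the environment's action space):

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import RandomActorPolicy

    task = generate_task(task_generator_id='reaching')
    env = CausalWorld(task=task)
    # assumption: the policy samples an action between the two bounds each step
    policy = RandomActorPolicy(low_bound=env.action_space.low,
                               upper_bound=env.action_space.high)

    obs = env.reset()
    for _ in range(100):
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()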