Actors

DummyActorPolicy

class causal_world.actors.DummyActorPolicy[source]

This is a policy wrapper for a dummy actor: it exposes the actor policy interface, but the actions themselves are computed externally and fed to it via add_action.

__init__()[source]

Takes no parameters; actions are supplied later through add_action.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

add_action(action)[source]

Stores an action so that it is returned by the next call to the act function. Use this when actions are computed externally.

Parameters

action – (nd.array) defines the action to be executed at time step t

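A minimal usage sketch (the task id and environment setup below are illustrative assumptions; only DummyActorPolicy, add_action and act are taken from this page):

    import numpy as np

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import DummyActorPolicy

    task = generate_task(task_generator_id='pushing')
    env = CausalWorld(task=task)
    policy = DummyActorPolicy()

    obs = env.reset()
    for _ in range(10):
        # the action is computed externally (here: a zero action) ...
        external_action = np.zeros(env.action_space.shape)
        # ... fed to the dummy policy, and returned unchanged by act
        policy.add_action(external_action)
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()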

BaseActorPolicy

class causal_world.actors.BaseActorPolicy(identifier=None)[source]

This is a policy wrapper for an actor; its methods need to be implemented to load the policy such that it can be used by the robot to act in the environment.

__init__(identifier=None)[source]
Parameters

identifier – (str) defines the name of the actor policy

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

get_identifier()[source]
Returns

(str) defines the name of the actor policy

reset()[source]

The function is called to clear the controller's internal state.

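As a sketch of how a custom actor could be defined, assuming a 9-dimensional joint-position action space (the class below is hypothetical, for illustration only):

    import numpy as np

    from causal_world.actors import BaseActorPolicy

    class ZeroActorPolicy(BaseActorPolicy):
        # hypothetical actor: always commands zero joint positions

        def __init__(self):
            super(ZeroActorPolicy, self).__init__(identifier='zero_policy')

        def act(self, obs):
            # ignore the observation; return a fixed 9-dim action
            return np.zeros(9)

        def reset(self):
            # stateless policy, nothing to clear
            pass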

PushingActorPolicy

class causal_world.actors.PushingActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goals.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t
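A minimal sketch of running this pretrained policy in its environment (the generate_task import path, task id, and constructor arguments below are assumptions; skip_frame=3 on the 250 Hz simulator matches the 83.3 Hz control rate stated above):

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import PushingActorPolicy

    # assumption: skip_frame=3 at 250 Hz simulation gives ~83.3 Hz control
    task = generate_task(task_generator_id='pushing')
    env = CausalWorld(task=task,
                      action_mode='joint_positions',
                      normalize_observations=True,
                      skip_frame=3)
    policy = PushingActorPolicy()

    obs = env.reset()
    for _ in range(1000):
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()

The same pattern applies to the pretrained PickingActorPolicy, PickAndPlaceActorPolicy, Stacking2ActorPolicy and ReacherActorPolicy below, substituting the matching task generator id (e.g. 'picking', 'pick_and_place', 'stacking2', 'reaching').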

PickingActorPolicy

class causal_world.actors.PickingActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goal heights.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

PickAndPlaceActorPolicy

class causal_world.actors.PickAndPlaceActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with one goal position only.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

Stacking2ActorPolicy

class causal_world.actors.Stacking2ActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.3 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goal positions.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

ReacherActorPolicy

class causal_world.actors.ReacherActorPolicy[source]
__init__()[source]

This policy is expected to run at 83.33 Hz. It expects normalized observations and outputs desired joint positions.

  • This policy is trained with several goals.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

GraspingPolicy

class causal_world.actors.GraspingPolicy(tool_blocks_order)[source]

This policy is expected to run at 25 Hz. It is a hand-designed policy for picking and placing blocks; the best results were obtained with blocks of a specific size (6.5 cm) weighing 20 grams. The policy outputs desired normalized end_effector_positions.

Description of phases:

  • Phase 0: Move finger-center above the cube center of the current instruction.

  • Phase 1: Lower finger-center down to encircle the target cube, and close grip.

  • Phase 2: Move finger-center up again, keeping the grip tight (lifting the block).

  • Phase 3: Smoothly move the finger-center toward the goal xy, keeping the height constant.

  • Phase 4: Move finger-center vertically toward goal height (keeping the relative difference of finger heights given by h0), at the same time loosening the grip (i.e. increasing the radius of the "grip circle").

  • Phase 5: Move finger-center up again.

Other variables and values:

  • alpha: interpolation value between two positions.

  • ds: distances of finger tips to grip center.

  • t: time between 0 and 1 in the current phase.

  • phase: every instruction has 7 phases (described above).

  • program_counter: the index of the current instruction in the overall program; incremented once the policy has successfully completed all phases.

Hyperparameters:

  • phase_velocity_k: the speed at which phase "k" in the state machine progresses.

  • d0_r, d0_gb: distance of finger tips from grip center while gripping the object.

  • gb_angle_spread: angle between green and blue finger tips along the "grip circle".

  • d1_r, d1_gb: distance of finger tips from grip center while not gripping.

  • h1_r, h1_gb: height of grip center while moving around.

  • h0_r, h0_gb: height of grip center to which it is lowered while grasping.

  • fall_trigger_h: if the box is detected below this height when it is supposed to be gripped, try grasping it again (reset phase to 0).

__init__(tool_blocks_order)[source]
Parameters

tool_blocks_order – (nd.array) specifies the program, where the indices range from 0 to the number of blocks available in the arena.

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t

reset()[source]

Resets the controller's internal state.
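A minimal usage sketch (the task id, constructor arguments, and block order below are illustrative assumptions; skip_frame=10 at the 250 Hz simulation corresponds to the 25 Hz this policy expects):

    import numpy as np

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import GraspingPolicy

    # assumption: skip_frame=10 at 250 Hz simulation gives 25 Hz control
    task = generate_task(task_generator_id='picking')
    env = CausalWorld(task=task,
                      action_mode='end_effector_positions',
                      normalize_observations=True,
                      skip_frame=10)
    # program: grasp block 0 only (indices range over the blocks in the arena)
    policy = GraspingPolicy(tool_blocks_order=np.array([0]))

    obs = env.reset()
    policy.reset()
    for _ in range(2000):
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()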

RandomActorPolicy

class causal_world.actors.RandomActorPolicy(low_bound, upper_bound)[source]

This is a policy wrapper for a random actor.

__init__(low_bound, upper_bound)[source]
Parameters

low_bound – (nd.array) the lower bound of the actions to sample from

upper_bound – (nd.array) the upper bound of the actions to sample from

act(obs)[source]

The function is called for the agent to act in the world.

Parameters

obs – (nd.array) defines the observations received by the agent at time step t

Returns

(nd.array) defines the action to be executed at time step t
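A minimal usage sketch (the task id is an illustrative assumption; the bounds are taken here from the environment's action space):

    from causal_world.envs import CausalWorld
    from causal_world.task_generators.task import generate_task
    from causal_world.actors import RandomActorPolicy

    task = generate_task(task_generator_id='reaching')
    env = CausalWorld(task=task)
    # assumption: the policy samples an action between the two bounds each step
    policy = RandomActorPolicy(low_bound=env.action_space.low,
                               upper_bound=env.action_space.high)

    obs = env.reset()
    for _ in range(100):
        obs, reward, done, info = env.step(policy.act(obs))
    env.close()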