NeuroTorch

neurotorch.rl package

Submodules

neurotorch.rl.agent module

class neurotorch.rl.agent.Agent(*, env: Env | None = None, observation_space: Space | None = None, action_space: Space | None = None, behavior_name: str | None = None, policy: BaseModel | None = None, policy_predict_method: str = '__call__', policy_kwargs: Dict[str, Any] | None = None, critic: BaseModel | None = None, critic_predict_method: str = '__call__', critic_kwargs: Dict[str, Any] | None = None, **kwargs)

Bases: Module

__init__(*, env: Env | None = None, observation_space: Space | None = None, action_space: Space | None = None, behavior_name: str | None = None, policy: BaseModel | None = None, policy_predict_method: str = '__call__', policy_kwargs: Dict[str, Any] | None = None, critic: BaseModel | None = None, critic_predict_method: str = '__call__', critic_kwargs: Dict[str, Any] | None = None, **kwargs)

Constructor for the Agent class.

Parameters:
  • env (Optional[gym.Env]) – The environment.

  • observation_space (Optional[gym.spaces.Space]) – The observation space. Must be a single (non-batched) space. Must be provided if env is not provided; if env is provided, this argument is ignored.

  • action_space (Optional[gym.spaces.Space]) – The action space. Must be a single (non-batched) space. Must be provided if env is not provided; if env is provided, this argument is ignored.

  • behavior_name (Optional[str]) – The name of the behavior.

  • policy (BaseModel) – The model to use.

  • policy_kwargs (Optional[Dict[str, Any]]) –

    The keyword arguments to pass to the policy if it is created by default. The keywords are:

    • default_hidden_units (List[int]): The default number of hidden units. Defaults to [256].

    • default_activation (str): The default activation function. Defaults to “ReLu”.

    • default_output_activation (str): The default output activation function. Defaults to “Identity”.

    • default_dropout (float): The default dropout rate. Defaults to 0.1.

    • all other keywords are passed to the Sequential constructor.

  • critic (BaseModel) – The value model to use.

  • critic_kwargs (Optional[Dict[str, Any]]) –

    The keyword arguments to pass to the critic if it is created by default. The keywords are:

    • default_hidden_units (List[int]): The default number of hidden units. Defaults to [256].

    • default_activation (str): The default activation function. Defaults to “ReLu”.

    • default_output_activation (str): The default output activation function. Defaults to “Identity”.

    • default_n_values (int): The default number of values to output. Defaults to 1.

    • default_dropout (float): The default dropout rate. Defaults to 0.1.

    • all other keywords are passed to the Sequential constructor.

  • kwargs – Other keyword arguments.
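
A minimal construction sketch follows. The CartPole-v1 environment, the gym import and the chosen values are illustrative assumptions; only the keyword names documented above come from this page.

# Illustrative sketch: build an Agent with default policy/critic models.
# The environment name and the kwargs values are assumptions, not part of the API.
import gym
from neurotorch.rl.agent import Agent

env = gym.make("CartPole-v1")
agent = Agent(
    env=env,
    behavior_name="cartpole",
    policy_kwargs=dict(default_hidden_units=[128, 128], default_dropout=0.0),
    critic_kwargs=dict(default_hidden_units=[128], default_n_values=1),
)
print(agent.device, agent.discrete_actions)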

property action_spec: Dict[str, Any]
property continuous_actions: List[str]
copy(requires_grad: bool | None = None) Agent

Copy the agent.

Parameters:

requires_grad (Optional[bool]) – Whether to require gradients.

Returns:

The copied agent.

Return type:

Agent

copy_critic(requires_grad: bool | None = None) BaseModel

Copy the critic to a new instance.

Returns:

The copied critic.

static copy_from_agent(agent: Agent, requires_grad: bool | None = None) Agent

Copy the agent.

Parameters:
  • agent (Agent) – The agent to copy.

  • requires_grad (Optional[bool]) – Whether to require gradients.

Returns:

The copied agent.

Return type:

Agent

copy_policy(requires_grad: bool | None = None) BaseModel

Copy the policy to a new instance.

Returns:

The copied policy.

decay_continuous_action_variances()
property device: device

The device of the agent.

Returns:

The device of the agent.

Return type:

torch.device

property discrete_actions: List[str]
format_batch_discrete_actions(actions: Tensor | Dict[str, Tensor], re_format: str = 'logits', **kwargs) Tensor | Dict[str, Tensor]

Format the batch of actions. If actions is a dict, it is assumed that the keys are the action names and the values are the actions; in this case, all the values whose keys are in self.discrete_actions will be formatted. If actions is a tensor, the actions will be formatted if self.discrete_actions is not empty.

TODO: fragment this method into smaller methods.

Parameters:
  • actions – The actions.

  • re_format – The format to reformat the actions to. Can be “logits”, “probs”, “index”, or “one_hot”.

  • kwargs – Keywords arguments.

Returns:

The formatted actions.
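
For instance, a batch of logits produced by the policy could be reformatted as indices or one-hot vectors. This is a hypothetical sketch reusing the agent created above; the tensor shape is arbitrary.

import torch

logits = torch.randn(5, 3)  # a batch of 5 action logits over 3 discrete choices
indices = agent.format_batch_discrete_actions(logits, re_format="index")
one_hot = agent.format_batch_discrete_actions(logits, re_format="one_hot")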

forward(*args, **kwargs)

Call the agent.

Returns:

The output of the agent.

get_actions(obs: ndarray | Tensor | Dict[str, ndarray | Tensor], **kwargs) Any

Get the actions for the given observations.

Parameters:
  • obs (Union[np.ndarray, torch.Tensor, Dict[str, Union[np.ndarray, torch.Tensor]]]) – The observations. The observations must be batched.

  • kwargs – Keywords arguments.

Keyword Arguments:
  • re_format (str) – The format to reformat the discrete actions to. Default is “index”, which returns the index of each action. For the other options, see format_batch_discrete_actions().

  • as_numpy (bool) – Whether to return the actions as numpy arrays. Default is True.

Returns:

The actions.
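
A single interaction step could look like the sketch below, reusing the agent and env created above; the gymnasium-style reset/step API and the unbatching with actions[0] are assumptions.

import numpy as np

obs, info = env.reset()
batched_obs = np.expand_dims(obs, axis=0)  # observations must be batched
actions = agent.get_actions(batched_obs, re_format="index", as_numpy=True)
next_obs, reward, terminated, truncated, info = env.step(actions[0])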

get_continuous_action_covariances()
get_default_checkpoints_meta_path() str

The path to the checkpoints meta file.

Returns:

The path to the checkpoints meta file.

Return type:

str

get_random_actions(n_samples: int = 1, **kwargs) Any
get_values(obs: Tensor, **kwargs) Any

Get the values for the given observations.

Parameters:
  • obs – The batched observations.

  • kwargs – Keywords arguments.

Returns:

The values.

hard_update(policy)
load_checkpoint(checkpoints_meta_path: str | None = None, load_checkpoint_mode: LoadCheckpointMode = LoadCheckpointMode.BEST_ITR, verbose: bool = True) dict

Load the checkpoint from the checkpoints_meta_path. If the checkpoints_meta_path is None, the default checkpoints_meta_path is used.

Parameters:
  • checkpoints_meta_path (Optional[str]) – The path to the checkpoints meta file.

  • load_checkpoint_mode (LoadCheckpointMode) – The mode to use when loading the checkpoint.

  • verbose (bool) – Whether to print the loaded checkpoint information.

Returns:

The loaded checkpoint information.

Return type:

dict

property observation_spec: Dict[str, Any]
set_continuous_action_variances_with_itr(itr: int)
set_default_critic_kwargs()
set_default_policy_kwargs()
soft_update(policy, tau)
to(*args, **kwargs)

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

neurotorch.rl.buffers module

class neurotorch.rl.buffers.AgentsHistoryMaps(buffer: ReplayBuffer | None = None, **kwargs)

Bases: object

Class to store the mapping between agents and their history maps

trajectories

Mapping between agent ids and their trajectories

Type:

Dict[int, Trajectory]

cumulative_rewards

Mapping between agent ids and their cumulative rewards

Type:

Dict[int, float]

__init__(buffer: ReplayBuffer | None = None, **kwargs)
clear() List[Trajectory]
property cumulative_rewards_as_array: ndarray

The cumulative rewards as an array.

property experience_count: int

The number of experiences.

property max_abs_rewards: float

The maximum absolute reward.

property mean_cumulative_rewards: float

The mean of the cumulative rewards.

propagate_all() List[Trajectory]

Propagate all the trajectories and return the finished ones.

Returns:

All the trajectories.

Return type:

List[Trajectory]

propagate_and_get_all() List[Trajectory]

Propagate all the trajectories and return every trajectory, finished or not.

Returns:

All the trajectories

Return type:

List[Trajectory]

property terminals_count: int

The number of terminal steps.

update_trajectories_(*, observations, actions, next_observations, rewards, terminals, truncated=None, infos=None, others=None) List[Trajectory]

Updates the trajectories of the agents and returns the trajectories of the agents that have been terminated.

Parameters:
  • observations – The observations

  • actions – The actions

  • next_observations – The next observations

  • rewards – The rewards

  • terminals – The terminal flags

  • truncated – The truncated flags

  • infos – The info dictionaries

  • others – Additional data to attach to the experiences

Returns:

The terminated trajectories.

class neurotorch.rl.buffers.BatchExperience(batch: List[Experience], device: device = device(type='cpu'))

Bases: object

__init__(batch: List[Experience], device: device = device(type='cpu'))

An object that contains a batch of experiences as tensors.

Parameters:
  • batch – A list of Experience objects.

  • device – The device to use for the tensors.

property device
class neurotorch.rl.buffers.Experience(obs: Any, action: Any, reward: float, terminal: bool, next_obs: Any, discounted_reward: float | None = None, advantage: float | None = None, rewards_horizon: List[float] | None = None, others: dict | None = None)

Bases: object

An experience contains the data of one Agent transition:

  • Observation

  • Action

  • Reward

  • Terminal flag

  • Next Observation

__init__(obs: Any, action: Any, reward: float, terminal: bool, next_obs: Any, discounted_reward: float | None = None, advantage: float | None = None, rewards_horizon: List[float] | None = None, others: dict | None = None)
property advantage: float
property discounted_reward: float
property metrics
property observation
class neurotorch.rl.buffers.ReplayBuffer(capacity=inf, seed=None, use_priority=False, **kwargs)

Bases: object

__init__(capacity=inf, seed=None, use_priority=False, **kwargs)
property capacity
clear()
property counter
property empty
extend(iterable: Iterable[Experience]) ReplayBuffer
property full
get_batch_generator(batch_size: int, n_batches: int | None = None, randomize: bool = True, device='cpu') Iterator[BatchExperience]

Returns a generator that yields batches of batch_size elements from the buffer.

get_batch_tensor(batch_size: int, device='cpu') BatchExperience

Returns a BatchExperience containing batch_size elements from the buffer.

get_random_batch(batch_size: int) List[Experience]

Returns a list of batch_size elements from the buffer.

increase_capacity(increment: int)
increment_counter(increment: int = 1)
static load(filename: str) ReplayBuffer
reset_counter()
save(filename: str)
set_seed(seed: int)
start_counter()
stop_counter()
store(element: Experience) ReplayBuffer

Stores an element. If the replay buffer is already full, deletes the oldest element to make space.
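
The sketch below stores a couple of hand-made experiences and draws a batch of tensors; the shapes and values are illustrative only.

import numpy as np
from neurotorch.rl.buffers import Experience, ReplayBuffer

buffer = ReplayBuffer(capacity=10_000, seed=0)
obs = np.zeros(4, dtype=np.float32)
next_obs = np.ones(4, dtype=np.float32)

buffer.store(Experience(obs=obs, action=1, reward=1.0, terminal=False, next_obs=next_obs))
buffer.extend([Experience(obs=next_obs, action=0, reward=0.0, terminal=True, next_obs=obs)])

batch = buffer.get_batch_tensor(batch_size=2, device="cpu")  # a single BatchExperience
for mini_batch in buffer.get_batch_generator(batch_size=2, n_batches=1, device="cpu"):
    pass  # each mini_batch is a BatchExperience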

class neurotorch.rl.buffers.Trajectory(experiences: List[Experience] | None = None, gamma: float | None = None, **kwargs)

Bases: object

A trajectory is a list of experiences.

__init__(experiences: List[Experience] | None = None, gamma: float | None = None, **kwargs)
append(experience: Experience)
append_and_propagate(experience: Experience)
compute_horizon_rewards()
property cumulative_reward
is_empty()
make_rewards_horizon()
propagate()
propagate_rewards(gamma: float | None = 0.99)

Propagate the rewards to the next experiences.

propagate_values(lmbda: float | None = 0.95)
property propagated
property terminal
property terminal_reward
property terminated
update_others(others_list: List[dict])
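
A short sketch of building a trajectory and propagating the discounted rewards; the experiences reuse the arrays from the ReplayBuffer example above.

from neurotorch.rl.buffers import Experience, Trajectory

trajectory = Trajectory(gamma=0.99)
trajectory.append(Experience(obs=obs, action=1, reward=0.0, terminal=False, next_obs=next_obs))
# The terminal experience closes the trajectory and triggers reward propagation.
trajectory.append_and_propagate(
    Experience(obs=next_obs, action=0, reward=1.0, terminal=True, next_obs=obs)
)
print(trajectory.cumulative_reward, trajectory.terminated)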

neurotorch.rl.curriculum module

class neurotorch.rl.curriculum.CompletionCriteria(measure: str, min_lesson_length: int, threshold: float)

Bases: NamedTuple

Completion criteria for a lesson.

static default_criteria() CompletionCriteria
measure: str

Alias for field number 0

min_lesson_length: int

Alias for field number 1

threshold: float

Alias for field number 2

class neurotorch.rl.curriculum.Curriculum(name: str = 'Curriculum', description: str = '', lessons: List[Lesson] | None = None)

Bases: object

__init__(name: str = 'Curriculum', description: str = '', lessons: List[Lesson] | None = None)
add_lesson(lesson: Lesson)
property channels
property current_lesson: Lesson | None

Returns the current lesson.

property is_completed: bool

Returns True if the curriculum is completed, False otherwise.

property lessons
property map_repr: Dict[str, str]
on_iteration_end(metrics: Dict[str, float]) CurriculumEndIterationOutput

Called when an iteration ends.

on_iteration_start()

Called when an iteration starts.

property teacher_buffer: ReplayBuffer | None

Returns the current teacher buffer.

property teachers
update_channels(channels: List)
update_teachers(teachers: List)
update_teachers_and_channels(other: Curriculum)
class neurotorch.rl.curriculum.CurriculumEndIterationOutput(messages: Dict[str, str], lesson_completed: bool)

Bases: NamedTuple

Output of the curriculum when the end of an iteration is reached.

lesson_completed: bool

Alias for field number 1

messages: Dict[str, str]

Alias for field number 0

class neurotorch.rl.curriculum.Lesson(name, channel, params: Dict[str, float], completion_criteria: CompletionCriteria = CompletionCriteria(measure='Rewards', min_lesson_length=1, threshold=0.9), teacher=None, teacher_strength: float | None = None)

Bases: object

UNPICKLABLE_ATTRIBUTES = ['_teacher', '_channel']
__init__(name, channel, params: Dict[str, float], completion_criteria: CompletionCriteria = CompletionCriteria(measure='Rewards', min_lesson_length=1, threshold=0.9), teacher=None, teacher_strength: float | None = None)
property channel
check_completion_criteria(metrics: Dict[str, float]) bool

Checks if the lesson is completed.

property is_completed

Returns True if the lesson is completed, False otherwise.

on_iteration_end(metrics: Dict[str, float]) bool

Called when an iteration ends.

set_result(result)
start()

Starts the lesson.

property teacher
property teacher_buffer: ReplayBuffer | None

Returns the replay buffer for the lesson.
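
An illustrative two-lesson curriculum is sketched below; the lesson names, the params keys and the channel=None placeholder are assumptions made for the example, not documented values.

from neurotorch.rl.curriculum import CompletionCriteria, Curriculum, Lesson

easy = Lesson(
    name="easy",
    channel=None,  # placeholder: a real setup would pass an environment channel
    params={"difficulty": 0.1},
    completion_criteria=CompletionCriteria(measure="Rewards", min_lesson_length=10, threshold=100.0),
)
hard = Lesson(name="hard", channel=None, params={"difficulty": 1.0})

curriculum = Curriculum(name="MyCurriculum", lessons=[easy, hard])
curriculum.on_iteration_start()
output = curriculum.on_iteration_end(metrics={"Rewards": 120.0})
print(output.lesson_completed, curriculum.is_completed)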

neurotorch.rl.ppo module

class neurotorch.rl.ppo.PPO(agent: Agent | None = None, optimizer: Optimizer | None = None, **kwargs)

Bases: LearningAlgorithm

Apply the Proximal Policy Optimization algorithm to the given model. The algorithm is described in the paper Proximal Policy Optimization Algorithms (https://arxiv.org/abs/1707.06347).

CHECKPOINT_OPTIMIZER_STATE_DICT_KEY: str = 'optimizer_state_dict'
__init__(agent: Agent | None = None, optimizer: Optimizer | None = None, **kwargs)

Constructor of the PPO algorithm.

Parameters:
  • agent (Agent) – The agent to train.

  • optimizer (torch.optim.Optimizer) – The optimizer to use.

  • kwargs – Additional keyword arguments.

Keyword Arguments:
  • clip_ratio (float) – The clipping ratio for the policy loss.

  • tau (float) – The smoothing factor for the policy update.

  • gamma (float) – The discount factor.

  • gae_lambda (float) – The lambda parameter for the generalized advantage estimation (GAE).

  • critic_weight (float) – The weight of the critic loss.

  • entropy_weight (float) – The weight of the entropy loss.

  • critic_criterion (torch.nn.Module) – The loss function to use for the critic.

  • advantages=returns-values (bool) – This keyword is introduced to fix a bug when using the GAE. If set to True, the advantages are computed as the returns minus the values. If set to False, the advantages are computed as in the PPO paper. The default value is False, and it is recommended to try setting it to True if the agent does not seem to learn.

  • max_grad_norm (float) – The maximum L2 norm of the gradient. Default is 0.5.
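
A construction sketch, reusing the agent from the examples above and using only the keyword names documented here; the Adam optimizer and the hyper-parameter values are illustrative.

import torch
from neurotorch.rl.ppo import PPO

ppo = PPO(
    agent=agent,
    optimizer=torch.optim.Adam(agent.parameters(), lr=3e-4),
    clip_ratio=0.2,
    gamma=0.99,
    gae_lambda=0.95,
    critic_weight=0.5,
    entropy_weight=0.01,
    max_grad_norm=0.5,
)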

property agent
property critic
get_actions_from_batch(batch: BatchExperience) Tensor

Get the actions for the provided batch

get_advantages_from_batch(batch: BatchExperience) Tensor

Computes the advantages for the provided batch

get_checkpoint_state(trainer, **kwargs) object

Get the state of the callback. This is called when the checkpoint manager saves the state of the trainer. Then this state is saved in the checkpoint file with the name of the callback as the key.

Parameters:

trainer (Trainer) – The trainer.

Returns:

The state of the callback.

Return type:

A pickleable object.

get_returns_from_batch(batch: BatchExperience) Tensor

Computes the returns for the provided batch

get_values_from_batch(batch: BatchExperience) Tensor

Computes the values for the provided batch

property last_policy
load_checkpoint_state(trainer, checkpoint: dict, **kwargs)

Loads the state of the callback from a dictionary.

Parameters:
  • trainer (Trainer) – The trainer.

  • checkpoint (dict) – The dictionary containing all the states of the trainer.

Returns:

None

on_iteration_begin(trainer, **kwargs)

Called when an iteration starts. An iteration is defined as one full pass through the training dataset and the validation dataset.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_optimization_begin(trainer, **kwargs)

Called when the optimization phase of an iteration starts. The optimization phase is defined as the moment where the model weights are updated.

Parameters:
  • trainer (Trainer) – The trainer.

  • kwargs – Additional arguments.

Keyword Arguments:
  • x – The input data.

  • y – The target data.

  • pred – The predicted data.

Returns:

None

on_optimization_end(trainer, **kwargs)

Called when the optimization phase of an iteration ends. The optimization phase is defined as the moment where the model weights are updated.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_pbar_update(trainer, **kwargs) dict

Called when the progress bar is updated.

Parameters:
  • trainer (Trainer) – The trainer.

  • kwargs – Additional arguments.

Returns:

None

on_trajectory_end(trainer, trajectory, **kwargs) List[Dict[str, Any]]

Called when a trajectory ends. This is used in reinforcement learning to update the trajectory loss and metrics. Must return a list of dictionaries containing the trajectory metrics. The list must have the same length as the trajectory. Each item in the list will update the attribute others of the corresponding Experience.

Parameters:
  • trainer (Trainer) – The trainer.

  • trajectory (Trajectory) – The trajectory i.e. the sequence of Experiences.

  • kwargs – Additional arguments.

Returns:

A list of dictionaries containing the trajectory metrics.

property policy
start(trainer, **kwargs)

Called when the training starts. This is the first callback called.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

update_params(batch: BatchExperience) float

Performs a single update of the policy network using the provided optimizer and batch.

neurotorch.rl.rl_academy module

class neurotorch.rl.rl_academy.GenTrajectoriesOutput(buffer, cumulative_rewards, agents_history_maps, trajectories)

Bases: NamedTuple

agents_history_maps: AgentsHistoryMaps

Alias for field number 2

buffer: ReplayBuffer

Alias for field number 0

cumulative_rewards: ndarray

Alias for field number 1

trajectories: List[Trajectory] | None

Alias for field number 3

class neurotorch.rl.rl_academy.RLAcademy(agent: Agent, *, predict_method: str = '__call__', learning_algorithm: LearningAlgorithm | None = None, callbacks: List[BaseCallback] | CallbacksList | BaseCallback | None = None, verbose: bool = True, **kwargs)

Bases: Trainer

CUM_REWARDS_METRIC_KEY = 'cum_rewards'
TERMINAL_REWARDS_METRIC_KEY = 'terminal_rewards'
__init__(agent: Agent, *, predict_method: str = '__call__', learning_algorithm: LearningAlgorithm | None = None, callbacks: List[BaseCallback] | CallbacksList | BaseCallback | None = None, verbose: bool = True, **kwargs)

Constructor for Trainer.

Parameters:
  • model – Model to train.

  • criterion – Loss function(s) to use. Deprecated, use learning_algorithm instead.

  • regularization

    Regularization(s) to use. In NeuroTorch, there are two ways to do regularization:

    1. Regularization can be specified in the layers with the ‘update_regularization_loss’ method. This regularization is performed by the same optimizer as the main loss. This way is useful when you want a regularization that depends on the model output or hidden state.

    2. Regularization can be specified in the trainer with the ‘regularization’ parameter. This regularization is performed by a separate optimizer named ‘regularization_optimizer’. This way is useful when you want a regularization that depends only on the model parameters and when you want to control the learning rate of the regularization independently of the main loss.

    Note: This parameter will be deprecated and removed in a future version. The regularization will be specified in the learning algorithm and/or in the callbacks.

  • optimizer – Optimizer to use for the main loss. Deprecated. Use learning_algorithm instead.

  • learning_algorithm – Learning algorithm to use for the main loss. This learning algorithm can be given in the callbacks list as well. If specified, this learning algorithm will be added to the callbacks list. In this case, make sure that the learning algorithm is not added twice. Note that multiple learning algorithms can be used in the callbacks list.

  • regularization_optimizer – Optimizer to use for the regularization loss.

  • metrics – Metrics to compute during training.

  • callbacks – Callbacks to use during training. Each callback will be called at different moments, see the documentation of BaseCallback for more information.

  • device – Device to use for the training. Default is the device of the model.

  • verbose – Whether to print information during training.

  • kwargs – Additional arguments of the training.

Keyword Arguments:
  • n_epochs (int) – The number of epochs to train at each iteration. Default is 1.

  • lr (float) – Learning rate of the main optimizer. Default is 1e-3.

  • reg_lr (float) – Learning rate of the regularization optimizer. Default is 1e-2.

  • weight_decay (float) – Weight decay of the main optimizer. Default is 0.0.

  • exec_metrics_on_train (bool) – Whether to compute metrics on the train dataset. This is useful when you want to save time by not computing the metrics on the train dataset. Default is True.

  • x_transform – Transform to apply to the input data before passing it to the model.

  • y_transform – Transform to apply to the target data before passing it to the model. For example, this can be used to convert the target data to a one-hot encoding or to a long tensor using nt.ToTensor(dtype=torch.long).
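
A wiring sketch, reusing the agent and the PPO instance from the examples above; the values are illustrative and only documented keyword names are used.

from neurotorch.rl.rl_academy import RLAcademy

academy = RLAcademy(
    agent=agent,
    learning_algorithm=ppo,
    verbose=True,
    n_epochs=10,
    lr=3e-4,
)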

close()
copy_agent(requires_grad: bool = False) Agent

Copy the agent to a new instance.

Returns:

The copied agent.

copy_policy(requires_grad: bool = False) BaseModel

Copy the policy to a new instance.

Returns:

The copied policy.

property env
generate_trajectories(*, n_trajectories: int | None = None, n_experiences: int | None = None, buffer: ReplayBuffer | None = None, epsilon: float = 0.0, p_bar_position: int = 0, verbose: bool | None = None, **kwargs) GenTrajectoriesOutput

Generate trajectories using the current policy. If the policy of the agent is in evaluation mode, the actions are chosen with the argmax method. If the policy is in training mode and a random number less than epsilon is drawn, a random action is chosen. Otherwise, the action is sampled according to the policy output.

Parameters:
  • n_trajectories (int) – Number of trajectories to generate. If not specified, the number of trajectories will be calculated based on the number of experiences.

  • n_experiences (int) – Number of experiences to generate. If not specified, the number of experiences will be calculated based on the number of trajectories.

  • buffer (ReplayBuffer) – The buffer to store the experiences.

  • epsilon (float) – The probability of choosing a random action.

  • p_bar_position (int) – The position of the progress bar.

  • verbose (bool) – Whether to show the progress bar.

  • kwargs – Additional arguments.

Keyword Arguments:
  • env (gym.Env) – The environment used to generate the trajectories. Will update the “env” of the current_state.

  • observation – The initial observation. If not specified, the observation will be taken from the objects of the current_state attribute and, if it is not available, the environment will be reset.

  • info – The initial info. If not specified, the info will be taken from the objects of the current_state attribute and, if it is not available, the environment will be reset.

Returns:

A GenTrajectoriesOutput containing the buffer with the generated experiences, the cumulative rewards, the agents’ history maps and the trajectories.
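
For example, with the academy built above (the argument values are illustrative):

gen_out = academy.generate_trajectories(n_trajectories=8, epsilon=0.1, verbose=False)
print(gen_out.cumulative_rewards.mean(), len(gen_out.trajectories or []))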

property policy: BaseModel

The policy of the academy.

reset_agents_history_maps_meta()
static set_default_academy_kwargs(**kwargs) Dict[str, Any]

Set default values for the kwargs of the fit method.

Keyword Arguments:
  • close_env – Whether to close the environment after the training.

  • n_epochs – Number of epochs to train at each iteration.

  • init_lr – Initial learning rate.

  • min_lr – Minimum learning rate.

  • weight_decay – Weight decay.

  • init_epsilon – Initial epsilon. Epsilon is the probability of choosing a random action.

  • epsilon_decay – Epsilon decay.

  • min_epsilon – Minimum epsilon.

  • gamma – Discount factor.

  • tau – Target network update rate.

  • n_batches – Number of batches to train at each iteration.

  • batch_size – Batch size.

  • update_freq – Number of steps between each update.

  • curriculum_strength – Strength of the teacher learning strategy.

Returns:

The keyword arguments with default values set.

train(env, n_iterations: int | None = None, *, n_epochs: int = 10, load_checkpoint_mode: LoadCheckpointMode | None = None, force_overwrite: bool = False, p_bar_position: int | None = None, p_bar_leave: bool | None = None, **kwargs) TrainingHistory

Train the model.

Parameters:
  • env (gym.Env) – The environment used to generate the training trajectories.

  • n_iterations (Optional[int]) – The number of iterations to train the model. An iteration is a pass over the training set and the validation set. If None, the model will be trained until the training is stopped by the user.

  • n_epochs (int) – The number of epochs to train the model. An epoch is a pass over the training set. The nomenclature here differs from what is usually used elsewhere: here, an epoch is a pass over the training set, while an iteration is a pass over the training set and the validation set. In other words, if n_iterations=1 and n_epochs=10, the trainer will pass 10 times over the training set and 1 time over the validation set (this constitutes 1 iteration). If n_iterations=10 and n_epochs=1, the trainer will pass 10 times over the training set and 10 times over the validation set (this constitutes 10 iterations). The nuance between these terms is particularly important when it comes to reinforcement learning. Default is 10.

  • load_checkpoint_mode (LoadCheckpointMode) – The mode to use when loading the checkpoint.

  • force_overwrite (bool) – Whether to force overwriting the checkpoint. Be careful when using this option, as it will destroy the previous checkpoint folder. Default is False.

  • p_bar_position (Optional[int]) – The position of the progress bar. See tqdm documentation for more information.

  • p_bar_leave (Optional[bool]) – Whether to leave the progress bar. See tqdm documentation for more information.

  • kwargs – Additional keyword arguments.

Returns:

The training history.
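
Putting it together with the academy built above (the iteration counts are illustrative):

history = academy.train(
    env,
    n_iterations=50,
    n_epochs=10,
    force_overwrite=True,  # careful: this destroys any previous checkpoint folder
)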

neurotorch.rl.utils module

class neurotorch.rl.utils.Linear(input_size: int | Dimension | Iterable[int | Dimension] | Size | None = None, output_size: int | Dimension | Iterable[int | Dimension] | Size | None = None, name: str | None = None, device: device | None = None, **kwargs)

Bases: BaseNeuronsLayer

__init__(input_size: int | Dimension | Iterable[int | Dimension] | Size | None = None, output_size: int | Dimension | Iterable[int | Dimension] | Size | None = None, name: str | None = None, device: device | None = None, **kwargs)

Initialize the layer. See the BaseLayer class for more details.

Parameters:
  • input_size (Optional[SizeTypes]) – The input size of the layer.

  • output_size (Optional[SizeTypes]) – The output size of the layer.

  • name (Optional[str]) – The name of the layer.

  • use_recurrent_connection (bool) – Whether to use a recurrent connection. Default is True.

  • use_rec_eye_mask (bool) – Whether to use a recurrent eye mask. Default is False. This mask will be used to mask to zero the diagonal of the recurrent connection matrix.

  • dt (float) – The time step of the layer. Default is 1e-3.

  • kwargs – Other keyword arguments.

Keyword Arguments:
  • regularize (bool) – Whether to regularize the layer. If True, the method update_regularization_loss will be called after each forward pass. Defaults to False.

  • hh_init (str) – The initialization method for the hidden state. Defaults to “zeros”.

  • hh_init_mu (float) – The mean of the hidden state initialization when hh_init is random. Defaults to 0.0.

  • hh_init_std (float) – The standard deviation of the hidden state initialization when hh_init is random. Defaults to 1.0.

  • hh_init_seed (int) – The seed of the hidden state initialization when hh_init is random. Defaults to 0.

  • force_dale_law (bool) – Whether to force the Dale’s law in the layer’s weights. Defaults to False.

  • forward_sign (Union[torch.Tensor, float]) – If force_dale_law is True, this parameter will be used to initialize the forward_sign vector. If it is a float, the forward_sign vector will be initialized with this value as the ratio of inhibitory neurons. If it is a tensor, it will be used as the forward_sign vector.

  • recurrent_sign (Union[torch.Tensor, float]) – If force_dale_law is True, this parameter will be used to initialize the recurrent_sign vector. If it is a float, the recurrent_sign vector will be initialized with this value as the ratio of inhibitory neurons. If it is a tensor, it will be used as the recurrent_sign vector.

  • sign_activation (Callable) – The activation function used to compute the sign of the weights i.e. the forward_sign and recurrent_sign vectors. Defaults to torch.nn.Tanh.

build() Linear

Build the layer. This method must be called after the layer is initialized to make sure that the layer is ready to be used, e.g. the input and output size is set, the weights are initialized, etc.

In this method, the forward_weights, recurrent_weights and rec_mask attributes are created, and finally the method initialize_weights_() is called.

Returns:

The layer itself.

Return type:

BaseLayer
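
A minimal sketch, under the assumption that plain integers are accepted as sizes, as the signature suggests; the name is a placeholder.

from neurotorch.rl.utils import Linear

layer = Linear(input_size=4, output_size=2, name="policy_head").build()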

create_empty_state(batch_size: int = 1, **kwargs) Tuple[Tensor, ...]

Create an empty state for the layer. This method must be implemented by the child class.

Parameters:

batch_size (int) – The batch size of the state.

Returns:

The empty state.

Return type:

Tuple[torch.Tensor, …]

forward(inputs: Tensor, state: Tuple[Tensor, ...] | None = None, **kwargs)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

initialize_weights_()

Initialize the weights of the layer. This method must be implemented by the child class.

Returns:

None

class neurotorch.rl.utils.TrainingHistoriesMap(curriculum: Curriculum | None = None)

Bases: object

REPORT_KEY = 'report'
__init__(curriculum: Curriculum | None = None)
append(key, value)
concat(other)
max(key=None)
plot(save_path=None, show=False, lesson_idx: int | str | None = None, **kwargs)
plot_history(history_name: str, save_path=None, show=False, **kwargs)
property report_history: TrainingHistory
class neurotorch.rl.utils.TrajectoryRenderer(trajectory: Trajectory, env: Env | None = None, **kwargs)

Bases: object

__init__(trajectory: Trajectory, env: Env | None = None, **kwargs)
check_simulate_is_needed()
render(**kwargs) Tuple[Figure, Axes, FuncAnimation]
simulate()
to_file(file_path: str, fps: int = 30, **kwargs)
to_gif(file_path: str, fps: int = 30, **kwargs)
to_mp4(file_path: str, fps: int = 30, **kwargs)
neurotorch.rl.utils.batch_dict_of_items(x: Any) Any
neurotorch.rl.utils.batch_numpy_actions(actions, env: Env | None = None)
neurotorch.rl.utils.continuous_actions_distribution(actions: Dict | Tensor | ndarray, covariance: Dict | Tensor | ndarray | None = None) Dict | Distribution

Creates a continuous action distribution from the actions and the covariance.

Parameters:
  • actions (Union[Dict, torch.Tensor, np.ndarray]) – The actions.

  • covariance (Optional[Union[Dict, torch.Tensor, np.ndarray]]) – The covariance of the actions. If None, a diagonal covariance is assumed using the variance of the given actions.

Returns:

The action distribution.

Return type:

Union[Dict, torch.distributions.Distribution]
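
For example (the batch shape is illustrative; with covariance=None a diagonal covariance is built from the actions’ variance, as documented above):

import torch
from neurotorch.rl.utils import continuous_actions_distribution

actions = torch.randn(32, 2)  # a batch of 32 two-dimensional continuous actions
dist = continuous_actions_distribution(actions)
log_probs = dist.log_prob(actions)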

neurotorch.rl.utils.discounted_cumulative_sums(x, discount, axis=-1, **kwargs)
neurotorch.rl.utils.env_batch_render(env: Env, **kwargs) List[Any]

Render the environment in batch mode.

Parameters:

env (gym.Env) – The environment.

neurotorch.rl.utils.env_batch_reset(env: Env) Tuple[ndarray, ndarray]

Reset the environment in batch mode.

Parameters:

env (gym.Env) – The environment.

Returns:

The batch of observations and the batch of infos.

Return type:

Tuple[np.ndarray, np.ndarray]

neurotorch.rl.utils.env_batch_step(env: Env, actions: Any) Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Step the environment in batch mode.

Parameters:
  • env (gym.Env) – The environment.

  • actions (Any) – The actions to take.

Returns:

The batch of observations, rewards, dones, truncated and infos.

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
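
A stepping sketch reusing the env and agent from the examples above; whether env must already be a vectorized environment is an assumption here.

from neurotorch.rl.utils import env_batch_reset, env_batch_step

obs_batch, info_batch = env_batch_reset(env)  # second element assumed to be the infos
actions = agent.get_actions(obs_batch)
next_obs, rewards, dones, truncated, infos = env_batch_step(env, actions)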

neurotorch.rl.utils.format_numpy_actions(actions, env: Env)
neurotorch.rl.utils.get_item_from_batch(x: Any, i: int) Any
neurotorch.rl.utils.get_single_action_space(env: Env) Space

Return the action space of a single environment.

Parameters:

env (gym.Env) – The environment.

Returns:

The action space.

Return type:

gym.spaces.Space

neurotorch.rl.utils.get_single_observation_space(env: Env) Space

Return the observation space of a single environment.

Parameters:

env (gym.Env) – The environment.

Returns:

The observation space.

Return type:

gym.spaces.Space

neurotorch.rl.utils.obs_batch_to_sequence(obs: Tensor | Dict[str, Tensor], as_numpy: bool = False) Sequence[ndarray | Tensor | Dict[str, ndarray | Tensor]]

Convert a batch of observations to a sequence of observations.

Parameters:
  • obs (Union[torch.Tensor, Dict[str, torch.Tensor]]) – The batch of observations.

  • as_numpy (bool) – Whether to convert the observations to numpy arrays.

Returns:

The sequence of observations.

Return type:

Sequence[Union[np.ndarray, torch.Tensor, Dict[str, Union[np.ndarray, torch.Tensor]]]]

neurotorch.rl.utils.obs_sequence_to_batch(obs: Sequence[ndarray | Tensor | Dict[str, ndarray | Tensor]]) Tensor | Dict[str, Tensor]

Convert a sequence of observations to a batch of observations.

Parameters:

obs (Sequence[Union[np.ndarray, torch.Tensor, Dict[str, Union[np.ndarray, torch.Tensor]]]]) – The sequence of observations.

Returns:

The batch of observations.

Return type:

Union[torch.Tensor, Dict[str, torch.Tensor]]
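
A round-trip sketch, assuming the leading dimension of each tensor is the batch dimension:

import torch
from neurotorch.rl.utils import obs_batch_to_sequence, obs_sequence_to_batch

batched_obs = {"position": torch.zeros(4, 3), "velocity": torch.ones(4, 3)}
sequence = obs_batch_to_sequence(batched_obs, as_numpy=True)  # a length-4 sequence of dicts
rebatched = obs_sequence_to_batch(sequence)                   # back to a dict of tensors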

neurotorch.rl.utils.sample_action_space(action_space: Space, re_format: str = 'raw')

Sample an action from the action space.

Parameters:
  • action_space (gym.spaces.Space) – The action space.

  • re_format (str) – The format to return the action in.

Returns:

The sampled action.

Return type:

Any

neurotorch.rl.utils.space_to_continuous_shape(space: Space, flatten_spaces=False) Tuple[int, ...] | Dict[str, Tuple[int, ...]]
neurotorch.rl.utils.space_to_spec(space: Space) Dict[str, Space]

Module contents