NeuroTorch

neurotorch.learning_algorithms package

Submodules

neurotorch.learning_algorithms.bptt module

class neurotorch.learning_algorithms.bptt.BPTT(*, params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, optimizer: Optimizer | None = None, criterion: Dict[str, Module | Callable] | Module | Callable | None = None, **kwargs)

Bases: LearningAlgorithm

Apply the backpropagation through time algorithm to the given model.

CHECKPOINT_OPTIMIZER_STATE_DICT_KEY: str = 'optimizer_state_dict'
DEFAULT_OPTIMIZER_CLS

alias of AdamW

OPTIMIZER_PARAMS_GROUP_IDX = 0
__init__(*, params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, optimizer: Optimizer | None = None, criterion: Dict[str, Module | Callable] | Module | Callable | None = None, **kwargs)

Constructor for BPTT class.

Parameters:
  • params (Optional[Sequence[torch.nn.Parameter]]) – The parameters to optimize. If None, the parameters of the trainer’s model are used.

  • optimizer (Optional[torch.optim.Optimizer]) – The optimizer to use. If not provided, a default optimizer of class DEFAULT_OPTIMIZER_CLS (an alias of AdamW) is created.

  • criterion (Optional[Union[Dict[str, Union[torch.nn.Module, Callable]], torch.nn.Module, Callable]]) – The criterion to use. If not provided, torch.nn.MSELoss is used.

  • kwargs – The keyword arguments to pass to the BaseCallback.

Keyword Arguments:
  • save_state (bool) – Whether to save the state of the optimizer. Defaults to True.

  • load_state (bool) – Whether to load the state of the optimizer. Defaults to True.

  • maximize (bool) – Whether to maximize the loss. Defaults to False.
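The criterion argument accepts either a single callable or a dict of callables keyed by output name. The snippet below is a minimal, framework-free sketch of that idea. The helper name apply_criterion_sketch, the plain-list data and the choice to sum the per-key losses are assumptions made for illustration; the actual apply_criterion method may combine the losses differently.

```python
from typing import Callable, Dict, List, Union

Vector = List[float]
LossFn = Callable[[Vector, Vector], float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equally sized vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def apply_criterion_sketch(
    criterion: Union[LossFn, Dict[str, LossFn]],
    pred_batch,
    y_batch,
) -> float:
    """Hypothetical helper: accept a callable or a dict of callables."""
    if isinstance(criterion, dict):
        # One criterion per named output. Summing the losses is an assumption.
        return sum(
            criterion[key](pred_batch[key], y_batch[key]) for key in criterion
        )
    return criterion(pred_batch, y_batch)


# Single criterion applied to a single output.
single_loss = apply_criterion_sketch(mse, [1.0, 2.0], [1.0, 4.0])

# One criterion per named output.
dict_loss = apply_criterion_sketch(
    {"out_a": mse, "out_b": mse},
    {"out_a": [1.0, 2.0], "out_b": [0.0]},
    {"out_a": [1.0, 4.0], "out_b": [3.0]},
)
```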

apply_criterion(pred_batch, y_batch, **kwargs)
create_default_optimizer()

Create the default optimizer.

Returns:

The optimizer to use for training.

extra_repr() str
get_checkpoint_state(trainer, **kwargs) object

Get the state of the callback. This is called when the checkpoint manager saves the state of the trainer. Then this state is saved in the checkpoint file with the name of the callback as the key.

Parameters:

trainer (Trainer) – The trainer.

Returns:

The state of the callback.

Return type:

A picklable object.

initialize_param_groups()

Initialize the learning rates of the parameter groups. If the user has provided a learning rate for each parameter, it is used.

Returns:

load_checkpoint_state(trainer, checkpoint: dict, **kwargs)

Loads the state of the callback from a dictionary.

Parameters:
  • trainer (Trainer) – The trainer.

  • checkpoint (dict) – The dictionary containing all the states of the trainer.

Returns:

None
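The pair get_checkpoint_state / load_checkpoint_state can be pictured with the following sketch. ToyOptimizer and ToyAlgorithm are hypothetical stand-ins, the trainer argument is left out for brevity, and the way save_state and load_state gate the two methods is an assumption. Only the 'optimizer_state_dict' key and the use of the callback name as the checkpoint key come from the documentation above.

```python
from typing import Any, Dict, Optional

CHECKPOINT_OPTIMIZER_STATE_DICT_KEY = "optimizer_state_dict"  # documented value


class ToyOptimizer:
    """Hypothetical optimizer exposing state_dict() and load_state_dict()."""

    def __init__(self, lr: float) -> None:
        self.lr = lr

    def state_dict(self) -> Dict[str, Any]:
        return {"lr": self.lr}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.lr = state["lr"]


class ToyAlgorithm:
    """Hypothetical callback that saves and restores its optimizer state."""

    name = "ToyAlgorithm"

    def __init__(self, optimizer: ToyOptimizer,
                 save_state: bool = True, load_state: bool = True) -> None:
        self.optimizer = optimizer
        self.save_state = save_state
        self.load_state = load_state

    def get_checkpoint_state(self) -> Optional[Dict[str, Any]]:
        # Assumption: nothing is saved when save_state is False.
        if not self.save_state:
            return None
        return {CHECKPOINT_OPTIMIZER_STATE_DICT_KEY: self.optimizer.state_dict()}

    def load_checkpoint_state(self, checkpoint: Dict[str, Any]) -> None:
        # Assumption: nothing is loaded when load_state is False.
        if not self.load_state:
            return
        state = checkpoint.get(self.name)
        if state is not None:
            self.optimizer.load_state_dict(
                state[CHECKPOINT_OPTIMIZER_STATE_DICT_KEY]
            )


# The state is stored under the callback name, then restored elsewhere.
source = ToyAlgorithm(ToyOptimizer(lr=0.01))
checkpoint = {source.name: source.get_checkpoint_state()}
target = ToyAlgorithm(ToyOptimizer(lr=0.5))
target.load_checkpoint_state(checkpoint)
```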

on_optimization_begin(trainer, **kwargs)

Called when the optimization phase of an iteration starts. The optimization phase is defined as the moment where the model weights are updated.

Parameters:
  • trainer (Trainer) – The trainer.

  • kwargs – Additional arguments.

Keyword Arguments:
  • x – The input data.

  • y – The target data.

  • pred – The predicted data.

Returns:

None

on_optimization_end(trainer, **kwargs)

Called when the optimization phase of an iteration ends. The optimization phase is defined as the moment where the model weights are updated.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_validation_batch_begin(trainer, **kwargs)

Called when the validation batch starts. The validation batch is defined as one forward pass through the network on the validation dataset. This is used to update the batch loss and metrics on the validation dataset.

Parameters:
  • trainer (Trainer) – The trainer.

  • kwargs – Additional arguments.

Keyword Arguments:
  • x – The input data.

  • y – The target data.

  • pred – The predicted data.

Returns:

None

start(trainer, **kwargs)

Called when the training starts. This is the first callback called.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

neurotorch.learning_algorithms.eprop module

class neurotorch.learning_algorithms.eprop.Eprop(*, params: Sequence[Parameter] | None = None, output_params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, output_layers: Sequence[Module] | Module | None = None, **kwargs)

Bases: TBPTT

Apply the eligibility trace forward propagation (e-prop) algorithm of Bellec et al. [BSS+20] to the given model.

[Figure: e-prop diagram, ../../images/learning_algorithms/EpropDiagram.png]
Note: If this learning algorithm is used for classification, the output layer should have a log-softmax activation function and the target should be a one-hot encoded tensor. The loss function should then be the negative log likelihood loss nt.losses.NLLLoss with the target_as_one_hot argument set to True.
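To make the classification setup concrete, here is a small framework-free sketch of a log-softmax output combined with a negative log likelihood loss on a one-hot target. The function names are hypothetical and the code does not call nt.losses.NLLLoss; it only illustrates the quantity that such a loss is expected to compute for one sample.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax of a single logit vector."""
    shift = max(logits)
    log_norm = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return [v - log_norm for v in logits]


def nll_one_hot(log_probs: List[float], one_hot_target: List[float]) -> float:
    """Negative log likelihood for a one-hot encoded target."""
    return -sum(t * lp for t, lp in zip(one_hot_target, log_probs))


# Three classes, the correct class is the last one.
log_probs = log_softmax([1.0, 2.0, 3.0])
loss = nll_one_hot(log_probs, [0.0, 0.0, 1.0])
```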

CHECKPOINT_FEEDBACK_WEIGHTS_KEY: str = 'feedback_weights'
CHECKPOINT_OPTIMIZER_STATE_DICT_KEY: str = 'optimizer_state_dict'
DEFAULT_FEEDBACKS_GEN_STRATEGY = 'xavier_normal'
DEFAULT_FEEDBACKS_STR_NORM_CLIP_VALUE = {'kaiming_normal': inf, 'ones': 1.0, 'orthogonal': inf, 'rand': 1.0, 'randn': 1.0, 'unitary': inf, 'xavier_normal': inf}
DEFAULT_OPTIMIZER_CLS

alias of AdamW

DEFAULT_Y_KEY = 'default_key'
FEEDBACKS_GEN_FUNCS = {'kaiming_normal': <function Eprop.<lambda>>, 'ones': <function Eprop.<lambda>>, 'orthogonal': <function Eprop.<lambda>>, 'rand': <function Eprop.<lambda>>, 'randn': <function Eprop.<lambda>>, 'unitary': <function Eprop.<lambda>>, 'xavier_normal': <function Eprop.<lambda>>}
OPTIMIZER_OUTPUT_PARAMS_GROUP_IDX = 1
OPTIMIZER_PARAMS_GROUP_IDX = 0
__init__(*, params: Sequence[Parameter] | None = None, output_params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, output_layers: Sequence[Module] | Module | None = None, **kwargs)

Constructor for Eprop class.

Parameters:
  • params (Optional[Sequence[torch.nn.Parameter]]) – The hidden parameters to optimize. If not provided, eprop will try to find the hidden parameters by looking for the parameters of the layers provided or in the inputs and hidden layers of the model provided in the trainer.

  • output_params (Optional[Sequence[torch.nn.Parameter]]) – The output parameters to optimize. If not provided, eprop will try to find the output parameters by looking for the parameters of the output layers provided or in the output layers of the model provided in the trainer.

  • layers (Optional[Union[Sequence[torch.nn.Module], torch.nn.Module]]) – The hidden layers to optimize. If not provided, eprop will try to find the hidden layers by looking for the layers of the model provided in the trainer.

  • output_layers (Optional[Union[Sequence[torch.nn.Module], torch.nn.Module]]) – The output layers to optimize. If not provided, eprop will try to find the output layers by looking for the layers of the model provided in the trainer.

  • kwargs – Keyword arguments to pass to the parent class and to configure the algorithm.

Keyword Arguments:
  • optimizer (Optional[torch.optim.Optimizer]) –

    The optimizer to use. If provided, make sure that its param_groups follow this format:

    [{"params": params, "lr": params_lr}, {"params": output_params, "lr": output_params_lr}]

    The index of each group must match the OPTIMIZER_PARAMS_GROUP_IDX and OPTIMIZER_OUTPUT_PARAMS_GROUP_IDX constants, which are 0 and 1 respectively. If not provided, a default optimizer of class DEFAULT_OPTIMIZER_CLS (an alias of AdamW) is created.

  • criterion (Optional[Union[Dict[str, Union[torch.nn.Module, Callable]], torch.nn.Module, Callable]]) – The criterion to use for the output learning signal. If not provided, torch.nn.MSELoss is used. Note that this criterion will be minimized.

  • params_lr (float) – The learning rate for the hidden parameters. Defaults to 1e-4.

  • output_params_lr (float) – The learning rate for the output parameters. Defaults to 2e-4.

  • eligibility_traces_norm_clip_value (float) – The value to clip the eligibility traces norm to. Defaults to torch.inf.

  • grad_norm_clip_value (float) – The value to clip the gradients norm to. This parameter is used to normalize the gradients of the parameters in order to help the convergence and avoid overflowing. Defaults to 1.0.

  • feedbacks_gen_strategy (str) –

    The strategy to use to generate the feedbacks. Defaults to Eprop.DEFAULT_FEEDBACKS_GEN_STRATEGY, which is "xavier_normal". The available strategies are the keys of Eprop.FEEDBACKS_GEN_FUNCS, including:

    • "randn": Normal distribution with mean 0 and variance 1.

    • "xavier_normal": Xavier normal distribution.

    • "rand": Uniform distribution between 0 and 1.

    • "unitary": Unitary matrix with normal distribution.

    The "kaiming_normal", "orthogonal" and "ones" keys are also defined in Eprop.FEEDBACKS_GEN_FUNCS.

  • nan (float) – The value to use to replace the NaN values in the gradients. Defaults to 0.0.

  • posinf (float) – The value to use to replace the inf values in the gradients. Defaults to 1.0.

  • neginf (float) – The value to use to replace the -inf values in the gradients. Defaults to -1.0.

  • save_state (bool) – Whether to save the state of the optimizer. Defaults to True.

  • load_state (bool) – Whether to load the state of the optimizer. Defaults to True.

  • raise_non_finite_errors (bool) – Whether to raise non-finite errors when detected. Defaults to False.
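The param_groups layout expected from a user-provided optimizer can be sketched as follows. The helper name make_param_groups_sketch is hypothetical and the string placeholders stand in for torch.nn.Parameter objects. The group indices and the default learning rates are the ones documented above.

```python
from typing import Any, Dict, List, Sequence

OPTIMIZER_PARAMS_GROUP_IDX = 0         # documented constant
OPTIMIZER_OUTPUT_PARAMS_GROUP_IDX = 1  # documented constant


def make_param_groups_sketch(
    params: Sequence[Any],
    output_params: Sequence[Any],
    params_lr: float = 1e-4,         # documented default
    output_params_lr: float = 2e-4,  # documented default
) -> List[Dict[str, Any]]:
    """Hypothetical helper building the two groups at their expected indices."""
    groups: List[Dict[str, Any]] = [{}, {}]
    groups[OPTIMIZER_PARAMS_GROUP_IDX] = {
        "params": list(params), "lr": params_lr,
    }
    groups[OPTIMIZER_OUTPUT_PARAMS_GROUP_IDX] = {
        "params": list(output_params), "lr": output_params_lr,
    }
    return groups


# Placeholders standing in for real parameters.
hidden_params = ["w_in", "w_rec"]
output_params = ["w_out"]
param_groups = make_param_groups_sketch(hidden_params, output_params)
```

With real parameters, a list shaped like this is what PyTorch optimizers accept as their first argument when per-group options such as lr are needed.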

compute_errors(pred_batch: Dict[str, Tensor] | Tensor, y_batch: Dict[str, Tensor] | Tensor) Dict[str, Tensor]

The error for each output is computed, then inserted in a dict for further use. This function checks whether y_batch and pred_batch are given as a dict or as a tensor.

Parameters:
  • pred_batch – prediction of the network

  • y_batch – target

Returns:

dict of errors

compute_learning_signals(errors: Dict[str, Tensor]) List[Tensor]

The learning signals are computed using equation (28) from Bellec et al. [BSS+20].

TODO: Determine whether to normalize by the number of outputs when computing the learning signal. With multiple output layers, should the learning signals be summed or averaged? Should there be a ‘reduce’ parameter?

TODO: When averaging, add a factor 1/n to the learning signal. This “kind of” results in a change of learning rate.

Parameters:

errors – The errors to use to compute the learning signals.

Returns:

List of the learning signals for each parameter.
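In e-prop, the learning signal of a hidden neuron j is a weighted sum of the output errors, where the weights are the feedback weights B_{jk}. The sketch below shows that sum for a single output layer and a single time step. The function name and the nested-list layout are assumptions, and the open questions from the TODO notes (summing or averaging over several output layers) are not addressed here.

```python
from typing import List


def learning_signal_sketch(
    feedback: List[List[float]],  # B, shape [n_hidden][n_out]
    errors: List[float],          # output errors, shape [n_out]
) -> List[float]:
    """Return L_j = sum_k B_jk * error_k for every hidden neuron j."""
    return [sum(b * e for b, e in zip(row, errors)) for row in feedback]


feedback = [
    [1.0, 0.0],
    [0.5, -1.0],
    [0.0, 2.0],
]
errors = [0.2, -0.1]
signals = learning_signal_sketch(feedback, errors)
```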

decorate_forwards()

Ensure that the forward pass is decorated. The original forward methods and the names of the hidden layers are stored. The forward methods of the hidden layers are decorated using _decorate_hidden_forward. The forward methods of the output layers are decorated using _decorate_output_forward from TBPTT.

Decorators are used here to introduce a specific behavior in the forward pass. For E-prop, the gradients must be computed and the optimization applied at each time step t of the sequence, which is achieved by decorating the forward methods. The original forward methods are kept in storage so that the forward pass is not modified permanently.

Returns:

None
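The decorate-then-restore pattern described above can be illustrated with a small self-contained sketch. ToyLayer and ForwardDecorator are hypothetical names, and the per-call hook only records the layer name, where a real learning algorithm would update gradients at that point.

```python
from typing import Callable, Dict, List


class ToyLayer:
    """Hypothetical layer with a forward method."""

    def forward(self, x: float) -> float:
        return 2.0 * x


class ForwardDecorator:
    """Hypothetical helper that wraps and later restores layer.forward."""

    def __init__(self, layers: Dict[str, ToyLayer]) -> None:
        self.layers = layers
        self._original_forwards: Dict[str, Callable[[float], float]] = {}
        self.calls: List[str] = []

    def _wrap(self, name: str, forward: Callable[[float], float]):
        def wrapped(x: float) -> float:
            out = forward(x)
            # Hook executed at each call, i.e. at each time step.
            self.calls.append(name)
            return out
        return wrapped

    def decorate_forwards(self) -> None:
        for name, layer in self.layers.items():
            if name in self._original_forwards:
                continue  # already decorated, do not wrap twice
            original = layer.forward
            self._original_forwards[name] = original
            layer.forward = self._wrap(name, original)

    def undecorate_forwards(self) -> None:
        # Put the stored forward methods back so nothing is changed permanently.
        for name, original in self._original_forwards.items():
            self.layers[name].forward = original
        self._original_forwards.clear()


layer = ToyLayer()
decorator = ForwardDecorator({"hidden": layer})
decorator.decorate_forwards()
decorated_out = layer.forward(3.0)
decorator.undecorate_forwards()
restored_out = layer.forward(3.0)
```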

eligibility_traces_zeros_()

Set the eligibility traces to zero.

Returns:

None

get_checkpoint_state(trainer, **kwargs) object

Get the state of the optimizer to be saved in the checkpoint.

Parameters:
  • trainer – The trainer object that is used for training.

  • kwargs – Additional keyword arguments.

Returns:

The state of the optimizer to be saved in the checkpoint.

initialize_feedback_weights(y_batch: Dict[str, Tensor] | Tensor | None = None) Dict[str, List[Tensor]]

Initialize the feedback weights for each parameter. The random feedback is denoted B_{ij} in Bellec et al. [BSS+20]. The keys of the feedback_weights dictionary are the names of the output layers.

TODO: Non-random feedbacks must be implemented with {W_out}.T.

Parameters:

y_batch – The batch of the target values.

Returns:

The feedback weights for each parameter.

initialize_layers(trainer)

Initialize the layers of the optimizer. Try multiple ways to identify the output layers if those are not provided by the user.

Parameters:

trainer – The trainer object.

Returns:

None

initialize_output_layers(trainer)

Initialize the output layers of the optimizer. Try multiple ways to identify the output layers if those are not provided by the user.

Note:

Must be called before initialize_output_params().

Parameters:

trainer – The trainer object.

Returns:

None.

initialize_output_params(trainer=None)

Initialize the output parameters of the optimizer. Try multiple ways to identify the output parameters if those are not provided by the user.

Note:

Must be called after initialize_output_layers().

Parameters:

trainer – The trainer object.

Returns:

None

initialize_param_groups() List[Dict[str, Any]]

Initialize the learning rates of the parameter groups. If the user has provided a learning rate for each parameter, it is used.

Returns:

the param_groups.

initialize_params(trainer=None)

Initialize the parameters of the optimizer.

Note:

Must be called after initialize_output_params() and initialize_layers().

Parameters:

trainer – The trainer to use.

Returns:

None

load_checkpoint_state(trainer, checkpoint: dict, **kwargs)

Load the state of the optimizer from the checkpoint.

Parameters:
  • trainer – The trainer object that is used for training.

  • checkpoint – The checkpoint dictionary.

  • kwargs – Additional keyword arguments.

Returns:

None

make_feedback_weights(*args, **kwargs)

Generate the feedback weights for each parameter. The random feedback is denoted B_{ij} in Bellec et al. [BSS+20].

Returns:

The feedback weights for each parameter.
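Selecting a generation strategy by name, in the spirit of FEEDBACKS_GEN_FUNCS, could look like the sketch below. The registry name, the function signature and the seed handling are assumptions, and only three simple strategies ("randn", "rand", "ones") are reproduced with the standard library. The documented default, "xavier_normal", is not implemented here.

```python
import random
from typing import Callable, Dict, List

Matrix = List[List[float]]

# Hypothetical registry mapping a strategy name to a generator function.
FEEDBACKS_GEN_FUNCS_SKETCH: Dict[str, Callable[[int, int, random.Random], Matrix]] = {
    # Normal distribution with mean 0 and variance 1.
    "randn": lambda n, m, rng: [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)],
    # Uniform distribution between 0 and 1.
    "rand": lambda n, m, rng: [[rng.random() for _ in range(m)] for _ in range(n)],
    # Constant feedback weights.
    "ones": lambda n, m, rng: [[1.0] * m for _ in range(n)],
}


def make_feedback_weights_sketch(
    n_hidden: int, n_out: int, strategy: str = "randn", seed: int = 0
) -> Matrix:
    """Return a [n_hidden][n_out] feedback matrix for the given strategy."""
    if strategy not in FEEDBACKS_GEN_FUNCS_SKETCH:
        raise ValueError(f"Unknown feedback generation strategy: {strategy!r}")
    rng = random.Random(seed)
    return FEEDBACKS_GEN_FUNCS_SKETCH[strategy](n_hidden, n_out, rng)


uniform_feedback = make_feedback_weights_sketch(3, 2, strategy="rand")
ones_feedback = make_feedback_weights_sketch(3, 2, strategy="ones")
```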

on_batch_begin(trainer, **kwargs)

Called at the beginning of each batch. Initialize the random feedback weights if not already done, and set the eligibility traces to zero.

Parameters:
  • trainer – The trainer object.

  • kwargs – Additional arguments.

Returns:

None

on_batch_end(trainer, **kwargs)

Ensure that no gradients remain in the output parameters. The forward methods are undecorated, the gradients are set to zero, and the buffers are cleared.

Parameters:
  • trainer – The trainer to use for computation.

  • kwargs – Additional arguments.

Returns:

None

start(trainer, **kwargs)

Start the training process with E-prop.

Parameters:
  • trainer – The trainer object for the training process with E-prop.

  • kwargs – Additional arguments.

Returns:

None

update_grads(errors: Dict[str, Tensor])

Compute the learning signals, then update the gradients of the parameters following equation (28) from Bellec et al. [BSS+20].

Parameters:

errors – The errors to use to compute the learning signals.

Returns:

None
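In e-prop, the gradient estimate of a weight W_{ji} is accumulated over time as the product of the learning signal of neuron j and the eligibility trace e_{ji}. The sketch below illustrates that accumulation with nested lists. The function name and the data layout are assumptions, and any filtering of the traces or normalization done by the library is left out.

```python
from typing import List

Matrix = List[List[float]]


def accumulate_grads_sketch(
    grads: Matrix,               # running gradient estimate, [n_post][n_pre]
    learning_signals: List[float],  # L_j for each post-synaptic neuron j
    eligibility_traces: Matrix,  # e_ji, same shape as grads
) -> Matrix:
    """Add L_j * e_ji to the gradient of every weight W_ji, in place."""
    for j, signal in enumerate(learning_signals):
        for i, trace in enumerate(eligibility_traces[j]):
            grads[j][i] += signal * trace
    return grads


grads = [[0.0, 0.0], [0.0, 0.0]]
# Time step 1.
accumulate_grads_sketch(grads, [1.0, -2.0], [[0.5, 0.0], [1.0, 0.25]])
# Time step 2.
accumulate_grads_sketch(grads, [0.5, 0.0], [[1.0, 1.0], [1.0, 1.0]])
```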

neurotorch.learning_algorithms.learning_algorithm module

class neurotorch.learning_algorithms.learning_algorithm.LearningAlgorithm(*, params: Sequence[Parameter] | None = None, **kwargs)

Bases: BaseCallback

DEFAULT_PRIORITY = 50
__init__(*, params: Sequence[Parameter] | None = None, **kwargs)

Constructor for LearningAlgorithm class.

Parameters:

params (Optional[Sequence[torch.nn.Parameter]]) – The parameters to optimize. If None, the parameters of the trainer’s model are used.

neurotorch.learning_algorithms.rls module

class neurotorch.learning_algorithms.rls.RLS(*, params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, criterion: Dict[str, Module | Callable] | Module | Callable | None = None, backward_time_steps: int | None = None, is_recurrent: bool = False, **kwargs)

Bases: TBPTT

Apply the recursive least squares algorithm to the given model. Different strategies are available to update the parameters of the model. The strategy is defined by the strategy attribute of the class. The following strategies are available:

  • inputs: The parameters are updated using the inputs of the model.

  • outputs: The parameters are updated using the outputs of the model. This strategy is inspired by the CURBD algorithm of Perich et al. [PAS+].

  • grad: The parameters are updated using the gradients of the model. This strategy is inspired by the work of Zhang et al. [ZSZ+].

  • jacobian: The parameters are updated using the Jacobian of the model. This strategy is inspired by the work of Al-Batah et al. [ABMIZA].

  • scaled_jacobian: The parameters are updated using the scaled Jacobian of the model.

Note

The inputs and outputs strategies are limited to the optimization of a single parameter. The other strategies can be used with multiple parameters. Unfortunately, those strategies do not work as expected at the moment. If you want to help with their development, please open an issue on GitHub.

CHECKPOINT_OPTIMIZER_STATE_DICT_KEY: str = 'optimizer_state_dict'
CHECKPOINT_P_STATES_DICT_KEY: str = 'P_list'
__init__(*, params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, criterion: Dict[str, Module | Callable] | Module | Callable | None = None, backward_time_steps: int | None = None, is_recurrent: bool = False, **kwargs)

Constructor for RLS class.

Parameters:
  • params (Optional[Sequence[torch.nn.Parameter]]) – The parameters to optimize. If None, the parameters of the trainer’s model are used.

  • layers (Optional[Union[Sequence[torch.nn.Module], torch.nn.Module]]) – The layers to optimize. If not None, the parameters of the layers are added to the parameters to optimize.

  • criterion (Optional[Union[Dict[str, Union[torch.nn.Module, Callable]], torch.nn.Module, Callable]]) – The criterion to use. If not provided, torch.nn.MSELoss is used.

  • backward_time_steps (Optional[int]) – The frequency of parameter optimisation. If None, the number of time steps of the data will be used.

  • is_recurrent (bool) – If True, the model is recurrent. If False, the model is not recurrent.

  • kwargs – The keyword arguments to pass to the BaseCallback.

Keyword Arguments:
  • save_state (bool) – Whether to save the state of the optimizer. Defaults to True.

  • load_state (bool) – Whether to load the state of the optimizer. Defaults to True.

get_checkpoint_state(trainer, **kwargs) object

Get the state of the callback. This is called when the checkpoint manager saves the state of the trainer. Then this state is saved in the checkpoint file with the name of the callback as the key.

Parameters:

trainer (Trainer) – The trainer.

Returns:

The state of the callback.

Return type:

A picklable object.

grad_mth_step(x_batch: Tensor, pred_batch: Tensor, y_batch: Tensor)

This method is inspired by the work of Zhang et al. [ZSZ+]. Unfortunately, this method does not seem to work with the current implementation.

TODO: Make it work.

x.shape = [B, f_in]
y.shape = [B, f_out]
error.shape = [B, f_out]
P.shape = [f_in, f_in]

epsilon = mean[B](error[B, f_out]) -> [1, f_out]
phi = mean[B](x[B, f_in]) -> [1, f_in]

K = P[f_in, f_in] @ phi.T[f_in, 1] -> [f_in, 1]
h = 1 / (labda[1] + kappa[1] * phi[1, f_in] @ K[f_in, 1]) -> [1]
grad = h[1] * P[f_in, f_in] @ grad[N_in, N_out] -> [N_in, N_out]
P = labda[1] * P[f_in, f_in] - h[1] * kappa[1] * K[f_in, 1] @ K.T[1, f_in] -> [f_in, f_in]

In this case f_in must be equal to N_in.

Parameters:
  • x_batch – inputs of the layer

  • pred_batch – outputs of the layer

  • y_batch – targets of the layer

initialize_P_list(m=None)
inputs_mth_step(x_batch: Tensor, pred_batch: Tensor, y_batch: Tensor)

x.shape = [B, f_in]
y.shape = [B, f_out]
error.shape = [B, f_out]
P.shape = [f_in, f_in]

epsilon = mean[B](error[B, f_out]) -> [1, f_out]
phi = mean[B](x[B, f_in]) -> [1, f_in]

K = P[f_in, f_in] @ phi.T[f_in, 1] -> [f_in, 1]
h = 1 / (labda[1] + kappa[1] * phi[1, f_in] @ K[f_in, 1]) -> [1]
P = labda[1] * P[f_in, f_in] - h[1] * kappa[1] * K[f_in, 1] @ K.T[1, f_in] -> [f_in, f_in]
grad = h[1] * K[f_in, 1] @ epsilon[1, f_out] -> [N_in, N_out]

In this case [N_in, N_out] must be equal to [f_in, f_out].

Parameters:
  • x_batch – inputs of the layer

  • pred_batch – outputs of the layer

  • y_batch – targets of the layer
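The update written in the shape notation of inputs_mth_step can be followed step by step with plain Python lists. This is a sketch under stated assumptions: the function name inputs_step_sketch is hypothetical, the error batch is passed in directly because its sign convention is not stated here, and labda and kappa default to 1.0 only for the sake of the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def column_mean(batch: Matrix) -> List[float]:
    """Average a [B, f] batch over its first dimension."""
    n = len(batch)
    return [sum(row[c] for row in batch) / n for c in range(len(batch[0]))]


def inputs_step_sketch(
    x_batch: Matrix,      # [B, f_in]
    error_batch: Matrix,  # [B, f_out]
    P: Matrix,            # [f_in, f_in]
    labda: float = 1.0,   # assumed value for the example
    kappa: float = 1.0,   # assumed value for the example
) -> Tuple[Matrix, Matrix]:
    """Return the updated P and the gradient of shape [f_in, f_out]."""
    phi = column_mean(x_batch)          # [f_in]
    epsilon = column_mean(error_batch)  # [f_out]
    # K = P @ phi.T -> [f_in]
    K = [sum(p * v for p, v in zip(row, phi)) for row in P]
    # h = 1 / (labda + kappa * phi @ K) -> scalar
    h = 1.0 / (labda + kappa * sum(p * k for p, k in zip(phi, K)))
    # P = labda * P - h * kappa * K @ K.T -> [f_in, f_in]
    f_in = len(K)
    new_P = [
        [labda * P[r][c] - h * kappa * K[r] * K[c] for c in range(f_in)]
        for r in range(f_in)
    ]
    # grad = h * K @ epsilon -> [f_in, f_out]
    grad = [[h * k * e for e in epsilon] for k in K]
    return new_P, grad


P0 = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [1.0, 0.0]]
error = [[2.0], [4.0]]
P1, grad = inputs_step_sketch(x, error, P0)
```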

jacobian_mth_step(x_batch: Tensor, pred_batch: Tensor, y_batch: Tensor)

This method is inspired by the work of Al-Batah et al. [ABMIZA]. Unfortunately, this method does not seem to work with the current implementation.

TODO: Make it work.

x.shape = [B, f_in]
y.shape = [B, f_out]
error.shape = [B, f_out]
P.shape = [f_out, f_out]
theta.shape = [ell, 1]

epsilon = mean[B](error[B, f_out]) -> [1, f_out]
phi = mean[B](y[B, f_out]) -> [1, f_out]
psi = jacobian[theta](phi[1, f_out]) -> [f_out, ell]

K = P[f_out, f_out] @ psi[f_out, ell] -> [f_out, ell]
grad = epsilon[1, f_out] @ K[f_out, ell] -> [ell, 1]
P = labda[1] * P[f_out, f_out] - kappa[1] * K[f_out, ell] @ K[f_out, ell].T -> [f_out, f_out]

In this case f_in must be equal to N_in.

Parameters:
  • x_batch – inputs of the layer

  • pred_batch – outputs of the layer

  • y_batch – targets of the layer

load_checkpoint_state(trainer, checkpoint: dict, **kwargs)

Loads the state of the callback from a dictionary.

Parameters:
  • trainer (Trainer) – The trainer.

  • checkpoint (dict) – The dictionary containing all the states of the trainer.

Returns:

None

on_batch_begin(trainer, **kwargs)

Called when a batch starts. The batch is defined as one forward pass through the network.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_batch_end(trainer, **kwargs)

Called when a batch ends. The batch is defined as one forward pass through the network.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_optimization_begin(trainer, **kwargs)

Called when the optimization phase of an iteration starts. The optimization phase is defined as the moment where the model weights are updated.

Parameters:
  • trainer (Trainer) – The trainer.

  • kwargs – Additional arguments.

Keyword Arguments:
  • x – The input data.

  • y – The target data.

  • pred – The predicted data.

Returns:

None

optimization_step(x_batch: Tensor, pred_batch: Tensor, y_batch: Tensor)
outputs_mth_step(x_batch: Tensor, pred_batch: Tensor, y_batch: Tensor)

This method is inspired by the CURBD algorithm of Perich et al. [PAS+].

x.shape = [B, f_in]
y.shape = [B, f_out]
error.shape = [B, f_out]
P.shape = [f_out, f_out]

epsilon = mean[B](error[B, f_out]) -> [1, f_out]
phi = mean[B](y[B, f_out]) -> [1, f_out]

K = P[f_out, f_out] @ phi.T[f_out, 1] -> [f_out, 1]
h = 1 / (labda[1] + kappa[1] * phi[1, f_out] @ K[f_out, 1]) -> [1]
P = labda[1] * P[f_out, f_out] - h[1] * kappa[1] * K[f_out, 1] @ K.T[1, f_out] -> [f_out, f_out]
grad = h[1] * K[f_out, 1] @ epsilon[1, f_out] -> [N_in, N_out]

In this case [N_in, N_out] must be equal to [f_out, f_out].

Parameters:
  • x_batch – inputs of the layer

  • pred_batch – outputs of the layer

  • y_batch – targets of the layer

scaled_jacobian_mth_step(x_batch: Tensor, pred_batch: Tensor, y_batch: Tensor)

This method is inspired by the work of Al-Batah et al. [ABMIZA]. Unfortunately, this method does not seem to work with the current implementation.

TODO: Make it work.

x.shape = [B, f_in]
y.shape = [B, f_out]
error.shape = [B, f_out]
P.shape = [f_out, f_out]
theta.shape = [ell, 1]

epsilon = mean[B](error[B, f_out]) -> [1, f_out]
phi = mean[B](y[B, f_out]) -> [1, f_out]
psi = jacobian[theta](phi[1, f_out]) -> [f_out, ell]

K = P[f_out, f_out] @ psi[f_out, ell] -> [f_out, ell]
h = 1 / (labda[1] + kappa[1] * psi.T[ell, f_out] @ K[f_out, ell]) -> [ell, ell]
grad = (K[f_out, ell] @ h[ell, ell]).T[ell, f_out] @ epsilon.T[f_out, 1] -> [1, ell]
P = labda[1] * P[f_out, f_out] - kappa[1] * K[f_out, ell] @ h[ell, ell] @ K[f_out, ell].T -> [f_out, f_out]

In this case f_in must be equal to N_in.

Parameters:
  • x_batch – inputs of the layer

  • pred_batch – outputs of the layer

  • y_batch – targets of the layer

start(trainer, **kwargs)

Called when the training starts. This is the first callback called.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

neurotorch.learning_algorithms.tbptt module

class neurotorch.learning_algorithms.tbptt.TBPTT(*, params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, output_layers: Sequence[Module] | Module | None = None, optimizer: Optimizer | None = None, criterion: Dict[str, Module | Callable] | Module | Callable | None = None, backward_time_steps: int | None = None, optim_time_steps: int | None = None, **kwargs)

Bases: BPTT

Truncated Backpropagation Through Time (TBPTT) algorithm.
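Truncated backpropagation through time limits each backward pass to a window of time steps instead of the whole sequence. The sketch below only computes the window boundaries for a sequence. The function name truncation_windows is hypothetical, and how the library aligns its windows with backward_time_steps and optim_time_steps is not shown here.

```python
from typing import List, Tuple


def truncation_windows(
    n_time_steps: int, backward_time_steps: int
) -> List[Tuple[int, int]]:
    """Split range(n_time_steps) into consecutive [start, stop) windows."""
    if backward_time_steps < 1:
        raise ValueError("backward_time_steps must be at least 1")
    return [
        (start, min(start + backward_time_steps, n_time_steps))
        for start in range(0, n_time_steps, backward_time_steps)
    ]


# A sequence of 10 time steps with a backward pass every 4 steps.
windows = truncation_windows(10, 4)
```

The last window is shorter when the sequence length is not a multiple of the window size.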

__init__(*, params: Sequence[Parameter] | None = None, layers: Sequence[Module] | Module | None = None, output_layers: Sequence[Module] | Module | None = None, optimizer: Optimizer | None = None, criterion: Dict[str, Module | Callable] | Module | Callable | None = None, backward_time_steps: int | None = None, optim_time_steps: int | None = None, **kwargs)

Constructor for TBPTT class.

Parameters:
  • params (Optional[Sequence[torch.nn.Parameter]]) – The parameters to optimize. If None, the parameters of the trainer’s model are used.

  • layers (Optional[Union[Sequence[torch.nn.Module], torch.nn.Module]]) – The layers to apply the TBPTT algorithm to. If None, the layers of the trainer’s model are used.

  • output_layers (Optional[Union[Sequence[torch.nn.Module], torch.nn.Module]]) – The layers to use as output layers. If None, the output layers of the trainer’s model are used.

  • optimizer (Optional[torch.optim.Optimizer]) – The optimizer to use.

  • criterion (Optional[Union[Dict[str, Union[torch.nn.Module, Callable]], torch.nn.Module, Callable]]) – The criterion to use.

  • backward_time_steps (Optional[int]) – The number of time steps to use for the backward pass.

  • optim_time_steps (Optional[int]) – The number of time steps to use for the optimizer step.

  • kwargs – Additional keyword arguments.

Keyword Arguments:
  • auto_backward_time_steps_ratio (float) – The ratio of the number of time steps to use for the backward pass. Defaults to 0.1.

  • auto_optim_time_steps_ratio (float) – The ratio of the number of time steps to use for the optimizer step. Defaults to 0.1.

  • alpha (float) – The alpha value to use for the exponential moving average of the gradients.

  • grad_norm_clip_value (float) – The value to clip the gradients norm to. This parameter is used to normalize the gradients of the parameters in order to help the convergence and avoid overflowing. Defaults to 1.0.

  • nan (float) – The value to use to replace the NaN values in the gradients. Defaults to 0.0.

  • posinf (float) – The value to use to replace the inf values in the gradients. Defaults to 1.0.

  • neginf (float) – The value to use to replace the -inf values in the gradients. Defaults to -1.0.

Raises:
  • AssertionError – If auto_backward_time_steps_ratio is not between 0 and 1.

  • AssertionError – If auto_optim_time_steps_ratio is not between 0 and 1.
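The nan, posinf, neginf and grad_norm_clip_value keyword arguments describe a clean-up of the gradients before the update. A framework-free sketch of such a clean-up is given below. The function names are hypothetical, and the order of the two steps (replace non-finite values first, then clip the norm) is an assumption rather than documented behavior.

```python
import math
from typing import List


def sanitize_sketch(
    values: List[float],
    nan: float = 0.0,      # documented default
    posinf: float = 1.0,   # documented default
    neginf: float = -1.0,  # documented default
) -> List[float]:
    """Replace NaN, +inf and -inf entries by the given finite values."""
    cleaned = []
    for v in values:
        if math.isnan(v):
            cleaned.append(nan)
        elif v == math.inf:
            cleaned.append(posinf)
        elif v == -math.inf:
            cleaned.append(neginf)
        else:
            cleaned.append(v)
    return cleaned


def clip_norm_sketch(values: List[float], max_norm: float = 1.0) -> List[float]:
    """Rescale the vector so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0 or norm <= max_norm:
        return list(values)
    scale = max_norm / norm
    return [v * scale for v in values]


raw_grad = [math.nan, math.inf, -math.inf, 0.5]
clean_grad = sanitize_sketch(raw_grad)
clipped_grad = clip_norm_sketch([3.0, 4.0], max_norm=1.0)
```

In PyTorch, comparable operations are available as torch.nan_to_num and torch.nn.utils.clip_grad_norm_. Whether the library relies on them is not stated in this reference.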

close(trainer, **kwargs)

Called when the training ends. This is the last callback called.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

decorate_forwards()
extra_repr() str
initialize_layers(trainer)

Initialize the layers of the optimizer. Try multiple ways to identify the output layers if those are not provided by the user.

Parameters:

trainer – The trainer object.

Returns:

None

initialize_output_layers(trainer)

Initialize the output layers of the optimizer. Try multiple ways to identify the output layers if those are not provided by the user.

Note:

Must be called before initialize_output_params().

Parameters:

trainer – The trainer object.

Returns:

None.

on_batch_begin(trainer, **kwargs)

Called when a batch starts. The batch is defined as one forward pass through the network.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_batch_end(trainer, **kwargs)

Called when a batch ends. The batch is defined as one forward pass through the network.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

on_optimization_begin(trainer, **kwargs)

Called when the optimization phase of an iteration starts. The optimization phase is defined as the moment where the model weights are updated.

Parameters:
  • trainer (Trainer) – The trainer.

  • kwargs – Additional arguments.

Keyword Arguments:
  • x – The input data.

  • y – The target data.

  • pred – The predicted data.

Returns:

None

on_optimization_end(trainer, **kwargs)

Called when the optimization phase of an iteration ends. The optimization phase is defined as the moment where the model weights are updated.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

params: List[Parameter]
start(trainer, **kwargs)

Called when the training starts. This is the first callback called.

Parameters:

trainer (Trainer) – The trainer.

Returns:

None

undecorate_forwards()

Module contents