hessQuik.networks

hessQuik Network

class NN(*args: Any, **kwargs: Any)[source]

Bases: Sequential

Wrapper for hessQuik networks built upon torch.nn.Sequential.

__init__(*args)[source]
Parameters:

args – sequence of hessQuik layers to be concatenated
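
A minimal construction sketch, assuming the layer and activation classes singleLayer, antiTanhActivation, and softplusActivation are available from hessQuik.layers and hessQuik.activations (names taken from hessQuik usage examples):

    import hessQuik.activations as act
    import hessQuik.layers as lay
    import hessQuik.networks as net

    # concatenate two single layers into one network mapping R^2 -> R^16 -> R^1
    f = net.NN(
        lay.singleLayer(2, 16, act=act.antiTanhActivation()),
        lay.singleLayer(16, 1, act=act.softplusActivation()),
    )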

dim_input() → int[source]

Number of network input features

dim_output()[source]

Number of network output features

setup_forward_mode(**kwargs)[source]

Setup forward or backward mode.

If kwargs does not include a forward_mode key, then the heuristic is to use forward_mode = True if \(n_{in} < n_{out}\) where \(n_{in}\) is the number of input features and \(n_{out}\) is the number of output features.

There are three possible options once forward_mode is a key of kwargs:

  • If forward_mode = True, then the network computes derivatives during forward propagation.

  • If forward_mode = False, then the network calls the backward routine to compute derivatives after forward propagating.

  • If forward_mode = None, then the network will compute derivatives in backward mode, but will not call the backward routine. This enables concatenation of networks, not just layers.
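
A short sketch of this mode-selection heuristic, written as a standalone helper for illustration (it is not the library source):

    def choose_forward_mode(net, **kwargs):
        # an explicit user choice takes precedence: True, False, or None as described above
        if 'forward_mode' in kwargs:
            return kwargs['forward_mode']
        # heuristic: forward-mode derivatives are cheaper when there are fewer inputs than outputs
        return net.dim_input() < net.dim_output()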

forward(x: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, do_Laplacian: bool = False, dudx: Optional[torch.Tensor] = None, d2ud2x: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None, **kwargs) → Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]][source]

Forward propagate through the network and compute derivatives.

Parameters:
  • x (torch.Tensor) – input into network of shape \((n_s, d)\) where \(n_s\) is the number of samples and \(d\) is the number of input features

  • do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • dudx (torch.Tensor or None) – if forward_mode = True, gradient of features from previous layer with respect to network input \(x\) with shape \((n_s, d, n_{in})\)

  • d2ud2x (torch.Tensor or None) – if forward_mode = True, Hessian of features from previous layer with respect to network input \(x\) with shape \((n_s, d, d, n_{in})\)

  • kwargs – additional options, such as forward_mode as a user input

Returns:

  • f (torch.Tensor) - output features of network with shape \((n_s, m)\) where \(m\) is the number of network output features

  • dfdx (torch.Tensor or None) - if forward_mode = True, gradient of output features with respect to network input \(x\) with shape \((n_s, d, m)\)

  • d2fd2x (torch.Tensor or None) - if forward_mode = True, Hessian of output features with respect to network input \(x\) with shape \((n_s, d, d, m)\)
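
For example, using the network f sketched earlier (whose layer and activation names are assumptions), a forward call that also requests derivatives might look like:

    import torch

    x = torch.randn(10, 2)   # n_s = 10 samples, d = 2 input features
    fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
    # fx: (10, 1), dfdx: (10, 2, 1), d2fd2x: (10, 2, 2, 1)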

backward(do_Hessian: bool = False, dgdf: Optional[torch.Tensor] = None, d2gd2f: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]

Compute derivatives using backward propagation. This method is called during the forward pass if forward_mode = False.

Parameters:
  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • dgdf (torch.Tensor) – gradient of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, m)\).

  • d2gd2f (torch.Tensor or None) – Hessian of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, n_{out}, m)\).

Returns:

  • dgdf (torch.Tensor or None) - gradient of the network with respect to input features \(x\) with shape \((n_s, d, m)\)

  • d2gd2f (torch.Tensor or None) - Hessian of the network with respect to input features \(x\) with shape \((n_s, d, d, m)\)
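
In practice, backward is not called directly; requesting derivatives with forward_mode = False triggers it after forward propagation, e.g. (continuing the assumed example above):

    fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True, forward_mode=False)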

class NNPytorchAD(*args: Any, **kwargs: Any)[source]

Bases: Module

Compute the derivatives of a network using PyTorch’s automatic differentiation.

The implementation follows that of CP Flow.

__init__(net: NN)[source]

Create wrapper around hessQuik network.

Parameters:

net (hessQuik.networks.NN) – hessQuik network

forward(x: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, **kwargs) → Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]][source]

Forward propagate through the hessQuik network without computing derivatives. Then, use automatic differentiation to compute derivatives using torch.autograd.grad.
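
As a usage sketch, the wrapper takes an existing hessQuik network and exposes the same calling convention (continuing the assumed example from above):

    f_ad = net.NNPytorchAD(f)
    fx, dfdx, d2fd2x = f_ad(x, do_gradient=True, do_Hessian=True)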

class NNPytorchHessian(*args: Any, **kwargs: Any)[source]

Bases: Module

Compute the derivatives of a network using PyTorch’s Hessian functional.

__init__(net)[source]

Create wrapper around hessQuik network.

Parameters:

net (hessQuik.networks.NN) – hessQuik network

forward(x: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, **kwargs) → Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]][source]

Forward propagate through the hessQuik network without computing derivatives. Then, use automatic differentiation to compute derivatives using torch.autograd.functional.hessian.
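
This wrapper is useful for checking hessQuik derivatives against PyTorch’s own machinery. A comparison sketch, assuming the scalar-output network f and input x from the earlier examples (torch.autograd.functional.hessian requires a scalar-valued function):

    f_hess = net.NNPytorchHessian(f)
    fx_h, dfdx_h, d2fd2x_h = f_hess(x, do_gradient=True, do_Hessian=True)

    # the hessQuik and autograd-based Hessians should agree up to round-off
    print(torch.norm(d2fd2x - d2fd2x_h).item())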

Fully Connected Network

class fullyConnectedNN(*args: Any, **kwargs: Any)[source]

Bases: NN

Fully-connected network in which every layer is a single layer, i.e., an affine transformation followed by an activation. Let \(u_0 = x\) be the input into the network. The construction is of the form

\[\begin{split}\begin{align} u_1 &= \sigma(K_1 u_0 + b_1)\\ u_2 &= \sigma(K_2 u_1 + b_2)\\ &\vdots \\ u_{\ell} &= \sigma(K_{\ell} u_{\ell-1} + b_{\ell}) \end{align}\end{split}\]

where \(\ell\) is the number of layers. Each vector of features \(u_i\) is of size \((n_s, n_i)\) where \(n_s\) is the number of samples and \(n_i\) is the dimension or width of the hidden features on layer \(i\). Users choose the widths of the network and the activation function \(\sigma\).
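
A construction sketch; the constructor arguments (a list of layer widths and an activation) and the tanhActivation class are assumptions based on the description above:

    import hessQuik.activations as act
    import hessQuik.networks as net

    # widths [2, 32, 32, 1]: input dimension 2, two hidden layers of width 32, scalar output
    f_fc = net.fullyConnectedNN([2, 32, 32, 1], act=act.tanhActivation())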

ICNN Network

class ICNN(*args: Any, **kwargs: Any)[source]

Bases: NN

Input Convex Neural Networks (ICNN) were proposed in the paper Input Convex Neural Networks by Amos, Xu, and Kolter. The network is constructed such that it is convex with respect to the network inputs \(x\). It is constructed via

\[\begin{split}\begin{align} u_1 &= \sigma(K_1 x + b_1)\\ u_2 &= \sigma(L_2^+ u_1 + K_2 x + b_2)\\ &\vdots\\ u_{\ell} &= \sigma(L_{\ell}^+ u_{\ell-1} + K_{\ell} x + b_{\ell}) \end{align}\end{split}\]

where \(\ell\) is the number of layers. Here, \((\cdot)^+\) is a function that forces the matrix to have only nonnegative entries. The activation function \(\sigma\) must be convex and non-decreasing.
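
The key ingredient is the nonnegative reparameterization \((\cdot)^+\) applied to the weights acting on the hidden features. A minimal plain-PyTorch sketch of one ICNN-style update, using softplus both for \((\cdot)^+\) and for \(\sigma\) (these choices are assumptions for illustration; hessQuik may use a different projection and activation):

    import torch
    import torch.nn.functional as F

    def icnn_step(u, x, L, K, b):
        # u: (n_s, n_prev) hidden features, x: (n_s, d) network input
        # softplus(L) has only nonnegative entries; softplus is convex and non-decreasing
        return F.softplus(u @ F.softplus(L).T + x @ K.T + b)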

Residual Neural Network

class resnetNN(*args: Any, **kwargs: Any)[source]

Bases: NN

Residual neural networks (ResNet) were popularized in the paper Deep Residual Learning for Image Recognition by He et al. Here, every layer is a single layer plus a skip connection. Let \(u_0\) be the input into the ResNet. The construction is of the form

\[\begin{split}\begin{align} u_1 &= u_0 + h\sigma(K_1 u_0 + b_1)\\ u_2 &= u_1 + h\sigma(K_2 u_1 + b_2)\\ &\vdots \\ u_{\ell} &= u_{\ell-1} + h\sigma(K_{\ell} u_{\ell-1} + b_{\ell}) \end{align}\end{split}\]

where \(\ell\) is the number of layers, called the depth of the network, and \(h\) is a step size. Each vector of features \(u_i\) is of size \((n_s, w)\) where \(n_s\) is the number of samples and \(w\) is the width of the network. Users choose the width and depth of the network and the activation function \(\sigma\).
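
Because the residual blocks preserve the feature width \(w\), a ResNet is typically composed with single layers that map into and out of that width, using the NN concatenation described above. A sketch; the resnetNN constructor arguments (width, depth, step size h, activation) and the identityActivation class are assumptions:

    f_res = net.NN(
        lay.singleLayer(2, 16, act=act.antiTanhActivation()),        # map input features to width 16
        net.resnetNN(16, 4, h=0.5, act=act.softplusActivation()),    # 4 residual layers of width 16
        lay.singleLayer(16, 1, act=act.identityActivation()),        # map to scalar output
    )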