hessQuik.networks¶
hessQuik Network¶
- class NN(*args: Any, **kwargs: Any)[source]¶
Bases:
Sequential
Wrapper for hessQuik networks built upon torch.nn.Sequential.
- setup_forward_mode(**kwargs)[source]¶
Set up forward or backward mode.
If kwargs does not include a forward_mode key, the heuristic is to use forward_mode = True if \(n_{in} < n_{out}\), where \(n_{in}\) is the number of input features and \(n_{out}\) is the number of output features.
There are three possible options once forward_mode is a key of kwargs:
- If forward_mode = True, the network computes derivatives during forward propagation.
- If forward_mode = False, the network calls the backward routine to compute derivatives after forward propagating.
- If forward_mode = None, the network computes derivatives in backward mode, but does not call the backward routine. This enables concatenation of networks, not just layers.
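As a rough illustration of these options, the sketch below overrides the heuristic by passing forward_mode through kwargs. The layer constructors and activation classes follow the hessQuik examples; the particular sizes are arbitrary.

    import torch
    import hessQuik.activations as act
    import hessQuik.layers as lay
    import hessQuik.networks as net

    # 10 input features, 1 output feature: the heuristic would choose
    # backward mode here because n_in > n_out
    f = net.NN(lay.singleLayer(10, 32, act=act.antiTanhActivation()),
               lay.singleLayer(32, 1, act=act.identityActivation()))

    x = torch.randn(20, 10)

    # force derivatives to be computed during forward propagation
    fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True, forward_mode=True)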
- forward(x: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, do_Laplacian: bool = False, dudx: Optional[torch.Tensor] = None, d2ud2x: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None, **kwargs) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]] [source]¶
Forward propagate through the network and compute derivatives.
- Parameters:
x (torch.Tensor) – input into the network of shape \((n_s, d)\), where \(n_s\) is the number of samples and \(d\) is the number of input features
do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
dudx (torch.Tensor or None) – if forward_mode = True, gradient of features from the previous layer with respect to the network input \(x\), with shape \((n_s, d, n_{in})\)
d2ud2x (torch.Tensor or None) – if forward_mode = True, Hessian of features from the previous layer with respect to the network input \(x\), with shape \((n_s, d, d, n_{in})\)
kwargs – additional options, such as forward_mode as a user input
- Returns:
f (torch.Tensor) - output features of the network with shape \((n_s, m)\), where \(m\) is the number of network output features
dfdx (torch.Tensor or None) - if forward_mode = True, gradient of the output features with respect to the network input \(x\), with shape \((n_s, d, m)\)
d2fd2x (torch.Tensor or None) - if forward_mode = True, Hessian of the output features with respect to the network input \(x\), with shape \((n_s, d, d, m)\)
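For concreteness, a minimal sketch of the call pattern and the returned shapes (the architecture and sizes are arbitrary choices, not requirements):

    import torch
    import hessQuik.activations as act
    import hessQuik.layers as lay
    import hessQuik.networks as net

    d, m, n_s = 4, 3, 100
    f = net.NN(lay.singleLayer(d, 16, act=act.softplusActivation()),
               lay.singleLayer(16, m, act=act.identityActivation()))

    x = torch.randn(n_s, d)
    fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)

    print(fx.shape)      # (n_s, m)       -> torch.Size([100, 3])
    print(dfdx.shape)    # (n_s, d, m)    -> torch.Size([100, 4, 3])
    print(d2fd2x.shape)  # (n_s, d, d, m) -> torch.Size([100, 4, 4, 3])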
- backward(do_Hessian: bool = False, dgdf: Optional[torch.Tensor] = None, d2gd2f: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None) Tuple[torch.Tensor, Optional[torch.Tensor]] [source]¶
Compute derivatives using backward propagation. This method is called during the forward pass if forward_mode = False.
- Parameters:
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
dgdf (torch.Tensor) – gradient of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, m)\)
d2gd2f (torch.Tensor or None) – Hessian of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, n_{out}, m)\)
- Returns:
dgdf (torch.Tensor or None) - gradient of the network with respect to the input features \(x\), with shape \((n_s, d, m)\)
d2gd2f (torch.Tensor or None) - Hessian of the network with respect to the input features \(x\), with shape \((n_s, d, d, m)\)
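Users typically do not call backward directly; it is invoked internally when forward_mode = False. A quick consistency check between the two modes (reusing f and x from the sketch above):

    # identical derivatives, computed in forward and backward mode
    fx1, dfdx1, d2fd2x1 = f(x, do_gradient=True, do_Hessian=True, forward_mode=True)
    fx2, dfdx2, d2fd2x2 = f(x, do_gradient=True, do_Hessian=True, forward_mode=False)

    print(torch.allclose(dfdx1, dfdx2, atol=1e-6))      # expected: True
    print(torch.allclose(d2fd2x1, d2fd2x2, atol=1e-6))  # expected: True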
- class NNPytorchAD(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Compute the derivatives of a network using PyTorch's automatic differentiation.
The implementation follows that of CP Flow.
- __init__(net: NN)[source]¶
Create wrapper around hessQuik network.
- Parameters:
net (hessQuik.networks.NN) – hessQuik network
- forward(x: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, **kwargs) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]] [source]¶
Forward propagate through the hessQuik network without computing derivatives. Then, use automatic differentiation to compute derivatives using torch.autograd.grad.
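A minimal usage sketch, assuming f is a hessQuik network constructed as in the examples above; the wrapper shares the forward signature of NN, which makes it convenient for testing hessQuik's analytic derivatives against automatic differentiation:

    fAD = net.NNPytorchAD(f)
    fx, dfdx, d2fd2x = fAD(x, do_gradient=True, do_Hessian=True)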
- class NNPytorchHessian(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Compute the derivatives of a network using PyTorch's Hessian functional.
- __init__(net)[source]¶
Create wrapper around hessQuik network.
- Parameters:
net (hessQuik.networks.NN) – hessQuik network
- forward(x: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, **kwargs) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]] [source]¶
Forward propagate through the hessQuik network without computing derivatives. Then, use automatic differentiation to compute derivatives using torch.autograd.functional.hessian.
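A minimal sketch under one extra assumption: since torch.autograd.functional.hessian differentiates scalar-valued functions, the wrapped network is taken here to have a single output feature (m = 1):

    import torch
    import hessQuik.activations as act
    import hessQuik.layers as lay
    import hessQuik.networks as net

    # scalar-output network, as assumed for the Hessian functional
    g = net.NN(lay.singleLayer(4, 16, act=act.antiTanhActivation()),
               lay.singleLayer(16, 1, act=act.identityActivation()))

    gHess = net.NNPytorchHessian(g)
    x = torch.randn(10, 4)
    gx, dgdx, d2gd2x = gHess(x, do_gradient=True, do_Hessian=True)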
Fully Connected Network¶
- class fullyConnectedNN(*args: Any, **kwargs: Any)[source]¶
Bases:
NN
Fully-connected network where every layer is a hessQuik singleLayer. Let \(u_0 = x\) be the input into the network. The construction is of the form
\[\begin{split}\begin{align} u_1 &= \sigma(K_1 u_0 + b_1)\\ u_2 &= \sigma(K_2 u_1 + b_2)\\ &\vdots \\ u_{\ell} &= \sigma(K_{\ell} u_{\ell-1} + b_{\ell}) \end{align}\end{split}\]where \(\ell\) is the number of layers. Each vector of features \(u_i\) is of size \((n_s, n_i)\) where \(n_s\) is the number of samples and \(n_i\) is the dimension or width of the hidden features on layer \(i\). Users choose the widths of the network and the activation function \(\sigma\).
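A construction sketch, assuming the constructor accepts the list of widths (including the input and output dimensions) and an activation, in the style of the hessQuik examples:

    import torch
    import hessQuik.activations as act
    import hessQuik.networks as net

    # 2 input features, two hidden layers of width 16, 1 output feature
    fc = net.fullyConnectedNN([2, 16, 16, 1], act=act.antiTanhActivation())

    x = torch.randn(50, 2)
    fx, dfdx, d2fd2x = fc(x, do_gradient=True, do_Hessian=True)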
ICNN Network¶
- class ICNN(*args: Any, **kwargs: Any)[source]¶
Bases:
NN
Input Convex Neural Networks (ICNN) were proposed in the paper Input Convex Neural Networks by Amos, Xu, and Kolter. The network is constructed such that it is convex with respect to the network inputs \(x\). It is constructed via
\[\begin{split}\begin{align} u_1 &= \sigma(K_1 x + b_1)\\ u_2 &= \sigma(L_2^+ u_1 + K_2 x + b_2)\\ &\vdots\\ u_{\ell} &= \sigma(L_{\ell}^+ u_{\ell-1} + K_{\ell} x + b_{\ell}) \end{align}\end{split}\]where \(\ell\) is the number of layers. Here, \((\cdot)^+\) is a function that forces the matrix to have only nonnegative entries. The activation function \(\sigma\) must be convex and non-decreasing.
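The constructor arguments below (input dimension, hidden widths, activation) are an assumption for illustration; what the sketch does verify is the convexity property itself, by checking that the input Hessian of each output is positive semidefinite:

    import torch
    import hessQuik.activations as act
    import hessQuik.networks as net

    # hypothetical signature: ICNN(input dimension, hidden widths, convex non-decreasing activation)
    icnn = net.ICNN(4, [20, 20, 1], act=act.softplusActivation())

    x = torch.randn(10, 4)
    fx, dfdx, d2fd2x = icnn(x, do_gradient=True, do_Hessian=True)

    # convexity in x <=> each (d x d) Hessian slice is positive semidefinite
    eigs = torch.linalg.eigvalsh(d2fd2x.permute(0, 3, 1, 2))  # shape (n_s, m, d)
    print((eigs >= -1e-6).all())  # expected: True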
Residual Neural Network¶
- class resnetNN(*args: Any, **kwargs: Any)[source]¶
Bases:
NN
Residual neural networks (ResNet) were popularized in the paper Deep Residual Learning for Image Recognition by He et al. Here, every layer is a hessQuik singleLayer plus a skip connection. Let \(u_0\) be the input into the ResNet. The construction is of the form
\[\begin{split}\begin{align} u_1 &= u_0 + h\sigma(K_1 u_0 + b_1)\\ u_2 &= u_1 + h\sigma(K_2 u_1 + b_2)\\ &\vdots \\ u_{\ell} &= u_{\ell-1} + h\sigma(K_{\ell} u_{\ell-1} + b_{\ell}) \end{align}\end{split}\]where \(\ell\) is the number of layers, called the depth of the network. Each vector of features \(u_i\) is of size \((n_s, w)\) where \(n_s\) is the number of samples and \(w\) is the width of the network. Users choose the width and depth of the network and the activation function \(\sigma\).
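A construction sketch in the style of the hessQuik examples, assuming a constructor of the form resnetNN(width, depth, h=..., act=...). Because each residual layer preserves dimension, the ResNet is typically sandwiched between layers that map into and out of the width \(w\); concatenating networks and layers inside NN is supported, as noted in setup_forward_mode above:

    import torch
    import hessQuik.activations as act
    import hessQuik.layers as lay
    import hessQuik.networks as net

    w, depth = 16, 4
    f = net.NN(lay.singleLayer(2, w, act=act.antiTanhActivation()),
               net.resnetNN(w, depth, h=0.25, act=act.softplusActivation()),
               lay.singleLayer(w, 1, act=act.identityActivation()))

    x = torch.randn(20, 2)
    fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)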