hessQuik.layers¶
hessQuik Layer¶
- class hessQuikLayer(*args: Any, **kwargs: Any)[source]¶
Bases:
Module
Base class for all hessQuik layers.
- forward(u: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, do_Laplacian: bool = False, forward_mode: bool = True, dudx: Optional[torch.Tensor] = None, d2ud2x: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]] [source]¶
Forward pass through the layer that maps input features \(u\) of size \((n_s, n_{in})\) to output features \(f\) of size \((n_s, n_{out})\) where \(n_s\) is the number of samples, \(n_{in}\) is the number of input features, and \(n_{out}\) is the number of output features.
The input features \(u(x)\) is a function of the network input \(x\) of size \((n_s, d)\) where \(d\) is the dimension of the network input.
- Parameters:
u (torch.Tensor) – features from previous layer with shape \((n_s, n_{in})\)
do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
do_Laplacian (bool, optional) – If set to True, the Laplacian will be computed during the forward call. Default: False
forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True
dudx (torch.Tensor or None) – if forward_mode = True, gradient of features from previous layer with respect to network input \(x\) with shape \((n_s, d, n_{in})\)
d2ud2x (torch.Tensor or None) – if forward_mode = True, Hessian of features from previous layer with respect to network input \(x\) with shape \((n_s, d, d, n_{in})\)
v (torch.Tensor or None) – if forward_mode = True, direction(s) in the space of the network input \(x\) along which to apply the Jacobian and Hessian, with shape \((d, k)\)
- Returns:
f (torch.Tensor) - output features of layer with shape \((n_s, n_{out})\)
dfdx (torch.Tensor or None) - if forward_mode = True, gradient of output features with respect to network input \(x\) with shape \((n_s, d, n_{out})\)
d2fd2x (torch.Tensor or None) - if forward_mode = True, Hessian of output features with respect to network input \(x\) with shape \((n_s, d, d, n_{out})\)
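The shape conventions above can be made concrete with a small sketch. It does not call hessQuik; it uses `torch.autograd.functional` on a toy map (`net` and `M` are hypothetical names introduced only for illustration) to build tensors laid out the way `f`, `dfdx`, and `d2fd2x` are described here.

```python
import torch
from torch.autograd.functional import jacobian, hessian

torch.manual_seed(0)
n_s, d, n_out = 6, 3, 2
x = torch.randn(n_s, d, dtype=torch.float64)
M = torch.randn(d, n_out, dtype=torch.float64)  # hypothetical weights


def net(xs):
    """Toy map for ONE sample: (d,) -> (n_out,). Not a hessQuik layer."""
    return torch.sin(xs @ M)


# Output features, one row per sample: (n_s, n_out)
f = torch.stack([net(xs) for xs in x])

# autograd returns (n_out, d) per sample; transpose to the (d, n_out) layout
dfdx = torch.stack([jacobian(net, xs).T for xs in x])

# One (d, d) Hessian per output feature, stacked along the last axis
d2fd2x = torch.stack([
    torch.stack([hessian(lambda t, j=j: net(t)[j], xs) for j in range(n_out)], dim=-1)
    for xs in x
])

print(f.shape, dfdx.shape, d2fd2x.shape)
# torch.Size([6, 2]) torch.Size([6, 3, 2]) torch.Size([6, 3, 3, 2])
```

Looping over samples with autograd is slow; it serves here only as a reference for the index order, with the derivative axes over \(x\) first and the feature axis last.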
- backward(do_Hessian: bool = False, dgdf: Optional[torch.Tensor] = None, d2gd2f: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None) Tuple[torch.Tensor, Optional[torch.Tensor]] [source]¶
Backward pass through the layer that maps the gradient of the network \(g\) with respect to the output features \(f\) of size \((n_s, n_{out}, m)\) to the gradient of the network \(g\) with respect to the input features \(u\) of size \((n_s, n_{in}, m)\), where \(n_s\) is the number of samples, \(n_{in}\) is the number of input features, \(n_{out}\) is the number of output features, and \(m\) is the number of network output features.
- Parameters:
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the backward call. Default: False
dgdf (torch.Tensor) – gradient of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, m)\)
d2gd2f (torch.Tensor or None) – Hessian of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, n_{out}, m)\)
v (torch.Tensor or None) – direction(s) in the space of the network input \(x\) along which to apply the Jacobian transpose and Hessian, with shape \((d, k)\)
- Returns:
dgdu (torch.Tensor or None) - gradient of the network with respect to input features \(u\) with shape \((n_s, n_{in}, m)\)
d2gd2u (torch.Tensor or None) - Hessian of the network with respect to input features \(u\) with shape \((n_s, n_{in}, n_{in}, m)\)
ICNN Layer¶
- class ICNNLayer(*args: Any, **kwargs: Any)[source]¶
Bases:
hessQuikLayer
Evaluate and compute derivatives of an input convex neural network (ICNN) layer.
Examples:
>>> import torch, hessQuik.layers as lay
>>> f = lay.ICNNLayer(4, None, 7)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 11]) torch.Size([10, 4, 11]) torch.Size([10, 4, 4, 11])
- __init__(input_dim: int, in_features: Optional[int], out_features: int, act: hessQuikActivationFunction = torch.nn.Module, bias: bool = True, device=None, dtype=None) None [source]¶
- Parameters:
input_dim (int) – dimension of network inputs
in_features (int or None) – number of input features. For the first ICNN layer, set in_features = None
out_features (int) – number of output features
act (hessQuikActivationFunction) – activation function
- Variables:
K – weight matrix for the network inputs of size \((d, n_{out})\)
b – bias vector of size \((n_{out},)\)
L – weight matrix for the input features of size \((n_{in}, n_{out})\)
nonneg – pointwise function that forces \(L\) to have nonnegative weights. Default: torch.nn.functional.softplus
- forward(ux, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]¶
Forward propagation through ICNN layer of the form
\[\begin{split}f(x) = \left[\begin{array}{cc} \sigma\left(\left[\begin{array}{cc}u(x) & x\end{array}\right] \left[\begin{array}{c}L^+ \\ K\end{array}\right] + b\right) & x \end{array}\right]\end{split}\]
Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\) of size \((n_s, d)\). The output features, \(f(x)\), are of size \((n_s, n_{out} + d)\). The notation \((\cdot)^+\) is a function that makes the weights of a matrix nonnegative.
As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form
\[\begin{split}\nabla_x f = \text{diag}\left(\sigma'\left(\left[\begin{array}{cc}u(x) & x\end{array}\right] \left[\begin{array}{c}L^+ \\ K\end{array}\right] + b\right)\right) \left[\begin{array}{cc}(L^+)^\top & K^\top\end{array}\right] \left[\begin{array}{c}\nabla_x u \\ I\end{array}\right]\end{split}\]
where \(\text{diag}\) transforms a vector into the entries of a diagonal matrix and \(I\) is the \(d \times d\) identity matrix.
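As a minimal sketch of the forward formula, the block below assembles \(f(x)\) by hand in plain PyTorch rather than through `ICNNLayer`. The tensors `L`, `K`, and `b` are random stand-ins for the layer's variables, softplus is assumed both for \((\cdot)^+\) and for the activation \(\sigma\), and the sizes are chosen so that \(n_{out} + d = 11\) as in the class example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_s, d, n_in, n_out = 10, 4, 3, 7

x = torch.randn(n_s, d)       # network input
u = torch.randn(n_s, n_in)    # stand-in for features from a previous layer
L = torch.randn(n_in, n_out)  # unconstrained weights acting on u
K = torch.randn(d, n_out)     # weights acting on x
b = torch.randn(n_out)

# (.)^+ : softplus maps every entry of L to a nonnegative value
L_plus = F.softplus(L)

# [u x] [L^+; K] + b, followed by the activation (softplus assumed here)
z = torch.cat((u, x), dim=1) @ torch.cat((L_plus, K), dim=0) + b
f = torch.cat((F.softplus(z), x), dim=1)  # append x so the next layer can reuse it

print(f.shape)  # torch.Size([10, 11])
```

The last \(d\) columns of `f` are a copy of `x`, which is why the output width is \(n_{out} + d\) rather than \(n_{out}\).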
- backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]¶
Backward propagation through ICNN layer of the form
\[\begin{split}f(u) = \left[\begin{array}{cc} \sigma\left(\left[\begin{array}{cc}u & x\end{array}\right] \left[\begin{array}{c}L^+ \\ K\end{array}\right] + b\right) & x \end{array}\right]\end{split}\]
Here, the network \(g\) is a function of \(f(u)\).
As an example, for one sample, \(n_s = 1\), the gradient of the network with respect to \(\begin{bmatrix} u & x \end{bmatrix}\) is of the form
\[\begin{split}\nabla_{[u,x]} g = \left(\sigma'\left(\left[\begin{array}{cc}u & x\end{array}\right] \left[\begin{array}{c}L^+ \\ K\end{array}\right] + b\right) \odot \nabla_{[f, x]} g\right) \left[\begin{array}{cc}(L^+)^\top & K^\top\end{array}\right]\end{split}\]
where \(\odot\) denotes the pointwise product.
Quadratic ICNN Layer¶
- class quadraticICNNLayer(*args: Any, **kwargs: Any)[source]¶
Bases:
hessQuikLayer
Evaluate and compute derivatives of a quadratic ICNN layer.
Examples:
>>> import torch, hessQuik.layers as lay
>>> f = lay.quadraticICNNLayer(4, None, 2)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 1]) torch.Size([10, 4, 1]) torch.Size([10, 4, 4, 1])
- __init__(input_dim: int, in_features: Optional[int], rank: int, device=None, dtype=None) None [source]¶
- Parameters:
input_dim (int) – dimension of network inputs
in_features (int or None) – number of input features, \(n_{in}\). When the quadratic ICNN layer is the only layer, set in_features = None
rank (int) – number of columns of quadratic matrix, \(r\). In practice, \(r < n_{in}\)
- Variables:
v – weight vector for network inputs of size \((d,)\)
w – weight vector for input features of size \((n_{in},)\)
A – weight matrix for quadratic term of size \((d, r)\)
mu – additive scalar bias
nonneg – pointwise function that forces \(w\) to have nonnegative weights. Default: torch.nn.functional.softplus
- forward(ux, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]¶
Forward propagation through quadratic ICNN layer of the form, for one sample \(n_s = 1\),
\[\begin{split}f(x) = \left[\begin{array}{cc}u(x) & x\end{array}\right] \left[\begin{array}{c}w^+ \\ v\end{array}\right] + \frac{1}{2} x A A^\top x^\top + \mu\end{split}\]
Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\) of size \((n_s, d)\). The output features, \(f(x)\), are of size \((n_s, 1)\). The notation \((\cdot)^+\) is a function that makes the weights of a matrix nonnegative.
As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form
\[\begin{split}\nabla_x f = \left[\begin{array}{cc}(w^+)^\top & v^\top\end{array}\right] \left[\begin{array}{c} \nabla_x u \\ I\end{array}\right] + x A A^\top\end{split}\]
where \(I\) is the \(d \times d\) identity matrix.
- backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]¶
Backward propagation through quadratic ICNN layer of the form, for one sample \(n_s = 1\),
\[\begin{split}f\left(\begin{bmatrix} u & x \end{bmatrix}\right) =\left[\begin{array}{cc}u & x\end{array}\right] \left[\begin{array}{c}w^+ \\ v\end{array}\right] + \frac{1}{2} x A A^\top x^\top + \mu\end{split}\]
Here, the network \(g\) is a function of \(f(u)\).
The gradient of the layer with respect to \(\begin{bmatrix} u & x \end{bmatrix}\) is of the form
\[\nabla_{[u,x]} f = \begin{bmatrix}(w^+)^\top & v^\top + x A A^\top\end{bmatrix}.\]
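The gradient formulas for the quadratic ICNN layer can be checked numerically. The sketch below is not the hessQuik implementation: it assumes a linear previous layer \(u(x) = xW\) (so that \(\nabla_x u = W\) is known exactly), uses softplus for \((\cdot)^+\), and compares the chain-rule expression against `torch.autograd`. `W` and `layer` are hypothetical names introduced for the check.

```python
import torch
import torch.nn.functional as F
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n_s, d, n_in, r = 10, 4, 3, 2
kw = dict(dtype=torch.float64)

x = torch.randn(n_s, d, **kw)
W = torch.randn(d, n_in, **kw)  # assumption: previous features are u(x) = x W
w = torch.randn(n_in, **kw)
v = torch.randn(d, **kw)
A = torch.randn(d, r, **kw)
mu = torch.randn((), **kw)
w_plus = F.softplus(w)          # (.)^+ applied to w


def layer(xs):
    """Quadratic ICNN formula for one sample xs of shape (d,)."""
    us = xs @ W
    return us @ w_plus + xs @ v + 0.5 * ((xs @ A) ** 2).sum() + mu


# Chain rule: (grad_x u) w^+  +  v  +  x A A^T, stored as (n_s, d, 1)
dudx = W.expand(n_s, d, n_in)
dfdx = (dudx @ w_plus + v + x @ A @ A.T).unsqueeze(-1)

# Reference gradient from autograd, one sample at a time
ref = torch.stack([jacobian(layer, xs) for xs in x]).unsqueeze(-1)

print(dfdx.shape, torch.allclose(dfdx, ref))
```

Here `w_plus` and `v + x @ A @ A.T` are the two blocks of \(\nabla_{[u,x]} f\); multiplying the first block by \(\nabla_x u\) and adding the second gives the total gradient with respect to \(x\).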
Quadratic Layer¶
- class quadraticLayer(*args: Any, **kwargs: Any)[source]¶
Bases:
hessQuikLayer
Evaluate and compute derivatives of a quadratic layer.
Examples:
>>> import torch, hessQuik.layers as lay
>>> f = lay.quadraticLayer(4, 2)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 1]) torch.Size([10, 4, 1]) torch.Size([10, 4, 4, 1])
- __init__(in_features: int, rank: int, device=None, dtype=None) None [source]¶
- Parameters:
in_features (int) – number of input features, \(n_{in}\)
rank (int) – number of columns of quadratic matrix, \(r\). In practice, \(r < n_{in}\)
- Variables:
v – weight vector for input features of size \((n_{in},)\)
A – weight matrix for quadratic term of size \((n_{in}, r)\)
mu – additive scalar bias
- forward(u, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]¶
Forward propagation through quadratic layer of the form, for one sample \(n_s = 1\),
\[f(x) = u(x) v + \frac{1}{2} u(x) A A^\top u(x)^\top + \mu\]
Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\). The output features, \(f(x)\), are of size \((n_s, 1)\).
The gradient with respect to \(x\) is of the form
\[\nabla_x f = (v^\top + u A A^\top) \nabla_x u\]
- backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]¶
Backward propagation through quadratic layer of the form, for one sample \(n_s = 1\),
\[f(u) = u v + \frac{1}{2} u A A^\top u^\top + \mu\]
Here, the network \(g\) is a function of \(f(u)\).
The gradient of the layer with respect to \(u\) is of the form
\[\nabla_u f = v^\top + u A A^\top.\]
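A short numerical check of these formulas, written in plain PyTorch rather than with `quadraticLayer`: the tensors `v`, `A`, and `mu` are random stand-ins for the layer's variables, and `layer` is a hypothetical helper. Differentiating the gradient once more shows that the Hessian with respect to \(u\) is the constant matrix \(A A^\top\), which the sketch verifies as well.

```python
import torch
from torch.autograd.functional import jacobian, hessian

torch.manual_seed(0)
n_in, r = 4, 2
kw = dict(dtype=torch.float64)

v = torch.randn(n_in, **kw)
A = torch.randn(n_in, r, **kw)
mu = torch.randn((), **kw)
u = torch.randn(n_in, **kw)  # one sample of input features


def layer(us):
    """f(u) = u v + 0.5 u A A^T u^T + mu for one sample us of shape (n_in,)."""
    return us @ v + 0.5 * ((us @ A) ** 2).sum() + mu


grad = v + u @ A @ A.T  # closed-form gradient with respect to u
hess = A @ A.T          # closed-form Hessian: constant, symmetric, rank <= r

print(torch.allclose(grad, jacobian(layer, u)))
print(torch.allclose(hess, hessian(layer, u)))
```

Because \(A A^\top\) is positive semidefinite, the quadratic term never contributes negative curvature, and a small `rank` keeps the number of parameters in `A` low.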
Residual Layer¶
- class resnetLayer(*args: Any, **kwargs: Any)[source]¶
Bases:
hessQuikLayer
Evaluate and compute derivatives of a residual layer.
Examples:
>>> import torch, hessQuik.layers as lay
>>> f = lay.resnetLayer(4, h=0.25)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 4]) torch.Size([10, 4, 4]) torch.Size([10, 4, 4, 4])
- __init__(width: int, h: float = 1.0, act: hessQuikActivationFunction = torch.nn.Module, bias: bool = True, device=None, dtype=None) None [source]¶
- Parameters:
width (int) – number of input and output features, \(w\)
h (float) – step size, \(h > 0\)
act (hessQuikActivationFunction) – activation function
bias (bool) – additive bias flag
- Variables:
layer – singleLayer with \(w\) input features and \(w\) output features
- forward(u, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]¶
Forward propagation through resnet layer of the form
\[f(x) = u(x) + h \cdot singleLayer(u(x))\]
Here, \(u(x)\) is the input into the layer of size \((n_s, w)\) which is a function of the input of the network, \(x\). The output features, \(f(x)\), are of size \((n_s, w)\).
As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form
\[\nabla_x f = I + h \nabla_x singleLayer(u(x))\]
where \(I\) denotes the \(w \times w\) identity matrix.
- backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]¶
Backward propagation through resnet layer of the form
\[f(u) = u + h \cdot singleLayer(u)\]
Here, the network \(g\) is a function of \(f(u)\).
As an example, for one sample, \(n_s = 1\), the gradient of the network with respect to \(u\) is of the form
\[\nabla_u g = \nabla_f g + h \cdot \left(\sigma'(u K + b) \odot \nabla_f g\right) K^\top\]
where \(K\), \(b\), and \(\sigma\) are the weights, bias, and activation function of the inner singleLayer and \(\odot\) denotes the pointwise product.
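By the chain rule, the skip connection passes \(\nabla_f g\) through unchanged and adds \(h\) times the backward pass of the inner single layer. The sketch below checks this in plain PyTorch without calling `resnetLayer`. It assumes a tanh activation and a hypothetical linear network head \(g(f) = fC\), so that \(\nabla_f g = C\) for every sample; `K`, `b`, and `C` are random stand-ins.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n_s, w, m, h = 10, 4, 2, 0.25
kw = dict(dtype=torch.float64)

u = torch.randn(n_s, w, **kw)
K = torch.randn(w, w, **kw)  # stand-in for the inner single layer's weights
b = torch.randn(w, **kw)
C = torch.randn(w, m, **kw)  # hypothetical head g(f) = f C, so dg/df = C


def resnet(us):
    """f(u) = u + h * tanh(u K + b) for one sample us of shape (w,)."""
    return us + h * torch.tanh(us @ K + b)


dsig = 1 - torch.tanh(u @ K + b) ** 2  # sigma'(u K + b), shape (n_s, w)
dgdf = C.expand(n_s, w, m)             # (n_s, n_out, m) with n_out = w

# Skip term plus h times the single-layer backward pass: (n_s, w, m)
dgdu = dgdf + h * (K @ (dsig.unsqueeze(-1) * dgdf))

# Reference: autograd Jacobian of g(f(u)), transposed to the (w, m) layout
ref = torch.stack([jacobian(lambda t: resnet(t) @ C, us).T for us in u])

print(dgdu.shape, torch.allclose(dgdu, ref))
```

With `h = 0` the result reduces to `dgdf`, which is the identity-map behavior expected from a residual connection.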
Single Layer¶
- class singleLayer(*args: Any, **kwargs: Any)[source]¶
Bases:
hessQuikLayer
Evaluate and compute derivatives of a single layer.
Examples:
>>> import torch, hessQuik.layers as lay
>>> f = lay.singleLayer(4, 7)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 7]) torch.Size([10, 4, 7]) torch.Size([10, 4, 4, 7])
- __init__(in_features: int, out_features: int, act: hessQuikActivationFunction = torch.nn.Module, bias: bool = True, device=None, dtype=None) None [source]¶
- Parameters:
in_features (int) – number of input features, \(n_{in}\)
out_features (int) – number of output features, \(n_{out}\)
act (hessQuikActivationFunction) – activation function
bias (bool) – additive bias flag
- Variables:
K – weight matrix of size \((n_{in}, n_{out})\)
b – bias vector of size \((n_{out},)\)
- forward(u, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]¶
Forward propagation through single layer of the form
\[f(x) = \sigma(u(x) K + b)\]
Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\). The output features, \(f(x)\), are of size \((n_s, n_{out})\).
As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form
\[\nabla_x f = \text{diag}(\sigma'(u(x) K + b))K^\top \nabla_x u\]
where \(\text{diag}\) transforms a vector into the entries of a diagonal matrix.
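The forward-mode gradient can be sketched for a whole batch in the \((n_s, d, n_{out})\) layout used on this page. This is an illustration in plain PyTorch, not the hessQuik code: it assumes a tanh activation and a linear previous layer \(u(x) = xW\) so that `dudx` is known exactly, and it compares the result with `torch.autograd`. `W` and `layer` are hypothetical names.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n_s, d, n_in, n_out = 5, 3, 4, 7
kw = dict(dtype=torch.float64)

x = torch.randn(n_s, d, **kw)
W = torch.randn(d, n_in, **kw)  # assumption: previous features are u(x) = x W
K = torch.randn(n_in, n_out, **kw)
b = torch.randn(n_out, **kw)

u = x @ W
dudx = W.expand(n_s, d, n_in)   # gradient of u with respect to x, per sample

z = u @ K + b
f = torch.tanh(z)
dsig = 1 - f ** 2               # sigma'(z) for tanh, shape (n_s, n_out)

# Propagate the incoming gradient through K, then scale by sigma'(z)
dfdx = (dudx @ K) * dsig.unsqueeze(1)  # (n_s, d, n_out)


def layer(xs):
    """Same map for one sample xs of shape (d,)."""
    return torch.tanh(xs @ W @ K + b)


ref = torch.stack([jacobian(layer, xs).T for xs in x])
print(dfdx.shape, torch.allclose(dfdx, ref))
```

Scaling by `dsig.unsqueeze(1)` plays the role of the \(\text{diag}(\sigma'(\cdot))\) factor without ever forming a diagonal matrix.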
- backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]¶
Backward propagation through single layer of the form
\[f(u) = \sigma(u K + b)\]
Here, the network \(g\) is a function of \(f(u)\).
As an example, for one sample, \(n_s = 1\), the gradient of the network with respect to \(u\) is of the form
\[\nabla_u g = (\sigma'(u K + b) \odot \nabla_f g)K^\top\]
where \(\odot\) denotes the pointwise product.
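The backward formula can be sketched the same way in the \((n_s, n_{in}, m)\) layout. As before, this is a plain PyTorch illustration under stated assumptions, not the library's code: the activation is tanh, and the rest of the network is a hypothetical linear head \(g(f) = fC\), so the incoming `dgdf` equals `C` for every sample.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n_s, n_in, n_out, m = 5, 4, 7, 2
kw = dict(dtype=torch.float64)

u = torch.randn(n_s, n_in, **kw)
K = torch.randn(n_in, n_out, **kw)
b = torch.randn(n_out, **kw)
C = torch.randn(n_out, m, **kw)  # hypothetical head g(f) = f C

dsig = 1 - torch.tanh(u @ K + b) ** 2  # sigma'(u K + b), shape (n_s, n_out)
dgdf = C.expand(n_s, n_out, m)         # gradient arriving from later layers

# Pointwise product with sigma', then map back to input features with K
dgdu = K @ (dsig.unsqueeze(-1) * dgdf)  # (n_s, n_in, m)

# Reference: autograd Jacobian of g(f(u)), transposed to the (n_in, m) layout
ref = torch.stack([
    jacobian(lambda t: torch.tanh(t @ K + b) @ C, us).T for us in u
])
print(dgdu.shape, torch.allclose(dgdu, ref))
```

In this column layout the weight matrix multiplies from the left, which is the transpose of the row-vector form \((\sigma' \odot \nabla_f g) K^\top\) written above.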