hessQuik.layers

hessQuik Layer

class hessQuikLayer(*args: Any, **kwargs: Any)[source]

Bases: Module

Base class for all hessQuik layers.

dim_input() int[source]
Returns:

dimension of input features

Return type:

int

dim_output() int[source]
Returns:

dimension of output features

Return type:

int

forward(u: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, do_Laplacian: bool = False, forward_mode: bool = True, dudx: Optional[torch.Tensor] = None, d2ud2x: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]][source]

Forward pass through the layer that maps input features \(u\) of size \((n_s, n_{in})\) to output features \(f\) of size \((n_s, n_{out})\) where \(n_s\) is the number of samples, \(n_{in}\) is the number of input features, and \(n_{out}\) is the number of output features.

The input features \(u(x)\) are a function of the network input \(x\) of size \((n_s, d)\) where \(d\) is the dimension of the network input.

Parameters:
  • u (torch.Tensor) – features from previous layer with shape \((n_s, n_{in})\)

  • do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • do_Laplacian (bool, optional) – If set to True, the Laplacian will be computed during the forward call. Default: False

  • forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True

  • dudx (torch.Tensor or None) – if forward_mode = True, gradient of features from previous layer with respect to network input \(x\) with shape \((n_s, d, n_{in})\)

  • d2ud2x (torch.Tensor or None) – if forward_mode = True, Hessian of features from previous layer with respect to network input \(x\) with shape \((n_s, d, d, n_{in})\)

  • v (torch.Tensor or None) – if forward_mode = True, direction(s) along which to apply the Jacobian and Hessian, with shape \((d, k)\)

Returns:

  • f (torch.Tensor) - output features of layer with shape \((n_s, n_{out})\)

  • dfdx (torch.Tensor or None) - if forward_mode = True, gradient of output features with respect to network input \(x\) with shape \((n_s, d, n_{out})\)

  • d2fd2x (torch.Tensor or None) - if forward_mode = True, Hessian of output features with respect to network input \(x\) with shape \((n_s, d, d, n_{out})\)
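
When derivatives are computed in forward mode, the outputs dfdx and d2fd2x of one layer serve as the dudx and d2ud2x inputs of the next layer. The following sketch chains two singleLayer instances (defined later in this module) with their default settings; the expected printed shapes follow from the return shapes documented above.

>>> import torch, hessQuik.layers as lay
>>> f1 = lay.singleLayer(4, 7)
>>> f2 = lay.singleLayer(7, 5)
>>> x = torch.randn(10, 4)
>>> u, dudx, d2ud2x = f1(x, do_gradient=True, do_Hessian=True)
>>> fx, dfdx, d2fd2x = f2(u, do_gradient=True, do_Hessian=True, dudx=dudx, d2ud2x=d2ud2x)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 5]) torch.Size([10, 4, 5]) torch.Size([10, 4, 4, 5])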

backward(do_Hessian: bool = False, dgdf: Optional[torch.Tensor] = None, d2gd2f: Optional[torch.Tensor] = None, v: Optional[torch.Tensor] = None) Tuple[torch.Tensor, Optional[torch.Tensor]][source]

Backward pass through the layer that maps the gradient of the network \(g\) with respect to the output features \(f\) of size \((n_s, n_{out}, m)\) to the gradient of the network \(g\) with respect to the input features \(u\) of size \((n_s, n_{in}, m)\), where \(n_s\) is the number of samples, \(n_{in}\) is the number of input features, \(n_{out}\) is the number of output features, and \(m\) is the number of network output features.

Parameters:
  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the backward call. Default: False

  • dgdf (torch.Tensor) – gradient of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\) with shape \((n_s, n_{out}, m)\).

  • d2gd2f (torch.Tensor or None) – Hessian of the subsequent layer features, \(g(f)\), with respect to the layer outputs, \(f\), with shape \((n_s, n_{out}, n_{out}, m)\).

  • v (torch.Tensor or None) – direction(s) along which to apply the Jacobian transpose and Hessian, with shape \((d, k)\)

Returns:

  • dgdu (torch.Tensor or None) - gradient of the network with respect to input features \(u\) with shape \((n_s, n_{in}, m)\)

  • d2gd2u (torch.Tensor or None) - Hessian of the network with respect to input features \(u\) with shape \((n_s, n_{in}, n_{in}, m)\)
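
Whether derivatives are accumulated in forward mode or assembled from backward calls, the returned gradient can be sanity-checked against torch.autograd. A minimal sketch using singleLayer with its default settings, assuming the layer's computations stay attached to the autograd graph:

>>> import torch, hessQuik.layers as lay
>>> f = lay.singleLayer(4, 7)
>>> x = torch.randn(10, 4, requires_grad=True)
>>> fx, dfdx, _ = f(x, do_gradient=True)
>>> g, = torch.autograd.grad(fx[0, 0], x)  # gradient of one output feature for one sample
>>> print(torch.allclose(dfdx[0, :, 0], g[0], atol=1e-5))
True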

ICNN Layer

class ICNNLayer(*args: Any, **kwargs: Any)[source]

Bases: hessQuikLayer

Evaluate and compute derivatives of an ICNN layer.

Examples:

>>> import torch, hessQuik.layers as lay
>>> f = lay.ICNNLayer(4, None, 7)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 11]) torch.Size([10, 4, 11]) torch.Size([10, 4, 4, 11])
__init__(input_dim: int, in_features: Optional[int], out_features: int, act: hessQuikActivationFunction = torch.nn.Module, bias: bool = True, device=None, dtype=None) None[source]
Parameters:
  • input_dim (int) – dimension of network inputs

  • in_features (int or None) – number of input features. For the first ICNN layer, set in_features = None

  • out_features (int) – number of output features

  • act (hessQuikActivationFunction) – activation function

Variables:
  • K – weight matrix for the network inputs of size \((d, n_{out})\)

  • b – bias vector of size \((n_{out},)\)

  • L – weight matrix for the input features of size \((n_{in}, n_{out})\)

  • nonneg – pointwise function to force \(L\) to have nonnegative weights. Default: torch.nn.functional.softplus

dim_input() int[source]

number of input features + dimension of network inputs

dim_output() int[source]

number of output features + dimension of network inputs

forward(ux, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]

Forward propagation through ICNN layer of the form

\[f(x) = \begin{bmatrix} \sigma\left(\begin{bmatrix}u(x) & x\end{bmatrix} \begin{bmatrix}L^+ \\ K\end{bmatrix} + b\right) & x \end{bmatrix}\]

Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\) of size \((n_s, d)\). The output features, \(f(x)\), are of size \((n_s, n_{out} + d)\). The notation \((\cdot)^+\) is a function that makes the weights of a matrix nonnegative.

As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form

\[\nabla_x f = \text{diag}\left(\sigma'\left(\begin{bmatrix}u(x) & x\end{bmatrix} \begin{bmatrix}L^+ \\ K\end{bmatrix} + b\right)\right) \begin{bmatrix}(L^+)^\top & K^\top\end{bmatrix} \begin{bmatrix}\nabla_x u \\ I\end{bmatrix}\]

where \(\text{diag}\) transforms a vector into the entries of a diagonal matrix and \(I\) is the \(d \times d\) identity matrix.
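
Because each ICNN layer appends the network input \(x\) to its output features, a subsequent ICNN layer receives \(n_{out} + d\) features and should be constructed with in_features equal to the previous layer's out_features. A sketch of chaining two layers in forward mode; the expected shapes follow from dim_input and dim_output above:

>>> import torch, hessQuik.layers as lay
>>> f1 = lay.ICNNLayer(4, None, 7)   # first ICNN layer: in_features=None
>>> f2 = lay.ICNNLayer(4, 7, 5)      # subsequent ICNN layer: in_features=7
>>> x = torch.randn(10, 4)
>>> u, dudx, d2ud2x = f1(x, do_gradient=True, do_Hessian=True)
>>> fx, dfdx, d2fd2x = f2(u, do_gradient=True, do_Hessian=True, dudx=dudx, d2ud2x=d2ud2x)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 9]) torch.Size([10, 4, 9]) torch.Size([10, 4, 4, 9])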

backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]

Backward propagation through ICNN layer of the form

\[f(u) = \begin{bmatrix} \sigma\left(\begin{bmatrix}u & x\end{bmatrix} \begin{bmatrix}L^+ \\ K\end{bmatrix} + b\right) & x \end{bmatrix}\]

Here, the network \(g\) is a function of \(f(u)\).

As an example, for one sample, \(n_s = 1\), the gradient of the network with respect to \(u\) is of the form

\[\nabla_{[u,x]} g = \left(\sigma'\left(\begin{bmatrix}u & x\end{bmatrix} \begin{bmatrix}L^+ \\ K\end{bmatrix} + b\right) \odot \nabla_{[f, x]} g\right) \begin{bmatrix}(L^+)^\top & K^\top\end{bmatrix}\]

where \(\odot\) denotes the pointwise product.

Quadratic ICNN Layer

class quadraticICNNLayer(*args: Any, **kwargs: Any)[source]

Bases: hessQuikLayer

Evaluate and compute derivatives of a quadratic ICNN layer.

Examples:

>>> import torch, hessQuik.layers as lay
>>> f = lay.quadraticICNNLayer(4, None, 2)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 1]) torch.Size([10, 4, 1]) torch.Size([10, 4, 4, 1])
__init__(input_dim: int, in_features: Optional[int], rank: int, device=None, dtype=None) None[source]
Parameters:
  • input_dim (int) – dimension of network inputs

  • in_features (int or None) – number of input features, \(n_{in}\). If the quadratic ICNN layer is the first (or only) layer of the network, set in_features = None

  • rank (int) – number of columns of quadratic matrix, \(r\). In practice, \(r < n_{in}\)

Variables:
  • v – weight vector for network inputs of size \((d,)\)

  • w – weight vector for input features of size \((n_{in},)\)

  • A – weight matrix for quadratic term of size \((d, r)\)

  • mu – additive scalar bias

  • nonneg – pointwise function to force \(w\) to have nonnegative weights. Default: torch.nn.functional.softplus

dim_input() int[source]

number of input features + dimension of network inputs

dim_output() int[source]

scalar

forward(ux, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]

Forward propagation through quadratic ICNN layer of the form, for one sample \(n_s = 1\),

\[f(x) = \begin{bmatrix}u(x) & x\end{bmatrix} \begin{bmatrix}w^+ \\ v\end{bmatrix} + \frac{1}{2} x A A^\top x^\top + \mu\]

Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\) of size \((n_s, d)\). The output features, \(f(x)\), are of size \((n_s, 1)\). The notation \((\cdot)^+\) is a function that makes the weights of a matrix nonnegative.

As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form

\[\nabla_x f = \begin{bmatrix}(w^+)^\top & v^\top\end{bmatrix} \begin{bmatrix} \nabla_x u \\ I\end{bmatrix} + x A A^\top\]

where \(I\) is the \(d \times d\) identity matrix.
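
The quadratic ICNN layer can terminate a stack of ICNNLayer blocks, collapsing the features to a scalar output. A sketch, reusing the ICNNLayer from above and matching in_features to its out_features:

>>> import torch, hessQuik.layers as lay
>>> f1 = lay.ICNNLayer(4, None, 7)
>>> f2 = lay.quadraticICNNLayer(4, 7, 2)
>>> x = torch.randn(10, 4)
>>> u, dudx, d2ud2x = f1(x, do_gradient=True, do_Hessian=True)
>>> fx, dfdx, d2fd2x = f2(u, do_gradient=True, do_Hessian=True, dudx=dudx, d2ud2x=d2ud2x)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 1]) torch.Size([10, 4, 1]) torch.Size([10, 4, 4, 1])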

backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]

Backward propagation through quadratic ICNN layer of the form, for one sample \(n_s = 1\),

\[f\left(\begin{bmatrix} u & x \end{bmatrix}\right) = \begin{bmatrix}u & x\end{bmatrix} \begin{bmatrix}w^+ \\ v\end{bmatrix} + \frac{1}{2} x A A^\top x^\top + \mu\]

Here, the network \(g\) is a function of \(f(u)\).

The gradient of the layer with respect to \(\begin{bmatrix} u & x \end{bmatrix}\) is of the form

\[\nabla_{[u,x]} f = \begin{bmatrix}(w^+)^\top & v^\top + x A A^\top\end{bmatrix}.\]

Quadratic Layer

class quadraticLayer(*args: Any, **kwargs: Any)[source]

Bases: hessQuikLayer

Evaluate and compute derivatives of a quadratic layer.

Examples:

>>> import torch, hessQuik.layers as lay
>>> f = lay.quadraticLayer(4, 2)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 1]) torch.Size([10, 4, 1]) torch.Size([10, 4, 4, 1])
__init__(in_features: int, rank: int, device=None, dtype=None) None[source]
Parameters:
  • in_features (int) – number of input features, \(n_{in}\)

  • rank (int) – number of columns of quadratic matrix, \(r\). In practice, \(r < n_{in}\)

Variables:
  • v – weight vector for input features of size \((n_{in},)\)

  • A – weight matrix for quadratic term of size \((n_{in}, r)\)

  • mu – additive scalar bias

dim_input() int[source]

number of input features

dim_output() int[source]

scalar

forward(u, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]

Forward propagation through quadratic layer of the form, for one sample \(n_s = 1\),

\[f(x) = u(x) v + \frac{1}{2} u(x) A A^\top u(x)^\top + \mu\]

Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\). The output features, \(f(x)\), are of size \((n_s, 1)\).

The gradient with respect to \(x\) is of the form

\[\nabla_x f = (v^\top + u A A^\top) \nabla_x u\]
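
The quadratic layer likewise produces a scalar output and can follow any layer whose out_features matches its in_features. A sketch chaining it after a singleLayer in forward mode:

>>> import torch, hessQuik.layers as lay
>>> f1 = lay.singleLayer(4, 7)
>>> f2 = lay.quadraticLayer(7, 2)
>>> x = torch.randn(10, 4)
>>> u, dudx, d2ud2x = f1(x, do_gradient=True, do_Hessian=True)
>>> fx, dfdx, d2fd2x = f2(u, do_gradient=True, do_Hessian=True, dudx=dudx, d2ud2x=d2ud2x)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 1]) torch.Size([10, 4, 1]) torch.Size([10, 4, 4, 1])
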
backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]

Backward propagation through quadratic layer of the form, for one sample \(n_s = 1\),

\[f(u) = u v + \frac{1}{2} u A A^\top u^\top + \mu\]

Here, the network \(g\) is a function of \(f(u)\).

The gradient of the layer with respect to \(u\) is of the form

\[\nabla_u f = v^\top + u A A^\top.\]

Residual Layer

class resnetLayer(*args: Any, **kwargs: Any)[source]

Bases: hessQuikLayer

Evaluate and compute derivatives of a residual layer.

Examples:

>>> import torch, hessQuik.layers as lay
>>> f = lay.resnetLayer(4, h=0.25)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 4]) torch.Size([10, 4, 4]) torch.Size([10, 4, 4, 4])
__init__(width: int, h: float = 1.0, act: hessQuikActivationFunction = torch.nn.Module, bias: bool = True, device=None, dtype=None) None[source]
Parameters:
  • width (int) – number of input and output features, \(w\)

  • h (float) – step size, \(h > 0\)

  • act (hessQuikActivationFunction) – activation function

  • bias (bool) – additive bias flag

Variables:

layer – singleLayer with \(w\) input features and \(w\) output features

dim_input() int[source]

width

dim_output() int[source]

width

forward(u, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]

Forward propagation through resnet layer of the form

\[f(x) = u(x) + h \cdot singleLayer(u(x))\]

Here, \(u(x)\) is the input into the layer of size \((n_s, w)\) which is a function of the input of the network, \(x\). The output features, \(f(x)\), are of size \((n_s, w)\).

As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form

\[\nabla_x f = I + h \nabla_x singleLayer(u(x))\]

where \(I\) denotes the \(w \times w\) identity matrix.
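
The residual layer preserves the feature width, so it is usually preceded by a layer that maps the network input to the desired width. A sketch chaining a singleLayer into a resnetLayer, with derivatives propagated in forward mode:

>>> import torch, hessQuik.layers as lay
>>> f1 = lay.singleLayer(3, 4)
>>> f2 = lay.resnetLayer(4, h=0.5)
>>> x = torch.randn(10, 3)
>>> u, dudx, d2ud2x = f1(x, do_gradient=True, do_Hessian=True)
>>> fx, dfdx, d2fd2x = f2(u, do_gradient=True, do_Hessian=True, dudx=dudx, d2ud2x=d2ud2x)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 4]) torch.Size([10, 3, 4]) torch.Size([10, 3, 3, 4])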

backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]

Backward propagation through resnet layer of the form

\[f(u) = u + h \cdot singleLayer(u)\]

Here, the network \(g\) is a function of \(f(u)\).

As an example, for one sample, \(n_s = 1\), the gradient of the network with respect to \(u\) is of the form

\[\nabla_u g = \nabla_f g + h \cdot \nabla_u singleLayer(u)\]

where \(\nabla_f g\) denotes the gradient of the network with respect to the output features \(f\).

Single Layer

class singleLayer(*args: Any, **kwargs: Any)[source]

Bases: hessQuikLayer

Evaluate and compute derivatives of a single layer.

Examples:

>>> import torch, hessQuik.layers as lay
>>> f = lay.singleLayer(4, 7)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(fx.shape, dfdx.shape, d2fd2x.shape)
torch.Size([10, 7]) torch.Size([10, 4, 7]) torch.Size([10, 4, 4, 7])
__init__(in_features: int, out_features: int, act: hessQuikActivationFunction = torch.nn.Module, bias: bool = True, device=None, dtype=None) None[source]
Parameters:
  • in_features (int) – number of input features, \(n_{in}\)

  • out_features (int) – number of output features, \(n_{out}\)

  • act (hessQuikActivationFunction) – activation function

  • bias (bool) – additive bias flag

Variables:
  • K – weight matrix of size \((n_{in}, n_{out})\)

  • b – bias vector of size \((n_{out},)\)

dim_input() int[source]

number of input features

dim_output() int[source]

number of output features

forward(u, do_gradient=False, do_Hessian=False, do_Laplacian=False, forward_mode=True, dudx=None, d2ud2x=None, v=None)[source]

Forward propagation through single layer of the form

\[f(x) = \sigma(u(x) K + b)\]

Here, \(u(x)\) is the input into the layer of size \((n_s, n_{in})\) which is a function of the input of the network, \(x\). The output features, \(f(x)\), are of size \((n_s, n_{out})\).

As an example, for one sample, \(n_s = 1\), the gradient with respect to \(x\) is of the form

\[\nabla_x f = \text{diag}(\sigma'(u(x) K + b))K^\top \nabla_x u\]

where \(\text{diag}\) transforms a vector into the entries of a diagonal matrix.
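
Since the Hessian of each output feature with respect to \(x\) is symmetric, the returned d2fd2x should be symmetric in its two middle dimensions; this gives a quick sanity check of the computed second derivatives:

>>> import torch, hessQuik.layers as lay
>>> f = lay.singleLayer(4, 7)
>>> x = torch.randn(10, 4)
>>> fx, dfdx, d2fd2x = f(x, do_gradient=True, do_Hessian=True)
>>> print(torch.allclose(d2fd2x, d2fd2x.permute(0, 2, 1, 3), atol=1e-5))
True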

backward(do_Hessian=False, dgdf=None, d2gd2f=None, v=None)[source]

Backward propagation through single layer of the form

\[f(u) = \sigma(u K + b)\]

Here, the network \(g\) is a function of \(f(u)\).

As an example, for one sample, \(n_s = 1\), the gradient of the network with respect to \(u\) is of the form

\[\nabla_u g = (\sigma'(u K + b) \odot \nabla_f g)K^\top\]

where \(\odot\) denotes the pointwise product.