hessQuik.utils

Peaks Data

peaks(y: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]][source]

Generate data from the MATLAB 2D peaks function

Examples:

import torch
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits import mplot3d
from hessQuik.utils import peaks

x, y = torch.linspace(-3, 3, 100), torch.linspace(-3, 3, 100)
grid_x, grid_y = torch.meshgrid(x, y)
grid_xy = torch.concat((grid_x.reshape(-1, 1), grid_y.reshape(-1, 1)), dim=1)
grid_z, *_ = peaks(grid_xy)

fig = plt.figure()
ax = plt.axes(projection='3d')
surf = ax.plot_surface(grid_x, grid_y, grid_z.reshape(grid_x.shape), cmap=cm.viridis)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Parameters:
  • y (torch.Tensor) – (x, y) coordinates with shape \((n_s, 2)\) where \(n_s\) is the number of samples

  • do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

Returns:

  • f (torch.Tensor) - value of peaks function at each coordinate with shape \((n_s, 1)\)

  • dfdx (torch.Tensor or None) - value of gradient at each coordinate with shape \((n_s, 2)\)

  • d2fd2x (torch.Tensor or None) - value of Hessian at each coordinate with shape \((n_s, 4)\)

[Figure: surface plot of the 2D peaks function produced by the example above]
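
A minimal usage sketch (assuming peaks is importable from hessQuik.utils, as in the example above) that requests derivatives and prints the documented output shapes:

import torch
from hessQuik.utils import peaks

xy = torch.randn(5, 2)  # n_s = 5 samples of (x, y) coordinates
f, dfdx, d2fd2x = peaks(xy, do_gradient=True, do_Hessian=True)
print(f.shape)       # (5, 1) per the Returns above
print(dfdx.shape)    # (5, 2)
print(d2fd2x.shape)  # (5, 4)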

Input Derivative Checks

input_derivative_check(f: Union[torch.nn.Module, Callable], x: torch.Tensor, do_Hessian: bool = False, forward_mode: bool = True, num_test: int = 15, base: float = 2.0, tol: float = 0.1, verbose: bool = False) Tuple[Optional[bool], Optional[bool]][source]

Taylor approximation test to verify derivatives. Form the approximation by perturbing the input \(x\) in the direction \(p\) with step size \(h > 0\) via

\[f(x + h p) \approx f(x) + h\nabla f(x)^\top p + \frac{h^2}{2}p^\top \nabla^2f(x) p\]

As \(h \downarrow 0^+\), the error between the approximation and the true value will decrease. The rate of decrease indicates the accuracy of the derivative computation. For details, see Chapter 5 of Computational Methods for Electromagnetics by Eldad Haber.
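
For intuition, here is a self-contained sketch of the same test on a toy function with known gradient and Hessian (an illustration of the idea, not the hessQuik implementation; the names func, x, and p are illustrative). When the derivatives are correct, E0 decreases like \(h\), E1 like \(h^2\), and E2 like \(h^3\):

import torch

def func(x):
    # f(x) = sum_i sin(x_i); gradient and Hessian are known in closed form
    return torch.sin(x).sum(), torch.cos(x), torch.diag(-torch.sin(x))

torch.set_default_dtype(torch.float64)
x, p = torch.randn(4), torch.randn(4)
f0, g0, H0 = func(x)
for k in range(10):
    h = 2.0 ** (-k)
    fh = func(x + h * p)[0]
    E0 = (fh - f0).abs().item()
    E1 = (fh - f0 - h * (g0 @ p)).abs().item()
    E2 = (fh - f0 - h * (g0 @ p) - 0.5 * h ** 2 * (p @ H0 @ p)).abs().item()
    print(f"h = 2^(-{k:02d})  E0 = {E0:.2e}  E1 = {E1:.2e}  E2 = {E2:.2e}")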

Examples:

>>> import torch
>>> import hessQuik.activations as act
>>> from hessQuik.layers import singleLayer
>>> from hessQuik.utils import input_derivative_check
>>> torch.set_default_dtype(torch.float64)  # use double precision to check implementations
>>> x = torch.randn(10, 4)
>>> f = singleLayer(4, 7, act=act.softplusActivation())
>>> input_derivative_check(f, x, do_Hessian=True, verbose=True, forward_mode=True)
    h                           E0                      E1                      E2
    1.00 x 2^(00)               1.62 x 2^(-02)          1.70 x 2^(-07)          1.02 x 2^(-12)
    1.00 x 2^(-01)              1.63 x 2^(-03)          1.70 x 2^(-09)          1.06 x 2^(-15)
    1.00 x 2^(-02)              1.63 x 2^(-04)          1.69 x 2^(-11)          1.08 x 2^(-18)
    1.00 x 2^(-03)              1.63 x 2^(-05)          1.69 x 2^(-13)          1.09 x 2^(-21)
    1.00 x 2^(-04)              1.63 x 2^(-06)          1.69 x 2^(-15)          1.09 x 2^(-24)
    1.00 x 2^(-05)              1.63 x 2^(-07)          1.69 x 2^(-17)          1.10 x 2^(-27)
    1.00 x 2^(-06)              1.63 x 2^(-08)          1.69 x 2^(-19)          1.10 x 2^(-30)
    1.00 x 2^(-07)              1.63 x 2^(-09)          1.69 x 2^(-21)          1.10 x 2^(-33)
    1.00 x 2^(-08)              1.63 x 2^(-10)          1.69 x 2^(-23)          1.10 x 2^(-36)
    1.00 x 2^(-09)              1.63 x 2^(-11)          1.69 x 2^(-25)          1.10 x 2^(-39)
    1.00 x 2^(-10)              1.63 x 2^(-12)          1.69 x 2^(-27)          1.10 x 2^(-42)
    1.00 x 2^(-11)              1.63 x 2^(-13)          1.69 x 2^(-29)          1.10 x 2^(-45)
    1.00 x 2^(-12)              1.63 x 2^(-14)          1.69 x 2^(-31)          1.15 x 2^(-48)
    1.00 x 2^(-13)              1.63 x 2^(-15)          1.69 x 2^(-33)          1.33 x 2^(-50)
    1.00 x 2^(-14)              1.63 x 2^(-16)          1.69 x 2^(-35)          1.70 x 2^(-51)
    Gradient PASSED!
    Hessian PASSED!
Parameters:
  • f (torch.nn.Module or Callable) – callable function that returns value, gradient, and Hessian

  • x (torch.Tensor) – input data

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True

  • num_test (int) – number of perturbations

  • base (float) – base of the step size, \(h = base^{-k}\) for \(k = 0, \dots, num\_test - 1\)

  • tol (float) – small tolerance to account for numerical errors when computing the order of approximation

  • verbose (bool) – printout flag

Returns:

  • grad_check (bool) - if True, gradient check passes

  • hess_check (bool, optional) - if True, Hessian check passes

input_derivative_check_finite_difference(f: Callable, x: torch.Tensor, do_Hessian: bool = False, forward_mode: bool = True, eps: float = 0.0001, atol: float = 1e-05, rtol: float = 0.001, verbose: bool = False) Tuple[Optional[bool], Optional[bool]][source]

Finite difference test to verify derivatives. Form the approximation by perturbing each entry in the input in the unit direction with step size \(\varepsilon > 0\):

\[\widetilde{\nabla f}_{i} = \frac{f(x_i + \varepsilon) - f(x_i - \varepsilon)}{2\varepsilon}\]

where \(x_i \pm \varepsilon\) means add or subtract \(\varepsilon\) from the i-th entry of the input \(x\), but leave the other entries unchanged. The notation \(\widetilde{(\cdot)}\) indicates the finite difference approximation.
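
A self-contained sketch of the central-difference idea on a toy function (an illustration, not the hessQuik implementation; func, x, and eps are illustrative), using the same error and relative-error measures reported in the example below:

import torch

def func(x):
    return torch.sin(x).sum(), torch.cos(x)  # value and analytic gradient

x = torch.randn(4, dtype=torch.float64)
_, grad = func(x)

eps = 1e-4
grad_fd = torch.zeros_like(x)
for i in range(x.numel()):
    e = torch.zeros_like(x)
    e[i] = eps  # perturb only the i-th entry
    grad_fd[i] = (func(x + e)[0] - func(x - e)[0]) / (2 * eps)

err = torch.norm(grad - grad_fd)
print(f"Error = {err.item():.4e}, Relative Error = {(err / torch.norm(grad)).item():.4e}")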

Examples:

>>> import torch
>>> import hessQuik.activations as act
>>> from hessQuik.layers import singleLayer
>>> from hessQuik.utils import input_derivative_check_finite_difference
>>> torch.set_default_dtype(torch.float64)  # use double precision to check implementations
>>> x = torch.randn(10, 4)
>>> f = singleLayer(4, 7, act=act.tanhActivation())
>>> input_derivative_check_finite_difference(f, x, do_Hessian=True, verbose=True, forward_mode=True)
    Gradient Finite Difference: Error = 8.1720e-10, Relative Error = 2.5602e-10
    Gradient PASSED!
    Hessian Finite Difference: Error = 4.5324e-08, Relative Error = 4.4598e-08
    Hessian PASSED!
Parameters:
  • f (Callable) – callable function that returns value, gradient, and Hessian

  • x (torch.Tensor) – input data

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True

  • eps (float) – step size. Default: 1e-4

  • atol (float) – absolute tolerance, e.g., \(\|\nabla f - \widetilde{\nabla f}\| < atol\). Default: 1e-5

  • rtol (float) – relative tolerance, e.g., \(\|\nabla f - \widetilde{\nabla f}\|/\|\nabla f\| < rtol\). Default: 1e-3

  • verbose (bool) – printout flag

Returns:

  • grad_check (bool) - if True, gradient check passes

  • hess_check (bool, optional) - if True, Hessian check passes

Network Weights Derivative Check

network_derivative_check(f: torch.nn.Module, x: torch.Tensor, do_Hessian: bool = False, forward_mode: bool = True, num_test: int = 15, base: float = 2.0, tol: float = 0.1, verbose: bool = False) Optional[bool][source]

Taylor approximation test to verify derivatives. Form the approximation by perturbing the network weights \(\theta\) in the direction \(p\) with step size \(h > 0\) via

\[\Phi(\theta + h p) \approx \Phi(\theta) + h\nabla_{\theta} \Phi(\theta)^\top p\]

where \(\Phi\) is the objective function and \(\theta\) are the network weights. This test uses the loss

\[\Phi(\theta) = \frac{1}{2}\|f_{\theta}(x)\|^2 + \frac{1}{2}\|\nabla f_{\theta}(x)\|^2 + \frac{1}{2}\|\nabla^2 f_{\theta}(x)\|^2\]

to validate the gradient with respect to the network weights after the derivatives of the network \(f_{\theta}\) with respect to its input features have been computed.

As \(h \downarrow 0^+\), the error between the approximation and the true value will decrease. The rate of decrease indicates the accuracy of the derivative computation. For details, see Chapter 5 of Computational Methods for Electromagnetics by Eldad Haber.
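
The same mechanics, sketched on a plain torch module with a value-only objective (an illustration only; the hessQuik implementation perturbs the weights of a hessQuik network using its own flatten/insert helpers, and net, x, and objective here are illustrative):

import torch

torch.set_default_dtype(torch.float64)
net = torch.nn.Linear(4, 1)
x = torch.randn(10, 4)

def objective(module):
    return 0.5 * module(x).pow(2).sum()  # value-only version of the loss above

phi0 = objective(net)
phi0.backward()  # gradient of the objective with respect to the weights
theta = torch.cat([q.detach().flatten() for q in net.parameters()])
grad = torch.cat([q.grad.flatten() for q in net.parameters()])

p = torch.randn_like(theta)
for k in range(10):
    h = 2.0 ** (-k)
    # write theta + h * p back into the module parameters
    offset = 0
    with torch.no_grad():
        for q in net.parameters():
            n = q.numel()
            q.copy_((theta + h * p)[offset:offset + n].view_as(q))
            offset += n
    phi_h = objective(net).item()
    E0 = abs(phi_h - phi0.item())
    E1 = abs(phi_h - phi0.item() - h * (grad @ p).item())
    print(f"h = 2^(-{k:02d})  E0 = {E0:.2e}  E1 = {E1:.2e}")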

Examples:

>>> import torch
>>> import hessQuik.activations as act, hessQuik.layers as lay, hessQuik.networks as net
>>> from hessQuik.utils import network_derivative_check
>>> torch.set_default_dtype(torch.float64)  # use double precision to check implementations
>>> x = torch.randn(10, 4)
>>> width, depth = 8, 3
>>> f = net.NN(lay.singleLayer(4, width, act=act.tanhActivation()),
...            net.resnetNN(width, depth, h=1.0, act=act.tanhActivation()),
...            lay.singleLayer(width, 1, act=act.identityActivation()))
>>> network_derivative_check(f, x, do_Hessian=True, verbose=True, forward_mode=True)
    h                           E0                      E1
    1.00 x 2^(00)               1.97 x 2^(02)           1.05 x 2^(01)
    1.00 x 2^(-01)              1.14 x 2^(02)           1.71 x 2^(-02)
    1.00 x 2^(-02)              1.20 x 2^(01)           1.50 x 2^(-04)
    1.00 x 2^(-03)              1.22 x 2^(00)           1.40 x 2^(-06)
    1.00 x 2^(-04)              1.24 x 2^(-01)          1.35 x 2^(-08)
    1.00 x 2^(-05)              1.24 x 2^(-02)          1.32 x 2^(-10)
    1.00 x 2^(-06)              1.24 x 2^(-03)          1.31 x 2^(-12)
    1.00 x 2^(-07)              1.24 x 2^(-04)          1.30 x 2^(-14)
    1.00 x 2^(-08)              1.25 x 2^(-05)          1.30 x 2^(-16)
    1.00 x 2^(-09)              1.25 x 2^(-06)          1.30 x 2^(-18)
    1.00 x 2^(-10)              1.25 x 2^(-07)          1.30 x 2^(-20)
    1.00 x 2^(-11)              1.25 x 2^(-08)          1.30 x 2^(-22)
    1.00 x 2^(-12)              1.25 x 2^(-09)          1.30 x 2^(-24)
    1.00 x 2^(-13)              1.25 x 2^(-10)          1.30 x 2^(-26)
    1.00 x 2^(-14)              1.25 x 2^(-11)          1.30 x 2^(-28)
    Gradient PASSED!
Parameters:
  • f (torch.nn.Module) – callable function that returns value, gradient, and Hessian

  • x (torch.Tensor) – input data

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True

  • num_test (int) – number of perturbations

  • base (float) – base of the step size, \(h = base^{-k}\) for \(k = 0, \dots, num\_test - 1\)

  • tol (float) – small tolerance to account for numerical errors when computing the order of approximation

  • verbose (bool) – printout flag

Returns:

  • grad_check (bool) - if True, gradient check passes

Timing Functions

setup_device_and_gradient(f: NN, network_wrapper: str = 'hessQuik', device: str = 'cpu') torch.nn.Module[source]

Set up a network with the correct wrapper and device

setup_resnet(in_features: int, out_features: int, width: int = 16, depth: int = 4) NN[source]

Set up a resnet architecture for timing tests

setup_fully_connected(in_features: int, out_features: int, width: int = 16, depth: int = 4) NN[source]

Set up a fully-connected architecture for timing tests

setup_icnn(in_features: int, out_features: int, width: int = 16, depth: int = 4) NN[source]

Set up an ICNN architecture for timing tests.

Requires a scalar output.

setup_network(in_features: int, out_features: int, width: int, depth: int, network_type: str = 'resnet', network_wrapper: str = 'hessQuik', device: str = 'cpu')[source]

Wrapper to set up a network.
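
A hedged usage sketch based only on the signature above (assuming setup_network is importable from hessQuik.utils):

from hessQuik.utils import setup_network

f = setup_network(in_features=4, out_features=1, width=16, depth=4,
                  network_type='resnet', network_wrapper='hessQuik', device='cpu')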

timing_test_cpu(f: Union[NN, torch.nn.Module], x: torch.Tensor, num_trials: int = 10, clear_memory: bool = True) torch.Tensor[source]

Timing test for one architecture on CPU.

Test is run num_trials times and the timing for each trial is returned.

One dry run is performed before the first trial and is not included in the returned timings.

Memory is cleared after the completion of all trials.
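
A sketch of the timing pattern described above (not the hessQuik implementation), assuming the network's forward call accepts do_gradient and do_Hessian flags:

import time
import torch

def time_on_cpu(f, x, num_trials=10):
    f(x, do_gradient=True, do_Hessian=True)  # dry run, not recorded
    trials = torch.zeros(num_trials)
    for i in range(num_trials):
        t0 = time.perf_counter()
        f(x, do_gradient=True, do_Hessian=True)
        trials[i] = time.perf_counter() - t0
    return trials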

timing_test_gpu(f: Union[NN, torch.nn.Module], x: torch.Tensor, num_trials: int = 10, clear_memory: bool = True)[source]

Timing test for one architecture on GPU.

Test is run num_trials times and the timing for each trial is returned.

Each trial includes a torch.cuda.synchronize call.

One dry run is performed before the first trial and is not included in the returned timings.

Memory is cleared after the completion of all trials.
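
The corresponding sketch for the GPU pattern, where each timed call is followed by torch.cuda.synchronize so asynchronous kernels are included in the measurement (same assumptions as the CPU sketch above; again an illustration, not the hessQuik implementation):

import time
import torch

def time_on_gpu(f, x, num_trials=10):
    f(x, do_gradient=True, do_Hessian=True)  # dry run, not recorded
    torch.cuda.synchronize()
    trials = torch.zeros(num_trials)
    for i in range(num_trials):
        t0 = time.perf_counter()
        f(x, do_gradient=True, do_Hessian=True)
        torch.cuda.synchronize()  # wait for all kernels before stopping the clock
        trials[i] = time.perf_counter() - t0
    return trials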

timing_test(in_feature_range: torch.Tensor, out_feature_range: torch.Tensor, nex: int = 10, num_trials: int = 10, width: int = 16, depth: int = 4, network_wrapper: str = 'hessQuik', network_type: str = 'resnet', device: str = 'cpu', clear_memory: bool = True) dict[source]
Parameters:
  • in_feature_range (torch.Tensor) – available input feature dimensions

  • out_feature_range (torch.Tensor) – available output feature dimensions

  • nex (int, optional) – number of examples for network input. Default: 10

  • num_trials (int, optional) – number of trials per input-output feature combination. Default: 10

  • width (int, optional) – width of network. Default: 16

  • depth (int, optional) – depth of network. Default: 4

  • network_wrapper (str, optional) – type of network wrapper. Default: ‘hessQuik’. Options: ‘hessQuik’, ‘PytorchAD’, ‘PytorchHessian’

  • network_type (str, optional) – network architecture. Default: ‘resnet’. Options: ‘resnet’, ‘fully_connected’, ‘icnn’

  • device (str, optional) – device for testing. Default: ‘cpu’

  • clear_memory (bool, optional) – flag to clear memory after each set of trials

Returns:

dictionary containing keys

  • ‘timing_trials’ (torch.Tensor) - time (in seconds) for each trial and each architecture

  • ‘timing_trials_mean’ (torch.Tensor) - average over trials for each architecture

  • ‘timing_trials_std’ (torch.Tensor) - standard deviation over trials for each architecture

  • ‘in_feature_range’ (torch.Tensor) - available input feature dimensions

  • ‘out_feature_range’ (torch.Tensor) - available output feature dimensions

  • ‘nex’ (int) - number of samples for the input data

  • ‘num_trials’ (int) - number of trials per architecture
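
A hedged usage sketch based on the signature and return keys documented above (assuming timing_test is importable from hessQuik.utils):

import torch
from hessQuik.utils import timing_test

results = timing_test(in_feature_range=torch.tensor([2, 4, 8]),
                      out_feature_range=torch.tensor([1]),
                      nex=10, num_trials=10, width=16, depth=4,
                      network_wrapper='hessQuik', network_type='resnet',
                      device='cpu')
print(results['timing_trials_mean'])  # average time per input/output feature combination
print(results['timing_trials_std'])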

Training Functions

train_one_epoch(f: torch.nn.Module, x: torch.Tensor, y: torch.Tensor, optimizer: torch.optim.Optimizer, batch_size: int = 5, do_gradient: bool = False, do_Hessian: bool = False, loss_weights: Union[tuple, list] = (1.0, 1.0, 1.0))[source]

Train with a mean-squared loss for one epoch, where the loss function is

\[L(\theta) = \frac{1}{2N}\left(w_0\|f_{\theta}(x)\|^2 + w_1\|\nabla f_{\theta}(x)\|^2 + w_2\|\nabla^2 f_{\theta}(x)\|^2\right)\]

where \(f_{\theta}\) is the network, \(\theta\) are the network weights, and \(N\) is the number of training samples. The loss terms corresponding to the function value, gradient, and Hessian can each have a different weight, \(w_0\), \(w_1\), and \(w_2\), respectively.

Parameters:
  • f (torch.nn.Module) – hessQuik neural network to train

  • x (torch.Tensor) – training data of shape \((N, *)\) where \(*\) can be any shape

  • y (torch.Tensor) – target data of shape \((N, *)\)

  • optimizer (torch.optim.Optimizer) – method for updating the network weights

  • batch_size (int) – size of mini-batches for stochastic training. Default: 5

  • do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False

  • do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False

  • loss_weights (tuple or list) – weight for each term in the loss function

Returns:

tuple containing the overall running loss and the running loss for each term in the loss function
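
A hedged training-loop sketch based on the signature above; the peaks data, the two-layer network, and the optimizer are illustrative choices, and only the function-value term of the loss is exercised here:

import torch
import hessQuik.activations as act, hessQuik.layers as lay, hessQuik.networks as net
from hessQuik.utils import peaks, train_one_epoch

x = 6 * torch.rand(100, 2) - 3  # samples in [-3, 3]^2
y, *_ = peaks(x)                # target values only

f = net.NN(lay.singleLayer(2, 8, act=act.tanhActivation()),
           lay.singleLayer(8, 1, act=act.identityActivation()))
optimizer = torch.optim.Adam(f.parameters(), lr=1e-3)

for epoch in range(10):
    running_loss = train_one_epoch(f, x, y, optimizer, batch_size=5)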

test(f: torch.nn.Module, x: torch.Tensor, y: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, loss_weights: Union[tuple, list] = (1.0, 1.0, 1.0))[source]

Evaluate mean-squared loss function without training

See hessQuik.utils.training.train_one_epoch() for details.

print_headers(do_gradient: bool = True, do_Hessian: bool = False, verbose: bool = False, loss_weights: Union[tuple, list] = (1.0, 1.0, 1.0))[source]

Print headers for nicely formatted training printouts

Additional Utilities

module_getattr(obj: torch.nn.Module, names: Tuple)[source]

Get specific attribute of module at any level

module_setattr(obj: torch.nn.Module, names: Tuple, val: torch.Tensor)[source]

Set specific attribute of module at any level

extract_data(net: torch.nn.Module, attr: str = 'data')[source]

Extract the data stored in a specific attribute and flatten it into a 1D array

insert_data(net: torch.nn.Module, theta: torch.Tensor) None[source]

Insert a 1D array of data into a specific attribute
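
A hedged sketch of the round trip suggested by the two helpers above (assuming both are importable from hessQuik.utils and that extract_data returns the concatenated weights as a 1D tensor):

import hessQuik.activations as act, hessQuik.layers as lay, hessQuik.networks as net
from hessQuik.utils import extract_data, insert_data

f = net.NN(lay.singleLayer(4, 7, act=act.tanhActivation()))
theta = extract_data(f, attr='data')  # 1D tensor of all network weights
insert_data(f, theta + 0.1)           # write a perturbed copy back into f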

convert_to_base(a: tuple, b: float = 2.0) tuple[source]

Convert tuple of floats to a base-exponent pair for nice printouts.

See use in, e.g., hessQuik.utils.input_derivative_check.input_derivative_check().