hessQuik.utils¶
Peaks Data¶
- peaks(y: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False) Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]] [source]¶
Generate data from the MATLAB 2D peaks function
Examples:
import torch
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits import mplot3d
from hessQuik.utils import peaks

x, y = torch.linspace(-3, 3, 100), torch.linspace(-3, 3, 100)
grid_x, grid_y = torch.meshgrid(x, y)
grid_xy = torch.concat((grid_x.reshape(-1, 1), grid_y.reshape(-1, 1)), dim=1)
grid_z, *_ = peaks(grid_xy)

fig = plt.figure()
ax = plt.axes(projection='3d')
surf = ax.plot_surface(grid_x, grid_y, grid_z.reshape(grid_x.shape), cmap=cm.viridis)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
- Parameters:
y (torch.Tensor) – (x, y) coordinates with shape \((n_s, 2)\) where \(n_s\) is the number of samples
do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
- Returns:
f (torch.Tensor) - value of peaks function at each coordinate with shape \((n_s, 1)\)
dfdx (torch.Tensor or None) - value of gradient at each coordinate with shape \((n_s, 2)\)
d2fd2x (torch.Tensor or None) - value of Hessian at each coordinate with shape \((n_s, 4)\)
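For instance, a minimal usage sketch (the coordinates below are arbitrary; peaks is assumed importable from hessQuik.utils):

import torch
from hessQuik.utils import peaks

xy = 6 * torch.rand(50, 2) - 3                 # 50 random coordinates in [-3, 3]^2
f, dfdx, d2fd2x = peaks(xy, do_gradient=True, do_Hessian=True)
print(f.shape, dfdx.shape)                     # torch.Size([50, 1]) torch.Size([50, 2])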
Input Derivative Checks¶
- input_derivative_check(f: Union[torch.nn.Module, Callable], x: torch.Tensor, do_Hessian: bool = False, forward_mode: bool = True, num_test: int = 15, base: float = 2.0, tol: float = 0.1, verbose: bool = False) Tuple[Optional[bool], Optional[bool]] [source]¶
Taylor approximation test to verify derivatives. Form the approximation by perturbing the input \(x\) in the direction \(p\) with step size \(h > 0\) via
\[f(x + h p) \approx f(x) + h\nabla f(x)^\top p + \frac{h^2}{2}p^\top \nabla^2 f(x) p\]
As \(h \downarrow 0^+\), the error between the approximation and the true value will decrease. The rate of decrease indicates the accuracy of the derivative computation. For details, see Chapter 5 of Computational Methods for Electromagnetics by Eldad Haber.
Examples:
>>> import torch
>>> import hessQuik.activations as act
>>> from hessQuik.layers import singleLayer
>>> from hessQuik.utils import input_derivative_check
>>> torch.set_default_dtype(torch.float64)  # use double precision to check implementations
>>> x = torch.randn(10, 4)
>>> f = singleLayer(4, 7, act=act.softplusActivation())
>>> input_derivative_check(f, x, do_Hessian=True, verbose=True, forward_mode=True)
    h                   E0                  E1                  E2
1.00 x 2^(00)       1.62 x 2^(-02)      1.70 x 2^(-07)      1.02 x 2^(-12)
1.00 x 2^(-01)      1.63 x 2^(-03)      1.70 x 2^(-09)      1.06 x 2^(-15)
1.00 x 2^(-02)      1.63 x 2^(-04)      1.69 x 2^(-11)      1.08 x 2^(-18)
1.00 x 2^(-03)      1.63 x 2^(-05)      1.69 x 2^(-13)      1.09 x 2^(-21)
1.00 x 2^(-04)      1.63 x 2^(-06)      1.69 x 2^(-15)      1.09 x 2^(-24)
1.00 x 2^(-05)      1.63 x 2^(-07)      1.69 x 2^(-17)      1.10 x 2^(-27)
1.00 x 2^(-06)      1.63 x 2^(-08)      1.69 x 2^(-19)      1.10 x 2^(-30)
1.00 x 2^(-07)      1.63 x 2^(-09)      1.69 x 2^(-21)      1.10 x 2^(-33)
1.00 x 2^(-08)      1.63 x 2^(-10)      1.69 x 2^(-23)      1.10 x 2^(-36)
1.00 x 2^(-09)      1.63 x 2^(-11)      1.69 x 2^(-25)      1.10 x 2^(-39)
1.00 x 2^(-10)      1.63 x 2^(-12)      1.69 x 2^(-27)      1.10 x 2^(-42)
1.00 x 2^(-11)      1.63 x 2^(-13)      1.69 x 2^(-29)      1.10 x 2^(-45)
1.00 x 2^(-12)      1.63 x 2^(-14)      1.69 x 2^(-31)      1.15 x 2^(-48)
1.00 x 2^(-13)      1.63 x 2^(-15)      1.69 x 2^(-33)      1.33 x 2^(-50)
1.00 x 2^(-14)      1.63 x 2^(-16)      1.69 x 2^(-35)      1.70 x 2^(-51)
Gradient PASSED!
Hessian PASSED!
- Parameters:
f (torch.nn.Module or Callable) – callable function that returns value, gradient, and Hessian
x (torch.Tensor) – input data
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True
num_test (int) – number of perturbations
base (float) – step size \(h = base^k\)
tol (float) – small tolerance to account for numerical errors when computing the order of approximation
verbose (bool) – printout flag
- Returns:
grad_check (bool) - if True, gradient check passes
hess_check (bool, optional) - if True, Hessian check passes
- input_derivative_check_finite_difference(f: Callable, x: torch.Tensor, do_Hessian: bool = False, forward_mode: bool = True, eps: float = 0.0001, atol: float = 1e-05, rtol: float = 0.001, verbose: bool = False) Tuple[Optional[bool], Optional[bool]] [source]¶
Finite difference test to verify derivatives. Form the approximation by perturbing each entry in the input in the unit direction with step size \(\varepsilon > 0\):
\[\widetilde{\nabla f}_{i} = \frac{f(x_i + \varepsilon) - f(x_i - \varepsilon)}{2\varepsilon}\]
where \(x_i \pm \varepsilon\) means add or subtract \(\varepsilon\) from the \(i\)-th entry of the input \(x\), but leave the other entries unchanged. The notation \(\widetilde{(\cdot)}\) indicates the finite difference approximation.
Examples:
>>> import torch
>>> import hessQuik.activations as act
>>> from hessQuik.layers import singleLayer
>>> from hessQuik.utils import input_derivative_check_finite_difference
>>> torch.set_default_dtype(torch.float64)  # use double precision to check implementations
>>> x = torch.randn(10, 4)
>>> f = singleLayer(4, 7, act=act.tanhActivation())
>>> input_derivative_check_finite_difference(f, x, do_Hessian=True, verbose=True, forward_mode=True)
Gradient Finite Difference: Error = 8.1720e-10, Relative Error = 2.5602e-10
Gradient PASSED!
Hessian Finite Difference: Error = 4.5324e-08, Relative Error = 4.4598e-08
Hessian PASSED!
- Parameters:
f (Callable) – callable function that returns value, gradient, and Hessian
x (torch.Tensor) – input data
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True
eps (float) – step size. Default: 1e-4
atol (float) – absolute tolerance, e.g., \(\|\nabla f - \widetilde{\nabla f}\| < atol\). Default: 1e-5
rtol (float) – relative tolerance, e.g., \(\|\nabla f - \widetilde{\nabla f}\|/\|\nabla f\| < rtol\). Default: 1e-3
verbose (bool) – printout flag
- Returns:
grad_check (bool) - if True, gradient check passes
hess_check (bool, optional) - if True, Hessian check passes
Network Weights Derivative Check¶
- network_derivative_check(f: torch.nn.Module, x: torch.Tensor, do_Hessian: bool = False, forward_mode: bool = True, num_test: int = 15, base: float = 2.0, tol: float = 0.1, verbose: bool = False) Optional[bool] [source]¶
Taylor approximation test to verify derivatives. Form the approximation by perturbing the network weights \(\theta\) in the direction \(p\) with step size \(h > 0\) via
\[\Phi(\theta + h p) \approx \Phi(\theta) + h\nabla_{\theta} \Phi(\theta)^\top p\]
where \(\Phi\) is the objective function and \(\theta\) are the network weights. This test uses the loss
\[\Phi(\theta) = \frac{1}{2}\|f_{\theta}(x)\|^2 + \frac{1}{2}\|\nabla f_{\theta}(x)\|^2 + \frac{1}{2}\|\nabla^2 f_{\theta}(x)\|^2\]
to validate the network gradient computation after computing derivatives with respect to the input features of the network \(f_{\theta}\).
As \(h \downarrow 0^+\), the error between the approximation and the true value will decrease. The rate of decrease indicates the accuracy of the derivative computation. For details, see Chapter 5 of Computational Methods for Electromagnetics by Eldad Haber.
Examples:
>>> import torch
>>> import hessQuik.activations as act, hessQuik.layers as lay, hessQuik.networks as net
>>> from hessQuik.utils import network_derivative_check
>>> torch.set_default_dtype(torch.float64)  # use double precision to check implementations
>>> x = torch.randn(10, 4)
>>> width, depth = 8, 3
>>> f = net.NN(lay.singleLayer(4, width, act=act.tanhActivation()),
...            net.resnetNN(width, depth, h=1.0, act=act.tanhActivation()),
...            lay.singleLayer(width, 1, act=act.identityActivation()))
>>> network_derivative_check(f, x, do_Hessian=True, verbose=True, forward_mode=True)
    h                   E0                  E1
1.00 x 2^(00)       1.97 x 2^(02)       1.05 x 2^(01)
1.00 x 2^(-01)      1.14 x 2^(02)       1.71 x 2^(-02)
1.00 x 2^(-02)      1.20 x 2^(01)       1.50 x 2^(-04)
1.00 x 2^(-03)      1.22 x 2^(00)       1.40 x 2^(-06)
1.00 x 2^(-04)      1.24 x 2^(-01)      1.35 x 2^(-08)
1.00 x 2^(-05)      1.24 x 2^(-02)      1.32 x 2^(-10)
1.00 x 2^(-06)      1.24 x 2^(-03)      1.31 x 2^(-12)
1.00 x 2^(-07)      1.24 x 2^(-04)      1.30 x 2^(-14)
1.00 x 2^(-08)      1.25 x 2^(-05)      1.30 x 2^(-16)
1.00 x 2^(-09)      1.25 x 2^(-06)      1.30 x 2^(-18)
1.00 x 2^(-10)      1.25 x 2^(-07)      1.30 x 2^(-20)
1.00 x 2^(-11)      1.25 x 2^(-08)      1.30 x 2^(-22)
1.00 x 2^(-12)      1.25 x 2^(-09)      1.30 x 2^(-24)
1.00 x 2^(-13)      1.25 x 2^(-10)      1.30 x 2^(-26)
1.00 x 2^(-14)      1.25 x 2^(-11)      1.30 x 2^(-28)
Gradient PASSED!
- Parameters:
f (torch.nn.Module) – callable function that returns value, gradient, and Hessian
x (torch.Tensor) – input data
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
forward_mode (bool, optional) – If set to False, the derivatives will be computed in backward mode. Default: True
num_test (int) – number of perturbations
base (float) – step size \(h = base^k\)
tol (float) – small tolerance to account for numerical errors when computing the order of approximation
verbose (bool) – printout flag
- Returns:
grad_check (bool) - if True, gradient check passes
Timing Functions¶
- setup_device_and_gradient(f: NN, network_wrapper: str = 'hessQuik', device: str = 'cpu') torch.nn.Module [source]¶
Setup network with correct wrapper and device
- setup_resnet(in_features: int, out_features: int, width: int = 16, depth: int = 4) NN [source]¶
Setup resnet architecture for timing tests
- setup_fully_connected(in_features: int, out_features: int, width: int = 16, depth: int = 4) NN [source]¶
Setup fully-connected architecture for timing tests
- setup_icnn(in_features: int, out_features: int, width: int = 16, depth: int = 4) NN [source]¶
Setup ICNN architecture for timing tests.
Requires scalar output.
- setup_network(in_features: int, out_features: int, width: int, depth: int, network_type: str = 'resnet', network_wrapper: str = 'hessQuik', device: str = 'cpu')[source]¶
Wrapper to setup network.
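As a hedged illustration, a network for a timing run might be constructed as follows (the feature sizes are arbitrary; the keyword options are those listed under timing_test below):

from hessQuik.utils import setup_network

# resnet of width 16 and depth 4, wrapped for hessQuik derivatives, on CPU
f = setup_network(in_features=4, out_features=1, width=16, depth=4,
                  network_type='resnet', network_wrapper='hessQuik', device='cpu')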
- timing_test_cpu(f: Union[NN, torch.nn.Module], x: torch.Tensor, num_trials: int = 10, clear_memory: bool = True) torch.Tensor [source]¶
Timing test for one architecture on CPU.
The test is run num_trials times and the timing for each trial is returned. The timing includes one dry run (the first iteration), which is not returned.
Memory is cleared after the completion of all trials.
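A minimal usage sketch, assuming a network built with setup_network as above:

import torch
from hessQuik.utils import setup_network, timing_test_cpu

f = setup_network(4, 1, 16, 4, network_type='resnet', device='cpu')
x = torch.randn(10, 4)                     # 10 examples with 4 input features
t = timing_test_cpu(f, x, num_trials=10)   # one entry per trial, in seconds
print(t.mean().item(), t.std().item())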
- timing_test_gpu(f: Union[NN, torch.nn.Module], x: torch.Tensor, num_trials: int = 10, clear_memory: bool = True)[source]¶
Timing test for one architecture on GPU.
The test is run num_trials times and the timing for each trial is returned. Each trial includes a torch.cuda.synchronize call. The timing includes one dry run (the first iteration), which is not returned.
Memory is cleared after the completion of all trials.
- timing_test(in_feature_range: torch.Tensor, out_feature_range: torch.Tensor, nex: int = 10, num_trials: int = 10, width: int = 16, depth: int = 4, network_wrapper: str = 'hessQuik', network_type: str = 'resnet', device: str = 'cpu', clear_memory: bool = True) dict [source]¶
- Parameters:
in_feature_range (torch.Tensor) – available input feature dimensions
out_feature_range (torch.Tensor) – available output feature dimensions
nex (int, optional) – number of examples for network input. Default: 10
num_trials (int, optional) – number of trials per input-output feature combination. Default: 10
width (int, optional) – width of network. Default: 16
depth (int, optional) – depth of network. Default: 4
network_wrapper (str, optional) – type of network wrapper. Default: ‘hessQuik’. Options: ‘hessQuik’, ‘PytorchAD’, ‘PytorchHessian’
network_type (str, optional) – network architecture. Default: ‘resnet’. Options: ‘resnet’, ‘fully_connected’, ‘icnn’
device (str, optional) – device for testing. Default: ‘cpu’
clear_memory (bool, optional) – flag to clear memory after each set of trials
- Returns:
dictionary containing keys:
'timing_trials' (torch.Tensor) - time (in seconds) for each trial and each architecture
'timing_trials_mean' (torch.Tensor) - average over trials for each architecture
'timing_trials_std' (torch.Tensor) - standard deviation over trials for each architecture
'in_feature_range' (torch.Tensor) - available input feature dimensions
'out_feature_range' (torch.Tensor) - available output feature dimensions
'nex' (int) - number of samples for the input data
'num_trials' (int) - number of trials per architecture
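For example, a small sweep over input sizes with a scalar output might look like the following sketch (all values arbitrary):

import torch
from hessQuik.utils import timing_test

results = timing_test(in_feature_range=torch.tensor([2, 4, 8, 16]),
                      out_feature_range=torch.tensor([1]),
                      nex=10, num_trials=10,
                      network_type='resnet', device='cpu')
print(results['timing_trials_mean'])   # average time per feature combination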
Training Functions¶
- train_one_epoch(f: torch.nn.Module, x: torch.Tensor, y: torch.Tensor, optimizer: torch.optim.Optimizer, batch_size: int = 5, do_gradient: bool = False, do_Hessian: bool = False, loss_weights: Union[tuple, list] = (1.0, 1.0, 1.0))[source]¶
Training mean-square loss for one epoch where the loss function is
\[L(\theta) = \frac{1}{2N}\left(w_0\|f_{\theta}(x)\|^2 + w_1\|\nabla f_{\theta}(x)\|^2 + w_2\|\nabla^2 f_{\theta}(x)\|^2\right)\]
where \(f_{\theta}\) is the network, \(\theta\) are the network weights, and \(N\) is the number of training samples. The loss terms corresponding to the function value, gradient, and Hessian can each have different weights, \(w_0\), \(w_1\), and \(w_2\), respectively.
- Parameters:
f (torch.nn.Module) – hessQuik neural network to train
x (torch.Tensor) – training data of shape \((N, *)\) where \(*\) can be any shape
y (torch.Tensor) – target data of shape \((N, *)\)
optimizer (torch.optim.Optimizer) – method for updating the network weights
batch_size (int) – size of mini-batches for stochastic training. Default: 5
do_gradient (bool, optional) – If set to True, the gradient will be computed during the forward call. Default: False
do_Hessian (bool, optional) – If set to True, the Hessian will be computed during the forward call. Default: False
loss_weights (tuple or list) – weight for each term in the loss function
- Returns:
tuple containing the overall running loss and the running loss for each term in the loss function
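As an illustration, a hedged sketch of a training loop on the peaks data (the architecture and hyperparameters below are arbitrary choices, not prescribed by the library):

import torch
import hessQuik.activations as act, hessQuik.layers as lay, hessQuik.networks as net
from hessQuik.utils import peaks, train_one_epoch

x = 6 * torch.rand(100, 2) - 3                 # training coordinates in [-3, 3]^2
y, *_ = peaks(x)                               # fit function values only
f = net.NN(lay.singleLayer(2, 8, act=act.tanhActivation()),
           lay.singleLayer(8, 1, act=act.identityActivation()))
optimizer = torch.optim.Adam(f.parameters(), lr=1e-3)
for epoch in range(10):
    losses = train_one_epoch(f, x, y, optimizer, batch_size=5)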
- test(f: torch.nn.Module, x: torch.Tensor, y: torch.Tensor, do_gradient: bool = False, do_Hessian: bool = False, loss_weights: Union[tuple, list] = (1.0, 1.0, 1.0))[source]¶
Evaluate mean-squared loss function without training
See hessQuik.utils.training.train_one_epoch() for details.
Additional Utilities¶
- module_getattr(obj: torch.nn.Module, names: Tuple)[source]¶
Get specific attribute of module at any level
- module_setattr(obj: torch.nn.Module, names: Tuple, val: torch.Tensor)[source]¶
Set specific attribute of module at any level
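A hedged sketch of the intended use, assuming names is a tuple of nested attribute names traversed in order (the attribute path below is hypothetical and depends on the architecture):

import torch
from hessQuik.utils import module_getattr, module_setattr

# read and overwrite a nested weight tensor of a network f;
# the path ('0', 'K', 'data') is a made-up example
K = module_getattr(f, ('0', 'K', 'data'))
module_setattr(f, ('0', 'K', 'data'), torch.zeros_like(K))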
- extract_data(net: torch.nn.Module, attr: str = 'data')[source]¶
Extract data stored in specific attribute and store as 1D array
- insert_data(net: torch.nn.Module, theta: torch.Tensor) None [source]¶
Insert 1D array of data into specific attribute
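Together, these support a round trip through a flat parameter vector, as used by the derivative checks above. A minimal sketch, assuming f is a hessQuik network:

from hessQuik.utils import extract_data, insert_data

theta = extract_data(f, attr='data')   # all weights flattened into a 1D tensor
insert_data(f, theta)                  # write the same values back into f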
- convert_to_base(a: tuple, b: float = 2.0) tuple [source]¶
Convert tuple of floats to a base-exponent pair for nice printouts.
See use in, e.g., hessQuik.utils.input_derivative_check.input_derivative_check().
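The idea is to express each value \(v\) as \(c \times b^k\) for printing, as in the tables above. A minimal re-implementation of that idea for a single positive float (a hypothetical helper, not necessarily the function's exact return layout):

import math

def to_base(v, b=2.0):
    # hypothetical helper mirroring the idea: v = c * b**k with 1 <= c < b
    k = math.floor(math.log(v, b))
    return v / b ** k, k

c, k = to_base(0.0125)
print('%1.2f x 2^(%02d)' % (c, k))     # prints: 1.60 x 2^(-7)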