Skip to content

Training Neural Network Surrogates

This tutorial covers the neural network architectures available in hyper-surrogate, the loss functions for thermodynamically consistent training, and the Trainer class with early stopping and learning rate scheduling.


Overview

The surrogate training pipeline:

Dataset (X, Y)  ──>  Model (MLP / ICNN / PolyconvexICNN / CANN)  ──>  Trainer  ──>  TrainingResult

1. Neural Network Architectures

Architecture comparison

Architecture Output Convexity Physics Guarantee Input Dim Output Dim
MLP Stress or Energy None None Any Any
ICNN Energy (scalar) Convex in inputs Thermodynamic consistency Any 1
PolyconvexICNN Energy (scalar) Polyconvex Stronger thermodynamic guarantee Any 1
CANN Energy (scalar) Optional Interpretable + sparse Any 1

1.1 MLP (Multi-Layer Perceptron)

The most flexible architecture — a standard feedforward network with no physics constraints:

import hyper_surrogate as hs

model = hs.MLP(
    input_dim=3,          # 3 invariants (I1_bar, I2_bar, J)
    output_dim=6,         # 6 stress components (Voigt)
    hidden_dims=[64, 64], # Two hidden layers with 64 neurons each
    activation="tanh",    # Activation function
)

Architecture diagram:

Input (3) ──> [Linear 64] ──> tanh ──> [Linear 64] ──> tanh ──> [Linear 6] ──> Output (6)

Available activations:

Activation Function Best For
"relu" \(\max(0, x)\) General use, fast
"tanh" \(\tanh(x)\) Stress prediction (bounded outputs)
"sigmoid" \(1/(1 + e^{-x})\) Bounded [0,1] outputs
"softplus" \(\ln(1 + e^x)\) Energy prediction (smooth, positive)

When to use MLP:

  • Direct stress prediction (output = PK2 Voigt)
  • Quick prototyping without physics constraints
  • When convexity is not required

1.2 ICNN (Input-Convex Neural Network)

Guarantees that the output is a convex function of the input. This is critical for energy-based training: a convex energy function ensures unique stress states and stable FE simulations.

model = hs.ICNN(
    input_dim=3,           # Invariants
    hidden_dims=[32, 32],  # Hidden layer sizes
    activation="softplus", # Must be convex activation
)
# Output dim is always 1 (scalar energy)

Architecture (convexity enforced via non-negative z-path weights):

x ─────────────────────────────────────────────────> Wx₂ ──> + ──> Wx₃ ──> + ──> output
                                                      |       ↑       |       ↑
x ──> [Wx₁] ──> σ ──> z₁ ──> [softplus(Wz₂)]·z₁ ──'  σ  [softplus(Wz₃)]·z₂  σ
                                                            ──> z₂             ──> z₃

Key properties:

  • z-path weights are passed through softplus() to enforce non-negativity
  • x-path weights are unconstrained (direct input skip connections)
  • The composition of convex non-decreasing functions with non-negative linear maps preserves convexity

1.3 PolyconvexICNN

For polyconvex energy functions, each invariant (or group of invariants) gets its own ICNN branch:

# Isotropic: 3 branches, one per invariant
model = hs.PolyconvexICNN(
    groups=[[0], [1], [2]],       # W = W₁(I1_bar) + W₂(I2_bar) + W₃(J)
    hidden_dims=[32, 32],
    activation="softplus",
)

# Anisotropic: 4 branches (fiber invariants grouped)
model = hs.PolyconvexICNN(
    groups=[[0], [1], [2], [3, 4]],  # W = W₁(I1) + W₂(I2) + W₃(J) + W₄(I4, I5)
    hidden_dims=[32, 32],
    activation="softplus",
)

Architecture:

I1_bar ──> [ICNN branch 1] ──> W₁ ──┐
I2_bar ──> [ICNN branch 2] ──> W₂ ──┤
J      ──> [ICNN branch 3] ──> W₃ ──┼──> W = W₁ + W₂ + W₃ (+ W₄)
I4, I5 ──> [ICNN branch 4] ──> W₄ ──┘

The sum of convex functions is convex, so polyconvexity is guaranteed by construction.

Advantages over plain ICNN:

  • Block-diagonal Hessian (more efficient tangent computation)
  • Physical interpretability (separate volumetric, isochoric, fiber contributions)
  • Better conditioning

1.4 CANN (Constitutive Artificial Neural Network)

An interpretable architecture using predefined basis functions with learnable non-negative weights:

from hyper_surrogate.models.cann import CANN

model = CANN(
    input_dim=3,
    n_polynomial=3,          # (I)^1, (I)^2, (I)^3
    n_exponential=2,         # exp(b·I²) - 1 (learnable b)
    use_logarithmic=True,    # log(1 + I²)
    learnable_exponents=True,
)

Energy function:

\[W = \sum_{k} \text{softplus}(w_k) \cdot \psi_k(I_\alpha)\]

where \(\psi_k\) are basis functions applied per invariant:

Type Basis Function \(\psi_k\) Parameters
Polynomial \((I_\alpha)^p,\quad p = 1, \ldots, n\) None (fixed powers)
Exponential \(\exp(b_k I_\alpha^2) - 1\) Learnable \(b_k > 0\)
Logarithmic \(\ln(1 + I_\alpha^2)\) None

Use case: Model discovery — after training with L1 regularization (SparseLoss), inspect which terms survived to identify the minimal constitutive law.


2. Loss Functions

2.1 StressLoss

Simple MSE on predicted vs. true stress:

\[\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \|\mathbf{S}_{\text{pred}}^{(i)} - \mathbf{S}_{\text{true}}^{(i)}\|^2\]
loss_fn = hs.StressLoss()

Use with: MLP predicting stress directly (target_type="pk2_voigt").

2.2 EnergyStressLoss

Joint energy + stress-gradient loss that enforces thermodynamic consistency:

\[\mathcal{L} = \alpha \|W_{\text{pred}} - W_{\text{true}}\|^2 + \beta \left\|\frac{\partial W_{\text{pred}}}{\partial \mathbf{x}_{\text{norm}}} - \frac{\partial W_{\text{true}}}{\partial \mathbf{x}_{\text{norm}}}\right\|^2\]

The gradient term is computed via PyTorch autograd through the network — this ensures the predicted stress is consistent with the predicted energy.

loss_fn = hs.EnergyStressLoss(alpha=1.0, beta=1.0)
Parameter Default Effect
alpha 1.0 Weight on energy MSE
beta 1.0 Weight on gradient (stress) MSE

Use with: ICNN, PolyconvexICNN, MLP (output_dim=1), CANN (target_type="energy").

Important: The dataset target must be a tuple (W, dW/dx_norm) — see the data generation tutorial.

2.3 StressTangentLoss

Weighted MSE on both stress and material tangent:

\[\mathcal{L} = \alpha \|\mathbf{S}_{\text{pred}} - \mathbf{S}_{\text{true}}\|^2 + \beta \|\mathbb{C}_{\text{pred}} - \mathbb{C}_{\text{true}}\|^2\]
from hyper_surrogate.training.losses import StressTangentLoss

loss_fn = StressTangentLoss(alpha=1.0, beta=0.1)

Use with: MLP predicting stress + tangent (target_type="pk2_voigt+cmat_voigt", output_dim=27).

2.4 SparseLoss

EnergyStressLoss + L1 regularization for model discovery with CANN:

\[\mathcal{L} = \mathcal{L}_{\text{energy+stress}} + \lambda \sum_k |w_k|\]
from hyper_surrogate.training.losses import SparseLoss

loss_fn = SparseLoss(alpha=1.0, beta=1.0, l1_lambda=0.01)

Use with: CANN architecture for identifying minimal constitutive laws.

Loss function decision table

Scenario Loss Function Target Type Model Output
Direct stress prediction StressLoss pk2_voigt 6D
Stress + tangent StressTangentLoss pk2_voigt+cmat_voigt 27D
Energy-based (thermodynamic) EnergyStressLoss energy 1D
Model discovery (sparse) SparseLoss energy 1D

3. The Trainer

The Trainer class handles the training loop with early stopping and learning rate scheduling:

result = hs.Trainer(
    model,
    train_ds,
    val_ds,
    loss_fn=hs.EnergyStressLoss(alpha=1.0, beta=1.0),
    max_epochs=2000,
    lr=1e-3,
    patience=200,
    batch_size=512,
    device="cpu",  # or "cuda"
).fit()

Trainer parameters

Parameter Type Default Description
model SurrogateModel Network to train
train_ds MaterialDataset Training dataset
val_ds MaterialDataset Validation dataset
loss_fn Loss StressLoss() Loss function
max_epochs int 1000 Maximum training epochs
lr float 1e-3 Initial learning rate (Adam optimizer)
patience int 100 Early stopping patience
batch_size int 256 Mini-batch size
device str "cpu" Device ("cpu" or "cuda")

Training features

Feature How It Works
Early stopping Monitors val_loss; stops after patience epochs without improvement
LR scheduling ReduceLROnPlateau: halves LR when val_loss plateaus
Best checkpoint Saves best model state; restores it after training completes
Autograd detection Automatically enables gradient computation for EnergyStressLoss

TrainingResult

result = trainer.fit()

# Access training history
result.history["train_loss"]  # List[float] per epoch
result.history["val_loss"]    # List[float] per epoch

# Best model
result.model       # The model with best val_loss weights restored
result.best_epoch  # Epoch index of best val_loss

# Print summary
best_val = result.history["val_loss"][result.best_epoch]
print(f"Best val loss: {best_val:.6f} at epoch {result.best_epoch}")
print(f"Total epochs: {len(result.history['train_loss'])}")

4. Example: MLP for Direct Stress Prediction

import hyper_surrogate as hs

material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})

# Create dataset: invariants → PK2 Voigt
train_ds, val_ds, in_norm, out_norm = hs.create_datasets(
    material, n_samples=5000,
    input_type="invariants",
    target_type="pk2_voigt",
    seed=42,
)

# Build model: 3 inputs → 6 outputs
model = hs.MLP(input_dim=3, output_dim=6, hidden_dims=[64, 64], activation="tanh")

# Train
result = hs.Trainer(
    model, train_ds, val_ds,
    loss_fn=hs.StressLoss(),
    max_epochs=500,
    lr=1e-3,
).fit()

print(f"Best val loss: {result.history['val_loss'][result.best_epoch]:.6f}")

5. Example: ICNN for Convex Energy Prediction

import hyper_surrogate as hs

material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})

train_ds, val_ds, in_norm, out_norm = hs.create_datasets(
    material, n_samples=5000,
    input_type="invariants",
    target_type="energy",
    seed=42,
)

# ICNN: convex energy output (scalar)
model = hs.ICNN(input_dim=3, hidden_dims=[32, 32])

# EnergyStressLoss: enforces stress = dW/dI via autograd
result = hs.Trainer(
    model, train_ds, val_ds,
    loss_fn=hs.EnergyStressLoss(alpha=1.0, beta=1.0),
    max_epochs=500,
).fit()

print(f"Best val loss: {result.history['val_loss'][result.best_epoch]:.6f}")

6. Example: PolyconvexICNN for Polyconvex Energy

import hyper_surrogate as hs

material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})

train_ds, val_ds, in_norm, out_norm = hs.create_datasets(
    material, n_samples=5000,
    input_type="invariants",
    target_type="energy",
    seed=42,
)

# Each invariant gets its own convex branch
model = hs.PolyconvexICNN(
    groups=[[0], [1], [2]],  # W = W₁(I1_bar) + W₂(I2_bar) + W₃(J)
    hidden_dims=[32, 32],
    activation="softplus",
)

result = hs.Trainer(
    model, train_ds, val_ds,
    loss_fn=hs.EnergyStressLoss(alpha=1.0, beta=1.0),
    max_epochs=2000,
    lr=1e-3,
    patience=200,
).fit()

print(f"Best val loss: {result.history['val_loss'][result.best_epoch]:.6f}")

7. Example: Energy MLP for Hybrid UMAT

When targeting a hybrid UMAT (NN energy + analytical mechanics), use an MLP with output_dim=1 and EnergyStressLoss:

import numpy as np
import hyper_surrogate as hs
from hyper_surrogate.data.dataset import MaterialDataset, Normalizer
from hyper_surrogate.data.deformation import DeformationGenerator
from hyper_surrogate.mechanics.kinematics import Kinematics

# Generate data with volumetric perturbation
material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})
n = 20000

gen = DeformationGenerator(seed=42)
F = gen.combined(n, stretch_range=(0.8, 1.3), shear_range=(-0.2, 0.2))
rng = np.random.default_rng(99)
J_target = rng.uniform(0.95, 1.05, size=n)
F = F * (J_target / np.linalg.det(F))[:, None, None] ** (1.0 / 3.0)
C = Kinematics.right_cauchy_green(F)

# Inputs and targets
i1 = Kinematics.isochoric_invariant1(C)
i2 = Kinematics.isochoric_invariant2(C)
j = np.sqrt(Kinematics.det_invariant(C))
inputs = np.column_stack([i1, i2, j])

energy = material.evaluate_energy(C)
dW_dI = material.evaluate_energy_grad_invariants(C)

# Normalize
in_norm = Normalizer().fit(inputs)
X = in_norm.transform(inputs).astype(np.float32)
W = energy.reshape(-1, 1).astype(np.float32)
S = (dW_dI * in_norm.params["std"]).astype(np.float32)

# Split
n_val = int(n * 0.15)
idx = np.random.default_rng(42).permutation(n)
train_ds = MaterialDataset(X[idx[n_val:]], (W[idx[n_val:]], S[idx[n_val:]]))
val_ds = MaterialDataset(X[idx[:n_val]], (W[idx[:n_val]], S[idx[:n_val]]))

# Train energy MLP
model = hs.MLP(input_dim=3, output_dim=1, hidden_dims=[64, 64, 64], activation="softplus")
result = hs.Trainer(
    model, train_ds, val_ds,
    loss_fn=hs.EnergyStressLoss(alpha=1.0, beta=1.0),
    max_epochs=3000, lr=1e-3, patience=300, batch_size=512,
).fit()

best_val = result.history["val_loss"][result.best_epoch]
print(f"Best val loss: {best_val:.6f} (epoch {result.best_epoch})")

8. Evaluating a Trained Model

Direct inference

import torch

model.eval()
x_test = torch.tensor(val_ds.inputs[:100], dtype=torch.float32)

with torch.no_grad():
    y_pred = model(x_test).numpy()

Hybrid inference (energy → stress via autograd)

For energy-based models, compute stress as the gradient of energy:

model.eval()
x = torch.tensor(val_ds.inputs[:100], dtype=torch.float32, requires_grad=True)

W_pred = model(x)
dW_dx = torch.autograd.grad(W_pred.sum(), x, create_graph=True)[0]

# Convert from normalized to raw invariant space
std_t = torch.tensor(in_norm.params["std"], dtype=torch.float32)
dW_dI_pred = (dW_dx / std_t).detach().numpy()  # (100, 3)

Computing R² and MAE

import numpy as np

# Compare predictions to reference
dW_dI_ref = dW_dI[idx[:n_val]][:100]

for i, name in enumerate(["dW/dI1_bar", "dW/dI2_bar", "dW/dJ"]):
    ss_res = np.sum((dW_dI_pred[:, i] - dW_dI_ref[:, i]) ** 2)
    ss_tot = np.sum((dW_dI_ref[:, i] - dW_dI_ref[:, i].mean()) ** 2)
    r2 = 1 - ss_res / max(ss_tot, 1e-30)
    mae = np.mean(np.abs(dW_dI_pred[:, i] - dW_dI_ref[:, i]))
    print(f"  {name}: R²={r2:.6f}, MAE={mae:.6f}")

9. Hyperparameter Guidelines

Scenario Architecture Hidden Dims Activation Loss Epochs LR Patience
Quick prototype MLP [32, 32] tanh StressLoss 500 1e-3 50
Production stress MLP [64, 64, 64] tanh StressLoss 2000 1e-3 200
Convex energy ICNN [32, 32] softplus EnergyStressLoss 1000 1e-3 100
Polyconvex energy PolyconvexICNN [32, 32] softplus EnergyStressLoss 2000 1e-3 200
Hybrid UMAT MLP (out=1) [64, 64, 64] softplus EnergyStressLoss 3000 1e-3 300
Model discovery CANN SparseLoss 500 1e-3 50

Tips

  • Activation: Use softplus for energy models (smooth, positive), tanh for stress models
  • Batch size: 256--512 works well; larger batches stabilize gradients
  • Learning rate: Start at 1e-3; the scheduler will reduce automatically
  • Data size: 5000--20000 samples is usually sufficient for 3D invariants
  • Hidden dimensions: 32--64 neurons per layer, 2--3 layers is typical