Data Generation

This tutorial covers the full data generation pipeline: creating synthetic deformation gradients, computing invariants, building normalized datasets, and preparing training data for neural network surrogates.


Overview

The data pipeline consists of three stages:

DeformationGenerator  ──>  Kinematics (invariants)  ──>  create_datasets()
      (F tensors)           (I1_bar, I2_bar, J, ...)      (train_ds, val_ds)

1. Deformation Generator

DeformationGenerator creates batches of deformation gradient tensors \(\mathbf{F}\) representing different loading modes.

import numpy as np
from hyper_surrogate.data.deformation import DeformationGenerator

gen = DeformationGenerator(seed=42)  # Reproducible random generation

1.1 Uniaxial Tension/Compression

Stretch \(\lambda\) along axis 1, with transverse contraction to preserve volume:

\[\mathbf{F} = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda^{-1/2} & 0 \\ 0 & 0 & \lambda^{-1/2} \end{bmatrix}\]

F_uni = gen.uniaxial(n=1000, stretch_range=(0.7, 1.5))
print(f"Shape: {F_uni.shape}")  # (1000, 3, 3)

# Verify incompressibility
J = np.linalg.det(F_uni)
print(f"J range: [{J.min():.6f}, {J.max():.6f}]")  # ≈ [1.0, 1.0]
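
To make the construction concrete, the uniaxial gradient above can also be assembled directly in NumPy. This is a minimal sketch that does not use the library; `uniaxial_F` is a hypothetical helper introduced here for illustration and is not part of `hyper_surrogate`.

```python
import numpy as np

def uniaxial_F(stretches):
    """Build incompressible uniaxial deformation gradients, shape (N, 3, 3)."""
    lam = np.asarray(stretches, dtype=float)
    F = np.zeros((lam.size, 3, 3))
    F[:, 0, 0] = lam              # stretch along axis 1
    F[:, 1, 1] = lam ** -0.5      # transverse contraction along axis 2
    F[:, 2, 2] = lam ** -0.5      # transverse contraction along axis 3
    return F

F_demo = uniaxial_F(np.linspace(0.7, 1.5, 5))
J_demo = np.linalg.det(F_demo)    # equals 1 up to round-off
```

Because the determinant of a diagonal matrix is the product of its entries, \(J = \lambda \cdot \lambda^{-1/2} \cdot \lambda^{-1/2} = 1\) for every sample.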

1.2 Biaxial Stretch

Independent stretches \(\lambda_1, \lambda_2\) along axes 1 and 2:

\[\mathbf{F} = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & (\lambda_1 \lambda_2)^{-1} \end{bmatrix}\]

F_bi = gen.biaxial(n=1000, stretch_range=(0.8, 1.3))

1.3 Simple Shear

Shear deformation in the 1-2 plane:

\[\mathbf{F} = \begin{bmatrix} 1 & \gamma & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]

F_shear = gen.shear(n=1000, shear_range=(-0.3, 0.3))
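
As a quick check of the shear matrix above, the sketch below builds a single simple-shear gradient by hand and confirms two standard properties: the volume ratio is 1, and the first invariant of \(\mathbf{C} = \mathbf{F}^T \mathbf{F}\) equals \(3 + \gamma^2\). It uses plain NumPy only; the variable names are illustrative.

```python
import numpy as np

gamma = 0.3
F_s = np.array([[1.0, gamma, 0.0],
                [0.0, 1.0,   0.0],
                [0.0, 0.0,   1.0]])

J_s = np.linalg.det(F_s)   # volume ratio, 1 for simple shear
C_s = F_s.T @ F_s          # right Cauchy-Green tensor
I1_s = np.trace(C_s)       # equals 3 + gamma**2
```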

1.4 Combined Deformations

The most realistic mode combines biaxial stretch, uniaxial stretch, and shear with random rotations:

F_combined = gen.combined(
    n=10000,
    stretch_range=(0.8, 1.3),
    shear_range=(-0.2, 0.2),
)

This is the recommended mode for training data: it covers a wider range of deformation states than any single loading mode, which helps the surrogate generalize.

Summary of Deformation Modes

| Mode | Function | Key Parameters | Volume-Preserving |
| --- | --- | --- | --- |
| Uniaxial | gen.uniaxial() | stretch_range | Yes |
| Biaxial | gen.biaxial() | stretch_range | Yes |
| Shear | gen.shear() | shear_range | Yes |
| Combined | gen.combined() | stretch_range, shear_range | Yes |

2. Adding Volumetric Perturbation

All basic deformation modes are incompressible (\(J = 1\)). Finite element simulations typically treat such materials as nearly incompressible, so add a volumetric perturbation to the training data; otherwise the model never sees \(J \neq 1\) and cannot learn the volumetric response:

n = 10000
F = gen.combined(n, stretch_range=(0.8, 1.3), shear_range=(-0.2, 0.2))

# Perturb volume ratio to J ∈ [0.95, 1.05]
rng = np.random.default_rng(99)
J_target = rng.uniform(0.95, 1.05, size=n)
J_current = np.linalg.det(F)
F = F * (J_target / J_current)[:, None, None] ** (1.0 / 3.0)

# Verify
J_new = np.linalg.det(F)
print(f"J range: [{J_new.min():.4f}, {J_new.max():.4f}]")
# Output: J range: [0.9500, 1.0500]

The scaling factor \((J_{\text{target}} / J_{\text{current}})^{1/3}\) applies a uniform dilation to each deformation gradient.
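
The cube root follows from how the determinant of a 3×3 matrix scales under multiplication by a scalar \(\alpha\). A short derivation, using only the quantities named in the code above:

\[\det(\alpha \mathbf{F}) = \alpha^{3} \det \mathbf{F}, \qquad \alpha = \left(\frac{J_{\text{target}}}{J_{\text{current}}}\right)^{1/3} \;\Rightarrow\; \det(\alpha \mathbf{F}) = \frac{J_{\text{target}}}{J_{\text{current}}} \, J_{\text{current}} = J_{\text{target}}\]

Since \(\alpha \mathbf{F}\) differs from \(\mathbf{F}\) only by a scalar factor, the isochoric part \(J^{-1/3} \mathbf{F}\) is unchanged, so the perturbation alters \(J\) without affecting \(\bar{I}_1\) or \(\bar{I}_2\).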


3. Computing Invariants

Once you have deformation gradients, compute the invariants that serve as neural network inputs:

from hyper_surrogate.mechanics.kinematics import Kinematics

C = Kinematics.right_cauchy_green(F)  # (N, 3, 3)

# Isochoric invariants (3 inputs for isotropic models)
i1_bar = Kinematics.isochoric_invariant1(C)  # (N,)
i2_bar = Kinematics.isochoric_invariant2(C)  # (N,)
j = np.sqrt(Kinematics.det_invariant(C))     # (N,)

inputs_iso = np.column_stack([i1_bar, i2_bar, j])  # (N, 3)

For anisotropic materials, add fiber invariants:

fiber_dir = np.array([1.0, 0.0, 0.0])
i4 = Kinematics.fiber_invariant4(C, fiber_dir)  # (N,)
i5 = Kinematics.fiber_invariant5(C, fiber_dir)  # (N,)

inputs_aniso = np.column_stack([i1_bar, i2_bar, j, i4, i5])  # (N, 5)
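
For readers who want to see what these invariants contain, the sketch below computes them by hand from the standard textbook definitions: \(\bar{I}_1 = J^{-2/3} \operatorname{tr} \mathbf{C}\), \(\bar{I}_2 = J^{-4/3} \tfrac{1}{2}\left[(\operatorname{tr} \mathbf{C})^2 - \operatorname{tr} \mathbf{C}^2\right]\), \(I_4 = \mathbf{a} \cdot \mathbf{C} \mathbf{a}\), and \(I_5 = \mathbf{a} \cdot \mathbf{C}^2 \mathbf{a}\). That `Kinematics` uses exactly these definitions is an assumption; the `invariants` function is a hypothetical helper, not library API.

```python
import numpy as np

def invariants(C, fiber_dir):
    """Return (I1_bar, I2_bar, J, I4, I5) for a batch of C with shape (N, 3, 3)."""
    a = np.asarray(fiber_dir, dtype=float)
    a = a / np.linalg.norm(a)                     # unit fiber direction
    J = np.sqrt(np.linalg.det(C))                 # J = sqrt(det C)
    I1 = np.trace(C, axis1=1, axis2=2)
    C2 = C @ C
    I2 = 0.5 * (I1 ** 2 - np.trace(C2, axis1=1, axis2=2))
    I1_bar = J ** (-2.0 / 3.0) * I1               # isochoric first invariant
    I2_bar = J ** (-4.0 / 3.0) * I2               # isochoric second invariant
    I4 = np.einsum("i,nij,j->n", a, C, a)         # a . C a
    I5 = np.einsum("i,nij,j->n", a, C2, a)        # a . C^2 a
    return I1_bar, I2_bar, J, I4, I5

C_ref = np.eye(3)[None, :, :]                     # undeformed state, N = 1
vals = invariants(C_ref, [1.0, 0.0, 0.0])
```

Evaluated at the undeformed state, these definitions give 3, 3, 1, 1, 1, which are the reference values expected at \(\mathbf{F} = \mathbf{I}\).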

Invariant ranges (typical)

| Invariant | Reference Value (\(\mathbf{F} = \mathbf{I}\)) | Typical Training Range |
| --- | --- | --- |
| \(\bar{I}_1\) | 3.0 | 2.5 -- 4.0 |
| \(\bar{I}_2\) | 3.0 | 2.5 -- 4.5 |
| \(J\) | 1.0 | 0.9 -- 1.1 |
| \(I_4\) | 1.0 | 0.5 -- 2.0 |
| \(I_5\) | 1.0 | 0.3 -- 4.0 |

4. Computing Targets

The material object computes the training targets:

import hyper_surrogate as hs

material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})

# Energy (scalar per sample)
energy = material.evaluate_energy(C)  # (N,)

# Energy gradient w.r.t. invariants (for thermodynamically consistent training)
dW_dI = material.evaluate_energy_grad_invariants(C)  # (N, 3) or (N, 5)

# PK2 stress (full tensor)
pk2 = material.evaluate_pk2(C)  # (N, 3, 3)

# Material tangent
cmat = material.evaluate_cmat(C)  # (N, 3, 3, 3, 3)

Target types summary

| Target Type | Shape | Use Case |
| --- | --- | --- |
| energy | (N,) | Energy-based training (ICNN, hybrid UMAT) |
| dW_dI | (N, n_inv) | Energy gradient w.r.t. invariants (for EnergyStressLoss) |
| PK2 stress | (N, 3, 3) | Direct stress prediction |
| Material tangent | (N, 3, 3, 3, 3) | Stress + tangent prediction |
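
For intuition about what the energy and gradient targets contain, here is a hand-written compressible Neo-Hookean energy expressed in the invariants. The volumetric term \(\tfrac{K}{2}(J - 1)^2\) is an assumption chosen for illustration; the library's `NeoHooke` may use a different volumetric form, so these values need not match `evaluate_energy()`. The function names are hypothetical.

```python
import numpy as np

C10, KBULK = 0.5, 1000.0

def neo_hooke_energy(i1_bar, j):
    """W = C10 (I1_bar - 3) + (K/2) (J - 1)^2, with an assumed volumetric term."""
    return C10 * (i1_bar - 3.0) + 0.5 * KBULK * (j - 1.0) ** 2

def neo_hooke_grad(i1_bar, j):
    """Gradient w.r.t. (I1_bar, I2_bar, J), one row per sample."""
    i1_bar = np.asarray(i1_bar, dtype=float)
    j = np.asarray(j, dtype=float)
    dW_dI1 = np.full_like(i1_bar, C10)     # constant for Neo-Hooke
    dW_dI2 = np.zeros_like(i1_bar)         # no I2_bar dependence
    dW_dJ = KBULK * (j - 1.0)              # derivative of the assumed volumetric term
    return np.column_stack([dW_dI1, dW_dI2, dW_dJ])

W_ref = neo_hooke_energy(np.array([3.0]), np.array([1.0]))   # zero in the reference state
g = neo_hooke_grad(np.array([3.2]), np.array([1.01]))        # shape (1, 3)
```

The gradient has one column per invariant, which is why the `dW_dI` target has shape (N, 3) for an isotropic model.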

5. Using create_datasets()

The create_datasets() factory function handles the entire pipeline (deformation generation, invariant computation, target evaluation, normalization, and train/val split) in a single call:

import hyper_surrogate as hs

material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})

train_ds, val_ds, in_norm, out_norm = hs.create_datasets(
    material,
    n_samples=10000,
    input_type="invariants",    # "invariants" or "cauchy_green"
    target_type="pk2_voigt",    # "energy", "pk2_voigt", or "pk2_voigt+cmat_voigt"
    seed=42,
)

print(f"Training samples: {len(train_ds)}")
print(f"Validation samples: {len(val_ds)}")
print(f"Input shape: {train_ds.inputs.shape}")    # (N_train, 3)
print(f"Target shape: {train_ds.targets.shape}")   # (N_train, 6) for pk2_voigt

Parameter reference

| Parameter | Options | Default | Description |
| --- | --- | --- | --- |
| n_samples | int | | Total number of deformation samples |
| input_type | "invariants", "cauchy_green" | "invariants" | NN input representation |
| target_type | "energy", "pk2_voigt", "pk2_voigt+cmat_voigt" | "pk2_voigt" | What the NN predicts |
| seed | int | None | Random seed for reproducibility |

Input types

| input_type | Dimensions | Components |
| --- | --- | --- |
| "invariants" (isotropic) | 3 | \(\bar{I}_1, \bar{I}_2, J\) |
| "invariants" (anisotropic) | 5 | \(\bar{I}_1, \bar{I}_2, J, I_4, I_5\) |
| "cauchy_green" | 6 | \(C_{11}, C_{22}, C_{33}, C_{12}, C_{13}, C_{23}\) (Voigt) |

Target types

| target_type | Dimensions | Components |
| --- | --- | --- |
| "energy" | 1 | \(W\) (+ gradient \(\partial W/\partial I\) as auxiliary) |
| "pk2_voigt" | 6 | \(S_{11}, S_{22}, S_{33}, S_{12}, S_{13}, S_{23}\) |
| "pk2_voigt+cmat_voigt" | 27 | 6 stress + 21 unique tangent components |

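The Voigt layouts listed above can be illustrated with a few lines of NumPy. The sketch below maps a symmetric 3×3 tensor to the six components in the order 11, 22, 33, 12, 13, 23, and shows where the count of 21 comes from: a symmetric 6×6 matrix has 21 entries on and above its diagonal. The row-major upper-triangle ordering of those 21 entries is an assumption, and both helper functions are hypothetical rather than library API.

```python
import numpy as np

# Index pairs in the order 11, 22, 33, 12, 13, 23 (zero-based)
VOIGT_PAIRS = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]

def to_voigt(S):
    """Map symmetric tensors (N, 3, 3) to Voigt vectors (N, 6)."""
    return np.stack([S[:, i, j] for i, j in VOIGT_PAIRS], axis=1)

def upper_triangle(D):
    """Map symmetric 6x6 matrices (N, 6, 6) to their 21 unique entries (N, 21)."""
    rows, cols = np.triu_indices(6)
    return D[:, rows, cols]

S_demo = np.arange(9.0).reshape(1, 3, 3)
S_demo = 0.5 * (S_demo + S_demo.transpose(0, 2, 1))   # symmetrize
v = to_voigt(S_demo)                                  # shape (1, 6)
t = upper_triangle(np.ones((1, 6, 6)))                # shape (1, 21)
```

Together the two vectors account for the 6 + 21 = 27 outputs of the "pk2_voigt+cmat_voigt" target.
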
6. Normalization

All inputs and outputs are standardized (zero-mean, unit-variance) before training. The Normalizer stores the transformation parameters for export:

from hyper_surrogate.data.dataset import Normalizer

# Automatic normalization inside create_datasets()
train_ds, val_ds, in_norm, out_norm = hs.create_datasets(material, n_samples=5000)

# Access normalization parameters
print(f"Input mean:  {in_norm.params['mean']}")
print(f"Input std:   {in_norm.params['std']}")
print(f"Output mean: {out_norm.params['mean']}")
print(f"Output std:  {out_norm.params['std']}")

Manual normalization (for custom pipelines)

# raw_data: array of unnormalized samples, one row per sample (defined elsewhere)
norm = Normalizer().fit(raw_data)       # Compute mean & std
X_normalized = norm.transform(raw_data)  # Apply normalization
X_original = norm.inverse(X_normalized)  # Reverse normalization

The normalizer parameters are exported alongside the model weights, ensuring consistent inference in the generated Fortran code.
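
The standardization itself is simple enough to write out. The sketch below is a hypothetical stand-in that mirrors the fit / transform / inverse pattern shown above; it is not the library's `Normalizer`, and the guard for constant columns is an added assumption.

```python
import numpy as np

class SimpleStandardizer:
    """Hypothetical stand-in illustrating zero-mean, unit-variance scaling."""

    def fit(self, x):
        self.mean = x.mean(axis=0)
        std = x.std(axis=0)
        self.std = np.where(std > 0, std, 1.0)   # guard against constant columns
        return self

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse(self, z):
        return z * self.std + self.mean

rng = np.random.default_rng(0)
raw = rng.normal(loc=[3.0, 3.0, 1.0], scale=[0.2, 0.3, 0.03], size=(1000, 3))
scaler = SimpleStandardizer().fit(raw)
z = scaler.transform(raw)          # each column now has mean 0 and std 1
restored = scaler.inverse(z)       # recovers the raw values
```

Only the per-column mean and standard deviation are needed to reproduce the transformation at inference time, which is why these two arrays are the parameters that get exported.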


7. Fiber Directions for Anisotropic Materials

For anisotropic materials, fiber directions can be generated with controlled dispersion:

gen = DeformationGenerator(seed=42)

# Aligned fibers (no dispersion)
fibers = gen.fiber_directions(n=100, mean_direction=np.array([1.0, 0.0, 0.0]))

# Dispersed fibers (half-angle cone of 15°)
fibers_dispersed = gen.fiber_directions(
    n=100,
    mean_direction=np.array([1.0, 0.0, 0.0]),
    dispersion=np.radians(15),
)
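
One way to picture dispersed fibers is as unit vectors drawn from a cone around the mean direction. The sketch below samples uniformly over the spherical cap with the given half-angle. This distribution is an assumption made for illustration; `fiber_directions` may sample differently, and `sample_cone` is a hypothetical helper.

```python
import numpy as np

def sample_cone(n, mean_direction, half_angle, seed=0):
    """Sample n unit vectors uniformly on the spherical cap around mean_direction."""
    rng = np.random.default_rng(seed)
    m = np.asarray(mean_direction, dtype=float)
    m = m / np.linalg.norm(m)

    # Orthonormal basis (e1, e2, m): pick a helper axis that is not parallel to m
    helper = np.array([0.0, 0.0, 1.0]) if abs(m[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    e1 = np.cross(m, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(m, e1)

    # Uniform in cos(theta) gives uniform area density on the cap
    cos_t = rng.uniform(np.cos(half_angle), 1.0, size=n)
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)

    return (cos_t[:, None] * m
            + (sin_t * np.cos(phi))[:, None] * e1
            + (sin_t * np.sin(phi))[:, None] * e2)

fibers_demo = sample_cone(100, [1.0, 0.0, 0.0], np.radians(15))   # shape (100, 3)
```

Every sampled vector has unit length and lies within 15° of the mean direction.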

8. Visualizing Deformation Data

Use the Reporter class to inspect your generated deformations:

from hyper_surrogate.reporting.reporter import Reporter

F = gen.combined(n=5000, stretch_range=(0.8, 1.3))
C = Kinematics.right_cauchy_green(F)

reporter = Reporter(C)

# Individual plots
reporter.fig_invariants()           # I1_bar, I2_bar, J histograms
reporter.fig_principal_stretches()  # λ1, λ2, λ3 distributions
reporter.fig_volume_change()        # J histogram
reporter.fig_eigenvalues()          # Eigenvalue spectra of C

# Full PDF report with all figures
reporter.generate_report("deformation_report/")

# Summary statistics
stats = reporter.basic_statistics()
for key, val in stats.items():
    print(f"{key}: mean={val['mean']:.4f}, std={val['std']:.4f}")
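
The principal stretches and eigenvalue spectra shown in these plots are related by \(\lambda_i = \sqrt{\text{eig}_i(\mathbf{C})}\). The following NumPy sketch demonstrates that relation on a single uniaxial sample; it is independent of `Reporter`, whose internals are not shown here.

```python
import numpy as np

lam = 1.2
F_u = np.diag([lam, lam ** -0.5, lam ** -0.5])[None, :, :]   # one uniaxial sample
C_u = np.transpose(F_u, (0, 2, 1)) @ F_u                     # right Cauchy-Green tensor

eigvals = np.linalg.eigvalsh(C_u)      # ascending eigenvalues of C, shape (N, 3)
stretches = np.sqrt(eigvals)           # principal stretches
J_u = np.prod(stretches, axis=1)       # volume ratio as the product of the stretches
```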

9. Complete Example: Custom Data Pipeline

When you need full control over the data pipeline (e.g. for hybrid UMAT training), assemble the stages by hand:

import numpy as np
import hyper_surrogate as hs
from hyper_surrogate.data.dataset import MaterialDataset, Normalizer
from hyper_surrogate.data.deformation import DeformationGenerator
from hyper_surrogate.mechanics.kinematics import Kinematics

# 1. Generate deformations with volumetric perturbation
material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})
n = 20000

gen = DeformationGenerator(seed=42)
F = gen.combined(n, stretch_range=(0.8, 1.3), shear_range=(-0.2, 0.2))

rng = np.random.default_rng(99)
J_target = rng.uniform(0.95, 1.05, size=n)
F = F * (J_target / np.linalg.det(F))[:, None, None] ** (1.0 / 3.0)
C = Kinematics.right_cauchy_green(F)

# 2. Compute inputs (invariants)
i1 = Kinematics.isochoric_invariant1(C)
i2 = Kinematics.isochoric_invariant2(C)
j = np.sqrt(Kinematics.det_invariant(C))
inputs = np.column_stack([i1, i2, j])

# 3. Compute targets (energy + gradient)
energy = material.evaluate_energy(C)
dW_dI = material.evaluate_energy_grad_invariants(C)

# 4. Normalize
in_norm = Normalizer().fit(inputs)
X = in_norm.transform(inputs).astype(np.float32)
W = energy.reshape(-1, 1).astype(np.float32)
S = (dW_dI * in_norm.params["std"]).astype(np.float32)  # Chain rule scaling

# 5. Train/val split
n_val = int(n * 0.15)
idx = np.random.default_rng(42).permutation(n)
train_ds = MaterialDataset(X[idx[n_val:]], (W[idx[n_val:]], S[idx[n_val:]]))
val_ds = MaterialDataset(X[idx[:n_val]], (W[idx[:n_val]], S[idx[:n_val]]))

print(f"Train: {len(train_ds)}, Val: {len(val_ds)}")
print(f"Input dim: {X.shape[1]}, Energy range: [{energy.min():.4f}, {energy.max():.4f}]")

This custom pipeline gives you full control and is required for EnergyStressLoss training (where the target is a (W, dW/dI) tuple).
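
The "chain rule scaling" in step 4 follows from the normalization \(\tilde{x}_i = (x_i - \mu_i)/\sigma_i\), which implies \(\partial W/\partial \tilde{x}_i = \sigma_i \, \partial W/\partial x_i\). The sketch below checks this numerically with a toy energy and central finite differences. The energy function and the mean and std values are made up for illustration and have no connection to the library.

```python
import numpy as np

mean = np.array([3.0, 3.0, 1.0])
std = np.array([0.2, 0.3, 0.03])

def W(x):
    """Toy energy in raw invariants (I1_bar, I2_bar, J), for illustration only."""
    return 0.5 * (x[0] - 3.0) + 0.1 * (x[1] - 3.0) ** 2 + 500.0 * (x[2] - 1.0) ** 2

def W_of_normalized(z):
    return W(z * std + mean)             # undo the normalization, then evaluate

x0 = np.array([3.1, 3.2, 1.01])
z0 = (x0 - mean) / std

# Analytic gradient w.r.t. raw inputs, then the chain-rule prediction
grad_raw = np.array([0.5, 0.2 * (x0[1] - 3.0), 1000.0 * (x0[2] - 1.0)])
grad_scaled = grad_raw * std

# Central finite differences of W with respect to the normalized inputs
h = 1e-6
grad_fd = np.array([
    (W_of_normalized(z0 + h * e) - W_of_normalized(z0 - h * e)) / (2 * h)
    for e in np.eye(3)
])
```

The finite-difference gradient agrees with `grad_raw * std`, so a network that takes normalized inputs should be trained against the scaled gradient rather than the raw one.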