Data Generation¶
This tutorial covers the full data generation pipeline: creating synthetic deformation gradients, computing invariants, building normalized datasets, and preparing training data for neural network surrogates.
Overview¶
The data pipeline consists of three stages:
DeformationGenerator ──> Kinematics (invariants) ──> create_datasets()
(F tensors) (I1_bar, I2_bar, J, ...) (train_ds, val_ds)
1. Deformation Generator¶
DeformationGenerator creates batches of deformation gradient tensors \(\mathbf{F}\) representing different loading modes.
import numpy as np
from hyper_surrogate.data.deformation import DeformationGenerator
gen = DeformationGenerator(seed=42) # Reproducible random generation
1.1 Uniaxial Tension/Compression¶
Stretch \(\lambda\) along axis 1, with transverse contraction to preserve volume:
F_uni = gen.uniaxial(n=1000, stretch_range=(0.7, 1.5))
print(f"Shape: {F_uni.shape}") # (1000, 3, 3)
# Verify incompressibility
J = np.linalg.det(F_uni)
print(f"J range: [{J.min():.6f}, {J.max():.6f}]") # ≈ [1.0, 1.0]
1.2 Biaxial Stretch¶
Independent stretches \(\lambda_1, \lambda_2\) along axes 1 and 2:
F_bi = gen.biaxial(n=1000, stretch_range=(0.8, 1.3))
1.3 Simple Shear¶
Shear deformation in the 1-2 plane:
F_shear = gen.shear(n=1000, shear_range=(-0.3, 0.3))
1.4 Combined Deformations¶
The most realistic mode — combines biaxial stretch, uniaxial stretch, and shear with random rotations:
F_combined = gen.combined(
n=10000,
stretch_range=(0.8, 1.3),
shear_range=(-0.2, 0.2),
)
This is the recommended mode for training data: it covers a wide range of deformation states and ensures the surrogate generalizes well.
Summary of Deformation Modes¶
| Mode | Function | Key Parameters | Volume-Preserving |
|---|---|---|---|
| Uniaxial | gen.uniaxial() |
stretch_range |
Yes |
| Biaxial | gen.biaxial() |
stretch_range |
Yes |
| Shear | gen.shear() |
shear_range |
Yes |
| Combined | gen.combined() |
stretch_range, shear_range |
Yes |
2. Adding Volumetric Perturbation¶
All basic deformation modes are incompressible (\(J = 1\)). For nearly-incompressible materials (which is what we simulate in FE), you need to add volumetric perturbation so the model learns the volumetric response:
n = 10000
F = gen.combined(n, stretch_range=(0.8, 1.3), shear_range=(-0.2, 0.2))
# Perturb volume ratio to J ∈ [0.95, 1.05]
rng = np.random.default_rng(99)
J_target = rng.uniform(0.95, 1.05, size=n)
J_current = np.linalg.det(F)
F = F * (J_target / J_current)[:, None, None] ** (1.0 / 3.0)
# Verify
J_new = np.linalg.det(F)
print(f"J range: [{J_new.min():.4f}, {J_new.max():.4f}]")
# Output: J range: [0.9500, 1.0500]
The scaling factor \((J_{\text{target}} / J_{\text{current}})^{1/3}\) applies a uniform dilation to each deformation gradient.
3. Computing Invariants¶
Once you have deformation gradients, compute the invariants that serve as neural network inputs:
from hyper_surrogate.mechanics.kinematics import Kinematics
C = Kinematics.right_cauchy_green(F) # (N, 3, 3)
# Isochoric invariants (3 inputs for isotropic models)
i1_bar = Kinematics.isochoric_invariant1(C) # (N,)
i2_bar = Kinematics.isochoric_invariant2(C) # (N,)
j = np.sqrt(Kinematics.det_invariant(C)) # (N,)
inputs_iso = np.column_stack([i1_bar, i2_bar, j]) # (N, 3)
For anisotropic materials, add fiber invariants:
fiber_dir = np.array([1.0, 0.0, 0.0])
i4 = Kinematics.fiber_invariant4(C, fiber_dir) # (N,)
i5 = Kinematics.fiber_invariant5(C, fiber_dir) # (N,)
inputs_aniso = np.column_stack([i1_bar, i2_bar, j, i4, i5]) # (N, 5)
Invariant ranges (typical)¶
| Invariant | Reference Value (\(\mathbf{F} = \mathbf{I}\)) | Typical Training Range |
|---|---|---|
| \(\bar{I}_1\) | 3.0 | 2.5 -- 4.0 |
| \(\bar{I}_2\) | 3.0 | 2.5 -- 4.5 |
| \(J\) | 1.0 | 0.9 -- 1.1 |
| \(I_4\) | 1.0 | 0.5 -- 2.0 |
| \(I_5\) | 1.0 | 0.3 -- 4.0 |
4. Computing Targets¶
The material object computes the training targets:
import hyper_surrogate as hs
material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})
# Energy (scalar per sample)
energy = material.evaluate_energy(C) # (N,)
# Energy gradient w.r.t. invariants (for thermodynamically consistent training)
dW_dI = material.evaluate_energy_grad_invariants(C) # (N, 3) or (N, 5)
# PK2 stress (full tensor)
pk2 = material.evaluate_pk2(C) # (N, 3, 3)
# Material tangent
cmat = material.evaluate_cmat(C) # (N, 3, 3, 3, 3)
Target types summary¶
| Target Type | Shape | Use Case |
|---|---|---|
energy |
(N,) |
Energy-based training (ICNN, hybrid UMAT) |
dW_dI |
(N, n_inv) |
Stress gradient (for EnergyStressLoss) |
| PK2 stress | (N, 3, 3) |
Direct stress prediction |
| Material tangent | (N, 3, 3, 3, 3) |
Stress + tangent prediction |
5. Using create_datasets() (Recommended)¶
The create_datasets() factory function handles the entire pipeline — deformation generation, invariant computation, target evaluation, normalization, and train/val split — in a single call:
import hyper_surrogate as hs
material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})
train_ds, val_ds, in_norm, out_norm = hs.create_datasets(
material,
n_samples=10000,
input_type="invariants", # "invariants" or "cauchy_green"
target_type="pk2_voigt", # "energy", "pk2_voigt", or "pk2_voigt+cmat_voigt"
seed=42,
)
print(f"Training samples: {len(train_ds)}")
print(f"Validation samples: {len(val_ds)}")
print(f"Input shape: {train_ds.inputs.shape}") # (N_train, 3)
print(f"Target shape: {train_ds.targets.shape}") # (N_train, 6) for pk2_voigt
Parameter reference¶
| Parameter | Options | Default | Description |
|---|---|---|---|
n_samples |
int | — | Total number of deformation samples |
input_type |
"invariants", "cauchy_green" |
"invariants" |
NN input representation |
target_type |
"energy", "pk2_voigt", "pk2_voigt+cmat_voigt" |
"pk2_voigt" |
What the NN predicts |
seed |
int | None |
Random seed for reproducibility |
Input types¶
input_type |
Dimensions | Components |
|---|---|---|
"invariants" (isotropic) |
3 | \(\bar{I}_1, \bar{I}_2, J\) |
"invariants" (anisotropic) |
5 | \(\bar{I}_1, \bar{I}_2, J, I_4, I_5\) |
"cauchy_green" |
6 | \(C_{11}, C_{22}, C_{33}, C_{12}, C_{13}, C_{23}\) (Voigt) |
Target types¶
target_type |
Dimensions | Components |
|---|---|---|
"energy" |
1 | \(W\) (+ gradient \(\partial W/\partial I\) as auxiliary) |
"pk2_voigt" |
6 | \(S_{11}, S_{22}, S_{33}, S_{12}, S_{13}, S_{23}\) |
"pk2_voigt+cmat_voigt" |
27 | 6 stress + 21 unique tangent components |
6. Normalization¶
All inputs and outputs are standardized (zero-mean, unit-variance) before training. The Normalizer stores the transformation parameters for export:
from hyper_surrogate.data.dataset import Normalizer
# Automatic normalization inside create_datasets()
train_ds, val_ds, in_norm, out_norm = hs.create_datasets(material, n_samples=5000)
# Access normalization parameters
print(f"Input mean: {in_norm.params['mean']}")
print(f"Input std: {in_norm.params['std']}")
print(f"Output mean: {out_norm.params['mean']}")
print(f"Output std: {out_norm.params['std']}")
Manual normalization (for custom pipelines)¶
norm = Normalizer().fit(raw_data) # Compute mean & std
X_normalized = norm.transform(raw_data) # Apply normalization
X_original = norm.inverse(X_normalized) # Reverse normalization
The normalizer parameters are exported alongside the model weights, ensuring consistent inference in the generated Fortran code.
7. Fiber Directions for Anisotropic Materials¶
For anisotropic materials, fiber directions can be generated with controlled dispersion:
gen = DeformationGenerator(seed=42)
# Aligned fibers (no dispersion)
fibers = gen.fiber_directions(n=100, mean_direction=np.array([1.0, 0.0, 0.0]))
# Dispersed fibers (half-angle cone of 15°)
fibers_dispersed = gen.fiber_directions(
n=100,
mean_direction=np.array([1.0, 0.0, 0.0]),
dispersion=np.radians(15),
)
8. Visualizing Deformation Data¶
Use the Reporter class to inspect your generated deformations:
from hyper_surrogate.reporting.reporter import Reporter
F = gen.combined(n=5000, stretch_range=(0.8, 1.3))
C = Kinematics.right_cauchy_green(F)
reporter = Reporter(C)
# Individual plots
reporter.fig_invariants() # I1_bar, I2_bar, J histograms
reporter.fig_principal_stretches() # λ1, λ2, λ3 distributions
reporter.fig_volume_change() # J histogram
reporter.fig_eigenvalues() # Eigenvalue spectra of C
# Full PDF report with all figures
reporter.generate_report("deformation_report/")
# Summary statistics
stats = reporter.basic_statistics()
for key, val in stats.items():
print(f"{key}: mean={val['mean']:.4f}, std={val['std']:.4f}")
9. Complete Example: Custom Data Pipeline¶
When you need full control over the data pipeline (e.g. for hybrid UMAT training):
import numpy as np
import hyper_surrogate as hs
from hyper_surrogate.data.dataset import MaterialDataset, Normalizer
from hyper_surrogate.data.deformation import DeformationGenerator
from hyper_surrogate.mechanics.kinematics import Kinematics
# 1. Generate deformations with volumetric perturbation
material = hs.NeoHooke({"C10": 0.5, "KBULK": 1000.0})
n = 20000
gen = DeformationGenerator(seed=42)
F = gen.combined(n, stretch_range=(0.8, 1.3), shear_range=(-0.2, 0.2))
rng = np.random.default_rng(99)
J_target = rng.uniform(0.95, 1.05, size=n)
F = F * (J_target / np.linalg.det(F))[:, None, None] ** (1.0 / 3.0)
C = Kinematics.right_cauchy_green(F)
# 2. Compute inputs (invariants)
i1 = Kinematics.isochoric_invariant1(C)
i2 = Kinematics.isochoric_invariant2(C)
j = np.sqrt(Kinematics.det_invariant(C))
inputs = np.column_stack([i1, i2, j])
# 3. Compute targets (energy + gradient)
energy = material.evaluate_energy(C)
dW_dI = material.evaluate_energy_grad_invariants(C)
# 4. Normalize
in_norm = Normalizer().fit(inputs)
X = in_norm.transform(inputs).astype(np.float32)
W = energy.reshape(-1, 1).astype(np.float32)
S = (dW_dI * in_norm.params["std"]).astype(np.float32) # Chain rule scaling
# 5. Train/val split
n_val = int(n * 0.15)
idx = np.random.default_rng(42).permutation(n)
train_ds = MaterialDataset(X[idx[n_val:]], (W[idx[n_val:]], S[idx[n_val:]]))
val_ds = MaterialDataset(X[idx[:n_val]], (W[idx[:n_val]], S[idx[:n_val]]))
print(f"Train: {len(train_ds)}, Val: {len(val_ds)}")
print(f"Input dim: {X.shape[1]}, Energy range: [{energy.min():.4f}, {energy.max():.4f}]")
This custom pipeline gives you full control and is required for EnergyStressLoss training (where the target is a (W, dW/dI) tuple).