Synthetic Darcy Flow Data Pipeline: From GRF Permeability to Pressure Solutions

The core skill taught is generating high-fidelity training data for operator learning on the 2D Darcy equation -∇·(k∇u) = f over [0,1]² with homogeneous Dirichlet BCs u = 0. Start with DarcyFlowDataGenerator(resolution=32, length_scale=0.15, variance=1.0). It builds a Gaussian Random Field (GRF) covariance matrix for the log-permeability, k(x, y) = exp(GRF), using a squared-exponential kernel exp(-dist²/(2·length_scale²)) plus diagonal jitter for numerical stability, Cholesky-decomposed for efficient sampling: z ~ N(0, I), samples = L @ z.
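A minimal sketch of that sampling step (assuming a flattened grid of points; DarcyFlowDataGenerator's internals may differ):

    import numpy as np

    def sample_log_permeability(resolution=32, length_scale=0.15, variance=1.0, jitter=1e-6):
        # Grid coordinates on [0,1]^2, flattened to (N, 2)
        xs = np.linspace(0, 1, resolution)
        X, Y = np.meshgrid(xs, xs)
        pts = np.stack([X.ravel(), Y.ravel()], axis=1)
        # Squared-exponential covariance with jitter on the diagonal
        dist2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
        C = variance * np.exp(-dist2 / (2 * length_scale**2)) + jitter * np.eye(len(pts))
        L = np.linalg.cholesky(C)        # one factorization, then many cheap samples
        z = np.random.randn(len(pts))
        grf = L @ z                      # sample ~ N(0, C)
        return np.exp(grf).reshape(resolution, resolution)   # k = exp(GRF)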

Solve for pressure u with Jacobi iteration: at each interior point,

    u_{i,j} = (k_e u_{i,j+1} + k_w u_{i,j-1} + k_n u_{i-1,j} + k_s u_{i+1,j} + dx² f) / (k_e + k_w + k_n + k_s),

where k_e, k_w, k_n, k_s are face permeabilities; iterate up to ~5000 steps or until tol=1e-6 (a sketch follows). Generate n_samples=200 train / 50 test pairs. Wrap them in a PyTorch Dataset with a channel dim and optional z-score normalization (store mean/std for denormalization; see the Dataset sketch after the generation snippet below), then use DataLoader(batch_size=16). Principle: the GRF captures realistic heterogeneous permeability (e.g., subsurface flows); finite differences provide ground truth without external solvers. Common mistake: too large a length_scale (>0.2) yields overly smooth k and poor generalization—use 0.1-0.15 for multiscale structure. Quality check: visualize 3 samples side-by-side (viridis for k, hot for u) to confirm pressure pools in high-k regions.
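A hedged sketch of that Jacobi solve (vectorized NumPy; face permeabilities here are simple two-point averages, which the actual generator may compute differently):

    import numpy as np

    def jacobi_solve(k, f=1.0, max_iter=5000, tol=1e-6):
        n = k.shape[0]
        dx = 1.0 / (n - 1)
        u = np.zeros_like(k)             # zeros also enforce the Dirichlet BC u = 0
        # Face permeabilities as averages of neighboring cells (one common choice)
        k_e = 0.5 * (k[1:-1, 1:-1] + k[1:-1, 2:])
        k_w = 0.5 * (k[1:-1, 1:-1] + k[1:-1, :-2])
        k_n = 0.5 * (k[1:-1, 1:-1] + k[:-2, 1:-1])
        k_s = 0.5 * (k[1:-1, 1:-1] + k[2:, 1:-1])
        for _ in range(max_iter):
            u_new = u.copy()
            u_new[1:-1, 1:-1] = (k_e * u[1:-1, 2:] + k_w * u[1:-1, :-2]
                                 + k_n * u[:-2, 1:-1] + k_s * u[2:, 1:-1]
                                 + dx**2 * f) / (k_e + k_w + k_n + k_s)
            if np.max(np.abs(u_new - u)) < tol:
                return u_new
            u = u_new
        return u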

# Key generation snippet
generator = DarcyFlowDataGenerator(resolution=32, length_scale=0.15)
perm_train, press_train = generator.generate_dataset(200)
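And a minimal Dataset wrapper with the channel dim and z-score normalization described above (names are illustrative, not necessarily the tutorial's exact API):

    import torch
    from torch.utils.data import Dataset, DataLoader

    class DarcyDataset(Dataset):
        def __init__(self, perm, press, normalize=True):
            self.k = torch.as_tensor(perm, dtype=torch.float32).unsqueeze(1)   # (N, 1, H, W)
            self.u = torch.as_tensor(press, dtype=torch.float32).unsqueeze(1)
            if normalize:
                # Store mean/std so predictions can be denormalized later
                self.k_mean, self.k_std = self.k.mean(), self.k.std()
                self.u_mean, self.u_std = self.u.mean(), self.u.std()
                self.k = (self.k - self.k_mean) / self.k_std
                self.u = (self.u - self.u_mean) / self.u_std

        def __len__(self):
            return len(self.k)

        def __getitem__(self, idx):
            return self.k[idx], self.u[idx]

    train_loader = DataLoader(DarcyDataset(perm_train, press_train), batch_size=16, shuffle=True)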

Fourier Neural Operator: Spectral Kernels for Resolution-Independent Mapping

FNO learns a function-to-function operator k → u by parameterizing Fourier multipliers. Key blocks: SpectralConv2d(in_ch=1, out_ch=1, modes1=8, modes2=8) does FFT → multiplies the retained low-frequency modes by learned complex weights (initialized at scale 1/(in_ch·out_ch)) → iFFT, keeping separate weight tensors for the positive- and negative-frequency blocks to handle wraparound. FNOBlock adds a local Conv2d(1x1) residual path + GELU. Full FourierNeuralOperator2D: lift k (32x32x1) concatenated with a grid encoding (x, y linspace over [0,1]) via Linear(3→width=32), pad=5, 4 FNOBlocks, unpad, project via Linear(32→128→1); ~100k params. Forward: permute to NCHW, cat grid, process, return N×1×H×W.
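A sketch of that spectral convolution (a standard FNO-style layer; the tutorial's exact implementation may differ in details):

    import torch
    import torch.nn as nn

    class SpectralConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, modes1, modes2):
            super().__init__()
            self.modes1, self.modes2 = modes1, modes2
            scale = 1.0 / (in_ch * out_ch)
            # Separate complex weights for positive/negative frequencies along dim -2
            self.w_pos = nn.Parameter(scale * torch.rand(in_ch, out_ch, modes1, modes2, dtype=torch.cfloat))
            self.w_neg = nn.Parameter(scale * torch.rand(in_ch, out_ch, modes1, modes2, dtype=torch.cfloat))

        def forward(self, x):                        # x: (B, C, H, W)
            B, C, H, W = x.shape
            x_ft = torch.fft.rfft2(x)                # (B, C, H, W//2 + 1), complex
            out_ft = torch.zeros(B, self.w_pos.shape[1], H, W // 2 + 1,
                                 dtype=torch.cfloat, device=x.device)
            # Multiply only the retained low-frequency modes by the learned weights
            out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
                "bixy,ioxy->boxy", x_ft[:, :, :self.modes1, :self.modes2], self.w_pos)
            out_ft[:, :, -self.modes1:, :self.modes2] = torch.einsum(
                "bixy,ioxy->boxy", x_ft[:, :, -self.modes1:, :self.modes2], self.w_neg)
            return torch.fft.irfft2(out_ft, s=(H, W))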

Why spectral? Convolution in physical space equals multiplication in Fourier space; truncating high modes (modes=12 max at 64 resolution) filters noise and makes the parameterization resolution-independent, enabling zero-shot super-resolution. Trade-off: the FFT assumes periodic inputs, so non-periodic fields need padding—keep pad/unpad consistent. Train with MSE on full fields (no collocation points needed). Mistake: forgetting the grid encoding—FNOs are translation-equivariant but need positional information on bounded domains. Eval: relative L2 = ||u_pred − u||₂ / ||u||₂; < 1e-3 is good for surrogates.

fno = FourierNeuralOperator2D(modes1=8, modes2=8, width=32, n_layers=4).to(device)
# Forward: out = fno(perm_batch)  # learns k → u operator
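A minimal helper for that relative L2 metric (batchwise, assuming fields shaped (N, 1, H, W)):

    def relative_l2(pred, true):
        # Per-sample ||pred - true||_2 / ||true||_2, averaged over the batch
        num = torch.linalg.vector_norm(pred - true, dim=(-2, -1))
        den = torch.linalg.vector_norm(true, dim=(-2, -1))
        return (num / den).mean()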

Physics-Informed NNs: PDE Residuals Without Full Data

PINNs solve the PDE with little or no labeled data via a multi-task loss. PINN_MLP (inputs x, y, k → u): Fourier embedding (sin/cos(2π B·[x, y]), with B fixed random, 64 frequencies) concatenated with k, then a Tanh MLP (256→128→…→1) with Xavier init. Loss (lambda_data=1, lambda_pde=1, lambda_bc=10): data MSE(u_pred, u_obs); PDE residual −k(u_xx + u_yy) − 1 (i.e., f = 1, with k treated as locally constant) via nested autograd (grad: u → u_x → u_xx); BC MSE against u_bc = 0. Collocation: sample PDE points uniformly in the interior and BC points on the boundary.
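A sketch of that nested-autograd residual (assuming a model taking separate coordinate tensors; the tutorial's PINN_MLP signature may differ):

    def darcy_pde_residual(model, x, y, k):
        # Coordinates must track gradients for autodiff derivatives
        x = x.clone().requires_grad_(True)
        y = y.clone().requires_grad_(True)
        u = model(x, y, k)
        ones = torch.ones_like(u)
        # create_graph=True keeps the graph so we can differentiate a second time
        u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
        u_y = torch.autograd.grad(u, y, ones, create_graph=True)[0]
        u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
        u_yy = torch.autograd.grad(u_y, y, torch.ones_like(u_y), create_graph=True)[0]
        return -k * (u_xx + u_yy) - 1.0      # residual of -kΔu = f with f = 1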

Principle: autodiff enforces the physics everywhere in the domain; Fourier features mitigate spectral bias, capturing higher frequencies than a plain Tanh/ReLU MLP. Trade-off: stiff multi-objective losses (tune the lambdas; start with data weighted well above physics), and training is slower than data-driven models because of the gradient graph. Mistake: omitting requires_grad_(True) on the coordinates, or forgetting create_graph=True when taking second derivatives. Quality: balance all loss terms below ~1e-4; a data loss that keeps dropping while the PDE residual stalls signals overfitting to the sparse observations.
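The Fourier feature embedding could look like this sketch (fixed random projection B as described; the scale argument and shapes are assumptions):

    import torch
    import torch.nn as nn

    class FourierEmbedding(nn.Module):
        def __init__(self, in_dim=2, n_frequencies=64, scale=1.0):
            super().__init__()
            # B is sampled once and frozen (a buffer, not a trained parameter)
            self.register_buffer("B", torch.randn(in_dim, n_frequencies) * scale)

        def forward(self, xy):                    # xy: (N, 2) coordinates
            proj = 2 * torch.pi * xy @ self.B     # (N, n_frequencies)
            return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)   # (N, 2*n_frequencies)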

pinn = PINN_MLP(hidden_dims=[128]*4, n_frequencies=64).to(device)
loss_fn = DarcyPINNLoss()
# Usage: losses = loss_fn(pinn, x_data, y_data, k_data, u_data, x_pde, ...)

CNN Surrogate Baseline and Inference Benchmarking

Add a convolutional surrogate: a UNet-like stack of Conv2d blocks as a baseline (not physics-aware). Train all three models (FNO/PINN/CNN) via a shared Trainer: Adam(lr=1e-3), MSE data loss for the supervised models, the full physics loss for the PINN. Loop: train_epoch (zero_grad → forward → loss → backward → step), validate under no_grad with MSE, save the best-validation state, schedule with CosineAnnealingLR. Plot semilogy train/val curves.
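A minimal version of that supervised epoch (the Trainer presumably wraps logic like this; names are illustrative):

    def train_epoch(model, loader, optimizer, loss_fn, device):
        model.train()
        total = 0.0
        for k_batch, u_batch in loader:
            k_batch, u_batch = k_batch.to(device), u_batch.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(k_batch), u_batch)
            loss.backward()
            optimizer.step()
            total += loss.item() * k_batch.size(0)
        return total / len(loader.dataset)    # mean loss over the epoch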

Benchmark: time 1000 inferences on the test set (under torch.no_grad(), with CUDA synchronization before and after timing). FNO is fastest (spectral lift), CNN mid, PINN slowest (per-point evaluation, plus autodiff if residuals are computed). Save weights with torch.save(model.state_dict(), 'fno_darcy.pth'). Principle: surrogates are ~1000x faster than FD solvers when solving repeatedly for new k. Trade-off: FNO generalizes best (resolution-invariant); PINN is data-efficient but slow to evaluate. Post-training: denormalize predictions and plot L2/relative-error fields.
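A hedged timing sketch (per-inference latency with warm-up and synchronization):

    import time
    import torch

    def time_inference(model, batch, n_iter=1000):
        model.eval()
        with torch.no_grad():
            model(batch)                      # warm-up pass
            if batch.is_cuda:
                torch.cuda.synchronize()      # make sure queued kernels finish
            t0 = time.perf_counter()
            for _ in range(n_iter):
                model(batch)
            if batch.is_cuda:
                torch.cuda.synchronize()
        return (time.perf_counter() - t0) / n_iter   # seconds per inference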

trainer = Trainer(fno, Adam(fno.parameters(), lr=1e-3))
history = trainer.train(train_loader, test_loader, 100)  # 100 epochs

"The Fourier Neural Operator (FNO) learns mappings between function spaces by parameterizing the integral kernel in Fourier space. Key insight: Convolution in physical space = multiplication in Fourier space."

"Physics-Informed Neural Networks (PINNs) incorporate physical laws directly into the loss function... residual of the PDE at collocation points."

"GRF for permeability: realistic heterogeneous fields critical for subsurface modeling—smooth k leads to trivial solutions."

"Benchmark shows FNO at 50ms/inference vs FD Jacobi 2s—key for real-time surrogates in optimization loops."

"Fourier features in PINN: sine activations capture high freqs better than Tanh alone, converging 2x faster."