NES optimizes a quadratic bowl via Gaussian perturbations

Sample 50 perturbed weight vectors around w with standard deviation sigma=0.1, weight each noise sample by its standardized reward, and update w by 0.001/(50*0.1) * sum(noise * weights). On the toy problem this converges within 300 iterations.

NES Core Loop for Black-Box Optimization

NES treats the parameters w as the mean of a fixed-variance Gaussian (sigma=0.1). To maximize a black-box reward f(w) without gradients:

  1. Generate npop=50 noise vectors from a standard normal, stacked as a matrix N of shape 50x3.
  2. Perturb: w_try_j = w + sigma * N_j, then compute R_j = f(w_try_j). Here f(w) = -||w - [0.5, 0.1, -0.3]||_2^2 (maximum reward 0 at the solution).
  3. Standardize: A = (R - mean(R)) / std(R), giving zero-mean, unit-variance weights. This makes the update independent of the reward's scale and offset and speeds convergence compared with raw R. Note that std(R) is zero when all rewards are equal, so the division needs a guard in that case.
  4. Update: w += alpha/(npop * sigma) * N.T @ A (alpha=0.001). This is the score-function gradient estimator, E[reward * noise] / sigma.
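
The four steps above can be sketched as a short NumPy loop. This is a minimal sketch, not the original author's exact script: the function names `f` and `run_nes`, the seeded `default_rng` generator, and the hard-coded starting point (taken from the rounded values reported in this summary) are assumptions, so the printed numbers will not match the source's log digit for digit.

```python
import numpy as np

def f(w, solution):
    # Reward: negative squared distance to the hidden solution (maximum is 0).
    return -np.sum(np.square(solution - w))

def run_nes(w0, solution, npop=50, sigma=0.1, alpha=0.001, iters=300, seed=0):
    rng = np.random.default_rng(seed)  # assumption: seeded generator for repeatability
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        N = rng.standard_normal((npop, w.size))                 # step 1: noise, shape (npop, dim)
        R = np.array([f(w + sigma * N[j], solution)             # step 2: reward of each perturbation
                      for j in range(npop)])
        A = (R - R.mean()) / R.std()                            # step 3: standardize rewards
        w = w + alpha / (npop * sigma) * (N.T @ A)              # step 4: update the mean
    return w

solution = np.array([0.5, 0.1, -0.3])
w_final = run_nes([1.76, 0.40, 0.98], solution)
print(w_final, f(w_final, solution))
```

The optimizer only ever calls `f`; it never reads `solution` directly, which is what makes the setup black-box.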

Starting from a random w ≈ [1.76, 0.40, 0.98] (reward -3.32), the run reaches a reward of -0.000009 by iteration 280.

w = w + alpha/(npop*sigma) * np.dot(N.T, A)

sigma plays two roles: it scales the perturbation size when sampling, and dividing by it in the update normalizes the estimator, so the gradient scale stays consistent regardless of how large the perturbations are.
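
One way to see why dividing by sigma gives a consistent gradient scale is to compare the estimator against the analytic gradient of the toy reward, which is -2 * (w - solution). The check below is an illustrative sketch: it uses mean-centered rather than fully standardized rewards (so the estimate keeps the true gradient's magnitude), and the sample count `n` is an arbitrary choice, far larger than the npop=50 used in the loop.

```python
import numpy as np

rng = np.random.default_rng(0)
solution = np.array([0.5, 0.1, -0.3])
w = np.array([1.76, 0.40, 0.98])
sigma = 0.1
n = 200_000  # assumption: many samples, only to make the average tight

eps = rng.standard_normal((n, 3))
# Rewards of all perturbed points, computed in one vectorized call.
R = -np.sum(np.square(solution - (w + sigma * eps)), axis=1)

# Score-function estimate: average of (centered reward * noise), divided by sigma.
grad_est = eps.T @ (R - R.mean()) / (n * sigma)
grad_true = -2.0 * (w - solution)

print("estimated:", grad_est)
print("analytic: ", grad_true)
```

Subtracting the mean reward acts as a baseline that lowers the variance of the estimate without changing what it converges to.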

Observed Convergence on Toy Quadratic

300 iterations suffice; the reward printed every 20 iterations shows steady progress:

  • Iter 0: reward -3.323
  • Iter 100: -0.727
  • Iter 200: -0.001
  • Iter 280: -0.000009

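The toy problem mimics neural-network optimization: in a real setting, f(w) would run a forward pass of the network in an environment and return the total reward. The solution is hidden from the optimizer, which only sees rewards.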
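
To make the analogy concrete, here is a hedged sketch of what such an f(w) could look like. Everything in it is hypothetical: the layer sizes, the `unpack` and `policy` helpers, and the `dummy_episode` stand-in (which simply rewards action 0) are illustrative placeholders, not part of the source or of any real environment API.

```python
import numpy as np

# Hypothetical sizes for a tiny two-layer policy network.
N_IN, N_HID, N_OUT = 4, 8, 2
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT

def unpack(w):
    # Slice the flat parameter vector into weight matrices and biases.
    i = 0
    W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = w[i:i + N_HID]; i += N_HID
    W2 = w[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = w[i:i + N_OUT]
    return W1, b1, W2, b2

def policy(w, obs):
    # Forward pass: one tanh hidden layer, then pick the highest-scoring action.
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(obs @ W1 + b1)
    return int(np.argmax(h @ W2 + b2))

def dummy_episode(act, steps=10, seed=0):
    # Stand-in for a real environment rollout: +1 whenever the action is 0.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(steps):
        obs = rng.standard_normal(N_IN)
        total += 1.0 if act(obs) == 0 else 0.0
    return total

def f(w, episode_fn=dummy_episode):
    # Black-box reward: total reward of one episode played by the network.
    return episode_fn(lambda obs: policy(w, obs))

print(f(np.zeros(N_PARAMS)))
```

Because NES only needs the scalar returned by `f`, swapping `dummy_episode` for a real rollout would leave the optimization loop unchanged.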

Insights from Implementers

  • Standardization is optional but boosts speed: raw R works (paper-equivalent via Section 3.2), but centering and scaling prevents stagnation when rewards are all negative or nearly flat.
  • Edge cases: adding a small epsilon to std(R) avoids division by zero when all R are equal (common early in training or on simple problems).
  • Extensions: the method handles moving targets with small jitters, and libraries like evostra apply it to Flappy Bird. No crossover is needed, unlike a genetic algorithm; NES is gradient-like via the log-probability derivative.
  • Deployment: save the final w and reconstruct the NN from it. Practical for RL as an alternative to DQN (no backprop, parallelizable evaluations).
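
The epsilon guard mentioned in the edge-case bullet can be sketched as a small helper. The function name `standardize` and the default `eps=1e-8` are assumptions chosen for illustration; the source does not specify a value.

```python
import numpy as np

def standardize(R, eps=1e-8):
    # Center and scale rewards; eps keeps the division finite when all rewards are equal.
    R = np.asarray(R, dtype=float)
    return (R - R.mean()) / (R.std() + eps)

flat = standardize([2.0, 2.0, 2.0])    # identical rewards: std is 0
mixed = standardize([1.0, 2.0, 3.0])   # ordinary case
print(flat, mixed)
```

With identical rewards the helper returns all zeros, so the update step leaves w unchanged instead of producing NaNs.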

Summarized by x-ai/grok-4.1-fast via openrouter
