Preprocessing Swings CNN Accuracy from 65% to 87% on CIFAR-10
Raw CIFAR-10 pixels yield 65% test accuracy; normalization or standardization lifts this to 69%; geometric augmentation alone dips to ~67% on the simple CNN; photometric brightness/contrast augmentation crashes accuracy to ~21%; the combined pipeline with a deeper CNN hits 87%.
Scale Pixels to Stabilize Gradients and Boost Baseline Performance
Train CNNs on raw CIFAR-10 images (32x32x3 pixels, values 0-255) without preprocessing for a 65.47% test-accuracy baseline after 10 epochs with the Adam optimizer and sparse categorical cross-entropy. Large pixel values (up to 255) inflate gradients, roughly ∂L/∂w ≈ 255 × δ for a first-layer weight, causing overshooting and oscillating weight updates. Normalize by dividing by 255.0 to scale inputs to [0, 1], shrinking those gradients to roughly 1 × δ for smooth convergence and raising accuracy to 69.38%. Per-channel standardization (Z-score: z = (x − μ)/σ) matches this at 69.38%, centering each channel at mean 0 and standard deviation 1 (E[z] = 0 and Var(z) = 1 follow from linearity of expectation and the scaling property of variance), but offers no extra gain for CNNs on images: basic normalization suffices for stable training.
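A minimal sketch of the two scaling options in TensorFlow/Keras, assuming a small baseline CNN; the exact layer sizes here are illustrative stand-ins, not the architecture used in the experiments above:

```python
import tensorflow as tf

# Load CIFAR-10: images are uint8 in [0, 255], labels are integer class ids.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Option 1: simple normalization to [0, 1] by dividing by 255.
x_train_norm = x_train.astype("float32") / 255.0
x_test_norm = x_test.astype("float32") / 255.0

# Option 2: per-channel Z-score standardization, (x - mu) / sigma,
# using training-set statistics only to avoid test-set leakage.
mean = x_train_norm.mean(axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, 3)
std = x_train_norm.std(axis=(0, 1, 2), keepdims=True)
x_train_std = (x_train_norm - mean) / (std + 1e-7)
x_test_std = (x_test_norm - mean) / (std + 1e-7)

# Small baseline-style CNN (assumed stand-in for the simple model in the text).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train_norm, y_train, epochs=10, validation_data=(x_test_norm, y_test))
```

Swapping x_train_norm for x_train_std trains the same model on standardized inputs; swapping in the raw uint8 arrays reproduces the unscaled baseline.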
Use Geometric Augmentation for Invariance but Avoid Photometric Overkill
Apply geometric augmentations (RandomFlip horizontal, RandomRotation 0.1, RandomZoom 0.1) after normalization and train for 20 epochs: accuracy dips to 67.13% on the simple CNN, because the added variability challenges a model without deeper capacity. These augmentations create flip/rotation/scale invariance via affine transformations (flip: x' = −x; rotation by the matrix [[cos θ, −sin θ], [sin θ, cos θ]]; zoom: scaling by a factor s), forcing the network to learn features such as wheels and wings rather than memorizing pixel positions. Photometric augmentations (RandomBrightness/RandomContrast 0.2) applied after normalization catastrophically drop accuracy to 20.62%: clipping saturates pixels at 0 or 1 (e.g., 0.9 + 0.2 → 1.0), destroying edges and textures in low-resolution 32x32 images, worsening the signal-to-noise ratio, and erasing discriminative features like airplane wings or cat eyes.
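A sketch of both augmentation blocks as Keras preprocessing layers, using the factors quoted above; the wrapper function and CNN body are assumptions for illustration, not the exact experimental model:

```python
import tensorflow as tf

# Geometric augmentation block (flip/rotation/zoom factors from the text).
# Keras preprocessing layers are only active during training, so evaluation
# sees un-augmented images.
geometric_augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),   # factor is a fraction of 2*pi (~36 deg)
    tf.keras.layers.RandomZoom(0.1),
])

# Photometric block quoted above; on [0, 1] inputs the brightness/contrast
# shifts clip at the range boundaries and wash out low-resolution detail.
photometric_augment = tf.keras.Sequential([
    tf.keras.layers.RandomBrightness(0.2, value_range=(0.0, 1.0)),
    tf.keras.layers.RandomContrast(0.2),
])

def build_augmented_model(augment_block):
    """Wrap normalization plus a given augmentation block around a small CNN."""
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Rescaling(1.0 / 255)(inputs)   # normalize raw pixels first
    x = augment_block(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(10)(x)
    return tf.keras.Model(inputs, outputs)

model = build_augmented_model(geometric_augment)   # or photometric_augment
```

Building one model with each block and training both for the same number of epochs reproduces the geometric-vs-photometric comparison described above.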
Stack Normalization, Geometric Augs, and Architecture for 87% Accuracy
Combine Z-score standardization ((X − mean)/(std + ε), ε = 1e-7), geometric augmentations (adding RandomTranslation 0.1, 0.1), one-hot labels with 0.1 label smoothing (y_smooth = (1 − α)·y_true + α/K, injecting 0.01 of uniform probability across the 10 classes to curb overconfidence), and a deeper CNN (64-128-256 filters in padded conv blocks with BatchNorm, Dropout 0.2-0.5, and MaxPool): this achieves 87.32% test accuracy with EarlyStopping (patience=8 on validation accuracy) and ReduceLROnPlateau (factor=0.5, patience=3). BatchNorm normalizes layer activations, x̂ = (x − μ_B)/√(σ²_B + ε), then applies a learnable scale and shift, y = γ·x̂ + β, stabilizing internal activation distributions. This pipeline aligns preprocessing with model capacity and shows that no single technique wins; success demands tailored combinations that avoid destroying information while enforcing generalization.
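A sketch of the combined pipeline, assuming inputs have already been Z-score standardized as in the earlier snippet; the filter counts, dropout rates, smoothing factor, and callback settings follow the text, while the exact block layout, kernel sizes, dense width, and epoch budget are illustrative assumptions:

```python
import tensorflow as tf

NUM_CLASSES = 10

def conv_block(x, filters, dropout):
    # Padded conv block with BatchNorm, ReLU, MaxPool, and Dropout (assumed layout).
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    return tf.keras.layers.Dropout(dropout)(x)

# Geometric augmentations, now including RandomTranslation.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
])

inputs = tf.keras.Input(shape=(32, 32, 3))
x = augment(inputs)                     # inputs assumed already standardized
x = conv_block(x, 64, 0.2)
x = conv_block(x, 128, 0.3)
x = conv_block(x, 256, 0.4)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(256, activation="relu")(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# One-hot labels + label smoothing (alpha = 0.1) require categorical cross-entropy.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
    metrics=["accuracy"],
)

callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy", patience=8, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3),
]

# Example fit call (epoch budget is an assumption):
# y_train_onehot = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)
# model.fit(x_train_std, y_train_onehot, validation_split=0.1,
#           epochs=60, callbacks=callbacks)
```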