The Functional Paradigm of JAX and Flax

Unlike PyTorch or TensorFlow, JAX treats models as pure functions rather than stateful objects. This requires a shift in how developers handle randomness and model state.

  • Explicit PRNG: JAX has no global random state. Developers pass a PRNGKey to every function that needs randomness and split it with jax.random.split to obtain independent streams. Because the same key always yields the same values, random draws are reproducible from run to run.
  • Model State Management: In Flax, parameters and optimizer state live in pytrees (nested dictionaries of arrays) rather than inside the model object. The TrainState utility bundles the model's apply_fn, the current params, the optimizer (tx) with its state, and a step counter.
  • Defining Architectures: The @nn.compact decorator lets sub-modules be defined inline within the __call__ method. Parameters are created when the model is initialized with model.init and a sample input, with their shapes inferred from that input, which keeps the code concise and readable.

Bridging PyTorch and JAX

JAX handles computation but ships no dataset or data-loading utilities. The author demonstrates a common, effective pattern: using torchvision to download and preprocess the data, while overriding the DataLoader's collate_fn so that batches come out as NumPy arrays instead of PyTorch tensors. JAX can then consume each batch directly, with no tensor-to-array conversion inside the training loop.
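A collate function of this kind might look like the sketch below. It is an assumption about the general shape of the pattern rather than the author's exact code, and the name numpy_collate is illustrative. It uses only NumPy so that it can be tried without PyTorch installed.

```python
import numpy as np


def numpy_collate(batch):
    """Stack a list of samples into NumPy arrays instead of torch tensors."""
    first = batch[0]
    if isinstance(first, np.ndarray):
        # A list of arrays becomes one array with a leading batch axis.
        return np.stack(batch)
    if isinstance(first, (tuple, list)):
        # A list of (image, label) samples is regrouped field by field.
        return [numpy_collate(list(field)) for field in zip(*batch)]
    # Scalars such as integer labels.
    return np.array(batch)


# Two fake (image, label) samples standing in for dataset items.
samples = [
    (np.zeros((28, 28), dtype=np.float32), 3),
    (np.ones((28, 28), dtype=np.float32), 7),
]
images, labels = numpy_collate(samples)
```

Assuming a torchvision dataset object named train_set (hypothetical) whose transform already returns NumPy arrays, the function would be passed as torch.utils.data.DataLoader(train_set, batch_size=128, collate_fn=numpy_collate).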

Training and Activation Functions

To evaluate performance, the author implements a multi-layer perceptron (MLP) on the FashionMNIST dataset, comparing six different activation functions: Sigmoid, Tanh, ReLU, LeakyReLU, ELU, and Swish.

  • Numerical Stability: The model outputs raw logits rather than probabilities. This is standard practice because the cross-entropy loss is numerically more stable when it computes log_softmax from the logits itself, which avoids taking the logarithm of a probability that has underflowed to zero.
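  • Initialization: The author uses lecun_uniform initialization instead of Flax's default lecun_normal, so that weights are drawn from a fan-in-scaled uniform distribution, as with PyTorch's default for linear layers. Keeping the initialization comparable matters for training stability in deep networks.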
  • Performance: The training loop wraps the train_step and eval_step functions in jax.jit (just-in-time compilation). Each function is traced and compiled by XLA on its first call, and later calls with the same input shapes reuse the compiled version, which is how JAX achieves high performance.
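The points above can be combined in a small sketch. It is written in plain JAX rather than Flax to stay short, so it is not the author's implementation: the helper names (init_params, forward, make_train_step), the layer sizes, the random stand-in data, and the plain SGD update with its learning rate are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

# The six activation functions compared in the text, as JAX callables.
ACTIVATIONS = {
    "sigmoid": jax.nn.sigmoid,
    "tanh": jnp.tanh,
    "relu": jax.nn.relu,
    "leakyrelu": jax.nn.leaky_relu,
    "elu": jax.nn.elu,
    "swish": jax.nn.swish,
}


def init_params(key, sizes):
    """One weight/bias pair per layer, with lecun_uniform weights."""
    init = jax.nn.initializers.lecun_uniform()
    keys = jax.random.split(key, len(sizes) - 1)
    return [
        {"w": init(k, (n_in, n_out), jnp.float32), "b": jnp.zeros((n_out,))}
        for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:])
    ]


def forward(params, x, act):
    """MLP forward pass that returns raw logits (no softmax at the end)."""
    for layer in params[:-1]:
        x = act(x @ layer["w"] + layer["b"])
    return x @ params[-1]["w"] + params[-1]["b"]


def make_train_step(act, lr=0.01):
    def loss_fn(params, x, y):
        # Cross-entropy computed from log_softmax of the logits.
        log_probs = jax.nn.log_softmax(forward(params, x, act))
        picked = jnp.take_along_axis(log_probs, y[:, None], axis=1)
        return -jnp.mean(picked)

    @jax.jit  # compiled by XLA on the first call, reused afterwards
    def train_step(params, x, y):
        loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
        # Plain SGD update, an assumption made to keep the sketch small.
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
        return params, loss

    return train_step


param_key, data_key = jax.random.split(jax.random.PRNGKey(0))
params = init_params(param_key, [784, 64, 10])
x = jax.random.normal(data_key, (8, 784))  # random stand-in for a batch
y = jnp.arange(8) % 10                     # stand-in integer labels

train_step = make_train_step(ACTIVATIONS["relu"])
params, first_loss = train_step(params, x, y)
params, second_loss = train_step(params, x, y)  # reuses the compiled step
```

Building one train_step per activation, for example by looping over ACTIVATIONS, would give each variant its own compiled function while sharing the rest of the code.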