Snowflake-Native Fraud ML Pipeline: Train to Monitor

Build end-to-end fraud detection with XGBoost in Snowflake ML—from data loading to drift monitoring—avoiding data movement, handling 0.5-2% class imbalance via scale_pos_weight=27.6, and achieving ROC-AUC=0.7275 with optimal F1=0.5874 at threshold=0.58.

Overcome Data Gravity and Class Imbalance in Fraud Detection

Keep all ML stages—EDA, training, inference, monitoring—inside Snowflake to eliminate data-movement risks such as security gaps and broken lineage. Start with SQL summaries over 100k transaction rows showing a 0.5-2% fraud rate, then visualize patterns: fraud peaks between 00:00 and 05:00 (captured by a high-risk hour flag), channel and merchant-category risks, and correlations (VELOCITY_SCORE and low DEVICE_TRUST_SCORE are the strongest signals). Engineer five key features: AMOUNT_TO_AVG_RATIO for deviation detection; IS_HIGH_RISK_HOUR as a binary flag; RISK_COMPOSITE (0.3*VELOCITY_SCORE + 0.3*(1-DEVICE_TRUST_SCORE) + 0.2*(FAILED_TRANSACTIONS_LAST_24H/10) + 0.2*(DISTINCT_COUNTRIES_7D/5)) as a prior-risk signal; LOG_AMOUNT to tame skew; and CREDIT_SCORE_BIN (0-500=0, 500-650=1, etc.). One-hot encode categoricals (CHANNEL, MERCHANT_CATEGORY, etc.), yielding 39 features after a stratified 80/20 split (80,000 train rows with 2,797 fraud; 20,000 test rows with 699 fraud).
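The five engineered features can be sketched in plain Python. Column names follow the article; the credit-score bin edges above 650 and the inclusive 05:00 hour cutoff are assumptions, since the text only gives the first two bins and the peak window:

```python
import math

def engineer_features(txn):
    """Derive the five engineered features from one raw transaction dict.
    RISK_COMPOSITE weights follow the formula given in the article."""
    f = {}
    # Ratio of this amount to the customer's trailing average (deviation signal)
    f["AMOUNT_TO_AVG_RATIO"] = txn["AMOUNT"] / max(txn["AVG_AMOUNT"], 1e-9)
    # Fraud peaks 00:00-05:00; upper bound inclusivity is an assumption
    f["IS_HIGH_RISK_HOUR"] = 1 if 0 <= txn["HOUR"] <= 5 else 0
    # Weighted prior-risk composite, exactly as stated in the text
    f["RISK_COMPOSITE"] = (
        0.3 * txn["VELOCITY_SCORE"]
        + 0.3 * (1 - txn["DEVICE_TRUST_SCORE"])
        + 0.2 * (txn["FAILED_TRANSACTIONS_LAST_24H"] / 10)
        + 0.2 * (txn["DISTINCT_COUNTRIES_7D"] / 5)
    )
    # log1p tames the heavy right skew of transaction amounts
    f["LOG_AMOUNT"] = math.log1p(txn["AMOUNT"])
    # Ordinal bins: 0-500 -> 0, 500-650 -> 1; 650-750 -> 2 and 750+ -> 3 are assumed
    score = txn["CREDIT_SCORE"]
    f["CREDIT_SCORE_BIN"] = 0 if score < 500 else 1 if score < 650 else 2 if score < 750 else 3
    return f
```

In the pipeline proper this logic would run in Snowpark (e.g. as DataFrame column expressions) rather than row-by-row Python, but the arithmetic is identical.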

Train XGBoost with an imbalance correction: set scale_pos_weight to the legit/fraud ratio (27.60), with params n_estimators=500, max_depth=6, learning_rate=0.05, eval_metric='aucpr' (which prioritizes precision-recall over ROC-AUC for rare events), and early_stopping_rounds=50. Use Snowflake ExperimentTracking to log params and metrics automatically. Result: best_iteration=7, ROC-AUC=0.7275, average precision=0.4907 (a better discriminator under imbalance), default-threshold F1=0.5096. Optimize the threshold by sweeping 0.1-0.9: 0.58 maximizes F1=0.5874 (fraud precision=0.90, recall=0.43), balancing false positives (customer friction) against false negatives (financial loss).
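The threshold sweep needs no ML libraries; this illustrative helper (name and grid step are assumptions, the 0.1-0.9 range is from the article) scans candidate cutoffs over held-out scores and returns the F1-maximizing one:

```python
def best_f1_threshold(y_true, y_prob, lo=0.1, hi=0.9, step=0.01):
    """Grid-search the decision threshold that maximizes F1 on held-out scores."""
    best_f1, best_t = 0.0, lo
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        t = round(lo + i * step, 10)  # round to kill float accumulation noise
        preds = [1 if p >= t else 0 for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t
```

Raising the threshold trades recall for precision, which is exactly the false-positive-friction vs. false-negative-loss balance described above.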

Top importances: RISK_COMPOSITE, VELOCITY_SCORE, DEVICE_TRUST_SCORE confirm engineered signals boost trees.

Productionize Models with Registry, Inference, and Observability

Register via the Snowflake Model Registry: call log_model with metrics, a sample_input for schema inference, and task=TABULAR_BINARY_CLASSIFICATION. This produces a versioned artifact (FRAUD_DETECTION_XGBOOST V1) with an audit trail and no external model store. For batch inference on 1,000 new transactions, reapply the exact feature pipeline plus column alignment (pad missing one-hot dummies to 39 columns). Call the registered model's run with predict_proba, apply the threshold, and save predictions (FRAUD_PROBABILITY, FRAUD_PREDICTION) plus metadata to the governed table ML.PRODUCTION.FRAUD_PREDICTIONS. The model flags 25.7% as fraud; top risks show ATM, online, and phone patterns.
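The column-alignment step is the one that breaks silently if skipped: a fresh batch may not contain every category seen at training time, so its one-hot encoding yields fewer than 39 columns. A minimal sketch (function name illustrative; train_columns is the saved 39-column training schema):

```python
def align_to_training_schema(scoring_rows, train_columns):
    """Reindex scoring rows to the exact training schema: one-hot columns
    absent from the new batch are padded with 0, columns unseen at training
    time are dropped, and ordering matches the model's expected input."""
    return [[row.get(col, 0) for col in train_columns] for row in scoring_rows]
```

With a DataFrame the same idea is a reindex against the training columns with fill value 0; either way, persisting the training column list alongside the model version keeps inference reproducible.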

Enable observability: create a ModelMonitor on the scored table for daily drift checks (numeric and categorical distributions) and score-distribution shifts. This surfaces evolving fraud tactics without separate dashboards; otherwise the model degrades silently. The entire pipeline runs in Snowflake Notebooks, with Snowpark for compute and no credentials or context switches. Trade-off: warehouse costs scale with data size, but unified governance outweighs the fragility of an external stack.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge