LoRA Misses Facts Because Factual Updates Are High-Rank; RS-LoRA Fixes the Scaling
LoRA assumes weight updates are low-rank. That holds for style (99% of update variance captured at r=8) but not for facts (28% at r=8). Raising the rank recovers the missing information, but the standard α/r scaling shrinks to 0.25 at r=64, crushing the update signal. RS-LoRA's α/√r scaling stays at 2.0 at the same rank, keeping high-rank learning stable.
Style Updates Are Low-Rank, Facts Are High-Rank
Style changes like tone or format concentrate in a few dimensions: the singular values decay fast (leading values: 5.0, 4.5, 4.0, 3.5, then 0.5, ...). At rank 4, LoRA captures nearly all of the signal; rank 8 reaches 99% cumulative variance. Facts like medical data or statistics spread across many dimensions (leading values: 3.0, 2.9, 2.8, ..., decaying slowly). Rank 8 captures only 28% of the variance, so low-rank LoRA (r=4-8) sounds fluent but emits wrong or incomplete facts: the model drops the high-dimensional tail of the update.
To simulate this, generate a low-rank delta with true_rank=4 and singular values on a linspace from 5 to 0.5, and a high-rank delta with a linspace from 3 to 0.5 spanning all min(d, k) = 64 dimensions. Orthogonalize the U and V factors via QR and add 1% Gaussian noise. The Frobenius-normalized approximation error then quantifies the information loss; a sketch follows.
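A minimal NumPy sketch of that setup (the seed, dimensions, and the helper name `make_delta` are assumptions of this walkthrough, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed for reproducibility

def make_delta(d, k, true_rank, s_max, s_min, noise=0.01):
    """Build a d x k update with a prescribed singular-value spectrum."""
    # Random orthonormal factors via reduced QR.
    U, _ = np.linalg.qr(rng.normal(size=(d, true_rank)))
    V, _ = np.linalg.qr(rng.normal(size=(k, true_rank)))
    S = np.linspace(s_max, s_min, true_rank)        # linear spectrum decay
    delta = (U * S) @ V.T                           # scale columns of U by S
    return delta + noise * rng.normal(size=(d, k))  # 1% noise tail

d = k = 64
style_delta = make_delta(d, k, true_rank=4,          s_max=5.0, s_min=0.5)  # fast decay
facts_delta = make_delta(d, k, true_rank=min(d, k),  s_max=3.0, s_min=0.5)  # slow decay
```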
Standard LoRA Scaling Collapses at High Ranks, Weakening Updates
Increasing the rank captures more of the factual update (error drops from 0.85 at r=4 to 0.42 at r=32), but the standard scaling α/r (α=16) shrinks the update as rank grows: r=1→16.0, r=4→4.0, r=8→2.0, r=16→1.0, r=32→0.5, r=64→0.25. The adapter gains capacity while its output weakens, forcing the optimizer to overcompensate and leading to instability and poor convergence. The sketch below shows where the scaling enters the forward pass.
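A sketch of a LoRA forward pass with the α/r factor in place (layer shapes, initialization, and names here are illustrative assumptions):

```python
import numpy as np

d, k, r, alpha = 64, 64, 8, 16.0
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = 0.01 * rng.normal(size=(r, k))   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init

def lora_forward(x, scaling=alpha / r):
    # Standard LoRA: scaling = alpha / r. With alpha=16 this is 2.0 at r=8
    # but only 0.25 at r=64; the update term is scaled toward zero.
    return W @ x + scaling * (B @ (A @ x))
```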
Frobenius-normalized approximation error (64×64 matrix):
| Rank | Style Err | Facts Err |
|---|---|---|
| 2 | 0.201 | 0.916 |
| 4 | 0.015 | 0.850 |
| 8 | 0.002 | 0.692 |
| 16 | 0.001 | 0.553 |
| 32 | 0.000 | 0.417 |
| 48 | 0.000 | 0.289 |
Style error drops to ~0 by r=8; facts need r≥32, exactly where standard α/r scaling has collapsed.
RS-LoRA's √r Scaling Enables High-Rank Fact Learning
Changing the scaling to α/√r gives r=1→16.0, r=4→8.0, r=8→5.7, r=16→4.0, r=32→2.8, r=64→2.0: a gradual decline that preserves the update magnitude (comparison sketch below). With RS-LoRA, the facts error improves steadily, r=2→0.894, r=4→0.775, r=8→0.585, r=16→0.413, r=32→0.199, r=48→0.099, versus the plateau under standard scaling.
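A quick side-by-side of the two scaling rules, plain arithmetic matching the numbers above:

```python
import numpy as np

alpha = 16.0
for r in [1, 4, 8, 16, 32, 64]:
    standard = alpha / r            # collapses: 16.0 -> 0.25
    rs       = alpha / np.sqrt(r)   # gradual:   16.0 -> 2.0
    print(f"r={r:<3d} standard={standard:6.2f}  rs-lora={rs:5.2f}")
```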
LoRA approximation via truncated SVD: decompose the delta as U, S, Vt; keep the top r components; set B = U[:, :r] · diag(S[:r]) and A = Vt[:r, :], so that B @ A is the best rank-r approximation in Frobenius norm, as in the sketch below.
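A sketch under those definitions (the function name and the reuse of the deltas from the earlier sketch are assumptions):

```python
import numpy as np

def lora_approx_error(delta, r):
    """Frobenius-normalized error of the best rank-r approximation."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :r] * S[:r]   # absorb singular values into B
    A = Vt[:r, :]
    return np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)

# Rank sweep over both deltas from the earlier sketch:
# for r in [2, 4, 8, 16, 32, 48]:
#     print(r, lora_approx_error(style_delta, r), lora_approx_error(facts_delta, r))
```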