Overview
Simplex v0.9.0 introduces Self-Learning Annealing—a system where optimization hyperparameters learn themselves through meta-gradients. Instead of manually tuning learning rates, temperature schedules, and thresholds, the system discovers optimal values by differentiating through the optimization process itself.
This is powered by Dual Numbers from v0.8.0, which enable gradients to flow through every parameter in the optimization schedule.
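As a quick refresher on why this works: a dual number carries a value and a derivative together, so ordinary arithmetic propagates exact gradients. A minimal sketch, assuming `dual::new(value, derivative)` and a `.eps` accessor for the derivative part (`.val` appears in the examples below; the rest is an assumption about the v0.8.0 API):

// Dual-number refresher: derivatives ride along with values
let x = dual::new(3.0, 1.0);  // seed dx/dx = 1
let y = x * x + x;            // y = x² + x
println("y = {}", y.val);     // 12.0
println("dy/dx = {}", y.eps); // 2x + 1 = 7.0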
The Hyperparameter Problem
Traditional optimization requires choosing many hyperparameters:
- Initial temperature – Too high wastes computation; too low gets stuck in poor local minima
- Cooling rate – Cooling too fast misses good solutions; cooling too slowly wastes steps
- Learning rate – The eternal struggle of machine learning
- When to re-heat – Detecting plateaus is non-trivial
- Acceptance threshold – Balancing exploration against exploitation
These choices often require extensive experimentation. Self-Learning Optimization eliminates this manual tuning by making all parameters learnable.
Learnable Temperature Schedule
All schedule parameters are dual numbers, enabling gradient computation:
struct LearnableSchedule {
    initial_temp: dual,     // T₀: starting temperature
    cool_rate: dual,        // α: exponential decay rate
    min_temp: dual,         // T_min: temperature floor
    reheat_threshold: dual, // ρ: stagnation steps before reheat
    reheat_intensity: dual, // γ: heat increase on reheat
    oscillation_amp: dual,  // β: temperature oscillation amplitude
    oscillation_freq: dual, // ω: oscillation frequency
    accept_threshold: dual, // τ: soft acceptance threshold
}
// Temperature at step t
fn temperature(&self, step: u64) -> dual {
    let base = self.initial_temp * (-self.cool_rate * step).exp();
    let oscillation = self.oscillation_amp * (self.oscillation_freq * step).sin();
    (base + oscillation).max(self.min_temp)
}
Differentiable Soft Acceptance
Traditional simulated annealing uses a hard accept/reject decision, which has zero gradient. Self-Learning uses a soft acceptance function that's fully differentiable:
// Differentiable relaxation of accept/reject
fn soft_accept(delta_e: dual, temp: dual, tau: dual) -> dual {
    let scaled = (tau - delta_e) / temp;
    scaled.sigmoid() // smooth 0-1 output, fully differentiable!
}
// During training: a weighted combination of both branches
// After training: compiles to an efficient hard decision
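To make the gradient flow concrete, here is a hedged example that probes the sensitivity of acceptance to the learned threshold τ. The `dual::new(value, derivative)` seeding follows the sketch in the Overview and is an assumption about the v0.8.0 API:

// Seed d/dτ = 1 and read the derivative off the output
let delta_e = dual::new(0.5, 0.0); // candidate is 0.5 worse in energy
let temp    = dual::new(2.0, 0.0);
let tau     = dual::new(0.1, 1.0); // the parameter we differentiate w.r.t.
let p = soft_accept(delta_e, temp, tau);
// p.val = sigmoid((0.1 - 0.5) / 2.0) = sigmoid(-0.2) ≈ 0.450
// p.eps = dp/dτ = sigmoid'(-0.2) / 2.0 ≈ 0.124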
Meta-Optimizer
The Meta-Optimizer wraps your objective function and learns optimal schedule parameters through meta-gradients:
use simplex_training::{MetaOptimizer, LearnableSchedule};

// Define your objective function
fn objective(params: &[dual]) -> dual {
    // Your loss function here
    compute_loss(params)
}

// Create a meta-optimizer with a learnable schedule
let schedule = LearnableSchedule::default();
let mut meta = MetaOptimizer::new(schedule, objective)
    .meta_learning_rate(0.01)
    .inner_steps(100);

// Run optimization - the schedule learns as it goes!
for epoch in 0..1000 {
    // current_params: the parameters being optimized this epoch
    let result = meta.step(current_params);
    // Meta-gradient: how does the final loss change w.r.t. the schedule?
    let meta_grad = result.meta_gradient;
    // Update the schedule parameters
    meta.schedule.update(meta_grad);
}
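The `update` call is not expanded in this example; a minimal sketch of what it might do, assuming plain gradient descent, where `ScheduleGradient` is a hypothetical struct holding one partial derivative per schedule parameter:

// Hypothetical update rule: descend the meta-gradient on each parameter
fn update(&mut self, grad: &ScheduleGradient) {
    let lr = 0.01; // the meta learning rate; stored on the optimizer in practice
    self.initial_temp.val     -= lr * grad.initial_temp;
    self.cool_rate.val        -= lr * grad.cool_rate;
    self.accept_threshold.val -= lr * grad.accept_threshold;
    // ...and likewise for the remaining five parameters
}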
Applications
The simplex-training library provides learnable schedules for common training scenarios:
Learnable Learning Rate
Automatic schedule optimization. No more manual warmup/decay tuning.
Learnable Distillation
Optimal temperature schedule for knowledge distillation from teacher to student.
Learnable Pruning
Automatic discovery of optimal pruning schedules for model compression.
Learnable Quantization
Gradient-guided precision reduction with minimal quality loss.
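Each of these follows the same pattern as `LearnableSchedule` above: the schedule's knobs are dual numbers that meta-gradients can adjust. As an illustrative sketch only (these names are assumptions, not the library's actual definitions), a learnable learning-rate schedule might look like:

// Hypothetical learnable LR schedule: warmup, peak, and decay all learn
struct LearnableLrSchedule {
    peak_lr: dual,      // maximum learning rate after warmup
    warmup_steps: dual, // length of the linear warmup ramp
    decay_rate: dual,   // exponential decay rate after the peak
}

fn lr_at(&self, step: u64) -> dual {
    let ramp = (step / self.warmup_steps).min(1.0); // linear warmup to 1.0
    self.peak_lr * ramp * (-self.decay_rate * step).exp()
}

All four schedules compose into a single training pipeline: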
use simplex_training::{MetaTrainer, SpecialistConfig, CompressionPipeline};

// Full training pipeline with all learnable schedules
let trainer = MetaTrainer::new()
    .with_learnable_lr()            // Learning rate schedule
    .with_learnable_distillation()  // Knowledge distillation
    .with_learnable_pruning()       // Model pruning
    .with_learnable_quantization(); // Precision reduction

// Train specialists with self-optimizing hyperparameters
let result = trainer.meta_train(&specialists, &teacher).await;
println("Final loss: {}", result.final_loss);
println("Learned LR schedule: {:?}", result.learned_lr_schedule);
println("Model compression: {}x", result.compression_ratio);
Performance Improvements
Self-Learning Optimization consistently outperforms fixed schedules:
| Metric | Fixed Schedule | Learned Schedule | Improvement |
|---|---|---|---|
| Final Loss (normalized) | 1.00 | 0.85-0.90 | 10-15% lower |
| Training Steps | 100K | 70-80K | 20-30% fewer |
| Pruning Quality | 50% sparsity @ 5% loss | 50% sparsity @ 2% loss | 60% less degradation |
| Quantization Quality | 4-bit @ 8% loss | 4-bit @ 4% loss | 50% less degradation |
Automatic Hyperparameter Tuning
Stop spending hours tuning hyperparameters. Self-Learning Optimization discovers optimal schedules automatically, often finding solutions that human tuning would miss. The meta-gradients capture exactly how each parameter affects final performance.
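For example, the per-parameter meta-gradients from the Meta-Optimizer loop above can be inspected directly. The iteration API here is an assumption for illustration:

// Hypothetical: rank schedule parameters by meta-gradient magnitude
for (name, grad) in result.meta_gradient.iter() {
    println("d(final_loss)/d({}) = {}", name, grad);
}
// Large-magnitude entries identify the hyperparameters that matter most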
Using Self-Learning Optimization
use simplex_training::{LearnableSchedule, Annealer};

// Create an annealer with a learnable schedule
let mut annealer = Annealer::new(LearnableSchedule::default());

// Define your problem
fn energy(state: &State) -> dual {
    // Your objective function
    compute_energy(state)
}

// Run annealing for 10000 steps - the schedule learns optimal parameters
let solution = annealer.optimize(initial_state, energy, 10000);

// Examine what was learned
println("Learned initial temp: {}", annealer.schedule.initial_temp.val);
println("Learned cool rate: {}", annealer.schedule.cool_rate.val);
println("Number of re-heats: {}", annealer.stats.reheat_count);