Commit ff91b5421ec9

Vincent Demeester <vincent@sbr.pm>
2026-01-06 21:53:29
feat(ollama): Add comprehensive model collection for coding, reasoning, and vision
Updated Ollama configuration with research-backed model selection:

Coding Models:
- qwen2.5-coder:7b (88.4% HumanEval, best coding model)
- codestral (Jan 2025 release, #1 LMsys, 86.6% HumanEval)

Reasoning Models:
- phi4-reasoning (14B, outperforms 70B distillation)
- deepseek-r1:7b (lightweight reasoning, MIT license)

Multimodal:
- qwen2.5-vl:7b (best vision model, beats Llama 3.2 11B)

Quick Tasks:
- phi3.5:3.8b (ultra-fast, 15-25 tok/s)

Total models: ~40GB, requires ~48GB+ RAM
All models use Q4_K_M quantization for CPU-only inference

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
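The "~40GB" total in the message can be sanity-checked against the per-model sizes given in the diff comments. The following is a minimal sketch, not part of the commit: the dictionary names and `total_gb` helper are hypothetical, the "~4-5GB" range for qwen2.5-coder:7b is assumed to be its midpoint, and the size of the retained mistral fallback is assumed to be the 3.5GB quoted in the removed comment.

```python
# Approximate sizes in GB, copied from the inline comments in the diff.
# Assumption: "~4-5GB" for qwen2.5-coder:7b is taken as its midpoint, 4.5.
NEW_MODELS_GB = {
    "qwen2.5-coder:7b": 4.5,
    "codestral": 14.0,
    "phi4-reasoning": 9.0,
    "deepseek-r1:7b": 4.5,
    "qwen2.5-vl:7b": 6.0,
    "phi3.5:3.8b": 2.4,
}

# Assumption: the retained fallback keeps the 3.5GB figure from the removed comment.
LEGACY_MODELS_GB = {"mistral:7b-instruct-q4_K_M": 3.5}


def total_gb(*groups):
    """Sum the approximate sizes across one or more model groups."""
    return round(sum(size for group in groups for size in group.values()), 1)


if __name__ == "__main__":
    print(f"new models only: ~{total_gb(NEW_MODELS_GB):.1f} GB")
    print(f"with legacy fallback: ~{total_gb(NEW_MODELS_GB, LEGACY_MODELS_GB):.1f} GB")
```

Under these figures the six new entries come to roughly 40GB, which matches the total in the commit message; counting the retained fallback would bring it nearer 44GB.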
1 parent 24c6759
Changed files (1)
systems/aomi/extra.nix
@@ -61,8 +61,22 @@
       host = "0.0.0.0"; # Listen on all interfaces for network access
       port = 11434;
       loadModels = [
-        "qwen2:1.5b" # Small fast model for testing (2-4GB RAM, 25-30 tok/s)
-        "mistral:7b-instruct-q4_K_M" # Production balanced model (3.5GB RAM, 25+ tok/s)
+        # Coding Models
+        "qwen2.5-coder:7b" # Best coding: 88.4% HumanEval, Apache 2.0 (~4-5GB, 10-15 tok/s)
+        "codestral" # Latest coding (Jan 2025): 86.6% HumanEval, #1 LMsys (~14GB, 8-10 tok/s)
+
+        # Reasoning Models
+        "phi4-reasoning" # Best 14B reasoning: outperforms 70B distillation, MIT (~9GB, 6-10 tok/s)
+        "deepseek-r1:7b" # Lightweight reasoning: MIT license (~4.5GB, 8-12 tok/s)
+
+        # Multimodal
+        "qwen2.5-vl:7b" # Best vision: beats Llama 3.2 11B, Apache 2.0 (~6GB, 5-8 tok/s)
+
+        # Quick Tasks
+        "phi3.5:3.8b" # Ultra-fast all-rounder: MIT license (~2.4GB, 15-25 tok/s)
+
+        # Legacy (keeping for compatibility)
+        "mistral:7b-instruct-q4_K_M" # General purpose fallback
       ];
       environmentVariables = {
         OLLAMA_MODELS = "/var/lib/ollama/models";