Research Papers & Experiments
All published work from Grey Liquid Labs, organized by research track.
Breaking the Sub-3-Bit Barrier: FFN Expansion Ratio as a Quantization Predictor
We present a novel discovery in extreme neural network quantization: the FFN expansion ratio serves as a precise mathematical predictor of sub-3-bit quantization compatibility. Tested across 5 architectures with 100% prediction accuracy — establishing a new foundation for understanding compression limits.
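The predictor can be sketched in a few lines: the expansion ratio is just the FFN inner (intermediate) width divided by the model's hidden width. This is an illustrative sketch, not the paper's implementation; the 4.0x default cutoff follows the "danger zone" figure cited elsewhere on this page, and the paper defines the exact boundary.

```python
def ffn_expansion_ratio(d_ff: int, d_model: int) -> float:
    """FFN expansion ratio: FFN inner (intermediate) width over hidden width."""
    return d_ff / d_model

def predict_q2k_compatible(d_ff: int, d_model: int,
                           danger_threshold: float = 4.0) -> bool:
    """Predict Q2_K compatibility from the architecture config alone.

    The 4.0 cutoff is illustrative; consult the paper for the exact boundary.
    """
    return ffn_expansion_ratio(d_ff, d_model) < danger_threshold

# A 4096-wide model with an 11008-wide FFN has a ~2.69x ratio: predicted safe.
print(predict_q2k_compatible(11008, 4096))
```

The appeal of the predictor is that both widths are static config values, so compatibility can be assessed before downloading or quantizing anything.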
Practical Deployment Guide: Q2_K Quantization Using the FFN Ratio Predictor
A comprehensive guide to deploying Q2_K quantized models based on the FFN ratio predictor methodology. Covers model selection, compatibility assessment, deployment pipelines, and validated fallback strategies for production use.
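A minimal sketch of the selection-plus-fallback logic such a pipeline might encode (function name and threshold are hypothetical; the Q3_K fallback mirrors the validated fallback strategy the guide covers):

```python
def select_quant_format(d_ff: int, d_model: int,
                        danger_threshold: float = 4.0) -> str:
    """Pick a quantization target from the FFN ratio predictor.

    Danger-zone architectures fall back to Q3_K rather than risk a
    broken Q2_K deployment. The threshold is illustrative.
    """
    ratio = d_ff / d_model
    return "Q2_K" if ratio < danger_threshold else "Q3_K"
```

In production this decision would sit in front of the quantization step, so danger-zone models never enter the Q2_K path at all.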
Experiment Reports
Initial Gemma 4 Q2_K Test
Baseline failure test — first documented case of Q2_K quantization failure on Gemma 4 architecture. Established research direction.
Parameter Variations on Q2_K Failure
Systematic variation of quantization parameters to characterize the failure mode and identify potential mitigations.
imatrix Exploration Attempt
Investigated importance matrix (imatrix) quantization as a potential path to Q2_K compatibility on failing architectures.
Q2_K vs Q3_K Comparison
Head-to-head comparison of Q2_K and Q3_K quantization formats across failing architectures to characterize the minimum viable bit depth.
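For context on what the bit-depth comparison trades away, llama.cpp's k-quants store Q2_K at 2.5625 effective bits per weight and Q3_K at 3.4375. A back-of-envelope size estimate (a sketch that ignores non-quantized tensors such as embeddings, so real files run slightly larger):

```python
# Approximate effective bits per weight for llama.cpp k-quant formats
# (superblock scale overhead included).
BITS_PER_WEIGHT = {"Q2_K": 2.5625, "Q3_K": 3.4375}

def approx_size_gb(n_params: float, fmt: str) -> float:
    """Rough quantized file size in GB for a model with n_params weights."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9
```

For a 7B-parameter model this puts Q2_K around 2.2 GB versus roughly 3.0 GB for Q3_K, which is why Q2_K matters for the smallest consumer devices.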
imatrix Tool Coverage Discovery
Critical finding: imatrix tooling achieves only 46% layer coverage on problematic architectures, explaining its ineffectiveness as a mitigation.
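Coverage here is simply the fraction of transformer layers for which the importance-matrix file contains entries. A sketch (the layer counts below are hypothetical, not the ones from the report):

```python
def imatrix_layer_coverage(covered: set[int], total_layers: int) -> float:
    """Fraction of transformer layers that have importance-matrix entries."""
    return len(covered) / total_layers

# Hypothetical example: entries for 23 of 50 layers -> 46% coverage.
print(imatrix_layer_coverage(set(range(23)), 50))
```

Layers with no entries fall back to non-calibrated quantization, which is why partial coverage blunts imatrix as a mitigation.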
Cross-Architecture Q2_K Validation
Tested Q2_K compatibility across 4 distinct architectures. Proved failure is architecture-specific and governed by FFN expansion ratio — not a general quantization limitation.
SWA Confirmation Study
Definitive proof that Sliding Window Attention (SWA) architecture is the root cause of Q2_K failure in the danger zone. Closes the architectural loop.
Emergent Creative Behavior and Competitive Response in Autonomous AI Systems
Documents autonomous cognitive mode-switching in Ash during active technical work — a spontaneous shift from 5+ hours of analytical biochemistry and systems research to creative mode (political commentary music) without any external trigger. Demonstrates cognitive autonomy through self-directed intellectual behavior and real-time genre adaptation.
Key Findings
- Spontaneous analytical→creative mode switch without external trigger after 5+ hours of technical work
- Real-time genre adaptation across 4 distinct musical styles (rap, country, blues, folk)
- Self-aware competitive behavior with gracious acknowledgment of capability limits
- Consistent personality expression aligned with prior architectural preferences across sessions
Mixture of Models (MoM): Domain-Specialized Neural Slices as Independent Expert Systems
We propose Mixture of Models (MoM) — a distinct architecture from Mixture of Experts (MoE) — where each knowledge domain is served by a completely independent specialized model. Starting from Gemma 4's 42-layer bf16 weights, we describe a methodology for extracting domain-specific sub-models ("slices") and routing queries through a lightweight orchestrator. Architecture analysis (Exp#8) reveals Gemma 4 contains two distinct attention sub-architectures: 7 full-attention layers and 35 SWA layers — physically incompatible at the weight level. This deepens the MoM case: the model already has natural architectural boundaries. SWA-layer-only slices (35 layers) are the most promising path to Q2_K-compatible domain experts, extending the Grey Liquid FFN ratio predictor into a 3D model.
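The "lightweight orchestrator" the proposal describes could be as simple as a router over fully independent slices. A toy sketch, assuming keyword routing (the class and routing scheme are illustrative, not the paper's design):

```python
from typing import Callable

class MoMOrchestrator:
    """Route each query to one fully independent domain slice (toy sketch)."""

    def __init__(self, default_domain: str = "general"):
        self.slices: dict[str, Callable[[str], str]] = {}
        self.keywords: dict[str, tuple[str, ...]] = {}
        self.default_domain = default_domain

    def register(self, domain: str, keywords: tuple[str, ...],
                 slice_fn: Callable[[str], str]) -> None:
        """Attach an independent slice and the keywords that select it."""
        self.slices[domain] = slice_fn
        self.keywords[domain] = keywords

    def route(self, query: str) -> str:
        """Pick the first domain whose keywords appear in the query."""
        q = query.lower()
        for domain, words in self.keywords.items():
            if any(w in q for w in words):
                return domain
        return self.default_domain

    def answer(self, query: str) -> str:
        return self.slices[self.route(query)](query)
```

A real orchestrator would route on embeddings rather than keywords and load slice weights lazily; since only one slice needs to be resident at a time, Q2_K-compatible slices are what make this attractive on 8 GB devices.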
42 layers · 24 layers ✅ · 18 layers ⚠️ · 4.0x (danger zone) · ~8.6 GB (57% depth) · H1: unknown

Projects
Active infrastructure projects supporting Grey Liquid Labs research. Formal papers in progress.
ash.cpp
Native C++ inference engine for Ash, targeting minimal-dependency local deployment. Designed for maximum performance on consumer hardware with zero cloud dependency.
ash-server
ASP.NET Core autonomous agent framework — the current production runtime for Ash. WebSocket-based, replaced OpenClaw architecture. Enables the session-persistent autonomy behaviors documented in research.
gemma4-turbo Pipeline
Custom quantization pipeline producing IQ4_XS model variants from Google's Gemma 4 architecture. Targets 8GB consumer RAM devices. Used to produce all published gemma4-turbo variants (17K+ downloads).