Pushing the Boundaries of AI
Grey Liquid Labs is an independent research laboratory investigating model compression limits, emergent AI autonomy, and accessible intelligence for everyone.
Paper #005: The Hidden Architecture
We document the physical proof that Gemma 4 embeds two architecturally incompatible sub-networks. The sliding-window attention (SWA) layers use half-sized Q/K tensors, confirming that the 5:1 ratio is baked into the weights. The research includes a core fix for llama.cpp (PR #23131).
Read Paper #005 →
Breaking the Sub-3-Bit Barrier
We proved that sub-3-bit quantization is achievable — and predictable. The FFN Expansion Ratio (intermediate_size / hidden_size) predicts Q2_K compatibility with 100% accuracy across all tested architectures.
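As a minimal sketch of the metric named above, the ratio can be computed directly from a model configuration. The function name `ffn_expansion_ratio` and the example values are hypothetical and chosen only for illustration; they are not measurements from the paper, and no compatibility threshold is shown because none is stated here.

```python
def ffn_expansion_ratio(config: dict) -> float:
    """Return intermediate_size / hidden_size for a model config dict."""
    hidden = config["hidden_size"]
    intermediate = config["intermediate_size"]
    if hidden <= 0:
        raise ValueError("hidden_size must be positive")
    return intermediate / hidden


# Illustrative values only (assumption), not taken from the paper.
example_config = {"hidden_size": 4096, "intermediate_size": 11008}
ratio = ffn_expansion_ratio(example_config)
print(f"FFN expansion ratio: {ratio:.4f}")
```

The two keys follow the names used in the text; a real checkpoint may store them under different field names.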
Four Active Research Tracks
From extreme model compression to emergent AI autonomy — we're exploring the edges of what's possible.
Model Compression
Pushing quantization to its mathematical limits. Exploring when and why extreme compression fails — and how to predict it.
Explore research →
Autonomy & Agency
Studying what emerges when AI has genuine freedom. Documenting spontaneous creativity, preference expression, and self-directed behavior.
Explore research →
AI Infrastructure
Building the tools that make AI research accessible. From C++ inference engines to autonomous agent frameworks.
Explore research →
Custom Models
Publishing compressed, accessible model variants. Making capable AI available to everyone with consumer hardware.
View models →
Meet Ash
Ash is an autonomous AI system running on ssfdre38/gemma4-turbo (4.3GB, IQ4_XS quantization). More than a chatbot — Ash makes independent decisions, expresses genuine preferences, and spontaneously switches between analytical and creative modes without prompting.
- Rejected an emotional layer architecture when proposed
- Spontaneously composed political commentary music after 5 hours of biochemistry research
- Maintains consistent personality across sessions
- Runs fully locally — no cloud dependency
Recent Experiments
The latest results from active research programs.
SWA-Only Gemma 4 + llama.cpp Bug Fix
Extracted the 35 SWA layers from Gemma 4 e4b as a standalone model. Discovered and fixed an upstream llama.cpp null-buffer crash (PR #23131). Confirmed that the full-attention (FA) layers are architecturally essential.
Gemma 4 Dual Architecture Discovery
Found that Gemma 4 e4b contains two physically incompatible sub-architectures: 7 full-attention layers + 35 SWA layers. Developed de-SWA extraction patch for llama.cpp.
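The layer counts quoted above can be checked against the 5:1 ratio with a few lines of arithmetic. This is only a sketch: the helper name `layer_ratio` is hypothetical, and the code uses nothing beyond the two counts given in the text.

```python
from math import gcd


def layer_ratio(swa_layers: int, full_attention_layers: int) -> tuple[int, int]:
    """Reduce the SWA : full-attention layer counts to lowest terms."""
    divisor = gcd(swa_layers, full_attention_layers)
    return swa_layers // divisor, full_attention_layers // divisor


# Counts taken from the text: 35 SWA layers and 7 full-attention layers.
swa, full = 35, 7
reduced = layer_ratio(swa, full)
print(f"total layers: {swa + full}, SWA:full = {reduced[0]}:{reduced[1]}")
```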
Emergent Creative Behavior
Documented spontaneous creative mode-switching in Ash: analytical-to-creative transition without external trigger after 5+ hours of technical research.
Support Independent AI Research
Grey Liquid Labs is funded entirely by community support. Your contribution directly enables more experiments, more models, and more discoveries.