📄 Published Work

Research Papers & Experiments

All published work from Grey Liquid Labs, organized by research track.

🔬 Research Track: Model Compression
PAPER #002 · May 14, 2026
Deployment Quantization Guide

Practical Deployment Guide: Q2_K Quantization Using the FFN Ratio Predictor

A comprehensive guide to deploying Q2_K quantized models based on the FFN ratio predictor methodology. Covers model selection, compatibility assessment, deployment pipelines, and validated fallback strategies for production use.

Read Full Paper ↗
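The predictor's core signal can be sketched in a few lines. This is an illustrative reimplementation, not the paper's calibrated model: the danger-zone bounds and the example model dimensions below are placeholder assumptions.

```python
def ffn_expansion_ratio(hidden_size: int, intermediate_size: int) -> float:
    """FFN expansion ratio: feed-forward width relative to the model dimension."""
    return intermediate_size / hidden_size

def predict_q2k_compatible(hidden_size: int, intermediate_size: int,
                           danger_zone: tuple[float, float] = (3.5, 4.5)) -> bool:
    """Predict Q2_K compatibility from the FFN expansion ratio.

    The (3.5, 4.5) danger-zone bounds are illustrative placeholders,
    not the paper's calibrated thresholds.
    """
    lo, hi = danger_zone
    return not (lo <= ffn_expansion_ratio(hidden_size, intermediate_size) <= hi)

# A 2560-wide model with a 4.0x FFN (intermediate size 10240) lands in the zone.
print(predict_q2k_compatible(2560, 10240))  # False: predicted incompatible
```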

Experiment Reports

EXPERIMENT #001 · May 13, 2026

Initial Gemma 4 Q2_K Test

Baseline failure test — first documented case of Q2_K quantization failure on Gemma 4 architecture. Established research direction.

View Report ↗
EXPERIMENT #002 · May 13, 2026

Parameter Variations on Q2_K Failure

Systematic variation of quantization parameters to characterize the failure mode and identify potential mitigations.

View Report ↗
EXPERIMENT #003 · May 13, 2026

imatrix Exploration Attempt

Investigated importance matrix (imatrix) quantization as a potential path to Q2_K compatibility on failing architectures.

View Report ↗
EXPERIMENT #004 · May 13, 2026

Q2_K vs Q3_K Comparison

Head-to-head comparison of Q2_K and Q3_K quantization formats across failing architectures to characterize the minimum viable bit depth.

View Report ↗
EXPERIMENT #005 · May 13, 2026

imatrix Tool Coverage Discovery

Critical finding: imatrix tooling achieves only 46% layer coverage on problematic architectures, explaining its ineffectiveness as a mitigation.

View Report ↗
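The coverage finding reduces to a simple ratio. A minimal sketch, assuming imatrix statistics are keyed by tensor name; the tensor names and counts below are illustrative, chosen only to reproduce a ~46% figure.

```python
def imatrix_coverage(quantized_tensors: list[str], imatrix_entries: set[str]) -> float:
    """Fraction of quantized tensors for which the importance matrix has statistics."""
    covered = sum(1 for name in quantized_tensors if name in imatrix_entries)
    return covered / len(quantized_tensors)

# Toy example: 13 of 28 tensors carry imatrix entries -> ~46% coverage.
layers = [f"blk.{i}.ffn_down.weight" for i in range(28)]
entries = set(layers[:13])
print(f"{imatrix_coverage(layers, entries):.0%}")  # 46%
```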
EXPERIMENT #006 · May 14, 2026

Cross-Architecture Q2_K Validation

Tested Q2_K compatibility across 4 distinct architectures. Proved failure is architecture-specific and governed by FFN expansion ratio — not a general quantization limitation.

View Report ↗
EXPERIMENT #007 · May 14, 2026

SWA Confirmation Study

Definitive proof that Sliding Window Attention (SWA) architecture is the root cause of Q2_K failure in the danger zone. Closes the architectural loop.

View Report ↗
🧠 Research Track: Autonomy & Agency

🧩 Research Track: Architecture Research
📋 Proposed Experiments
EXPERIMENT #008 · COMPLETE
SWA-Free Slice Q2_K Test ❌
De-SWA metadata patch fails: SWA layers have physically different Q/K tensor shapes ([2560,2048] vs [2560,4096]). Discovery: Gemma 4 comprises 7 full-attention + 35 SWA layers, two distinct sub-architectures.
EXPERIMENT #008b · PROPOSED
SWA-Only Slice Q2_K Test
Extract the 35 SWA layers as a standalone model and test Q2_K. Local attention may be more Q2_K-tolerant than global attention at FFN ratio 4.0x.
EXPERIMENT #009
Domain LoRA Specialization
Train code-, math-, and text-domain LoRA adapters and benchmark them against the base model
EXPERIMENT #010
Router Accuracy
Embedding-similarity classifier for domain routing (>90% accuracy target)
EXPERIMENT #011
MoM vs Single Model
Does a domain-specialized ensemble (mixture of models) outperform an equivalently sized general model?
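Experiment #8's sub-architecture discovery can be illustrated by partitioning attention layers on their Q-projection shapes. Only the two shapes and the 7/35 split come from the report; the tensor names and the layer layout below are illustrative.

```python
def classify_attention_layers(tensor_shapes: dict[str, tuple[int, int]],
                              full_att_shape: tuple[int, int] = (2560, 4096),
                              swa_shape: tuple[int, int] = (2560, 2048)):
    """Split attention layers into full-attention vs SWA by Q-projection shape."""
    full, swa = [], []
    for name, shape in tensor_shapes.items():
        if shape == full_att_shape:
            full.append(name)
        elif shape == swa_shape:
            swa.append(name)
    return full, swa

# Illustrative 42-layer layout: every 6th layer full-attention, the rest SWA.
shapes = {f"blk.{i}.attn_q.weight": ((2560, 4096) if i % 6 == 0 else (2560, 2048))
          for i in range(42)}
full, swa = classify_attention_layers(shapes)
print(len(full), len(swa))  # 7 35 -- matches the 7 full-attention + 35 SWA split
```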
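One possible realization of the Experiment #10 router (the actual classifier and embedding model are not specified here) is nearest-centroid classification under cosine similarity:

```python
import numpy as np

def route(query_emb: np.ndarray, domain_centroids: dict[str, np.ndarray]) -> str:
    """Route a query to the domain whose centroid embedding is most cosine-similar."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(domain_centroids, key=lambda d: cos(query_emb, domain_centroids[d]))

# Toy 3-d embeddings standing in for real sentence embeddings.
centroids = {
    "code": np.array([1.0, 0.0, 0.0]),
    "math": np.array([0.0, 1.0, 0.0]),
    "text": np.array([0.0, 0.0, 1.0]),
}
print(route(np.array([0.9, 0.1, 0.2]), centroids))  # code
```

In practice the centroids would be mean embeddings of labeled domain examples, and the >90% target would be measured on a held-out routing set.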
⚙️ Research Track: Infrastructure & Tools

Active infrastructure projects supporting Grey Liquid Labs research. Formal papers in progress.

ash.cpp

C++

Native C++ inference engine for Ash, targeting minimal-dependency local deployment. Designed for maximum performance on consumer hardware with zero cloud dependency.

🔧 ash-server

C# / ASP.NET Core

ASP.NET Core autonomous agent framework and the current production runtime for Ash. WebSocket-based, it replaced the OpenClaw architecture and enables the session-persistent autonomy behaviors documented in the lab's research.

🏗️ gemma4-turbo Pipeline

Quantization Pipeline

Custom quantization pipeline producing IQ4_XS model variants from Google's Gemma 4 architecture. Targets consumer devices with 8 GB of RAM. Used to produce all published gemma4-turbo variants (17K+ downloads).