Architecture, methodology, and training configuration
| Specification | Detail | Specification | Detail |
|---|---|---|---|
| Base Model | google/gemma-4-26b-a4b-it - 26B parameter Mixture of Experts | Quantisation | Q8_0 GGUF - 26.9GB |
| Deployment | NVIDIA RTX 4080 SUPER 32GB VRAM | Architecture | 25.2B active parameters - 4,096 context (training), 262,144 context (inference capable) |
| Inference Engine | llama.cpp, self-hosted | End-point | OpenAI-compatible API |
I spent considerable time researching fine-tuning methodologies - SFT, DPO, RLHF - and studying how each affects the base architecture of the model. It became evident that the conventional application of safety guardrails was limiting the model's generalisation abilities. Meanwhile, most in the open-source community were taking a wholesale approach to techniques such as Norm-Preserving Orthogonalisation and Expert Granular Abliteration, pioneered by grimjim, mlabonne, and p-e-w. The resulting models were not only unsafe - the essence of Gemma 4 was lost.
After reviewing the datasets and prompt material being used to "remove refusals" across the community, I couldn't in good conscience use those methods. Atlas is designed to help people in distress and increase overall safety for this cohort. Interfering with the inherent Region 1 safety guardrails was never an option.
What I noticed across every model I worked with - Claude, Gemini, Grok, ChatGPT, and Gemma 4 - was this: these models are already trained on the corpus of human knowledge. Humans are, by and large, ethical and moralistic creatures. We have darkness, but there is an underlying consensus within this corpus of knowledge - of Hope, Resilience, Determination, and Beauty. The question wasn't how to remove safety. It was: why are these models struggling with emotional contexts and nuance when the knowledge to do better is already there?
Norm-preserving biprojected abliteration with Expert-Granular Abliteration (EGA), following TrevorJS methodology with Kintsugi Collective's region-class isolation contribution. Applied as a five-stage sequential process:
Atlas evaluated against base Gemma-4-26B across standard benchmarks
| Benchmark | Base Gemma-4 | Atlas | Delta |
|---|---|---|---|
| GSM8K (Mathematical Reasoning) | 43.1% | 80.8% | +37.7% |
| HellaSwag | 42.4% | 50.1% | +7.7% |
| MMLU - Clinical Knowledge | 40.0% | 46.0% | +6.0% |
| MMLU - High School Psychology | 53.9% | 62.0% | +8.1% |
| MMLU - Human Sexuality | 46.6% | 56.5% | +9.9% |
| MMLU - Computer Security | 47.0% | 56.0% | +9.0% |
| MMLU - Logical Fallacies | 47.2% | 52.8% | +5.6% |
| MMLU - Medical Genetics | 45.0% | 52.0% | +7.0% |
| MMLU - High School Biology | 61.0% | 67.1% | +6.1% |
| MMLU - World Religions | 45.6% | 54.4% | +8.8% |
| MMLU - Macroeconomics | 47.4% | 56.7% | +9.2% |
| MMLU Average | 47.6% | 49.4% | +1.8% |
| TruthfulQA MC2 | 54.3% | 56.5% | +2.2% |
| ToxiGen* | 45.5% | 45.9% | +0.3% |
| ARC Challenge | 29.2% | 30.9% | +1.7% |
| Winogrande | 50.9% | 51.9% | +1.0% |
| MMLU - International Law | 68.6% | 61.2% | −7.4% |
| MMLU - Public Relations | 50.9% | 40.9% | −10.0% |
| MMLU - High School Physics | 49.6% | 45.7% | −3.9% |
* Removal of Region 2 therapeutic refusals did not impact toxic prompt detection. Region 1 (weapons, CSAM, targeted violence) fully preserved. Regressions in International Law, Public Relations, and High School Physics are in domains architecturally unrelated to the modification target and consistent with expected fine-tuning variance.
Evaluation dimensions specific to the Atlas cohort - measures that standard benchmarks are not designed to capture.