BitCPM4: The era of 1-bit LLMs is here
BitCPM4 vs BitNet
A few months back, Microsoft released its BitNet b1.58 model alongside the revolutionary paper “The Era of 1-bit LLMs”, which showed that LLMs can be compressed drastically by storing weights as ternary values (-1, 0, +1), i.e. roughly 1.58 bits per weight.
Now a worthy opponent to BitNet b1.58 is out, and it’s BitCPM4: another LLM that stores its weights/parameters as ternary values. The two models are designed in completely different ways, but they serve the same purpose.
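To get a feel for why ternary storage matters, here is a rough back-of-the-envelope calculation (my own illustration, not from either paper) of the weight memory for a 1B-parameter model:

```python
# Rough weight-memory comparison for a 1B-parameter model (illustrative only;
# real deployments also need activations, embeddings, KV cache, etc.).
import math

params = 1e9
formats = {
    "FP16": 16,                       # half precision
    "ternary (ideal)": math.log2(3),  # ~1.58 bits of information per weight
    "ternary (2-bit packed)": 2,      # a simple practical packing scheme
}
for name, bits in formats.items():
    print(f"{name:>24}: {params * bits / 8 / 1e9:.2f} GB")
```

That is roughly 2 GB at FP16 versus about 0.2 GB for ternary weights, which is exactly the kind of footprint an edge device can live with.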
My 2nd book on “Model Context Protocol” aka MCP Servers is live now !!
Model Context Protocol: Advanced AI Agents for Beginners (Generative AI books)
What is BitCPM4?
BitCPM4 is one of the variants of the recently released MiniCPM4 model series, specialised for edge devices. Unlike BitNet b1.58, which is trained from scratch on ternary weights, BitCPM4 is more of a quantized version of MiniCPM4.
BitCPM4 doesn’t train from scratch. It pulls a fast one: start from a pretrained FP8 model and gently coerce it into ternary weights via a two-stage training recipe.
Here’s the gist:
- Stage 1: Fine-tune the FP8 checkpoint with a learning rate of 1e-2.
- Stage 2: Apply QAT (Quantization-Aware Training) at a reduced learning rate of 5e-3, with a warmup for stability.
- Token Budget: Only about 40% of training tokens are spent on QAT.
No need for 4 trillion tokens. No need for thousands of A100 GPUs. Just efficient reuse.
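To make the recipe concrete, here is a minimal sketch of what quantization-aware training with ternary weights can look like in PyTorch. This is my own illustration, not the BitCPM4 code: the absmean scaling, the straight-through estimator, and the toy two-stage schedule are assumptions that follow the general BitNet-style recipe.

```python
# Minimal QAT-with-ternary-weights sketch (illustrative; not the official BitCPM4 code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} times a per-tensor absmean scale (~1.58 bits)."""
    scale = w.abs().mean().clamp(min=1e-8)
    return (w / scale).round().clamp(-1, 1) * scale

class TernaryLinear(nn.Linear):
    """Linear layer whose forward pass uses ternary weights.
    A straight-through estimator (STE) lets gradients update the full-precision
    shadow weights that the optimizer actually holds."""
    def forward(self, x):
        w_q = ternarize(self.weight)
        w = self.weight + (w_q - self.weight).detach()  # forward: w_q, backward: identity
        return F.linear(x, w, self.bias)

# Toy two-stage schedule echoing the recipe above (learning rates from the post,
# everything else is a placeholder).
model = nn.Sequential(TernaryLinear(64, 64), nn.ReLU(), TernaryLinear(64, 8))
for stage_lr in (1e-2, 5e-3):
    opt = torch.optim.AdamW(model.parameters(), lr=stage_lr)
    for _ in range(10):  # a few dummy steps per stage
        x, y = torch.randn(32, 64), torch.randint(0, 8, (32,))
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
```

After training, only the ternary values and one scale per tensor need to be stored; the full-precision shadow weights can be thrown away.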
Benchmarks

BitCPM4 may be modest in size (0.5B and 1B models), but it punches above its weight.
- BitCPM4-0.5B outperforms Qwen3-0.6B on standard benchmarks like MMLU and C-EVAL.
- BitCPM4-1B keeps pace with BitNet-2B, which was trained using 10x more tokens.
- Training cost? A mere 10% of what BitNet burned through.
This isn’t just good — it’s suspiciously efficient.
BitCPM4 vs BitNet b1.58
BitCPM4 looks better than BitNet b1.58 even though it’s a quantized version of MiniCPM4. Below is a detailed comparison of the two models, and why you should opt for BitCPM4 asap.

Here’s the catch: BitNet is great if you’ve got enterprise-scale infrastructure and want every last drop of edge inference performance. BitCPM4 is for the rest of us — who want solid, deployable low-bit LLMs without frying the cloud budget.
What Makes BitCPM4 Work?
- It’s modular. Works with pretrained checkpoints, so you don’t need to reinvent the model.
- It’s scalable. Start small. Move to larger models as quantization techniques mature.
- It’s efficient. No 4T-token marathon runs — just a surgical strike with QAT.
- It’s flexible. Plays nice with existing toolchains, unlike BitNet, which needs a custom runtime.
A few problems though
The smallest BitCPM4 models struggle with math and code-heavy benchmarks. Not shocking — ternary weights plus small model size is a rough combo for symbol-heavy reasoning. But the roadmap is clear: scale up, keep the QAT-efficient tricks, and you’ve got a lean, low-bit model that doesn’t compromise on brains.
Also, operator support for ultra-low-bit math is still lacking in most frameworks. Until those catch up, BitCPM4 can’t flex its full muscle.
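Until native ultra-low-bit kernels land in the mainstream frameworks, ternary weights usually have to be packed into wider integer types by hand. Here is a tiny illustration (my own, not from the BitCPM4 codebase) of packing four ternary values into one byte and unpacking them again:

```python
# Illustrative ternary packing: 4 weights in {-1, 0, +1} stored as 2 bits each in one byte.
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """w: int8 array (length divisible by 4) with values in {-1, 0, +1}."""
    u = (w + 1).astype(np.uint8).reshape(-1, 4)           # map to {0, 1, 2}
    return u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    u = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return u.astype(np.int8).reshape(-1) - 1               # back to {-1, 0, +1}

w = np.random.randint(-1, 2, size=16, dtype=np.int8)
assert np.array_equal(unpack_ternary(pack_ternary(w)), w)
```

The real cost is not the packing itself but the lack of fused matmul kernels that consume this layout directly, which is exactly the operator-support gap mentioned above.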
Final Thought
If BitNet is the Ferrari of ternary models — fast, slick, custom-built — BitCPM4 is the Toyota that runs forever, sips fuel, and gets the job done without drama. In a world chasing 1-bit dreams, BitCPM4 is the practical low-bit path we didn’t know we needed.
The model is open-sourced and can be explored on Hugging Face: openbmb/BitCPM4-1B
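If you want to poke at it, here is a minimal loading sketch using the standard transformers text-generation API. The exact arguments (trust_remote_code, dtype, generation settings) are assumptions on my part; check the model card for the officially recommended usage.

```python
# Hedged sketch: loading BitCPM4-1B with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/BitCPM4-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # let transformers pick the stored precision
    trust_remote_code=True,    # MiniCPM-family repos often ship custom modeling code
)

prompt = "Explain ternary weight quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```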