AI News

Introducing North Mini Code: Cohere’s First Model For Developers

By Garp Editorial, June 12, 2026 (updated June 20, 2026)

Section: AI News. Publisher: Garp.

Hugging Face has published an update titled “Introducing North Mini Code: Cohere’s First Model For Developers.”

Figure 1: North Mini Code’s performance in agentic coding tasks and complex code generation benchmarks, compared to leading open-source models of similar size. See here for the details of our benchmarking methodology.

North Mini Code is optimized for complex software engineering workflows, terminal-based agentic tasks, and high-quality code generation. On Artificial Analysis’ Coding Index, North Mini Code achieves a score of 33.4, outperforming Qwen3.5 (35B-A3B), Gemma 4 (26B-A4B), Devstral Small 2 (24B Dense), and even substantially larger models such as Nemotron 3 Super (120B-A12B), Mistral Small 4 (119B-A6B), and Devstral 2 (123B).¹ It ranks among the strongest open-source coding models in its size class.

Try North Mini Code in OpenCode

Real-world code agents depend on model quality and robustness across agent harnesses. We trained North Mini Code using multiple scaffolds rather than optimizing for a single one. This approach enables North Mini Code to serve as a reliable foundation for coding agents such as OpenCode.

Figure 2: North Mini Code is a Mixture-of-Experts Transformer decoder with interleaved sliding-window self-attention and full self-attention.

North Mini Code is a decoder-only Transformer-based sparse Mixture-of-Experts model. It uses our efficient attention implementation, interleaved between sliding-window attention with RoPE and global attention with no positional embeddings, in a 3:1 ratio [1]. The feed-forward block is an MoE block with 128 experts, of which 8 are activated per token. Each expert block is an FFN block with SwiGLU activation. The router applies a sigmoid activation function to the logits before the top-k selection. We also use a single dense layer before the sparse layers.

Figure 3: The post-training pipeline is made up of two phases of supervised fine-tuning (SFT) and a phase of agentic reinforcement learning with verifiable rewards (RLVR) targeting software engineering and terminal tasks.
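
To make the architecture description concrete, here is a minimal, illustrative Python sketch of two of the ideas mentioned above: the 3:1 interleaving of sliding-window and global attention layers, and sigmoid-then-top-k expert routing with 128 experts and 8 active per token. It is not Cohere's implementation. The function names, the placement of the global layer at the end of each group of four, and the renormalization of the selected router scores are all assumptions made for illustration; the article states only the ratio, the expert counts, and that a sigmoid is applied before top-k selection.

```python
import math

NUM_EXPERTS = 128  # total experts per MoE block (from the article)
TOP_K = 8          # experts activated per token (from the article)


def attention_schedule(num_layers, sliding_per_global=3):
    """Return a label for the attention type used at each layer.

    Assumption: every group is three sliding-window (RoPE) layers
    followed by one global (no positional embedding) layer. The
    article gives the 3:1 ratio but not the order within a group.
    """
    period = sliding_per_global + 1
    return [
        "global_nope" if (i + 1) % period == 0 else "sliding_rope"
        for i in range(num_layers)
    ]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def route(logits, k=TOP_K):
    """Sigmoid-then-top-k routing for a single token.

    Returns a list of (expert_index, weight) pairs, highest score first.
    Assumption: the k selected scores are renormalized to sum to 1;
    the article does not say how the gate weights are combined.
    """
    scores = [sigmoid(x) for x in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]


if __name__ == "__main__":
    print(attention_schedule(8))
    # Hypothetical router logits for one token, one value per expert.
    example_logits = [0.01 * i for i in range(NUM_EXPERTS)]
    for expert, weight in route(example_logits):
        print(expert, round(weight, 4))
```

Because the sigmoid scores each expert independently rather than normalizing across all 128 as a softmax would, the top-k choice depends only on the ranking of the logits; the renormalization step here is just one plausible way to turn the selected scores into mixing weights.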
