GLM-5.2 dropped this week, and somewhere between the benchmark threads and the HuggingFace download counter ticking past a million, the open-source LLM world quietly crossed a line: 9 of the top 10 open-source models are now Chinese. The only non-Chinese name left in the top 10 is Llama. Mistral fell off. Falcon fell off. Even most of the European fine-tunes are now downstream of a Chinese base model.
I'm an indie hacker building outside China. I don't care about geopolitics. I care that the four model families I actually use every day — DeepSeek, Kimi, Qwen, and now GLM — are all sitting at the top of the benchmarks I trust, at prices my US/EU competitors can't match. This article is the practical playbook I wish I'd had eighteen months ago: which model to use when, why a thin DeepSeek API proxy beats stitching four separate dashboards, and the exact base-URL swap that gets you from "I keep hearing about these models" to "I'm shipping with them tonight."
The 9-of-10 moment, in numbers
The current state of the open-weights frontier, as of mid-June 2026, looks roughly like this:
| Rank | Model | Origin | What it's best at |
|---|---|---|---|
| 1 | Kimi K2.6 | Moonshot AI | Coding (Aider Polyglot 53.9, current SOTA) |
| 2 | DeepSeek-R1 0528 | DeepSeek | Reasoning, math, agentic loops |
| 3 | Qwen3-Max | Alibaba | Multimodal, multilingual, vision |
| 4 | GLM-5.2 | Zhipu | Long context (1M+ tokens), tool use |
| 5–9 | DeepSeek-V4, Qwen3-VL, Kimi-Linear, GLM-4.6, Yi-Lightning | — | Various specialisations |
| 10 | Llama 4 405B | Meta | Last non-Chinese model standing |
If you spend any time reading the Aider Polyglot leaderboard, the HuggingFace Open LLM Leaderboard, or the daily threads on r/LocalLLaMA, none of this is news. What's new with GLM-5.2 is that the only "western open" entry left in the top 10 is Llama. That's not a culture-war headline. It's a routing problem: any indie stack that wants to be on the frontier in 2026 has to ship a clean path to four Chinese model families.
Why "just go to each official site" doesn't work
The naïve answer to "use Chinese models from outside China" is: sign up at each provider, get a key, write four clients. I've tried it. It is the worst option. Here's what actually breaks:
- Phone verification: most official portals expect a +86 phone number. Some have a workaround, some don't, and the workaround changes every quarter.
- Payment: official providers want Alipay or WeChat Pay or a Chinese-issued card. PayPal and overseas Visa are usually a no.
- Latency: hitting the mainland endpoints from a US-East server is fine for chat, painful for streaming, and unstable for agentic workloads that fan out 50 calls per task.
- API drift: each provider speaks "OpenAI-compatible" with subtle differences. Tool-call shapes, streaming chunk formats, finish reasons — every SDK upgrade breaks something.
- Billing: four invoices, four currencies, four refund policies, and you're an indie hacker who would rather be shipping.
The pattern that solves all five of those at once is a thin gateway: one OpenAI-compatible base URL that forwards each request to whichever Chinese model you name in the model field. That's it. That's the whole product. The reason this category exists, and the reason a DeepSeek-focused OpenRouter alternative makes sense for indie work, is that you don't need 400 models. You need four families, billed in USD, reachable from anywhere, with no surcharge layered on top.
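To make "thin" concrete, here is a minimal Python sketch of the routing decision such a gateway makes, assuming a simple prefix match on the model field. The `ROUTES` table and upstream URLs are hypothetical placeholders, not haotokai.com's actual implementation.

```python
# Minimal sketch of a "thin gateway" routing decision: one OpenAI-compatible
# endpoint that picks an upstream purely from the request's "model" field.
# The upstream URLs are hypothetical placeholders, not real endpoints.
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    family: str
    base_url: str  # hypothetical upstream base URL


# Hypothetical routing table: model-name prefix -> upstream provider.
ROUTES = {
    "deepseek-": Upstream("DeepSeek", "https://upstream.example/deepseek/v1"),
    "kimi-": Upstream("Moonshot", "https://upstream.example/moonshot/v1"),
    "qwen": Upstream("Alibaba", "https://upstream.example/qwen/v1"),
    "glm-": Upstream("Zhipu", "https://upstream.example/zhipu/v1"),
}


def route(request_body: dict) -> Upstream:
    """Return the upstream that should receive this Chat Completions body."""
    model = request_body.get("model", "")
    for prefix, upstream in ROUTES.items():
        if model.startswith(prefix):
            return upstream
    raise ValueError(f"no upstream configured for model {model!r}")
```

Everything else in the body (messages, tools, stream) would be forwarded unchanged; a production gateway would also need auth, billing, and fixes for the per-provider shape differences listed above.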
haotokai.com, in one sentence
I run haotokai.com. I'll be upfront that this is the gateway I built, but the pattern in this article works against any OpenAI-compatible aggregator. Here's the positioning we keep on the homepage:
Five upstream channels, all live: Kimi (K2.6), DeepSeek (R1 0528), Qwen (Qwen3 family), Zhipu (GLM-4.6 today, GLM-5.2 rolling out this week), and iFlytek Spark. Token prices are pass-through — DeepSeek-R1 is on the list at $0.55 / 1M input tokens, the published direct rate, with no card surcharge layered on top. PayPal works, top-up minimum is $1, and there is no monthly subscription tier to worry about. That's the whole product surface.
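Pass-through pricing makes per-call cost plain arithmetic: tokens times the per-million rate. The sketch below assumes nothing beyond the $0.55 / 1M input figure quoted above. The output price is left as a parameter because it isn't quoted here, so fill it in from the provider's current price list.

```python
# Back-of-the-envelope cost of one Chat Completions call at pass-through,
# per-million-token prices. Only the input rate below is taken from the
# article; the output rate is a parameter you must supply yourself.

DEEPSEEK_R1_INPUT_PER_M = 0.55  # USD per 1M input tokens (published direct rate)


def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """USD cost of a single call, given per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Input side only: a 2,000-token prompt costs 2,000 * 0.55 / 1e6 = $0.0011.
prompt_only = call_cost_usd(2_000, 0, DEEPSEEK_R1_INPUT_PER_M, 0.0)
print(f"${prompt_only:.4f}")  # prints "$0.0011"
```

For reasoning models that bill their thinking tokens as output, the output term tends to dominate, so budget for it separately.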
The migration is one diff line
If your code already speaks OpenAI's Chat Completions schema, and almost everyone's does, switching to a Chinese-model gateway comes down to changing base_url (plus the API key and model name that go with it). Here's the diff most people end up writing:
```diff
 from openai import OpenAI
 client = OpenAI(
-    api_key="sk-or-v1-xxxxxxxx",
-    base_url="https://openrouter.ai/api/v1",
+    api_key="sk-haotokai-xxxxxxxx",
+    base_url="https://api.haotokai.com/v1",
 )
 resp = client.chat.completions.create(
-    model="anthropic/claude-3.5-sonnet",
+    model="deepseek-reasoner",
     messages=[{"role": "user", "content": "Plan a 7-day Tokyo trip."}],
 )
```
That's the entire story. Streaming works. Tool calling works. JSON mode works. If you were already calling Claude through OpenRouter to keep costs down and keep running into the 5.5% card fee, the swap is the same.
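Streaming is the claim that most often breaks behind proxies, so here is a minimal sketch of consuming a stream with the official `openai` Python SDK. It assumes the gateway emits standard Chat Completions chunks and that your key lives in a `HAOTOKAI_API_KEY` environment variable.

```python
# Sketch: streaming through the gateway with the official `openai` SDK.
# Assumes the gateway emits standard Chat Completions stream chunks.
import os
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas from a Chat Completions stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip role-only or empty deltas
            parts.append(delta)
    return "".join(parts)


def main() -> None:
    from openai import OpenAI  # imported lazily so collect_stream works offline

    client = OpenAI(
        api_key=os.environ["HAOTOKAI_API_KEY"],
        base_url="https://api.haotokai.com/v1",
    )
    stream = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "Plan a 7-day Tokyo trip."}],
        stream=True,
    )
    print(collect_stream(stream))


# Only hit the network when a key is actually configured.
if __name__ == "__main__" and "HAOTOKAI_API_KEY" in os.environ:
    main()
```

With `deepseek-reasoner`, this collects only the final answer from `delta.content`; DeepSeek's own API streams the reasoning trace in a separate field, which this helper ignores.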
Verifying it with curl, before wiring it into anything
Whenever I evaluate a new gateway, the very first thing I do is hit it with raw curl. No SDK, no framework, no abstractions in the way. If this works, the OpenAI Python client will work. If it doesn't, no SDK is going to save you.
```bash
curl https://api.haotokai.com/v1/chat/completions \
  -H "Authorization: Bearer $HAOTOKAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [
      {"role": "user", "content": "In one paragraph: why does long context matter for agentic loops?"}
    ],
    "max_tokens": 200
  }'
```
Swap the model string and the same call hits a different upstream:
- `"model": "deepseek-reasoner"` → DeepSeek-R1 0528 (reasoning king, $0.55/M input)
- `"model": "kimi-k2"` → Moonshot Kimi K2.6 (coding leader, 53.9 on Aider Polyglot)
- `"model": "qwen3-max"` → Alibaba Qwen3 (multimodal, multilingual)
- `"model": "glm-5.2"` → Zhipu GLM-5.2 (long context, tool use)
Same path, same JSON envelope, same auth header. That's what people mean when they say "OpenAI-compatible," and it's why reaching the Kimi K2 API from outside China through a thin proxy doesn't require relearning anything you already know.
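If you'd rather script the comparison than paste curl four times, here is a standard-library Python sketch that builds the same request for each model string listed above. Only the `model` field differs between requests; the endpoint and `HAOTOKAI_API_KEY` come from the curl example, and the 120-second timeout is an arbitrary choice.

```python
# Sketch: the curl call in standard-library Python, showing that only the
# "model" string changes between upstreams. No SDK involved.
import json
import os
import urllib.request

ENDPOINT = "https://api.haotokai.com/v1/chat/completions"
MODELS = ["glm-5.2", "deepseek-reasoner", "kimi-k2", "qwen3-max"]


def build_request(model: str, prompt: str, api_key: str,
                  max_tokens: int = 200) -> urllib.request.Request:
    """Build one Chat Completions POST; identical apart from `model`."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only hit the network when a key is actually configured.
if __name__ == "__main__" and "HAOTOKAI_API_KEY" in os.environ:
    key = os.environ["HAOTOKAI_API_KEY"]
    question = "In one paragraph: why does long context matter for agentic loops?"
    for model in MODELS:
        with urllib.request.urlopen(build_request(model, question, key), timeout=120) as resp:
            reply = json.load(resp)
        print(model, "->", reply["choices"][0]["message"]["content"][:80])
```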
Picking the right model for the job
I keep this cheatsheet pinned in my Notion. It's lossy on purpose:
Reasoning, math, agentic chains → DeepSeek-R1
R1 0528 is the open-weights reasoning king. It's what I reach for when the task involves multi-step logic, code synthesis with verification, or anything where I need reasoning_effort as a knob. The token economics are aggressive enough that you can afford to let it think. See the DeepSeek API guide for the full integration walkthrough.
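On DeepSeek's own API, `deepseek-reasoner` returns its chain of thought in a `reasoning_content` field alongside the usual `content`. Whether a given gateway passes that field through is an assumption to check, so this sketch treats it as optional. It operates on the parsed JSON response dict.

```python
# Sketch: splitting DeepSeek-R1's reasoning trace from its final answer.
# `reasoning_content` is treated as optional because gateway passthrough
# of that field is an assumption, not a guarantee.
from typing import Optional, Tuple


def split_reasoning(response: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning_trace_or_None, final_answer) from a response dict."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content") or ""


def history_entry(response: dict) -> dict:
    """Assistant turn to append to `messages` for the next request.

    Only `content` is kept: according to DeepSeek's documentation, sending
    `reasoning_content` back in the input messages is rejected.
    """
    _, answer = split_reasoning(response)
    return {"role": "assistant", "content": answer}
```

Logging the trace while keeping it out of the conversation history gives you the debugging value of "letting it think" without paying to resend those tokens every turn.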
Coding, especially polyglot codebases → Kimi K2.6
K2.6 took the top of the Aider Polyglot leaderboard at 53.9 earlier this quarter, beating both Claude 3.7 Sonnet and DeepSeek-V4 on cross-language refactor benchmarks. The 2M context window is the real cheat code — you can throw an entire monorepo at it and ask for a coherent rename pass. Implementation details in the Kimi K2 integration write-up.
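Throwing a monorepo at a long-context model still means deciding what goes into the prompt. Here is a minimal sketch that concatenates source files with path headers until a token budget is reached. It estimates tokens with a rough four-characters-per-token heuristic, which is an assumption; use the provider's tokenizer when the budget actually matters.

```python
# Sketch: packing a repository into one prompt for a long-context model.
# CHARS_PER_TOKEN is a rough heuristic, not a real tokenizer.
from pathlib import Path
from typing import Iterable, Iterator, Tuple

CHARS_PER_TOKEN = 4


def iter_source_files(root: str, suffixes: Tuple[str, ...]) -> Iterator[Tuple[str, str]]:
    """Yield (relative_path, text) for files under `root` with matching suffixes."""
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            yield str(path.relative_to(base)), path.read_text(encoding="utf-8", errors="replace")


def pack_files(files: Iterable[Tuple[str, str]], token_budget: int) -> str:
    """Concatenate files with '### path' headers until the estimated budget is hit."""
    parts, used = [], 0
    for rel_path, text in files:
        chunk = f"### {rel_path}\n{text}\n"
        cost = len(chunk) // CHARS_PER_TOKEN + 1
        if used + cost > token_budget:
            break  # stop at a file boundary rather than truncating mid-file
        parts.append(chunk)
        used += cost
    return "".join(parts)
```

For example, `pack_files(iter_source_files("./my-repo", (".py", ".ts")), token_budget=1_500_000)` leaves headroom under the 2M window for instructions and the reply; the path and budget are illustrative.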
Multimodal, vision, multilingual → Qwen3
Qwen3-Max and Qwen3-VL are the easiest "just works" models I've tried for image input, OCR-style tasks, and any non-English language work. If your product has Spanish, Arabic, Vietnamese, or Indonesian users, this is the model that doesn't fall apart on idioms. Setup pattern in the Qwen API tutorial.
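OpenAI's Chat Completions schema represents image input as a "content parts" list that mixes text and `image_url` entries. This sketch builds such a message with the image inlined as a base64 data URL; that the gateway's Qwen3 vision models accept exactly this shape is an assumption to confirm against its docs.

```python
# Sketch: an image + text user message in the OpenAI "content parts" format,
# with the image inlined as a base64 data URL.
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message carrying one inline image plus a text prompt."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

`image_message("Transcribe this receipt.", pathlib.Path("receipt.png").read_bytes())` returns a dict you can put straight into `messages`; `receipt.png` is a placeholder.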
Long context, tool use, structured output → GLM-5.2
This is the new arrival. GLM-5.2 ships with a 1M+ context window and notably stronger tool-calling fidelity than GLM-4.6. Where I've already swapped it in: legal document review, "summarize this 800-page transcript," and any RAG-replacement pattern where you'd rather just stuff the whole corpus into context.
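If you route by task in code, the cheatsheet collapses into a lookup table. The task labels and the fallback choice below are hypothetical; the model strings are the ones used in the curl section.

```python
# Sketch: the model-picking cheatsheet as a lookup table.
# Task labels and the default are hypothetical; model strings match the
# ones used with the gateway earlier in the article.
TASK_TO_MODEL = {
    "reasoning": "deepseek-reasoner",  # math, multi-step logic, agentic chains
    "coding": "kimi-k2",               # polyglot codebases, repo-wide refactors
    "multimodal": "qwen3-max",         # vision, OCR-style, non-English work
    "long_context": "glm-5.2",         # huge documents, tool use, structured output
}


def pick_model(task: str, default: str = "deepseek-reasoner") -> str:
    """Map a task label to a model string, falling back to `default`."""
    return TASK_TO_MODEL.get(task, default)
```

Keeping the mapping in one place means that when the leaderboard shifts, re-pointing a task is a one-line change.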
What this means for the OpenRouter alternative space
Three observations, in order of how much they matter to me as an operator:
- The split is structural, not temporary. 9 of the top 10 open-source LLMs being Chinese isn't a Q2 2026 anomaly — it's been trending that direction for six straight quarters. Any global indie stack that wants to stay on the frontier needs a clean path to those weights, and "go sign up at each provider" is not that path.
- Indie economics don't match enterprise economics. When you're a one-person team, you don't need 400 models, an SLA, or SOC 2. You need the four model families you actually use, at pass-through prices, with a $1 minimum top-up and PayPal at checkout. That product is cheaper to build and cheaper to run than the everything-store, which is why a focused gateway can credibly undercut the big aggregators on the four channels it covers.
- OpenAI compatibility commoditizes the gateway. If everyone speaks the same Chat Completions schema, switching gateways is a base-URL-and-key diff like the one above. That's good for you, the user, because it means your switching cost is approximately zero. Pick the cheapest gateway that covers your four models, and switch the day a cheaper one ships.
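One way to keep that switching cost near zero is to choose the gateway from configuration rather than code. In this sketch the `LLM_GATEWAY` and `*_API_KEY` environment variable names are hypothetical conventions; the two base URLs are the ones from the migration diff.

```python
# Sketch: selecting the gateway from configuration so that switching is an
# environment change, not a code change. Env var names are hypothetical.
import os
from typing import Mapping, Tuple

GATEWAYS = {
    "haotokai": ("https://api.haotokai.com/v1", "HAOTOKAI_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
}


def gateway_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (base_url, api_key) for the gateway named in LLM_GATEWAY."""
    name = env.get("LLM_GATEWAY", "haotokai")
    base_url, key_var = GATEWAYS[name]
    return base_url, env[key_var]
```

`base_url, api_key = gateway_config()` then feeds `OpenAI(api_key=api_key, base_url=base_url)`. Model names can differ between gateways, so in practice the model string belongs in the same config.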
Concretely, that's why I think a narrow gateway covering Qwen, DeepSeek, Kimi, and GLM from outside China, billed in USD with no subscription, is the right shape for indie hackers in 2026. It is also, today, the cleanest path I know of to GLM-5.2 as it rolls out, without dealing with phone verification or Alipay.
Try it in ten minutes
If you got this far, the offer is small and concrete: sign up at haotokai.com, get $1 in trial credit, and run the curl above against glm-5.2, deepseek-reasoner, kimi-k2, and qwen3-max. That's a few hundred calls each from one key. If you stick around, great. If you don't, you've at least seen what the OpenAI-compatible base-URL pattern feels like, and you'll have your own opinion on whether the 9-of-10 leaderboard moment is worth wiring into your stack.
My read: it already is. GLM-5.2 just nudged the question from "should I be using Chinese open-source models?" to "what's my excuse for not having one in production by Friday?"
Further reading: DeepSeek API guide · Kimi K2 integration · Qwen API tutorial