← Back to Blog

How to Build a Multi-LLM Application with Unified API in 10 Minutes

June 2026 | 9 min read | Haotokai Team

Gone are the days when building an AI application meant picking one model and sticking with it. Today's smartest AI products use multiple LLMs, routing each type of query to the model that handles it best and optimizing for both cost and quality.

But managing a dozen different API keys, SDKs, and endpoints used to be a nightmare. Not anymore.

In this tutorial, I'll show you how to build a production-ready multi-LLM application in just 10 minutes using a unified API. We'll access multiple Chinese AI models (DeepSeek, Qwen, ZhipuGLM, Moonshot, and more) through a single endpoint, with zero complicated setup.

Why Build a Multi-LLM Application?

Before we dive into the code, let's quickly cover why you'd want multiple LLMs in the first place:

The problem? Each provider has its own API format, authentication, SDK, and quirks. Building and maintaining integrations with 10+ providers takes weeks of engineering time.

That's where a unified API platform like haotokai.com comes in. It provides a single OpenAI-compatible API endpoint that lets you access 20+ Chinese AI models with just one API key. No multiple SDKs, no scattered billing, no vendor lock-in.

Prerequisites

Before we start, you'll need:

That's it. No complicated installs, no infrastructure setup.

Step 1: Get Your API Key

First, sign up for an account:

  1. Go to haotokai.com and create an account
  2. Navigate to the API Keys section in your dashboard
  3. Create a new API key and copy it somewhere safe

You'll get free credits to test with, so you can follow along without spending anything.

Step 2: Set Up Your Project

Let's start with a basic Python project. The beauty of Haotokai's API is that it's fully compatible with the OpenAI SDK, so you can use code you already know.

Install the SDK

pip install openai python-dotenv

Create Your Environment File

Create a .env file:

HAOTOKAI_API_KEY=your-api-key-here
HAOTOKAI_BASE_URL=https://api.haotokai.com/v1
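
A missing or misspelled variable in the .env file usually surfaces later as a confusing authentication error. One way to catch it early is a small check at startup. The sketch below is only an illustration: the helper name load_haotokai_config is hypothetical, and it assumes the two variable names shown above.

```python
import os


def load_haotokai_config(env=None):
    """Return (api_key, base_url), or raise if either variable is missing.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test. The helper name is hypothetical.
    """
    env = os.environ if env is None else env
    api_key = env.get("HAOTOKAI_API_KEY", "").strip()
    base_url = env.get("HAOTOKAI_BASE_URL", "").strip()

    # Collect every missing name so the error reports all of them at once.
    missing = [
        name
        for name, value in (
            ("HAOTOKAI_API_KEY", api_key),
            ("HAOTOKAI_BASE_URL", base_url),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, base_url
```

Call it right after load_dotenv() and pass the two returned values to the OpenAI client constructor.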

Step 3: Build Your First Multi-LLM Chat

Here's where the magic happens. With Haotokai's unified API, switching between models is as simple as changing the model parameter. No new SDKs, no new authentication.

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# Initialize with Haotokai's unified API endpoint
client = OpenAI(
    api_key=os.getenv("HAOTOKAI_API_KEY"),
    base_url=os.getenv("HAOTOKAI_BASE_URL")
)

# List of models we want to test
models = [
    "deepseek-v3",       # Great for coding and reasoning
    "qwen-plus",         # Balanced performance, very fast
    "zhipu-glm4",        # Strong Chinese language support
    "moonshot-v1-8k",    # Good for long context
    "yi-large",          # High quality creative writing
]

def ask_all_models(prompt: str):
    """Send the same prompt to all models and compare responses."""
    results = {}
    
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
                max_tokens=500
            )
            results[model] = response.choices[0].message.content
            print(f"✅ {model} responded successfully")
        except Exception as e:
            results[model] = f"Error: {str(e)}"
            print(f"❌ {model} failed: {e}")
    
    return results

# Test it out
if __name__ == "__main__":
    prompt = "Explain quantum computing in simple terms, as if explaining to a 10-year-old."
    results = ask_all_models(prompt)
    
    print("\n" + "="*80)
    print("COMPARISON RESULTS")
    print("="*80)
    
    for model, response in results.items():
        print(f"\n--- {model} ---")
        print(response[:200] + "..." if len(response) > 200 else response)

Run this, and in seconds you'll get responses from 5 different AI models, all through one API call pattern. That's the power of a unified API.
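
The loop above calls the models one after another, so total latency is roughly the sum of all five calls. If you want the comparison faster, one option is to fan the requests out across threads. The sketch below is a variation, not part of the original script: ask_models_concurrently is a hypothetical helper, and ask_one is any function you supply that takes (model, prompt) and returns the reply text.

```python
from concurrent.futures import ThreadPoolExecutor


def ask_models_concurrently(ask_one, models, prompt, max_workers=5):
    """Call ask_one(model, prompt) for every model in parallel threads.

    Returns a dict mapping each model name to its reply text, or to an
    "Error: ..." string when that model's call raised an exception.
    """
    def safe_call(model):
        try:
            return ask_one(model, prompt)
        except Exception as exc:
            # One failing model should not cancel the others.
            return f"Error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `models`.
        replies = list(pool.map(safe_call, models))
    return dict(zip(models, replies))
```

To use it with the client from this step, wrap the API call in a small function that returns response.choices[0].message.content and pass that function as ask_one.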

Step 4: Build a Smart Router

Now let's level up. Instead of sending the same query to all models, let's build a smart router that automatically chooses the best model for each query type.

class MultiLLMRouter:
    """Smart router that selects the optimal model based on query type."""
    
    # Model strengths mapping
    MODEL_SPECIALIZATIONS = {
        "coding": ["deepseek-v3", "qwen-plus"],
        "math": ["deepseek-v3", "zhipu-glm4"],
        "creative": ["yi-large", "moonshot-v1-8k"],
        "chinese": ["qwen-plus", "zhipu-glm4"],
        "fast": ["qwen-turbo", "deepseek-v3"],
        "general": ["qwen-plus", "deepseek-v3"],
    }
    
    def __init__(self, api_key: str, base_url: str):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
    
    def _classify_query(self, prompt: str) -> str:
        """Classify the query type using a fast, cheap model."""
        classification_prompt = f"""
        Classify the following user query into exactly one of these categories:
        coding, math, creative, chinese, fast, general
        
        Only respond with the category name, nothing else.
        
        Query: {prompt[:500]}
        """
        
        response = self.client.chat.completions.create(
            model="qwen-turbo",  # Use cheapest model for classification
            messages=[{"role": "user", "content": classification_prompt}],
            temperature=0,
            max_tokens=10
        )
        
        category = response.choices[0].message.content.strip().lower()
        return category if category in self.MODEL_SPECIALIZATIONS else "general"
    
    def generate(self, prompt: str, use_model: str = None) -> dict:
        """
        Generate a response using the optimal model.
        
        Args:
            prompt: The user's query
            use_model: Optional specific model to use (overrides routing)
        
        Returns:
            Dictionary with response, model_used, and usage stats
        """
        if use_model:
            selected_model = use_model
        else:
            query_type = self._classify_query(prompt)
            # Pick the first (best) model for this category
            selected_model = self.MODEL_SPECIALIZATIONS[query_type][0]
        
        response = self.client.chat.completions.create(
            model=selected_model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        
        return {
            "response": response.choices[0].message.content,
            "model_used": selected_model,
            "query_type": query_type if not use_model else "manual",
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        }

# Usage example
if __name__ == "__main__":
    router = MultiLLMRouter(
        api_key=os.getenv("HAOTOKAI_API_KEY"),
        base_url=os.getenv("HAOTOKAI_BASE_URL")
    )
    
    # Test with different query types
    test_prompts = [
        "Write a Python function to reverse a linked list",
        "What's the derivative of sin(x) * cos(x)?",
        "Write a short poem about artificial intelligence",
        "用中文解释什么是大语言模型",  # "Explain in Chinese what a large language model is"
        "What's the capital of France?",
    ]
    
    for prompt in test_prompts:
        result = router.generate(prompt)
        print(f"\n📝 Query: {prompt[:60]}...")
        print(f"🧠 Query type: {result['query_type']}")
        print(f"🤖 Model used: {result['model_used']}")
        print(f"📊 Tokens: {result['total_tokens']}")
        print(f"💬 Response: {result['response'][:100]}...")

This router class does something powerful: it uses a cheap, fast model (Qwen Turbo) to classify your query, then routes it to the best model for that task type. You get optimal quality *and* optimal cost, automatically.
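
One practical wrinkle: the classifier only routes correctly when the reply is exactly a category name. A model may answer "Coding." or "Category: math" instead, and an exact-match check would send both to "general". A more forgiving parser could look like the sketch below. The function name normalize_category is hypothetical; the category list is the one used by the router above.

```python
import re

# Same categories as the keys of MODEL_SPECIALIZATIONS in the router.
CATEGORIES = ("coding", "math", "creative", "chinese", "fast", "general")


def normalize_category(raw_reply, default="general"):
    """Map a free-form classifier reply onto one known category.

    Lowercases the reply, splits it into alphabetic words, and returns
    the first word that is a known category. Falls back to `default`
    when nothing matches or the reply is empty.
    """
    words = re.findall(r"[a-z]+", (raw_reply or "").lower())
    for word in words:
        if word in CATEGORIES:
            return word
    return default
```

Inside _classify_query you could pass response.choices[0].message.content through this function instead of comparing the stripped string directly.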

Step 5: Add Fallback and Retry Logic

Production applications need resilience. Let's add automatic fallback if one model fails:

class ResilientMultiLLM(MultiLLMRouter):
    """Multi-LLM router with automatic fallback and retry logic."""
    
    def generate_with_fallback(self, prompt: str, max_attempts: int = 3) -> dict:
        """Generate response with automatic fallback to alternative models."""
        query_type = self._classify_query(prompt)
        candidate_models = self.MODEL_SPECIALIZATIONS[query_type]
        
        last_error = None
        
        for attempt in range(max_attempts):
            # Try each model in order of preference
            model_index = attempt % len(candidate_models)
            model = candidate_models[model_index]
            
            try:
                response = self.client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    temperature=0.7
                )
                
                if attempt > 0:
                    print(f"⚠️  Fallback successful after {attempt} retries, using {model}")
                
                return {
                    "response": response.choices[0].message.content,
                    "model_used": model,
                    "query_type": query_type,
                    "attempts": attempt + 1,
                    "prompt_tokens": response.usage.prompt_tokens,
                    "completion_tokens": response.usage.completion_tokens,
                    # Same key as generate() returns, so both results have the same shape
                    "total_tokens": response.usage.total_tokens,
                }
                
            except Exception as e:
                last_error = e
                print(f"⚠️  Attempt {attempt + 1} failed with {model}: {e}")
                continue
        
        # Chain the last error so the original traceback is preserved
        raise RuntimeError(f"All models failed. Last error: {last_error}") from last_error

With this pattern, your application stays up even if a specific model provider has an outage. The unified API makes it trivial to swap between providers.
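
The class above moves to the next model immediately after a failure. For transient errors such as rate limits, it can help to retry the same model after a short pause before giving up on it. The sketch below shows that variation with exponential backoff. It is not the class above: call_with_fallback, call_model, and the delay values are hypothetical, and call_model is any function you supply that takes a model name and returns the reply.

```python
import time


def call_with_fallback(call_model, candidate_models, retries_per_model=2,
                       base_delay=0.5, sleep=time.sleep):
    """Try each model in order, retrying each with exponential backoff.

    Returns (model_name, reply) for the first successful call.
    Raises RuntimeError listing every failure if all attempts fail.
    `sleep` is injectable so tests do not have to wait.
    """
    errors = []
    for model in candidate_models:
        for attempt in range(retries_per_model):
            try:
                return model, call_model(model)
            except Exception as exc:
                errors.append(f"{model} (attempt {attempt + 1}): {exc}")
                # Pause before retrying the same model, doubling each time.
                # No pause after the last retry: move on to the next model.
                if attempt < retries_per_model - 1:
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

In production you would probably catch narrower exception types, since retrying an authentication error only delays the failure.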

Step 6: Cost Tracking Dashboard

One of the biggest advantages of using Chinese AI models through Haotokai is the cost savings. Let's build a simple cost tracker to see exactly how much you're saving:

class CostTrackingLLM(ResilientMultiLLM):
    """Multi-LLM router with built-in cost tracking and savings analysis."""
    
    # Pricing per 1M tokens (input, output) - Haotokai prices
    MODEL_PRICING = {
        "deepseek-v3": (0.14, 0.28),
        "qwen-plus": (0.10, 0.20),
        "qwen-turbo": (0.03, 0.06),
        "zhipu-glm4": (0.10, 0.20),
        "moonshot-v1-8k": (0.12, 0.24),
        "yi-large": (0.15, 0.30),
    }
    
    # GPT-4 pricing for comparison
    GPT4_PRICING = (5.00, 15.00)
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.total_cost = 0.0
        self.total_tokens = 0
        self.call_count = 0
    
    def generate(self, *args, **kwargs):
        result = super().generate(*args, **kwargs)
        self._track_costs(result)
        return result
    
    def _track_costs(self, result: dict):
        model = result["model_used"]
        if model not in self.MODEL_PRICING:
            return
        
        input_price, output_price = self.MODEL_PRICING[model]
        cost = (
            result["prompt_tokens"] * input_price / 1_000_000 +
            result["completion_tokens"] * output_price / 1_000_000
        )
        
        # Calculate what this would have cost with GPT-4
        gpt4_cost = (
            result["prompt_tokens"] * self.GPT4_PRICING[0] / 1_000_000 +
            result["completion_tokens"] * self.GPT4_PRICING[1] / 1_000_000
        )
        
        self.total_cost += cost
        self.total_tokens += result["total_tokens"]
        self.call_count += 1
        
        result["cost"] = cost
        result["gpt4_equivalent_cost"] = gpt4_cost
        result["savings"] = gpt4_cost - cost
        # Use the stored savings value; a bare `savings` name would raise NameError
        result["savings_percent"] = (result["savings"] / gpt4_cost * 100) if gpt4_cost > 0 else 0
    
    def print_summary(self):
        """Print cost and savings summary."""
        # Rough estimate: assumes GPT-4 costs about 35x more per token
        gpt4_multiplier = 35
        gpt4_equivalent = self.total_cost * gpt4_multiplier
        savings = gpt4_equivalent - self.total_cost
        savings_percent = 100 - 100 / gpt4_multiplier
        
        print("\n" + "="*60)
        print("📊 COST & SAVINGS SUMMARY")
        print("="*60)
        print(f"Total API calls: {self.call_count}")
        print(f"Total tokens used: {self.total_tokens:,}")
        print(f"Total cost with Haotokai: ${self.total_cost:.4f}")
        print(f"Estimated cost with GPT-4: ~${gpt4_equivalent:.2f}")
        print(f"Estimated savings: ~${savings:.2f} ({savings_percent:.1f}%)")
        print("="*60)

Most developers are shocked when they see the actual numbers. Switching from GPT-4 to Chinese AI models through haotokai.com typically cuts AI costs by 90-95%, without sacrificing much quality for most use cases.
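
To make the arithmetic concrete, here is the per-request calculation on its own, using the prices from the MODEL_PRICING table above. Treat those figures as this tutorial's illustrative values: actual savings depend on current price lists and on your mix of models. The function names request_cost and savings_percent are hypothetical.

```python
def request_cost(prompt_tokens, completion_tokens, price_per_million):
    """Dollar cost of one request, given (input, output) prices per 1M tokens."""
    input_price, output_price = price_per_million
    return (prompt_tokens * input_price
            + completion_tokens * output_price) / 1_000_000


def savings_percent(cost, baseline_cost):
    """Percentage saved relative to the baseline; 0 when the baseline is 0."""
    if baseline_cost <= 0:
        return 0.0
    return (baseline_cost - cost) / baseline_cost * 100


if __name__ == "__main__":
    # A request with 1,000 prompt tokens and 500 completion tokens.
    cheap = request_cost(1_000, 500, (0.14, 0.28))      # deepseek-v3 row
    baseline = request_cost(1_000, 500, (5.00, 15.00))  # GPT4_PRICING row
    print(f"cost: ${cheap:.5f}, baseline: ${baseline:.5f}")
    print(f"savings: {savings_percent(cheap, baseline):.1f}%")
```

Because output tokens cost more than input tokens in both price rows, the ratio shifts with how long your responses are, which is one reason to track real usage rather than rely on a fixed multiplier.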

Step 7: JavaScript/TypeScript Version

If you're a JavaScript developer, don't worry: you can use the exact same approach with the OpenAI Node.js library.

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.HAOTOKAI_API_KEY,
  baseURL: 'https://api.haotokai.com/v1',
});

async function compareModels(prompt) {
  const models = ['deepseek-v3', 'qwen-plus', 'zhipu-glm4', 'moonshot-v1-8k'];
  const results = {};
  
  for (const model of models) {
    try {
      const response = await client.chat.completions.create({
        model: model,
        messages: [{ role: 'user', content: prompt }],
      });
      results[model] = response.choices[0].message.content;
    } catch (error) {
      results[model] = `Error: ${error.message}`;
    }
  }
  
  return results;
}

The API is 100% compatible with the OpenAI SDK, so any code you've written for OpenAI will work with Haotokai with just a base URL change.

Production-Ready Architecture

Here's what a full production multi-LLM architecture looks like when built on Haotokai's unified API:

┌─────────────────────────────────────────────────┐
│                 Your Application                │
├─────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Smart Router │ │ Cost Tracker │ │  Caching  │ │
│ └──────────────┘ └──────────────┘ └───────────┘ │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│              Haotokai Unified API               │
│          (OpenAI-compatible endpoint)           │
├─────────────────────────────────────────────────┤
│   DeepSeek  │  Qwen  │  ZhipuGLM  │  Moonshot   │
│     ... 20+ Chinese AI models available ...     │
└─────────────────────────────────────────────────┘

Best Practices for Multi-LLM Apps

  1. Start with routing by task type: Use the cheapest model that can handle the task
  2. Add caching: Cache frequent queries to reduce costs even further
  3. Implement gradual rollout: Test new models on a small percentage of traffic first
  4. Monitor quality: Use human evaluation or automated metrics to ensure quality stays high
  5. Track costs per model: Different models have different prices, so tune your routing accordingly
  6. Use a unified API: Managing 20+ individual APIs is not worth the engineering cost
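
The caching box in the architecture diagram hasn't appeared in code yet, so here is one minimal way to sketch practice 2. It is an in-memory, exact-match cache: ResponseCache is a hypothetical class, and generate_fn is any function you supply that takes (model, prompt) and returns the reply text. A real deployment would likely add expiry and a shared store.

```python
import hashlib


class ResponseCache:
    """In-memory cache for model replies, keyed by model and exact prompt."""

    def __init__(self, generate_fn):
        self._generate_fn = generate_fn  # called only on a cache miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt):
        # Hash model + prompt so keys stay small even for long prompts.
        # Only surrounding whitespace is trimmed; case and inner spacing
        # are kept because they can change the meaning of a prompt.
        payload = f"{model}\n{prompt.strip()}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, prompt):
        """Return the cached reply, generating and storing it on a miss."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        reply = self._generate_fn(model, prompt)
        self._store[key] = reply
        return reply
```

Caching like this fits best when repeated identical answers are acceptable, for example FAQ-style queries. With temperature=0.7, as used in this tutorial, users would otherwise see varied replies to the same prompt.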

Where to Go From Here

You just built a production-ready multi-LLM application in minutes. Here's what to try next:

Final Thoughts

The future of AI isn't one superintelligent model. It's a diverse ecosystem of specialized models, each optimized for different tasks, languages, and price points. Chinese AI models are already giving Western providers a run for their money on both quality and cost.

Building a multi-LLM application used to require weeks of integration work. With a unified API platform like Haotokai, you can do it in 10 minutes. The OpenAI-compatible format means you don't have to rewrite anything: just point your existing code at a different endpoint and you're off to the races.

The biggest cost in AI development isn't the API tokens, it's the engineering time. A unified API saves you both.

Ready to build your multi-LLM application? Sign up at haotokai.com to get instant access to 20+ Chinese AI models through one simple API. Your wallet (and your users) will thank you.
