Qwen API Tutorial: Building Apps with Alibaba's Qwen Models

June 10, 2025 · 15 min read · Haotokai Team

Alibaba's Qwen (通义千问) series of large language models has emerged as a powerful alternative to Western AI models, offering impressive capabilities at competitive prices. With strong performance in both Chinese and English, extensive context windows, and specialized variants for different use cases, Qwen has become a go-to choice for developers building applications for global markets.

In this tutorial, we'll dive deep into the Qwen API, exploring the different models available, understanding their strengths, and walking through practical code examples you can use in your own applications.

What is Qwen (通义千问)?

Qwen (pronounced "chwen"), short for 通义千问 (Tōngyì Qiānwèn), is Alibaba Cloud's family of large language models. Developed by the Tongyi Lab of Alibaba Group, Qwen models are trained on massive multilingual datasets and offer state-of-the-art performance across a wide range of natural language processing tasks.

What sets Qwen apart from other models is its exceptional bilingual capability: it's one of the few models that performs at a high level in both Chinese and English. This makes it ideal for applications targeting both Western and Chinese markets.

๐ŸŒ Bilingual Excellence

Top-tier performance in both Chinese and English, with strong multilingual support for many other languages.

๐Ÿ“ Large Context

Up to 128K token context window for processing long documents, entire codebases, and complex conversations.

โšก Fast & Efficient

Optimized inference with competitive pricing, making it cost-effective for production workloads.

๐Ÿงฉ Model Variety

Specialized models for chat, coding, math, multimodal understanding, and more.

Qwen Model Family: Which One to Choose?

Alibaba offers an extensive lineup of Qwen models. Here's a breakdown of the most important ones for developers:

Model | Context | Best For | Key Strength
qwen2.5-72b-instruct | 128K | General-purpose, complex tasks | Best overall quality
qwen2.5-32b-instruct | 128K | Balanced performance & cost | Great value
qwen2.5-14b-instruct | 128K | High-volume applications | Fast & affordable
qwen2.5-7b-instruct | 128K | Simple tasks, prototyping | Ultra cost-effective
qwen2.5-coder-32b-instruct | 128K | Code generation & analysis | Best for coding
qwen2.5-math-72b-instruct | 128K | Mathematical reasoning | Best for math
qwen-vl-max | 32K | Vision-language tasks | Multimodal understanding
qwen-long | 1M+ | Extremely long documents | Massive context window

Pro Tip: Start with 32B

For most production applications, we recommend starting with qwen2.5-32b-instruct. It strikes an excellent balance between quality and cost, often performing close to the 72B model at a fraction of the price. Scale up to 72B only if you find the 32B model insufficient for your use case.

Qwen 2.5 Improvements

The Qwen 2.5 series (released in late 2024) brought significant improvements over previous versions.

Getting Started with Qwen API

Access Options

There are two primary ways to access Qwen models via API:

  1. Alibaba DashScope (direct): Alibaba's official API platform
  2. Haotokai (unified API): access Qwen alongside 50+ other models with one API key, with PayPal support

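For developers outside of China, we recommend Haotokai because it offers an OpenAI-compatible API, one API key for Qwen and 50+ other models, and PayPal and other international payment methods.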

Quick Start with Haotokai

Here's how to get up and running with Qwen in minutes using Haotokai:

# 1. Get your API key from https://www.haotokai.com
# 2. Set it as an environment variable
export HAOTOKAI_API_KEY="your-api-key-here"

# 3. Make a test call
curl -X POST https://www.haotokai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HAOTOKAI_API_KEY" \
  -d '{
    "model": "qwen2.5-32b-instruct",
    "messages": [{"role": "user", "content": "Hello, Qwen!"}]
  }'

Core API Concepts

API Endpoints

When using Haotokai's OpenAI-compatible interface, you'll primarily use these endpoints:

Endpoint | Method | Purpose
/v1/chat/completions | POST | Chat-based interactions (most common)
/v1/completions | POST | Text completion (legacy format)
/v1/embeddings | POST | Generate text embeddings
/v1/models | GET | List available models
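
As a rough illustration of how these endpoints map to HTTP requests, the sketch below builds (but does not send) two requests with Python's standard library. The helper name `build_request` is hypothetical; the base URL is the one used in the quick start above.

```python
import json
import urllib.request

BASE_URL = "https://www.haotokai.com/v1"  # base URL from the quick start


def build_request(path, api_key, payload=None):
    """Build a urllib Request: GET when there is no payload, POST otherwise."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(f"{BASE_URL}{path}", data=data)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# GET /v1/models lists the available models
list_models = build_request("/models", "your-api-key-here")

# POST /v1/chat/completions sends a conversation
chat = build_request(
    "/chat/completions",
    "your-api-key-here",
    {
        "model": "qwen2.5-32b-instruct",
        "messages": [{"role": "user", "content": "Hello, Qwen!"}],
    },
)

print(list_models.get_method(), list_models.full_url)
print(chat.get_method(), chat.full_url)
# To actually send a request: urllib.request.urlopen(chat, timeout=60)
```

Keeping request construction separate from sending makes it easy to inspect what will go over the wire before spending any tokens.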

Message Format

The chat completions endpoint uses a conversation format with message roles:
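
A minimal sketch of that format, assuming the OpenAI-compatible conventions used throughout this tutorial: each message is a dict with a `role` and a `content`. The `system`, `user`, and `tool` roles appear in the examples in this tutorial, and `assistant` is the role OpenAI-compatible APIs use for model replies. The helper `add_turn` is hypothetical.

```python
# Each message is a dict with a "role" and its "content".
conversation = [
    # system: sets the assistant's behavior for the whole conversation
    {"role": "system", "content": "You are a helpful assistant."},
    # user: what the end user typed
    {"role": "user", "content": "What is Qwen?"},
    # assistant: an earlier model reply, included so the model has context
    {"role": "assistant",
     "content": "Qwen is Alibaba Cloud's family of large language models."},
]

KNOWN_ROLES = {"system", "user", "assistant", "tool"}


def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    if role not in KNOWN_ROLES:
        raise ValueError(f"unknown role: {role}")
    return messages + [{"role": role, "content": content}]


conversation = add_turn(conversation, "user", "Which model should I start with?")
print([message["role"] for message in conversation])
```

The API is stateless in this format, so the full conversation is sent with every request.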

Key Parameters

Understanding these parameters will help you get the best results from Qwen:
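
The parameters that appear in this tutorial's requests are `model`, `messages`, `temperature`, `max_tokens`, and `stream`. The sketch below assembles them into a payload with basic sanity checks. The helper name and the 0 to 2 temperature range are assumptions borrowed from OpenAI's API conventions, so check the provider documentation for the limits that apply to each Qwen model.

```python
def build_chat_payload(messages, model="qwen2.5-32b-instruct",
                       temperature=0.7, max_tokens=2000, stream=False):
    """Assemble a chat completions payload with basic sanity checks."""
    # Assumed range, following OpenAI conventions: lower is more deterministic.
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "model": model,              # which Qwen model answers
        "messages": messages,        # the conversation so far
        "temperature": temperature,  # sampling randomness
        "max_tokens": max_tokens,    # cap on generated tokens
        "stream": stream,            # True to receive incremental chunks
    }


# A low temperature and a small token cap suit short factual answers.
factual = build_chat_payload(
    [{"role": "user", "content": "Define RAG in one sentence."}],
    temperature=0.2,
    max_tokens=200,
)
print(factual)
```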

Code Examples: Practical Qwen Integrations

Basic Python Integration

Here's a complete Python example using the popular requests library:

import os
import requests

class QwenClient:
    """Client for interacting with Qwen models via Haotokai API."""
    
    def __init__(self, api_key=None, base_url="https://www.haotokai.com/v1"):
        self.api_key = api_key or os.getenv("HAOTOKAI_API_KEY")
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
    
    def chat(self, messages, model="qwen2.5-32b-instruct", 
             temperature=0.7, max_tokens=2000, stream=False):
        """
        Send a chat completion request to Qwen.
        
        Args:
            messages: List of message dicts with 'role' and 'content'
            model: Qwen model name
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate
            stream: Whether to stream the response
            
        Returns:
            Response dict or generator if streaming
        """
        payload = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": stream
        }
        
        if stream:
            return self._stream_chat(payload)
        
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            timeout=60
        )
        response.raise_for_status()
        return response.json()
    
    def _stream_chat(self, payload):
        """Handle streaming responses."""
        import json
        
        payload["stream"] = True
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            stream=True,
            timeout=60
        )
        response.raise_for_status()
        
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break
                    try:
                        yield json.loads(data)
                    except json.JSONDecodeError:
                        pass


# Example Usage
if __name__ == "__main__":
    client = QwenClient()
    
    # Basic chat
    messages = [
        {"role": "system", "content": "You are a helpful software engineering assistant."},
        {"role": "user", "content": "Write a Python function to reverse a string recursively."}
    ]
    
    result = client.chat(messages, model="qwen2.5-coder-32b-instruct")
    print("Qwen's response:")
    print(result["choices"][0]["message"]["content"])
    
    # Token usage
    usage = result["usage"]
    print(f"\nTokens used: {usage['total_tokens']} "
          f"(prompt: {usage['prompt_tokens']}, "
          f"completion: {usage['completion_tokens']})")

Using OpenAI SDK with Qwen

Because Haotokai uses an OpenAI-compatible API, you can use the official OpenAI Python SDK with Qwen:

import os

from openai import OpenAI

# Initialize with Haotokai base URL
client = OpenAI(
    api_key=os.getenv("HAOTOKAI_API_KEY"),
    base_url="https://www.haotokai.com/v1"
)

# Use just like you would with OpenAI
response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

# Streaming works too
stream = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[{"role": "user", "content": "Tell me a story about space exploration."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Developer Benefit

Using Haotokai's OpenAI-compatible API means you can switch between Qwen, DeepSeek, Claude, and other models without rewriting your code. Just change the model parameter, which makes it easy to A/B test different models and find the best fit for your use case.
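
A small sketch of what such a comparison could look like. The `compare_models` helper and the `ask` callback are hypothetical; in a real run, `ask` would be a thin wrapper around `client.chat.completions.create`, and a stand-in is used here so the example runs offline.

```python
def compare_models(prompt, models, ask):
    """Run the same prompt against several models and collect the replies.

    `ask(model, messages)` is a caller-supplied function that returns text.
    """
    messages = [{"role": "user", "content": prompt}]
    return {model: ask(model, messages) for model in models}


# Stand-in for a real API call; it only echoes the model name and prompt.
def fake_ask(model, messages):
    return f"[{model}] reply to: {messages[0]['content']}"


results = compare_models(
    "Summarize RAG in one sentence.",
    ["qwen2.5-32b-instruct", "qwen2.5-72b-instruct"],
    fake_ask,
)
for model, reply in results.items():
    print(model, "->", reply)
```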

Function Calling with Qwen

Qwen models support function calling (tool use), enabling you to connect the model to external systems:

import json


def get_product_price(product_name):
    """Simulated function to look up product prices."""
    prices = {
        "qwen-pro": "$29/month",
        "qwen-enterprise": "Custom pricing",
        "haotokai-basic": "$9.99/month"
    }
    return prices.get(product_name.lower(), "Price not available")


# Function calling with Qwen
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_product_price",
            "description": "Get the price of a product by name",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_name": {
                        "type": "string",
                        "description": "The name of the product"
                    }
                },
                "required": ["product_name"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "How much does Haotokai basic plan cost?"}
]

response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    func_name = tool_call.function.name
    # Arguments arrive as a JSON string; parse them, never eval() model output
    func_args = json.loads(tool_call.function.arguments)
    
    print(f"Model called function: {func_name}")
    print(f"Arguments: {func_args}")
    
    # Execute the function
    result = get_product_price(func_args["product_name"])
    
    # Send the result back to the model
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result)
    })
    
    # Get final response
    final_response = client.chat.completions.create(
        model="qwen2.5-32b-instruct",
        messages=messages
    )
    
    print("\nFinal response:")
    print(final_response.choices[0].message.content)
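
When more than one tool is registered, a dispatch table keeps the execution step tidy. The sketch below is one possible shape, not part of any SDK: `TOOL_REGISTRY` and `run_tool` are hypothetical names, and the price lookup is repeated in shortened form so the block runs on its own.

```python
import json


def get_product_price(product_name):
    """Shortened copy of the simulated lookup above."""
    prices = {"haotokai-basic": "$9.99/month"}
    return prices.get(product_name.lower(), "Price not available")


# Maps tool names (as declared in `tools`) to Python callables.
TOOL_REGISTRY = {"get_product_price": get_product_price}


def run_tool(name, arguments_json):
    """Parse the model-supplied JSON arguments and call the matching function."""
    if name not in TOOL_REGISTRY:
        return f"Unknown tool: {name}"
    try:
        kwargs = json.loads(arguments_json)
        return str(TOOL_REGISTRY[name](**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature
        return f"Invalid arguments: {exc}"


print(run_tool("get_product_price", '{"product_name": "haotokai-basic"}'))
```

Returning an error string instead of raising lets the error be sent back as the tool message, so the model can correct its own call.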

Embeddings for RAG Applications

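Embeddings are the retrieval half of a retrieval-augmented generation (RAG) system. The example below reuses the client from the OpenAI SDK section, requires NumPy, and requests embeddings from the text-embedding-v2 model: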

import numpy as np

def get_embeddings(texts, model="text-embedding-v2"):
    """
    Generate embeddings for a list of texts using Qwen.
    """
    response = client.embeddings.create(
        model=model,
        input=texts
    )
    return [item.embedding for item in response.data]


# Example: Building a simple document search
documents = [
    "Qwen 2.5-72B is Alibaba's flagship large language model.",
    "Haotokai provides unified API access to multiple AI models including Qwen.",
    "Qwen models support both Chinese and English at high quality levels.",
    "The Qwen 2.5 series offers up to 128K token context windows.",
    "Haotokai supports PayPal payments for AI API access."
]

# Generate embeddings for all documents
doc_embeddings = get_embeddings(documents)

# Search function
def search_documents(query, top_k=2):
    query_embedding = get_embeddings([query])[0]
    
    # Calculate cosine similarity
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings):
        similarity = np.dot(query_embedding, doc_emb) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
        )
        similarities.append((i, similarity))
    
    # Sort by similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return [(documents[i], score) for i, score in similarities[:top_k]]


# Test search
results = search_documents("How can I pay for Qwen API?")
for doc, score in results:
    print(f"[{score:.4f}] {doc}")
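
To complete the RAG loop, the retrieved documents are placed into the prompt before the question is sent to a chat model. A minimal sketch follows, assuming input in the `(document, score)` shape returned by `search_documents`. The helper name is hypothetical, and the scores in the example are placeholders rather than real similarity values.

```python
def build_rag_messages(question, retrieved):
    """Turn (document, score) pairs into a grounded chat prompt."""
    context = "\n".join(
        f"[{i}] {doc}" for i, (doc, _score) in enumerate(retrieved, start=1)
    )
    return [
        {
            "role": "system",
            "content": "Answer using only the numbered context. "
                       "If the answer is not there, say you don't know.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        },
    ]


# Stand-in for the output of search_documents(...); scores are placeholders.
retrieved = [
    ("Haotokai supports PayPal payments for AI API access.", 0.0),
    ("Haotokai provides unified API access to multiple AI models including Qwen.", 0.0),
]
rag_messages = build_rag_messages("How can I pay for Qwen API?", retrieved)
print(rag_messages[1]["content"])
```

The resulting list can be passed as `messages` to a chat completion call.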

Advanced Features & Capabilities

Long Context Window Usage

Qwen's 128K context window allows you to process entire documents in a single prompt. Here are patterns for leveraging this effectively:
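
One such pattern, as a rough sketch: check whether a document fits in the window before sending it, and fall back to paragraph-level splitting when it does not. The four-characters-per-token ratio is a crude assumption for English text and not a real tokenizer (Chinese text behaves differently), and both function names are hypothetical.

```python
CONTEXT_LIMIT = 128_000  # tokens, per the model table above
CHARS_PER_TOKEN = 4      # rough assumption for English; not a real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def split_for_context(text, reserved_tokens=4_000, limit=CONTEXT_LIMIT):
    """Return the text whole if it fits, else split it on paragraph boundaries.

    `reserved_tokens` leaves room for instructions and the model's reply.
    A single paragraph larger than the budget is kept as one oversized chunk.
    """
    budget_chars = (limit - reserved_tokens) * CHARS_PER_TOKEN
    if len(text) <= budget_chars:
        return [text]
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) + 2 > budget_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


document = "First section.\n\nSecond section."
chunks = split_for_context(document)
print(len(chunks), estimate_tokens(document))
```

For accurate budgeting, replace the estimate with the token counts reported in the API's `usage` field or with the model's own tokenizer.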

Multimodal Capabilities (Qwen-VL)

Qwen-VL extends text models with vision capabilities, enabling:
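
As an illustration of what a vision request might look like, the sketch below builds a user message that pairs text with an inline image. It assumes the endpoint accepts OpenAI's vision message format (`image_url` content parts with a base64 data URL) for `qwen-vl-max`, which this tutorial does not confirm. The helper name is hypothetical and the bytes are a placeholder, not a real image.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message that pairs a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes; in practice, read them with open(path, "rb").read()
placeholder = b"\x89PNG placeholder bytes"
message = image_message("Describe this chart.", placeholder)
payload = {"model": "qwen-vl-max", "messages": [message]}
print(payload["messages"][0]["content"][0])
```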

Structured Output

Qwen excels at generating structured output (JSON, XML, YAML), which is crucial for building reliable integrations. Use a system prompt like:

messages = [
    {
        "role": "system",
        "content": """You are a data extraction assistant. 
        Always respond with valid JSON in the following format:
        {
            "entities": [{"name": string, "type": string, "confidence": float}],
            "summary": string,
            "sentiment": "positive" | "negative" | "neutral"
        }
        Do not include any text outside the JSON object."""
    },
    {
        "role": "user",
        "content": "Analyze this product review: 'The Qwen API is amazing! It's fast, affordable, and the quality is top-notch. I had a small issue with documentation but overall very happy.'"
    }
]
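
Even with a strict system prompt, the reply is still text, so it is worth validating before using it downstream. A minimal sketch, assuming the schema from the prompt above: `parse_extraction` is a hypothetical helper, and `sample_reply` is a hand-written illustration, not actual model output.

```python
import json

REQUIRED_KEYS = {"entities", "summary", "sentiment"}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_extraction(reply_text):
    """Parse the model's reply and verify it matches the requested schema."""
    data = json.loads(reply_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['sentiment']}")
    return data


# Hand-written example of a well-formed reply
sample_reply = (
    '{"entities": [{"name": "Qwen API", "type": "product", "confidence": 0.9}], '
    '"summary": "Mostly positive review with a minor documentation issue.", '
    '"sentiment": "positive"}'
)
parsed = parse_extraction(sample_reply)
print(parsed["sentiment"], len(parsed["entities"]))
```

If validation fails, one option is to retry the request with the error message appended to the conversation.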

Best Practices for Qwen Integration

Prompt Engineering for Qwen

While Qwen is capable, good prompt engineering still yields better results:

  1. Be specific: clearly state what you want, how you want it formatted, and any constraints
  2. Use role prompting: assign a specific role to the model (e.g., "You are an expert Python developer")
  3. Provide examples: few-shot learning dramatically improves output quality for structured tasks
  4. Ask for reasoning: for complex problems, ask the model to think step-by-step
  5. Iterate: test with real inputs and refine your prompts based on actual behavior
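
Role prompting and few-shot examples from the list above can be combined in one message list. The sketch below is one way to do it: the helper name is hypothetical and the example reviews are invented for illustration.

```python
def few_shot_messages(instruction, examples, query):
    """Build messages: role prompt, worked examples, then the real input."""
    messages = [{"role": "system", "content": instruction}]
    for example_input, example_output in examples:
        # Each example is shown as a user turn followed by the ideal reply
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Great product, fast shipping!", "positive"),
    ("Arrived broken and support never replied.", "negative"),
]
prompt_messages = few_shot_messages(
    "You are a sentiment classifier. "
    "Reply with one word: positive, negative, or neutral.",
    examples,
    "It works, nothing special.",
)
print([message["role"] for message in prompt_messages])
```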

Production Considerations
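
One consideration that applies to most production deployments is handling transient failures such as timeouts or rate limits. The sketch below shows a generic retry wrapper with exponential backoff and jitter. It is an illustration rather than a Haotokai or Qwen feature, the names are hypothetical, and a real deployment would likely retry only on errors known to be transient.

```python
import random
import time


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry on exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow 1s, 2s, 4s, ... plus up to 25% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))


# Simulated flaky call: fails twice, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record delays instead of actually sleeping
outcome = with_retries(flaky, sleep=delays.append)
print(outcome, len(delays))
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting.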

Cost Optimization

Use these strategies to get the most value from Qwen:
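
One such strategy, sketched here under the assumption that identical requests recur (for example, repeated FAQ-style questions at temperature 0), is to cache responses so the same payload is only billed once. The `ResponseCache` class is hypothetical, and `fake_call` stands in for a real API call.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by the exact request payload."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(payload):
        # sort_keys makes the key independent of dict ordering
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def get_or_call(self, payload, call):
        """Return the cached response, or invoke `call(payload)` and store it."""
        key = self._key(payload)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self._store[key] = call(payload)
        return self._store[key]


calls = []


def fake_call(payload):
    calls.append(payload)
    return {"content": "placeholder reply"}


cache = ResponseCache()
request = {
    "model": "qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "Define RAG."}],
    "temperature": 0,
}
first = cache.get_or_call(request, fake_call)
second = cache.get_or_call(request, fake_call)
print(len(calls), cache.hits)
```

Caching only makes sense when repeated answers are acceptable; with a high temperature, users may expect varied output.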

Access Qwen Through Haotokai's Unified API

Haotokai makes it easy to integrate Qwen into your application while also giving you the flexibility to use other models as needed. Here's why developers choose Haotokai:

One API, Endless Possibilities

With Haotokai, you get access to Qwen and 50+ other models, all through a single, OpenAI-compatible API endpoint.

PayPal & International Payments

No Chinese bank account required. Haotokai supports PayPal, credit cards, and other international payment methods, making it easy for developers worldwide to access Qwen and other Chinese AI models.

Developer-Friendly Features

Start Building with Qwen Today

Get instant access to Qwen models and 50+ other AI APIs through Haotokai. Pay with PayPal, use our OpenAI-compatible endpoint, and build faster.


Conclusion

Alibaba's Qwen models have firmly established themselves as world-class LLMs, offering impressive performance, massive context windows, and competitive pricing. Whether you're building chatbots, code assistants, content generation tools, or RAG systems, Qwen has a model that fits your needs.

The easiest way to get started with Qwen, especially for international developers, is through Haotokai's unified API. With OpenAI compatibility, PayPal support, and access to dozens of models, Haotokai simplifies AI integration and gives you the flexibility to choose the best model for each task.

Ready to start building? Head over to haotokai.com, grab your API key, and start experimenting with Qwen today. The possibilities are endless.