Qwen API Tutorial: Building Apps with Alibaba's Qwen Models

Alibaba's Qwen (通义千问) series of large language models has emerged as a powerful alternative to Western AI models, offering impressive capabilities at competitive prices. With strong performance in both Chinese and English, extensive context windows, and specialized variants for different use cases, Qwen has become a go-to choice for developers building applications for global markets.

In this tutorial, we'll dive deep into the Qwen API, exploring the different models available, understanding their strengths, and walking through practical code examples you can use in your own applications.

What is Qwen (通义千问)?
Qwen Model Family: Which One to Choose?
Getting Started with Qwen API
Core API Concepts
Code Examples: Practical Qwen Integrations
Advanced Features & Capabilities
Best Practices for Qwen Integration
Access Qwen Through Haotokai's Unified API

What is Qwen (通义千问)?

Qwen (pronounced "chwen"), short for 通义千问 (Tōngyì Qiānwèn), is Alibaba Cloud's family of large language models. Developed by the Tongyi Lab of Alibaba Group, Qwen models are trained on massive multilingual datasets and offer state-of-the-art performance across a wide range of natural language processing tasks.

What sets Qwen apart from many other models is its bilingual capability: it is built to perform at a high level in both Chinese and English. This makes it a good fit for applications targeting both Western and Chinese markets.

🌍 Bilingual Excellence

Top-tier performance in both Chinese and English, with strong multilingual support for many other languages.

📏 Large Context

Up to 128K token context window for processing long documents, entire codebases, and complex conversations.

⚡ Fast & Efficient

Optimized inference with competitive pricing, making it cost-effective for production workloads.

🧩 Model Variety

Specialized models for chat, coding, math, multimodal understanding, and more.

Qwen Model Family: Which One to Choose?

Alibaba offers an extensive lineup of Qwen models. Here's a breakdown of the most important ones for developers:

Model	Context	Best For	Key Strength
`qwen2.5-72b-instruct`	128K	General-purpose, complex tasks	Best overall quality
`qwen2.5-32b-instruct`	128K	Balanced performance & cost	Great value
`qwen2.5-14b-instruct`	128K	High-volume applications	Fast & affordable
`qwen2.5-7b-instruct`	128K	Simple tasks, prototyping	Ultra cost-effective
`qwen2.5-coder-32b-instruct`	128K	Code generation & analysis	Best for coding
`qwen2.5-math-72b-instruct`	4K	Mathematical reasoning	Best for math
`qwen-vl-max`	32K	Vision-language tasks	Multimodal understanding
`qwen-long`	1M+	Extremely long documents	Massive context window

💡 Pro Tip: Start with 32B

For most production applications, we recommend starting with qwen2.5-32b-instruct. It strikes an excellent balance between quality and cost — often performing close to the 72B model at a fraction of the price. Scale up to 72B only if you find the 32B model insufficient for your use case.

Qwen 2.5 Improvements

The Qwen 2.5 series (released in late 2024) brought significant improvements over previous versions:

Enhanced reasoning — Better performance on math, logic, and coding problems
Improved instruction following — More accurate adherence to complex prompts
Longer context — Standard 128K context across most models
Better multilingual support — Improved performance in over 29 languages
Faster inference — Optimized architecture for lower latency
Reduced hallucinations — More factually accurate responses

Getting Started with Qwen API

Access Options

There are two primary ways to access Qwen models via API:

Alibaba DashScope (direct) — Alibaba's official API platform
Haotokai (unified API) — Access Qwen alongside 50+ other models with one API key, PayPal support

For developers outside of China, we recommend Haotokai because it offers:

International payment methods including PayPal
An OpenAI-compatible API format (easier integration)
Access to multiple model providers through one interface
English-language documentation and support

Quick Start with Haotokai

Here's how to get up and running with Qwen in minutes using Haotokai:

# 1. Get your API key from https://www.haotokai.com
# 2. Set it as an environment variable
export HAOTOKAI_API_KEY="your-api-key-here"

# 3. Make a test call
curl -X POST https://www.haotokai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HAOTOKAI_API_KEY" \
  -d '{
    "model": "qwen2.5-32b-instruct",
    "messages": [{"role": "user", "content": "Hello, Qwen!"}]
  }'

Core API Concepts

API Endpoints

When using Haotokai's OpenAI-compatible interface, you'll primarily use these endpoints:

Endpoint	Method	Purpose
`/v1/chat/completions`	POST	Chat-based interactions (most common)
`/v1/completions`	POST	Text completion (legacy format)
`/v1/embeddings`	POST	Generate text embeddings
`/v1/models`	GET	List available models
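
As a rough sketch, the endpoints in the table can be wrapped in a small URL helper. The base URL is the one from the Quick Start; `endpoint_url`, `list_models_request`, and the short endpoint nicknames are hypothetical names chosen for illustration. The request is only built here, not sent.

```python
import os
import urllib.request

BASE_URL = "https://www.haotokai.com/v1"  # base URL from the Quick Start

# Paths from the endpoint table, keyed by a short (hypothetical) nickname.
ENDPOINTS = {
    "chat": "/chat/completions",
    "completions": "/completions",
    "embeddings": "/embeddings",
    "models": "/models",
}


def endpoint_url(name, base_url=BASE_URL):
    """Return the full URL for one of the endpoints listed in the table."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def list_models_request(api_key=None):
    """Build (but do not send) a GET request for the model list."""
    key = api_key or os.getenv("HAOTOKAI_API_KEY", "")
    return urllib.request.Request(
        endpoint_url("models"),
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the actual call; that step is left out so the sketch runs without network access.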

Message Format

The chat completions endpoint uses a conversation format with message roles:

system — Sets the behavior and persona of the assistant
user — Represents the end user's input
assistant — Represents the model's responses
tool — Results from function/tool calls
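
To make the role layout concrete, here is a minimal sketch that assembles a multi-turn conversation from the roles listed above. `make_message` and `build_conversation` are hypothetical helpers, not part of any SDK.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def make_message(role, content):
    """Create one chat message, rejecting roles the API does not define."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


def build_conversation(system_prompt, turns):
    """Build a message list from (user_text, assistant_text) pairs.

    The assistant_text of the final pair may be None, meaning the model
    has not answered that turn yet.
    """
    messages = [make_message("system", system_prompt)]
    for user_text, assistant_text in turns:
        messages.append(make_message("user", user_text))
        if assistant_text is not None:
            messages.append(make_message("assistant", assistant_text))
    return messages


history = build_conversation(
    "You are a helpful assistant.",
    [("Hello, Qwen!", "Hello! How can I help?"), ("What is RAG?", None)],
)
```

The resulting `history` list can be passed as the `messages` field of a chat completion request.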

Key Parameters

Understanding these parameters will help you get the best results from Qwen:

temperature (0-2) — Controls randomness. Lower values (0.1-0.3) for deterministic tasks, higher (0.7-1.0) for creative tasks.
top_p (0-1) — Nucleus sampling alternative to temperature. Controls diversity via cumulative probability.
max_tokens — Maximum number of tokens to generate in the response.
stream — Set to true for streaming responses (server-sent events).
frequency_penalty (-2 to 2) — Reduces repetition by penalizing frequent tokens.
presence_penalty (-2 to 2) — Encourages topic diversity by penalizing tokens that have already appeared.
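
One way to apply these parameters consistently is a pair of presets plus a range check before sending. This is only a sketch: the preset names and the exact values picked inside the recommended ranges are assumptions, and `build_payload` is a hypothetical helper.

```python
# Allowed ranges, taken from the parameter list above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}

# Illustrative presets: low temperature for deterministic tasks,
# higher temperature for creative ones.
PRESETS = {
    "deterministic": {"temperature": 0.2},
    "creative": {"temperature": 0.9},
}


def build_payload(model, messages, preset="deterministic", max_tokens=500, **overrides):
    """Merge a preset with explicit overrides and validate the ranges."""
    params = {**PRESETS[preset], **overrides}
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"model": model, "messages": messages, "max_tokens": max_tokens, **params}
```

For example, `build_payload("qwen2.5-32b-instruct", messages, preset="creative", presence_penalty=0.5)` yields a request body ready to post to the chat completions endpoint.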

Code Examples: Practical Qwen Integrations

Basic Python Integration

Here's a complete Python example using the popular requests library:

import os
import requests

class QwenClient:
    """Client for interacting with Qwen models via Haotokai API."""
    
    def __init__(self, api_key=None, base_url="https://www.haotokai.com/v1"):
        self.api_key = api_key or os.getenv("HAOTOKAI_API_KEY")
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
    
    def chat(self, messages, model="qwen2.5-32b-instruct", 
             temperature=0.7, max_tokens=2000, stream=False):
        """
        Send a chat completion request to Qwen.
        
        Args:
            messages: List of message dicts with 'role' and 'content'
            model: Qwen model name
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate
            stream: Whether to stream the response
            
        Returns:
            Response dict or generator if streaming
        """
        payload = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": stream
        }
        
        if stream:
            return self._stream_chat(payload)
        
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            timeout=60
        )
        response.raise_for_status()
        return response.json()
    
    def _stream_chat(self, payload):
        """Handle streaming responses."""
        import json
        
        payload["stream"] = True
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            stream=True,
            timeout=60
        )
        response.raise_for_status()
        
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break
                    try:
                        yield json.loads(data)
                    except json.JSONDecodeError:
                        pass


# Example Usage
if __name__ == "__main__":
    client = QwenClient()
    
    # Basic chat
    messages = [
        {"role": "system", "content": "You are a helpful software engineering assistant."},
        {"role": "user", "content": "Write a Python function to reverse a string recursively."}
    ]
    
    result = client.chat(messages, model="qwen2.5-coder-32b-instruct")
    print("Qwen's response:")
    print(result["choices"][0]["message"]["content"])
    
    # Token usage
    usage = result["usage"]
    print(f"\nTokens used: {usage['total_tokens']} "
          f"(prompt: {usage['prompt_tokens']}, "
          f"completion: {usage['completion_tokens']})")
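
When `stream=True`, the client above yields one parsed chunk per server-sent event. A minimal sketch for stitching those chunks back into the full reply follows; it assumes each chunk uses the OpenAI-style `choices[0].delta.content` layout, and `collect_stream` is a hypothetical helper.

```python
def collect_stream(chunks):
    """Concatenate the text deltas from an iterable of streamed chunk dicts."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry only metadata
        delta = choices[0].get("delta") or {}
        text = delta.get("content")
        if text:
            parts.append(text)
    return "".join(parts)


# Stand-in chunks shaped like the streamed payloads (not real API output).
sample_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": []},
]
full_text = collect_stream(sample_chunks)
```

With the real client, the call would look like `collect_stream(client.chat(messages, stream=True))`.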

Using OpenAI SDK with Qwen

Because Haotokai uses an OpenAI-compatible API, you can use the official OpenAI Python SDK with Qwen:

import os

from openai import OpenAI

# Initialize with Haotokai base URL
client = OpenAI(
    api_key=os.getenv("HAOTOKAI_API_KEY"),
    base_url="https://www.haotokai.com/v1"
)

# Use just like you would with OpenAI
response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

# Streaming works too
stream = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[{"role": "user", "content": "Tell me a story about space exploration."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

🔑 Developer Benefit

Using Haotokai's OpenAI-compatible API means you can switch between Qwen, DeepSeek, Claude, and other models without rewriting your code. Just change the model parameter — perfect for A/B testing different models to find the best fit for your use case.

Function Calling with Qwen

Qwen models support function calling (tool use), which lets you connect the model to external systems. This example reuses the OpenAI SDK client created in the previous section:

def get_product_price(product_name):
    """Simulated function to look up product prices."""
    prices = {
        "qwen-pro": "$29/month",
        "qwen-enterprise": "Custom pricing",
        "haotokai-basic": "$9.99/month"
    }
    return prices.get(product_name.lower(), "Price not available")


# Function calling with Qwen
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_product_price",
            "description": "Get the price of a product by name",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_name": {
                        "type": "string",
                        "description": "The name of the product"
                    }
                },
                "required": ["product_name"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "How much does Haotokai basic plan cost?"}
]

response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

import json  # needed to parse the tool-call arguments safely

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    func_name = tool_call.function.name
    # Arguments arrive as a JSON string; parse them with json.loads, never eval
    func_args = json.loads(tool_call.function.arguments)
    
    print(f"Model called function: {func_name}")
    print(f"Arguments: {func_args}")
    
    # Execute the function
    result = get_product_price(func_args["product_name"])
    
    # Send the result back to the model
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result)
    })
    
    # Get final response
    final_response = client.chat.completions.create(
        model="qwen2.5-32b-instruct",
        messages=messages
    )
    
    print("\nFinal response:")
    print(final_response.choices[0].message.content)
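
Once an application exposes more than one tool, a registry keeps the dispatch step tidy and avoids evaluating model output as code. The following is a sketch under the assumption that tool arguments arrive as a JSON string, as in the example above; `TOOL_REGISTRY` and `run_tool_call` are hypothetical names.

```python
import json


def get_product_price(product_name):
    """Same simulated price lookup as in the example above."""
    prices = {
        "qwen-pro": "$29/month",
        "qwen-enterprise": "Custom pricing",
        "haotokai-basic": "$9.99/month",
    }
    return prices.get(product_name.lower(), "Price not available")


# Map tool names (as declared in the `tools` list) to Python callables.
TOOL_REGISTRY = {"get_product_price": get_product_price}


def run_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Run one tool call and return a JSON string for the 'tool' message."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid arguments: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        result = func(**args)
    except TypeError as exc:  # missing or unexpected argument names
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})
```

Returning errors as JSON instead of raising lets the model see what went wrong and retry with corrected arguments.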

Embeddings for RAG Applications

Qwen embeddings work well for building retrieval-augmented generation (RAG) systems. The example reuses the OpenAI SDK client created earlier and needs NumPy installed:

import numpy as np

def get_embeddings(texts, model="text-embedding-v2"):
    """
    Generate embeddings for a list of texts using Qwen.
    """
    response = client.embeddings.create(
        model=model,
        input=texts
    )
    return [item.embedding for item in response.data]


# Example: Building a simple document search
documents = [
    "Qwen 2.5-72B is Alibaba's flagship large language model.",
    "Haotokai provides unified API access to multiple AI models including Qwen.",
    "Qwen models support both Chinese and English at high quality levels.",
    "The Qwen 2.5 series offers up to 128K token context windows.",
    "Haotokai supports PayPal payments for AI API access."
]

# Generate embeddings for all documents
doc_embeddings = get_embeddings(documents)

# Search function
def search_documents(query, top_k=2):
    query_embedding = get_embeddings([query])[0]
    
    # Calculate cosine similarity
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings):
        similarity = np.dot(query_embedding, doc_emb) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
        )
        similarities.append((i, similarity))
    
    # Sort by similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return [(documents[i], score) for i, score in similarities[:top_k]]


# Test search
results = search_documents("How can I pay for Qwen API?")
for doc, score in results:
    print(f"[{score:.4f}] {doc}")
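
Retrieval is only half of RAG; the retrieved passages still have to be placed into the prompt. Below is a minimal sketch of that step. It takes the `(document, score)` pairs returned by `search_documents`; the prompt wording, the character budget, and the name `build_rag_messages` are assumptions made for illustration.

```python
def build_rag_messages(question, retrieved, max_chars=2000):
    """Turn retrieved (document, score) pairs into chat messages.

    Passages are numbered so the model can refer to them, and passages
    that would exceed the character budget are dropped.
    """
    context_lines = []
    used = 0
    for idx, (doc, _score) in enumerate(retrieved, start=1):
        line = f"[{idx}] {doc}"
        if used + len(line) > max_chars:
            break
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the numbered context passages. "
                "If the answer is not in the context, say so."
            ),
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


rag_messages = build_rag_messages(
    "How can I pay for Qwen API?",
    [("Haotokai supports PayPal payments for AI API access.", 0.81)],
)
```

The score value in the usage line is a placeholder, not a measured similarity.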

Advanced Features & Capabilities

Long Context Window Usage

Qwen's 128K context window allows you to process entire documents in a single prompt. Here are patterns for leveraging this effectively:

Document summarization — Pass entire reports, articles, or books for comprehensive summaries
Codebase analysis — Feed multiple source files for cross-file understanding
Contract review — Analyze full legal documents for clause extraction
Conversation history — Maintain long-running chat sessions without context loss
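
Even with a 128K window, it helps to check that a document fits before sending it. The sketch below splits paragraphs into chunks that stay under a token budget. The four-characters-per-token figure is a rough heuristic for English text and is an assumption (Chinese text tokenizes differently); for exact counts, use the model's tokenizer. `estimate_tokens` and `split_for_context` are hypothetical helpers.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; assumes about 4 characters per token."""
    return int(len(text) / chars_per_token) + 1


def split_for_context(paragraphs, context_tokens=128_000, reserve_tokens=4_000):
    """Group paragraphs into chunks that fit the context window.

    reserve_tokens leaves room for the system prompt and the reply.
    A single paragraph larger than the budget still becomes its own
    (oversized) chunk, so callers may want to split it further.
    """
    budget = context_tokens - reserve_tokens
    chunks, current, current_tokens = [], [], 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        if current and current_tokens + cost > budget:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be summarized separately and the partial summaries combined in a final request.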

Multimodal Capabilities (Qwen-VL)

Qwen-VL extends text models with vision capabilities, enabling:

Image captioning and description
Chart and diagram understanding
Document scanning and OCR
Visual question answering
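
A vision request might be assembled as follows. This sketch assumes the endpoint accepts the OpenAI-style multimodal message layout (a `content` list mixing `text` and `image_url` parts, with the image passed as a base64 data URL); check the provider documentation before relying on it. `image_message` is a hypothetical helper.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message that pairs a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

In practice `image_bytes` would come from something like `open("chart.png", "rb").read()`, and the message would be sent with a vision model such as `qwen-vl-max`.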

Structured Output

Qwen excels at generating structured output (JSON, XML, YAML), which is crucial for building reliable integrations. Use a system prompt like:

messages = [
    {
        "role": "system",
        "content": """You are a data extraction assistant. 
        Always respond with valid JSON in the following format:
        {
            "entities": [{"name": string, "type": string, "confidence": float}],
            "summary": string,
            "sentiment": "positive" | "negative" | "neutral"
        }
        Do not include any text outside the JSON object."""
    },
    {
        "role": "user",
        "content": "Analyze this product review: 'The Qwen API is amazing! It's fast, affordable, and the quality is top-notch. I had a small issue with documentation but overall very happy.'"
    }
]
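
A system prompt alone does not guarantee valid JSON, so it is worth validating the reply before using it. The sketch below checks the format requested in the prompt above; `parse_analysis` is a hypothetical helper, and the tolerance for text around the JSON object (such as a Markdown code fence) is a defensive assumption.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_analysis(raw_text):
    """Extract and validate the JSON object described in the system prompt."""
    # Tolerate stray text or code fences around the object.
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(raw_text[start:end + 1])

    if not isinstance(data.get("entities"), list):
        raise ValueError("'entities' must be a list")
    for entity in data["entities"]:
        if not isinstance(entity, dict):
            raise ValueError("each entity must be an object")
        if not isinstance(entity.get("name"), str):
            raise ValueError("entity 'name' must be a string")
        if not isinstance(entity.get("type"), str):
            raise ValueError("entity 'type' must be a string")
        if not isinstance(entity.get("confidence"), (int, float)):
            raise ValueError("entity 'confidence' must be a number")
    if not isinstance(data.get("summary"), str):
        raise ValueError("'summary' must be a string")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError("'sentiment' must be positive, negative, or neutral")
    return data
```

On a `ValueError`, a common pattern is to resend the request once with the error message appended so the model can correct its output.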

Best Practices for Qwen Integration

Prompt Engineering for Qwen

While Qwen is capable, good prompt engineering still yields better results:

Be specific — Clearly state what you want, how you want it formatted, and any constraints
Use role prompting — Assign a specific role to the model (e.g., "You are an expert Python developer")
Provide examples — Few-shot learning dramatically improves output quality for structured tasks
Ask for reasoning — For complex problems, ask the model to think step-by-step
Iterate — Test with real inputs and refine your prompts based on actual behavior
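
Several of these tips can be combined in one prompt builder. The sketch below applies role prompting, few-shot examples, and an optional step-by-step instruction; the wording of that instruction and the name `few_shot_messages` are assumptions.

```python
def few_shot_messages(role_description, examples, query, think_step_by_step=False):
    """Build a prompt with a role, worked examples, and the real query.

    examples is a list of (input_text, expected_output) pairs that are
    replayed as earlier user/assistant turns.
    """
    system = role_description
    if think_step_by_step:
        system += " Think through the problem step by step before giving the final answer."
    messages = [{"role": "system", "content": system}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages


prompt = few_shot_messages(
    "You are an expert Python developer.",
    [("Name the built-in for list length.", "len")],
    "Name the built-in for sorting a list.",
)
```

Replaying examples as prior turns shows the model the exact output format to imitate, which is often more reliable than describing the format in words.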

Production Considerations

Implement retries with backoff — Network issues happen. Use exponential backoff for transient errors.
Set appropriate timeouts — Complex requests may take 30+ seconds, especially with long contexts.
Monitor token usage — Track costs by monitoring prompt and completion token counts.
Use caching — Cache frequent or expensive queries to reduce cost and latency.
Handle rate limits — Implement rate limiting on your end to stay within API quotas.
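
The retry advice above can be sketched as a small wrapper with exponential backoff and jitter. The set of retryable status codes, the delay values, and the names `TransientError` and `call_with_retries` are illustrative assumptions; the wrapped function is expected to raise `TransientError` when it sees a retryable response.

```python
import random
import time

# Status codes that usually indicate a temporary problem (an assumption;
# adjust to what the API actually returns).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class TransientError(Exception):
    """Raised by the request function for failures worth retrying."""

    def __init__(self, status_code):
        super().__init__(f"transient HTTP error {status_code}")
        self.status_code = status_code


def is_retryable(status_code):
    return status_code in RETRYABLE_STATUS


def call_with_retries(func, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    The delay doubles each attempt (base, 2*base, 4*base, ...) up to cap,
    plus up to 10% random jitter so many clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(cap, base * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

A request function would check `is_retryable(response.status_code)` and raise `TransientError` before calling `raise_for_status()`, so that only temporary failures are retried.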

Cost Optimization

Use these strategies to get the most value from Qwen:

Right-size your model — Use 7B or 14B for simple tasks, 32B for general use, and 72B only when needed
Optimize prompts — Keep system prompts and context concise but complete
Use Haotokai for volume discounts — Higher usage tiers get better rates
Batch processing — Combine multiple small requests when possible
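
Right-sizing can be expressed as a simple routing table, paired with a cost estimate computed from the `usage` block of each response. The task categories and the names `pick_model` and `estimate_cost` are hypothetical; prices are passed in by the caller because this sketch does not assume any particular rate.

```python
# Model names from the comparison table; the task categories are illustrative.
MODEL_TIERS = {
    "simple": "qwen2.5-7b-instruct",
    "high_volume": "qwen2.5-14b-instruct",
    "general": "qwen2.5-32b-instruct",
    "complex": "qwen2.5-72b-instruct",
    "code": "qwen2.5-coder-32b-instruct",
    "math": "qwen2.5-math-72b-instruct",
}


def pick_model(task_type, default="general"):
    """Return the model for a task type, falling back to the default tier."""
    return MODEL_TIERS.get(task_type, MODEL_TIERS[default])


def estimate_cost(usage, price_per_million_prompt, price_per_million_completion):
    """Estimate the cost of one request from its token usage.

    usage is the 'usage' dict from a chat completion response; prices are
    per one million tokens, in whatever currency the provider bills in.
    """
    return (
        usage["prompt_tokens"] * price_per_million_prompt
        + usage["completion_tokens"] * price_per_million_completion
    ) / 1_000_000
```

Logging the estimate per request makes it easier to see whether moving a workload to a smaller tier is worth the quality trade-off.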

Access Qwen Through Haotokai's Unified API

Haotokai makes it easy to integrate Qwen into your application while also giving you the flexibility to use other models as needed. Here's why developers choose Haotokai:

One API, Endless Possibilities

With Haotokai, you get access to:

All Qwen models (2.5 series, coder, math, VL)
DeepSeek R1 reasoning models
Anthropic Claude models
Llama and other open-source models
Embedding models

All through a single, OpenAI-compatible API endpoint.

PayPal & International Payments

No Chinese bank account required. Haotokai supports PayPal, credit cards, and other international payment methods, making it easy for developers worldwide to access Qwen and other Chinese AI models.

Developer-Friendly Features

OpenAI-compatible API — use existing SDKs and tools
Streaming support for real-time applications
Comprehensive API logs and analytics
English documentation and support
99.9% uptime SLA for production workloads

Start Building with Qwen Today

Get instant access to Qwen models and 50+ other AI APIs through Haotokai. Pay with PayPal, use our OpenAI-compatible endpoint, and build faster.

Conclusion

Alibaba's Qwen models have firmly established themselves as world-class LLMs, offering impressive performance, massive context windows, and competitive pricing. Whether you're building chatbots, code assistants, content generation tools, or RAG systems, Qwen has a model that fits your needs.

The easiest way to get started with Qwen — especially for international developers — is through Haotokai's unified API. With OpenAI compatibility, PayPal support, and access to dozens of models, Haotokai simplifies AI integration and gives you the flexibility to choose the best model for each task.

Ready to start building? Head over to haotokai.com, grab your API key, and start experimenting with Qwen today. The possibilities are endless.
