Alibaba's Qwen (通义千问) series of large language models has emerged as a powerful alternative to Western AI models, offering impressive capabilities at competitive prices. With strong performance in both Chinese and English, extensive context windows, and specialized variants for different use cases, Qwen has become a go-to choice for developers building applications for global markets.
In this tutorial, we'll dive deep into the Qwen API, exploring the different models available, understanding their strengths, and walking through practical code examples you can use in your own applications.
Table of Contents
- What is Qwen (通义千问)?
- Qwen Model Family: Which One to Choose?
- Getting Started with Qwen API
- Core API Concepts
- Code Examples: Practical Qwen Integrations
- Advanced Features & Capabilities
- Best Practices for Qwen Integration
- Access Qwen Through Haotokai's Unified API
What is Qwen (通义千问)?
Qwen (pronounced "chwen"), short for 通义千问 (Tōngyì Qiānwèn), is Alibaba Cloud's family of large language models. Developed by the Tongyi Lab of Alibaba Group, Qwen models are trained on massive multilingual datasets and offer state-of-the-art performance across a wide range of natural language processing tasks.
What sets Qwen apart from other models is its exceptional bilingual capability: it's one of the few models that performs at a high level in both Chinese and English. This makes it ideal for applications targeting both Western and Chinese markets.
Bilingual Excellence
Top-tier performance in both Chinese and English, with strong multilingual support for many other languages.
Large Context
Up to 128K token context window for processing long documents, entire codebases, and complex conversations.
Fast & Efficient
Optimized inference with competitive pricing, making it cost-effective for production workloads.
Model Variety
Specialized models for chat, coding, math, multimodal understanding, and more.
Qwen Model Family: Which One to Choose?
Alibaba offers an extensive lineup of Qwen models. Here's a breakdown of the most important ones for developers:
| Model | Context | Best For | Key Strength |
|---|---|---|---|
| qwen2.5-72b-instruct | 128K | General-purpose, complex tasks | Best overall quality |
| qwen2.5-32b-instruct | 128K | Balanced performance & cost | Great value |
| qwen2.5-14b-instruct | 128K | High-volume applications | Fast & affordable |
| qwen2.5-7b-instruct | 128K | Simple tasks, prototyping | Ultra cost-effective |
| qwen2.5-coder-32b-instruct | 128K | Code generation & analysis | Best for coding |
| qwen2.5-math-72b-instruct | 128K | Mathematical reasoning | Best for math |
| qwen-vl-max | 32K | Vision-language tasks | Multimodal understanding |
| qwen-long | 1M+ | Extremely long documents | Massive context window |
Pro Tip: Start with 32B
For most production applications, we recommend starting with qwen2.5-32b-instruct. It strikes an excellent balance between quality and cost, often performing close to the 72B model at a fraction of the price. Scale up to 72B only if you find the 32B model insufficient for your use case.
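The selection guidance in the table can be captured in a small lookup. This is only a minimal sketch: the task labels and the `pick_model` helper are hypothetical names introduced for illustration, while the model identifiers are the ones listed in the table.

```python
# Hypothetical task labels mapped to the model names from the table.
MODEL_BY_TASK = {
    "general": "qwen2.5-32b-instruct",       # recommended starting point
    "complex": "qwen2.5-72b-instruct",
    "high_volume": "qwen2.5-14b-instruct",
    "prototype": "qwen2.5-7b-instruct",
    "code": "qwen2.5-coder-32b-instruct",
    "math": "qwen2.5-math-72b-instruct",
    "vision": "qwen-vl-max",
    "long_document": "qwen-long",
}


def pick_model(task: str) -> str:
    """Return a model name for a task label, falling back to the 32B model."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["general"])


print(pick_model("code"))
print(pick_model("unlisted-task"))
```

Keeping the mapping in one place makes it easy to scale a single task up to 72B later without touching call sites.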
Qwen 2.5 Improvements
The Qwen 2.5 series (released in late 2024) brought significant improvements over previous versions:
- Enhanced reasoning: Better performance on math, logic, and coding problems
- Improved instruction following: More accurate adherence to complex prompts
- Longer context: Standard 128K context across most models
- Better multilingual support: Improved performance in over 29 languages
- Faster inference: Optimized architecture for lower latency
- Reduced hallucinations: More factually accurate responses
Getting Started with Qwen API
Access Options
There are two primary ways to access Qwen models via API:
- Alibaba DashScope (direct): Alibaba's official API platform
- Haotokai (unified API): Access Qwen alongside 50+ other models with one API key, with PayPal support
For developers outside of China, we recommend Haotokai because it offers:
- International payment methods including PayPal
- An OpenAI-compatible API format (easier integration)
- Access to multiple model providers through one interface
- English-language documentation and support
Quick Start with Haotokai
Here's how to get up and running with Qwen in minutes using Haotokai:
# 1. Get your API key from https://www.haotokai.com
# 2. Set it as an environment variable
export HAOTOKAI_API_KEY="your-api-key-here"

# 3. Make a test call
curl -X POST https://www.haotokai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HAOTOKAI_API_KEY" \
  -d '{
    "model": "qwen2.5-32b-instruct",
    "messages": [{"role": "user", "content": "Hello, Qwen!"}]
  }'
Core API Concepts
API Endpoints
When using Haotokai's OpenAI-compatible interface, you'll primarily use these endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| /v1/chat/completions | POST | Chat-based interactions (most common) |
| /v1/completions | POST | Text completion (legacy format) |
| /v1/embeddings | POST | Generate text embeddings |
| /v1/models | GET | List available models |
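As a rough illustration of how these endpoints are addressed, the sketch below builds an authenticated request for `/v1/models` using only the standard library. The helper names `build_request` and `list_model_ids` are hypothetical, and the response shape (`{"data": [{"id": ...}]}`) is an assumption based on the OpenAI-style format rather than something confirmed here.

```python
import json
import os
import urllib.request

BASE_URL = "https://www.haotokai.com/v1"  # base URL used throughout this tutorial


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build an authenticated request for an endpoint path such as '/models'."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_model_ids(api_key: str) -> list:
    """Call GET /v1/models; assumes an OpenAI-style {"data": [{"id": ...}]} body."""
    with urllib.request.urlopen(build_request("/models", api_key), timeout=30) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]


# Building the request does not touch the network; list_model_ids() would.
req = build_request("/models", os.getenv("HAOTOKAI_API_KEY", "your-api-key-here"))
print(req.get_method(), req.full_url)
```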
Message Format
The chat completions endpoint uses a conversation format with message roles:
- system: Sets the behavior and persona of the assistant
- user: Represents the end user's input
- assistant: Represents the model's responses
- tool: Results from function/tool calls
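Because each request carries the whole conversation, a multi-turn chat is simply a growing list of role-tagged messages. The sketch below shows one way to build that list; `new_conversation` and `add_message` are hypothetical helper names, not part of any SDK.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def new_conversation(system_prompt: str) -> list:
    """Start a history with a single system message."""
    return [{"role": "system", "content": system_prompt}]


def add_message(history: list, role: str, content: str) -> list:
    """Append a message after checking that the role is one of the four roles."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    # Note: real 'tool' messages also need a 'tool_call_id' field.
    history.append({"role": role, "content": content})
    return history


history = new_conversation("You are a concise assistant.")
add_message(history, "user", "What is Qwen?")
add_message(history, "assistant", "Qwen is Alibaba Cloud's family of LLMs.")
add_message(history, "user", "Which model should I start with?")
print([m["role"] for m in history])
```

The full `history` list is what would be sent as `messages` on the next request.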
Key Parameters
Understanding these parameters will help you get the best results from Qwen:
- temperature (0-2): Controls randomness. Lower values (0.1-0.3) for deterministic tasks, higher (0.7-1.0) for creative tasks.
- top_p (0-1): Nucleus sampling, an alternative to temperature. Controls diversity via cumulative probability.
- max_tokens: Maximum number of tokens to generate in the response.
- stream: Set to true for streaming responses (server-sent events).
- frequency_penalty (-2 to 2): Reduces repetition by penalizing frequent tokens.
- presence_penalty (-2 to 2): Encourages topic diversity by penalizing tokens that have already appeared.
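One way to keep these parameters consistent across a codebase is to define named presets and validate the ranges before sending a request. The following is a sketch under that assumption; `PRESETS`, `RANGES`, and `build_payload` are hypothetical names, and the preset values simply follow the ranges suggested in the list.

```python
PRESETS = {
    "deterministic": {"temperature": 0.2},  # extraction, classification, code
    "creative": {"temperature": 0.9},       # brainstorming, storytelling
}

# Allowed ranges as described in the parameter list.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_payload(model, messages, preset="deterministic", max_tokens=512, **overrides):
    """Merge a preset with overrides and reject out-of-range values."""
    params = {**PRESETS[preset], "max_tokens": max_tokens, **overrides}
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} is outside [{low}, {high}]")
    return {"model": model, "messages": messages, **params}


payload = build_payload(
    "qwen2.5-32b-instruct",
    [{"role": "user", "content": "Summarize this paragraph."}],
    preset="deterministic",
    presence_penalty=0.5,
)
print(payload["temperature"], payload["max_tokens"], payload["presence_penalty"])
```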
Code Examples: Practical Qwen Integrations
Basic Python Integration
Here's a complete Python example using the popular requests library:
import json
import os

import requests


class QwenClient:
    """Client for interacting with Qwen models via Haotokai API."""

    def __init__(self, api_key=None, base_url="https://www.haotokai.com/v1"):
        self.api_key = api_key or os.getenv("HAOTOKAI_API_KEY")
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

    def chat(self, messages, model="qwen2.5-32b-instruct",
             temperature=0.7, max_tokens=2000, stream=False):
        """
        Send a chat completion request to Qwen.

        Args:
            messages: List of message dicts with 'role' and 'content'
            model: Qwen model name
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate
            stream: Whether to stream the response

        Returns:
            Response dict or generator if streaming
        """
        payload = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": stream
        }
        if stream:
            return self._stream_chat(payload)
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            timeout=60
        )
        response.raise_for_status()
        return response.json()

    def _stream_chat(self, payload):
        """Handle streaming responses."""
        payload["stream"] = True
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers=self.headers,
            json=payload,
            stream=True,
            timeout=60
        )
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break
                    try:
                        yield json.loads(data)
                    except json.JSONDecodeError:
                        pass


# Example Usage
if __name__ == "__main__":
    client = QwenClient()

    # Basic chat
    messages = [
        {"role": "system", "content": "You are a helpful software engineering assistant."},
        {"role": "user", "content": "Write a Python function to reverse a string recursively."}
    ]
    result = client.chat(messages, model="qwen2.5-coder-32b-instruct")
    print("Qwen's response:")
    print(result["choices"][0]["message"]["content"])

    # Token usage
    usage = result["usage"]
    print(f"\nTokens used: {usage['total_tokens']} "
          f"(prompt: {usage['prompt_tokens']}, "
          f"completion: {usage['completion_tokens']})")
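When streaming, the generator yields one parsed chunk per server-sent event, and the text arrives in small pieces. The sketch below shows one way to join those pieces into a full reply. It assumes the OpenAI-style chunk shape (`choices[0].delta.content`), and `collect_stream` is a hypothetical helper name; the chunks used here are hand-written stand-ins, not real API output.

```python
def collect_stream(chunks) -> str:
    """Join the text deltas from an iterable of streaming chunk dicts."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices at all
        text = choices[0].get("delta", {}).get("content")
        if text:
            parts.append(text)
    return "".join(parts)


# Hand-written chunks in the assumed OpenAI-style streaming shape.
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": []},
]
print(collect_stream(fake_chunks))  # prints "Hello, world"
```

With the QwenClient class from this section, the same helper could be fed directly with `collect_stream(client.chat(messages, stream=True))`.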
Using OpenAI SDK with Qwen
Because Haotokai uses an OpenAI-compatible API, you can use the official OpenAI Python SDK with Qwen:
import os

from openai import OpenAI

# Initialize with Haotokai base URL
client = OpenAI(
    api_key=os.getenv("HAOTOKAI_API_KEY"),
    base_url="https://www.haotokai.com/v1"
)

# Use just like you would with OpenAI
response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)

# Streaming works too
stream = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[{"role": "user", "content": "Tell me a story about space exploration."}],
    stream=True
)
for chunk in stream:
    # Guard against chunks that arrive without any choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Developer Benefit
Using Haotokai's OpenAI-compatible API means you can switch between Qwen, DeepSeek, Claude, and other models without rewriting your code. Just change the model parameter, which makes it easy to A/B test different models and find the best fit for your use case.
Function Calling with Qwen
Qwen models support function calling (tool use), enabling you to connect the model to external systems:
import json


def get_product_price(product_name):
    """Simulated function to look up product prices."""
    prices = {
        "qwen-pro": "$29/month",
        "qwen-enterprise": "Custom pricing",
        "haotokai-basic": "$9.99/month"
    }
    return prices.get(product_name.lower(), "Price not available")


# Function calling with Qwen
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_product_price",
            "description": "Get the price of a product by name",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_name": {
                        "type": "string",
                        "description": "The name of the product"
                    }
                },
                "required": ["product_name"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "How much does Haotokai basic plan cost?"}
]

# `client` is the OpenAI SDK client created in the previous example
response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    func_name = tool_call.function.name
    # Arguments arrive as a JSON string; parse them with json.loads, never eval
    func_args = json.loads(tool_call.function.arguments)
    print(f"Model called function: {func_name}")
    print(f"Arguments: {func_args}")

    # Execute the function
    result = get_product_price(func_args["product_name"])

    # Send the result back to the model
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result)
    })

    # Get final response
    final_response = client.chat.completions.create(
        model="qwen2.5-32b-instruct",
        messages=messages
    )
    print("\nFinal response:")
    print(final_response.choices[0].message.content)
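Once an application exposes more than one tool, it helps to route calls through a registry instead of hard-coding a single function. The sketch below is one possible shape for that: `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the price data is the same simulated placeholder used in the example above. Errors are returned as JSON strings so they can be sent back to the model as the tool result.

```python
import json


def get_product_price(product_name: str) -> str:
    """Simulated price lookup (placeholder data, as in the example above)."""
    prices = {"haotokai-basic": "$9.99/month"}
    return prices.get(product_name.lower(), "Price not available")


# Map tool names (as declared in the `tools` schema) to Python callables.
TOOL_REGISTRY = {"get_product_price": get_product_price}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a string for the 'tool' message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)  # arguments arrive as a JSON string
        return str(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature
        return json.dumps({"error": str(exc)})


print(run_tool_call("get_product_price", '{"product_name": "haotokai-basic"}'))
print(run_tool_call("delete_everything", "{}"))
```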
Embeddings for RAG Applications
Qwen embeddings are excellent for building retrieval-augmented generation (RAG) systems:
import numpy as np


def get_embeddings(texts, model="text-embedding-v2"):
    """
    Generate embeddings for a list of texts using Qwen.
    """
    # `client` is the OpenAI SDK client created earlier
    response = client.embeddings.create(
        model=model,
        input=texts
    )
    return [item.embedding for item in response.data]


# Example: Building a simple document search
documents = [
    "Qwen 2.5-72B is Alibaba's flagship large language model.",
    "Haotokai provides unified API access to multiple AI models including Qwen.",
    "Qwen models support both Chinese and English at high quality levels.",
    "The Qwen 2.5 series offers up to 128K token context windows.",
    "Haotokai supports PayPal payments for AI API access."
]

# Generate embeddings for all documents
doc_embeddings = get_embeddings(documents)


# Search function
def search_documents(query, top_k=2):
    query_embedding = get_embeddings([query])[0]

    # Calculate cosine similarity
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings):
        similarity = np.dot(query_embedding, doc_emb) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
        )
        similarities.append((i, similarity))

    # Sort by similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [(documents[i], score) for i, score in similarities[:top_k]]


# Test search
results = search_documents("How can I pay for Qwen API?")
for doc, score in results:
    print(f"[{score:.4f}] {doc}")
Advanced Features & Capabilities
Long Context Window Usage
Qwen's 128K context window allows you to process entire documents in a single prompt. Here are patterns for leveraging this effectively:
- Document summarization: Pass entire reports, articles, or books for comprehensive summaries
- Codebase analysis: Feed multiple source files for cross-file understanding
- Contract review: Analyze full legal documents for clause extraction
- Conversation history: Maintain long-running chat sessions without context loss
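Before sending a long document, it is worth checking whether it is likely to fit and splitting it if not. The sketch below uses a rough rule of thumb of about four characters per token for English text. That ratio is an assumption, not the Qwen tokenizer, and Chinese text will behave differently. All function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token (assumption, English only)."""
    return max(1, len(text) // 4)


def fits_context(text: str, context_limit: int = 128_000,
                 reserved_for_output: int = 2_000) -> bool:
    """Check the estimate against the window, leaving room for the reply."""
    return estimate_tokens(text) + reserved_for_output <= context_limit


def split_into_chunks(text: str, max_tokens: int) -> list:
    """Group paragraphs into chunks whose estimated size stays under max_tokens.

    A single paragraph larger than max_tokens is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "\n\n".join(["a" * 400] * 5)  # five paragraphs, ~100 estimated tokens each
chunks = split_into_chunks(doc, max_tokens=250)
print(fits_context(doc), len(chunks))
```

For production use, counting tokens with the model's actual tokenizer would be more reliable than a character heuristic.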
Multimodal Capabilities (Qwen-VL)
Qwen-VL extends text models with vision capabilities, enabling:
- Image captioning and description
- Chart and diagram understanding
- Document scanning and OCR
- Visual question answering
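Vision requests typically reuse the chat format, with the user message carrying both text and an image. The sketch below builds such a message in the OpenAI-style content-parts layout with a base64 data URL. Whether a given gateway accepts this exact layout for qwen-vl-max is an assumption to verify against its documentation, and `image_message` is a hypothetical helper name.

```python
import base64


def image_message(question: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message that pairs a question with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").
msg = image_message("What does this chart show?", b"placeholder image bytes")
print(msg["content"][0]["text"])
print(msg["content"][1]["image_url"]["url"][:22])
```

The resulting dict would be passed in `messages` with `model="qwen-vl-max"`.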
Structured Output
Qwen excels at generating structured output (JSON, XML, YAML), which is crucial for building reliable integrations. Use a system prompt like:
messages = [
    {
        "role": "system",
        "content": """You are a data extraction assistant.
Always respond with valid JSON in the following format:
{
  "entities": [{"name": string, "type": string, "confidence": float}],
  "summary": string,
  "sentiment": "positive" | "negative" | "neutral"
}
Do not include any text outside the JSON object."""
    },
    {
        "role": "user",
        "content": "Analyze this product review: 'The Qwen API is amazing! It's fast, affordable, and the quality is top-notch. I had a small issue with documentation but overall very happy.'"
    }
]
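A prompt alone does not guarantee well-formed output, so the reply should still be parsed and checked before use. The sketch below validates a response against the format requested in the system prompt above. `parse_extraction` is a hypothetical helper, and the sample reply is hand-written for illustration rather than real model output.

```python
import json

REQUIRED_KEYS = {"entities", "summary", "sentiment"}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
FENCE = "`" * 3  # Markdown code fence marker


def parse_extraction(raw: str) -> dict:
    """Parse a model reply and verify it matches the requested JSON format."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown code fence despite instructions.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['sentiment']}")
    return data


# Hand-written sample reply, for illustration only.
sample = (
    '{"entities": [{"name": "Qwen API", "type": "product", "confidence": 0.9}], '
    '"summary": "Happy overall, minor documentation issue.", "sentiment": "positive"}'
)
print(parse_extraction(sample)["sentiment"])
```

On a `ValueError` or `json.JSONDecodeError`, a common approach is to retry the request once with the error message appended to the conversation.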
Best Practices for Qwen Integration
Prompt Engineering for Qwen
While Qwen is capable, good prompt engineering still yields better results:
- Be specific: Clearly state what you want, how you want it formatted, and any constraints
- Use role prompting: Assign a specific role to the model (e.g., "You are an expert Python developer")
- Provide examples: Few-shot learning dramatically improves output quality for structured tasks
- Ask for reasoning: For complex problems, ask the model to think step-by-step
- Iterate: Test with real inputs and refine your prompts based on actual behavior
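Role prompting and few-shot examples combine naturally in the chat format: the role goes in the system message, and each example becomes a user/assistant pair placed before the real query. The sketch below illustrates this; `few_shot_messages` is a hypothetical helper and the examples are made up for the demonstration.

```python
def few_shot_messages(system_prompt: str, examples: list, query: str) -> list:
    """Build messages: system role, then example pairs, then the real query."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


messages = few_shot_messages(
    system_prompt="You are a support ticket classifier. Reply with one word: billing, bug, or other.",
    examples=[
        ("I was charged twice this month.", "billing"),
        ("The export button crashes the app.", "bug"),
    ],
    query="My invoice shows the wrong company name.",
)
for m in messages:
    print(f"{m['role']}: {m['content']}")
```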
Production Considerations
- Implement retries with backoff: Network issues happen. Use exponential backoff for transient errors.
- Set appropriate timeouts: Complex requests may take 30+ seconds, especially with long contexts.
- Monitor token usage: Track costs by monitoring prompt and completion token counts.
- Use caching: Cache frequent or expensive queries to reduce cost and latency.
- Handle rate limits: Implement rate limiting on your end to stay within API quotas.
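The retry advice above can be sketched as a small wrapper that doubles the wait after each failure and adds a little random jitter. `TransientError` is a hypothetical marker exception; in a real client it would be raised for conditions such as timeouts or HTTP 429 and 5xx responses. The `sleep` parameter is injectable so the demo runs without actually waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(func, max_attempts: int = 4, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # 1s, 2s, 4s, ... plus up to 0.1s of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Demo: a function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated timeout")
    return "ok"


delays = []
result = call_with_retries(flaky, sleep=delays.append)  # record delays, no real sleep
print(result, len(delays))
```

Non-transient errors such as invalid API keys or malformed requests should not be retried, which is why only the marker exception is caught.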
Cost Optimization
Use these strategies to get the most value from Qwen:
- Right-size your model: Use 7B or 14B for simple tasks, 32B for general use, and 72B only when needed
- Optimize prompts: Keep system prompts and context concise but complete
- Use Haotokai for volume discounts: Higher usage tiers get better rates
- Batch processing: Combine multiple small requests when possible
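To compare models on cost, the token counts in each response's `usage` field can be multiplied by per-token prices. The sketch below shows the arithmetic only: the prices in `HYPOTHETICAL_PRICES` are made-up placeholders, not real Haotokai or Alibaba rates, and should be replaced with figures from the provider's current price list.

```python
# PLACEHOLDER prices in USD per million tokens. These are NOT real rates.
HYPOTHETICAL_PRICES = {
    "small-model": {"prompt": 0.5, "completion": 1.0},
    "large-model": {"prompt": 2.0, "completion": 4.0},
}


def estimate_cost(model: str, usage: dict) -> float:
    """Estimate request cost in USD from a response's `usage` dict."""
    price = HYPOTHETICAL_PRICES[model]
    return (
        usage["prompt_tokens"] * price["prompt"]
        + usage["completion_tokens"] * price["completion"]
    ) / 1_000_000


# Same workload priced against both placeholder tiers.
usage = {"prompt_tokens": 1_000_000, "completion_tokens": 500_000}
for model in HYPOTHETICAL_PRICES:
    print(f"{model}: ${estimate_cost(model, usage):.2f}")
```

Logging this estimate per request makes it easier to see whether a smaller model would cover a given workload at acceptable quality.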
Access Qwen Through Haotokai's Unified API
Haotokai makes it easy to integrate Qwen into your application while also giving you the flexibility to use other models as needed. Here's why developers choose Haotokai:
One API, Endless Possibilities
With Haotokai, you get access to:
- All Qwen models (2.5 series, coder, math, VL)
- DeepSeek R1 reasoning models
- Anthropic Claude models
- Llama and other open-source models
- Embedding models
All through a single, OpenAI-compatible API endpoint.
PayPal & International Payments
No Chinese bank account required. Haotokai supports PayPal, credit cards, and other international payment methods, making it easy for developers worldwide to access Qwen and other Chinese AI models.
Developer-Friendly Features
- OpenAI-compatible API: use existing SDKs and tools
- Streaming support for real-time applications
- Comprehensive API logs and analytics
- English documentation and support
- 99.9% uptime SLA for production workloads
Start Building with Qwen Today
Get instant access to Qwen models and 50+ other AI APIs through Haotokai. Pay with PayPal, use our OpenAI-compatible endpoint, and build faster.
Conclusion
Alibaba's Qwen models have firmly established themselves as world-class LLMs, offering impressive performance, massive context windows, and competitive pricing. Whether you're building chatbots, code assistants, content generation tools, or RAG systems, Qwen has a model that fits your needs.
The easiest way to get started with Qwen, especially for international developers, is through Haotokai's unified API. With OpenAI compatibility, PayPal support, and access to dozens of models, Haotokai simplifies AI integration and gives you the flexibility to choose the best model for each task.
Ready to start building? Head over to haotokai.com, grab your API key, and start experimenting with Qwen today. The possibilities are endless.