Qwen, developed by Alibaba Cloud, has rapidly become one of the most capable open-weight AI model families in the world. With the release of the Qwen3.5 and Qwen3.6 series, these Chinese models now rival GPT-4o and Claude 3.5 on many benchmarks, at a fraction of the cost.
In this tutorial, we'll walk through everything you need to know about using the Qwen API in Python. We'll cover setup, basic chat completions, streaming, function calling, vision capabilities, and how to use a gateway like haotokai.com for easier access and cost optimization.
Prerequisites
Before we dive in, make sure you have:
- Python 3.8 or higher installed
- Basic familiarity with Python and APIs
- A Qwen API key (we'll show you how to get one)
- An internet connection
We'll be using the OpenAI Python SDK since Qwen's API is OpenAI-compatible, making it easy to switch between providers.
Getting Your Qwen API Key
There are two main ways to get access to the Qwen API:
Option 1: Alibaba Cloud Model Studio (Direct)
Qwen's official API is available through Alibaba Cloud's Model Studio platform. Here's how to sign up:
- Go to the Alibaba Cloud Model Studio
- Create an account or sign in
- Navigate to the API Keys section
- Generate a new API key
Note: Direct access may require an Alibaba Cloud account and can have regional restrictions depending on your location.
Option 2: haotokai.com (Easier for Global Developers)
For developers outside of China, using an AI API gateway like haotokai.com is often simpler. With haotokai, you get:
- Instant access to Qwen (and 10+ other models) with one API key
- English-language documentation and support
- Global low-latency endpoints
- Unified billing across all providers
- Automatic failover and cost optimization
You can sign up for haotokai in under a minute and start making API calls immediately. We'll use haotokai's endpoint for most examples in this tutorial since it's the most accessible option for global developers.
Installation: Setting Up Your Python Environment
First, let's set up our Python environment and install the required packages.
Step 1: Create a Virtual Environment (Recommended)
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
Step 2: Install the OpenAI Python SDK
Qwen's API is compatible with the OpenAI SDK, so we'll use that:
```bash
pip install -U openai python-dotenv
```
The python-dotenv package will help us manage our API key securely.
Step 3: Set Up Your Environment Variables
Create a .env file in your project directory:
```
QWEN_API_KEY=your-api-key-here
BASE_URL=https://api.haotokai.com/v1
```
If you're using haotokai, your API key will work for all models — Qwen, DeepSeek, OpenAI, and more.
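Before making any paid calls, it helps to confirm the .env values are actually being picked up. The sketch below is a hypothetical helper (load_settings and Settings are illustrative names, not part of any SDK) that reads QWEN_API_KEY and BASE_URL and fails fast with a clear message when the key is missing, for example because of a misspelled variable name.

```python
import os
from typing import Mapping, NamedTuple

# Default used throughout this tutorial when BASE_URL is not set
DEFAULT_BASE_URL = "https://api.haotokai.com/v1"


class Settings(NamedTuple):
    api_key: str
    base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read QWEN_API_KEY and BASE_URL, failing fast if the key is missing."""
    api_key = env.get("QWEN_API_KEY")
    if not api_key:
        # Names are case-sensitive on Linux and macOS, so "Qwen_API_KEY" will not match
        raise RuntimeError(
            "QWEN_API_KEY is not set; check your .env file and call load_dotenv() first"
        )
    return Settings(api_key=api_key, base_url=env.get("BASE_URL", DEFAULT_BASE_URL))
```

Call it right after load_dotenv(), then pass settings.api_key and settings.base_url to OpenAI(...). Accepting a mapping instead of always reading os.environ makes the helper easy to test without touching real environment variables.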
Basic Chat Completion with Qwen API
Now let's write our first Qwen API call. This is the simplest and most common use case.
Your First Qwen API Call
Create a file called qwen_chat.py:
```python
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env
load_dotenv()

# Initialize the client with a Qwen-compatible endpoint
client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1"),
)


def chat_with_qwen(prompt: str, model: str = "qwen-plus") -> str:
    """
    Send a chat message to the Qwen API and return the response.

    Args:
        prompt: The user's message
        model: The Qwen model to use (qwen-plus, qwen-turbo, qwen-max)

    Returns:
        The model's response text
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful, friendly assistant."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
        max_tokens=1000,
    )
    return response.choices[0].message.content


# Example usage
if __name__ == "__main__":
    result = chat_with_qwen("Explain quantum computing in simple terms, like you're talking to a 10-year-old.")
    print("Qwen's response:")
    print(result)
```
Run it with:
```bash
python qwen_chat.py
```
Understanding the Parameters
Let's break down the key parameters:
- model: The Qwen model variant. Common options include:
  - qwen-turbo: Fastest and cheapest, good for simple tasks
  - qwen-plus: Balanced speed and quality, recommended for most use cases
  - qwen-max: Most powerful, for complex reasoning tasks
- messages: The conversation history as a list of message objects
- temperature: Controls randomness (values near 0 give focused, more repeatable output; higher values give more varied, creative output)
- max_tokens: Maximum number of tokens in the response
- top_p: Nucleus sampling parameter for diversity control
Tip: Using haotokai.com as your gateway lets you switch between Qwen models (and even other providers) by just changing the model name — no new API keys or SDK changes needed.
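To make that model switch a one-argument change, you can collect the parameters above in one place. This is a minimal sketch; build_chat_request is a hypothetical helper name, and top_p is only sent when you set it, since a common practice is to adjust either temperature or top_p rather than both.

```python
from typing import Any, Dict, List, Optional


def build_chat_request(
    prompt: str,
    model: str = "qwen-plus",
    temperature: float = 0.7,
    top_p: Optional[float] = None,
    max_tokens: int = 1000,
    system_prompt: str = "You are a helpful assistant.",
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create()."""
    messages: List[Dict[str, str]] = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    request: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if top_p is not None:
        # Only included when explicitly set
        request["top_p"] = top_p
    return request
```

To compare models on the same prompt, call client.chat.completions.create(**build_chat_request(prompt, model=name)) for each of qwen-turbo, qwen-plus, and qwen-max.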
Streaming Responses for Real-Time Chat
For chat applications, streaming responses dramatically improve user experience. Instead of waiting for the full response, users see tokens appear in real-time.
Here's how to implement streaming with the Qwen API:
```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1"),
)


def stream_chat(prompt: str, model: str = "qwen-plus") -> str:
    """Stream a chat response from the Qwen API, printing tokens as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        stream=True,
        temperature=0.7,
        max_tokens=2000,
    )

    full_response = ""
    for chunk in stream:
        # Some chunks (e.g. a trailing usage-only chunk) arrive with an empty choices list
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            full_response += content
            print(content, end="", flush=True)
    return full_response


if __name__ == "__main__":
    print("Qwen is typing...\n")
    stream_chat("Write a short story about a robot learning to paint.")
    print("\n\n--- End of story ---")
```
Streaming is supported across Qwen's chat models and is essential for building chatbots, AI writing tools, and any application where responsiveness matters.
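In a real chat UI you usually want to forward each piece of text somewhere (a websocket, a server-sent event) rather than print it. One way to do that is a small generator that yields only the non-empty text deltas; iter_text and fake_chunk below are hypothetical helpers, and fake_chunk only mimics the shape of a streamed chunk so the logic can be tested without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, Optional


def iter_text(stream: Iterable[Any]) -> Iterator[str]:
    """Yield only the non-empty text deltas from a chat-completion stream."""
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        content = chunk.choices[0].delta.content
        if content:
            yield content


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in for a streamed chunk (choices[0].delta.content), for offline testing."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
```

With a real stream from client.chat.completions.create(..., stream=True), you would loop over iter_text(stream) and hand each piece to whatever send function your web framework provides.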
Multi-Turn Conversations
Real applications rarely have single-turn interactions. Here's how to maintain conversation context across multiple turns:
```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1"),
)


class QwenChatBot:
    def __init__(self, system_prompt: str = "You are a helpful assistant.", model: str = "qwen-plus"):
        self.model = model
        self.messages = [{"role": "system", "content": system_prompt}]

    def send_message(self, user_message: str) -> str:
        """Send a message and get a response, maintaining conversation history."""
        # Add the user message to the history
        self.messages.append({"role": "user", "content": user_message})

        # Send the whole history so the model sees earlier turns
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=2000,
        )
        assistant_response = response.choices[0].message.content

        # Add the assistant response to the history
        self.messages.append({"role": "assistant", "content": assistant_response})
        return assistant_response

    def clear_history(self):
        """Reset the conversation, keeping only the system prompt."""
        self.messages = [self.messages[0]]


# Example usage
if __name__ == "__main__":
    bot = QwenChatBot(system_prompt="You are a coding expert who explains things clearly.")
    print("Chat with Qwen (type 'quit' to exit, 'clear' to reset)")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == "quit":
            break
        elif user_input.lower() == "clear":
            bot.clear_history()
            print("Conversation cleared.")
            continue
        response = bot.send_message(user_input)
        print(f"\nQwen: {response}")
```
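This simple chatbot keeps the full conversation context by resending the entire message history on every call, which makes it easy to build more complex applications on top. Keep in mind that longer histories mean more input tokens per request.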
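Because the whole history is resent each turn, long conversations eventually grow costly and can exceed the model's context window. A simple mitigation is to send only a recent window of messages. The sketch below is a hypothetical trim_history helper that counts messages, not tokens (a real limit is token-based), and assumes a plain user/assistant history like the one QwenChatBot builds.

```python
from typing import Dict, List


def trim_history(messages: List[Dict[str, str]], max_messages: int = 20) -> List[Dict[str, str]]:
    """Keep the system prompt plus at most `max_messages` of the most recent messages.

    The kept tail always starts with a user message, so the model never sees
    an assistant reply without the question that prompted it.
    """
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    tail = rest[-max_messages:] if max_messages > 0 else []
    # Drop leading non-user messages left over from the cut
    while tail and tail[0]["role"] != "user":
        tail = tail[1:]
    return system + tail
```

In send_message you could pass messages=trim_history(self.messages) to the API call while still appending every turn to self.messages, so the full log stays local and only the recent window is sent.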
Function Calling with Qwen API
Qwen models support function calling (also known as tool use), which lets you connect the AI to external tools, APIs, and data sources.
Here's an example of function calling for a weather application:
```python
import os
import json

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1"),
)

# Define the tools/functions available to the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius",
                    },
                },
                "required": ["location"],
            },
        },
    }
]


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Mock weather function - replace with a real API call in production."""
    # In production, you'd call a real weather API here
    return {
        "location": location,
        "temperature": 22 if unit == "celsius" else 72,
        "condition": "Partly Cloudy",
        "humidity": 65,
        "unit": unit,
    }


def chat_with_tools(prompt: str) -> str:
    """Chat with Qwen, allowing it to call the tools defined above."""
    messages = [
        {"role": "system", "content": "You are a helpful weather assistant. Use the provided tools to answer user questions about weather."},
        {"role": "user", "content": prompt},
    ]

    # First API call: let the model decide whether it needs a tool
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    response_message = response.choices[0].message

    # No tool call needed: return the direct answer
    if not response_message.tool_calls:
        return response_message.content

    # Add the assistant's tool-call message to the history as a plain dict
    messages.append(response_message.model_dump(exclude_none=True))

    # Execute each requested function call
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        print(f"Qwen is calling {function_name} with args: {function_args}")

        if function_name == "get_weather":
            function_response = get_weather(
                location=function_args.get("location"),
                unit=function_args.get("unit", "celsius"),
            )
        else:
            # Report unknown tools back to the model instead of crashing
            function_response = {"error": f"Unknown function: {function_name}"}

        # Add the function result, linked to its call by tool_call_id
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(function_response),
        })

    # Second API call: get the final answer using the tool results
    second_response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
    )
    return second_response.choices[0].message.content


if __name__ == "__main__":
    result = chat_with_tools("What's the weather like in Beijing? Should I bring an umbrella?")
    print(f"\nFinal response:\n{result}")
```
Function calling opens up a world of possibilities — you can build AI agents that query databases, call APIs, interact with files, and much more.
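As you add more tools, the if/else chain inside the loop grows quickly. A common alternative is a registry that maps tool names to Python functions. The sketch below is one way to do it; TOOL_REGISTRY and run_tool_call are hypothetical names, and get_weather is the same mock as above, repeated so the block runs on its own. Errors are returned to the model as JSON rather than raised.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Same mock as in the tutorial; replace with a real weather API call."""
    return {
        "location": location,
        "temperature": 22 if unit == "celsius" else 72,
        "condition": "Partly Cloudy",
        "humidity": 65,
        "unit": unit,
    }


# Map the tool names declared in `tools` to the Python functions that implement them
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Run one tool call and return a JSON string for the "tool" message content."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(arguments or "{}")
        result = func(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong/missing arguments from the model
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
    return json.dumps(result)
```

Inside the loop over response_message.tool_calls, the tool message content then becomes run_tool_call(tool_call.function.name, tool_call.function.arguments), and adding a tool means adding one registry entry plus its schema in tools.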
Vision Capabilities: Qwen with Image Inputs
Qwen multimodal models support image inputs, allowing you to build vision AI applications. Here's how to use the vision API:
```python
import os
import base64
import mimetypes

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1"),
)


def encode_image(image_path: str) -> str:
    """Encode an image file to base64."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def analyze_image(image_path: str, prompt: str = "Describe this image in detail.") -> str:
    """Analyze a local image using Qwen's vision capabilities."""
    base64_image = encode_image(image_path)
    # Use the file's actual MIME type (e.g. image/png) instead of always assuming JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    response = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=1000,
    )
    return response.choices[0].message.content


# You can also pass a publicly reachable image URL
def analyze_image_url(image_url: str, prompt: str = "Describe this image in detail.") -> str:
    """Analyze an image from a URL using Qwen's vision API."""
    response = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        max_tokens=1000,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Example with a URL (replace with your own image)
    image_url = "https://example.com/image.jpg"
    result = analyze_image_url(
        image_url,
        prompt="What's in this image? Be specific and detailed.",
    )
    print(result)
```
Vision capabilities are available on Qwen's multimodal models like qwen-vl-plus and qwen-vl-max.
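The two functions above differ only in how the image_url part is built. If you want one entry point that accepts either URLs or local paths, and possibly several images in one request, a small message builder can handle both. This is a sketch under assumptions: image_part and build_vision_message are hypothetical helpers, and it assumes the chosen model accepts several image parts in a single user message (check the model's documentation for image count and size limits).

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any, Dict, List


def image_part(source: str) -> Dict[str, Any]:
    """Return an image_url content part for a URL or a local file path."""
    if source.startswith(("http://", "https://", "data:")):
        url = source  # already a URL the API can fetch or decode
    else:
        # Local file: embed it as a base64 data URL with a guessed MIME type
        mime = mimetypes.guess_type(source)[0] or "image/jpeg"
        data = base64.b64encode(Path(source).read_bytes()).decode("utf-8")
        url = f"data:{mime};base64,{data}"
    return {"type": "image_url", "image_url": {"url": url}}


def build_vision_message(prompt: str, sources: List[str]) -> Dict[str, Any]:
    """Build one user message containing the prompt followed by every image."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    content.extend(image_part(s) for s in sources)
    return {"role": "user", "content": content}
```

You would then pass messages=[build_vision_message("Compare these two charts.", [path_a, url_b])] to client.chat.completions.create with a vision model such as qwen-vl-plus.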
Using Haotokai Gateway for Qwen API
While you can use Qwen's API directly, using a gateway like haotokai.com offers several advantages for production applications:
Why Use haotokai.com with Qwen?
- One API key for everything: Access Qwen, DeepSeek, OpenAI, GLM, and more with a single key
- Automatic failover: If Qwen's API has issues, traffic automatically routes to a backup provider
- Cost optimization: haotokai's smart routing selects the best-priced model for each task
- Global low latency: Edge endpoints worldwide for fast response times
- Usage analytics: Detailed dashboards showing costs, latency, and usage patterns
- English support: Full English documentation and support team
Getting Started with haotokai
Getting set up with haotokai takes less than 5 minutes:
- Sign up for haotokai.com
- Copy your API key from the dashboard
- Set your base URL to https://api.haotokai.com/v1
- Use any model name: Qwen, DeepSeek, OpenAI, etc.
All the code examples in this tutorial work with haotokai — just use your haotokai API key and base URL.
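Since only the base URL (and the matching API key) changes between a gateway and direct access, you can make the endpoint a configuration choice. The sketch below is hypothetical: LLM_PROVIDER and resolve_base_url are illustrative names, and the DashScope entries are the OpenAI-compatible Model Studio endpoints as published by Alibaba Cloud at the time of writing, so verify them for your region. Remember that a DashScope key will not work against the haotokai endpoint and vice versa.

```python
import os
from typing import Mapping

# OpenAI-compatible base URLs; confirm the DashScope ones in the Model Studio docs
BASE_URLS = {
    "haotokai": "https://api.haotokai.com/v1",
    "dashscope-intl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "dashscope-cn": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Pick the endpoint from the hypothetical LLM_PROVIDER variable, defaulting to haotokai."""
    provider = env.get("LLM_PROVIDER", "haotokai").strip().lower()
    if provider not in BASE_URLS:
        raise ValueError(f"Unknown provider {provider!r}; expected one of {sorted(BASE_URLS)}")
    return BASE_URLS[provider]
```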
Advanced: Fallback Routing
One of the most powerful features of haotokai is automatic fallback routing. If Qwen is slow or unavailable, your requests automatically fail over to another model like DeepSeek or GPT-4o Mini.
This helps keep your application available even when an individual provider has issues (haotokai advertises 99.9% uptime). You configure the fallback order in your haotokai dashboard, with no code changes required.
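The gateway does this server-side. If you call a provider directly and want similar behaviour in your own code, a minimal client-side sketch is to try models in order and move on only for errors you consider transient. This is not how haotokai implements routing; complete_with_fallback, call, and retry_on are hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple, Type


def complete_with_fallback(
    call: Callable[[str], str],
    models: Sequence[str],
    retry_on: Tuple[Type[BaseException], ...],
) -> str:
    """Try each model in order, falling through only when a `retry_on` error is raised."""
    last_error: Optional[BaseException] = None
    for model in models:
        try:
            return call(model)
        except retry_on as exc:
            # Transient failure on this model: remember it and try the next one
            last_error = exc
    raise RuntimeError(f"All models failed: {list(models)}") from last_error
```

With the OpenAI SDK, call could be lambda m: client.chat.completions.create(model=m, messages=messages).choices[0].message.content, with retry_on=(RateLimitError, APIConnectionError, InternalServerError) imported from openai. Errors outside retry_on, such as an invalid API key, propagate immediately instead of silently falling through. The fallback model names depend on what your provider or gateway exposes.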
Best Practices for Production
Here are some best practices to follow when using the Qwen API in production:
1. Use Environment Variables for API Keys
Never hardcode your API key in your source code. Always use environment variables or a secrets manager.
2. Implement Error Handling
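```python
from typing import Optional

from openai import APIConnectionError, APIError, RateLimitError

# Assumes `client` is configured as in the earlier examples


def safe_chat(prompt: str) -> Optional[str]:
    try:
        response = client.chat.completions.create(
            model="qwen-plus",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except RateLimitError as e:
        # Rate limited: back off exponentially before retrying
        print(f"Rate limit hit: {e}")
    except APIConnectionError as e:
        # Network or connectivity problem (includes timeouts)
        print(f"Connection error: {e}")
    except APIError as e:
        # Any other API error (bad request, authentication, server error, ...)
        print(f"API error: {e}")
    return None
```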
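Exponential backoff means waiting longer after each consecutive failure. Without an extra dependency, a minimal version might look like the sketch below; backoff_delay and call_with_backoff are hypothetical helpers, and the schedule (1 second doubling up to 10 seconds) plus "full jitter" (a random wait up to that delay, so many clients don't retry in lockstep) are illustrative choices, not values required by the Qwen API.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 10.0) -> float:
    """Exponential delay for a 0-based attempt number: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying `retry_on` errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the exponential delay
            sleep(random.uniform(0, backoff_delay(attempt)))
    raise RuntimeError("max_attempts must be at least 1")
```

For example, call_with_backoff(lambda: client.chat.completions.create(model="qwen-plus", messages=messages), (RateLimitError, APIConnectionError)). The sleep parameter exists so tests can skip real waiting.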
3. Add Retry Logic
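The OpenAI SDK already retries some failures on its own (2 retries by default, configurable with the client's max_retries option). For finer control, use a retry library like tenacity (pip install tenacity) and retry only transient errors:
```python
from openai import APIConnectionError, InternalServerError, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Assumes `client` is configured as in the earlier examples


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    # Retry only transient failures; errors such as 400 or 401 won't succeed on retry
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, InternalServerError)),
    reraise=True,  # re-raise the original exception after the last attempt
)
def chat_with_retry(prompt: str) -> str:
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```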
4. Monitor Costs and Usage
Set up budget alerts and monitor your token usage regularly. With haotokai.com, you get a unified dashboard showing all your costs across all providers, making it easy to track and optimize spending.
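Alongside a dashboard, you can keep a running tally in your own code. Non-streaming chat completion responses include token counts in response.usage (prompt_tokens and completion_tokens). The sketch below is a hypothetical UsageTracker that accumulates those counts and estimates spend from per-million-token prices you fill in yourself; no real prices are assumed here.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulate token counts and estimate spend from prices you supply."""

    input_price_per_million: float   # fill in from your provider's current price list
    output_price_per_million: float
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def estimated_cost(self) -> float:
        return (
            self.prompt_tokens * self.input_price_per_million
            + self.completion_tokens * self.output_price_per_million
        ) / 1_000_000

    def over_budget(self, budget: float) -> bool:
        """True once the estimated spend exceeds `budget` (same currency as the prices)."""
        return self.estimated_cost > budget
```

After each call, record it with tracker.add(response.usage.prompt_tokens, response.usage.completion_tokens) and check tracker.over_budget(limit) before sending the next request.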
5. Choose the Right Model for the Task
Don't use your most powerful (and expensive) model for every request. Use routing logic to match the model to the task:
- Simple tasks (classification, extraction): qwen-turbo
- General use cases: qwen-plus
- Complex reasoning, code, creative writing: qwen-max
Haotokai can handle this routing automatically based on your configured rules.
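If you prefer to route in your own code, the list above translates directly into a lookup table. This is a minimal sketch: the task labels and pick_model are hypothetical, and how you classify an incoming request (a keyword rule, a cheap classifier call) is up to you.

```python
# Mapping taken from the list above; adjust it to your own workload
MODEL_BY_TASK = {
    "classification": "qwen-turbo",
    "extraction": "qwen-turbo",
    "general": "qwen-plus",
    "reasoning": "qwen-max",
    "code": "qwen-max",
    "creative": "qwen-max",
}


def pick_model(task: str, default: str = "qwen-plus") -> str:
    """Return the model for a task label, falling back to a balanced default."""
    return MODEL_BY_TASK.get(task.strip().lower(), default)
```

Unknown labels fall back to qwen-plus, so a misclassified request still gets a reasonable model rather than an error.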
Troubleshooting Common Issues
"Invalid API Key" Error
- Double-check that you're using the correct API key
- Verify the base URL is set correctly
- Ensure your account has available credits
Slow Response Times
- Try a lighter model (turbo instead of plus/max)
- Check your network connection
- Consider using haotokai for global edge endpoints
Rate Limiting
- Implement exponential backoff (as shown above)
- Consider upgrading your plan for higher rate limits
- Use batch processing for large workloads
Model Not Responding
- Check the provider's status page
- If using haotokai, your requests will automatically fail over to a backup
- Verify you're using a valid model name
Conclusion
Qwen's API is a powerful, cost-effective alternative to Western AI providers. With its OpenAI-compatible interface, strong performance, and aggressive pricing, it's no wonder so many developers are adding Qwen to their AI stack.
Whether you're building a chatbot, content generation tool, AI agent, or anything else, Qwen delivers excellent value. And when combined with haotokai.com for unified access, automatic failover, and cost optimization, you get a production-ready AI infrastructure that's both powerful and affordable.
The best way to evaluate Qwen is to try it on your own use cases. The code examples in this tutorial give you everything you need to get started — pick one that matches your use case and start experimenting.
Get Started with Haotokai
Ready to start building with Qwen API? Sign up for haotokai.com today and get instant access to Qwen, DeepSeek, OpenAI, GLM, and other top AI models through a single unified API.
- ⚡ 5-minute integration: Fully OpenAI-compatible — just change your base URL
- 🌍 Global access: Low-latency endpoints worldwide
- 🔄 Automatic failover: 99.9% uptime guaranteed
- 💰 Save up to 80%: compared with direct provider pricing
- 📊 Unified dashboard: All your models, costs, and analytics in one place