How to Use Qwen API in Python: The Complete Guide for 2026

Qwen, developed by Alibaba Cloud, has rapidly become one of the most capable AI model families in the world, and many of its models are released with open weights. With the release of the Qwen3.5 and Qwen3.6 series, these Chinese models now rival GPT-4o and Claude 3.5 on many benchmarks at a fraction of the cost.

In this tutorial, we'll walk through everything you need to know about using the Qwen API in Python. We'll cover setup, basic chat completions, streaming, function calling, vision capabilities, and how to use a gateway like haotokai.com for easier access and cost optimization.

Prerequisites

Before we dive in, make sure you have:

Python 3.8 or higher installed
Basic familiarity with Python and APIs
A Qwen API key (we'll show you how to get one)
An internet connection

We'll be using the OpenAI Python SDK, because Qwen's API offers an OpenAI-compatible endpoint. This also makes it easy to switch between providers.

Getting Your Qwen API Key

There are two main ways to get access to the Qwen API:

Option 1: Alibaba Cloud Model Studio (Direct)

Qwen's official API is available through Alibaba Cloud's Model Studio platform. Here's how to sign up:

Go to Alibaba Cloud Model Studio
Create an account or sign in
Navigate to the API Keys section
Generate a new API key

Note: Direct access may require an Alibaba Cloud account and can have regional restrictions depending on your location.

Option 2: haotokai.com (Easier for Global Developers)

For developers outside of China, using an AI API gateway like haotokai.com is often simpler. With haotokai, you get:

Instant access to Qwen (and 10+ other models) with one API key
English-language documentation and support
Global low-latency endpoints
Unified billing across all providers
Automatic failover and cost optimization

You can sign up for haotokai in under a minute and start making API calls immediately. We'll use haotokai's endpoint for most examples in this tutorial since it's the most accessible option for global developers.

Installation: Setting Up Your Python Environment

First, let's set up our Python environment and install the required packages.

Step 1: Create a Virtual Environment (Recommended)

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Step 2: Install the OpenAI Python SDK

Because Qwen's API is OpenAI-compatible, we'll use the official OpenAI SDK:

pip install -U openai python-dotenv

The python-dotenv package will help us manage our API key securely.

Step 3: Set Up Your Environment Variables

Create a .env file in your project directory:

QWEN_API_KEY=your-api-key-here
BASE_URL=https://api.haotokai.com/v1

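If you're using haotokai, the same API key works for all of its models: Qwen, DeepSeek, OpenAI, and more. If you're using Model Studio directly, set BASE_URL to its OpenAI-compatible endpoint instead (for example, https://dashscope-intl.aliyuncs.com/compatible-mode/v1 for the international region).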

Basic Chat Completion with Qwen API

Now let's write our first Qwen API call. This is the simplest and most common use case.

Your First Qwen API Call

Create a file called qwen_chat.py:

import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

# Initialize the client with Qwen-compatible endpoint
client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1")
)

def chat_with_qwen(prompt: str, model: str = "qwen-plus") -> str:
    """
    Send a chat message to Qwen API and return the response.
    
    Args:
        prompt: The user's message
        model: The Qwen model to use (qwen-plus, qwen-turbo, qwen-max)
    
    Returns:
        The model's response text
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful, friendly assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=1000
    )
    
    return response.choices[0].message.content

# Example usage
if __name__ == "__main__":
    result = chat_with_qwen("Explain quantum computing in simple terms, like you're talking to a 10-year-old.")
    print("Qwen's response:")
    print(result)

Run it with:

python qwen_chat.py

Understanding the Parameters

Let's break down the key parameters:

model: The Qwen model variant. Common options include:
  - qwen-turbo: Fastest and cheapest, good for simple tasks
  - qwen-plus: Balanced speed and quality, recommended for most use cases
  - qwen-max: Most powerful, for complex reasoning tasks
messages: The conversation history as a list of message objects
temperature: Controls randomness; lower values (near 0) give more focused, repeatable output, while higher values give more varied, creative output
max_tokens: Maximum number of tokens the model may generate in its response
top_p: Nucleus sampling parameter for diversity control (not set in the example above)
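The example above sets temperature and max_tokens but leaves top_p at its default. As a minimal sketch, the helper below gathers the sampling parameters into keyword arguments you can unpack into client.chat.completions.create(). The name build_chat_params is hypothetical, and the validation ranges (temperature from 0 to 2, top_p greater than 0 and at most 1) are assumptions borrowed from the OpenAI API; check Alibaba Cloud's documentation for the exact limits of each Qwen model.

```python
from typing import Any, Dict, Optional


def build_chat_params(
    prompt: str,
    model: str = "qwen-plus",
    temperature: float = 0.7,
    top_p: Optional[float] = None,
    max_tokens: int = 1000,
) -> Dict[str, Any]:
    """Collect sampling parameters into kwargs for client.chat.completions.create().

    Hypothetical helper. The accepted ranges are assumptions borrowed from the
    OpenAI API (temperature 0-2, top_p in (0, 1]); check the Qwen docs per model.
    """
    if not 0 <= temperature <= 2:
        raise ValueError("temperature should be between 0 and 2")
    if top_p is not None and not 0 < top_p <= 1:
        raise ValueError("top_p should be greater than 0 and at most 1")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    params: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Only send top_p when it is set, so the provider's default applies otherwise
    if top_p is not None:
        params["top_p"] = top_p
    return params
```

You would then call client.chat.completions.create(**build_chat_params("Summarize this article.", temperature=0.3)). OpenAI's API reference recommends adjusting temperature or top_p but generally not both, which is a reasonable starting rule for Qwen too.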

Tip: Using haotokai.com as your gateway lets you switch between Qwen models (and even other providers) by changing only the model name, with no new API keys or SDK changes needed.

Streaming Responses for Real-Time Chat

For chat applications, streaming noticeably improves the user experience: instead of waiting for the full response, users see tokens appear as they are generated.

Here's how to implement streaming with the Qwen API:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1")
)

def stream_chat(prompt: str, model: str = "qwen-plus"):
    """Stream a chat response from Qwen API."""
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        stream=True,
        temperature=0.7,
        max_tokens=2000
    )
    
    full_response = ""
    for chunk in stream:
        # Some chunks (e.g. a final usage-only chunk) can have an empty choices list
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)
    
    return full_response

if __name__ == "__main__":
    print("Qwen is typing...\n")
    stream_chat("Write a short story about a robot learning to paint.")
    print("\n\n--- End of story ---")

Streaming is supported across Qwen's chat models and is especially valuable for chatbots, AI writing tools, and any application where responsiveness matters.
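If you want to send tokens somewhere other than the terminal (a web socket or a UI widget, say), it helps to separate "extract text from chunks" from "display it". Here is a minimal sketch, assuming the chunk shape returned by the OpenAI SDK (chunk.choices[0].delta.content); the generator name iter_stream_text is hypothetical. Unlike stream_chat above, it prints nothing and simply yields each text piece.

```python
from typing import Any, Iterable, Iterator


def iter_stream_text(stream: Iterable[Any]) -> Iterator[str]:
    """Yield the text pieces from a streamed chat completion.

    Skips chunks that carry no text, such as a chunk with an empty
    `choices` list or a delta that only sets the role.
    """
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            yield content
```

With a stream created using stream=True as above, for piece in iter_stream_text(stream): print(piece, end="", flush=True) reproduces the terminal output, and "".join(iter_stream_text(stream)) collects the full reply instead.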

Multi-Turn Conversations

Real applications rarely have single-turn interactions. Here's how to maintain conversation context across multiple turns:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1")
)

class QwenChatBot:
    def __init__(self, system_prompt: str = "You are a helpful assistant.", model: str = "qwen-plus"):
        self.model = model
        self.messages = [{"role": "system", "content": system_prompt}]
    
    def send_message(self, user_message: str) -> str:
        """Send a message and get a response, maintaining conversation history."""
        # Add user message to history
        self.messages.append({"role": "user", "content": user_message})
        
        # Get response
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=2000
        )
        
        assistant_response = response.choices[0].message.content
        
        # Add assistant response to history
        self.messages.append({"role": "assistant", "content": assistant_response})
        
        return assistant_response
    
    def clear_history(self):
        """Reset the conversation, keeping only the system prompt."""
        self.messages = [self.messages[0]]

# Example usage
if __name__ == "__main__":
    bot = QwenChatBot(system_prompt="You are a coding expert who explains things clearly.")
    
    print("Chat with Qwen (type 'quit' to exit, 'clear' to reset)")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == "quit":
            break
        elif user_input.lower() == "clear":
            bot.clear_history()
            print("Conversation cleared.")
            continue
        
        response = bot.send_message(user_input)
        print(f"\nQwen: {response}")

This simple chatbot maintains the full conversation context, making it easy to build more complex applications on top of it.
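Because the whole history is resent on every request, token usage (and cost) grows with each turn, and a long conversation can eventually exceed the model's context window. One simple mitigation is to keep only the most recent turns. This is a minimal sketch: trim_history is a hypothetical helper, the default of 10 turns is arbitrary, and some applications summarize older turns instead of dropping them.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_turns: int = 10) -> List[Message]:
    """Keep the system prompt plus roughly the last `max_turns` user/assistant pairs.

    Assumes messages[0] is the system prompt, as in QwenChatBot above.
    """
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system, rest = messages[:1], messages[1:]
    # Each turn is one user message plus one assistant reply
    recent = rest[-2 * max_turns:]
    # Don't start the kept history with a reply whose question was dropped
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

In QwenChatBot.send_message, you could add self.messages = trim_history(self.messages) right after appending the user message, so only recent context is sent.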

Function Calling with Qwen API

Qwen models support function calling (also known as tool use), which lets you connect the AI to external tools, APIs, and data sources.

Here's an example of function calling for a weather application:

import os
import json
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1")
)

# Define the tools/functions available to the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

def get_weather(location: str, unit: str = "celsius") -> dict:
    """Mock weather function - replace with real API call in production."""
    # In production, you'd call a real weather API here
    weather_data = {
        "location": location,
        "temperature": 22 if unit == "celsius" else 72,
        "condition": "Partly Cloudy",
        "humidity": 65,
        "unit": unit
    }
    return weather_data

def chat_with_tools(prompt: str) -> str:
    """Chat with Qwen, allowing function calls."""
    messages = [
        {"role": "system", "content": "You are a helpful weather assistant. Use the provided tools to answer user questions about weather."},
        {"role": "user", "content": prompt}
    ]
    
    # First API call - let the model decide if it needs to use a tool
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
    response_message = response.choices[0].message
    
    # Check if the model wants to call a function
    if response_message.tool_calls:
        # Add the assistant's response to messages
        messages.append(response_message)
        
        # Execute each function call
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            
            print(f"Qwen is calling {function_name} with args: {function_args}")
            
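            # Call the appropriate function; report unknown tools back to the model
            if function_name == "get_weather":
                function_response = get_weather(
                    location=function_args.get("location"),
                    unit=function_args.get("unit", "celsius")
                )
            else:
                function_response = {"error": f"Unknown function: {function_name}"}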
            
            # Add the function response to messages
            messages.append({
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": json.dumps(function_response)
            })
        
        # Second API call - get the final response with tool results
        second_response = client.chat.completions.create(
            model="qwen-plus",
            messages=messages
        )
        
        return second_response.choices[0].message.content
    
    # No function call needed, return the direct response
    return response_message.content

if __name__ == "__main__":
    result = chat_with_tools("What's the weather like in Beijing? Should I bring an umbrella?")
    print(f"\nFinal response:\n{result}")

Function calling opens up many possibilities: you can build AI agents that query databases, call APIs, interact with files, and much more.
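Once you have more than one tool, a chain of if statements becomes unwieldy. A common pattern is a dictionary that maps tool names to functions, plus a wrapper that turns failures (unknown tool, malformed JSON, wrong arguments) into error messages the model can read. This is a minimal sketch: TOOL_REGISTRY and run_tool_call are hypothetical names, and get_weather is a trimmed copy of the mock above so the block runs on its own.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Mock tool, trimmed from the example above."""
    return {"location": location, "temperature": 22 if unit == "celsius" else 72, "unit": unit}


# Map tool names (as declared in `tools`) to the Python functions that implement them
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the `tool` message.

    Errors are returned to the model as JSON instead of being raised, so it
    can recover (for example, by asking the user for missing details).
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        result = func(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that don't match the function signature
        return json.dumps({"error": f"Invalid arguments for {name}: {exc}"})
    return json.dumps(result)
```

Inside the tool-call loop of chat_with_tools, the tool message's content would become run_tool_call(tool_call.function.name, tool_call.function.arguments), and adding a tool means adding one registry entry.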

Vision Capabilities: Qwen with Image Inputs

Qwen multimodal models support image inputs, allowing you to build vision AI applications. Here's how to use the vision API:

import os
import base64
import mimetypes
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url=os.getenv("BASE_URL", "https://api.haotokai.com/v1")
)

def encode_image(image_path: str) -> str:
    """Encode an image file to base64."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def analyze_image(image_path: str, prompt: str = "Describe this image in detail.") -> str:
    """Analyze a local image file using Qwen's vision capabilities."""
    base64_image = encode_image(image_path)
    # Use the file's actual MIME type (e.g. image/png) in the data URL, defaulting to JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    
    response = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}
                    }
                ]
            }
        ],
        max_tokens=1000
    )
    
    return response.choices[0].message.content

# You can also use image URLs
def analyze_image_url(image_url: str, prompt: str = "Describe this image in detail.") -> str:
    """Analyze an image from a URL using Qwen's vision API."""
    response = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}}
                ]
            }
        ],
        max_tokens=1000
    )
    
    return response.choices[0].message.content

if __name__ == "__main__":
    # Example with URL (replace with your own image)
    image_url = "https://example.com/image.jpg"
    result = analyze_image_url(
        image_url,
        prompt="What's in this image? Be specific and detailed."
    )
    print(result)

Vision capabilities are available on Qwen's multimodal models like qwen-vl-plus and qwen-vl-max.
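The OpenAI-style content list can hold more than one image_url part, which is handy for tasks like comparing two photos. Here is a minimal sketch, assuming your chosen model and endpoint accept several images in a single message (check the model's documentation, since image-count and size limits vary). build_vision_message is a hypothetical helper.

```python
from typing import Any, Dict, List


def build_vision_message(prompt: str, image_urls: List[str]) -> Dict[str, Any]:
    """Build one user message with a text prompt followed by several images.

    Each entry in image_urls can be an https:// URL or a data: URL holding
    base64-encoded image bytes, as in analyze_image above.
    """
    if not image_urls:
        raise ValueError("at least one image URL is required")
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

You would pass it as messages=[build_vision_message("What changed between these two photos?", [url_1, url_2])] together with a vision model such as qwen-vl-plus.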

Using Haotokai Gateway for Qwen API

While you can use Qwen's API directly, using a gateway like haotokai.com offers several advantages for production applications:

Why Use haotokai.com with Qwen?

One API key for everything: Access Qwen, DeepSeek, OpenAI, GLM, and more with a single key
Automatic failover: If Qwen's API has issues, traffic automatically routes to a backup provider
Cost optimization: haotokai's smart routing selects the best-priced model for each task
Global low latency: Edge endpoints worldwide for fast response times
Usage analytics: Detailed dashboards showing costs, latency, and usage patterns
English support: Full English documentation and support team

Getting Started with haotokai

Getting set up with haotokai takes less than 5 minutes:

Sign up for haotokai.com
Copy your API key from the dashboard
Set your base URL to https://api.haotokai.com/v1
Use any supported model name (Qwen, DeepSeek, OpenAI, etc.)

All the code examples in this tutorial work with haotokai; just use your haotokai API key and base URL.

Advanced: Fallback Routing

One of the most powerful features of haotokai is automatic fallback routing. If Qwen is slow or unavailable, your requests automatically fail over to another model like DeepSeek or GPT-4o Mini.

This helps keep your application available even when individual providers have issues. You configure the fallback order in your haotokai dashboard, with no code changes required.
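If you call Qwen directly, or want fallback logic that lives in your own code, you can approximate this on the client side by trying a list of models in order. This is a minimal sketch: chat_with_fallback is hypothetical, the default model list is only an example, and client is any OpenAI SDK client such as the one created earlier. The default retry_on=(Exception,) is deliberately broad for illustration; in practice, narrow it to transient errors, because an invalid API key, for example, will fail the same way for every model.

```python
from typing import Any, Dict, List, Sequence, Tuple, Type


def chat_with_fallback(
    client: Any,
    messages: List[Dict[str, Any]],
    models: Sequence[str] = ("qwen-plus", "qwen-turbo"),
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply_text) from the first success."""
    errors = []
    for model in models:
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            return model, response.choices[0].message.content
        except retry_on as exc:
            # Remember why this model failed, then move on to the next one
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed:\n" + "\n".join(errors))
```

For example: model_used, reply = chat_with_fallback(client, [{"role": "user", "content": "Hi"}], retry_on=(openai.APIConnectionError, openai.RateLimitError, openai.InternalServerError)). Returning the model name lets you log how often the fallback is actually used.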

Best Practices for Production

Here are some best practices to follow when using the Qwen API in production:

1. Use Environment Variables for API Keys

Never hardcode your API key in your source code. Always use environment variables or a secrets manager.
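It also helps to fail fast with a clear message when the key is missing. Otherwise the OpenAI SDK raises its generic error, which mentions OPENAI_API_KEY rather than the variable you actually use. This is a minimal sketch that uses the QWEN_API_KEY and BASE_URL names from the .env file earlier; load_api_settings is a hypothetical helper, and you would call load_dotenv() before it if you keep the key in a .env file.

```python
import os
from typing import Tuple

DEFAULT_BASE_URL = "https://api.haotokai.com/v1"


def load_api_settings() -> Tuple[str, str]:
    """Read the API key and base URL from the environment, failing fast if the key is missing."""
    api_key = os.environ.get("QWEN_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "QWEN_API_KEY is not set. Add it to your .env file or export it in your shell."
        )
    base_url = os.environ.get("BASE_URL", DEFAULT_BASE_URL)
    return api_key, base_url
```

The result plugs straight into the client: api_key, base_url = load_api_settings(), then OpenAI(api_key=api_key, base_url=base_url).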

2. Implement Error Handling

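from openai import APIError, RateLimitError, APIConnectionError

# Assumes `client` is the OpenAI client configured in the earlier examples
try:
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=[{"role": "user", "content": "Hello, Qwen!"}]
    )
    print(response.choices[0].message.content)
except RateLimitError as e:
    # HTTP 429: too many requests; wait (ideally with exponential backoff) before retrying
    print(f"Rate limit hit: {e}")
except APIConnectionError as e:
    # Network problems such as DNS failures, refused connections, or timeouts
    print(f"Connection error: {e}")
except APIError as e:
    # Any other API error; catch it last, since both exceptions above subclass APIError
    print(f"API error: {e}")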

3. Add Retry Logic

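Use a retry library like tenacity (pip install tenacity) to handle transient errors gracefully. Note that the OpenAI client already retries connection errors, 429 responses, and 5xx responses twice by default; you can change this with OpenAI(max_retries=...).

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from openai import APIConnectionError, InternalServerError, RateLimitError

# Assumes `client` is the OpenAI client configured in the earlier examples.
# Only transient errors are retried; errors such as an invalid key (401)
# or a malformed request (400) would fail the same way on every attempt.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, InternalServerError)),
    reraise=True  # after the last attempt, raise the original OpenAI error instead of tenacity.RetryError
)
def chat_with_retry(prompt: str) -> str:
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content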

4. Monitor Costs and Usage

Set up budget alerts and monitor your token usage regularly. With haotokai.com, you get a unified dashboard showing all your costs across all providers, making it easy to track and optimize spending.
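Every non-streaming chat completion response includes a usage object with prompt_tokens, completion_tokens, and total_tokens, so you can keep your own running totals alongside any dashboard. This is a minimal sketch: UsageTracker is hypothetical, and the prices default to zero because rates vary by model and change over time, so fill them in from your provider's pricing page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageTracker:
    """Accumulate token counts from chat completion responses.

    Prices are per million tokens and are placeholders you must fill in.
    """
    input_price_per_m: float = 0.0
    output_price_per_m: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, response: Any) -> None:
        usage = getattr(response, "usage", None)
        if usage is None:  # usage may be absent, for example on streamed chunks
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def estimated_cost(self) -> float:
        # Convert per-million-token prices into a cost for the tokens seen so far
        return (self.prompt_tokens * self.input_price_per_m
                + self.completion_tokens * self.output_price_per_m) / 1_000_000
```

Call tracker.record(response) after each non-streaming client.chat.completions.create() call and read tracker.estimated_cost whenever you need a running estimate.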

5. Choose the Right Model for the Task

Don't use your most powerful (and expensive) model for every request. Use routing logic to match the model to the task:

Simple tasks (classification, extraction): qwen-turbo
General use cases: qwen-plus
Complex reasoning, code, creative writing: qwen-max

Haotokai can handle this routing automatically based on your configured rules.
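If you'd rather route in your own code, the rule can be as simple as a lookup table. This is a minimal sketch: the task labels and pick_model are hypothetical, the model choices mirror the list above, and how you classify an incoming request into a task type is up to your application.

```python
from typing import Dict

# Task categories and model choices mirror the list above; adjust them to your workload
MODEL_FOR_TASK: Dict[str, str] = {
    "classification": "qwen-turbo",
    "extraction": "qwen-turbo",
    "general": "qwen-plus",
    "reasoning": "qwen-max",
    "code": "qwen-max",
    "creative": "qwen-max",
}


def pick_model(task: str, default: str = "qwen-plus") -> str:
    """Return the model for a task type, falling back to a balanced default."""
    return MODEL_FOR_TASK.get(task.strip().lower(), default)
```

Falling back to qwen-plus for unknown task types keeps quality reasonable when a request doesn't fit any category.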

Troubleshooting Common Issues

"Invalid API Key" Error

Double-check that you're using the correct API key
Verify the base URL is set correctly
Ensure your account has available credits

Slow Response Times

Try a lighter model (turbo instead of plus/max)
Check your network connection
Consider using haotokai for global edge endpoints

Rate Limiting

Implement exponential backoff (as shown above)
Consider upgrading your plan for higher rate limits
Use batch processing for large workloads

Model Not Responding

Check the provider's status page
If using haotokai, your requests will automatically fail over to a backup
Verify you're using a valid model name
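To check a model name programmatically, you can compare it against the endpoint's model list. This is a minimal sketch, assuming your endpoint implements the OpenAI-compatible model-listing route that client.models.list() calls (not every provider or gateway does). check_model_name is a hypothetical helper that also suggests close spellings using Python's standard difflib module.

```python
import difflib
from typing import Any, List, Tuple


def check_model_name(client: Any, model: str) -> Tuple[bool, List[str]]:
    """Return (is_available, suggestions) for a model name.

    Assumes the endpoint supports the OpenAI-style model list route.
    """
    available = [m.id for m in client.models.list()]
    if model in available:
        return True, []
    # Suggest close spellings, e.g. for a typo like "qwen-plsu"
    return False, difflib.get_close_matches(model, available, n=3)
```

For example, check_model_name(client, "qwen-plsu") would report the name as unavailable and list qwen-plus among the suggestions, provided qwen-plus appears in the endpoint's model list.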

Conclusion

Qwen's API is a powerful, cost-effective alternative to Western AI providers. With its OpenAI-compatible interface, strong performance, and aggressive pricing, it's no wonder so many developers are adding Qwen to their AI stack.

Whether you're building a chatbot, content generation tool, AI agent, or anything else, Qwen delivers excellent value. And when combined with haotokai.com for unified access, automatic failover, and cost optimization, you get a production-ready AI infrastructure that's both powerful and affordable.

The best way to evaluate Qwen is to try it on your own use cases. The code examples in this tutorial give you everything you need to get started — pick one that matches your use case and start experimenting.

Get Started with Haotokai

Ready to start building with Qwen API? Sign up for haotokai.com today and get instant access to Qwen, DeepSeek, OpenAI, GLM, and other top AI models through a single unified API.

⚡ 5-minute integration: Fully OpenAI-compatible; just change your base URL
🌍 Global access: Low-latency endpoints worldwide
🔄 Automatic failover: 99.9% uptime guaranteed
💰 Save up to 80% compared to direct provider pricing
📊 Unified dashboard: All your models, costs, and analytics in one place

Start building with Qwen and haotokai →