Qwen3 Omni Thinker

An "omni" model that can process text, image, and audio input

Example usage

Qwen3 Omni is compatible with the OpenAI SDK. It accepts three input modalities: text, image, and audio. The "Thinker" variant of the model, implemented here, returns text only.

{
  "model": "qwen3-omni",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see and hear."},
        {
          "type": "image_url",
          "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}
        },
        {
          "type": "audio_url",
          "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}
        }
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
Input
import os

from openai import OpenAI

# The API key is read from the environment; "model-xxxxxx" is a placeholder
# for the ID of your own deployment.
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1",
)

# A single user message can mix text, image, and audio parts.
resp = client.chat.completions.create(
    model="qwen3-omni",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image and audio content."},
            {"type": "image_url", "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}},
            {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}},
        ]},
    ],
    max_tokens=2048,
    temperature=0.7,
    stream=False,
)

# The Thinker variant returns text, so the reply is a plain string.
print(resp.choices[0].message.content)
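When requests vary in which modalities they include, it can help to assemble the content list programmatically. The following is a minimal sketch, not part of the official example: build_user_content is a hypothetical helper name, and it assumes the same "text", "image_url", and "audio_url" part shapes shown in the request above.

```python
from typing import Any, Optional


def build_user_content(
    text: str,
    image_url: Optional[str] = None,
    audio_url: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper: build the `content` list for one user message."""
    # The text part is always present; image and audio parts are optional.
    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    if image_url is not None:
        parts.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url is not None:
        parts.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    return parts


content = build_user_content(
    "Describe this image and audio content.",
    image_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg",
    audio_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav",
)

# This list can be passed as `messages` to client.chat.completions.create.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": content},
]
```

Omitting image_url or audio_url yields a text-only or single-media request with the same message structure.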
JSON output
{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "qwen3-omni",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "I see several parked cars in front of a building and hear a short cough."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 512,
        "completion_tokens": 24,
        "total_tokens": 536
    }
}
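If you call the endpoint over raw HTTP instead of through the SDK, the response arrives as JSON text like the sample above. The sketch below shows one way to read the reply and token usage from it. The payload is the sample output copied verbatim (including its elided id), and the truncation check assumes the standard Chat Completions convention that finish_reason is "length" when max_tokens cuts a reply short.

```python
import json

# Sample response body, copied from the JSON output above.
raw = """
{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "qwen3-omni",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "I see several parked cars in front of a building and hear a short cough."
            }
        }
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 24, "total_tokens": 536}
}
"""

resp = json.loads(raw)
choice = resp["choices"][0]

# The assistant's text reply.
text = choice["message"]["content"]

# "length" would indicate the reply was cut off by max_tokens.
truncated = choice["finish_reason"] == "length"

usage = resp["usage"]
print(text)
print(f"truncated={truncated}, total_tokens={usage['total_tokens']}")
```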