NVIDIA Nemotron Nano 12B V2 VL

Open-source vision-language model by NVIDIA for document processing

Example usage

The NVIDIA Nemotron Nano 12B V2 VL model is a 12-billion-parameter vision-language model that ingests images or videos alongside text and generates detailed, contextual text responses, for example summarizing visuals, performing OCR, or answering questions about images.
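
As a rough illustration of how an image and a text prompt travel together in one request, the sketch below builds a single user message in the OpenAI-style chat format, embedding the image inline as a base64 data URL. The helper name `build_image_message` and its `mime_type` parameter are hypothetical, and whether a particular deployment accepts data URLs in place of a public image URL is an assumption to verify.

```python
import base64


def build_image_message(image_bytes: bytes, prompt: str, mime_type: str = "image/png") -> dict:
    """Build one user message pairing an inline image with a text prompt.

    Hypothetical helper: the structure mirrors OpenAI-style chat content parts.
    """
    # Encode the raw image bytes so they can be embedded in a data URL
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": prompt},
        ],
    }


# Placeholder bytes for illustration only; in practice, read a real image file
message = build_image_message(b"placeholder image bytes", "Describe this image in detail.")
print(message["content"][1]["text"])
```

The returned dictionary could be passed as one element of the `messages` list in a chat completion request.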

Figure: Bar chart comparing the accuracy of Nemotron Nano VL and Nemotron Nano 2 VL across visual benchmarks for multi-image understanding, document intelligence, and video captioning. Nemotron Nano 2 VL delivers improved accuracy across these benchmarks.
Input
from openai import OpenAI
import os

# Set this to the model ID of your deployment
model_id = ""

# Configure your deployment
client = OpenAI(
    api_key=os.environ.get("BASETEN_API_KEY"),
    base_url=f"https://model-{model_id}.api.baseten.co/environment/production/sync/v1"
)

# Test the model with streaming
stream = client.chat.completions.create(
    model="",  # Use the served model name from config
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/f/fa/Grayscale_8bits_palette_sample_image.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail."
                }
            ]
        }
    ],
    stream=True
)

# Stream the response, skipping chunks that carry no text
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end='', flush=True)
print()

# Example output:
"""
This is a black and white image of a bird, which appears to be a parrot,
perched on a curved metal stand. The bird is facing the left side of the
image. It has a curved beak and its wings are slightly folded. The bird's
feathers are short and fluffy, and it has a large, round head. Behind the
bird, there are what appear to be bushes or shrubs.
"""
JSON output
{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "[Model output here]",
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1741224586,
    "model": "",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 145,
        "prompt_tokens": 38,
        "total_tokens": 183,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}
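
To show how the fields in this JSON might be consumed, here is a minimal sketch that pulls out the generated text, the finish reason, and the token counts. It assumes the JSON above is the body of a non-streaming response (`"object": "chat.completion"`); the function name `summarize_response` is hypothetical, and the sample string is a trimmed copy of the output shown above.

```python
import json


def summarize_response(raw: str) -> dict:
    """Extract the reply text and token usage from a chat completion body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    # "usage" may be absent or null in some responses, so fall back to {}
    usage = data.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed copy of the JSON output above, used only as sample input
sample = """
{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "[Model output here]", "role": "assistant"}
        }
    ],
    "object": "chat.completion",
    "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""

summary = summarize_response(sample)
print(summary["text"])
print(summary["prompt_tokens"], summary["completion_tokens"], summary["total_tokens"])
```

In the sample, `total_tokens` equals the sum of `prompt_tokens` and `completion_tokens` (38 + 145 = 183), which is a handy sanity check when tracking usage.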
