Claude API Vision — How to Send Images and PDFs as Input

Learn how to send images to Claude via the API using three methods: Base64 encoding, URL reference, and the Files API. Step-by-step examples included for beginners.

🌐 This article was machine-translated and may contain inaccuracies. Refer to the Korean original if in doubt.

Claude can analyze images as well as text, a capability called vision. Whether you want to describe a product photo, interpret a scanned document, or break down a chart, the Claude API accepts images through three distinct methods. This guide walks through each method so you can pick the right one for your use case.

Three Ways to Send Images to the Claude API

  • ① Base64 encoding: Convert the image to a text string and embed it in the request body. Best for: small, one-off images. Watch out: payload grows with conversation length.
  • ② URL reference: Pass a publicly accessible image URL directly. Best for: images already on the web. Watch out: private URLs won't work.
  • ③ Files API: Upload once, then reference by file_id repeatedly. Best for: repeated use and long chats. Benefit: keeps the payload lightweight.

Before You Start

  • Anthropic API key — Required to call the Claude API. You can generate one at platform.claude.com.
  • Python environment — Examples in this guide use Python. Other language SDKs follow the same structure.
  • anthropic package — Install it by running pip install anthropic in your terminal.

Quick Glossary

  • Vision — The ability to accept images as input and understand their content, just like reading text.
  • Base64 — An encoding scheme that converts binary files (like images) into plain text strings so they can be embedded in JSON payloads. Think of it like packing a photo into a text message.
  • Multimodal — Handling more than one type of input at once — for example, both text and images in a single request.
  • Payload — The actual data bundle sent in an API request. Including large images inflates payload size.
  • file_id — A unique identifier returned when you upload a file via the Files API. Use this ID in later requests instead of re-uploading the file.
  • Content block — Each distinct input element (image or text) in an API message. Multiple blocks can be sent in one request as an array.

A Key Ordering Tip

According to the official documentation, Claude works best when images are placed before the text in a message. Images placed after text or interspersed with text still work, but if your use case allows it, put the image first. This is similar to how providing a long document before your question tends to yield better results.

One important platform note: on Amazon Bedrock and Google Cloud, only the Base64 encoding method is currently available. URL reference and Files API methods require a direct call to the Anthropic API.

Method 1: Base64 Encoding

Convert your image file into a Base64 string and embed it directly in the request body. No external server or public URL needed — ideal for small or one-off images.

import anthropic
import base64

# Read and encode the image file
with open("my_image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",  # match your file's actual format
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
)

print(message.content)

Success signal

Because message.content is a list of content blocks, the terminal shows a list containing a text block object with type='text', and its text field holds Claude's description of the image.
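
If you only want the description itself rather than the printed object, one option is to join the text of every text block. The sketch below is illustrative: extract_text is a hypothetical helper name, and the SimpleNamespace object only stands in for a real message.content so the snippet runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content_blocks):
    """Join the text of every block whose type is 'text'."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in for message.content, for illustration only (not a real API response)
fake_content = [SimpleNamespace(type="text", text="A close-up photo of an ant.")]
print(extract_text(fake_content))
```

With a real response you would call extract_text(message.content) instead.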

Supported image formats

  • JPEG — image/jpeg
  • PNG — image/png
  • GIF — image/gif
  • WebP — image/webp

Make sure the media_type value matches the actual file format. A mismatch will cause an error or unreliable results.
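
A file extension can be wrong (for example, a PNG saved as .jpg), so one way to avoid a mismatch is to inspect the first bytes of the file. The following is a minimal sketch, not part of the anthropic package: sniff_media_type is a hypothetical helper that checks the well-known signatures of the four supported formats.

```python
def sniff_media_type(data: bytes) -> str:
    """Guess the media_type from the file's leading bytes (magic numbers)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported or unrecognized image format")
```

You could pass the raw bytes read from my_image.jpg to this function and use the result as the media_type value before encoding.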

Method 2: URL Reference

If your image is already hosted online and publicly accessible, you can simply pass the URL. No encoding or uploading required.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
)

print(message.content)

Things to watch out for

  • The URL must be publicly accessible. Local file paths or login-protected URLs will not work.
  • If the image is later removed or the URL changes, the same request will fail. For stable references, use the Files API instead.
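
Some of these failures can be caught before sending a request. The sketch below is an offline heuristic only: looks_publicly_reachable is a hypothetical helper, treating both http and https as acceptable is an assumption, and it cannot detect login walls or deleted images because it makes no network call.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_reachable(url: str) -> bool:
    """Reject local paths, localhost, and private or loopback IP addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # local file path, file:// URL, or malformed input
    host = parsed.hostname
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: nothing more can be decided without a network request
        return True
    return ip.is_global
```

A True result only means the URL is not obviously local. Opening it in a private browser window remains the more reliable check.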

Method 3: Files API (Upload Once, Reuse Many Times)

The Files API is the most efficient approach when you need to reference the same image across multiple requests or in a long multi-turn conversation. Upload the file once and reference it by its file_id from then on.

In multi-turn conversations, every request resends the full conversation history. If images are Base64-encoded, the full image bytes travel in every payload, ballooning request size as the conversation grows. With the Files API, only a short ID is included each time — the image bytes stay on Anthropic's servers.
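
To make the size difference concrete, here is a rough back-of-the-envelope sketch. It rests on simplifying assumptions: one image in the history, resent on every turn, with text and JSON overhead ignored. The function names and the short file_id are hypothetical.

```python
def base64_len(n_bytes: int) -> int:
    """Length of the Base64 string for n_bytes of input (with padding)."""
    return 4 * ((n_bytes + 2) // 3)


def image_bytes_sent(per_request_cost: int, turns: int) -> int:
    """Total image-related bytes sent if the image is resent on every turn."""
    return per_request_cost * turns


image_size = 1_000_000          # a 1 MB image, chosen for illustration
hypothetical_id = "file_abc123"  # placeholder, not a real file_id

print(image_bytes_sent(base64_len(image_size), turns=10))
print(image_bytes_sent(len(hypothetical_id), turns=10))
```

Under these assumptions, the 1,000,000-byte image becomes 1,333,336 Base64 characters, so ten turns resend about 13.3 MB of image data, while the file_id approach resends only a few dozen characters per turn.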

import anthropic

client = anthropic.Anthropic()

# Step 1: Upload the image once
with open("image.jpg", "rb") as f:
    file_upload = client.beta.files.upload(
        file=("image.jpg", f, "image/jpeg")
    )

file_id = file_upload.id  # e.g. "file_abc123..."
print(f"Upload complete. file_id: {file_id}")

# Step 2: Reference the image by file_id
message = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    betas=["files-api-2025-04-14"],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "file",
                        "file_id": file_id,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
)

print(message.content)

Success signal

First, you'll see Upload complete. file_id: file_… printed. Then Claude's image description follows. Save the file_id value — you can use it in future requests without re-uploading the image.
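
One possible way to save the file_id between runs is a small JSON cache keyed by a hash of the file contents. This is a sketch under assumptions, not an official pattern: get_or_upload, the file_ids.json cache file, and the upload_fn callback are all hypothetical names, and the cache will go stale if the uploaded file is later deleted.

```python
import hashlib
import json
from pathlib import Path


def get_or_upload(image_path, upload_fn, cache_path="file_ids.json"):
    """Return a cached file_id for this file, uploading only on a cache miss.

    upload_fn is any callable that takes a path and returns a file_id string.
    """
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    # Key by content hash so a renamed copy of the same image reuses its ID
    digest = hashlib.sha256(Path(image_path).read_bytes()).hexdigest()
    if digest not in cache:
        cache[digest] = upload_fn(image_path)
        cache_file.write_text(json.dumps(cache, indent=2))
    return cache[digest]
```

In practice upload_fn could wrap the client.beta.files.upload call from the example above and return its .id value. Passing the uploader in as a callable keeps the caching logic independent of the SDK.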

Common Problems & Fixes

  • Error: invalid_request_error — media_type mismatch
    Cause: The media_type value doesn't match the actual file format.
    Fix: Confirm the file's actual format (a renamed file can carry a misleading extension) and set the matching value: image/jpeg, image/png, image/gif, or image/webp.
  • URL method fails to load the image
    Cause: The URL is not publicly accessible, or requires authentication.
    Fix: Open the URL in a private/incognito browser window. If the image doesn't load, switch to Base64 or Files API.
  • Files API error — missing betas parameter
    Cause: The Files API is a beta feature and requires the betas parameter to be included.
    Fix: Use client.beta.messages.create and add betas=["files-api-2025-04-14"] as shown in the example above.
  • URL method not working on Amazon Bedrock or Google Cloud
    Cause: According to official documentation, only Base64 encoding is currently available on those platforms.
    Fix: Use Base64 encoding when working within Bedrock or Google Cloud environments.

Summary: Which Method Should You Use?

  • Base64 encoding — Simplest setup, no external dependencies. Gets heavy if you use many large images or long conversations.
  • URL reference — Shortest code when the image is already public. Doesn't work for private images.
  • Files API — Best for repeated use or multi-turn conversations. Requires an upload step, but pays off by keeping payloads small.

Next Steps

  • Send multiple images at once — Add more than one image block to the content array in a single request to analyze several images together.
  • Mix images and text — Combine image blocks and text blocks in one content array for compound analysis tasks.
  • Coordinates and bounding boxes — The official documentation's "Coordinates and bounding boxes" section covers how to reference specific regions within an image.
  • PDF input — Claude also accepts PDF files as input. Pairing this with the Files API is an efficient way to process long documents.
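
As a starting point for the first and last items, the sketch below builds a content array with several Base64 images, each preceded by a short text label, plus a document block that references an uploaded PDF by file_id. The helper names are hypothetical, the "Image N:" labels are a convention assumed here for readability, and the document block shape should be checked against the official PDF documentation before use.

```python
import base64


def image_block(data: bytes, media_type: str) -> dict:
    """Build one Base64 image content block."""
    encoded = base64.standard_b64encode(data).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


def build_multi_image_content(images, question: str) -> list:
    """images is a list of (raw_bytes, media_type) pairs; the question goes last."""
    content = []
    for index, (data, media_type) in enumerate(images, start=1):
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append(image_block(data, media_type))
    content.append({"type": "text", "text": question})
    return content


def pdf_block(file_id: str) -> dict:
    """Build a document block for a PDF already uploaded via the Files API."""
    return {"type": "document", "source": {"type": "file", "file_id": file_id}}
```

The list returned by build_multi_image_content can be passed as the "content" value of a user message, in the same position as the content arrays in the examples above.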
