Agentium agents accept images, audio, and files in addition to text. Use the MessageContent type and ContentPart[] to send multi-modal input to vision- and audio-capable models.
Not all providers support all content types. When an unsupported type is passed, the provider logs a warning and either skips the content or substitutes a placeholder.
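| Content Type | OpenAI | Anthropic | Google/Vertex | AWS Claude | AWS Bedrock | Azure OpenAI | Azure Foundry | Ollama |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Image (URL) | Yes | Yes | Yes | Yes | No | Yes | Model-dependent | No |
| Image (base64) | Yes | Yes | Yes | Yes | Yes* | Yes | Model-dependent | Yes |
| Audio (base64) | Yes | No | Yes | No | No | Yes | No | No |
| File (URL) | Yes | Yes | Yes | Yes | No | Yes | No | No |
| File (base64) | Yes | Yes | Yes | Yes | Yes* | Yes | No | No |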
Ollama image support requires a vision-capable model (e.g., llava, bakllava, llama3.2-vision).
AWS Bedrock multi-modal support (*) depends on the specific model. Amazon Nova supports images; document support varies by model.
AWS Claude supports the same multi-modal features as the direct Anthropic provider.
Azure OpenAI supports the same multi-modal features as the direct OpenAI provider.
Azure AI Foundry vision support depends on the model (e.g., Phi-3.5-vision-instruct supports images).
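Because unsupported content is only skipped or replaced with a warning, it can help to check support before calling the agent. The sketch below transcribes the matrix above into a lookup table. The names `Provider`, `ContentKind`, `Support`, and `supportFor` are hypothetical and not part of Agentium; the "model" value stands for the "Yes*" and "Model-dependent" cells.

```typescript
// Hypothetical helper: a plain transcription of the support matrix above.
// None of these names come from Agentium itself.
type Support = "yes" | "no" | "model"; // "model" = depends on the chosen model
type ContentKind = "imageUrl" | "imageBase64" | "audioBase64" | "fileUrl" | "fileBase64";

// Providers in the same order as the table columns.
const PROVIDERS = [
  "openai", "anthropic", "google", "awsClaude",
  "awsBedrock", "azureOpenAI", "azureFoundry", "ollama",
] as const;
type Provider = (typeof PROVIDERS)[number];

// Build one table row from cells listed in column order.
function row(...cells: Support[]): Record<Provider, Support> {
  const out = {} as Record<Provider, Support>;
  PROVIDERS.forEach((provider, i) => {
    out[provider] = cells[i] ?? "no";
  });
  return out;
}

const MATRIX: Record<ContentKind, Record<Provider, Support>> = {
  imageUrl:    row("yes", "yes", "yes", "yes", "no",    "yes", "model", "no"),
  // Ollama is "yes" here per the table, but still needs a vision-capable model.
  imageBase64: row("yes", "yes", "yes", "yes", "model", "yes", "model", "yes"),
  audioBase64: row("yes", "no",  "yes", "no",  "no",    "yes", "no",    "no"),
  fileUrl:     row("yes", "yes", "yes", "yes", "no",    "yes", "no",    "no"),
  fileBase64:  row("yes", "yes", "yes", "yes", "model", "yes", "no",    "no"),
};

// Look up whether a provider accepts a given kind of content.
function supportFor(provider: Provider, kind: ContentKind): Support {
  return MATRIX[kind][provider];
}

console.log(supportFor("anthropic", "audioBase64"));
console.log(supportFor("awsBedrock", "imageBase64"));
```

A caller could use the result to drop or convert a part up front, for example by downloading an image and sending it as base64 when the provider rejects URLs.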
CSV files can be sent to Anthropic and OpenAI as file input. The model reads and analyzes the data directly:
```typescript
import { Agent, anthropic, type ContentPart } from "@agentium/core";
import { readFileSync } from "node:fs";

const agent = new Agent({
  name: "DataAnalyst",
  model: anthropic("claude-sonnet-4-6"),
  instructions: "Analyze data files. Provide insights with specific numbers.",
});

// From a local CSV file
const csvData = readFileSync("sales-data.csv").toString("base64");

const result = await agent.run([
  { type: "text", text: "Analyze this sales data. What are the top 3 products by revenue?" },
  { type: "file", data: csvData, mimeType: "text/csv", filename: "sales-data.csv" },
] as ContentPart[]);

console.log(result.text);
// "Based on the sales data, the top 3 products by revenue are:
// 1. Widget Pro - $142,500 (1,425 units)
// 2. Gadget Plus - $98,200 (982 units)
// 3. Tool Basic - $67,800 (2,260 units)"
```
PDF documents can be sent via URL (no download needed) or base64:
```typescript
import { Agent, anthropic, type ContentPart } from "@agentium/core";
import { readFileSync } from "node:fs";

const agent = new Agent({
  name: "DocumentReader",
  model: anthropic("claude-sonnet-4-6"),
  instructions: "Extract key information from documents. Be thorough but concise.",
});

// PDF via URL: Anthropic fetches it directly
const result = await agent.run([
  { type: "text", text: "Summarize the key findings in this research paper." },
  {
    type: "file",
    data: "https://example.com/research-paper.pdf",
    mimeType: "application/pdf",
    filename: "paper.pdf",
  },
] as ContentPart[]);

// PDF via base64
const pdfData = readFileSync("contract.pdf").toString("base64");

const contractResult = await agent.run([
  { type: "text", text: "What are the payment terms and termination clauses?" },
  { type: "file", data: pdfData, mimeType: "application/pdf", filename: "contract.pdf" },
] as ContentPart[]);
```
Most providers cannot process Excel (.xlsx) files directly. Google Gemini is the exception: it handles XLSX natively via inlineData. For other providers, convert the workbook to CSV first and send it as a text/csv file.
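As a rough sketch of the CSV route, the snippet below turns already-parsed worksheet rows into CSV text and wraps it in a file part shaped like the one in the CSV example earlier. Reading the .xlsx workbook into rows needs a spreadsheet library and is not shown. `rowsToCsv`, `csvFilePart`, and the local `FilePart` type are hypothetical names used for illustration, not Agentium APIs.

```typescript
import { Buffer } from "node:buffer";

type Cell = string | number | boolean | null;

// Local stand-in for the file part used in the CSV example (assumption).
interface FilePart {
  type: "file";
  data: string; // base64-encoded file contents
  mimeType: string;
  filename: string;
}

// Quote a cell when it contains a comma, double quote, or line break,
// doubling any embedded double quotes (RFC 4180 style).
function escapeCell(cell: Cell): string {
  const text = cell === null ? "" : String(cell);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join rows into CSV text with CRLF line endings.
function rowsToCsv(rows: Cell[][]): string {
  return rows.map((cells) => cells.map(escapeCell).join(",")).join("\r\n");
}

// Wrap the CSV text in a base64 file part with the text/csv MIME type.
function csvFilePart(rows: Cell[][], filename: string): FilePart {
  const csv = rowsToCsv(rows);
  return {
    type: "file",
    data: Buffer.from(csv, "utf8").toString("base64"),
    mimeType: "text/csv",
    filename,
  };
}

// Illustrative rows, as they might come out of one worksheet.
const part = csvFilePart(
  [
    ["product", "revenue"],
    ["Widget, Pro", 142500],
  ],
  "sales-data.csv",
);
console.log(part.mimeType, part.filename);
```

The resulting part can be placed in the same array as a text part and passed to agent.run, as in the CSV example. A workbook with several sheets would need one CSV file per sheet.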
When exposing agents via Express, you can accept file uploads and convert them to ContentPart[]. The transport layer provides buildMultiModalInput for this. See File Upload for how to handle multipart/form-data and build multi-modal input from uploaded files.
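To illustrate the kind of conversion involved, here is a hand-rolled sketch that maps in-memory uploads to a text part plus file parts. It is not the implementation or signature of buildMultiModalInput, which this section does not show. `UploadedFile` and `uploadsToParts` are hypothetical; the field names `buffer`, `mimetype`, and `originalname` assume an upload object like the one multer produces with memory storage. Images and audio may need their own part types, which are not covered here.

```typescript
import { Buffer } from "node:buffer";

// Assumed minimal shape of an in-memory upload (multer-style field names).
interface UploadedFile {
  buffer: Uint8Array;
  mimetype: string;
  originalname: string;
}

// Local stand-ins for the text and file parts used in the examples above.
type TextPart = { type: "text"; text: string };
type FilePart = { type: "file"; data: string; mimeType: string; filename: string };
type Part = TextPart | FilePart;

// Hypothetical helper: the prompt becomes a text part, each upload a base64 file part.
function uploadsToParts(prompt: string, files: UploadedFile[]): Part[] {
  const fileParts: FilePart[] = files.map((file) => ({
    type: "file",
    data: Buffer.from(file.buffer).toString("base64"),
    mimeType: file.mimetype,
    filename: file.originalname,
  }));
  return [{ type: "text", text: prompt }, ...fileParts];
}

const parts = uploadsToParts("Summarize the attached file.", [
  { buffer: new Uint8Array([104, 105]), mimetype: "text/plain", originalname: "hi.txt" },
]);
console.log(parts.length);
```

In a real route handler the prompt would typically come from a form field and the files from the parsed multipart body; prefer the transport layer's own helper where it is available.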