Multimodal Features

Chat with video, images, audio, and PDFs, and generate visuals.

Silkwave supports multimodal interactions, meaning you can send the AI files such as video, images, audio, and PDFs in addition to text.

Analyzing Files

You can attach files using the paperclip icon (📎) or by dragging and dropping them directly into the chat window.

Supported File Types:

Video Files: Upload .mp4 or .mov video files for analysis. The AI can summarize the video content, answer questions about specific scenes, or analyze the spoken audio.
Images: Upload .png or .jpg files for visual analysis.
Audio Files: Upload .mp3 or .wav files.
PDF Documents: Upload .pdf files. The AI can read the document to extract text, summarize long reports, or answer specific questions based on the content.

Note: Image, video, audio, and document analysis depend on the model you select (e.g., Gemini 3.0 Pro or GPT-4o); not every model can analyze every file type.
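
The supported extensions listed above can be summarized as a small lookup table. The following Python sketch is illustrative only: SUPPORTED_TYPES and classify_attachment are hypothetical names, not part of Silkwave, and the table simply mirrors the extensions named on this page. Matching extensions case-insensitively is also an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the "Supported File Types" list on this page.
# Hypothetical table for illustration; Silkwave's own checks may differ.
SUPPORTED_TYPES = {
    ".mp4": "video",
    ".mov": "video",
    ".png": "image",
    ".jpg": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".pdf": "pdf",
}


def classify_attachment(filename: str) -> Optional[str]:
    """Return the category for a listed extension, or None if it is not listed."""
    # Lowercasing the suffix is an assumption, not documented behavior.
    suffix = Path(filename).suffix.lower()
    return SUPPORTED_TYPES.get(suffix)
```

For example, classify_attachment("report.pdf") returns "pdf", while classify_attachment("notes.txt") returns None because .txt is not in the list above. Whether a listed file can actually be analyzed still depends on the selected model.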

Generating Images

You can generate images directly within the chat using supported models such as gemini-3-pro-image-preview (Nano Banana).

Select a model capable of image generation.
Type your prompt (e.g., "Please generate a flat vector illustration of a peaceful mountain range in muted, deep colors.").
The image will appear in the chat.
Click on the image to preview it using the system's Quick Look feature, or save it to your desktop.

Rich Text & Math

Silkwave supports advanced rendering for technical users:

Markdown: Headers, lists, bold text, code blocks, tables, and more.
LaTeX: Mathematical formulas.
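
As an illustration of the kind of formula LaTeX rendering is meant for, a message might contain the quadratic formula written as follows. This is a sketch only: this page does not say which math delimiters Silkwave recognizes, so the display-math brackets used here are an assumption.

```latex
\[
x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\]
```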
