A vision AI assistant that sees your screen, understands what you're doing, and guides you step by step.
Download the User Manual desktop app for Windows, macOS, or Linux. When you launch it, the app docks to the right side of your screen — like a taskbar. Every other application reshapes around it, so nothing overlaps.
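How the dock is implemented isn't documented here, and reserving work area so other apps reshape around the sidebar requires platform-specific calls (AppBar on Windows, struts on X11, and so on). Purely as an illustration, here is a minimal TypeScript sketch that positions a sidebar window against the right edge of the primary display, assuming an Electron-style shell; the window width and framework choice are assumptions, not details from the product.

```typescript
import { app, BrowserWindow, screen } from "electron";

// Hypothetical sketch: place a frameless sidebar at the right edge of the
// primary display. The platform-specific work-area reservation that makes
// other applications reshape around it is omitted here.
const SIDEBAR_WIDTH = 380; // assumed width, not taken from the docs

app.whenReady().then(() => {
  const { workArea } = screen.getPrimaryDisplay();
  const win = new BrowserWindow({
    x: workArea.x + workArea.width - SIDEBAR_WIDTH,
    y: workArea.y,
    width: SIDEBAR_WIDTH,
    height: workArea.height,
    frame: false,
    alwaysOnTop: true,
    resizable: false,
  });
  win.loadFile("index.html");
});
```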
The first time you send a message, you'll be prompted to grant screen capture permission. This is a one-time system dialog. After that, User Manual silently captures a screenshot each time you ask a question — no manual action needed.
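To give a feel for what the per-question capture could look like, here is a hedged sketch using Electron's desktopCapturer in the main process. The actual capture path isn't specified, so the module choice and the helper name `captureScreen` are assumptions.

```typescript
import { desktopCapturer, screen } from "electron";
import type { NativeImage } from "electron";

// Hypothetical per-question screenshot of the primary display.
// The image is handed off for compression before being sent (see below).
async function captureScreen(): Promise<NativeImage> {
  const { size, scaleFactor } = screen.getPrimaryDisplay();
  const sources = await desktopCapturer.getSources({
    types: ["screen"],
    thumbnailSize: {
      width: Math.round(size.width * scaleFactor),
      height: Math.round(size.height * scaleFactor),
    },
  });
  // Use the full-resolution thumbnail of the first screen source.
  return sources[0].thumbnail;
}
```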
Type naturally. "How do I add a reverb to this track?" or "What's wrong with this error?" Your screenshot is sent alongside the message, so the AI sees exactly what you see — the application, its state, menus, toolbars, and any error messages.
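Conceptually, each question travels as text plus an image. The request shape below is a hypothetical sketch; the endpoint URL and field names are assumptions, not the app's real API.

```typescript
// Hypothetical request payload: the user's question plus the screenshot
// captured at the moment they asked it.
interface AskRequest {
  sessionId: string;
  message: string;    // e.g. "How do I add a reverb to this track?"
  screenshot: string; // base64-encoded JPEG of the current screen
}

async function ask(req: AskRequest): Promise<Response> {
  return fetch("https://api.usermanual.example/ask", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
}
```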
The AI responds with numbered, actionable steps referencing specific buttons, menus, and keyboard shortcuts visible on your screen. On-screen pointers (numbered dots) appear directly on your display, showing you exactly where to click. Tap a pointer to dismiss it.
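One way to picture the pointer overlay: each numbered dot is a small positioned element tied to a step in the answer, removed when clicked. The data shape and class name below are illustrative assumptions, not the app's actual implementation.

```typescript
// Hypothetical pointer: one numbered dot per step in the answer.
interface Pointer {
  step: number;  // matches the numbered step in the response
  x: number;     // screen coordinates for the dot
  y: number;
  label: string; // e.g. "Click the FX button on track 3"
}

function renderPointers(pointers: Pointer[], root: HTMLElement): void {
  for (const p of pointers) {
    const dot = document.createElement("div");
    dot.className = "um-pointer";
    dot.textContent = String(p.step);
    dot.style.position = "absolute";
    dot.style.left = `${p.x}px`;
    dot.style.top = `${p.y}px`;
    dot.title = p.label;
    // Tap or click a pointer to dismiss it.
    dot.addEventListener("click", () => dot.remove());
    root.appendChild(dot);
  }
}
```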
User Manual remembers your conversation and previous screenshots within the session. As you follow instructions and your screen changes, the AI tracks your progress and adapts its guidance. It's a multi-turn, context-aware conversation — not a one-shot answer.
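A rough sketch of how such session state might be kept, assuming prior turns and screenshots are held client-side and sent with each new question; the type and class names here are hypothetical.

```typescript
// Hypothetical session state: the running conversation, including the
// screenshots attached to earlier user turns.
interface Turn {
  role: "user" | "assistant";
  text: string;
  screenshot?: string; // base64 JPEG, present on user turns
}

class Session {
  private history: Turn[] = [];

  addUserTurn(text: string, screenshot: string): void {
    this.history.push({ role: "user", text, screenshot });
  }

  addAssistantTurn(text: string): void {
    this.history.push({ role: "assistant", text });
  }

  // Sending the full history lets the model see how the screen has
  // changed since earlier steps and adapt its guidance.
  toRequestPayload(): Turn[] {
    return [...this.history];
  }
}
```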
Powered by Gemini 2.5 Pro and GPT-4o — multimodal models that understand UI layouts, code, terminal output, and visual content.
Screenshots are resized and compressed client-side (JPEG, max 1280 px) before transmission. Capture-to-response is sub-second.
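A minimal sketch of that downscale-and-encode step, assuming the capture yields an Electron NativeImage; the JPEG quality setting is an assumed value, and only the 1280 px cap comes from the description above.

```typescript
import type { NativeImage } from "electron";

const MAX_EDGE = 1280;     // longest edge after resizing, per the spec above
const JPEG_QUALITY = 70;   // assumed quality setting, not documented

// Downscale so the longest edge is at most MAX_EDGE, then encode as JPEG.
function compressScreenshot(img: NativeImage): Buffer {
  const { width, height } = img.getSize();
  const scale = Math.min(1, MAX_EDGE / Math.max(width, height));
  const resized = scale < 1
    ? img.resize({ width: Math.round(width * scale) }) // aspect ratio preserved
    : img;
  return resized.toJPEG(JPEG_QUALITY);
}
```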
Server-sent events deliver tokens in real time. Answers start appearing immediately, while the model is still generating.
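For illustration, here is a minimal TypeScript sketch of consuming such a stream from the response, splitting SSE frames and surfacing each token as it arrives; the frame format and callback are assumptions.

```typescript
// Read an SSE response body and invoke onToken for each "data:" line.
async function streamAnswer(
  res: Response,
  onToken: (token: string) => void
): Promise<void> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames are separated by a blank line; each data line is a token.
    const frames = buffer.split("\n\n");
    buffer = frames.pop() ?? "";
    for (const frame of frames) {
      for (const line of frame.split("\n")) {
        if (line.startsWith("data: ")) onToken(line.slice(6));
      }
    }
  }
}
```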
Screenshots are processed in flight and never stored server-side. Your screen data is discarded after each response.
Create a free account and download the app.