eyehands is a local HTTP server that gives any AI agent eyes and hands on Windows. Screen capture, mouse, keyboard, OCR, and UI Automation. All on localhost, all through a simple REST API.
Head-to-head on the same Windows machine, April 2026
| Operation | eyehands | Claude Computer Use | Advantage |
|---|---|---|---|
| Screenshot | 57ms | 2-4 seconds | ~50x faster |
| 5-action batch | 67ms | 3-5 seconds | ~50x faster |
| Screenshot resolution | 1920x1080 | 1456x816 | ~1.7x the pixels |
| Text input | 89ms native SendInput | Clipboard paste | |
| OCR text search | Built-in | Not available | Unique |
| UI Automation | Built-in | Not available | Unique |
Screen capture at ~20fps into a rolling buffer. On-demand screenshots in 57ms. BetterCam DXGI backend. Multi-monitor. Per-Monitor DPI v2. Region and window-specific capture.
Mouse and keyboard via Windows SendInput. Goes through the Raw Input pipeline, so games, Parsec, and pointer-lock apps see the input correctly. Absolute and relative movement. Batch up to 100 actions in one call.
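As a rough illustration of working within the 100-action limit, a client could split a long action list into batches before sending. This is a minimal sketch: the action fields (`type`, `dx`, `dy`) and the `{"actions": [...]}` body shape are assumptions made for the example, not the documented eyehands schema.

```python
import json
from typing import Any, Dict, Iterator, List

MAX_BATCH = 100  # per-call limit stated above

Action = Dict[str, Any]  # hypothetical action shape, e.g. {"type": "move_rel", ...}


def chunk_actions(actions: List[Action], limit: int = MAX_BATCH) -> Iterator[List[Action]]:
    """Yield consecutive slices of `actions`, each at most `limit` long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(actions), limit):
        yield actions[start:start + limit]


def build_batch_bodies(actions: List[Action]) -> List[bytes]:
    """Serialize each chunk as a JSON request body (assumed body shape)."""
    return [
        json.dumps({"actions": chunk}).encode("utf-8")
        for chunk in chunk_actions(actions)
    ]


# 250 small relative mouse moves become three request bodies: 100, 100, 50.
moves = [{"type": "move_rel", "dx": 1, "dy": 0} for _ in range(250)]
bodies = build_batch_bodies(moves)
```

Each body would then be sent as its own request to whatever batch endpoint the server exposes.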
OCR text search via EasyOCR with frame-level caching. Windows UI Automation tree walking. Find and click elements by name instead of pixel coordinates.
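Frame-level caching can be pictured as keying OCR results on a hash of the frame, so an unchanged frame never reaches the OCR engine twice. The sketch below shows that general idea only. It is not the eyehands implementation, and `FrameOcrCache` and `fake_ocr` are hypothetical names standing in for a real engine such as EasyOCR.

```python
import hashlib
from typing import Callable, List, Optional

OcrFn = Callable[[bytes], List[str]]


class FrameOcrCache:
    """Remember the OCR result for the most recent frame, keyed by its hash."""

    def __init__(self, ocr: OcrFn) -> None:
        self._ocr = ocr
        self._key: Optional[str] = None
        self._result: List[str] = []
        self.misses = 0  # number of times the OCR function actually ran

    def read(self, frame: bytes) -> List[str]:
        key = hashlib.sha256(frame).hexdigest()
        if key != self._key:
            # Frame content changed: run OCR and remember the result.
            self._result = self._ocr(frame)
            self._key = key
            self.misses += 1
        return self._result


def fake_ocr(frame: bytes) -> List[str]:
    """Stand-in for a real OCR engine: treats the bytes as plain text."""
    return frame.decode("utf-8").split()


cache = FrameOcrCache(fake_ocr)
first = cache.read(b"Save Cancel")   # runs OCR
second = cache.read(b"Save Cancel")  # identical frame, served from cache
```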
Everything on localhost. No cloud round-trips. Frame-hash ETag polling for zero-token screen watching. Cached OCR returns instantly on unchanged frames.
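ETag polling is standard HTTP: the client sends back the last ETag it saw in `If-None-Match`, and the server answers `304 Not Modified` with no body when the frame hash has not changed. Below is a minimal sketch using only the Python standard library. The URL in the usage comment is a placeholder, since the actual eyehands endpoint paths and port are not given here.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_request(url: str, etag: Optional[str]) -> urllib.request.Request:
    """Build a GET request that carries If-None-Match when an ETag is known."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    return req


def poll_once(url: str, etag: Optional[str], timeout: float = 2.0) -> Tuple[bool, Optional[str], bytes]:
    """Return (changed, etag, body); body is empty when the server replies 304."""
    try:
        with urllib.request.urlopen(conditional_request(url, etag), timeout=timeout) as resp:
            return True, resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 Not Modified as an HTTPError
            return False, etag, b""
        raise


# Usage against a running server (placeholder URL, not a documented endpoint):
#   changed, etag, image = poll_once("http://127.0.0.1:8000/screenshot", None)
#   changed, etag, image = poll_once("http://127.0.0.1:8000/screenshot", etag)
```

Only a changed frame returns image bytes, so an agent loop can skip sending anything to the model while the screen is idle.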
Building Godot games? (save $20)
Any AI agent that can make HTTP calls. Python SDK, MCP server, OpenAI function calling, or raw curl.
$49
One-time payment. 7-day free trial first.