5 things you can automate with eyehands + Claude Code

by Fireal Software · ~8 min read

Claude Code is very good at reading and editing files, running shell commands, and navigating a git repo. It’s very bad at clicking a button in Outlook, reading a Grafana panel, or driving a WinForms dialog. Its built-in tools stop at the terminal.

eyehands fills that gap. It’s a local HTTP server that runs on 127.0.0.1:7331 and exposes a small REST API for screen capture, mouse, keyboard, OCR, and Windows UI Automation. Claude Code calls it with curl. It ships with a skill file (SKILL.md) that teaches Claude to prefer the accessibility tree and OCR over blind screenshots, which keeps token usage in check.
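If you would rather script against the API from Python than from curl, the request shape is easy to mirror. The sketch below is only an illustration: it assumes the same Bearer token header the curl examples in the recipes use, and the helper names `build_request` and `call` are mine, not part of eyehands.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7331"  # address given in the post


def build_request(path, token, payload=None):
    """Build (but do not send) an authenticated request.

    GET when payload is None, JSON POST otherwise.
    Hypothetical helper, not part of eyehands itself.
    """
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call(path, token, payload=None, timeout=10):
    """Send the request and return the raw response body as bytes."""
    req = build_request(path, token, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Splitting "build" from "send" keeps the request easy to inspect or log before anything touches the mouse or keyboard.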

Below are five real recipes I use on my own machine. Each one replaces a chore I used to do by hand. None of them need special permissions beyond running eyehands and Claude Code.

1. Inbox triage in native Outlook

I get a lot of low-value email. Newsletters, vendor pings, GitHub notifications that duplicate what’s in my actual feed. Instead of rules (brittle) or a dedicated triage tool (more software), I ask Claude Code to open Outlook, read the sender column via OCR, and delete anything that matches a list I keep in a text file.

The UI Automation endpoint finds Outlook’s window and clicks the “Inbox” folder by name:

TOKEN=$(cat ~/AppData/Roaming/eyehands/.eyehands-token)

# Find the Outlook window and open the Inbox folder
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://127.0.0.1:7331/ui/windows" | jq '.windows[] | select(.name | contains("Outlook"))'

curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"Inbox","type":"TreeItem","window":"Outlook"}' \
  http://127.0.0.1:7331/ui/click_element

Then Claude reads the visible messages with /find, matches senders against my ignore list, and for each match selects the message and presses Delete. Outlook’s Undo buffer catches anything I wanted to keep, and Claude logs what it deleted to a file I can scan later. Triage that used to take ten minutes is now a one-shot /eyehands triage prompt.
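The matching step is plain string work once OCR has produced the sender names. Here is a minimal sketch of it, assuming an ignore file with one pattern per line and `#` comments (my own convention for illustration, not an eyehands format) and case-insensitive substring matching to tolerate OCR casing noise.

```python
def load_ignore_list(text):
    """Parse one pattern per line; skip blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.casefold())
    return patterns


def senders_to_delete(senders, patterns):
    """Return the senders that contain any ignore pattern (case-insensitive)."""
    return [s for s in senders if any(p in s.casefold() for p in patterns)]
```

Substring matching is deliberately loose. If OCR misreads a character the pattern simply will not match, which errs on the side of keeping mail rather than deleting it.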

2. Log tail scraping from a terminal window

The classic problem: a process writes logs to a console window I can see but can’t pipe. Maybe it’s a legacy service, maybe it’s a vendor tool that refuses to write to stdout, maybe it’s a game engine. You want alerts when an ERROR line scrolls past, but you can’t tail -f what isn’t a file.

eyehands solves this by polling. Claude Code sets up a loop that screenshots the terminal’s region every second or two, runs OCR over it, and flags lines containing the word ERROR. Because /find caches OCR results per frame hash, repeat polls on an unchanged screen return instantly — no wasted CPU.

# Poll a region for ERROR text and print any matches
while true; do
  result=$(curl -s -H "Authorization: Bearer $TOKEN" \
    "http://127.0.0.1:7331/find?text=ERROR&x=100&y=200&w=900&h=500")
  found=$(echo "$result" | jq -r '.found')
  if [ "$found" = "true" ]; then
    echo "$result" | jq '.matches[]'
  fi
  sleep 2
done

I’ve used this exact pattern to watch a Unity build log for asset import errors while I work on something else. Claude Code wraps the loop, tells me only when something interesting shows up, and stops when I ask it to. Faster to write than any “real” log-watching solution I’d build in Python or Go.
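One thing the polling loop needs in practice is deduplication, because an ERROR line stays on screen across many polls. A small sketch of how the "only tell me when something new shows up" part might work, assuming the OCR matches have already been reduced to a list of text lines (the function name and that input shape are hypothetical):

```python
def new_error_lines(ocr_lines, seen, keyword="ERROR"):
    """Return lines containing keyword that have not been reported yet.

    `seen` is a set that is updated in place, so repeated polls of an
    unchanged screen report nothing new.
    """
    fresh = []
    for line in ocr_lines:
        text = line.strip()
        if keyword in text and text not in seen:
            seen.add(text)
            fresh.append(text)
    return fresh
```

The trade-off is that an identical error message recurring later is suppressed too, and OCR jitter can make one line look like two. For a build log that is usually acceptable.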

3. Daily dashboard screenshots without a headless browser

I run a small set of dashboards that don’t have good data-export stories: a Grafana board, a custom status page for the license server, and a couple of SaaS admin panels. I want a daily snapshot of each, saved to disk with the date in the filename, so I can scroll back through a week and spot anomalies.

A traditional approach would wire up Puppeteer or Playwright with login cookies, then generate a PNG per page. That works, but you’re maintaining a headless-browser automation stack forever. With eyehands, Claude Code just opens the real Chrome window I use anyway, navigates to each URL, and hits /screenshot:

# Focus Chrome and navigate to a URL via the address bar
curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"Google Chrome","type":"Window"}' \
  http://127.0.0.1:7331/ui/click_element

# Ctrl+L focuses the address bar; type the URL; press Enter
curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"vk":76,"modifiers":["ctrl"]}' \
  http://127.0.0.1:7331/key

curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text":"https://grafana.internal/d/api-latency"}' \
  http://127.0.0.1:7331/type_and_enter

# Wait for the page, then save a PNG (create the output folder if it is missing)
sleep 3
mkdir -p dashboards
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://127.0.0.1:7331/screenshot?format=png&raw=1" \
  -o "dashboards/$(date +%Y-%m-%d)-grafana.png"

No Playwright stack. No cookie jar. No headless mode flakiness. It runs on the same session I’m already logged into. Scheduled with Windows Task Scheduler, it writes a folder of PNGs I can scrub through whenever I want.
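With more than one dashboard, it helps to compute the dated filenames in one place. This is a rough sketch of that bookkeeping only. It follows the `dashboards/YYYY-MM-DD-name.png` pattern from the snippet above, and the function names are made up for illustration.

```python
from datetime import date
from pathlib import Path


def snapshot_path(name, day, root="dashboards"):
    """Return root/YYYY-MM-DD-name.png for one dashboard."""
    return Path(root) / f"{day.isoformat()}-{name}.png"


def plan_snapshots(dashboards, day):
    """Map each dashboard URL to the file its screenshot should be saved as.

    `dashboards` is a {short_name: url} dict.
    """
    return {url: snapshot_path(name, day) for name, url in dashboards.items()}
```

Because the ISO date leads the filename, a plain alphabetical directory listing is also a chronological one.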

4. Deploy health check via reference-image comparison

After a deploy, I usually eyeball the admin dashboard to make sure nothing looks broken. The columns are populated, the charts render, the sidebar isn’t missing a tab. I can automate that with an image diff.

The recipe: save a “known good” screenshot of the admin page once. After each deploy, Claude Code takes a fresh screenshot and compares pixel differences against the reference. If the diff exceeds a threshold (Claude uses Pillow for this — eyehands already needs it as a dep), it flags the deploy for manual review.

# Capture current state (raw=1 as in recipe 3, so the saved file is the image itself)
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://127.0.0.1:7331/screenshot?format=png&raw=1" \
  -o current.png

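# Python diff (Claude runs this in the Bash tool)
python -c "
from PIL import Image, ImageChops
# Convert both to RGB so an alpha channel or mode mismatch cannot hide changes
ref = Image.open('admin-reference.png').convert('RGB')
cur = Image.open('current.png').convert('RGB').resize(ref.size)
# Histogram of the grayscale difference: bucket 0 holds unchanged pixels
hist = ImageChops.difference(ref, cur).convert('L').histogram()
changed = sum(hist[1:]) / (ref.width * ref.height)
# 0.01 is an illustrative threshold (1% of pixels); tune it for your page
print('DIFFER' if changed > 0.01 else 'OK')
"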

This catches the kind of regressions automated tests miss: an empty container because the JSON schema changed, a CSS variable that stopped resolving, a nav item that got accidentally deleted in a refactor. It doesn’t catch subtle bugs, but it catches 80% of “oh no, did I just break the admin page” moments in about a second.
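The threshold arithmetic is worth seeing on its own, since it decides how noisy the check is. Below is a sketch on plain lists of (r, g, b) tuples, so it runs without Pillow. The per-channel `tolerance` and the 1% `threshold` are illustrative defaults I picked, not values from eyehands.

```python
def changed_fraction(ref_pixels, cur_pixels, tolerance=16):
    """Fraction of pixels whose largest per-channel difference exceeds tolerance.

    Both arguments are equal-length sequences of (r, g, b) tuples.
    """
    if len(ref_pixels) != len(cur_pixels):
        raise ValueError("images must have the same number of pixels")
    if not ref_pixels:
        return 0.0
    changed = sum(
        1
        for a, b in zip(ref_pixels, cur_pixels)
        if max(abs(x - y) for x, y in zip(a, b)) > tolerance
    )
    return changed / len(ref_pixels)


def verdict(fraction, threshold=0.01):
    """'DIFFER' when more than `threshold` of the pixels changed, else 'OK'."""
    return "DIFFER" if fraction > threshold else "OK"
```

The tolerance absorbs anti-aliasing and compression noise. The threshold absorbs small legitimate changes such as a clock or a counter. Both need tuning against a few known-good screenshots.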

5. Form auto-fill in legacy Win32 apps

Every medium-sized business has one. A Win32 or WinForms tool nobody wants to touch, built in the early 2010s, with no REST API and no export capability. You have to fill in forms by hand. Maybe it’s a vendor’s order entry tool, maybe it’s a compliance reporting app, maybe it’s an internal HR thing.

eyehands lets Claude fill these out from a spreadsheet or a CSV. UI Automation walks the form’s control tree by name, and /type_and_enter types each value in a single round trip instead of character-by-character:

# Click a field by its accessible name
curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"Customer Name","type":"Edit","window":"Order Entry"}' \
  http://127.0.0.1:7331/ui/click_element

# Type the value and press Enter
curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text":"Acme Corporation"}' \
  http://127.0.0.1:7331/type_and_enter

Chain this over a CSV of orders, and what used to be an hour of data entry becomes a few minutes of Claude Code running the loop while you work on something else. /click_and_wait handles the case where the form opens a confirmation dialog — it clicks the button, waits for the screen to change, and returns {"changed": true} so Claude knows to proceed without another screenshot round-trip.

The composite actions (/click_at, /click_and_wait, /type_and_enter, /batch) are all Pro features, so recipes 3 and 5 need Pro as written because they call /type_and_enter. If you’re doing any real volume of form automation, they’re worth the $19: a single batch POST can replace dozens of individual calls and shave seconds off every iteration.
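Chaining the click-then-type pair over a CSV could look roughly like this. The sketch only builds the list of (endpoint, payload) steps and sends nothing. It assumes each CSV column header is exactly the accessible name of an Edit control, and the "Quantity" column in the usage note is a hypothetical field.

```python
import csv
import io


def steps_for_row(row, window):
    """Turn one CSV row into (endpoint, payload) pairs: click field, then type.

    Assumes each column header equals the field's accessible name.
    """
    steps = []
    for field_name, value in row.items():
        steps.append(("/ui/click_element",
                      {"name": field_name, "type": "Edit", "window": window}))
        steps.append(("/type_and_enter", {"text": value}))
    return steps


def steps_for_csv(csv_text, window):
    """Yield the step list for every data row in the CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield steps_for_row(row, window)
```

For a file whose header is `Customer Name,Quantity`, each row yields four steps for the "Order Entry" window. Building the plan first makes a dry run possible, where you print the steps and read them before letting anything type. If Enter submits the form in your app, the intermediate fields would need a different typing call.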

Going further

These are the five recipes I use most. There are a lot more — testing WinForms apps end-to-end, automating games with SendInput, driving Parsec sessions to a remote PC, OCR’ing closed captions on a video for transcription, turning a tablet into a secondary “macro pad” via the same API. The common thread is: if you can see it on your screen, Claude Code can see it too, and if you can click it, Claude can click it.

Two follow-up posts go deeper on specific use cases. “Testing WinForms apps end-to-end with eyehands” turns the form auto-fill pattern into a full test harness. “Automating games with the SendInput pipeline” covers the low-level input plumbing that makes eyehands compatible with pointer-lock applications where PyAutoGUI and mouse_event silently fail.

Give Claude eyes and hands on Windows

eyehands is a local HTTP server for screen capture, mouse control, and keyboard input. Open source with a Pro tier.
