QA-testing a legacy WinForms app with Claude Code

by Fireal Software · ~7 min read

A client handed me a WinForms tool written in 2012. No automated tests. No public API. No Selenium bindings. The previous “QA process” was: run the app, click around, eyeball the output, ship. I had to add regression testing before touching the code.

This post walks through what I actually did: wire Claude Code to eyehands, write natural-language test prompts, and have Claude drive the app end-to-end. It’s not the test framework you’d build from scratch — but for a legacy tool with no existing tests, it’s dramatically better than nothing.

The app

A data-processing GUI: user picks an input CSV, configures some filters, clicks “Process”, and the app writes an output CSV. Maybe 30 controls across 4 tabs. WinForms with some custom controls. The code base is ~40k lines of C#, half of which I don’t fully understand.

Why not Robot Framework / WhiteLibrary / FlaUI?

Tried them. Each, for its own reasons, is too painful for a one-off: too much harness to stand up and maintain for a tool I'd test for a few weeks and hand back.

What I actually wanted was “tell Claude to open the tool, load a test CSV, run it, and verify the output matches”. Natural language → test run. That’s where eyehands + Claude Code came in.

Setup

pip install eyehands
eyehands --install-skill
eyehands

Claude Code with the skill loaded, plus a CLAUDE.md in the test directory with a few lines of context:

The app under test is at C:\TestApp\TestApp.exe. It’s a WinForms data-processing tool. Test inputs are in test-data/. Expected outputs are in test-data/expected/.

The test prompt

Run regression test 01:
1. Launch TestApp.exe
2. Click "File → Open" and load test-data/input-01.csv
3. On the Filters tab, check "Remove duplicates" and "Normalize casing"
4. On the Output tab, set the format dropdown to "JSON"
5. Click "Process"
6. Wait for the "Processing complete" dialog
7. Save the output to test-data/actual-01.json
8. Diff actual-01.json against test-data/expected/expected-01.json
9. Report pass or fail

Claude read the prompt, loaded the eyehands skill, and started walking through the steps.

What Claude did

Step 1 — launch the app. Bash call: start C:\TestApp\TestApp.exe &. Polled /ui/windows until the TestApp window showed up.

Step 2 — File → Open. POST /ui/click_element with {"name": "File"} to open the menu. Then {"name": "Open..."} to trigger the file dialog. The standard Windows file-open dialog appeared. Claude typed the path into the filename field with POST /type_text, pressed Enter with POST /key {"vk": 13}.
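For reference, here's that step-2 sequence as a tiny Python client. The endpoint names and the {"name": ...} / {"vk": 13} payloads are the ones from the calls above; the port and the /type_text payload key are my assumptions, so treat this as a sketch rather than the canonical eyehands client:

```python
import json
from urllib import request

class EyehandsClient:
    """Thin wrapper over the eyehands HTTP API (port is an assumption)."""

    def __init__(self, base="http://127.0.0.1:8000"):
        self.base = base

    def _post(self, path, payload):
        # POST a JSON body; endpoint paths are from the post,
        # the exact payload keys are assumptions.
        req = request.Request(
            self.base + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        return request.urlopen(req)

    def open_file(self, path):
        """File -> Open -> type the path -> Enter, as in step 2."""
        self._post("/ui/click_element", {"name": "File"})
        self._post("/ui/click_element", {"name": "Open..."})
        self._post("/type_text", {"text": path})  # payload key assumed
        self._post("/key", {"vk": 13})            # VK_RETURN
```

The nice part is that the file-open dialog needs no special casing: the filename field is a named UIA control, so typing into it is the same call as typing anywhere else.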

Step 3 — Filters tab. POST /ui/click_element with {"name": "Filters"} to switch tabs. Then {"name": "Remove duplicates"} and {"name": "Normalize casing"} to toggle the checkboxes.

Step 4 — Output tab dropdown. This was the one place Claude had to fall back from UIA to OCR. The custom format dropdown was a WinForms composite control that didn’t expose its items through UIA. Claude called /ui/click_element on the dropdown (which opened it), then /find?text=JSON to locate the JSON option, then /click_at.
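The UIA-then-OCR fallback generalizes into a small helper. This is a sketch of the pattern, not eyehands code: I'm assuming a UIA hit returns HTTP 200 and that /find returns coordinates as {"x": ..., "y": ...}; post and get stand in for whatever HTTP wrappers you're using:

```python
def click_by_name_or_ocr(post, get, name):
    """Try a UIA click by accessible name; if the control isn't in
    the UIA tree, fall back to OCR via /find and /click_at.

    `post(path, payload)` returns an HTTP status; `get(path)` returns
    parsed JSON. Endpoint names are from the post, response shapes
    are assumptions.
    """
    status = post("/ui/click_element", {"name": name})
    if status == 200:
        return "uia"
    # UIA miss: locate the text on screen and click its center.
    hit = get("/find?text=" + name)  # assumed shape: {"x": ..., "y": ...}
    post("/click_at", {"x": hit["x"], "y": hit["y"]})
    return "ocr"
```

Because the OCR path is just two more calls on the same API, Claude found it on its own; nothing in the prompt mentioned a fallback.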

Step 5 — Process. POST /ui/click_element with {"name": "Process"}.

Step 6 — wait for the dialog. This was clever — Claude called /latest with If-None-Match and polled at 500ms intervals, waiting for a screen change. When the dialog appeared, the hash bumped, Claude got a 200, OCRed for “Processing complete”, and confirmed.
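The polling loop is worth spelling out, because it's the cheapest trick in the whole run. A sketch, assuming the ETag semantics described above (304 while the frame hash is unchanged, 200 with a fresh frame once it differs); fetch is injected so the loop itself is testable:

```python
import time

def wait_for_screen_change(fetch, timeout=30.0, interval=0.5):
    """Poll /latest until the frame hash changes.

    `fetch(etag)` performs GET /latest with an If-None-Match header
    and returns (status, etag, body): 304 while the screen is
    unchanged, 200 with a fresh screenshot once it differs.
    """
    # First call with no ETag establishes the baseline frame hash.
    status, etag, _ = fetch(None)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(interval)
        status, new_etag, body = fetch(etag)
        if status == 200 and new_etag != etag:
            return body  # the screen changed; OCR this frame
    raise TimeoutError("screen did not change within %.1fs" % timeout)
```

Only the final 200 carries an image, which is why waiting 30 seconds costs a handful of tokens instead of sixty screenshots.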

Step 7 — save the output. The output was already written to the filesystem by TestApp itself — the dialog just confirmed completion. Claude copied it with cp test-data/latest-output.json test-data/actual-01.json.

Step 8 — diff. Claude ran diff test-data/actual-01.json test-data/expected/expected-01.json via the Bash tool. One-line diff.
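If you'd rather keep the pass/fail check inside the harness than shell out to diff, the same step is a few lines of stdlib Python (the file names here are the ones from the test prompt):

```python
import difflib
from pathlib import Path

def diff_report(actual_path, expected_path):
    """Return (passed, unified_diff_text) for a regression check."""
    actual = Path(actual_path).read_text().splitlines(keepends=True)
    expected = Path(expected_path).read_text().splitlines(keepends=True)
    diff = list(difflib.unified_diff(
        expected, actual,
        fromfile=expected_path, tofile=actual_path,
    ))
    return (not diff, "".join(diff))
```

An empty diff means a pass; anything else is the exact text you hand back to Claude (or a human) to explain the failure.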

Step 9 — report. “Test 01 failed: 3 lines different. The created_at field in the expected file is missing from the actual output. Possible timezone-handling regression.” Fed me the diff.

Total time: ~45 seconds per test run. Total token cost: ~3,500 tokens per run.

What would have been painful without eyehands

File open dialog automation. Without UI Automation, the file path input field would need to be located by pixel coordinates or OCR. With UIA, it’s a named control you can type into directly.

Tab switching. WinForms tabs show up in the UIA tree as TabItem controls. Without UIA, you’d click at fixed coordinates — brittle to theme and DPI changes.

Dropdown fallback. The one case where UIA didn’t work gracefully degraded to OCR. Without eyehands, you’d write a separate fallback path for every custom control. With eyehands, /find?text=JSON is the same API as the UIA calls, and Claude naturally tried it.

Waiting for dialogs. Screenshotting every 500ms for 30 seconds would cost ~60 × 1500 = 90,000 image tokens just to wait. With frame-hash polling, it costs essentially nothing until something actually changes.

The real value

The real value isn’t “I have a test suite now” — it’s “I can write new tests by typing English into a prompt”. Adding a new regression test is a one-minute task: I write the steps in natural language, Claude runs them, and if it fails I have a diff I can investigate.

For a legacy tool that would otherwise have zero automated testing, this is a step change. Not as good as a real test framework written from scratch, but enormously better than nothing.

What I’d add next

Test recording. Having Claude “remember” the exact sequence of calls it made during a successful test run, and replay them deterministically. This removes the natural-language layer for regression runs and only uses it for authoring new tests.
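The recording idea is straightforward to sketch: wrap the HTTP layer, log every call during a green run to a JSONL file, and replay that file without the LLM in the loop. This is a sketch of the concept, not an eyehands feature; CallRecorder and replay are names I made up:

```python
import json

class CallRecorder:
    """Log each eyehands call made during a successful run."""

    def __init__(self, post):
        self._post = post
        self.calls = []

    def post(self, path, payload):
        self.calls.append({"path": path, "payload": payload})
        return self._post(path, payload)

    def save(self, path):
        with open(path, "w") as f:
            for call in self.calls:
                f.write(json.dumps(call) + "\n")  # one JSONL line per call

def replay(path, post):
    """Re-issue a recorded run deterministically, call by call."""
    with open(path) as f:
        for line in f:
            call = json.loads(line)
            post(call["path"], call["payload"])
```

Replays would still need the wait-for-change polling between steps (a raw call log has no timing in it), but the natural-language layer drops out entirely for regression runs.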

Visual diff on state. When a test fails due to UI changes rather than data changes, I want a side-by-side screenshot comparison. eyehands’ /latest + diff tooling could produce this.

Parallel runs. Right now tests are serial because there’s only one screen. Running multiple TestApps in parallel with distinct UIA window handles would require eyehands to support ?window= filtering on UIA calls, which is a feature I should add.

Install

pip install eyehands
eyehands --install-skill
eyehands

*Legacy WinForms apps are the quiet hellscape of enterprise software. Most of them never get real test suites because it's too much ceremony. If "tell Claude to test this" is good enough, it unlocks regression testing for a class of software that would otherwise never get any.*

Give Claude eyes and hands on Windows

eyehands is a local HTTP server for screen capture, mouse control, and keyboard input. Open source with a Pro tier.

Try eyehands