Testing WinForms apps end-to-end with eyehands

by Fireal Software · ~10 min read

WinForms testing is a graveyard. TestStack.White is unmaintained. Microsoft killed Coded UI Tests in Visual Studio 2019. FlaUI is alive and works, but it’s a .NET-only library that requires writing tests in C# and knowing a fair amount of UI Automation internals. WinAppDriver exists but is finicky about sessions and nobody loves it. If you’ve ever inherited a WinForms app with no tests and been asked to “just add some”, you know the feeling.

This post shows how to use eyehands as a WinForms UI-test harness. Tests are plain Python (or plain shell, or plain anything that can make HTTP requests) that click elements by accessible name, type into fields, and assert on screen content. No FlaUI. No Selenium-for-desktop. No “just use Playwright”. Just HTTP.

The trade-off: you get a much smaller, more ergonomic test layer, at the cost of running your app on a real desktop instead of in headless mode. For most legacy WinForms apps that’s a feature, not a bug — they don’t run headless anyway.

The two layers: UI Automation and OCR

eyehands exposes two complementary ways to interact with a running WinForms app. Knowing which to use for which problem is the main skill.

UI Automation (/ui/*) walks the Windows accessibility tree — the same tree screen readers use. Every WinForms control (Button, TextBox, Label, DataGridView, etc.) exposes an automation peer that you can find by name, type, or both. It’s fast (no pixels involved), deterministic, and survives layout changes as long as control names don’t drift.

OCR (/find) uses EasyOCR to read visible text on the screen and return its pixel coordinates. It’s slower than UIA (the first call takes a few seconds to load the model) but works on anything that renders text, including custom-painted controls, third-party charts, DataGridView cells that don’t expose themselves to UIA, and images with text baked in. OCR results are cached per frame hash, so repeat calls on an unchanged screen are essentially free.

The rule of thumb: try UI Automation first, fall back to OCR. If the element has an accessible name and a well-known type, use UIA. If it’s a cell in a grid, a label rendered on a custom canvas, or text in a dialog you can’t name-match, use OCR.
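That fallback is easy to codify. Here's a minimal sketch of a locator helper; it assumes `/ui/find` answers with the same `{"found": ..., "matches": [...]}` shape as `/find` (check the API reference for your eyehands version) and uses a placeholder token:

```python
import requests

BASE = "http://127.0.0.1:7331"
HEADERS = {"Authorization": "Bearer <your-token>"}  # read from the token file in practice

def locate(text, window=None):
    """UIA first, OCR fallback: return the first match for text, or None."""
    params = {"name": text}
    if window:
        params["window"] = window
    # 1) Ask the accessibility tree: fast, deterministic, no pixels involved
    data = requests.get(f"{BASE}/ui/find", params=params, headers=HEADERS).json()
    if data.get("found"):
        return data["matches"][0]
    # 2) Fall back to OCR on the rendered screen
    data = requests.get(f"{BASE}/find", params={"text": text}, headers=HEADERS).json()
    if data.get("found"):
        return data["matches"][0]
    return None
```

Tests can then call `locate("Save")` without caring which layer answered, and a control that loses its accessible name degrades to an OCR lookup instead of failing outright.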

Walkthrough: testing a customer entry form

Let’s say you have a WinForms app with a simple form: full name, email, phone, and a Save button that opens a “Saved successfully” toast. You want a test that fills the form, clicks Save, and asserts the toast appears. Here’s the whole thing in Python:

import requests
import os

TOKEN = open(os.path.expanduser("~/AppData/Roaming/eyehands/.eyehands-token")).read().strip()
BASE = "http://127.0.0.1:7331"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

def ui_click(name, type_=None, window="Customer Entry"):
    body = {"name": name, "window": window}
    if type_:
        body["type"] = type_
    r = requests.post(f"{BASE}/ui/click_element", json=body, headers=HEADERS)
    r.raise_for_status()

def type_text(text):
    r = requests.post(f"{BASE}/type_text", json={"text": text}, headers=HEADERS)
    r.raise_for_status()

def find_text(text, timeout_s=3):
    """Poll /find until the text appears or timeout."""
    import time
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        r = requests.get(f"{BASE}/find", params={"text": text},
                         headers={"Authorization": f"Bearer {TOKEN}"})
        data = r.json()
        if data.get("found"):
            return data["matches"][0]
        time.sleep(0.2)
    raise AssertionError(f"Text {text!r} not found within {timeout_s}s")

def test_customer_save():
    # Focus the Name field, type a value
    ui_click("Full name", type_="Edit")
    type_text("Jane Smith")

    ui_click("Email", type_="Edit")
    type_text("jane@example.com")

    ui_click("Phone", type_="Edit")
    type_text("555-0100")

    # Click Save, wait for the success toast
    ui_click("Save", type_="Button")
    find_text("Saved successfully", timeout_s=5)

if __name__ == "__main__":
    test_customer_save()
    print("PASS")

That's a complete end-to-end test in about 40 lines. Run it with `python test_customer_save.py` and it reports PASS or fails with a clear `AssertionError`. No test runner setup, no fixtures, no magic. To move it into pytest, just delete the `__main__` block: `test_customer_save` already follows the `test_` naming convention, so pytest discovers it like any other test.

A few things to notice:

- **No sleeps in the happy path.** The `find_text` helper polls at 5Hz and bails when the text shows up. That's faster and more reliable than `time.sleep(2)` everywhere.
- **Names match the actual UI labels.** If your WinForms form uses `Label + TextBox` pairs, the accessible name of the TextBox is usually the associated Label's text. UIA lines them up automatically.
- **The `type="Edit"` filter disambiguates.** Windows can have multiple controls with the same name (e.g. a Label and a TextBox both named "Email"). Filtering by control type picks the right one.

Handling modal dialogs with click_and_wait

Modal dialogs are the classic UI-test pain point. You click a button, a dialog pops up, you want to click inside the dialog. If your test clicks too early — before the dialog has rendered — the click lands on the parent window and your test fails in a confusing way.

eyehands has a Pro endpoint called /click_and_wait that solves this cleanly. It clicks at a coordinate and then polls the frame buffer until the screen actually changes, returning {"changed": true} when it does. You know the UI reacted, and you can proceed with the next step without a sleep:

def click_and_wait(x, y, timeout_ms=2000):
    r = requests.post(f"{BASE}/click_and_wait",
                      json={"x": x, "y": y, "timeout_ms": timeout_ms},
                      headers=HEADERS)
    return r.json().get("changed", False)

# Click "Delete" by coordinates (since /click_and_wait takes x/y),
# blocking until the confirmation dialog has actually rendered
assert click_and_wait(540, 380)  # (540, 380): the Delete button's position
ui_click("Yes", type_="Button", window="Confirm Deletion")

Under the hood, /click_and_wait watches the background frame buffer’s frame hash. As soon as a new frame arrives with a different hash, it returns. No busy waiting, no race conditions, no flake.
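You can combine the two APIs so tests never hard-code coordinates: resolve the element through UI Automation, then click it through `/click_and_wait`. A sketch, assuming `/ui/find` returns matches whose first entry carries pixel `x`/`y` fields (the field names are a guess, so confirm against your version's response):

```python
import requests

BASE = "http://127.0.0.1:7331"
HEADERS = {"Authorization": "Bearer <your-token>"}  # read from the token file in practice

def ui_click_and_wait(name, type_=None, timeout_ms=2000):
    """Look an element up by accessible name, then click it via
    /click_and_wait so we block until the screen actually changes."""
    params = {"name": name}
    if type_:
        params["type"] = type_
    data = requests.get(f"{BASE}/ui/find", params=params, headers=HEADERS).json()
    if not data.get("found"):
        raise AssertionError(f"No UIA element named {name!r}")
    el = data["matches"][0]
    r = requests.post(f"{BASE}/click_and_wait",
                      json={"x": el["x"], "y": el["y"], "timeout_ms": timeout_ms},
                      headers=HEADERS)
    return r.json().get("changed", False)
```

`assert ui_click_and_wait("Delete", type_="Button")` then replaces both the name-based click and the hard-coded coordinate.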

Deterministic timing: why frame hashing beats sleep

Sleeps in tests are evil. Too short and they’re flaky; too long and your test suite drags. The usual solution in web testing is “wait for element” helpers like Selenium’s WebDriverWait, which poll DOM state until an element appears. eyehands gives you the same guarantee for native Windows apps.

There are three deterministic waits to know about:

- **Wait for text (`/find`)** — polls OCR results until a specific string appears. Use this for toasts, status labels, error messages.
- **Wait for UIA element (`/ui/find`)** — polls the accessibility tree until an element matching your query exists. Use this for dialogs, menus, newly-opened windows.
- **Wait for any change (`/click_and_wait`)** — polls the frame buffer's hash until something visual changes. Use this when you don't care what changed, just that something did.

All three are cheap because eyehands caches the latest frame in a background thread at 20 FPS. Polling /find or /ui/find at 5-10Hz doesn’t touch the GPU or the process memory of your app; it just re-reads the cached frame and accessibility tree.
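The UIA-element wait looks just like the `find_text` helper from the test script, pointed at `/ui/find` instead. A sketch, under the same assumption that `/ui/find` answers with a `{"found": ..., "matches": [...]}` body:

```python
import time
import requests

BASE = "http://127.0.0.1:7331"
HEADERS = {"Authorization": "Bearer <your-token>"}  # read from the token file in practice

def wait_for_element(name, type_=None, timeout_s=5.0, poll_hz=5):
    """Poll the accessibility tree until a matching element exists."""
    params = {"name": name}
    if type_:
        params["type"] = type_
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        data = requests.get(f"{BASE}/ui/find", params=params, headers=HEADERS).json()
        if data.get("found"):
            return data["matches"][0]
        time.sleep(1 / poll_hz)  # cheap: re-reads the cached tree, not the GPU
    raise AssertionError(f"No element named {name!r} within {timeout_s}s")
```

Use it right after an action that opens a dialog or window: `wait_for_element("Confirm Deletion")` returns the match as soon as the dialog exists.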

Running tests from CI

The big caveat with desktop UI tests is that they need a real interactive desktop session. You can’t run them on a headless Linux runner — the WinForms app wouldn’t start. You need a Windows runner with a visible desktop.

GitHub Actions’ windows-latest runner works for this, but you have to be careful about two things:

- **The runner doesn't have an interactive session by default.** Services and GUI apps launch fine, but they run on a non-interactive window station unless you configure otherwise. For most WinForms apps this is OK — they render to the invisible desktop and eyehands reads their UI tree via accessibility, which doesn't need actual pixels visible to a human. The OCR layer is the exception: `/find` reads rendered pixels, so if screen capture comes back blank on your runner, lean on `/ui/find`-based waits instead. And if your app uses a custom-painted control that doesn't expose itself to UIA, you may need to switch to a self-hosted runner with a real logged-in user.
- **You need to start eyehands before the test.** Either run it as a background process in a CI step, or bundle it in a setup script. Pin the version in `pyproject.toml` so the runner gets the same behavior your dev box has.

Here’s a minimal GitHub Actions job:

name: WinForms UI tests

on: [push, pull_request]

jobs:
  ui-test:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install eyehands
        run: pip install "eyehands[ocr,ui]==1.4.0" requests
      - name: Build the app under test
        run: dotnet build CustomerEntry.sln -c Release
      - name: Start eyehands in background
        run: Start-Process -FilePath eyehands -WindowStyle Hidden
        shell: pwsh
      - name: Start the app under test
        run: Start-Process .\CustomerEntry\bin\Release\net8.0-windows\CustomerEntry.exe
        shell: pwsh
      - name: Wait for eyehands to be ready
        run: |
          # Poll /ping instead of sleeping for a fixed time
          $deadline = (Get-Date).AddSeconds(30)
          while ((Get-Date) -lt $deadline) {
            try { Invoke-RestMethod http://127.0.0.1:7331/ping; exit 0 }
            catch { Start-Sleep -Seconds 1 }
          }
          throw "eyehands did not answer /ping within 30s"
        shell: pwsh
      - name: Run UI tests
        run: python test_customer_save.py

This isn’t fundamentally different from running Playwright tests in CI — you’re just targeting a native Windows app instead of a headless browser.

Caveats

A few things to know before you bet your test suite on this:

- **Hidden or minimized windows.** UIA can usually read them, but OCR can't — the window needs to be rendered. If your tests are failing because a window is offscreen, check with `/ui/windows` and maximize it programmatically before proceeding.
- **Per-monitor DPI.** eyehands is Per-Monitor DPI v2 aware, so coordinates are physical pixels regardless of display scaling. But your WinForms app might not be DPI-aware, which can cause click offsets if your test machine has a scale factor other than 100%. Set scale to 100% on your CI runner and it goes away.
- **Focus stealing.** If another window grabs focus mid-test (a Windows Update prompt, a Teams notification), your keyboard input goes to the wrong window. On CI this is usually fine; on a dev box, run tests in a minimized session or use a dedicated test VM.
- **Test isolation.** Each test should start with a clean app state. If your tests share a running app, add a "reset" step at the top of each test (re-open the form, close all dialogs, etc.).
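That reset step can be a best-effort sweep over the dialog buttons your app is known to show, followed by whatever action blanks the form. A sketch reusing the `ui_click` helper from the test script; the "New" button is hypothetical, so substitute whatever your app actually names the control that clears the form:

```python
import requests

BASE = "http://127.0.0.1:7331"
HEADERS = {"Authorization": "Bearer <your-token>",  # read from the token file in practice
           "Content-Type": "application/json"}

def ui_click(name, type_=None, window="Customer Entry"):
    body = {"name": name, "window": window}
    if type_:
        body["type"] = type_
    requests.post(f"{BASE}/ui/click_element", json=body,
                  headers=HEADERS).raise_for_status()

def reset_app():
    """Run at the top of every test: dismiss stray dialogs, blank the form."""
    for dialog_button in ("Cancel", "No", "OK"):
        try:
            ui_click(dialog_button, type_="Button")
        except requests.HTTPError:
            pass  # that dialog isn't open -- fine
    # "New" is a hypothetical button that clears the Customer Entry form
    ui_click("New", type_="Button")
```

Call `reset_app()` at the top of every test (or wire it into a pytest fixture) so one failed test can't leave a dialog open for the next one.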

When not to use this

If your WinForms app has a proper backend API, test the backend directly and skip UI tests entirely. UI tests are for flows that only exist in the GUI — data validation, wizard state, multi-step workflows, error handling in dialogs. For CRUD operations that go through a REST API, API-level tests are faster, more reliable, and easier to maintain.

eyehands-based UI tests are best for the things that couldn’t be tested any other way: verifying that the form’s save button stays disabled until required fields are filled, confirming a new row appears in a DataGridView after saving, checking that an error dialog shows the right message when the user does something wrong. For that kind of test, this setup is about as simple as it gets.

Give Claude eyes and hands on Windows

eyehands is a local HTTP server for screen capture, mouse control, and keyboard input. Open source with a Pro tier.

Try eyehands