# Playwright Testing Best Practices
Best practices for writing reliable, maintainable Playwright tests in Quarto CLI, derived from comprehensive test development (`axe-accessibility.spec.ts`: 431 lines, 75 test cases across 3 formats).
## Web-First Assertions
Always use Playwright's web-first assertions - they auto-retry and are more reliable than imperative DOM queries.
### Text Content
### Element Presence
### CSS Properties
### Waiting for Elements
### Why Web-First Assertions?
Web-first assertions automatically retry until:

- The condition is met (test passes), or
- Timeout occurs (test fails with a clear error)
This handles timing issues gracefully without manual `waitFor()` calls or fixed delays.
Real-world impact from PR #14125: Converting from imperative checks to web-first assertions eliminated multiple race conditions in cross-format testing where different formats loaded at different speeds.
## Role-Based Selectors
Prefer semantic role-based selectors over CSS attribute selectors.
### Interactive Elements
### When to Use CSS Selectors
Role-based selectors aren't always possible. Use CSS selectors for:

- Custom components without ARIA roles
- Testing implementation-specific classes (e.g., `.quarto-axe-report`)
- Dynamic content where role/name combinations are too generic
### Why Role-Based Selectors?
- **Readability**: `getByRole('tab', { name: 'Page 2' })` is self-documenting
- **Accessibility**: If the selector works, the element is accessible
- **Resilience**: CSS classes and attributes change; roles and labels are stable contracts
- **Maintenance**: Easier to understand intent when reviewing tests
Real-world impact from PR #14125: Dashboard rescan tests (8 test cases) initially used CSS attribute selectors like `a[data-bs-target="#page-2"]`. Refactoring to role-based selectors made tests readable without opening the HTML fixtures.
## Handling Non-Unique Selectors
When external tools produce selectors you don't control (e.g., axe-core returning a generic `span`), use `.first()` with explanatory comments.
### Pattern
### When to Use `.first()`
Use when:

- External tools generate selectors you don't control
- Test focuses on interaction/integration, not selector precision
- Selector is known to match multiple elements, but you only care about one
Always add comments explaining:

- Why the selector may be non-unique (e.g., "axe-core produces generic selectors")
- What the test is actually verifying (e.g., "hover triggers highlight, not selector uniqueness")
### Why Not Fix the Selector?
In integration tests, you're often verifying end-to-end behavior with third-party libraries. The test validates that your integration code works correctly, not that the third-party library produces optimal selectors.
A real-world example of this trade-off appears in PR #14125.
## Async Completion Signals
For tests that wait for async operations (network requests, library initialization, processing), add deterministic completion signals instead of arbitrary delays or polling.
### Pattern
### Error Handling
The signal should always be set, even on failure:
### Why Not Use Fixed Delays?

Fixed delays (`page.waitForTimeout(...)`) are either too short (flaky on slow machines or CDNs) or too long (wasting time on every run), and a delay that elapses tells you nothing about whether the async work actually finished.
### Why Completion Signals?
- **Deterministic**: Test knows exactly when async work is done
- **Fast**: No waiting longer than necessary
- **Clear failures**: Timeout means "work never completed", not "maybe we didn't wait long enough"
- **Debuggable**: Missing attribute = work didn't finish or crashed
Real-world impact from PR #14125: The axe accessibility tests initially had race conditions where tests would sometimes pass/fail depending on axe-core's CDN load speed. Adding `data-quarto-axe-complete` in a `finally` block made tests deterministic - they wait exactly as long as needed and fail clearly if axe never initializes.
### Advanced: Generation Counters for Rescanning
When operations can be triggered multiple times (e.g., rescanning on navigation), use generation counters to discard stale results:
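A minimal sketch of the idea (names illustrative, not Quarto's actual implementation): each rescan claims a new generation, and a scan's results are kept only if no newer scan started in the meantime.

```typescript
// Module-level counter shared by all scans.
let scanGeneration = 0;

async function rescan(runScan: () => Promise<string[]>): Promise<string[] | null> {
  const myGeneration = ++scanGeneration; // claim the latest generation
  const results = await runScan();       // slow async work (e.g. an axe scan)
  if (myGeneration !== scanGeneration) {
    return null;                         // a newer scan started: discard stale results
  }
  return results;                        // still the latest: safe to render
}
```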
From PR #14125 dashboard rescan: Users can switch tabs/pages faster than axe scans complete. Generation counters ensure old scans don't overwrite newer results.
## Parameterized Tests
When testing the same behavior across multiple formats or configurations, use `test.describe` with a test cases array instead of separate spec files.
### Pattern
### When to Use Parameterized Tests
Use when:

- Same assertion logic applied to multiple formats (html, revealjs, dashboard, pdf)
- Testing multiple output modes (console, json, document)
- Testing across configurations (themes, options, feature flags)
Benefits:

- Reduces file count (1 spec file instead of 3-10)
- Centralizes shared helpers
- Easy to add new test cases
- Clear comparison of format differences
From PR #14125: axe-accessibility.spec.ts tests 3 formats × 3 output modes = 9 base cases in a single 431-line file instead of 9 separate spec files.
## Expected Failures with `test.fail()`
Mark known failures explicitly:
Why use `test.fail()`:

- Documents known issues in the test suite
- The test passes when it fails (the expected behavior)
- The test fails if it unexpectedly passes (signaling the bug is fixed)
- Better than commenting out tests or skipping with `test.skip()`
## Summary
Four key patterns for reliable Playwright tests:

1. **Web-first assertions** - `expect(el).toContainText()`, not `expect(await el.textContent())`
2. **Role-based selectors** - `getByRole('tab', { name: 'Page 2' })`, not `locator('a[data-bs-target]')`
3. **Explicit `.first()` comments** - Explain why and what you're testing
4. **Completion signals** - `data-feature-complete` in `finally` blocks, not arbitrary delays
These patterns emerged from building comprehensive cross-format test coverage and debugging race conditions. They make tests:

- More reliable (fewer flaky failures)
- More readable (intent is clear)
- Easier to maintain (resilient to markup changes)
- Faster to debug (clear failure modes)
Reference implementations:

- `tests/integration/playwright/tests/axe-accessibility.spec.ts` - 431 lines, 75 test cases
- `tests/integration/playwright/tests/html-math-katex.spec.ts` - Parameterized format testing