# Playwright Testing Best Practices
Best practices for writing reliable, maintainable Playwright tests in Quarto CLI, derived from comprehensive test development (`axe-accessibility.spec.ts`: 431 lines, 75 test cases across 3 formats).
## Web-First Assertions
Always use Playwright's web-first assertions - they auto-retry and are more reliable than imperative DOM queries.
### Text Content
### Element Presence
### CSS Properties
### Waiting for Elements
### Why Web-First Assertions?
Web-first assertions automatically retry until:

- The condition is met (test passes), or
- Timeout occurs (test fails with a clear error)
This handles timing issues gracefully without manual `waitFor()` calls or fixed delays.
Real-world impact from PR #14125: Converting from imperative checks to web-first assertions eliminated multiple race conditions in cross-format testing where different formats loaded at different speeds.
## Role-Based Selectors
Prefer semantic role-based selectors over CSS attribute selectors.
### Interactive Elements
### When to Use CSS Selectors
Role-based selectors aren't always possible. Use CSS selectors for:

- Custom components without ARIA roles
- Testing implementation-specific classes (e.g., `.quarto-axe-report`)
- Dynamic content where role/name combinations are too generic
### Why Role-Based Selectors?
- **Readability**: `getByRole('tab', { name: 'Page 2' })` is self-documenting
- **Accessibility**: If the selector works, the element is accessible
- **Resilience**: CSS classes and attributes change; roles and labels are stable contracts
- **Maintenance**: Easier to understand intent when reviewing tests
Real-world impact from PR #14125: Dashboard rescan tests (8 test cases) initially used CSS attribute selectors like `a[data-bs-target="#page-2"]`. Refactoring to role-based selectors made tests readable without opening the HTML fixtures.
## Handling Non-Unique Selectors
When external tools produce selectors you don't control (e.g., axe-core returning a generic `span`), use `.first()` with explanatory comments.
### Pattern
### When to Use `.first()`
Use when:

- External tools generate selectors you don't control
- Test focuses on interaction/integration, not selector precision
- Selector is known to match multiple elements, but you only care about one
Always add comments explaining:

- Why the selector may be non-unique (e.g., "axe-core produces generic selectors")
- What the test is actually verifying (e.g., "hover triggers highlight, not selector uniqueness")
### Why Not Fix the Selector?
In integration tests, you're often verifying end-to-end behavior with third-party libraries. The test validates that your integration code works correctly, not that the third-party library produces optimal selectors.
A real-world example of this trade-off appears in PR #14125.
## Async Completion Signals
For tests that wait for async operations (network requests, library initialization, processing), add deterministic completion signals instead of arbitrary delays or polling.
### Pattern
### Error Handling
The signal should always be set, even on failure:
### Why Not Use Fixed Delays?

Fixed delays (`page.waitForTimeout(...)`) are either too short (flaky on slow machines or CDNs) or too long (wasting time on every run), and a delay that elapses tells you nothing about whether the async work actually finished.
### Why Completion Signals?
- **Deterministic**: Test knows exactly when async work is done
- **Fast**: No waiting longer than necessary
- **Clear failures**: Timeout means "work never completed", not "maybe we didn't wait long enough"
- **Debuggable**: Missing attribute = work didn't finish or crashed
Real-world impact from PR #14125: The axe accessibility tests initially had race conditions where tests would sometimes pass/fail depending on axe-core's CDN load speed. Adding `data-quarto-axe-complete` in a `finally` block made tests deterministic - they wait exactly as long as needed and fail clearly if axe never initializes.
### Advanced: Generation Counters for Rescanning
When operations can be triggered multiple times (e.g., rescanning on navigation), use generation counters to discard stale results:
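A minimal sketch of the idea (names illustrative, not Quarto's actual implementation): each rescan claims a new generation, and a scan's results are kept only if no newer scan started in the meantime.

```typescript
// Module-level counter shared by all scans.
let scanGeneration = 0;

async function rescan(runScan: () => Promise<string[]>): Promise<string[] | null> {
  const myGeneration = ++scanGeneration; // claim the latest generation
  const results = await runScan();       // slow async work (e.g. an axe scan)
  if (myGeneration !== scanGeneration) {
    return null;                         // a newer scan started: discard stale results
  }
  return results;                        // still the latest: safe to render
}
```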
From PR #14125 dashboard rescan: Users can switch tabs/pages faster than axe scans complete. Generation counters ensure old scans don't overwrite newer results.
## Parameterized Tests
When testing the same behavior across multiple formats or configurations, use `test.describe` with a test cases array instead of separate spec files.
### Pattern
### When to Use Parameterized Tests
Use when:

- Same assertion logic applied to multiple formats (html, revealjs, dashboard, pdf)
- Testing multiple output modes (console, json, document)
- Testing across configurations (themes, options, feature flags)
Benefits:

- Reduces file count (1 spec file instead of 3-10)
- Centralizes shared helpers
- Easy to add new test cases
- Clear comparison of format differences
From PR #14125: axe-accessibility.spec.ts tests 3 formats × 3 output modes = 9 base cases in a single 431-line file instead of 9 separate spec files.
## Expected Failures with `test.fail()`
Mark known failures explicitly:
Why use `test.fail()`:

- Documents known issues in the test suite
- The test passes when it fails (the expected behavior)
- The test fails if it unexpectedly passes (signaling the bug is fixed)
- Better than commenting out tests or skipping with `test.skip()`
## Summary
Four key patterns for reliable Playwright tests:

1. **Web-first assertions** - `expect(el).toContainText()`, not `expect(await el.textContent())`
2. **Role-based selectors** - `getByRole('tab', { name: 'Page 2' })`, not `locator('a[data-bs-target]')`
3. **Explicit `.first()` comments** - Explain why and what you're testing
4. **Completion signals** - `data-feature-complete` in `finally` blocks, not arbitrary delays
These patterns emerged from building comprehensive cross-format test coverage and debugging race conditions. They make tests:

- More reliable (fewer flaky failures)
- More readable (intent is clear)
- Easier to maintain (resilient to markup changes)
- Faster to debug (clear failure modes)
Reference implementations:

- `tests/integration/playwright/tests/axe-accessibility.spec.ts` - 431 lines, 75 test cases
- `tests/integration/playwright/tests/html-math-katex.spec.ts` - Parameterized format testing