Agent Sessions — E2E Tests
Automated dogfooding tests for the Agent Sessions window using a compile-and-replay architecture powered by playwright-cli and Copilot CLI.
Mocking Architecture
These tests run the real Sessions workbench with only the minimal set of services mocked — specifically the services that require external backends (auth, LLM, git). Everything downstream from the mock agent's canned response runs through the real code paths.
What's Mocked (Minimal)
| Service | Mock | Why |
|---|---|---|
| `IChatEntitlementService` | Returns `ChatEntitlement.Free` | No real Copilot account in CI |
| `IDefaultAccountService` | Returns a fake signed-in account | Hides the "Sign In" button |
| `IGitService` | Resolves immediately (no 10s barrier) | No real git extension in web tests |
| Chat agents (`copilotcli`, etc.) | Canned keyword-matched responses with `textEdit` progress items | No real LLM backend |
| `mock-fs://` `FileSystemProvider` | `InMemoryFileSystemProvider` registered directly in the workbench (not extension host) | Must be available before any service tries to resolve workspace files |
| GitHub authentication | Always-signed-in mock provider (extension) | No real OAuth flow |
| Code Review command | Returns canned review comments per file (extension) | No real Copilot AI review |
| PR commands (Create/Open/Merge) | No-op handlers that log and show info messages (extension) | No real GitHub API |
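
The keyword matching used by the mock chat agents can be pictured roughly as follows. This is a minimal sketch, not the code in `web.test.ts`: the `CannedResponse` shape, the `pickCannedResponse` name, and the reply texts are hypothetical, and only the keywords and file paths are borrowed from examples elsewhere in this README.

```typescript
// Hypothetical shape; the real mock agent may store its responses differently.
interface CannedResponse {
  keyword: string;  // substring looked for in the user's message
  text: string;     // canned reply text (invented for this sketch)
  files: string[];  // paths the reply would edit via textEdit progress items
}

const CANNED_RESPONSES: CannedResponse[] = [
  { keyword: "build", text: "Added a build script.", files: ["/mock-repo/src/build.ts"] },
  { keyword: "fix", text: "Fixed the bug.", files: ["/mock-repo/src/index.ts"] },
];

// Returns the first canned response whose keyword occurs in the message
// (case-insensitive), or a plain fallback that edits no files.
function pickCannedResponse(message: string): CannedResponse {
  const lower = message.toLowerCase();
  for (const candidate of CANNED_RESPONSES) {
    if (lower.indexOf(candidate.keyword) !== -1) {
      return candidate;
    }
  }
  return { keyword: "", text: "OK.", files: [] };
}
```

Because the match is a plain substring test, scenario messages only need to contain the keyword somewhere in the text.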
What's Real (Everything Else)
The following services run with their real implementations, ensuring tests exercise the actual code paths:
- `ChatEditingService`: processes `textEdit` progress items from the mock agent, creates `IModifiedFileEntry` objects with real before/after diffs, and computes actual `linesAdded`/`linesRemoved` from content changes
- `ChatModel`: routes agent progress through `acceptResponseProgress()`
- `ChangesViewPane`: reads file modification state from `IChatEditingService` observables and renders the tree with real diff stats
- Diff editor: opens a real diff view when clicking files in the changes list
- Context keys: `hasUndecidedChatEditingResourceContextKey` and `hasAppliedChatEditsContextKey` are set by real `ModifiedFileEntryState` observations
- Menu actions: "Create PR", "Accept", "Reject" buttons appear based on real context key state
- `CodeReviewService`: orchestrates review requests, processes results from the mock `github.copilot.chat.codeReview.run` command, and stores comments
- `CodeReviewToolbarContribution`: shows the Code Review button in the Changes view toolbar based on real context key state
Data Flow
The mock agent is the only point where canned data enters the system. Everything downstream uses real service implementations.
Code Review & PR Button Flow
The PR buttons (Create PR, Open PR, Merge) are contributed via the mock extension's package.json menus, gated by chatSessionType == copilotcli. The chatSessionType context key is derived from the session URI scheme (getChatSessionType()), which returns copilotcli for mock sessions.
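
The gating described above can be illustrated with a small sketch. It assumes the session type is simply the URI scheme; `sessionTypeFromUri` is a hypothetical stand-in for `getChatSessionType()`, and the session URI shown is invented for illustration.

```typescript
// Hypothetical stand-in for getChatSessionType(): the session type is taken
// to be the URI scheme, i.e. everything before the first ':'.
function sessionTypeFromUri(sessionUri: string): string | undefined {
  const colon = sessionUri.indexOf(":");
  return colon > 0 ? sessionUri.slice(0, colon) : undefined;
}

// Mirrors a `chatSessionType == copilotcli` menu condition.
function prButtonsVisible(context: { chatSessionType: string | undefined }): boolean {
  return context.chatSessionType === "copilotcli";
}

// Invented mock session URI; only the scheme matters for this sketch.
const mockContext = { chatSessionType: sessionTypeFromUri("copilotcli://mock-session/1") };
console.log(prButtonsVisible(mockContext)); // true
```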
Why the FileSystem Provider Is Registered in the Workbench
The mock-fs:// InMemoryFileSystemProvider is registered directly on IFileService inside TestSessionsBrowserMain.createWorkbench() — not in the mock extension. This is critical because several workbench services (SnippetsService, AgenticPromptFilesLocator, MCP, etc.) try to resolve files in the workspace folder before the extension host activates. If the provider were only registered via vscode.workspace.registerFileSystemProvider() in the extension, these services would see ENOPRO: No file system provider errors and fail silently.
The mock extension still registers a mock-fs provider via the extension API (needed for extension host operations), but the workbench-level registration is the source of truth.
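
The ordering problem can be reproduced in miniature. The sketch below uses a hypothetical, heavily simplified file service (`TinyFileService` is not the real `IFileService`) to show why a read that happens before registration fails, while registering up front avoids the error.

```typescript
// Hypothetical, simplified model: providers are keyed by URI scheme.
interface TinyProvider { readFile(path: string): string; }

class TinyFileService {
  private providers: { [scheme: string]: TinyProvider } = {};

  registerProvider(scheme: string, provider: TinyProvider): void {
    this.providers[scheme] = provider;
  }

  readFile(scheme: string, path: string): string {
    const provider = this.providers[scheme];
    if (!provider) {
      throw new Error("ENOPRO: No file system provider for " + scheme);
    }
    return provider.readFile(path);
  }
}

// In-memory store with one seeded file (content invented for the sketch).
const seededFiles: { [path: string]: string } = { "/mock-repo/package.json": "{}" };
const inMemoryProvider: TinyProvider = {
  readFile(path: string): string {
    const found = seededFiles[path];
    if (found === undefined) { throw new Error("ENOENT: " + path); }
    return found;
  },
};

const tinyService = new TinyFileService();

// A service that reads before any provider is registered hits ENOPRO.
let earlyError = "";
try {
  tinyService.readFile("mock-fs", "/mock-repo/package.json");
} catch (e) {
  earlyError = String(e);
}

// Registering first, as the workbench-level registration does, avoids it.
tinyService.registerProvider("mock-fs", inMemoryProvider);
const seededContent = tinyService.readFile("mock-fs", "/mock-repo/package.json");
```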
File Edit Strategy
Mock edits target files that exist in the mock-fs:// file store so the ChatEditingService can compute real before/after diffs:
- Existing files (e.g. `/mock-repo/src/index.ts`, `/mock-repo/package.json`): edits use a full-file replacement range (line 1 → line 99999) so the editing service diffs the old content against the new content
- New files (e.g. `/mock-repo/src/build.ts`): edits use an insert-at-beginning range, producing a "file created" entry in the changes view
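
The two range choices can be sketched as below. The `EditRange` shape and the `editRangeFor` name are hypothetical; the actual logic lives in the `emitFileEdits()` helper and may differ in detail.

```typescript
// Hypothetical range shape with 1-based lines and columns.
interface EditRange {
  startLine: number;
  startColumn: number;
  endLine: number;
  endColumn: number;
}

// Paths assumed to be pre-seeded, taken from the examples above.
const SEEDED_PATHS = ["/mock-repo/src/index.ts", "/mock-repo/package.json"];

// Existing files get a range that covers the whole file, so the new text
// replaces everything. New files get an empty range at the very start,
// which amounts to a pure insertion.
function editRangeFor(path: string): EditRange {
  const exists = SEEDED_PATHS.indexOf(path) !== -1;
  return exists
    ? { startLine: 1, startColumn: 1, endLine: 99999, endColumn: 1 }
    : { startLine: 1, startColumn: 1, endLine: 1, endColumn: 1 };
}
```

Using an oversized end line such as 99999 avoids having to know the file's real length when building the edit.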
Mock Workspace Folder
The workspace folder URI is mock-fs://mock-repo/mock-repo. The path /mock-repo (not root /) is used so that basename(folderUri) returns "mock-repo" — this is what the folder picker displays. All mock files are stored under this path in the in-memory file store.
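
A quick sketch shows why the root path would not work. The `lastSegment` function below is a minimal stand-in for `basename`, written for illustration rather than copied from the workbench.

```typescript
// Minimal POSIX-style basename: the last non-empty path segment.
function lastSegment(path: string): string {
  const segments = path.split("/").filter(s => s.length > 0);
  return segments.length > 0 ? segments[segments.length - 1] : "";
}

console.log(lastSegment("/mock-repo")); // mock-repo
console.log(lastSegment("/"));          // (empty string)
```

With the root path the folder picker would have nothing to display, which is why the files live one level down.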
How It Works
There are two phases:
Phase 1: Generate (uses LLM — slow, run once)
For each .scenario.md file, the generate script:
1. Starts the Sessions web server and opens the page in `playwright-cli`
2. Takes an accessibility tree snapshot of the current page
3. Sends each natural-language step + snapshot to Copilot CLI, which returns the exact `playwright-cli` commands (e.g. `click e43`, `type "hello"`)
4. Executes the commands to advance the UI state for the next step
5. Writes the compiled commands to a `.commands.json` file in the `scenarios/generated/` folder
The .commands.json files are committed to git — they're the deterministic test plan that everyone runs.
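
The per-step loop of the generate phase can be sketched as follows. All names here (`CompiledStep`, `GenerateDeps`, `compileScenario`) are hypothetical, and the snapshot, agent call, and command execution are injected as plain functions so the sketch does not depend on `playwright-cli` or Copilot CLI being installed.

```typescript
// Hypothetical compiled-step shape; the real .commands.json layout may differ.
interface CompiledStep { description: string; commands: string[]; }

// Injected dependencies standing in for the real tools.
interface GenerateDeps {
  snapshot(): string;                                  // accessibility tree as text
  askAgent(step: string, snapshot: string): string[];  // agent returns commands
  run(command: string): void;                          // executes one command
}

function compileScenario(steps: string[], deps: GenerateDeps): CompiledStep[] {
  const compiled: CompiledStep[] = [];
  for (const step of steps) {
    const snap = deps.snapshot();                // snapshot the current page
    const commands = deps.askAgent(step, snap);  // step + snapshot -> commands
    for (const command of commands) {
      deps.run(command);                         // advance UI state for the next step
    }
    compiled.push({ description: step, commands });
  }
  return compiled; // the caller would serialize this to a .commands.json file
}
```

Each step is executed immediately after it is compiled because the element refs for the next step only exist in the page state that the previous commands produce.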
Phase 2: Test (no LLM — fast, deterministic)
The test runner reads each .commands.json and replays the playwright-cli commands mechanically. No LLM calls, no regex matching, no icon stripping. Just sequential commands and assertions.
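
A replay loop of this kind might look like the sketch below. The `ReplayStep` shape is hypothetical, and `exec` is injected: a real runner would shell out to `playwright-cli` there. Lines starting with `#` are skipped in this sketch, on the assumption that comment lines are handled separately rather than executed.

```typescript
// Hypothetical step shape, mirroring one compiled scenario entry.
interface ReplayStep { description: string; commands: string[]; }

// `exec` runs one command and returns true when it succeeded.
function replay(steps: ReplayStep[], exec: (command: string) => boolean): string[] {
  const failures: string[] = [];
  for (const step of steps) {
    for (const command of step.commands) {
      if (command.charAt(0) === "#") {
        continue; // comment lines are not executed in this sketch
      }
      if (!exec(command)) {
        failures.push(step.description + " (failed at: " + command + ")");
        break; // stop this step at its first failing command
      }
    }
  }
  return failures;
}
```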
When to Re-generate
Run npm run generate when:
- You add a new `.scenario.md` file
- The UI changes and refs are stale (tests start failing)
- You modify an existing scenario's steps
File Structure
Supporting files outside e2e/:
Prerequisites
- VS Code compiled (`out/` at the repo root):
- Dependencies installed:
- Copilot CLI available (for `npm run generate` only):
Running
Example test output:
Writing a New Scenario
1. Create a new `NN-description.scenario.md` file in `scenarios/`. Files are sorted by name and run in order.
2. Use this format:
3. Run `npm run generate` to compile it into a `.commands.json` file.
4. Run `npm test` to verify it works.
5. Commit both the `.scenario.md` and `.commands.json` files.
Step Language
Write steps in plain English. The Copilot agent interprets them against the page's accessibility tree. Common patterns:
| Pattern | Example |
|---|---|
| Click a button | Click button "Cloud" |
| Type in an input | Type "hello" in the chat input |
| Press a key | Press Enter |
| Verify visibility | Verify the repository picker dropdown is visible |
| Verify button state | Verify the "Send" button is disabled |
You're not limited to these patterns — the agent understands natural language.
The .commands.json Format
Each compiled step looks like:
For assertions, the agent outputs a snapshot command followed by an assertion comment:
The test runner understands these comment-based assertions:
- `# ASSERT_VISIBLE: <text>`: checks that the snapshot contains the text
- `# ASSERT_DISABLED: <label>`: checks that the button has `[disabled]`
- `# ASSERT_ENABLED: <label>`: checks that the button doesn't have `[disabled]`
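
One way to evaluate these comments is sketched below. It assumes the snapshot is plain text with one element per line and that a disabled button's line contains both its label and `[disabled]`; the sample snapshot lines and the `checkAssertion` name are hypothetical, not the runner's actual format or API.

```typescript
// Evaluates one assertion comment against a text snapshot.
function checkAssertion(comment: string, snapshot: string): boolean {
  const match = /^#\s*(ASSERT_VISIBLE|ASSERT_DISABLED|ASSERT_ENABLED):\s*(.+)$/.exec(comment.trim());
  if (!match) {
    throw new Error("Not an assertion comment: " + comment);
  }
  const kind = match[1];
  const target = match[2].trim();
  // Keep only the snapshot lines that mention the text or label.
  const lines = snapshot.split("\n").filter(l => l.indexOf(target) !== -1);
  if (lines.length === 0) {
    return false; // nothing in the snapshot mentions the target
  }
  if (kind === "ASSERT_VISIBLE") {
    return true;
  }
  const disabled = lines.some(l => l.indexOf("[disabled]") !== -1);
  return kind === "ASSERT_DISABLED" ? disabled : !disabled;
}

// Invented snapshot lines for illustration only.
const sampleSnapshot = [
  '- button "Cloud" [ref=e143]',
  '- button "Send" [disabled] [ref=e150]',
].join("\n");

console.log(checkAssertion("# ASSERT_DISABLED: Send", sampleSnapshot)); // true
```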
How a Step Executes (Worked Example)
Let's trace Click button "Cloud" through both phases.
Generate phase — the agent sees the accessibility tree snapshot:
Copilot CLI returns: click e143
This is saved to .commands.json and the click is executed to advance state.
Test phase — the runner reads:
It shells out to playwright-cli click e143. Done. No parsing, no matching.
Tips
- Use exact button labels as they appear in the UI.
- One action per step: keep steps atomic for clear failure messages.
- Order matters: scenarios run sequentially, and Escape is pressed between them.
- Prefix filenames with numbers (`01-`, `02-`, …) to control execution order.
- Re-generate selectively: `npm run generate -- 01-repo` to recompile one scenario.
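
The ordering and selective re-generation tips can be illustrated with a small sketch. It assumes the argument is matched as a filename prefix, which this README does not confirm; `selectScenarios` and the file names in the usage example are hypothetical.

```typescript
// Keeps only .scenario.md files, optionally restricted to names starting
// with `prefix`, and returns them sorted by name (the execution order).
function selectScenarios(fileNames: string[], prefix?: string): string[] {
  return fileNames
    .filter(f => /\.scenario\.md$/.test(f))
    .filter(f => prefix === undefined || f.indexOf(prefix) === 0)
    .sort();
}

// Invented file names for illustration.
const allFiles = ["02-chat.scenario.md", "01-repo-picker.scenario.md", "notes.txt"];
console.log(selectScenarios(allFiles));            // both scenarios, 01- first
console.log(selectScenarios(allFiles, "01-repo")); // only the 01-repo-picker scenario
```

Zero-padded numeric prefixes matter here because the sort is lexicographic: `10-` would sort before `2-` without padding.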
Testing File Diffs
To test that chat responses produce real file diffs:
1. Use a message keyword that triggers file edits in the mock agent (e.g. "build", "fix"; see `getMockResponseWithEdits()` in `web.test.ts`)
2. The mock agent emits `textEdit` progress items that flow through the real `ChatEditingService`
3. Open the secondary side bar to see the Changes view
4. Assert file names are visible in the changes tree
5. Click a file to open the diff editor and assert content is visible
Example scenario:
Important: Don't assert hardcoded line counts (e.g. +23). Instead assert on file names and content snippets — the real diff engine computes the actual counts, which may change as mock file content evolves.
Adding Mock File Edits
To add new keyword-matched responses with file edits, update getMockResponseWithEdits() in src/vs/sessions/test/web.test.ts:
- For existing files: target URIs whose paths match `EXISTING_MOCK_FILES` (files pre-seeded in the mock extension's file store). The `emitFileEdits()` helper uses a full-file replacement range so the `ChatEditingService` computes a real diff.
- For new files: target any other path. The helper uses an insert range for these, producing a "file created" entry.
- Mock file store: to add or change pre-seeded files, update `MOCK_FILES` in `extensions/sessions-e2e-mock/extension.js` AND update `EXISTING_MOCK_FILES` in `web.test.ts` to match. All paths must be under `/mock-repo/` (e.g. `/mock-repo/src/newfile.ts`).
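
Because the two lists must be kept in sync by hand, a small consistency check can catch drift early. The sketch below assumes `MOCK_FILES` is a path-to-content map and `EXISTING_MOCK_FILES` is a list of paths; both shapes and the `findMockFileDrift` name are assumptions for illustration.

```typescript
// Reports paths that break the /mock-repo/ rule or appear in only one list.
function findMockFileDrift(
  mockFiles: { [path: string]: string },
  existingMockFiles: string[],
): string[] {
  const problems: string[] = [];
  const seeded = Object.keys(mockFiles);
  for (const path of seeded) {
    if (path.indexOf("/mock-repo/") !== 0) {
      problems.push("not under /mock-repo/: " + path);
    }
    if (existingMockFiles.indexOf(path) === -1) {
      problems.push("missing from EXISTING_MOCK_FILES: " + path);
    }
  }
  for (const path of existingMockFiles) {
    if (seeded.indexOf(path) === -1) {
      problems.push("missing from MOCK_FILES: " + path);
    }
  }
  return problems;
}
```

An empty result means the two lists agree and every seeded path sits under `/mock-repo/`.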