MCP, Accessibility, and the Future of Agentic Browsing

Category: Web Standards, AI, Accessibility

Timeframe: 2024-Present

Role: Technical Architect, Specification Author

Technologies: MCP Protocol, Web Standards, Accessibility APIs

Related: MCP Discovery Specification

How the Model Context Protocol represents a convergence point between accessibility standards and AI-mediated web interaction.

The Convergence Thesis

There's a pattern emerging that most people haven't noticed yet: accessibility interfaces and AI agent interfaces are converging toward the same architectural requirements.

Both need structured, discoverable, semantic interfaces: a way to learn what exists, what actions are possible, what each action means, and how to invoke it.

This isn't a coincidence. It's a fundamental insight about how we should be building web services.

A Brief History of Web Interfaces

| Era | Primary Interface | Machine Accessibility |
| --- | --- | --- |
| Web 1.0 (1990s) | Semantic HTML documents | Screen readers parsed markup directly |
| Web 2.0 (2000s) | JavaScript-heavy SPAs | ARIA attributes bolted on after the fact |
| API Era (2010s) | REST/GraphQL for machines, UI for humans | Separate codepaths, duplicated logic |
| AI Era (2020s) | AI agents mediate between services and users | Need discoverable, structured interfaces |

Each transition exposed the same problem: when we build for visual presentation first, we create barriers for any consumer that isn't a human with a standard browser.

Screen reader users know this intimately. They've spent decades dealing with websites that work fine visually but are unusable when the presentation layer is stripped away.

Now AI agents face the same challenge. They can scrape HTML and try to understand it, but that's fragile, slow, and error-prone. What they actually need is exactly what accessibility advocates have been asking for all along: structured, semantic, discoverable interfaces.

The MCP Discovery Pattern

The MCP Discovery specification I've been working on addresses this by providing a standard way for websites to advertise their capabilities:

{
  "mcp": {
    "spec_version": "2026-01-24",
    "status": "draft",
    "servers": [
      {
        "name": "hastebin",
        "description": "Text paste and sharing service",
        "url": "https://haste.nixc.us/mcp",
        "capabilities": ["create-paste", "retrieve-paste"]
      }
    ]
  }
}

An AI agent hitting /.well-known/mcp.json can immediately understand what services are available, how to authenticate, and what capabilities each service offers.

This is the same information a well-designed accessibility API would expose: what can be done, how to do it, and what the result will be.
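
As a minimal sketch of that discovery step, assuming a Node 18+ runtime with a global fetch (the types mirror the example document above rather than any normative schema):

interface McpServer {
  name: string;
  description: string;
  url: string;
  capabilities: string[];
}

interface McpDiscovery {
  mcp: {
    spec_version: string;
    status: string;
    servers: McpServer[];
  };
}

// Fetch a site's discovery document and return the servers it advertises.
async function discover(origin: string): Promise<McpServer[]> {
  const res = await fetch(new URL("/.well-known/mcp.json", origin));
  if (!res.ok) {
    throw new Error(`no MCP discovery document at ${origin}`);
  }
  const doc = (await res.json()) as McpDiscovery;
  return doc.mcp.servers;
}

// Example: list what an origin exposes to agents.
discover("https://colinknapp.com").then((servers) => {
  for (const s of servers) {
    console.log(`${s.name}: ${s.capabilities.join(", ")} -> ${s.url}`);
  }
});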

Accessibility and Agents: Same Requirements

Consider what a screen reader needs to make a website usable:

  1. Structure discovery - What elements exist? What's their hierarchy?
  2. Capability enumeration - What actions can be taken? Buttons, links, forms?
  3. Semantic labels - What does each element mean?
  4. State information - Is this checkbox checked? Is this section expanded?
  5. Interaction patterns - How do I activate this?

Now consider what an AI agent needs:

  1. Service discovery - What APIs exist? What's available?
  2. Capability enumeration - What actions can be taken? What endpoints?
  3. Semantic descriptions - What does each service do?
  4. State information - What's the current status? What data exists?
  5. Interaction patterns - How do I invoke this?

The parallel is striking. Both are asking the same fundamental question: "What can I do here, and how do I do it?"
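
One way to make the overlap concrete is a shared descriptor shape. The fields below are illustrative only, drawn from the two lists above rather than from ARIA or the MCP schema:

// Hypothetical descriptor: the same fields answer the screen reader's
// questions (role, label, state, how to activate) and the agent's
// (capability, description, status, how to invoke).
interface CapabilityDescriptor {
  id: string;                      // discovery: what exists
  role: string;                    // enumeration: button, form, endpoint
  label: string;                   // semantics: what it means or does
  state?: Record<string, unknown>; // state: checked, expanded, current status
  invoke: {                        // interaction: how to activate or call it
    method: string;
    params?: Record<string, string>;
  };
}

// The hastebin "create-paste" capability, described in this shape.
const createPaste: CapabilityDescriptor = {
  id: "create-paste",
  role: "action",
  label: "Create a new text paste",
  invoke: { method: "POST", params: { content: "string" } },
};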

The Disappearing Visual Web

Here's the speculative part: as these technologies mature, we may see the visual web become one rendering option among many rather than the primary interface.

Consider a user with severe visual impairments today. Their experience is:

  1. Browser downloads HTML/CSS/JS
  2. Browser renders to visual pixels
  3. Accessibility layer extracts structure back out
  4. Screen reader presents information aurally

That's wildly inefficient. The page was semantic, became visual, then got re-semanticized.

A more sensible architecture:

  1. Browser fetches structured data
  2. User's preferred rendering surface presents it
  3. No wasted visual rendering step

Now consider an AI agent helping someone book a flight:

  1. Agent discovers airline's MCP endpoint
  2. Agent queries available flights directly
  3. Agent presents options through whatever interface the user prefers
  4. User confirms, agent executes booking

The visual website never enters the picture. The airline's web interface becomes just one possible rendering of their underlying services—useful for some users, unnecessary for others.
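
A sketch of that flow, building on the discover() helper from the earlier example. The airline endpoint, the capability names, and the callTool() helper are all hypothetical stand-ins, not a real airline API or the MCP wire protocol:

// Stand-in for an MCP tool invocation; the real transport and message
// framing are defined by the protocol and omitted here.
async function callTool(serverUrl: string, tool: string, args: object): Promise<any> {
  const res = await fetch(serverUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ tool, args }),
  });
  return res.json();
}

// Hypothetical booking flow: discover, query, confirm, execute.
async function bookFlight(airlineOrigin: string, from: string, to: string) {
  const servers = await discover(airlineOrigin);
  const airline = servers.find((s) => s.capabilities.includes("search-flights"));
  if (!airline) {
    throw new Error("no flight-search capability advertised");
  }

  const flights = await callTool(airline.url, "search-flights", { from, to });
  const choice = flights[0]; // in practice, the user confirms the choice here
  return callTool(airline.url, "book-flight", { flightId: choice.id });
}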

Agentic Browsing as Assistive Technology

This reframing is powerful: AI agents are a form of assistive technology.

They assist users by translating intentions into structured service calls, navigating interfaces on the user's behalf, and presenting results in whatever form the user prefers.

This is exactly what traditional assistive technologies do, just at a different level of abstraction.

A screen reader assists by translating visual interfaces to audio. An AI agent assists by translating user intentions to API calls. Both are bridging the gap between how systems present themselves and how users want to interact.

Implications for Web Architecture

If we accept this convergence thesis, several architectural implications follow:

1. MCP as Accessibility Infrastructure

MCP discovery documents could evolve to serve both AI agents and assistive technologies. The same endpoint that tells Claude about your API could tell a screen reader about your application's capabilities.

2. Content-Presentation Separation at the Protocol Level

Instead of HTML as the universal format with CSS for presentation, we might see structured capability and content descriptions as the canonical interface, with HTML as just one rendering target among many.

3. Browser Evolution

Browsers could become one rendering surface among many for structured data, alongside screen readers, voice assistants, and AI agents.

4. Progressive Enhancement Redefined

Traditional progressive enhancement: core HTML, enhanced with CSS and JS.

Future progressive enhancement: core structured data, enhanced with visual rendering when useful, accessible through MCP when an agent is involved.
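
As a hypothetical illustration of that ordering, one record served as structured data first and rendered visually only when a visual client asks for it (the shapes and helpers here are assumptions, not part of any standard):

// Illustrative only: the same record served as structured data for agents
// and assistive clients, or as HTML for visual browsers.
type Paste = { id: string; content: string };

function renderJson(p: Paste): string {
  return JSON.stringify(p);
}

function renderHtml(p: Paste): string {
  // Content escaping omitted for brevity in this sketch.
  return `<article aria-label="Paste ${p.id}"><pre>${p.content}</pre></article>`;
}

// Pick a rendering based on what the client asked for.
function respond(p: Paste, acceptHeader: string): string {
  return acceptHeader.includes("text/html") ? renderHtml(p) : renderJson(p);
}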

My Live Implementations

I've implemented MCP discovery on several services to explore these ideas:

Hastebin Service

haste.nixc.us provides paste functionality with MCP support, advertising the create-paste and retrieve-paste capabilities shown in the discovery document earlier.
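
A sketch of exercising those two capabilities through the hypothetical callTool() helper above; the argument and response shapes are assumptions, not the service's documented interface:

// Server URL taken from the discovery example earlier; field names in the
// arguments and responses are assumed for illustration.
async function demoHastebin() {
  const serverUrl = "https://haste.nixc.us/mcp";
  const created = await callTool(serverUrl, "create-paste", {
    content: "hello from an agent",
  });
  const fetched = await callTool(serverUrl, "retrieve-paste", { id: created.id });
  console.log(fetched.content);
}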

Markdown Renderer

md.colinknapp.com offers markdown rendering through the same discovery pattern.

This Website

colinknapp.com/.well-known/mcp.json demonstrates the discovery pattern itself.

Future Speculations

Browser-Native MCP (Near term)

Browser extensions already exist that read MCP discovery documents. Native browser support seems inevitable.

Accessibility-First Enables Agent-First (Medium term)

Organizations that invested in accessibility will find their services naturally agent-ready: the same structured, semantic interfaces serve both kinds of consumers.

Custom Rendering Surfaces (Long term)

Users might configure personal rendering preferences: visual, aural, agent-mediated, or some combination.

The underlying data would be standard; the presentation would be user-controlled.

The Visual Web as Legacy Interface (Very long term)

For certain use cases, visual web browsing might become what terminal interfaces are today: still used by some, but not the primary interaction mode for most people.

Many interactions, especially transactional ones, could be entirely agent-mediated.

The visual interface remains for cases where human judgment of visual content matters, but becomes optional for transactional interactions.

Observations

  1. Accessibility work is agent work: Investments in accessibility now pay dividends as AI agents become prevalent.
  2. MCP is an accessibility standard: Viewed through this lens, MCP discovery is about making services accessible to all consumers, not just visual browsers.
  3. The future is multi-modal: No single interface will dominate. Services need to support visual, aural, agent-mediated, and other interaction modes.
  4. Structure over style: The more we can represent interactions as structured data, the more flexible our systems become.
  5. User control over presentation: Ultimately, users should control how they receive and interact with information, not service providers.

Conclusion

The Model Context Protocol and its discovery mechanism aren't just about AI agents talking to services. They represent a broader shift toward treating the visual web as one possible interface among many.

This shift validates decades of accessibility advocacy: the insistence that content should be separate from presentation, that interactions should be semantically meaningful, that users should control their experience.

AI agents and assistive technologies are converging on the same requirements because they face the same fundamental challenge: interacting with services designed primarily for visual human consumption.

The solution for both is the same: structured, discoverable, semantic interfaces. MCP discovery is one implementation of that solution. More will follow.

For the formal specification of MCP Discovery via Well-Known URI, see the specification document.