
Your company's brain, exposed as a tool any AI agent can call.

The Quelvio MCP server lets Claude Desktop, Cursor, Claude Connectors, and any other Model Context Protocol client query your tenant's connected knowledge with synthesized, cited answers — under your employees' real identities, not a shared service account. Same retrieval and synthesis pipeline as the dashboard, exposed over standardized JSON-RPC.


Install

Three install paths — same server, different clients.

Claude Desktop · Claude Connectors

Open Settings → Integrations → Add server. Paste:

https://mcp.quelvio.com/http

Claude walks you through OAuth sign-in with your Quelvio workspace credentials. The integration auto-binds to your employee identity — the same one used for the dashboard.

Cursor

Paste into ~/.cursor/mcp.json (or add the server through the Cursor settings UI):

JSON · ~/.cursor/mcp.json

{
  "mcpServers": {
    "quelvio": {
      "url": "https://mcp.quelvio.com/http"
    }
  }
}

Cursor's first call triggers the OAuth flow in a browser tab.

Any other MCP client

Server URL: https://mcp.quelvio.com/http
Authentication: OAuth 2.1 with PKCE; Dynamic Client Registration (DCR) supported
Registry entry: com.quelvio/mcp-server on registry.modelcontextprotocol.io

Tools

Three tools. The first is the workhorse; the other two are zero-cost helpers.

query_knowledge · Billable

Search every connected source (Drive, SharePoint, Confluence, Slack, Notion, …) and get back a synthesized, cited answer.

Parameter	Type	Default	Notes
query	string	required	Natural-language, 1–2000 chars
mode	enum	standard	fast · standard · deep (see Modes)
max_sources	int	5	1–20; the 25K output-token cap may force fewer
domain	string	—	Optional taxonomy filter (e.g. engineering.platform)

Returns: a synthesized answer with [n] citations, the ranked source chunks, and a structured metadata block (query_id, coverage level, retrieval mode, synthesis model, result count, sources kept, truncated flag, kT consumed, latency, risk flags).
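As a rough illustration of what a client sends, the sketch below validates arguments against the parameter table above and wraps them in a JSON-RPC 2.0 tools/call request, the standard MCP method for invoking a tool. The helper name build_query_knowledge_request is hypothetical, and in practice an MCP client or SDK builds this envelope for you. Session handling and the OAuth bearer token are not shown.

```python
import json
from itertools import count
from typing import Optional

VALID_MODES = ("fast", "standard", "deep")
_request_ids = count(1)  # JSON-RPC request ids; any unique value works


def build_query_knowledge_request(
    query: str,
    mode: str = "standard",
    max_sources: int = 5,
    domain: Optional[str] = None,
) -> dict:
    """Validate arguments against the documented ranges and wrap them
    in a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    if not 1 <= len(query) <= 2000:
        raise ValueError("query must be 1-2000 characters")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if not 1 <= max_sources <= 20:
        raise ValueError("max_sources must be between 1 and 20")

    arguments = {"query": query, "mode": mode, "max_sources": max_sources}
    if domain is not None:
        arguments["domain"] = domain  # optional taxonomy filter

    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "query_knowledge", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_query_knowledge_request(
        "What's our deployment process for the payments service?",
        domain="engineering.platform",
    )
    print(json.dumps(request, indent=2))
```

Validating on the client side is optional; it simply avoids a round trip for requests the documented ranges would already rule out.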

Example. An employee in Claude Desktop asks: "What's our deployment process for the payments service?"

Response (abridged)

We deploy the payments service via ECS rolling updates [1]. Releases
are cut by tagging the main branch — the backend-deploy workflow
applies migrations to RDS, registers a new task def, and rolls the
service [2]. Pre-deploy checklist lives at [3].

Sources:
  [1] runbooks/payments-service-deploy.md (Confluence) — Senior Platform Eng, 2 days ago
  [2] .github/workflows/backend-deploy.yml (GitHub) — Platform Team, 5 days ago
  [3] checklists/pre-deploy.md (Confluence) — Senior Platform Eng, 2 weeks ago

{
  "query_id": "<query-id>",
  "coverage": "expert",
  "retrieval_mode": "hybrid",
  "tokens_consumed": 12500,
  "latency_ms": 1180
}

list_domains · 0 kT · free

Discover what's indexed.

Returns every taxonomy domain in your tenant with document count, expert count, coverage level (expert · partial · none), and top contributor by authority. Use it before issuing a billable query to confirm the brain has relevant knowledge.

get_source_detail · 0 kT · free

Verify a citation.

Pass a query_id from an earlier query_knowledge response. Returns per-chunk provenance: document path, lifecycle state (live · stale · superseded · user_deleted), embedding timestamp + model, contributor + department, authority score, taxonomy domain. Scoped to your own queries by default — only Owner/Admin can inspect other members' queries.

Modes

Knowledge Tokens are Quelvio's unit of work. Each mode has a fixed kT cost — no surprise charges from variable token usage.

Mode	Knowledge Tokens	Synthesis	When to use
fast	1,500 kT	none — chunks only	Keyword-style retrieval. Names, IDs, 'find me the file that mentions X'.
standard (default)	12,500 kT	yes — built-in	Most questions. Returns the synthesized answer + cited sources.
deep	25,000 kT	yes — premium model + wider retrieval	Complex analytical questions; comparison across multiple sources; 'what's our position on…?'

Note

Synthesis is bundled into standard and deep. fast is the only tier that returns chunks without a synthesized answer.
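Because each mode has a fixed cost, budgeting is plain integer arithmetic. The sketch below uses the costs from the table above; the 100,000 kT pool is a made-up figure for illustration only (actual pool sizes are on /pricing), and the function names are hypothetical.

```python
from typing import Iterable

# Fixed per-query costs from the Modes table, in Knowledge Tokens.
MODE_COST_KT = {"fast": 1_500, "standard": 12_500, "deep": 25_000}


def queries_affordable(pool_kt: int, mode: str) -> int:
    """Number of whole queries in `mode` that fit in `pool_kt`."""
    return pool_kt // MODE_COST_KT[mode]


def remaining_after(pool_kt: int, modes: Iterable[str]) -> int:
    """Pool left after running the given queries in order.

    Raises ValueError as soon as a query would exceed the pool.
    """
    for mode in modes:
        cost = MODE_COST_KT[mode]
        if cost > pool_kt:
            raise ValueError(f"not enough kT left for a {mode} query")
        pool_kt -= cost
    return pool_kt


if __name__ == "__main__":
    pool = 100_000  # hypothetical pool size, not a real plan figure
    for mode in MODE_COST_KT:
        print(mode, queries_affordable(pool, mode))
    # prints: fast 66, standard 8, deep 4 (one per line)
```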

Your daily kT pool depends on your plan and resets at 00:00 UTC. Plans bill per seat, and kT pools come with each seat. See /pricing for plan details and plan-specific pool sizes.

Authentication & identity

Every query is scoped to your individual employee identity, not just your tenant. The MCP server records the per-employee identity on each call, so access rules for provenance inspection are enforced automatically: Owner/Admin can audit other members' queries, and members cannot see each other's.

OAuth 2.1 with PKCE is required (the MCP spec mandates it). The Quelvio MCP server also supports Dynamic Client Registration (RFC 7591), so MCP-aware clients like Claude Desktop register themselves on first connect — no operator pre-provisioning.
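MCP clients normally handle PKCE for you, but the mechanism is small enough to show. The sketch below generates a code verifier and derives the S256 code challenge as defined in RFC 7636; the function names are illustrative, and the authorization and token endpoints themselves are not shown.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random 43-character URL-safe verifier (32 bytes of entropy)."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """BASE64URL(SHA-256(verifier)) with '=' padding removed (RFC 7636, S256)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    # The client sends the challenge with the authorization request and
    # the verifier with the later token request.
    print("code_verifier: ", verifier)
    print("code_challenge:", code_challenge_s256(verifier))
```

The client keeps the verifier private until the token exchange, so an intercepted authorization code cannot be redeemed without it.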

Sign-in uses your workspace's identity provider — the same one used for the dashboard at enterprise.quelvio.com. When configured, this federates to your SAML, Google Workspace, or Microsoft Entra provider. Access tokens are stored encrypted (AES-GCM) in Cloudflare KV on the MCP server — never on the client's filesystem. Token rotation and revocation happen server-side via the /oauth/revoke endpoint; admins can revoke active tokens from the dashboard.

Permission model

Three layers, all enforced server-side:

Tenant isolation. Content is stored in per-tenant collections; cross-tenant content is structurally unreachable, not just policy-gated.
Role-based feature access. Owner / Admin / Member determines which calls succeed. get_source_detail is scoped to your own queries by default — only Owner/Admin can inspect another member's queries.
Source permission filtering. Every chunk carries a permission_emails list, resolved at ingest time from each source's ACLs (Drive shares, SharePoint groups, Confluence space restrictions). MCP queries see only what your individual account would see in the source system. There is no flag, env var, or dashboard toggle that disables this filter.
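The third layer can be pictured as a filter applied to every candidate chunk before ranking or synthesis. The sketch below is only a conceptual model: permission_emails comes from the description above, while the Chunk shape, the visible_chunks name, and the case-insensitive comparison are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Chunk:
    # Hypothetical shape; only `permission_emails` is named in the docs.
    doc_path: str
    text: str
    permission_emails: Set[str] = field(default_factory=set)


def visible_chunks(chunks: Iterable[Chunk], user_email: str) -> List[Chunk]:
    """Keep only chunks whose ACL snapshot includes the calling user.

    Assumption: emails are compared case-insensitively.
    """
    email = user_email.strip().lower()
    return [
        chunk
        for chunk in chunks
        if email in {allowed.lower() for allowed in chunk.permission_emails}
    ]


if __name__ == "__main__":
    corpus = [
        Chunk("runbooks/deploy.md", "...", {"ana@example.com", "raj@example.com"}),
        Chunk("finance/forecast.xlsx", "...", {"cfo@example.com"}),
    ]
    for chunk in visible_chunks(corpus, "Ana@example.com"):
        print(chunk.doc_path)  # prints: runbooks/deploy.md
```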

Response caps & truncation

Per Claude Connectors policy, every tool response is capped at 25,000 output tokens. The MCP server enforces this deterministically:

Clamp excerpts to 200 chars
Drop lowest-rank sources (preserving ≥3)
Truncate the synthesis body

Truncation is signaled via truncated: true in the structured metadata block, alongside the query_id, so the agent can call get_source_detail for full provenance on any source it wants to inspect more closely.
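The three steps above can be sketched as follows. This is not the server's code: it assumes the steps run in the listed order and only when the response is over the cap, that the truncated flag is set whenever any step fires, and that tokens can be approximated as four characters each (the real tokenizer is not documented). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_OUTPUT_TOKENS = 25_000
EXCERPT_CHARS = 200
MIN_SOURCES = 3


@dataclass
class Source:
    rank: int      # 1 = highest-ranked
    excerpt: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def sources_tokens(sources: List[Source]) -> int:
    return sum(estimate_tokens(s.excerpt) for s in sources)


def fit_to_cap(
    answer: str, sources: List[Source], cap: int = MAX_OUTPUT_TOKENS
) -> Tuple[str, List[Source], bool]:
    """Return (answer, sources, truncated) fitted to `cap` estimated tokens."""
    if estimate_tokens(answer) + sources_tokens(sources) <= cap:
        return answer, list(sources), False

    # Step 1: clamp every excerpt to 200 characters.
    kept = sorted(
        (Source(s.rank, s.excerpt[:EXCERPT_CHARS]) for s in sources),
        key=lambda s: s.rank,
    )

    # Step 2: drop the lowest-ranked sources, but never go below three.
    while estimate_tokens(answer) + sources_tokens(kept) > cap and len(kept) > MIN_SOURCES:
        kept.pop()

    # Step 3: truncate the synthesis body to whatever budget is left.
    if estimate_tokens(answer) + sources_tokens(kept) > cap:
        budget = max(cap - sources_tokens(kept), 0)
        answer = answer[: budget * 4]

    return answer, kept, True
```

Ordering the steps from least to most destructive means the synthesized answer is only cut when trimming excerpts and sources was not enough.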

Streaming

When a client sends Accept: text/event-stream and mode is standard or deep, the MCP server proxies Quelvio's SSE pipeline: the synthesized answer streams token-by-token from the model to the MCP server. The MCP response itself is still a single JSON body, which is what most MCP clients expect today; native per-token streaming to the client will land when the spec adds output deltas.

fast never streams — it has no synthesis body to incrementally produce.
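The streaming rule reduces to a two-part condition, sketched below. The function name is hypothetical and the Accept-header parsing is deliberately simplified (it ignores q-values and wildcards); it only illustrates the rule described above, not the server's implementation.

```python
SYNTHESIS_MODES = {"standard", "deep"}  # fast has no synthesis body to stream


def uses_sse_pipeline(accept_header: str, mode: str) -> bool:
    """True when the request opts into SSE and the mode produces a synthesis."""
    media_types = {
        part.split(";")[0].strip().lower()  # drop parameters such as q=0.9
        for part in accept_header.split(",")
    }
    return "text/event-stream" in media_types and mode in SYNTHESIS_MODES


if __name__ == "__main__":
    print(uses_sse_pipeline("application/json, text/event-stream", "standard"))  # True
    print(uses_sse_pipeline("application/json, text/event-stream", "fast"))      # False
    print(uses_sse_pipeline("application/json", "deep"))                         # False
```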

Cost transparency

Every query_knowledge response includes tokens_consumed in the structured metadata block. Your tenant's daily pool state, current period's billable usage, and projected next invoice live on the billing dashboard at enterprise.quelvio.com/billing.