Your company's brain, exposed as a tool any AI agent can call.
The Quelvio MCP server lets Claude Desktop, Cursor, Claude Connectors, and any other Model Context Protocol client query your tenant's connected knowledge with synthesized, cited answers — under your employees' real identities, not a shared service account. Same retrieval and synthesis pipeline as the dashboard, exposed over standardized JSON-RPC.
Install
Three install paths — same server, different clients.
Claude Desktop · Claude Connectors
Open Settings → Integrations → Add server. Paste the server URL:
https://mcp.quelvio.com/http
Claude walks you through OAuth sign-in with your Quelvio workspace credentials. The integration auto-binds to your employee identity — the same one used for the dashboard.
Cursor
Paste into ~/.cursor/mcp.json (or via Cursor settings UI):
```json
{
  "mcpServers": {
    "quelvio": {
      "url": "https://mcp.quelvio.com/http"
    }
  }
}
```
Cursor's first call triggers the OAuth flow in a browser tab.
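If you manage several MCP servers in one config file, hand-editing can clobber existing entries. Here is a minimal sketch of a script that merges the Quelvio entry into an existing `mcp.json` without touching other servers. The helper name `add_quelvio_server` is hypothetical; it only assumes the `mcpServers` layout shown above.

```python
import json
from pathlib import Path

# The entry documented above for Cursor.
QUELVIO_ENTRY = {"url": "https://mcp.quelvio.com/http"}


def add_quelvio_server(config_path: Path) -> dict:
    """Add (or overwrite) the "quelvio" entry in an mcp.json file,
    preserving any other servers already configured there."""
    if config_path.exists():
        # An empty file is treated as an empty config.
        config = json.loads(config_path.read_text() or "{}")
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["quelvio"] = dict(QUELVIO_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it as `add_quelvio_server(Path.home() / ".cursor" / "mcp.json")`, then restart Cursor so it picks up the change.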
Any other MCP client
- Server URL: https://mcp.quelvio.com/http
- Authentication: OAuth 2.1 with PKCE; Dynamic Client Registration (DCR) supported
- Registry entry: com.quelvio/mcp-server on registry.modelcontextprotocol.io
Tools
Three tools. The first is the workhorse; the other two are zero-cost helpers.
query_knowledge
Search every connected source (Drive, SharePoint, Confluence, Slack, Notion, …) with a synthesized, cited answer.
| Parameter | Type | Default | Notes |
|---|---|---|---|
| query | string | required | Natural-language, 1–2000 chars |
| mode | enum | standard | fast · standard · deep (see Modes) |
| max_sources | int | 5 | 1–20; the 25K output-token cap may force fewer |
| domain | string | — | Optional taxonomy filter (e.g. engineering.platform) |
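MCP clients such as Claude Desktop and Cursor build this request for you. For a custom client, here is a minimal sketch of the JSON-RPC `tools/call` message that invokes query_knowledge. It assumes the argument names match the parameter table and that the MCP `initialize` handshake has already completed; the client-side range checks only mirror the table, and the server remains the authority. The function name is hypothetical.

```python
import json
from typing import Optional

MODES = {"fast", "standard", "deep"}


def build_query_knowledge_call(request_id: int, query: str, mode: str = "standard",
                               max_sources: int = 5, domain: Optional[str] = None) -> str:
    """Serialize an MCP tools/call request for query_knowledge,
    rejecting arguments outside the documented ranges before sending."""
    if not 1 <= len(query) <= 2000:
        raise ValueError("query must be 1-2000 characters")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if not 1 <= max_sources <= 20:
        raise ValueError("max_sources must be between 1 and 20")
    arguments = {"query": query, "mode": mode, "max_sources": max_sources}
    if domain is not None:
        arguments["domain"] = domain  # optional taxonomy filter, e.g. engineering.platform
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query_knowledge", "arguments": arguments},
    })
```

Omitting `domain` rather than sending `null` keeps the request minimal and lets the server apply its own default.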
Returns: a synthesized answer with [n] citations, the ranked source chunks, and a structured metadata block (query_id, coverage level, retrieval mode, synthesis model, result count, sources kept, truncated flag, kT consumed, latency, risk flags).
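An agent consuming the answer may want to confirm that every `[n]` marker points at a source it actually received. A minimal sketch, assuming citations are 1-based indices into the returned source list (as in the example that follows); the helper name is hypothetical.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_sources: int) -> tuple[list[int], list[int]]:
    """Return (cited, dangling): the distinct [n] markers in order of first
    appearance, and any that fall outside 1..num_sources."""
    cited: list[int] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if n not in cited:
            cited.append(n)
    dangling = [n for n in cited if not 1 <= n <= num_sources]
    return cited, dangling
```

A non-empty `dangling` list is a cue to call get_source_detail with the response's query_id before trusting that claim.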
Example. An employee in Claude Desktop asks: "What's our deployment process for the payments service?"
```text
We deploy the payments service via ECS rolling updates [1]. Releases
are cut by tagging the main branch — the backend-deploy workflow
applies migrations to RDS, registers a new task def, and rolls the
service [2]. Pre-deploy checklist lives at [3].

Sources:
[1] runbooks/payments-service-deploy.md (Confluence) — Senior Platform Eng, 2 days ago
[2] .github/workflows/backend-deploy.yml (GitHub) — Platform Team, 5 days ago
[3] checklists/pre-deploy.md (Confluence) — Senior Platform Eng, 2 weeks ago
```
Metadata block (abbreviated):
```json
{
  "query_id": "<query-id>",
  "coverage": "expert",
  "retrieval_mode": "hybrid",
  "tokens_consumed": 12500,
  "latency_ms": 1180
}
```
Discover what's indexed.
Returns every taxonomy domain in your tenant with document count, expert count, coverage level (expert · partial · none), and top contributor by authority. Use it before issuing a billable query to confirm the brain has relevant knowledge.
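A minimal sketch of the pre-check this enables: given the domain listing, keep only domains under a prefix that have some coverage, best-covered first. The `Domain` shape and its field names are hypothetical stand-ins for whatever the tool actually returns.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # Hypothetical shape for one entry of the domain listing; real field names may differ.
    name: str
    document_count: int
    expert_count: int
    coverage: str  # "expert" | "partial" | "none"


def worth_querying(domains: list[Domain], prefix: str) -> list[Domain]:
    """Domains equal to or nested under `prefix` with any coverage,
    sorted expert-first, then by document count (descending)."""
    rank = {"expert": 0, "partial": 1}
    hits = [d for d in domains
            if (d.name == prefix or d.name.startswith(prefix + ".")) and d.coverage in rank]
    return sorted(hits, key=lambda d: (rank[d.coverage], -d.document_count))
```

An empty result suggests skipping the billable query_knowledge call, or passing one of the returned names as its `domain` filter.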
get_source_detail
Verify a citation.
Pass a query_id from an earlier query_knowledge response. Returns per-chunk provenance: document path, lifecycle state (live · stale · superseded · user_deleted), embedding timestamp + model, contributor + department, authority score, taxonomy domain. Scoped to your own queries by default — only Owner/Admin can inspect other members' queries.
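One practical use of the provenance data is spotting citations that rest on documents no longer live. A minimal sketch, assuming each chunk in the get_source_detail response is a dict with hypothetical `document_path` and `lifecycle_state` keys:

```python
from typing import Iterable

# The lifecycle states listed for get_source_detail.
LIFECYCLE_STATES = {"live", "stale", "superseded", "user_deleted"}


def non_live_sources(chunks: Iterable[dict]) -> dict[str, str]:
    """Map document path -> lifecycle state for every cited chunk that is not live."""
    flagged: dict[str, str] = {}
    for chunk in chunks:
        state = chunk["lifecycle_state"]
        if state not in LIFECYCLE_STATES:
            raise ValueError(f"unexpected lifecycle state: {state!r}")
        if state != "live":
            flagged[chunk["document_path"]] = state
    return flagged
```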
Modes
Knowledge Tokens are Quelvio's unit of work. Each mode has a fixed kT cost — no surprise charges from variable token usage.
| Mode | Knowledge Tokens | Synthesis | When to use |
|---|---|---|---|
| fast | 1,500 kT | none — chunks only | Keyword-style retrieval. Names, IDs, 'find me the file that mentions X'. |
| standard (default) | 12,500 kT | yes — built-in | Most questions. Returns the synthesized answer + cited sources. |
| deep | 25,000 kT | yes — premium model + wider retrieval | Complex analytical questions; comparison across multiple sources; 'what's our position on…?' |
Your daily kT pool depends on your plan. Plans bill per seat, and kT pools come with each seat. The pool resets at 00:00 UTC. See /pricing for plan-specific pool sizes and plan details.
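Because each mode has a fixed cost, budgeting is simple integer arithmetic. A minimal sketch using the costs from the table above; the remaining-pool figure is something you supply (for example from the billing dashboard), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed per-query costs from the Modes table.
MODE_COST_KT = {"fast": 1_500, "standard": 12_500, "deep": 25_000}


def queries_left(remaining_kt: int, mode: str) -> int:
    """How many more queries of `mode` fit in the remaining daily pool."""
    return remaining_kt // MODE_COST_KT[mode]


def next_reset(now: datetime) -> datetime:
    """The next 00:00 UTC pool reset after `now` (which must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)
```

For instance, with a hypothetical 100,000 kT left, that is 8 standard queries, 4 deep ones, or 66 fast ones.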
Authentication & identity
Every query is scoped to your individual employee identity, not just your tenant. The MCP server records that identity on each call, so access to query provenance is enforced automatically: Owner/Admin can audit any member's queries, while members can't inspect each other's.
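The access rule reduces to a small predicate. This sketch is an illustration of the described policy, not Quelvio's server code; the function and parameter names are hypothetical, and the tenant check reflects the tenant isolation described under Permission model.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADMIN = "admin"
    MEMBER = "member"


def can_inspect_query(caller_id: str, caller_tenant: str, caller_role: Role,
                      query_owner_id: str, query_tenant: str) -> bool:
    """Owner/Admin may inspect any query in their tenant; members only their own.
    Queries from another tenant are never visible."""
    if caller_tenant != query_tenant:
        return False
    if caller_role in (Role.OWNER, Role.ADMIN):
        return True
    return caller_id == query_owner_id
```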
OAuth 2.1 with PKCE is required (the MCP spec mandates it). The Quelvio MCP server also supports Dynamic Client Registration (RFC 7591), so MCP-aware clients like Claude Desktop register themselves on first connect — no operator pre-provisioning.
Sign-in uses your workspace's identity provider — the same one used for the dashboard at enterprise.quelvio.com. When configured, this federates to your SAML, Google Workspace, or Microsoft Entra provider. Access tokens are stored encrypted (AES-GCM) in Cloudflare KV on the MCP server — never on the client's filesystem. Token rotation and revocation happen server-side via the /oauth/revoke endpoint; admins can revoke active tokens from the dashboard.
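Admins normally revoke tokens from the dashboard. For scripted revocation, a standard OAuth token revocation request (RFC 7009) is a form-encoded POST. This sketch only builds the request; it assumes the endpoint lives on the MCP host (only the /oauth/revoke path is documented) and that the caller is a public client identified by its `client_id`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the documented /oauth/revoke path is served from the MCP host.
REVOKE_URL = "https://mcp.quelvio.com/oauth/revoke"


def build_revocation_request(token: str, client_id: str,
                             hint: str = "access_token") -> Request:
    """An RFC 7009 revocation request for a public (PKCE) client."""
    body = urlencode({"token": token, "token_type_hint": hint,
                      "client_id": client_id}).encode("ascii")
    return Request(REVOKE_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Send it with `urllib.request.urlopen(req)`; RFC 7009 has the server answer 200 even for tokens that were already invalid.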
Permission model
Three layers, all enforced server-side:
- Tenant isolation. Content is stored in per-tenant collections; cross-tenant content is structurally unreachable, not just policy-gated.
- Role-based feature access. Owner / Admin / Member determines which calls succeed. get_source_detail is scoped to your own queries by default — only Owner/Admin can inspect another member's queries.
- Source permission filtering. Every chunk carries permission_emails, resolved at ingest time from each source's ACLs (Drive shares, SharePoint groups, Confluence space restrictions). MCP queries see only what your individual account would see in the source system. There is no flag, env var, or dashboard toggle that disables this filter.
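Conceptually, the last layer is a membership test applied to every retrieved chunk. A minimal illustration, not the server's implementation: the `Chunk` type and `visible_to` helper are hypothetical, emails are compared case-insensitively (an assumption), and a chunk with no resolved emails is visible to no one, i.e. the filter fails closed.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_path: str
    # Resolved at ingest time from the source system's ACLs.
    permission_emails: frozenset[str] = field(default_factory=frozenset)


def visible_to(chunks: list[Chunk], user_email: str) -> list[Chunk]:
    """Keep only chunks whose ACL includes the caller. Deliberately no bypass argument."""
    email = user_email.strip().lower()
    return [c for c in chunks if email in {e.lower() for e in c.permission_emails}]
```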
Response caps & truncation
Per Claude Connectors policy, every tool response is capped at 25,000 output tokens. The MCP server enforces this deterministically:
- Clamp excerpts to 200 chars
- Drop lowest-rank sources (preserving ≥3)
- Truncate the synthesis body
Truncation is signaled via truncated: true in the structured metadata block, alongside the query_id so the agent can call get_source_detail for full provenance on any source it wants to inspect deeper.
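The three steps can be sketched as a deterministic loop applied in the order listed. This is an illustration under stated assumptions, not the server's code: token counts are approximated at about 4 characters per token (a real implementation would use a tokenizer), sources are assumed to arrive best-ranked first, and all names are hypothetical.

```python
from dataclasses import dataclass

CAP_TOKENS = 25_000
EXCERPT_CHARS = 200
MIN_SOURCES = 3


def estimate_tokens(text: str) -> int:
    # Rough stand-in: ~4 characters per token, rounded up.
    return (len(text) + 3) // 4


@dataclass
class Response:
    synthesis: str
    excerpts: list[str]  # best-ranked source first
    truncated: bool = False

    def tokens(self) -> int:
        return estimate_tokens(self.synthesis) + sum(estimate_tokens(e) for e in self.excerpts)


def enforce_cap(resp: Response, cap: int = CAP_TOKENS) -> Response:
    """Shrink an over-cap response in place and flag it as truncated."""
    if resp.tokens() <= cap:
        return resp
    resp.truncated = True
    # 1. Clamp every excerpt.
    resp.excerpts = [e[:EXCERPT_CHARS] for e in resp.excerpts]
    # 2. Drop lowest-ranked sources, never going below MIN_SOURCES.
    while resp.tokens() > cap and len(resp.excerpts) > MIN_SOURCES:
        resp.excerpts.pop()
    # 3. Cut the synthesis body to whatever budget remains.
    if resp.tokens() > cap:
        budget = cap - sum(estimate_tokens(e) for e in resp.excerpts)
        resp.synthesis = resp.synthesis[: max(budget, 0) * 4]
    return resp
```

Because each step only runs while the response is still over the cap, the same input always yields the same output, which is what makes the truncation predictable for agents.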
Streaming
When a client sends Accept: text/event-stream and mode is standard or deep, the MCP server proxies Quelvio's SSE pipeline — the synthesized answer streams token-by-token from the model to the MCP server. The MCP response itself is still a single JSON body (most MCP clients today expect that); native per-token streaming to the client lands when the spec adds output deltas.
fast never streams — it has no synthesis body to incrementally produce.
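Buffering a server-sent event stream into a single body, as the MCP server does before replying, can be sketched as follows. The sketch follows the SSE line format (`data:` fields, `:` comments, blank line ends an event, an unterminated final event is discarded) and assumes, hypothetically, that each event's data is a raw text delta; Quelvio's actual event payloads are not documented here.

```python
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Concatenate the data payloads of a text/event-stream into one string."""
    parts: list[str] = []
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the current event.
            if data_lines:
                parts.append("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment, e.g. a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "data":
            data_lines.append(value)
    # Any event not terminated by a blank line is incomplete and dropped.
    return "".join(parts)
```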
Cost transparency
Every query_knowledge response includes tokens_consumed in the structured metadata block. Your tenant's daily pool state, current period's billable usage, and projected next invoice live on the billing dashboard at enterprise.quelvio.com/billing.
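For a running estimate inside an agent session, you can tally `tokens_consumed` from each metadata block. A minimal sketch: the `KtLedger` class is hypothetical, the pool size is a value you supply from your plan, and the ledger only sees responses this client received, so the billing dashboard remains the authoritative figure.

```python
import json
from dataclasses import dataclass, field


@dataclass
class KtLedger:
    daily_pool_kt: int  # your plan's pool size (see /pricing); supplied by you
    entries: list[tuple[str, int]] = field(default_factory=list)

    def record(self, metadata_json: str) -> None:
        """Log the query_id and kT cost from one structured metadata block."""
        meta = json.loads(metadata_json)
        self.entries.append((meta["query_id"], int(meta["tokens_consumed"])))

    @property
    def consumed(self) -> int:
        return sum(kt for _, kt in self.entries)

    @property
    def remaining(self) -> int:
        return max(self.daily_pool_kt - self.consumed, 0)
```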