---
title: Permission-aware retrieval, explained
slug: permission-aware-retrieval
category: Product
tags: Permissions, Security, Retrieval
date: 2026-04-07
read_time: 6 min read
word_count: 940
canonical: https://quelvio.com/blog/permission-aware-retrieval
machine_url: https://quelvio.com/ai/blog/permission-aware-retrieval
publisher: Quelvio
---

# Permission-aware retrieval, explained

Product · April 7, 2026 · 6 min read

Every employee sees results only from documents they already have access to. Here are the mechanics behind enforcing that at the vector level.

There is a category of knowledge-tool bug that nobody catches in the demo and everybody catches in the third week of production. An engineer asks the assistant for a summary of last quarter's compensation review. The assistant happily produces it, citing a Google Drive document the engineer was never given access to in the source system. The assistant did not steal anything; it just retrieved the document and synthesized over it because, from the retrieval pipeline's point of view, all documents in the index are equally retrievable. Permissions were a problem the source system enforced. Once the document landed in the index, that enforcement was gone.

Permission-aware retrieval is the architectural commitment that this cannot happen: every query in the system is filtered against the requesting user's actual permissions in the source systems, at retrieval time, before any chunks reach the synthesis layer. We enforce it at the vector level. This post explains what that means and how it works.

## What "at the vector level" means

Most retrieval pipelines treat permissions as a post-processing filter: retrieve the top-k vectors first, check whether the user has access to each one, then drop the disallowed ones. This is fast and almost always wrong. If the user has access to 1% of the corpus, top-k retrieval will overwhelmingly return chunks the user can't see, the filter will discard them, and the result will come back thin or empty. Worse, the filter can leak metadata: a result emptied by the filter can tell the requester that *some* chunk near their query exists, just not one they can see.

Enforcing permissions at the vector level means the access predicate is pushed into the retrieval query itself. The vector database executes a search constrained to the subset of vectors the requesting user has access to. The top-k operates over the *allowed* subset, not over the whole corpus. Every chunk that comes back is one this user is allowed to retrieve; the result is not thinned by discarded hits, and no metadata leaks about chunks the user couldn't see.
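To make the difference concrete, here is a minimal sketch that runs both strategies over a toy in-memory index. Everything in it is an assumption for illustration: the chunk names, the two-dimensional embeddings, the `group:` and `user:` principal strings, and the brute-force scan standing in for a real vector database. It is not Quelvio's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy index: (chunk_id, embedding, principals allowed to read it).
INDEX = [
    ("comp-review-1", [1.0, 0.0], {"group:hr"}),
    ("comp-review-2", [0.9, 0.1], {"group:hr"}),
    ("eng-handbook", [0.6, 0.8], {"group:eng"}),
    ("eng-onboarding", [0.1, 1.0], {"group:eng"}),
]

def post_filter_search(query, principals, k):
    """Take top-k over the whole corpus, then drop chunks the user can't see."""
    ranked = sorted(INDEX, key=lambda c: cosine(query, c[1]), reverse=True)[:k]
    return [cid for cid, _, acl in ranked if acl & principals]

def pre_filter_search(query, principals, k):
    """Restrict to the allowed subset first, then take top-k over that subset."""
    allowed = [c for c in INDEX if c[2] & principals]
    ranked = sorted(allowed, key=lambda c: cosine(query, c[1]), reverse=True)[:k]
    return [cid for cid, _, _ in ranked]

engineer = {"user:dana", "group:eng"}
query = [1.0, 0.0]  # points straight at the compensation-review chunks

post = post_filter_search(query, engineer, k=2)
pre = pre_filter_search(query, engineer, k=2)
print("post-filter:", post)
print("pre-filter: ", pre)
```

With this data, the post-filter path spends both of its top-k slots on HR chunks and returns nothing, while the pre-filter path returns the two best chunks the engineer is actually allowed to read.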

## Where the access predicate comes from

Every chunk in the index carries an ACL inherited from the source document. When Quelvio ingests a Drive document, it reads Drive's permission state — which users and groups have view access, which are explicitly denied, which inherit from the parent folder — and stores that ACL alongside the vector. Same for SharePoint, Confluence, Notion, Slack channels, and every other connector. The ACLs are stored in normalized form so the retrieval query can join against them efficiently.
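One way to picture a normalized ACL is as a pair of flat principal sets, with parent-folder inheritance resolved at ingest time. The sketch below assumes a hypothetical permission payload (`allow`, `deny`, `inherit` keys); real connectors expose richer and differently shaped permission models, and the `ChunkACL` type is illustrative rather than Quelvio's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkACL:
    """Flattened ACL stored next to each chunk's vector (hypothetical shape)."""
    allow: frozenset
    deny: frozenset

def normalize_acl(doc_perms, parent_perms=None):
    """Flatten a document's permissions into allow/deny principal sets.

    doc_perms and parent_perms are assumed to be dicts with optional
    'allow' and 'deny' lists; doc_perms may also set 'inherit': True
    to pull in the parent folder's entries.
    """
    allow = set(doc_perms.get("allow", []))
    deny = set(doc_perms.get("deny", []))
    if doc_perms.get("inherit") and parent_perms:
        allow |= set(parent_perms.get("allow", []))
        deny |= set(parent_perms.get("deny", []))
    return ChunkACL(frozenset(allow), frozenset(deny))

folder = {"allow": ["group:finance"]}
doc = {"allow": ["user:cfo"], "deny": ["user:intern"], "inherit": True}
acl = normalize_acl(doc, folder)
print(sorted(acl.allow), sorted(acl.deny))
```

Storing the sets already flattened means the retrieval query only has to test set membership, not walk a folder hierarchy per chunk.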

When a user authenticates with their Personal Access Token (or via OAuth), the request carries their identity. The retrieval pipeline resolves that identity against the tenant's identity graph — which groups they belong to, which roles they hold — and assembles a *current* permission set for the query. This is not cached aggressively; permission changes in the source systems propagate within minutes, because a stale permission set is exactly the failure mode this whole layer exists to prevent.
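Resolving an identity against the identity graph amounts to collecting every group and role the user belongs to, directly or through nested groups. A minimal sketch, assuming the graph is available as a plain mapping from each principal to its direct parents (the `memberships` structure and the principal names are hypothetical):

```python
def resolve_principals(user_id, memberships):
    """Return the user plus every group or role reachable from them.

    memberships maps a principal to the groups/roles it directly belongs
    to. The visited set also guards against cycles in nested groups.
    """
    principals = {user_id}
    stack = [user_id]
    while stack:
        current = stack.pop()
        for parent in memberships.get(current, ()):
            if parent not in principals:
                principals.add(parent)
                stack.append(parent)
    return frozenset(principals)

memberships = {
    "user:dana": ["group:platform"],
    "group:platform": ["group:eng"],
    "group:eng": ["role:employee"],
}
dana = resolve_principals("user:dana", memberships)
print(sorted(dana))
```

The resulting set is what gets compared against each chunk's ACL, so it has to be rebuilt (or invalidated) whenever group membership changes.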

The vector search is then constrained: return the top-k chunks whose ACL intersects the user's current permission set and does not explicitly deny any of the user's principals. The database does this in a single pass.
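The constrained search can be sketched as a single scan that scores only visible chunks and keeps the best k in a heap. This is an illustration under stated assumptions: the chunk dictionaries, the dot-product scoring, and the rule that an explicit deny overrides an allow are all choices made for the example, and a production vector database would use an approximate index with a filter rather than a linear scan.

```python
import heapq

def dot(a, b):
    """Dot product used as the similarity score in this sketch."""
    return sum(x * y for x, y in zip(a, b))

def constrained_top_k(query, chunks, principals, k):
    """Score only chunks the user may see, keeping the k best in one pass."""
    def visible(chunk):
        allowed = bool(chunk["allow"] & principals)
        denied = bool(chunk["deny"] & principals)
        return allowed and not denied  # assumption: deny wins over allow

    scored = ((dot(query, c["vector"]), c["id"]) for c in chunks if visible(c))
    return [cid for _, cid in heapq.nlargest(k, scored)]

chunks = [
    {"id": "contracts-q4", "vector": [0.9, 0.1], "allow": {"group:finance"}, "deny": set()},
    {"id": "eng-roadmap", "vector": [0.5, 0.5], "allow": {"group:eng"}, "deny": set()},
    {"id": "eng-postmortem", "vector": [0.2, 0.8], "allow": {"group:eng"}, "deny": {"user:dana"}},
]

query = [1.0, 0.0]
engineer_hits = constrained_top_k(query, chunks, {"user:dana", "group:eng"}, k=2)
cfo_hits = constrained_top_k(query, chunks, {"user:cfo", "group:finance"}, k=2)
print(engineer_hits, cfo_hits)
```

Restricted chunks are never scored, so nothing about them (not even a similarity value) exists in the result path for the requesting user.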

## What this rules out

**Cross-employee leakage.** An employee querying the brain cannot, by construction, retrieve a chunk from a document they were not allowed to see in the source system. The permission enforcement is the same enforcement Drive or Confluence would apply — just executed at the moment of retrieval rather than at the moment of source-system access.

**Stale-permission leakage.** When access is revoked in the source system, the change propagates to Quelvio's permission cache within minutes. A document the user could see yesterday but cannot see today will not appear in today's retrieval. The propagation lag is bounded and measurable, and we expose the lag on the dashboard so administrators can confirm the layer is current.

**Inference leaks.** Because the vector search runs only over the user's allowed subset, an empty result is genuinely an empty result. It does not tell the user that a near-match exists in some restricted corner they can't see. Permission enforcement happens before any signal leaves the database.
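The bounded propagation lag described under stale-permission leakage is simple to express: it is the gap between when a permission changed in the source system and when that change was applied to the permission set used for retrieval. The sketch below is illustrative only; the function names and the five-minute bound are assumptions, not documented Quelvio values.

```python
from datetime import datetime, timedelta, timezone

def propagation_lag(source_changed_at, applied_at):
    """Time between a source-system permission change and its application."""
    return applied_at - source_changed_at

def is_within_bound(lag, bound=timedelta(minutes=5)):
    """True if the lag is inside the (assumed) acceptable bound."""
    return lag <= bound

changed = datetime(2026, 4, 7, 12, 0, tzinfo=timezone.utc)
applied = datetime(2026, 4, 7, 12, 3, tzinfo=timezone.utc)

lag = propagation_lag(changed, applied)
print(lag, is_within_bound(lag))
```

A check like this is the kind of signal an administrator would want surfaced when confirming the permission layer is current.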

## What this means in practice

Two queries, two users, the same brain. The CFO asking *"what are the company's largest contracts this quarter?"* gets answers grounded in actual contract documents and the Q4 finance review. The engineer asking the same question gets a refusal — the corpus does have the answer, but not in any chunks this engineer has access to. The refusal is honest: it does not say *"the corpus does not contain this"*, which would be a lie; it says *"the corpus does not contain this for you"*, which is the truth.

Permission-aware retrieval is not a feature the user sees, unless they go looking. It is a load-bearing property of the system that you would notice immediately if it ever failed — the system would start leaking, and trust would not recover. Building it at the vector level rather than as a filter is the difference between a property the system can guarantee and a behavior it tries to remember.

**Auditing your tenant.** Every retrieval is logged with the requesting identity, the ACL set used, and the chunks returned. Administrators can audit the trail at `enterprise.quelvio.com/audit`. If you suspect a stale-permission event, the audit log will show the ACL the retrieval used and when it was last refreshed from the source system.

