resolve comments and suggestions
Taofiqq committed Feb 18, 2025
1 parent b4f343a commit b661dc6
Showing 8 changed files with 210 additions and 125 deletions.
189 changes: 88 additions & 101 deletions docs/retrievers.ipynb
@@ -2,135 +2,122 @@
"cells": [
{
"cell_type": "markdown",
"id": "e1455a0c",
"id": "e94b2389",
"metadata": {},
"source": [
"#Langchain Permit Retrievers\n",
"This module provides two specialized retrievers that integrate Permit.io's authorization capabilities with Langchain's retrieval systems:\n",
"# Permit + LangChain Retrievers\n",
"\n",
"* ReBACSelfQueryRetriever: Implements Relationship-Based Access Control (ReBAC) using natural language queries\n",
"This notebook illustrates how to integrate [Permit.io](https://permit.io/) permissions into LangChain retrievers.\n",
"\n",
"* RBACEnsembleRetriever: Combines semantic search with Role-Based and Attribute-Based Access Control (RBAC/ABAC)\n",
"We provide two custom retrievers:\n",
"\n",
"## Prerequisites\n",
"- **`PermitSelfQueryRetriever`** – Uses a self-query approach to parse the user’s natural-language prompt, fetch the user’s permitted resource IDs from Permit, and apply that filter automatically in a vector store search. \n",
" \n",
"- **`PermitEnsembleRetriever`** – Combines multiple underlying retrievers (e.g., BM25 + Vector) via LangChain’s `EnsembleRetriever`, then filters the merged results with Permit.io.\n",
"\n",
"```python\n",
"from permit import Permit\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_chroma import Chroma\n",
"# ... other imports as needed\n",
"## Installation\n",
"\n",
"# Initialize Permit client\n",
"permit_client = Permit(\n",
" token=\"your_permit_api_key\",\n",
" pdp=\"your_pdp_url\"\n",
")\n",
"```bash\n",
"pip install langchain-permit\n",
"```\n",
"\n",
"## ReBACSelfQueryRetriever\n",
"This retriever extends Langchain's SelfQueryRetriever to include relationship-based access control through Permit.io integration.\n",
"## Environment Variables\n",
"\n",
"### Basic Usage\n",
"```bash\n",
"PERMIT_API_KEY=your_api_key\n",
"PERMIT_PDP_URL=http://localhost:7766 # or your real deployment\n",
"OPENAI_API_KEY=sk-...\n",
"```\n",
"\n",
"```python\n",
"# Initialize vector store with documents\n",
"docs = [\n",
" Document(\n",
" page_content=\"Confidential project proposal\",\n",
" metadata={\n",
" \"owner\": \"user-123\",\n",
" \"relationships\": [\"team-a\", \"managers\"],\n",
" \"resource_type\": \"document\"\n",
" }\n",
" )\n",
"]\n",
"\n",
"# Create retriever\n",
"rebac_retriever = ReBACSelfQueryRetriever(\n",
" llm=ChatOpenAI(),\n",
" vectorstore=Chroma.from_documents(docs, OpenAIEmbeddings()),\n",
" permit_client=permit_client\n",
")\n",
"## Prerequisites\n",
"\n",
"- A running Permit PDP. See [Permit docs](https://docs.permit.io/) for details on setting up your policy and container.\n",
"- A vector store or multiple retrievers that we can wrap.\n",
"\n",
"# Query with relationship context\n",
"docs = await rebac_retriever.aget_relevant_documents(\n",
" \"Find project proposals\",\n",
" user_context={\"user_id\": \"user-123\", \"relationships\": [\"team-a\"]}\n",
")\n",
"```\n",
"\n",
"### Metadata Schema\n",
"The retriever uses the following metadata fields:\n",
"# PermitSelfQueryRetriever\n",
"\n",
"* owner: Document owner identifier\n",
"* relationships: List of relationship identifiers that have access\n",
"* resource_type: Type of the resource for permission checking\n",
"### Basic Explanation\n",
"\n",
"1. Retrieves permitted document IDs from Permit. \n",
"\n",
"## RBACEnsembleRetriever\n",
"This retriever combines multiple search strategies with role-based and attribute-based access control.\n",
"2. Uses an LLM to parse your query and build a “structured filter,” ensuring only docs with those permitted IDs are considered.\n",
"\n",
"### Basic Usage\n",
"## Basic Usage\n",
"\n",
"```python\n",
"# Initialize with documents\n",
"docs = [\n",
" Document(\n",
" page_content=\"HR Policy Document\",\n",
" metadata={\n",
" \"department\": \"HR\",\n",
" \"classification\": \"internal\",\n",
" \"required_role\": \"hr_staff\"\n",
" }\n",
" )\n",
"]\n",
"\n",
"# Create retrievers\n",
"semantic_retriever = Chroma.from_documents(\n",
" docs, \n",
" OpenAIEmbeddings()\n",
").as_retriever()\n",
"\n",
"permission_retriever = BM25Retriever.from_documents(docs)\n",
"\n",
"# Create ensemble\n",
"rbac_retriever = RBACEnsembleRetriever(\n",
" retrievers=[semantic_retriever, permission_retriever],\n",
" weights=[0.6, 0.4],\n",
" permit_client=permit_client\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_permit.retrievers import PermitSelfQueryRetriever\n",
"\n",
"# Step 1: Create / load some documents and build a vector store\n",
"docs = [...]\n",
"embeddings = OpenAIEmbeddings()\n",
"vectorstore = FAISS.from_documents(docs, embeddings)\n",
"\n",
"# Step 2: Initialize the retriever\n",
"retriever = PermitSelfQueryRetriever(\n",
" api_key=\"...\",\n",
" pdp_url=\"...\",\n",
" user={\"key\": \"user-123\"},\n",
" resource_type=\"document\",\n",
" action=\"read\",\n",
" llm=..., # Typically a ChatOpenAI or other LLM\n",
" vectorstore=vectorstore,\n",
" enable_limit=True, # optional\n",
")\n",
"\n",
"# Query with role context\n",
"docs = await rbac_retriever.aget_relevant_documents(\n",
" \"HR policies\",\n",
" user_context={\n",
" \"roles\": [\"hr_staff\"],\n",
" \"attributes\": {\"department\": \"HR\"}\n",
" }\n",
")\n",
"# Step 3: Query\n",
"query = \"Give me docs about cats\"\n",
"results = retriever.get_relevant_documents(query)\n",
"for doc in results:\n",
" print(doc.metadata.get(\"id\"), doc.page_content)\n",
"```\n",
"## Access Control\n",
"The retriever supports:\n",
"\n",
"* Role-Based Access: Using the roles field in user context\n",
"* Attribute-Based Access: Using the attributes field in user context\n",
"* Weighted Results: Combining semantic relevance with permission-based filtering\n",
"# PermitEnsembleRetriever\n",
"### Basic Explanation\n",
"\n",
"1. Uses LangChain’s EnsembleRetriever to gather documents from multiple sub-retrievers (e.g., vector-based, BM25, etc.).\n",
"2. After retrieving documents, it calls filter_objects on Permit to eliminate any docs the user isn’t allowed to see.\n",
"\n",
"## Advanced Features\n",
"### Custom Permission Logic\n",
"Both retrievers support custom permission logic through Permit.io configurations:\n",
"## Basic Usage\n",
"\n",
"```python\n",
"# Example of custom permission check\n",
"allowed = await permit_client.check(\n",
" user=user_context,\n",
"from langchain_community.retrievers import BM25Retriever\n",
"from langchain_core.documents import Document\n",
"from langchain_permit.retrievers import PermitEnsembleRetriever\n",
"\n",
"# Suppose we have two child retrievers: bm25_retriever, vector_retriever\n",
"...\n",
"ensemble_retriever = PermitEnsembleRetriever(\n",
" api_key=\"...\",\n",
" pdp_url=\"...\",\n",
" user=\"user_abc\",\n",
" action=\"read\",\n",
" resource={\n",
" \"type\": doc.metadata.get(\"resource_type\"),\n",
" \"attributes\": doc.metadata\n",
" }\n",
" resource_type=\"document\",\n",
" retrievers=[bm25_retriever, vector_retriever],\n",
" weights=None\n",
")\n",
"```"
"\n",
"docs = ensemble_retriever.get_relevant_documents(\"Query about cats\")\n",
"for doc in docs:\n",
" print(doc.metadata.get(\"id\"), doc.page_content)\n",
"```\n",
"\n",
"# Demo Scripts\n",
"\n",
"For more complete demos, check out the `examples/` folder:\n",
"\n",
"1. `demo_self_query.py` – Demonstrates `PermitSelfQueryRetriever`.\n",
"2. `demo_ensemble.py` – Demonstrates `PermitEnsembleRetriever`.\n",
"\n",
"Each script shows how to build or load documents, configure Permit, and run queries.\n",
"\n",
"# Conclusion\n",
"\n",
"With these custom retrievers, you can seamlessly integrate Permit.io’s permission checks into LangChain’s retrieval workflow. You can keep your application’s vector search logic while ensuring only authorized documents are returned.\n",
"\n",
"For more details on setting up Permit policies, see the official [Permit docs](https://docs.permit.io/). To combine these retrievers with other tools (such as JWT validation or a broader RAG pipeline), see `docs/tools.ipynb`.\n"
]
}
],
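The `PermitEnsembleRetriever` notes in `docs/retrievers.ipynb` describe a two-step flow: gather documents from several sub-retrievers, then drop the ones the user is not allowed to see. The sketch below illustrates that flow in plain Python. It is not the library's implementation: `fuse_rankings`, `filter_permitted`, the dict-based documents, and the `permitted_ids` set are hypothetical stand-ins, and weighted reciprocal rank fusion is assumed here as the merge strategy.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Doc = Dict[str, str]  # hypothetical stand-in for a LangChain Document


def fuse_rankings(
    rankings: Sequence[List[Doc]], weights: Sequence[float], c: int = 60
) -> List[Doc]:
    """Merge several ranked lists with weighted reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    by_id: Dict[str, Doc] = {}
    for ranked, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranked, start=1):
            # A document earns more score the higher it ranks in each list.
            scores[doc["id"]] += weight / (rank + c)
            by_id.setdefault(doc["id"], doc)
    ordered = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return [by_id[doc_id] for doc_id in ordered]


def filter_permitted(docs: List[Doc], is_allowed: Callable[[str], bool]) -> List[Doc]:
    """Keep only the documents the permission callback approves."""
    return [doc for doc in docs if is_allowed(doc["id"])]


# Hypothetical results from two sub-retrievers (e.g. BM25 and a vector store).
bm25_hits = [{"id": "doc1", "text": "cats"}, {"id": "doc2", "text": "dogs"}]
vector_hits = [{"id": "doc2", "text": "dogs"}, {"id": "doc3", "text": "birds"}]

# Assumed outcome of a permission lookup for the current user.
permitted_ids = {"doc1", "doc3"}

merged = fuse_rankings([bm25_hits, vector_hits], weights=[0.5, 0.5])
visible = filter_permitted(merged, lambda doc_id: doc_id in permitted_ids)
print([doc["id"] for doc in visible])
```

Filtering after the merge keeps the sub-retrievers unaware of permissions, at the cost of possibly returning fewer than `k` documents when many hits are filtered out.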
20 changes: 18 additions & 2 deletions docs/tools.ipynb
@@ -26,9 +26,15 @@
"```bash\n",
"PERMIT_API_KEY=your_permit_api_key\n",
"JWKS_URL=your_jwks_endpoint_url\n",
"PERMIT_PDP_URL=your_permit_pdp_url # Usually http://localhost:7766 for local development\n",
"PERMIT_PDP_URL=your_permit_pdp_url # http://localhost:7766 for local development, or the URL of your deployed PDP\n",
"```\n",
"\n",
"## Prerequisites\n",
"\n",
"Make sure your PDP (Policy Decision Point) is running at PERMIT_PDP_URL.\n",
"See [Permit docs](https://docs.permit.io/concepts/pdp/overview/) for details on policy setup and how to launch the PDP container.\n",
"\n",
"\n",
"## JWT Validation Tool\n",
"The JWT Validation tool verifies JWT tokens against a JWKS (JSON Web Key Set) endpoint.\n",
"\n",
@@ -196,7 +202,17 @@
" return None\n",
"```\n",
"\n",
"This documentation demonstrates the key features and usage patterns of both tools."
"This documentation demonstrates the key features and usage patterns of both tools.\n",
"\n",
"## Additional Demo Scripts\n",
"\n",
"For fully runnable demos, check out the `examples/` folder in this repository. You’ll find:\n",
"\n",
"* `demo_jwt_validation.py` – A quick script showing how to validate JWTs using `LangchainJWTValidationTool`.\n",
"\n",
"* `demo_permissions_check.py` – A script that performs Permit.io permission checks using `LangchainPermissionsCheckTool`.\n",
"\n",
"Just run `python demo_jwt_validation.py` or `python demo_permissions_check.py` (after setting your environment variables) to see these tools in action."
]
}
],
25 changes: 10 additions & 15 deletions langchain_permit/examples/demo_scripts/demo_ensemble.py
@@ -2,14 +2,15 @@
import asyncio
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
from langchain_openai import OpenAIEmbeddings
from langchain_permit.retrievers import PermitEnsembleRetriever

# Feel free to tailor the policy model (RBAC, ABAC, ReBAC) in Permit for your real environment

# Permissions query configuration for the retriever
# The user ID we want to filter the results for (should be synced to Permit's PDP)
USER = "user_abc"
# The name of the resource in the policy we configured in Permit
RESOURCE_TYPE = "my_resource"
# The particular action we want to filter for (usually read, view, etc.)
ACTION = "view"

async def main():
@@ -26,30 +27,24 @@ async def main():
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embedding=embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})


# 3. Build a BM25 retriever from the same documents
bm25_retriever = BM25Retriever.from_texts(
[d.page_content for d in docs],
metadatas=[d.metadata for d in docs],
k=2,
)

# 4. Initialize the PermitEnsembleRetriever with both retrievers
# 3. Initialize the PermitEnsembleRetriever with the relevant user/resource/action information for filtering
ensemble_retriever = PermitEnsembleRetriever(
api_key=os.getenv("PERMIT_API_KEY", ""), # or a hard-coded string for testing
pdp_url=os.getenv("PERMIT_PDP_URL"), # optional
pdp_url=os.getenv("PERMIT_PDP_URL"),
user=USER,
action=ACTION,
resource_type=RESOURCE_TYPE,
retrievers=[bm25_retriever, vector_retriever],
retrievers=[vector_retriever],
weights=None # or [0.5, 0.5], etc. if you want weighting
)

# 5. Run a query
# 4. Run a query to be performed with the filtering capabilities
query = "Tell me about cats"
results = await ensemble_retriever._aget_relevant_documents(query, run_manager=None)

# 6. Print out the results
# 5. Print out the filtered results
print(f"Query: {query}")
for i, doc in enumerate(results, start=1):
doc_id = doc.metadata.get("id")
29 changes: 29 additions & 0 deletions langchain_permit/examples/demo_scripts/demo_jwt_validation.py
@@ -0,0 +1,29 @@
import os
import asyncio

from langchain_permit.tools import LangchainJWTValidationTool

# Load your JWT token from environment variables (e.g., .env)
TEST_JWT_TOKEN = os.getenv("TEST_JWT_TOKEN")
JWKS_URL = os.getenv("JWKS_URL", "")

async def main():

print("JWKS URL =====>", JWKS_URL)
# 1. Initialize the JWT validation tool
jwt_validator = LangchainJWTValidationTool(
jwks_url=JWKS_URL
)

# 2. Validate the token
try:
# _arun calls the async JWT validation logic
claims = await jwt_validator._arun(TEST_JWT_TOKEN)
print("✅ Token validated successfully!")
print("Decoded Claims:", claims)
except Exception as e:
print("❌ Token validation failed:", str(e))


if __name__ == "__main__":
asyncio.run(main())
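Validating a JWT against a JWKS endpoint usually begins by reading the token's unverified header and picking the key whose `kid` matches. The sketch below shows only that selection step, using the standard library. It is an illustration under stated assumptions, not the logic of `LangchainJWTValidationTool`: the helper names, the demo token, and the in-memory `demo_jwks` are hypothetical, and signature verification (which needs a cryptography library) is deliberately left out.

```python
import base64
import json
from typing import Any, Dict, List, Optional


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as unpadded base64url, the form used inside JWTs."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def read_unverified_header(token: str) -> Dict[str, Any]:
    """Parse the JOSE header without checking the signature."""
    header_segment, _payload, _signature = token.split(".")
    return json.loads(b64url_decode(header_segment))


def select_jwk(jwks: Dict[str, List[Dict[str, Any]]], kid: str) -> Optional[Dict[str, Any]]:
    """Return the key whose `kid` matches, or None if the set has no such key."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


# Hypothetical token and key set, built locally so the example runs offline.
header = b64url_encode(json.dumps({"alg": "RS256", "kid": "key-1"}).encode())
payload = b64url_encode(json.dumps({"sub": "user-123"}).encode())
demo_token = f"{header}.{payload}.signature-placeholder"
demo_jwks = {"keys": [{"kid": "key-0", "kty": "RSA"}, {"kid": "key-1", "kty": "RSA"}]}

kid = read_unverified_header(demo_token)["kid"]
matching_key = select_jwk(demo_jwks, kid)
print(kid, matching_key)
```

Nothing read from the header or payload should be trusted until the signature has been verified with the selected key.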
52 changes: 52 additions & 0 deletions langchain_permit/examples/demo_scripts/demo_permissions_check.py
@@ -0,0 +1,52 @@
# examples/demo_permissions_check.py

import os
import asyncio

from permit import Permit
from langchain_permit.tools import LangchainPermissionsCheckTool

PERMIT_API_KEY = os.getenv("PERMIT_API_KEY", "")
PERMIT_PDP_URL = os.getenv("PERMIT_PDP_URL", "")
DEFAULT_ACTION = "read"
RESOURCE_TYPE = "Document"

async def main():
# 1. Create a Permit client
permit_client = Permit(
token=PERMIT_API_KEY,
pdp=PERMIT_PDP_URL
)

# 2. Initialize the permission-check tool
permissions_checker = LangchainPermissionsCheckTool(
name="permission_check",
description="Checks if a user can read a document",
permit=permit_client,
)

# 3. Mock a user object and resource
user = {
"key": "user-123",
"firstName": "Harry",
"attributes": {"role": "basic_user"}
}
resource = {
"type": RESOURCE_TYPE,
"key": "doc123",
"tenant": "techcorp"
}

# 4. Use the async _arun to avoid nested event loops
try:
allowed_result = await permissions_checker._arun(
user=user,
action=DEFAULT_ACTION,
resource=resource
)
print(f"✅ Permission check result: {allowed_result}")
except Exception as e:
print(f"❌ Permission check failed: {e}")

if __name__ == "__main__":
asyncio.run(main())
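The demo above checks a single user/action/resource triple. When several resources need checking, one option is to run the checks concurrently and treat any failed check as a denial. The sketch below illustrates that pattern with a stub: `StubPermit`, `safe_check`, and `filter_resources` are hypothetical names, and the stub only mimics an async `check(user, action, resource)` call rather than talking to a real PDP.

```python
import asyncio
from typing import Any, Dict, List, Set


class StubPermit:
    """Hypothetical stand-in for a Permit client; only mimics an async check()."""

    def __init__(self, allowed_keys: Set[str]) -> None:
        self.allowed_keys = allowed_keys

    async def check(self, user: Dict[str, Any], action: str, resource: Dict[str, Any]) -> bool:
        if resource["key"] == "broken":
            # Simulate an unreachable PDP for one of the resources.
            raise ConnectionError("PDP unreachable")
        return resource["key"] in self.allowed_keys


async def safe_check(client: Any, user: Dict[str, Any], action: str, resource: Dict[str, Any]) -> bool:
    """Fail closed: any error during the check counts as 'not allowed'."""
    try:
        return await client.check(user, action, resource)
    except Exception:
        return False


async def filter_resources(
    client: Any, user: Dict[str, Any], action: str, resources: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Run all checks concurrently and keep only the allowed resources."""
    decisions = await asyncio.gather(
        *(safe_check(client, user, action, resource) for resource in resources)
    )
    return [resource for resource, ok in zip(resources, decisions) if ok]


user = {"key": "user-123"}
resources = [{"type": "Document", "key": key} for key in ("doc123", "doc456", "broken")]
allowed = asyncio.run(filter_resources(StubPermit({"doc123"}), user, "read", resources))
print([resource["key"] for resource in allowed])
```

Failing closed means an outage hides documents rather than exposing them, which is usually the safer default for authorization.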