resolve comments and suggestions

permitio · Feb 18, 2025 · b661dc6
1 parent b4f343a
commit b661dc6

Showing 8 changed files with 210 additions and 125 deletions.
diff --git a/docs/retrievers.ipynb b/docs/retrievers.ipynb
@@ -2,135 +2,122 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "e1455a0c",
+   "id": "e94b2389",
    "metadata": {},
    "source": [
-    "#Langchain Permit Retrievers\n",
-    "This module provides two specialized retrievers that integrate Permit.io's authorization capabilities with Langchain's retrieval systems:\n",
+    "# Permit + LangChain Retrievers\n",
     "\n",
-    "* ReBACSelfQueryRetriever: Implements Relationship-Based Access Control (ReBAC) using natural language queries\n",
+    "This notebook illustrates how to integrate [Permit.io](https://permit.io/) permissions into LangChain retrievers.\n",
     "\n",
-    "* RBACEnsembleRetriever: Combines semantic search with Role-Based and Attribute-Based Access Control (RBAC/ABAC)\n",
+    "We provide two custom retrievers:\n",
     "\n",
-    "## Prerequisites\n",
+    "- **`PermitSelfQueryRetriever`** – Uses a self-query approach to parse the user’s natural-language prompt, fetch the user’s permitted resource IDs from Permit, and apply that filter automatically in a vector store search. \n",
+    " \n",
+    "- **`PermitEnsembleRetriever`** – Combines multiple underlying retrievers (e.g., BM25 + Vector) via LangChain’s `EnsembleRetriever`, then filters the merged results with Permit.io.\n",
     "\n",
-    "```python\n",
-    "from permit import Permit\n",
-    "from langchain_openai import OpenAIEmbeddings\n",
-    "from langchain_chroma import Chroma\n",
-    "# ... other imports as needed\n",
+    "## Installation\n",
     "\n",
-    "# Initialize Permit client\n",
-    "permit_client = Permit(\n",
-    "    token=\"your_permit_api_key\",\n",
-    "    pdp=\"your_pdp_url\"\n",
-    ")\n",
+    "```bash\n",
+    "pip install langchain-permit\n",
     "```\n",
     "\n",
-    "## ReBACSelfQueryRetriever\n",
-    "This retriever extends Langchain's SelfQueryRetriever to include relationship-based access control through Permit.io integration.\n",
+    "## Environment Variables\n",
     "\n",
-    "### Basic Usage\n",
+    "```bash\n",
+    "PERMIT_API_KEY=your_api_key\n",
+    "PERMIT_PDP_URL=http://localhost:7766    # or your real deployment\n",
+    "OPENAI_API_KEY=sk-...\n",
+    "```\n",
     "\n",
-    "```python\n",
-    "# Initialize vector store with documents\n",
-    "docs = [\n",
-    "    Document(\n",
-    "        page_content=\"Confidential project proposal\",\n",
-    "        metadata={\n",
-    "            \"owner\": \"user-123\",\n",
-    "            \"relationships\": [\"team-a\", \"managers\"],\n",
-    "            \"resource_type\": \"document\"\n",
-    "        }\n",
-    "    )\n",
-    "]\n",
-    "\n",
-    "# Create retriever\n",
-    "rebac_retriever = ReBACSelfQueryRetriever(\n",
-    "    llm=ChatOpenAI(),\n",
-    "    vectorstore=Chroma.from_documents(docs, OpenAIEmbeddings()),\n",
-    "    permit_client=permit_client\n",
-    ")\n",
+    "## Prerequisites\n",
+    "\n",
+    "- A running Permit PDP. See [Permit docs](https://docs.permit.io/) for details on setting up your policy and container.\n",
+    "- A vector store or multiple retrievers that we can wrap.\n",
     "\n",
-    "# Query with relationship context\n",
-    "docs = await rebac_retriever.aget_relevant_documents(\n",
-    "    \"Find project proposals\",\n",
-    "    user_context={\"user_id\": \"user-123\", \"relationships\": [\"team-a\"]}\n",
-    ")\n",
-    "```\n",
     "\n",
-    "### Metadata Schema\n",
-    "The retriever uses the following metadata fields:\n",
+    "# PermitSelfQueryRetriever\n",
     "\n",
-    "* owner: Document owner identifier\n",
-    "* relationships: List of relationship identifiers that have access\n",
-    "* resource_type: Type of the resource for permission checking\n",
+    "### Basic Explanation\n",
     "\n",
+    "1. Retrieves permitted document IDs from Permit.  \n",
     "\n",
-    "## RBACEnsembleRetriever\n",
-    "This retriever combines multiple search strategies with role-based and attribute-based access control.\n",
+    "2. Uses an LLM to parse your query and build a “structured filter,” ensuring only docs with those permitted IDs are considered.\n",
     "\n",
-    "### Basic Usage\n",
+    "## Basic Usage\n",
     "\n",
     "```python\n",
-    "# Initialize with documents\n",
-    "docs = [\n",
-    "    Document(\n",
-    "        page_content=\"HR Policy Document\",\n",
-    "        metadata={\n",
-    "            \"department\": \"HR\",\n",
-    "            \"classification\": \"internal\",\n",
-    "            \"required_role\": \"hr_staff\"\n",
-    "        }\n",
-    "    )\n",
-    "]\n",
-    "\n",
-    "# Create retrievers\n",
-    "semantic_retriever = Chroma.from_documents(\n",
-    "    docs, \n",
-    "    OpenAIEmbeddings()\n",
-    ").as_retriever()\n",
-    "\n",
-    "permission_retriever = BM25Retriever.from_documents(docs)\n",
-    "\n",
-    "# Create ensemble\n",
-    "rbac_retriever = RBACEnsembleRetriever(\n",
-    "    retrievers=[semantic_retriever, permission_retriever],\n",
-    "    weights=[0.6, 0.4],\n",
-    "    permit_client=permit_client\n",
+    "from langchain_openai import OpenAIEmbeddings\n",
+    "from langchain_community.vectorstores import FAISS\n",
+    "from langchain_permit.retrievers import PermitSelfQueryRetriever\n",
+    "\n",
+    "# Step 1: Create / load some documents and build a vector store\n",
+    "docs = [...]\n",
+    "embeddings = OpenAIEmbeddings()\n",
+    "vectorstore = FAISS.from_documents(docs, embeddings)\n",
+    "\n",
+    "# Step 2: Initialize the retriever\n",
+    "retriever = PermitSelfQueryRetriever(\n",
+    "    api_key=\"...\",\n",
+    "    pdp_url=\"...\",\n",
+    "    user={\"key\": \"user-123\"},\n",
+    "    resource_type=\"document\",\n",
+    "    action=\"read\",\n",
+    "    llm=...,                # Typically a ChatOpenAI or other LLM\n",
+    "    vectorstore=vectorstore,\n",
+    "    enable_limit=True,      # optional\n",
     ")\n",
     "\n",
-    "# Query with role context\n",
-    "docs = await rbac_retriever.aget_relevant_documents(\n",
-    "    \"HR policies\",\n",
-    "    user_context={\n",
-    "        \"roles\": [\"hr_staff\"],\n",
-    "        \"attributes\": {\"department\": \"HR\"}\n",
-    "    }\n",
-    ")\n",
+    "# Step 3: Query\n",
+    "query = \"Give me docs about cats\"\n",
+    "results = retriever.invoke(query)\n",
+    "for doc in results:\n",
+    "    print(doc.metadata.get(\"id\"), doc.page_content)\n",
     "```\n",
-    "## Access Control\n",
-    "The retriever supports:\n",
     "\n",
-    "* Role-Based Access: Using the roles field in user context\n",
-    "* Attribute-Based Access: Using the attributes field in user context\n",
-    "* Weighted Results: Combining semantic relevance with permission-based filtering\n",
+    "# PermitEnsembleRetriever\n",
+    "### Basic Explanation\n",
+    "\n",
+    "1. Uses LangChain’s EnsembleRetriever to gather documents from multiple sub-retrievers (e.g., vector-based, BM25, etc.).\n",
+    "2. After retrieving documents, it calls filter_objects on Permit to eliminate any docs the user isn’t allowed to see.\n",
     "\n",
-    "## Advanced Features\n",
-    "### Custom Permission Logic\n",
-    "Both retrievers support custom permission logic through Permit.io configurations:\n",
+    "## Basic Usage\n",
     "\n",
     "```python\n",
-    "# Example of custom permission check\n",
-    "allowed = await permit_client.check(\n",
-    "    user=user_context,\n",
+    "from langchain_community.retrievers import BM25Retriever\n",
+    "from langchain_core.documents import Document\n",
+    "from langchain_permit.retrievers import PermitEnsembleRetriever\n",
+    "\n",
+    "# Suppose we have two child retrievers: bm25_retriever, vector_retriever\n",
+    "...\n",
+    "ensemble_retriever = PermitEnsembleRetriever(\n",
+    "    api_key=\"...\",\n",
+    "    pdp_url=\"...\",\n",
+    "    user=\"user_abc\",\n",
     "    action=\"read\",\n",
-    "    resource={\n",
-    "        \"type\": doc.metadata.get(\"resource_type\"),\n",
-    "        \"attributes\": doc.metadata\n",
-    "    }\n",
+    "    resource_type=\"document\",\n",
+    "    retrievers=[bm25_retriever, vector_retriever],\n",
+    "    weights=None\n",
     ")\n",
-    "```"
+    "\n",
+    "docs = ensemble_retriever.invoke(\"Query about cats\")\n",
+    "for doc in docs:\n",
+    "    print(doc.metadata.get(\"id\"), doc.page_content)\n",
+    "```\n",
+    "\n",
+    "# Demo Scripts\n",
+    "\n",
+    "For more complete demos, check out the `examples/` folder:\n",
+    "\n",
+    "1. `demo_self_query.py` – Demonstrates `PermitSelfQueryRetriever`.\n",
+    "2. `demo_ensemble.py` – Demonstrates `PermitEnsembleRetriever`.\n",
+    "\n",
+    "Each script shows how to build or load documents, configure Permit, and run queries.\n",
+    "\n",
+    "# Conclusion\n",
+    "\n",
+    "With these custom retrievers, you can seamlessly integrate Permit.io’s permission checks into LangChain’s retrieval workflow. You can keep your application’s vector search logic while ensuring only authorized documents are returned.\n",
+    "\n",
+    "For more details on setting up Permit policies, see the official [Permit docs](https://docs.permit.io/). To combine these retrievers with other tools (such as JWT validation or a broader RAG pipeline), see `docs/tools.ipynb`.\n"
    ]
   }
  ],
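
Note on the `PermitEnsembleRetriever` description above: the post-retrieval filtering step can be pictured with the minimal sketch below. It is not the library's implementation. `Doc` is a stand-in for `langchain_core.documents.Document`, and `filter_by_permission` and `permitted_ids` are hypothetical names; in the real retriever the allowed set would come from Permit rather than a hard-coded value.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (assumption, avoids the dependency)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_permission(docs, permitted_ids):
    """Keep only documents whose metadata "id" is in permitted_ids, preserving order."""
    return [d for d in docs if d.metadata.get("id") in permitted_ids]


# Merged results as they might come back from the sub-retrievers.
merged = [
    Doc("Cats are independent pets.", {"id": "doc1"}),
    Doc("Internal salary data.", {"id": "doc2"}),
    Doc("Dogs are loyal companions.", {"id": "doc3"}),
]

# Hypothetical: IDs the permission check allowed for this user.
permitted_ids = {"doc1", "doc3"}

visible = filter_by_permission(merged, permitted_ids)
print([d.metadata["id"] for d in visible])
```

Documents without an `id` in their metadata are dropped by this sketch, which errs on the side of denying access.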

diff --git a/docs/tools.ipynb b/docs/tools.ipynb
@@ -26,9 +26,15 @@
     "```bash\n",
     "PERMIT_API_KEY=your_permit_api_key\n",
     "JWKS_URL=your_jwks_endpoint_url\n",
-    "PERMIT_PDP_URL=your_permit_pdp_url  # Usually http://localhost:7766 for local development\n",
+    "PERMIT_PDP_URL=your_permit_pdp_url  # Usually http://localhost:7766 for local development; otherwise the URL of your deployed PDP\n",
     "```\n",
     "\n",
+    "## Prerequisites\n",
+    "\n",
+    "Make sure your PDP (Policy Decision Point) is running at PERMIT_PDP_URL.\n",
+    "See [Permit docs](https://docs.permit.io/concepts/pdp/overview/) for details on policy setup and how to launch the PDP container.\n",
+    "\n",
+    "\n",
     "## JWT Validation Tool\n",
     "The JWT Validation tool verifies JWT tokens against a JWKS (JSON Web Key Set) endpoint.\n",
     "\n",
@@ -196,7 +202,17 @@
     "        return None\n",
     "```\n",
     "\n",
-    "This documentation demonstrates the key features and usage patterns of both tools."
+    "This documentation demonstrates the key features and usage patterns of both tools.\n",
+    "\n",
+    "## Additional Demo Scripts\n",
+    "\n",
+    "For fully runnable demos, check out the `examples/` folder in this repository. You’ll find:\n",
+    "\n",
+    "* `demo_jwt_validation.py` – A quick script showing how to validate JWTs using `LangchainJWTValidationTool`.\n",
+    "\n",
+    "* `demo_permissions_check.py` – A script that performs Permit.io permission checks using `LangchainPermissionsCheckTool`.\n",
+    "\n",
+    "Just run `python demo_jwt_validation.py` or `python demo_permissions_check.py` (after setting your environment variables) to see these tools in action."
    ]
   }
  ],

diff --git a/langchain_permit/examples/demo_scripts/demo_ensemble.py b/langchain_permit/examples/demo_scripts/demo_ensemble.py
@@ -2,14 +2,15 @@
 import asyncio
 from langchain_core.documents import Document
 from langchain_community.vectorstores import FAISS
-from langchain_community.retrievers import BM25Retriever
 from langchain_openai import OpenAIEmbeddings
 from langchain_permit.retrievers import PermitEnsembleRetriever
 
-# Feel free to tailor the policy model (RBAC, ABAC, ReBAC) in Permit for your real environment
-
+# Permissions query configuration for the retriever
+# The user ID we want to filter the results for (should be synced to Permit's PDP)
 USER = "user_abc"
+# The name of the resource in the policy we configured in Permit
 RESOURCE_TYPE = "my_resource"
+# The particular action we want to filter for (usually read, view, etc.)
 ACTION = "view"
 
 async def main():
@@ -26,30 +27,24 @@ async def main():
     embeddings = OpenAIEmbeddings()
     vectorstore = FAISS.from_documents(docs, embedding=embeddings)
     vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
+
 
-    # 3. Build a BM25 retriever from the same documents
-    bm25_retriever = BM25Retriever.from_texts(
-        [d.page_content for d in docs],
-        metadatas=[d.metadata for d in docs],
-        k=2,
-    )
-
-    # 4. Initialize the PermitEnsembleRetriever with both retrievers
+    # 3. Initialize the PermitEnsembleRetriever with the relevant user/resource/action information for filtering
     ensemble_retriever = PermitEnsembleRetriever(
         api_key=os.getenv("PERMIT_API_KEY", ""),  # or a hard-coded string for testing
-        pdp_url=os.getenv("PERMIT_PDP_URL"),      # optional
+        pdp_url=os.getenv("PERMIT_PDP_URL"),
         user=USER,
         action=ACTION,
         resource_type=RESOURCE_TYPE,
-        retrievers=[bm25_retriever, vector_retriever],
+        retrievers=[vector_retriever],
         weights=None  # or [0.5, 0.5], etc. if you want weighting
     )
 
-    # 5. Run a query
+    # 4. Run a query; the results are filtered by the user's Permit permissions
     query = "Tell me about cats"
     results = await ensemble_retriever._aget_relevant_documents(query, run_manager=None)
 
-    # 6. Print out the results
+    # 5. Print out the filtered results
     print(f"Query: {query}")
     for i, doc in enumerate(results, start=1):
         doc_id = doc.metadata.get("id")

diff --git a/langchain_permit/examples/demo_scripts/demo_jwt_validation.py b/langchain_permit/examples/demo_scripts/demo_jwt_validation.py
@@ -0,0 +1,29 @@
+import os
+import asyncio
+
+from langchain_permit.tools import LangchainJWTValidationTool
+
+# Load your JWT token from environment variables (e.g., .env)
+TEST_JWT_TOKEN = os.getenv("TEST_JWT_TOKEN")
+JWKS_URL = os.getenv("JWKS_URL", "")
+
+async def main():
+
+    print("JWKS URL:", JWKS_URL)
+    # 1. Initialize the JWT validation tool
+    jwt_validator = LangchainJWTValidationTool(
+        jwks_url=JWKS_URL
+    )
+
+    # 2. Validate the token
+    try:
+        # _arun calls the async JWT validation logic
+        claims = await jwt_validator._arun(TEST_JWT_TOKEN)
+        print("✅ Token validated successfully!")
+        print("Decoded Claims:", claims)
+    except Exception as e:
+        print("❌ Token validation failed:", str(e))
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/langchain_permit/examples/demo_scripts/demo_permissions_check.py b/langchain_permit/examples/demo_scripts/demo_permissions_check.py
@@ -0,0 +1,52 @@
+# langchain_permit/examples/demo_scripts/demo_permissions_check.py
+
+import os
+import asyncio
+
+from permit import Permit
+from langchain_permit.tools import LangchainPermissionsCheckTool
+
+PERMIT_API_KEY = os.getenv("PERMIT_API_KEY", "")
+PERMIT_PDP_URL = os.getenv("PERMIT_PDP_URL", "")
+DEFAULT_ACTION = "read"
+RESOURCE_TYPE = "Document"
+
+async def main():
+    # 1. Create a Permit client
+    permit_client = Permit(
+        token=PERMIT_API_KEY,
+        pdp=PERMIT_PDP_URL
+    )
+
+    # 2. Initialize the permission-check tool
+    permissions_checker = LangchainPermissionsCheckTool(
+        name="permission_check",
+        description="Checks if a user can read a document",
+        permit=permit_client,
+    )
+
+    # 3. Mock a user object and resource
+    user = {
+        "key": "user-123",
+        "firstName": "Harry",
+        "attributes": {"role": "basic_user"}
+    }
+    resource = {
+        "type": RESOURCE_TYPE,
+        "key": "doc123",
+        "tenant": "techcorp"
+    }
+
+    # 4. Use the async _arun to avoid nested event loops
+    try:
+        allowed_result = await permissions_checker._arun(
+            user=user,
+            action=DEFAULT_ACTION,
+            resource=resource
+        )
+        print(f"✅ Permission check result: {allowed_result}")
+    except Exception as e:
+        print(f"❌ Permission check failed: {e}")
+
+if __name__ == "__main__":
+    asyncio.run(main())
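
Note on the demo scripts: they read `PERMIT_API_KEY` and `PERMIT_PDP_URL` with an empty-string default, so a missing variable only surfaces later as a failed Permit call. A minimal sketch of failing fast instead is shown below. `require_env` and `load_permit_settings` are hypothetical helpers, not part of `langchain_permit` or this commit.

```python
import os


def require_env(name, environ=None):
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_permit_settings(environ=None):
    """Collect the two settings the demo scripts pass to Permit (hypothetical helper)."""
    return {
        "api_key": require_env("PERMIT_API_KEY", environ),
        "pdp_url": require_env("PERMIT_PDP_URL", environ),
    }


if __name__ == "__main__":
    try:
        settings = load_permit_settings()
        print("PDP URL:", settings["pdp_url"])
    except RuntimeError as exc:
        print(exc)
```

Passing `environ` explicitly keeps the helpers testable without touching the real process environment.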