1 change: 1 addition & 0 deletions backend/app/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
.env
8 changes: 6 additions & 2 deletions backend/app/logging/logging_config.py
Expand Up @@ -24,8 +24,12 @@ def setup_logger(name: str) -> logging.Logger:
datefmt="%Y-%m-%d %H:%M:%S"
)

# Console Handler
console_handler = logging.StreamHandler(sys.stdout)
# Console Handler (force UTF-8 to prevent UnicodeEncodeError on Windows with emoji)
try:
utf8_stream = open(sys.stdout.fileno(), mode="w", encoding="utf-8", closefd=False)
console_handler = logging.StreamHandler(utf8_stream)
except Exception:
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
Expand Down
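Outside the diff, the handler change above can be sketched as a standalone helper (a minimal sketch; the helper name and fallback branch are illustrative, not part of the PR):

```python
import logging
import sys

def make_console_handler() -> logging.StreamHandler:
    """Console handler that writes UTF-8 so emoji in log records do not
    raise UnicodeEncodeError on cp1252 consoles (common on Windows)."""
    try:
        # closefd=False keeps the underlying file descriptor open so
        # other writers to stdout are unaffected.
        utf8_stream = open(sys.stdout.fileno(), mode="w",
                           encoding="utf-8", closefd=False)
        return logging.StreamHandler(utf8_stream)
    except Exception:
        # e.g. stdout replaced by an object without fileno() (pytest, IDEs)
        return logging.StreamHandler(sys.stdout)

handler = make_console_handler()
handler.setLevel(logging.INFO)
```

Either branch yields a working `StreamHandler`; only the encoding of the underlying stream differs.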
7 changes: 3 additions & 4 deletions backend/app/modules/bias_detection/check_bias.py
Expand Up @@ -32,10 +32,9 @@

load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))


def check_bias(text):
def check_bias(text, api_key: str, groq_model: str = "llama-3.3-70b-versatile"):
client = Groq(api_key=api_key)
try:
Comment on lines +36 to 38
⚠️ Potential issue | 🟠 Major

Guard client initialization inside error-handling path.

At Line 37, Groq(api_key=api_key) executes before the try, so init/validation errors bypass your structured error response.

Proposed fix
 def check_bias(text, api_key: str, groq_model: str = "llama-3.3-70b-versatile"):
-    client = Groq(api_key=api_key)
     try:
+        if not api_key:
+            raise ValueError("Missing Groq API key")
+        client = Groq(api_key=api_key)
         logger.debug(f"Raw article text: {text}")
         logger.debug(f"JSON dump of text: {json.dumps(text)}")

logger.debug(f"Raw article text: {text}")
logger.debug(f"JSON dump of text: {json.dumps(text)}")
Expand All @@ -61,7 +60,7 @@ def check_bias(text):
"content": (f"Give bias score to the following article \n\n{text}"),
},
],
model="gemma2-9b-it",
model=groq_model,
temperature=0.3,
max_tokens=512,
)
Expand Down
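The guarded-initialization pattern the reviewer proposes can be sketched independently of the Groq SDK (the `make_client` factory stands in for the real `Groq` constructor; names are hypothetical):

```python
def check_bias_guarded(text, api_key, make_client):
    """Construct the client inside the try block so a missing key or a
    constructor failure flows into the structured error response
    instead of raising out of the function."""
    try:
        if not api_key:
            raise ValueError("Missing Groq API key")
        client = make_client(api_key)  # stand-in for Groq(api_key=api_key)
        # ... the real function would call client.chat.completions.create here
        return {"status": "success", "text_chars": len(text)}
    except Exception as exc:
        return {"status": "error", "message": str(exc)}
```

With this shape, an empty key produces the error dict rather than an uncaught exception, which is the behavior the review asks for.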
21 changes: 14 additions & 7 deletions backend/app/modules/chat/llm_processing.py
Expand Up @@ -32,8 +32,6 @@

load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))


def build_context(docs):
return "\n".join(
Expand All @@ -42,10 +40,19 @@ def build_context(docs):
)


def ask_llm(question, docs):
context = build_context(docs)
logger.debug(f"Generated context for LLM:\n{context}")
prompt = f"""You are an assistant that answers based on context.
def ask_llm(question, docs, api_key: str, groq_model: str = "llama-3.3-70b-versatile", article_text: str = ""):
client = Groq(api_key=api_key)
pinecone_context = build_context(docs)
logger.debug(f"Generated context for LLM:\n{pinecone_context}")
⚠️ Potential issue | 🟠 Major

Avoid logging full LLM context payloads.

Line 46 logs raw context text, which can leak user/article content into logs and create very large log entries. Log size/metadata instead of full text.

🔧 Suggested change
-    logger.debug(f"Generated context for LLM:\n{pinecone_context}")
+    logger.debug("Generated LLM context from Pinecone notes (chars=%d)", len(pinecone_context))

context_parts = []
if article_text:
context_parts.append(f"=== Full Article ===\n{article_text}")
if pinecone_context:
context_parts.append(f"=== Fact-Check Notes ===\n{pinecone_context}")
context = "\n\n".join(context_parts) or "No context available."

prompt = f"""You are an assistant that answers questions about a news article.

Context:
{context}
Expand All @@ -55,7 +62,7 @@ def ask_llm(question, docs):
"""

response = client.chat.completions.create(
model="gemma2-9b-it",
model=groq_model,
messages=[
{"role": "system", "content": "Use only the context to answer."},
{"role": "user", "content": prompt},
Expand Down
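The metadata-only logging the reviewer suggests might look like this in isolation (a sketch; the helper name and the 12-character hash prefix are assumptions):

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

def context_log_fields(pinecone_context: str) -> dict:
    """Return size and a short content hash for debug logs, instead of
    dumping the full LLM context (which may contain user/article text)."""
    digest = hashlib.sha256(pinecone_context.encode("utf-8")).hexdigest()
    fields = {"chars": len(pinecone_context), "sha256_prefix": digest[:12]}
    logger.debug("Generated LLM context (chars=%d, sha256=%s)",
                 fields["chars"], fields["sha256_prefix"])
    return fields
```

The hash lets two log entries be compared for identical context without ever writing the context itself.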
10 changes: 5 additions & 5 deletions backend/app/modules/facts_check/llm_processing.py
Expand Up @@ -34,10 +34,9 @@

load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))


def run_claim_extractor_sdk(state):
client = Groq(api_key=state["groq_api_key"])
try:
Comment on lines +39 to 40
⚠️ Potential issue | 🔴 Critical

Guard API key access inside try in claim extractor.

Line 39 can throw KeyError before structured error handling starts.

Suggested fix
 def run_claim_extractor_sdk(state):
-    client = Groq(api_key=state["groq_api_key"])
     try:
+        api_key = state.get("groq_api_key")
+        if not api_key:
+            raise ValueError("Missing 'groq_api_key' in state")
+        client = Groq(api_key=api_key)
         text = state.get("cleaned_text")

text = state.get("cleaned_text")
if not text:
Expand All @@ -63,7 +62,7 @@ def run_claim_extractor_sdk(state):
),
},
],
model="gemma2-9b-it",
model=state.get("groq_model", "llama-3.3-70b-versatile"),
temperature=0.3,
max_tokens=512,
)
Expand All @@ -87,7 +86,8 @@ def run_claim_extractor_sdk(state):
}


def run_fact_verifier_sdk(search_results):
def run_fact_verifier_sdk(search_results, api_key: str, groq_model: str = "llama-3.3-70b-versatile"):
client = Groq(api_key=api_key)
try:
results_list = []

Expand Down Expand Up @@ -128,7 +128,7 @@ def run_fact_verifier_sdk(search_results):
),
},
],
model="gemma2-9b-it",
model=groq_model,
temperature=0.3,
max_tokens=256,
)
Expand Down
2 changes: 2 additions & 0 deletions backend/app/modules/facts_check/web_search.py
Expand Up @@ -32,6 +32,8 @@ def search_google(query):
f"https://www.googleapis.com/customsearch/v1?key={GOOGLE_SEARCH}&cx=f637ab77b5d8b4a3c&q={query}"
)
res = results.json()
if "items" not in res:
return []
first = {}
first["title"] = res["items"][0]["title"]
first["link"] = res["items"][0]["link"]
Expand Down
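Extracted from the patched `search_google`, the guard can be exercised on its own (sketch; `first_result` is a hypothetical helper mirroring the inline code):

```python
def first_result(res: dict):
    """Return the first Custom Search hit, or [] when the API response
    carries no 'items' key (quota exhausted, zero results, bad key)."""
    if "items" not in res:
        return []
    first = {}
    first["title"] = res["items"][0]["title"]
    first["link"] = res["items"][0]["link"]
    return first
```

Note that the sketch, like the patch, still assumes a non-empty `items` list whenever the key is present.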
2 changes: 2 additions & 0 deletions backend/app/modules/langgraph_builder.py
Expand Up @@ -52,6 +52,8 @@ class MyState(TypedDict):
score: int
retries: int
status: str
groq_api_key: str
groq_model: str
Comment on lines +55 to +56
⚠️ Potential issue | 🔴 Critical


Strip groq_api_key and groq_model from state before returning from store_and_send node.

The API key is currently exposed in two critical ways:

  1. Logged in store_and_send.py (line 29): logger.debug(f"Received state for vector storage: {state}") logs the entire state including the API key to application logs.

  2. Returned to clients in the HTTP response: the /process endpoint returns the entire workflow state (via pipeline.py → routes.py line 84), which includes groq_api_key in the JSON response body. This violates BYOK principles by leaking user credentials to clients.

Fix: In store_and_send.py line 53, filter the returned state before returning:

return {**{k: v for k, v in state.items() if k not in ("groq_api_key", "groq_model")}, "status": "success"}

Also remove the debug log at line 29 or exclude sensitive fields from it.




def build_langgraph():
Expand Down
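The sanitization step the reviewer asks for in `store_and_send` could be factored out as follows (a sketch assuming the workflow state is a plain dict; the helper name is hypothetical):

```python
SENSITIVE_KEYS = frozenset({"groq_api_key", "groq_model"})

def sanitize_state(state: dict) -> dict:
    """Drop BYOK credentials before the workflow state is logged or
    serialized into the /process HTTP response."""
    return {k: v for k, v in state.items() if k not in SENSITIVE_KEYS}
```

`store_and_send` would then log and return `sanitize_state(state)` instead of the raw state.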
10 changes: 4 additions & 6 deletions backend/app/modules/langgraph_nodes/fact_check.py
Expand Up @@ -32,12 +32,10 @@ def run_fact_check(state):
verifications, error_message = run_fact_check_pipeline(state)

if error_message:
logger.error(f"Error in fact-checking: {error_message}")
return {
"status": "error",
"error_from": "fact_checking",
"message": f"{error_message}",
}
# Soft failure — web search quota/key issue. Continue with empty facts
# so the rest of the pipeline (generate_perspective, store_and_send) still runs.
logger.warning(f"Fact-checking skipped (non-fatal): {error_message}")
verifications = []
Comment on lines +35 to +38
⚠️ Potential issue | 🟠 Major

Avoid logging raw fact-check error payloads.

Line 37 logs error_message verbatim. This can leak sensitive upstream details into logs. Prefer a sanitized/structured message without raw provider text.

Suggested fix
-            logger.warning(f"Fact-checking skipped (non-fatal): {error_message}")
+            logger.warning("Fact-checking skipped (non-fatal). reason=%s", "upstream_fact_check_failure")


except Exception as e:
logger.exception(f"Unexpected error in fact-checking: {e}")
Expand Down
19 changes: 6 additions & 13 deletions backend/app/modules/langgraph_nodes/generate_perspective.py
Expand Up @@ -36,15 +36,6 @@ class PerspectiveOutput(BaseModel):
perspective: str = Field(..., description="Generated opposite perspective")


my_llm = "llama-3.3-70b-versatile"

llm = ChatGroq(model=my_llm, temperature=0.7)

structured_llm = llm.with_structured_output(PerspectiveOutput)


chain = prompt | structured_llm


def generate_perspective(state):
try:
Expand All @@ -56,17 +47,19 @@ def generate_perspective(state):

if not text:
raise ValueError("Missing or empty 'cleaned_text' in state")
elif not facts:
raise ValueError("Missing or empty 'facts' in state")

llm = ChatGroq(model=state["groq_model"], temperature=0.7, api_key=state["groq_api_key"])
chain = prompt | llm.with_structured_output(PerspectiveOutput)

# facts may be empty if web search failed — generate perspective without them
facts_str = "\n".join(
[
f"Claim: {f['original_claim']}\n"
"Verdict: {f['verdict']}\nExplanation: "
"{f['explanation']}"
for f in state["facts"]
for f in (facts or [])
Comment on lines 55 to +60
⚠️ Potential issue | 🟠 Major

Fix facts prompt interpolation; verdict/explanation are currently literal text.

In this block, only the first line is an f-string; Line 58 and Line 59 placeholders are not interpolated.

Suggested fix
         facts_str = "\n".join(
             [
                 f"Claim: {f['original_claim']}\n"
-                "Verdict: {f['verdict']}\nExplanation: "
-                "{f['explanation']}"
+                f"Verdict: {f['verdict']}\n"
+                f"Explanation: {f['explanation']}"
                 for f in (facts or [])
             ]
         ) or "No verified facts available."

]
)
) or "No verified facts available."

result = chain.invoke(
{
Expand Down
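The interpolation fix flagged above can be checked in isolation (sketch; `build_facts_str` is a hypothetical extraction of the inline expression):

```python
def build_facts_str(facts):
    """Join verified facts into prompt text; every line is an f-string,
    so verdict and explanation are actually interpolated."""
    return "\n".join(
        f"Claim: {f['original_claim']}\n"
        f"Verdict: {f['verdict']}\n"
        f"Explanation: {f['explanation']}"
        for f in (facts or [])
    ) or "No verified facts available."
```

With plain strings on the second and third lines, the braces would reach the LLM as literal `{f['verdict']}` text; the all-f-string version substitutes the values.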
13 changes: 6 additions & 7 deletions backend/app/modules/langgraph_nodes/judge.py
Expand Up @@ -23,14 +23,13 @@
logger = setup_logger(__name__)

# Init once
groq_llm = ChatGroq(
model="gemma2-9b-it",
temperature=0.0,
max_tokens=10,
)


def judge_perspective(state):
groq_llm = ChatGroq(
model=state.get("groq_model", "llama-3.3-70b-versatile"),
temperature=0.0,
max_tokens=10,
api_key=state["groq_api_key"],
)
try:
Comment on lines +27 to 33
⚠️ Potential issue | 🔴 Critical

Move ChatGroq construction inside try and validate key presence.

At Line 31, direct indexing (state["groq_api_key"]) plus pre-try initialization can raise uncaught exceptions and bypass your node-level error response.

Proposed fix
 def judge_perspective(state):
-    groq_llm = ChatGroq(
-        model=state.get("groq_model", "llama-3.3-70b-versatile"),
-        temperature=0.0,
-        max_tokens=10,
-        api_key=state["groq_api_key"],
-    )
     try:
+        api_key = state.get("groq_api_key")
+        if not api_key:
+            raise ValueError("Missing 'groq_api_key' in state")
+
+        groq_llm = ChatGroq(
+            model=state.get("groq_model", "llama-3.3-70b-versatile"),
+            temperature=0.0,
+            max_tokens=10,
+            api_key=api_key,
+        )
         perspective_obj = state.get("perspective")

perspective_obj = state.get("perspective")
text = getattr(perspective_obj, "perspective", "").strip()
Expand Down
5 changes: 2 additions & 3 deletions backend/app/modules/langgraph_nodes/sentiment.py
Expand Up @@ -23,10 +23,9 @@

load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))


def run_sentiment_sdk(state):
client = Groq(api_key=state["groq_api_key"])
try:
Comment on lines +28 to 29
⚠️ Potential issue | 🔴 Critical

Move API-key access inside guarded error handling.

Line 28 can raise KeyError before the try block, bypassing your structured error response path.

Suggested fix
-def run_sentiment_sdk(state):
-    client = Groq(api_key=state["groq_api_key"])
-    try:
+def run_sentiment_sdk(state):
+    try:
+        api_key = state.get("groq_api_key")
+        if not api_key:
+            raise ValueError("Missing 'groq_api_key' in state")
+        client = Groq(api_key=api_key)
         text = state.get("cleaned_text")

text = state.get("cleaned_text")
if not text:
Expand All @@ -49,7 +48,7 @@ def run_sentiment_sdk(state):
),
},
],
model="gemma2-9b-it",
model=state.get("groq_model", "llama-3.3-70b-versatile"),
temperature=0.2,
max_tokens=3,
)
Expand Down
4 changes: 2 additions & 2 deletions backend/app/modules/pipeline.py
Expand Up @@ -64,8 +64,8 @@ def run_scraper_pipeline(url: str) -> dict:
return result


def run_langgraph_workflow(state: dict):
def run_langgraph_workflow(state: dict, api_key: str, groq_model: str = "llama-3.3-70b-versatile"):
"""Execute the pre-compiled LangGraph workflow."""
result = _LANGGRAPH_WORKFLOW.invoke(state)
result = _LANGGRAPH_WORKFLOW.invoke({**state, "groq_api_key": api_key, "groq_model": groq_model})
logger.info("LangGraph workflow executed successfully.")
return result
34 changes: 23 additions & 11 deletions backend/app/routes/routes.py
Expand Up @@ -30,7 +30,7 @@
"""


from fastapi import APIRouter
from fastapi import APIRouter, Request, HTTPException
from pydantic import BaseModel
from app.modules.pipeline import run_scraper_pipeline
from app.modules.pipeline import run_langgraph_workflow
Expand All @@ -52,6 +52,7 @@ class URlRequest(BaseModel):

class ChatQuery(BaseModel):
message: str
article_text: str = ""


@router.get("/")
Expand All @@ -60,26 +61,37 @@ async def home():


@router.post("/bias")
async def bias_detection(request: URlRequest):
content = await asyncio.to_thread(run_scraper_pipeline, (request.url))
bias_score = await asyncio.to_thread(check_bias, (content))
async def bias_detection(url_request: URlRequest, request: Request):
api_key = request.headers.get("x-byok-api-key")
if not api_key:
raise HTTPException(status_code=401, detail="Missing X-BYOK-Api-Key header")
groq_model = request.headers.get("x-byok-model", "llama-3.3-70b-versatile")
content = await asyncio.to_thread(run_scraper_pipeline, url_request.url)
bias_score = await asyncio.to_thread(check_bias, content, api_key, groq_model)
Comment on lines +69 to +70
⚠️ Potential issue | 🟠 Major

Pass cleaned article text to bias checker, not the whole scraper dict.

run_scraper_pipeline() returns a dict (cleaned_text, keywords), but check_bias() expects raw text. Current call sends the whole object, which can skew the prompt and output quality.

🔧 Suggested change
-    content = await asyncio.to_thread(run_scraper_pipeline, url_request.url)
-    bias_score = await asyncio.to_thread(check_bias, content, api_key, groq_model)
+    content = await asyncio.to_thread(run_scraper_pipeline, url_request.url)
+    cleaned_text = content.get("cleaned_text", "")
+    if not cleaned_text:
+        raise HTTPException(status_code=422, detail="Scraper returned empty article text")
+    bias_score = await asyncio.to_thread(check_bias, cleaned_text, api_key, groq_model)

logger.info(f"Bias detection result: {bias_score}")
return bias_score


@router.post("/process")
async def run_pipelines(request: URlRequest):
article_text = await asyncio.to_thread(run_scraper_pipeline, (request.url))
async def run_pipelines(url_request: URlRequest, request: Request):
api_key = request.headers.get("x-byok-api-key")
if not api_key:
raise HTTPException(status_code=401, detail="Missing X-BYOK-Api-Key header")
groq_model = request.headers.get("x-byok-model", "llama-3.3-70b-versatile")
article_text = await asyncio.to_thread(run_scraper_pipeline, url_request.url)
logger.debug(f"Scraper output: {json.dumps(article_text, indent=2, ensure_ascii=False)}")
data = await asyncio.to_thread(run_langgraph_workflow, (article_text))
data = await asyncio.to_thread(run_langgraph_workflow, article_text, api_key, groq_model)
return data


@router.post("/chat")
async def answer_query(request: ChatQuery):
query = request.message
async def answer_query(chat_request: ChatQuery, request: Request):
api_key = request.headers.get("x-byok-api-key")
if not api_key:
raise HTTPException(status_code=401, detail="Missing X-BYOK-Api-Key header")
groq_model = request.headers.get("x-byok-model", "llama-3.3-70b-versatile")
query = chat_request.message
results = search_pinecone(query)
answer = ask_llm(query, results)
answer = ask_llm(query, results, api_key, groq_model, chat_request.article_text)
Comment on lines 94 to +95
⚠️ Potential issue | 🟠 Major


Move blocking network calls to thread pool in /chat route.

Lines 94-95 execute synchronous network I/O calls (search_pinecone and ask_llm) directly in an async function, which blocks the event loop under concurrent load. Wrap both calls with asyncio.to_thread() to prevent blocking, matching the pattern already used in the /bias and /process routes in the same file.

Suggested change
-    results = search_pinecone(query)
-    answer = ask_llm(query, results, api_key, groq_model, chat_request.article_text)
+    results = await asyncio.to_thread(search_pinecone, query)
+    answer = await asyncio.to_thread(
+        ask_llm, query, results, api_key, groq_model, chat_request.article_text
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/routes/routes.py` around lines 94-95, avoid blocking the event
loop in the /chat route by running the synchronous network calls in the thread
pool: replace the direct calls to search_pinecone(query) and ask_llm(query,
results, api_key, groq_model, chat_request.article_text) with awaited
asyncio.to_thread(...) calls (i.e., await asyncio.to_thread(search_pinecone,
query) and await asyncio.to_thread(ask_llm, query, results, api_key,
groq_model, chat_request.article_text)); ensure asyncio is imported and
preserve the returned values as before.

logger.info(f"Chat answer generated: {answer}")

return {"answer": answer}
2 changes: 1 addition & 1 deletion backend/app/utils/fact_check_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,5 +76,5 @@ def run_fact_check_pipeline(state):
return [], "All claim searches failed or returned no results."

# Step 3: Verify facts using LLM
final = run_fact_verifier_sdk(search_results)
final = run_fact_verifier_sdk(search_results, state["groq_api_key"], state.get("groq_model", "llama-3.3-70b-versatile"))
return final.get("verifications", []), None
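The fact-check change above threads the per-request Groq key and model through the pipeline `state` dict instead of relying on a module-level client. A minimal sketch of that pattern, with a hypothetical stub in place of `run_fact_verifier_sdk`: a missing required key fails fast with `KeyError`, while the model falls back to a default via `.get`.

```python
def run_fact_verifier(search_results, api_key, model):
    # Stub standing in for run_fact_verifier_sdk; a real version would call Groq.
    return {"verifications": [{"model": model, "key_set": bool(api_key)}]}

def run_pipeline(state):
    # state["groq_api_key"] is required (KeyError if absent);
    # state["groq_model"] is optional with a default.
    final = run_fact_verifier(
        state["search_results"],
        state["groq_api_key"],
        state.get("groq_model", "llama-3.3-70b-versatile"),
    )
    return final.get("verifications", [])

out = run_pipeline({"search_results": [], "groq_api_key": "gsk-test"})
print(out)
# → [{'model': 'llama-3.3-70b-versatile', 'key_set': True}]
```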
1 change: 1 addition & 0 deletions backend/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ dependencies = [
"groq>=0.28.0",
"langchain>=0.3.25",
"langchain-community>=0.3.25",
"langchain-google-genai>=2.1.12",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for actual usages of langchain-google-genai in Python sources
rg -n --type=py -C2 'langchain_google_genai|ChatGoogleGenerativeAI|GoogleGenerativeAI'

Repository: AOSSIE-Org/Perspective

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Also search for imports from langchain packages to get broader context
rg -n --type=py 'from langchain.*import|import langchain' | head -50

Repository: AOSSIE-Org/Perspective

Length of output: 428


🏁 Script executed:

#!/bin/bash
# Check the pyproject.toml to see all dependencies and any optional groups
cat backend/pyproject.toml

Repository: AOSSIE-Org/Perspective

Length of output: 780


🏁 Script executed:

#!/bin/bash
# Check for any references to google or genai (case-insensitive) in Python files
# to catch indirect references or commented code
rg -in --type=py 'google.*genai|genai.*google'

Repository: AOSSIE-Org/Perspective

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there are any imports in __init__.py or similar that might re-export it
fd -t f '\.py$' | xargs grep -l 'google_genai\|GoogleGenerativeAI' 2>/dev/null || echo "No matches found"

Repository: AOSSIE-Org/Perspective

Length of output: 82


Remove the unused langchain-google-genai dependency from backend/pyproject.toml.

The langchain-google-genai package (line 16) is not imported or used anywhere in the codebase. Since the PR's focus is on Groq BYOK integration (which uses langchain_groq), this dependency should be removed to reduce unnecessary supply-chain surface area.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/pyproject.toml` at line 16, remove the unused dependency entry
"langchain-google-genai>=2.1.12" from the project dependencies (the entry shown
in the diff) and update the project's dependency lockfile (poetry.lock or
equivalent) so the removed package is no longer pinned; verify that no import
or usage of langchain_google_genai remains in the codebase before committing
the change.

"langchain-groq>=0.3.2",
"langgraph>=0.4.8",
"logging>=0.4.9.6",