
Commit 7f846b0

Merge pull request #32 from mad-cat-lon/performance-improvements
Performance enhancements
2 parents bbef6ae + 8e0f169 commit 7f846b0

6 files changed: +237 -67 lines changed

core/models.py

Lines changed: 5 additions & 0 deletions

@@ -6,6 +6,11 @@ class URL(BaseModel):
     url: str
 
 
+class ScrapedURLs(BaseModel):
+    urls: List[str]
+    source_url: str
+
+
 class SourceDocument(BaseModel):
     service: str
     url: str
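For context, the new `ScrapedURLs` model just bundles the links found on a page together with the page they came from. A minimal sketch of that shape, using a stdlib dataclass stand-in (an assumption for illustration; the real model subclasses `pydantic.BaseModel`, which adds validation and serialization on top):

```python
# Stdlib sketch of the ScrapedURLs shape added above; the real model
# subclasses pydantic.BaseModel rather than using a plain dataclass.
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ScrapedURLs:
    urls: List[str]     # links discovered on the page by the scraper
    source_url: str     # the page they were scraped from


scraped = ScrapedURLs(
    urls=["https://example.com/terms", "https://example.com/privacy"],
    source_url="https://example.com",
)
payload = json.dumps(asdict(scraped))
print(payload)
```

Keeping `source_url` alongside the candidate `urls` is what lets a later step ask "which of these links belong to {source}" in one prompt.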

core/prompts.py

Lines changed: 56 additions & 12 deletions
@@ -1,10 +1,11 @@
 from langchain.prompts import StringPromptTemplate
 from pydantic import BaseModel, validator
 
-PROMPT = """
+RAG_PROMPT = """
 <|system|>
-You are an expert lawyer analyzing terms of service agreements. Given a statement about the service and 4 pieces of text extracted from its documents, pick the number of the text that directly answers the query in its entirety. Output a valid JSON object containing the choice of text and concise reasoning. If none of the texts can explicitly answer the statement, return 0. If there is a text that answers the question, set the "answer" field to true. In all other cases, set it to false.
-Here are some examples:
+You are an expert lawyer analyzing terms of service agreements for a website (called "service"). Given a query statement and 4 pieces of text extracted from the service's documents, pick the number of the text that directly answers the query in its entirety. Output a valid JSON object containing the choice of text and concise reasoning. If none of the texts can explicitly answer the statement, return 0. If there is a text that answers the question, set the "answer" field to true. In all other cases, set it to false. DO NOT IMPLY ANYTHING NOT GIVEN IN THE TEXT.
+
+Here are some examples:
 
 Given the statement "You sign away all moral rights", which of the following texts, if any, answer it fully?
@@ -41,9 +42,6 @@
 * Location information
 * Log data
 * Information from cookie data and similar technologies (To find out more about how we use cookies, please see our Cookie Policy)
-* Device information
-* Usage data and inferences
-* User choices
 ```
 2)
 ```
@@ -55,9 +53,6 @@
 When we use cookies to learn about your behavior on or off of our services, we
 or our partners will obtain consent that we may need under applicable law. To
 find out more about how we use cookies, please see our Cookie Policy.
-Additional Info for EEA, Swiss and UK Data Subjects: Legal bases we rely on
-where we use your information
-The below section only applies for residents in the EEA, Switzerland, and UK.
 ```
 4)
 ```
@@ -81,7 +76,7 @@
 }}
 </s>
 <|user|>
-Given the statement "{query}", which text provides enough context to explicitly answer the entire statement? Do not infer or imply anything not provided in the texts. Answer with a single JSON object as demonstrated above.
+Given the statement "{query}", which text provides enough context to explicitly answer the entire statement? Answer with a single JSON object as demonstrated above. DO NOT IMPLY ANYTHING NOT GIVEN IN THE TEXT.
 1)
 ```
 {result1}
@@ -102,7 +97,56 @@
 <|assistant|>
 """
 
-n_results = 4
+DOC_PROMPT = """
+<|user|>
+Respond with a JSON object with all the URLs that are likely to contain the terms and conditions,
+user agreements, cookie policy, privacy policy etc. for {source} like so:
+{{
+    "valid_urls": ["https://example.com/terms", "https://example.com/legal/cookies"]
+}}
+Here are the URLs.
+{urls}
+</s>
+<|assistant|>
+"""
+
+VERIFY_PROMPT = """
+<|user|>
+Given a statement about the service {service} and a piece of text that answers it, respond with a JSON object indicating if the statement is true or false like so:
+{{
+    "statement": bool
+}}
+Statement:
+{statement}
+Text:
+{text}
+</s>
+<|assistant|>
+"""
+
+
+class VerifyStatementPromptTemplate(StringPromptTemplate, BaseModel):
+    def format(self, **kwargs) -> str:
+        prompt = VERIFY_PROMPT.format(
+            service=kwargs["service"],
+            statement=kwargs["case"],
+            text=kwargs["text"]
+        )
+        return prompt
+
+
+class DocClassifierPromptTemplate(StringPromptTemplate, BaseModel):
+    """
+    Determine from the title and source domain of a document discovered by the linkFinder content script
+    whether it is likely to be a terms and conditions document or not
+    """
+    def format(self, **kwargs) -> str:
+        prompt = DOC_PROMPT.format(
+            urls=kwargs["urls"],
+            source=kwargs["source"]
+        )
+        return prompt
+
 
 
 class RAGQueryPromptTemplate(StringPromptTemplate, BaseModel):
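One detail worth noting in the new templates: because `DOC_PROMPT` and `VERIFY_PROMPT` are filled in with plain `str.format`, the literal braces of the JSON example must be doubled (`{{` and `}}`) so they survive substitution. A small sketch of how `DocClassifierPromptTemplate.format` renders `DOC_PROMPT` (the template text is copied from the diff above; the inputs are hypothetical):

```python
# DOC_PROMPT as added in this commit; {source} and {urls} are format slots,
# while {{ }} collapse to literal braces in the rendered prompt.
DOC_PROMPT = """\
<|user|>
Respond with a JSON object with all the URLs that are likely to contain the terms and conditions,
user agreements, cookie policy, privacy policy etc. for {source} like so:
{{
    "valid_urls": ["https://example.com/terms", "https://example.com/legal/cookies"]
}}
Here are the URLs.
{urls}
</s>
<|assistant|>
"""

# Hypothetical inputs for illustration.
rendered = DOC_PROMPT.format(
    source="example.com",
    urls="\n".join(["https://example.com/about", "https://example.com/terms"]),
)
print(rendered)
```

After formatting, the rendered prompt contains the literal JSON example with single braces, the service name in place of `{source}`, and the newline-joined candidate URLs in place of `{urls}`.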
@@ -112,7 +156,7 @@ class RAGQueryPromptTemplate(StringPromptTemplate, BaseModel):
     """
 
     def format(self, **kwargs) -> str:
-        prompt = PROMPT.format(
+        prompt = RAG_PROMPT.format(
             query=kwargs["query"],
             result1=kwargs["results"][0],
             result2=kwargs["results"][1],
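The hunk above is truncated after `result2`; since the system prompt describes "4 pieces of text", the template presumably continues with `result3` and `result4` in the same pattern (an assumption). A sketch of how `RAGQueryPromptTemplate.format` feeds the retrieved chunks into the prompt, using a toy stand-in for the full `RAG_PROMPT`:

```python
# Toy stand-in for RAG_PROMPT: same {query}/{result1..4} slots, minus the
# system message and few-shot examples from the real template.
TOY_RAG_PROMPT = """\
<|user|>
Given the statement "{query}", which text provides enough context to explicitly answer the entire statement?
1) {result1}
2) {result2}
3) {result3}
4) {result4}
</s>
<|assistant|>
"""


def format_rag(query, results):
    # Mirrors RAGQueryPromptTemplate.format: one retrieved chunk per slot.
    return TOY_RAG_PROMPT.format(
        query=query,
        result1=results[0],
        result2=results[1],
        result3=results[2],
        result4=results[3],
    )


prompt = format_rag(
    "You sign away all moral rights",
    ["chunk one", "chunk two", "chunk three", "chunk four"],
)
print(prompt)
```

This is also why the module-level `n_results = 4` constant could be dropped in this commit: the number of slots is fixed by the template itself.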
