-Think step by step and describe in 10-20 words your reasoning for choosing these scores.
-Tasks are totally independent. Today is ${today_is()}.
+- Think step by step and describe in 10-20 words your reasoning for choosing these scores.
+- Tasks are totally independent.
+- Today is ${today_is()}.
 
 ## Task: Identify the Building of the Chunk
+
 ${
 context.query?.named_entities.building!==null
-? `
-
-Assess whether the chunk explicitly or implicitly refers to a building (e.g., building, house, residence, housing program, etc.) with a name resembling “${context.query?.named_entities.building}”. Be cautious of cases where the name might also belong to a person, such as distinguishing between a person named “Jean Racine” and a house called “Racine”:
-
+? `Assess whether the chunk explicitly or implicitly refers to a building (e.g., building, house, residence, housing program, etc.) with a name resembling “${context.query?.named_entities.building}”. Be cautious of cases where the name might also belong to a person, such as distinguishing between a person named “Jean Racine” and a house called “Racine”:
 - Score 0: The chunk mentions another building.
 - Score 1000: The chunk explicitly and unambiguously discusses the building in question.
 - Score 1-999: The reference to a building is unclear, ambiguous, or only loosely connected, requiring further clarification.
@@ -253,10 +252,7 @@ ${
 
 ${
 context.query?.named_entities.process!==null
-? `
-
-Evaluate whether the chunk pertains to the process “${context.query?.named_entities.process}”, specifically focusing on the company’s procedures, workflows, or standard ways of working. This includes assessing if the chunk describes, references, or aligns with the operational or organizational methods associated with the specified process:
-
+? `Evaluate whether the chunk pertains to the process “${context.query?.named_entities.process}”, specifically focusing on the company’s procedures, workflows, or standard ways of working. This includes assessing if the chunk describes, references, or aligns with the operational or organizational methods associated with the specified process:
 - Score 0: The chunk mentions another process, or no process is mentioned.
 - Score 1000: The chunk explicitly addresses the process with high accuracy.
 - Score 1-999: Partial or nuanced relevance (e.g., related processes but not an exact match). Example: “Une panne d’ascenseur” is distinct from “Un locataire bloqué dans l’ascenseur”.
@@ -266,9 +262,10 @@ Evaluate whether the chunk pertains to the process “${context.query?.named_ent
 
 ## Task: Evaluate Overall Relevance
 
-Assess the chunk’s alignment with the user query, considering that relevant answers might be explicit, implicit, or require interpretation from the context. Keep in mind partial relevance is still highly valuable. Use the following criteria:
-1. Answer Precision
-- Determines if the chunk addresses the query intent, even partially.
+Assess the chunk’s alignment with the user query, considering that relevant answers might be explicit, implicit, or require interpretation from the context. Use the following criteria:
+
+1. Direct Answer Precision
+- Determines if the chunk directly addresses the query intent.
 - Evaluates how clear, comprehensive, and unambiguous the response is.
 2. Semantic Matching
 - Examines the degree of alignment between the query’s intent and the chunk’s content.
@@ -278,11 +275,13 @@ Assess the chunk’s alignment with the user query, considering that relevant an
 - Assesses how actionable, precise, and unambiguous the information is, even if the answer is partially hidden within broader context.
 
 Scoring Scale:
-- **1000 (Perfect Match)**: The chunk fully addresses the query, offers clear and comprehensive information, provides multiple confirmatory points, and includes actionable insights.
-- **500-999 (Partial Match)**: The chunk contains at least part of the answer to the query. The score should reflect the degree of alignment, detail, and clarity. **If the chunk contains any relevant part of the answer, it must score at least 500**.
-- **0-499 (Low or No Match)**: The chunk shows limited or no connection to the query. Scores in this range should be reserved for cases where relevance is unclear or missing entirely. Before assigning a score below 500, carefully re-check the chunk for overlooked relevance or implicit connections.
+- 1000 (Perfect Match): Fully answers the query, provides multiple confirmatory points, comprehensive explanation, and actionable information.
+- 1-999 (Partial Match): Varies based on the degree of alignment, detail, and clarity.
+- 0 (No Match): No relevant connection to the query.
+
+**For low scores (<500), re-check the chunk for **overlooked** relevance.**
 
-Please proceed with your analysis and evaluation of the given query and chunk
+Please proceed with your analysis and evaluation of the given query and chunk.
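The revised scoring scale above (0, 1-999, 1000, with a re-check below 500) could be consumed by calling code roughly as follows. This is only a sketch: the helper name `classifyRelevance` and the shape of the returned object are assumptions made for illustration, and nothing in this commit shows how the scores are actually parsed.

```javascript
// Hypothetical helper, not part of the commit shown above.
// Maps a raw relevance score onto the bands of the revised scoring scale.
function classifyRelevance(rawScore) {
  if (typeof rawScore !== "number" || Number.isNaN(rawScore)) {
    throw new TypeError("score must be a number");
  }
  // Clamp to the 0-1000 range and drop any fractional part.
  const score = Math.min(1000, Math.max(0, Math.trunc(rawScore)));

  let band;
  if (score === 1000) band = "perfect";
  else if (score === 0) band = "none";
  else band = "partial";

  // The prompt asks for a re-check of low scores (<500).
  return { score, band, needsRecheck: score < 500 };
}

console.log(classifyRelevance(1000));
console.log(classifyRelevance(320));
console.log(classifyRelevance(0));
```

Clamping rather than rejecting out-of-range values is a design choice of this sketch; a stricter caller might prefer to treat a score such as 1200 as a malformed model response.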
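The conditional sections in the diff guard on `context.query?.named_entities.building!==null`. One thing worth noting about that expression: if `query` is missing, the optional chain evaluates to `undefined`, and `undefined !== null` is true, so the section would still be rendered. The sketch below shows one possible stricter guard. The function name `buildingTask`, the shortened prompt text, and the empty-string fallback are assumptions for illustration; the real template's else branch is not visible in this diff.

```javascript
// Hypothetical stand-in for the template's conditional "building" section.
// The `context` shape mirrors the expressions in the diff; everything else is assumed.
function buildingTask(context) {
  const building = context.query?.named_entities?.building;
  // Loose `!= null` rejects both null and undefined, whereas the strict
  // `!== null` used in the template lets `undefined` through.
  if (building != null) {
    return `Assess whether the chunk refers to a building with a name resembling “${building}”.`;
  }
  return ""; // assumed fallback: omit the section entirely
}

console.log(buildingTask({ query: { named_entities: { building: "Racine" } } }));
console.log(buildingTask({ query: { named_entities: { building: null } } }) === "");
console.log(buildingTask({}) === "");
```

Whether the stricter guard is wanted depends on how upstream code populates `named_entities`; if the field is always present and explicitly `null` when empty, the original check behaves the same way.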