Experimentation and Evaluation

This folder contains the experiments and evaluation of the OpenAI Prompt that are performed using the Azure ML PromptFlow.

This contains the following folders:

Data: Contains the ground truth data that is used for experiments and bulk evaluation.
Evaluation: Contains the evaluation results of the OpenAI Prompt, which contains three Out of the Box evaluations provided by the PromptFlow.
- F1 Score: Standard F1 score calculated in the basis of the ground truth data and the result given by the OpenAI Prompt.
- GPT Similarity: GPT Similarity is a metric that measures the similarity between the ground truth data and the result given by the OpenAI Prompt.
- GPT Relevance: GPT Relevance is a metric that is used when getting ground truth data is difficult. It measures the relevance of the result given by the OpenAI Prompt. For example the OpenAI Prompt is given a news context and and a question related to news, then the score is calculated based on the relevance of the answer given by the OpenAI Prompt and the context given.

Provide feedback