This folder contains the experiments and evaluation of the OpenAI Prompt that are performed using the Azure ML PromptFlow.
This contains the following folders:
- Data: Contains the ground truth data that is used for experiments and bulk evaluation.
- Evaluation: Contains the evaluation results of the OpenAI Prompt, which contains three Out of the Box evaluations provided by the PromptFlow.
F1 Score
: Standard F1 score calculated in the basis of the ground truth data and the result given by the OpenAI Prompt.GPT Similarity
: GPT Similarity is a metric that measures the similarity between the ground truth data and the result given by the OpenAI Prompt.GPT Relevance
: GPT Relevance is a metric that is used when getting ground truth data is difficult. It measures the relevance of the result given by the OpenAI Prompt. For example the OpenAI Prompt is given a news context and and a question related to news, then the score is calculated based on the relevance of the answer given by the OpenAI Prompt and the context given.