Add initial version of sales insights workshop
Rashed Talukder committed Sep 18, 2024
1 parent ee8c25a commit 4da72af
Showing 30 changed files with 765 additions and 0 deletions.
137 changes: 137 additions & 0 deletions sales_insights/README.md
@@ -0,0 +1,137 @@
# Semantic Kernel Workshop: Natural Language (NL) to SQL Query Generation using Azure OpenAI (GPT-4 model)

## Customer use case

Contoso would like to build an AI assistant that its non-technical business users can use to get information about product sales data residing in a SQL database through natural language interaction. For the purpose of this workshop, we will use an Azure SQL database hosting sample product sales data, which we will query in natural language using SK and the power of LLMs.

In this workshop we demonstrate how to use [Semantic Kernel](https://github.com/microsoft/semantic-kernel) to convert natural language (NL) to SQL queries using Azure OpenAI (GPT-4 model).

Semantic Kernel is an exciting framework and a powerful tool that can be used for several applications, including chatbots, virtual assistants, and more.

This is a great way to make your data more accessible to non-technical users, and to make your applications more user-friendly.

Below are the main components of the Semantic Kernel:

![Orchestrating plugins with planner](./images/sk-kernel.png)

In this repo's example, we developed the following plugin:

- **nlpToSqlPlugin**: This plugin is responsible for converting natural language (NL) into SQL queries using Azure OpenAI (GPT-4 model).

As part of the plugin, we developed skills through the use of [prompts](https://learn.microsoft.com/en-us/semantic-kernel/prompts/). The following skills were developed:

- **ConvertNLPToSQL**: This skill is responsible for converting natural language (NL) into a SQL query using Azure OpenAI (GPT-4 model).
- **MakeSQLCompatible**: This skill is responsible for making the generated SQL query compatible with Transact-SQL syntax.
- **WriteResponse**: This skill is responsible for writing the final response to the user.

We also developed a [Native Function](https://learn.microsoft.com/en-us/semantic-kernel/agents/plugins/using-the-kernelfunction-decorator?tabs=python) so the system can interact with the database:

- **QueryDb**: This function is responsible for querying the database and returning the result (see the directory layout below).
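These skills and the native function follow Semantic Kernel's standard plugin directory layout. The tree below reflects the files added in this commit (the parenthetical notes are annotations, not files):

```
plugins/
├── nlpToSqlPlugin/
│   ├── ConvertNLPToSQL/
│   │   ├── config.json     (execution settings and input parameters)
│   │   └── skprompt.txt    (prompt template)
│   ├── MakeSQLCompatible/
│   │   ├── config.json
│   │   └── skprompt.txt
│   └── WriteResponse/
│       ├── config.json
│       └── skprompt.txt
└── QueryDb/
    └── queryDb.py          (native function)
```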

With that, we can create a "Copilot-like" experience, where the user asks questions and the system generates the SQL query and returns the result.

As our plugin has several skills, we also developed a [Sequential Planner](https://learn.microsoft.com/en-us/semantic-kernel/agents/plugins/using-the-kernelfunction-decorator?tabs=python) to orchestrate them. With the planner, we can run the skills in sequence, so the system can generate the SQL query and return the result.

The final result is a system that converts natural language (NL) into SQL queries using Azure OpenAI (GPT-4 model) and answers with the query results.
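As a preview, here is a minimal sketch of that flow. It reuses only calls that appear in this repo's main.py (shown in full later in this commit); note that main.py additionally registers the native QueryDbPlugin so the plan can actually execute the query against the database.

```python
import asyncio

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.planning import SequentialPlanner


async def ask(nlp_input: str) -> None:
    kernel = sk.Kernel()

    # Read the Azure OpenAI deployment, key, and endpoint from .env
    deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env()
    kernel.add_text_completion_service(
        "azure_text_completion",
        AzureChatCompletion(deployment_name=deployment, endpoint=endpoint, api_key=api_key),
    )

    # Load the prompt-based skills, then let the planner chain them in sequence
    kernel.import_semantic_plugin_from_directory("plugins", "nlpToSqlPlugin")
    planner = SequentialPlanner(kernel)
    plan = await planner.create_plan(goal=f"Create a SQL query for: {nlp_input}")
    print(await plan.invoke(kernel=kernel))


asyncio.run(ask("How many transactions were there in the last 3 months?"))
```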

## Requirements

- You must have a Pay-As-You-Go Azure account with administrator- or contributor-level access to your subscription. If you don't have an account, you can sign up for one following the instructions.
- Get Access to [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/overview)
  - Once approved, create an Azure OpenAI resource in your Azure subscription.
- Python 3.11
- You must have an Azure SQL Database with the tables and data you want to query. In this repo, we will use a sample database with some tables.
  - You can use the [generate-sample-sql-data](sql-data/generate-sample-sql-data.py) script to create and populate the tables with some sample data.
- Make sure your SQL server supports both SQL authentication and Microsoft Entra authentication for the purpose of this demo. You can disable the Entra-only option under Settings --> Microsoft Entra ID by unchecking the "Support only Microsoft Entra authentication for this server" option (a CLI alternative is sketched below).
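If you prefer the CLI to the portal for that last step, the Azure CLI exposes the same toggle. This is a sketch with placeholder resource names; it assumes you are already logged in with `az login`:

```
az sql server ad-only-auth disable --resource-group <your-resource-group> --name <your-sql-server>
```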

## Install Required Libraries (Python Dependencies)
Install the required libraries as listed in requirements.txt. These libraries provide accelerated development by reusing existing code.

1. With the repository open in VSCode, and from within the terminal/command prompt window in VSCode, navigate to **your project** directory.
2. Making sure that your conda environment is active, enter the following to read the contents of requirements.txt and install the packages using pip, the package management tool for Python:
```
pip install -r requirements.txt
```


The following pinned dependency versions are listed in requirements.txt:

```
semantic-kernel==0.5.1.dev0
python-dotenv==1.0.0
openai==1.12.0
Faker==23.2.1
pyodbc==5.1.0
```

## Create .env file

```
CONNECTION_STRING=
AZURE_OPENAI_DEPLOYMENT_NAME=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
```

*Make sure that the `CONNECTION_STRING` you pick is the one for the ODBC connection. It should start with `Driver={ODBC Driver 18 for SQL Server};`.*
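For reference, an ODBC connection string for Azure SQL typically looks like the following (the server, database, and credentials here are placeholders to replace with your own):

```
Driver={ODBC Driver 18 for SQL Server};Server=tcp:<your-server>.database.windows.net,1433;Database=<your-database>;Uid=<your-user>;Pwd=<your-password>;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;
```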

## Quick Start

- Run the `sql-data/generate-sample-sql-data.py` script to create and populate the tables with some sample data (example commands below).
- Run `main.py` to run the sample asks/questions detailed below.
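For example, from the `sales_insights` directory:

```
python sql-data/generate-sample-sql-data.py
python main.py
```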

## Sample Questions/Asks

Below are some sample questions/asks that can be put to the system and the responses it will generate.
These responses can differ based on the data in your database. If you used the sample-data generation script, you may see different responses given the random nature of the data.

**Question/Ask 01**: I want to know how many transactions in the last 3 months

*Response*: According to the database query, the number of transactions is 26 (the actual number will vary for your run).

---

**Question/Ask 02**: Give me the name of the best seller in terms of sales volume in the whole period

*Response*: The seller's name according to the database query is John Doe.

---

**Question/Ask 03**: Which product has the highest sales volume in the last month

*Response*: According to the database query, the total sales volume for the product 'Nike Air Force 1' is 28.

---

## Adapt to your own data

Feel free to adapt the code to your own data and modify it to fit your needs.

- Replace the connection string in the `.env` file with your own connection string.
- Replace the Azure OpenAI API key and endpoint in the `.env` file with your own API key and endpoint.
- Replace the table definitions in the [ConvertNLPToSQL](plugins/nlpToSqlPlugin/ConvertNLPToSQL/skprompt.txt) plugin with your own table structures.


## Bonus Exercise: Create a Prompt Flow to Run and Evaluate the Semantic Kernel Planner
Once the setup is complete, you can conveniently convert your existing Semantic Kernel planner to a prompt flow by following the steps below:

- Create a folder named "promptflow" under the current directory (you can choose a different name).
- Right-click the folder, select "New flow in this directory", and create a blank or standard flow.
- Select the + Python icon to create a new Python node.
- Name it "rag-on-sql-sk-planner" (or a planner name of your choice).
- Use sk_rag_on_sql_planner.py as the code file.
- Copy the "plugins" directory from the project root into this folder so the plugins can be used as a reference.
- Create a custom connection and name it "custom_connection".
- Add "AZURE_OPENAI_API_BASE", "AZURE_OPENAI_API_KEY", and "SQL_CONNECTION_STRING" to the custom connection and save it.
- Define the input and output of the planner node.
- Set the flow input and output (refer to flow.dag.yaml).
- Test the flow with a single test run using some default input; a sketch of the Python node follows below.
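For orientation, here is a hedged sketch of what the Python node could look like. The function body and return value are illustrative assumptions (the real code lives in sk_rag_on_sql_planner.py); the `@tool` decorator, the `CustomConnection` import, and attribute-style access to the connection keys are standard prompt flow usage.

```python
from promptflow import tool
from promptflow.connections import CustomConnection


@tool
def rag_on_sql_sk_planner(question: str, connection: CustomConnection) -> str:
    # The custom connection exposes the keys saved in the steps above.
    api_base = connection.AZURE_OPENAI_API_BASE
    api_key = connection.AZURE_OPENAI_API_KEY
    sql_connection_string = connection.SQL_CONNECTION_STRING

    # Here the node would build the kernel, import the copied "plugins"
    # directory, create the sequential planner, and invoke the plan
    # (the same flow as main.py), then return the final answer.
    return f"(planner answer for: {question})"
```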



## References

- <https://learn.microsoft.com/en-us/semantic-kernel/overview/>
- <https://techcommunity.microsoft.com/t5/analytics-on-azure-blog/revolutionizing-sql-queries-with-azure-open-ai-and-semantic/ba-p/3913513>
- <https://github.com/microsoft/semantic-kernel>
- <https://medium.com/@ranjithkumar.panjabikesanind/orchestrate-ai-and-achieve-goals-combining-semantic-kernel-sequential-planner-openai-chatgpt-d23cf5c8f98d>
Binary file added sales_insights/images/sk-kernel.png
Binary file added sales_insights/images/the-planner.png
64 changes: 64 additions & 0 deletions sales_insights/main.py
@@ -0,0 +1,64 @@
import asyncio
import os

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.planning import SequentialPlanner
from dotenv import load_dotenv

from plugins.QueryDb import queryDb as plugin

# Take environment variables from .env.
load_dotenv()


async def main(nlp_input):
    kernel = sk.Kernel()

    # Get AOAI settings from .env
    deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env()

    # Set the deployment name to the value of your chat model
    azure_text_service = AzureChatCompletion(deployment_name=deployment, endpoint=endpoint, api_key=api_key)
    kernel.add_text_completion_service("azure_text_completion", azure_text_service)

    # Import the NLP-to-SQL plugin (prompt-based skills) and the native database plugin
    plugins_directory = "plugins"
    kernel.import_semantic_plugin_from_directory(plugins_directory, "nlpToSqlPlugin")
    kernel.import_plugin(plugin.QueryDbPlugin(os.getenv("CONNECTION_STRING")), plugin_name="QueryDbPlugin")

    # Create an instance of the sequential planner
    planner = SequentialPlanner(kernel)

    # Create a plan from the NLP input (the ask for which the sequential planner finds relevant functions)
    ask = f"Create a SQL query according to the following request: {nlp_input} and query the database to get the result."

    # Ask the sequential planner to identify suitable functions from the list of functions available
    plan = await planner.create_plan(goal=ask)

    # Invoke the plan and get the result
    result = await plan.invoke(kernel=kernel)

    print('\n')
    print(f'User ASK: {nlp_input}')
    print(f'Response: {result}')
    print('\n')

    # Print each step of the plan and its result
    for index, step in enumerate(plan._steps):
        print("Step:", index)
        print("Description:", step.description)
        print("Function:", step.plugin_name + "." + step._function.name)
        if len(step._outputs) > 0:
            print("  Output:\n", str.replace(result[step._outputs[0]], "\n", "\n  "))
        print("\n\n")


# Run the main function
if __name__ == "__main__":
    # asyncio.run(main("I want to know how many transactions in the last 3 months"))
    asyncio.run(main("Give me the name of the best seller in terms of sales volume in the whole period"))
    # asyncio.run(main("Which product has the highest sales volume in the last month"))
Empty file.
53 changes: 53 additions & 0 deletions sales_insights/plugins/QueryDb/queryDb.py
@@ -0,0 +1,53 @@
from semantic_kernel.plugin_definition import kernel_function, kernel_function_context_parameter
from semantic_kernel import KernelContext

import pyodbc


class QueryDbPlugin:
    """
    Description: Get the result of a SQL query
    """
    def __init__(self, connection_string: str) -> None:
        self._connection_string = connection_string

    @staticmethod
    def __clean_sql_query__(sql_query):
        # Strip semicolons and newlines so pyodbc receives a single clean statement
        sql_query = sql_query.replace(";", "")
        sql_query = sql_query.replace("\n", " ")

        return sql_query

    @kernel_function(name="query_db",
                     description="Query a database using a SQL query")
    @kernel_function_context_parameter(name="input",
                                       description="SQL Query to be executed")
    def query_db(self, context: KernelContext) -> str:
        # Connect to the SQL Server database
        conn = pyodbc.connect(self._connection_string)

        # Create a cursor object to execute SQL queries
        cursor = conn.cursor()

        try:
            cursor.execute(self.__clean_sql_query__(context["input"]))

            # Get the column names from cursor.description
            columns = [column[0] for column in cursor.description]

            # Fetch all rows and build one dictionary per row, keyed by column name
            results = [dict(zip(columns, row)) for row in cursor.fetchall()]

            context["result"] = results
        except Exception as e:
            return f"Error: {e}"
        finally:
            # Close the cursor and connection whether or not the query succeeded
            cursor.close()
            conn.close()

        return str(results)
22 changes: 22 additions & 0 deletions sales_insights/plugins/nlpToSqlPlugin/ConvertNLPToSQL/config.json
@@ -0,0 +1,22 @@
{
"schema": 1,
"description": "Write SQL queries given a Natural Language description",
"execution_settings": {
"default": {
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 0.0,
"presence_penalty": 0.0,
"frequency_penalty": 0.0
}
},
"input": {
"parameters": [
{
"name": "input",
"description": "Define the Natural Language input text",
"defaultValue": ""
}
]
}
}
16 changes: 16 additions & 0 deletions sales_insights/plugins/nlpToSqlPlugin/ConvertNLPToSQL/skprompt.txt
@@ -0,0 +1,16 @@
You are an expert at writing SQL queries through a given Natural Language description of the OBJECTIVE.
---
{{$input}}
---

You will generate a SQL SELECT query that is compatible with Transact-SQL and achieves the given OBJECTIVE.
You use only the tables and views described in the following SCHEMA:

CREATE TABLE products (product_id INT PRIMARY KEY, product_name VARCHAR(100), product_description TEXT, product_price DECIMAL(10, 2), product_category VARCHAR(50), in_stock BIT);

CREATE TABLE sellers (seller_id INT PRIMARY KEY, seller_name VARCHAR(100), seller_email VARCHAR(100), seller_contact_number VARCHAR(15), seller_address TEXT);

CREATE TABLE sales_transaction (transaction_id INT PRIMARY KEY, product_id INT, seller_id INT, quantity INT, transaction_date DATE, FOREIGN KEY (product_id) REFERENCES products(product_id), FOREIGN KEY (seller_id) REFERENCES sellers(seller_id));

The output must be a SQL SELECT query that achieves the OBJECTIVE.
Use Transact-SQL syntax to write the query compatible with Microsoft SQL Server and Azure SQL Database.
22 changes: 22 additions & 0 deletions sales_insights/plugins/nlpToSqlPlugin/MakeSQLCompatible/config.json
@@ -0,0 +1,22 @@
{
"schema": 1,
"description": "Convert SQL queries in any ANSI SQL dialect to a Transact-SQL dialect",
"execution_settings": {
"default": {
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 0.0,
"presence_penalty": 0.0,
"frequency_penalty": 0.0
}
},
"input": {
"parameters": [
{
"name": "input",
"description": "Define the Natural Language input text",
"defaultValue": ""
}
]
}
}
12 changes: 12 additions & 0 deletions sales_insights/plugins/nlpToSqlPlugin/MakeSQLCompatible/skprompt.txt
@@ -0,0 +1,12 @@
You are an expert at converting SQL queries in any ANSI SQL dialect to a Transact-SQL dialect.
---
{{$input}}
---
You will be presented with a SQL query in any ANSI SQL dialect and you will need to convert it to Transact-SQL dialect.
You can convert SQL queries from any dialect to Transact-SQL dialect, for example, from MySQL to SQL Server, from Oracle to SQL Server, from PostgreSQL to SQL Server, etc.
You always need to convert the SQL query to the latest version of the Transact-SQL dialect compatible with Microsoft SQL Server and Azure SQL Database.
If the given SQL query is already in Transact-SQL dialect, you only return the same query.
---
Use the following format to return the SQL query in Transact-SQL dialect:
T-SQL: SELECT * FROM table_name;
T-SQL:
22 changes: 22 additions & 0 deletions sales_insights/plugins/nlpToSqlPlugin/WriteResponse/config.json
@@ -0,0 +1,22 @@
{
"schema": 1,
"description": "Write a friendly response given a database query result",
"execution_settings": {
"default": {
"max_tokens": 4000,
"temperature": 0.0,
"top_p": 0.0,
"presence_penalty": 0.0,
"frequency_penalty": 0.0
}
},
"input": {
"parameters": [
{
"name": "input",
"description": "Database query result",
"defaultValue": ""
}
]
}
}
12 changes: 12 additions & 0 deletions sales_insights/plugins/nlpToSqlPlugin/WriteResponse/skprompt.txt
@@ -0,0 +1,12 @@
The user has provided a Natural Language description of the OBJECTIVE.
---
{{$input}}
---
Your goal is to create a response to the end user based on the OBJECTIVE.
The response should be formulated based on the information returned from the database and the original user input.

Ex:
Response: [{'NumberOfTransactions': 30}]
Message -> According to the database query, the number of transactions is 30.
5 changes: 5 additions & 0 deletions sales_insights/promptflow/rag-on-sql-sk-planner/.gitignore
@@ -0,0 +1,5 @@
.env
__pycache__/
.promptflow/*
!.promptflow/flow.tools.json
.runs/