diff --git a/ling508/demos/demo_clex_importer.html b/ling508/demos/demo_clex_importer.html new file mode 100644 index 0000000..5f41a47 --- /dev/null +++ b/ling508/demos/demo_clex_importer.html @@ -0,0 +1,9139 @@ + + + + + +demo_clex_importer + + + + + + + + + + + + +
+ + diff --git a/ling508/demos/demo_clex_importer.ipynb b/ling508/demos/demo_clex_importer.ipynb index 234b9c5..7e65b7d 100644 --- a/ling508/demos/demo_clex_importer.ipynb +++ b/ling508/demos/demo_clex_importer.ipynb @@ -2,16 +2,24 @@ "cells": [ { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "# Demonstration of STIX-D's Clex Importer Tool" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, "source": [ - "## Introduction\n", + "**Speaker Notes for the Title Slide:**\n", "\n", "Welcome to this demonstration of the Clex Importer tool. The Clex Importer is a utility designed to populate the `lexicon` table in the STIX-D Corpus Database with entries from the Attempto Controlled English (ACE) common lexicon. ACE is a controlled natural language, enabling precise language processing for applications that require unambiguous interpretation by both humans and machines.\n", "\n", @@ -20,7 +28,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Agenda\n", "\n", @@ -39,24 +51,81 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, "source": [ - "## 1. 
Use \n", + "**Speaker Notes for the Agenda Slide:**\n", + "\n", + "- **Introduction**: This slide outlines the key points we'll cover in today's demonstration.\n", + " \n", + "- **Use Case**: We'll start by discussing the specific problem this tool addresses and the context in which it operates.\n", + "\n", + "- **Project Design**: Next, we'll dive into the overall architecture and design principles that guided the development of the Clex Importer tool.\n", + "\n", + "- **Code Interaction with the Database**: We'll explore how the tool interacts with the database to manage lexicon entries, focusing on the service and abstraction layers.\n", + "\n", + "- **Test Cases**: We'll review the comprehensive testing strategy, including unit tests, integration tests, and end-to-end tests, to ensure the tool's reliability.\n", + "\n", + "- **Code Execution**: Finally, we'll demonstrate how to run the tool, both via the command line and through a web interface, showcasing its functionality in different environments." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Use Case\n", + "\n", + "TODO: Add Bulleted List or Diagram" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "source": [ + "**Speaker Notes for the Use Case Slide:**\n", "\n", "The STIX-D Use Case L1 involves seeding the `stixd_corpus.lexicon` database table with lexical entries from the ACE Common Lexicon (Clex) or similar files. An administrator provides a URI to the lexicon file, and the system connects to the local database via the `mysql_repository.py` module. For each line in the lexicon file, the system extracts relevant character strings to create a word tag and form, generates a SHA256 hash of these components, and checks for the hash in the `lexicon` table. If the hash exists, it links the existing entry with the source ID; if not, it creates a new entry. 
The system also imports additional arguments into appropriate fields and outputs summary information or error messages as necessary.\n" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ - "## 2. Project Design\n", + "## Project Design\n", "\n", - "### Project Overview\n", + "TODO: Add Bulleted List or Diagram\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "source": [ + "**Speaker Notes for the Project Design Slide:**\n", + "\n", + "***Project Overview***\n", "\n", "The Clex Importer tool imports lexical entries from the Attempto Controlled English (ACE) lexicon file, stored as Prolog facts, into the `lexicon` table of the STIX-D MySQL database. This tool is accessible via the command line or a web form served by a Flask API, where users input a URL pointing to an ACE lexicon file. The system then parses each Prolog fact, maps it to the appropriate attributes in the `lexicon` table, and creates relevant entries in the `stix_objects` table (i.e., source documents) and the `obj_lex_jt` junction table.\n", "\n", - "### OOP Principles in the Project\n", + "***OOP Principles in the Project***\n", "\n", "This project employs object-oriented programming (OOP) principles to create a modular, extensible, and maintainable system. Key OOP principles include:\n", "\n", @@ -65,7 +134,7 @@ "- **Inheritance**: The project uses inheritance to create a hierarchy of classes with shared behavior. For example, `MySQLRepository` inherits from `Repository` to reuse common database interaction methods.\n", "- **Polymorphism**: The project uses polymorphism to allow different classes to be used interchangeably. 
For example, the `Repository` interface allows different types of repositories to be used with the `ClexImporter`.\n", "\n", - "### Key Modules and Their OOP Design\n", + "***Key Modules and Their OOP Design***\n", "\n", "The project consists of the following key modules, each designed using OOP principles:\n", "\n", @@ -98,21 +167,38 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ - "## 3. Code Interaction with the Database\n", + "## Interaction with the Database\n", "\n", - "### Database Interaction via `clex_importer.py` and the `MySQLRepository` Class\n", + "TODO: Add Bulleted List or Diagram\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "source": [ + "**Speaker Notes for the Data Interaction Slide:**\n", + "\n", + "***Database Interaction via `clex_importer.py` and the `MySQLRepository` Class***\n", "\n", "The `clex_importer.py` module interacts with the STIX-D Corpus Database through the `MySQLRepository` class, which abstracts the complexities of SQL operations and provides a streamlined interface for database tasks. This interaction ensures that lexicon entries are accurately imported and managed within the database, leveraging both services and abstraction layers.\n", "\n", - "#### A. **The `ClexImporter` Class as a Service Layer:**\n", + "A. **The `ClexImporter` Class as a Service Layer:**\n", " - **Purpose**: The `ClexImporter` class is responsible for reading and processing the ACE Common Lexicon (Clex) file and inserting or updating entries in the database.\n", " - **Workflow**:\n", " - The process begins by reading the Clex file and parsing each entry to extract relevant data.\n", " - `ClexImporter` interacts with the `MySQLRepository` to insert new records or update existing ones based on the parsed data.\n", "\n", - "#### B. **The `MySQLRepository` Class as an Abstraction Layer:**\n", + "B. 
**The `MySQLRepository` Class as an Abstraction Layer:**\n", " - **Purpose**: The `MySQLRepository` class abstracts the SQL operations needed to interact with the MySQL database, providing a clean interface for essential tasks like inserting records, querying data, and linking entries across tables.\n", " - **Services Provided**:\n", " - **Inserting STIX Objects**: The `save_stix_object` method inserts metadata about the Clex import into the `stix_objects` table, helping track the provenance and context of the lexicon entries.\n", @@ -120,12 +206,12 @@ " - **Linking Entries**: The `link_entry_with_stix` method associates lexicon entries with STIX objects by inserting records into the `obj_lex_jt` junction table, maintaining relationships between different data entities.\n", " - **Checking for Existing Entries**: The `find_entry_by_id` method checks the `lexicon` table to determine if an entry (identified by a unique hash) already exists, helping to avoid duplicates and maintain data integrity.\n", "\n", - "#### C. **Abstract Layers and Their Benefits:**\n", + "C. **Abstract Layers and Their Benefits:**\n", " - **Separation of Concerns**: By utilizing the `MySQLRepository` class, the `clex_importer.py` module does not directly handle SQL queries or database connections. Instead, it relies on high-level methods provided by the repository, allowing for easier modification or extension of database interactions without altering the core business logic.\n", " - **Reusability and Maintainability**: The abstraction provided by `MySQLRepository` facilitates the reuse of database interaction methods across different parts of the application, reducing code duplication and enhancing maintainability.\n", " - **Error Handling**: The repository class encapsulates error handling for database operations. 
If a SQL operation fails, the class handles exceptions gracefully, enabling `ClexImporter` to focus on the overall import process rather than the intricacies of database management.\n", "\n", - "#### D. **Example Workflow:**\n", + "D. **Example Workflow:**\n", " - **Inserting a New Lexicon Entry**:\n", " - `ClexImporter` processes a line from the Clex file, generating a unique hash for the entry.\n", " - It uses `find_entry_by_id` to check if the entry already exists in the `lexicon` table.\n", @@ -137,31 +223,28 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ - "## 4. Test Cases\n", - "\n", - "This section provides an overview of the testing strategy employed in the project, which includes unit, integration, and end-to-end tests. These tests ensure the reliability and correctness of the code by validating individual components, interactions between modules, and the complete system workflow from front-end to back-end.\n", + "## Test Cases\n", "\n", - "### Setup Notebook Environment\n", - "\n", - "Before running the tests, ensure that the necessary packages are installed and the required modules are imported. If running this notebook for the first time, uncomment and execute the provided code cell to install the necessary dependencies.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# %pip install -r ../demos/requirements.txt" + "This section provides an overview of the testing strategy employed in the project, which includes unit, integration, and end-to-end tests. 
These tests ensure the reliability and correctness of the code by validating individual components, interactions between modules, and the complete system workflow from front-end to back-end.\n" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ - "### Import Necessary Libraries & Set Global Variables" + "### Setup Notebook Environment\n", + "\n", + "Before running the tests, ensure that the necessary packages are installed and the required modules are imported. If running this notebook for the first time, uncomment and execute the provided code cell to install the necessary dependencies.\n" ] }, { @@ -170,6 +253,8 @@ "metadata": {}, "outputs": [], "source": [ + "# %pip install -r ../demos/requirements.txt\n", + "\n", "# Import Standard Libraries\n", "import os, sys, pytest\n", "# from IPython.display import IFrame, display\n", @@ -181,30 +266,39 @@ "# Append the stixd directory to the Python path\n", "sys.path.append(stixd_path)\n", "\n", - "# Load Jupyter Notebook extensions\n", - "%load_ext sql\n", - "\n", "# Define Global Variables\n", "TEST_DIR = os.path.join(os.getcwd(), '../tests')\n", "VERBOSITY = '-q' # Quiet\n", - "TRACEBACK = '--tb=line' # One line" + "TRACEBACK = '--tb=line' # One line\n", + "\n", + "# Load Jupyter Notebook SQL extensions\n", + "%load_ext sql\n", + "# Connect to the database\n", + "%sql mysql+mysqlconnector://your_username:your_password@localhost:3306/stixd_corpus" ] }, { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ - "# Connect to the database\n", - "%sql mysql+mysqlconnector://your_username:your_password@localhost:3306/stixd_corpus" + "### All Test Cases\n", + "\n", + "TODO: Add Bulleted List or Diagram\n" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "notes" + } 
+ }, "source": [ - "### All Test Cases\n", + "**Speaker Notes for the All Test Cases Slide:**\n", "\n", "You can execute all the tests in the STIX-D project using the command below. This command will run every test case in the test directory, providing a comprehensive check of the entire system in just 30-60 seconds. This is an efficient way to ensure all components of the project function as expected, especially after significant code changes.\n", "\n", @@ -242,14 +336,22 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "### Unit Tests" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 1: `gen_clex_uuid`\n", "\n", @@ -298,7 +400,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 2: `mysql_repo`\n", "\n", @@ -347,7 +453,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 3: `lexicon_manager`\n", "\n", @@ -397,7 +507,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 4: `clex_importer_local`\n", "\n", @@ -448,7 +562,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 5: `clex_importer_ci`\n", "\n", @@ -496,7 +614,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "### Integration Tests\n", "\n", @@ -557,7 +679,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "### End-to-End Tests\n", "\n", @@ -566,7 +692,11 @@ }, { "cell_type": 
"markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 7: `e2e_local`\n", "\n", @@ -616,7 +746,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "#### Test Case 8: `e2e_ci`\n", "\n", @@ -673,12 +807,26 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## 5. Code \n", "\n", "In this section, we will explore how to execute the Clex Importer tool using both the command line interface (within this notebook) and a web interface (externally). The CLI provides direct control over the tool, while the web interface offers a more accessible way to import lexicon entries via a form. Before proceeding, we need to reset the database to ensure a clean state.\n", - "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ "### Reset the Database\n", "\n", " When running the code cell below to reset the database, you may encounter `Error: 1064 (42000)`. This error occurs because the MySQL `DELIMITER` command is not recognized by the `mysql.connector` library used in Python. Despite this error, the SQL script executes as intended. 
The error can be safely ignored.\n" @@ -707,7 +855,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "### Show Reset Database \n", "\n", @@ -750,7 +902,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "### Execution from Command Line\n", "\n", @@ -1028,7 +1184,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Show Database State After Code Execution\n", "\n", @@ -1077,7 +1237,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, "source": [ "We will also display a sample of the first five entries in each table to demonstrate the successful importation of Clex entries." ] @@ -1335,7 +1499,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Executing the Clex Importer via Web Form\n", "\n", @@ -1355,7 +1523,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Conclusion\n", "\n", diff --git a/ling508/demos/demo_clex_importer.pdf b/ling508/demos/demo_clex_importer.pdf new file mode 100644 index 0000000..1734dc2 Binary files /dev/null and b/ling508/demos/demo_clex_importer.pdf differ diff --git a/ling508/demos/demo_stixd_full.ipynb b/ling508/demos/drafts/demo_stixd_full.ipynb similarity index 100% rename from ling508/demos/demo_stixd_full.ipynb rename to ling508/demos/drafts/demo_stixd_full.ipynb diff --git a/ling508/demos/requirements.txt b/ling508/demos/requirements.txt index a01f97e..973ea78 100644 --- a/ling508/demos/requirements.txt +++ b/ling508/demos/requirements.txt @@ -8,4 +8,5 @@ nbconvert>=7.0.0 # For converting notebooks to HTML ipywidgets>=8.0.0 # For interactive widgets 
notebook>=7.0.0 # The Jupyter Notebook package nbformat>=5.0.0 # To manipulate notebook files +RISE>=5.7.0 # For creating interactive slideshows voila>=0.4.0 # (Optional) For rendering interactive notebooks as web apps