AI-Powered Software License Identification
This project uses large language models (LLMs) to identify and extract the chunks of text that pertain to software licenses. It is aimed at developers and legal teams who need to locate licensing information quickly in large bodies of text.
To run this project, you need to install the required dependencies. The project uses Python 3.8+.
git clone https://github.com/Hero2323/GSoC-24.git
cd GSoC-24
You can install the required packages using the requirements.txt file:
pip install -r requirements.txt
The project relies on the following libraries:
- loguru==0.7.2
- openai==1.31.1
- pandas==2.0.3
- tenacity==8.3.0
- ratelimit==2.2.1
- Nirjas==1.0.1
- fuzzywuzzy[speedup]>=0.18.0
- langchain_groq
- tiktoken
Ensure these are installed correctly before proceeding.
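One way to confirm that the pinned packages are installed is to compare the installed versions against the list above. The sketch below uses only the standard library's importlib.metadata. The PINNED dictionary and the check_dependencies helper are illustrative names and are not part of the project. The unpinned packages (fuzzywuzzy, langchain_groq, tiktoken) are left out because they have no exact version to compare.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the dependency list above.
# PINNED and check_dependencies are hypothetical names used for illustration.
PINNED = {
    "loguru": "0.7.2",
    "openai": "1.31.1",
    "pandas": "2.0.3",
    "tenacity": "8.3.0",
    "ratelimit": "2.2.1",
    "Nirjas": "1.0.1",
}


def check_dependencies(pinned):
    """Return a dict mapping each problematic package to a description."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != expected:
            problems[name] = f"found {installed}, expected {expected}"
    return problems


if __name__ == "__main__":
    issues = check_dependencies(PINNED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All pinned dependencies match.")
```

An empty result means every pinned package is present at the expected version.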
To run the models, you need to set up API keys for the relevant services. The project primarily uses the OpenAI API for its LLM capabilities.
- Visit your chosen provider (Groq, TogetherAI, etc.) and sign up for an account if you don't have one.
You can add your API key to your environment variables to keep it secure:
For Linux/MacOS:
export OPENAI_API_KEY='your-api-key-here'
For Windows (Command Prompt; do not quote the value, because the quotes would become part of the key):
set OPENAI_API_KEY=your-api-key-here
Alternatively, you can create a .env file in the project root directory and add your API key there:
OPENAI_API_KEY=your-api-key-here
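To illustrate how a key set by either method can be picked up at runtime, here is a minimal sketch that checks the environment first and then falls back to a .env file. It is an assumption about how one might read the key, not the project's own loading code. The load_env_file and get_api_key helpers are hypothetical, and the parser handles only simple KEY=value lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines from a file and return them as a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(name="OPENAI_API_KEY", env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get(name) or load_env_file(env_path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in the environment or in {env_path}")
    return key
```

Calling get_api_key() before creating a client fails early with a clear message if the key is missing, rather than failing later inside an API request.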
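Once the key is set, you can query a model through the LLMClient helper:
from helpers.llm_client import LLMClient
from helpers.models import *

# Create a client and send a single prompt to the Gemma 2 9B model.
client = LLMClient()
client._infer(model=Models.GEMMA_2_9b, prompt='Hey, How are you?', temperature=0.1)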
More details can be found in the project-showcase notebook.