Semantic Search with Language Model

This repository contains a Jupyter notebook that joins broken sentences in a text and retrieves similar sentences using embeddings. It also demonstrates the Cohere API for embeddings and reranking.

Prerequisites

Before using this code, make sure you have the following prerequisites:

Python 3.x
Cohere API Key
Annoy Python library (pip install annoy)
Pandas Python library (pip install pandas)
Numpy Python library (pip install numpy)
Cohere Python library (pip install cohere)

Getting Started

Clone this repository to your local machine: git clone https://github.com/ttng51/LanguageModel.git
Install the required Python libraries listed in the Prerequisites section.
Set the api_key variable to your Cohere API key.
Set the text variable to the text you want to analyze.
Open the notebook (requires Jupyter; quote the filename so the shell does not interpret the &): jupyter notebook "DenseRetrieval&ReRanking.ipynb"

Usage

Sentence Concatenation

The notebook takes a text containing broken sentences and joins them into complete sentences. It checks whether a sentence ends with specific characters (e.g., a digit or a dollar sign) and, if so, combines it with the following sentence.
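The joining rule described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: it assumes the text is split naively on periods, so that a decimal such as $3.50 is cut in two and leaves a fragment ending in a digit. The function body and the sample text are assumptions made for the example.

```python
def concatenate_broken_sentences(text):
    """Split on periods, then rejoin fragments that end in a digit or '$'."""
    sentences = []
    current = ""
    for piece in text.split("."):
        # Restore the period that the naive split removed inside a number.
        current = piece if not current else current + "." + piece
        stripped = current.strip()
        if stripped and (stripped[-1].isdigit() or stripped[-1] == "$"):
            continue  # looks broken: keep accumulating the next fragment
        if stripped:
            sentences.append(stripped)
        current = ""
    if current.strip():
        sentences.append(current.strip())  # trailing fragment with no successor
    return sentences


sample = "The fee is $3.50 per month. It rose 2.5 percent. Done."
for sentence in concatenate_broken_sentences(sample):
    print(sentence)
```

A heuristic like this also merges a sentence that legitimately ends in a number (for example one ending in a year) with the sentence after it, so the output is worth checking on real data.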

Text Retrieval

The notebook performs text retrieval using the Cohere API for embeddings. It builds an Annoy index from the sentence embeddings and lets you search for the sentences most similar to a query.

Call search(query) to run a dense retrieval that returns the sentences most similar to the query.
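To show what a dense retrieval computes, here is a sketch that ranks sentences by cosine similarity to a query vector. It is not the notebook's code: it uses an exact brute-force search in pure Python in place of Annoy's approximate index, and small hand-written vectors in place of Cohere embeddings, so it runs without an API key. The name nearest_sentences and all of the data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_sentences(query_vector, sentences, sentence_vectors, top_k=3):
    """Return the top_k (score, sentence) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), sentence)
        for sentence, vector in zip(sentences, sentence_vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real embeddings (assumed data).
sentences = ["Cats purr when content.", "Stocks fell on Monday.", "Kittens sleep a lot."]
sentence_vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vector = [1.0, 0.0, 0.0]

results = nearest_sentences(query_vector, sentences, sentence_vectors, top_k=2)
for score, sentence in results:
    print(f"{score:.3f}  {sentence}")
```

An index such as Annoy trades a little accuracy for speed by avoiding the comparison against every stored vector, which matters once the collection grows beyond a few thousand sentences.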

Set the query and MODEL_NAME variables, then run the reranking step to reorder the results with the Cohere API. This demonstrates how to use Cohere for text reranking.
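Reranking takes the candidates returned by retrieval and reorders them by a relevance score computed against the query. The sketch below shows only that reordering step, under loud assumptions: a toy word-overlap scorer stands in for Cohere's rerank model, and the names rerank_candidates and word_overlap are hypothetical, not part of the notebook or of the Cohere library.

```python
def word_overlap(query, document):
    """Toy relevance score: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & document_words) / len(query_words)


def rerank_candidates(query, documents, score_fn, top_n=2):
    """Return (original_index, document, score) tuples, best score first."""
    scored = [(score_fn(query, doc), index) for index, doc in enumerate(documents)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(index, documents[index], score) for score, index in scored[:top_n]]


# Candidates in the order a first-stage retrieval might have returned them (assumed data).
candidates = [
    "cats sleep a lot",
    "monthly reports are due",
    "the fee is charged monthly",
]

top = rerank_candidates("monthly fee", candidates, word_overlap, top_n=2)
for index, document, score in top:
    print(index, score, document)
```

Keeping the original index alongside each score makes it easy to map a reranked result back to the sentence it came from in the retrieval step.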

Code Explanation

concatenate_broken_sentences: A function that concatenates broken sentences in a text.

co.embed: Uses the Cohere API to obtain embeddings for the text.

AnnoyIndex: Creates an Annoy index for text retrieval based on embeddings.

search(query): Searches the Annoy index for the sentences most similar to a query.

co.rerank: Reranks the retrieved texts by relevance to the query using the Cohere API.

License

This project is licensed under the MIT License - see the LICENSE file for details.
