docs: add comments and badges #35

Merged: 4 commits, Apr 8, 2024
7 changes: 4 additions & 3 deletions README.md
@@ -6,11 +6,12 @@
# Serverless ChatGPT with RAG using LangChain.js

[![Open project in GitHub Codespaces](https://img.shields.io/badge/Codespaces-Open-blue?style=flat-square&logo=github)](https://codespaces.new/Azure-Samples/serverless-chat-langchainjs?hide_repo_select=true&ref=main)
![Node version](https://img.shields.io/badge/Node.js->=20-grass?style=flat-square)
[![Build Status](https://img.shields.io/github/actions/workflow/status/Azure-Samples/serverless-chat-langchainjs/build-test.yaml?style=flat-square&label=Build)](https://github.com/Azure-Samples/serverless-chat-langchainjs/actions)
![Node version](https://img.shields.io/badge/Node.js->=20-3c873a?style=flat-square)
[![Ollama + Mistral](https://img.shields.io/badge/Ollama-Mistral-ff7000?style=flat-square)](https://ollama.com/library/mistral)
[![TypeScript](https://img.shields.io/badge/TypeScript-blue?style=flat-square&logo=typescript&logoColor=white)](https://www.typescriptlang.org)
[![License](https://img.shields.io/badge/License-MIT-orange?style=flat-square)](LICENSE)
[![License](https://img.shields.io/badge/License-MIT-yellow?style=flat-square)](LICENSE)

<!-- [![Build Status](https://img.shields.io/github/actions/workflow/status/Azure-Samples/serverless-chat-langchainjs/build?style=flat-square)](https://github.com/Azure-Samples/serverless-chat-langchainjs/actions) -->
<!-- [![Watch how to use this sample on YouTube](https://img.shields.io/badge/YouTube-Watch-d95652.svg?style=flat-square&logo=youtube)]() -->

:star: If you like this sample, star it on GitHub — it helps a lot!
5 changes: 5 additions & 0 deletions packages/api/src/functions/chat-post.ts
@@ -55,6 +55,7 @@ export async function postChat(request: HttpRequest, context: InvocationContext)
    let store: VectorStore;

    if (azureOpenAiEndpoint) {
      // Initialize models and vector database
      embeddings = new AzureOpenAIEmbeddings();
      model = new AzureChatOpenAI();
      store = new AzureCosmosDBVectorStore(embeddings, {});
@@ -66,6 +67,7 @@ export async function postChat(request: HttpRequest, context: InvocationContext)
      store = await FaissStore.load(faissStoreFolder, embeddings);
    }

    // Create the chain that combines the prompt with the documents
    const combineDocsChain = await createStuffDocumentsChain({
      llm: model,
      prompt: ChatPromptTemplate.fromMessages([
@@ -74,6 +76,8 @@ export async function postChat(request: HttpRequest, context: InvocationContext)
      ]),
      documentPrompt: PromptTemplate.fromTemplate('{filename}: {page_content}\n'),
    });

    // Create the chain to retrieve the documents from the database
    const chain = await createRetrievalChain({
      retriever: store.asRetriever(),
      combineDocsChain,
@@ -96,6 +100,7 @@ export async function postChat(request: HttpRequest, context: InvocationContext)
  }
}

// Transform the response chunks into a JSON stream
function createStream(chunks: AsyncIterable<{ context: Document[]; answer: string }>) {
  const buffer = new Readable({
    read() {},
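The body of `createStream` is cut off in this diff, so the following is only a rough illustration of the idea stated in its comment: turning an async iterable of `{ context, answer }` chunks into a stream of JSON. The newline-delimited framing, the `toJsonLines` name, the simplified `Doc` type, and the `{ answer }` output shape are all assumptions made for the sketch, not the sample's actual code.

```typescript
// Hypothetical, simplified stand-in for LangChain's Document type.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface ResponseChunk {
  context: Doc[];
  answer: string;
}

// Assumption: one JSON object per line (NDJSON). The real createStream
// body is not shown in the diff and may frame its output differently.
async function* toJsonLines(chunks: AsyncIterable<ResponseChunk>): AsyncGenerator<string> {
  for await (const chunk of chunks) {
    // Assumption: chunks without answer text carry nothing to send, so skip them.
    if (!chunk.answer) continue;
    yield JSON.stringify({ answer: chunk.answer }) + '\n';
  }
}
```

In Node.js, a generator like this could be wrapped with `Readable.from(...)` from `node:stream` to obtain a readable stream for the HTTP response body, which is one alternative to pushing into a manually created `Readable`.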
1 change: 1 addition & 0 deletions packages/api/src/functions/documents-get.ts
@@ -16,6 +16,7 @@ async function getDocument(request: HttpRequest, context: InvocationContext): Pr
    let fileData: Uint8Array;

    if (connectionString && containerName) {
      // Retrieve the file from Azure Blob Storage
      context.log(`Reading blob from: "${containerName}/${fileName}"`);
      const blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
      const containerClient = blobServiceClient.getContainerClient(containerName);
4 changes: 4 additions & 0 deletions packages/api/src/functions/documents-post.ts
@@ -25,18 +25,21 @@ export async function postDocuments(request: HttpRequest, context: InvocationCon
    const file = parsedForm.get('file') as File;
    const filename = file.name;

    // Extract text from the PDF
    const loader = new PDFLoader(file, {
      splitPages: false,
    });
    const rawDocument = await loader.load();
    rawDocument[0].metadata.filename = filename;

    // Split the text into smaller chunks
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1500,
      chunkOverlap: 100,
    });
    const documents = await splitter.splitDocuments(rawDocument);

    // Generate embeddings and save in database
    if (azureOpenAiEndpoint) {
      const store = await AzureCosmosDBVectorStore.fromDocuments(documents, new AzureOpenAIEmbeddings(), {});
      await store.createIndex();
@@ -50,6 +53,7 @@ export async function postDocuments(request: HttpRequest, context: InvocationCon
    }

    if (connectionString && containerName) {
      // Upload the PDF file to Azure Blob Storage
      context.log(`Uploading file to blob storage: "${containerName}/${filename}"`);
      const blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
      const containerClient = blobServiceClient.getContainerClient(containerName);
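The `chunkSize: 1500` and `chunkOverlap: 100` options in `documents-post.ts` bound how large each chunk may be and how much text neighbouring chunks share. The sketch below shows only that windowing idea with a fixed-size sliding window over characters. It is a deliberate simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraph and line breaks), and the `splitWithOverlap` name is hypothetical.

```typescript
// Hypothetical helper: fixed-size sliding window over characters.
// Each window starts (chunkSize - chunkOverlap) characters after the previous one.
function splitWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With the values from the diff, a 3000-character text yields windows
// starting at offsets 0, 1400 and 2800.
const chunks = splitWithOverlap('a'.repeat(3000), 1500, 100);
console.log(chunks.map((c) => c.length)); // [ 1500, 1500, 200 ]
```

The overlap means a sentence that straddles a chunk boundary is likely to appear whole in at least one chunk, at the cost of embedding some text twice.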
7 changes: 7 additions & 0 deletions scripts/upload-documents.js
@@ -1,6 +1,13 @@
import fs from 'node:fs/promises';
import path from 'node:path';

// This script uploads all PDF files from the 'data' folder to the ingestion API.
// It does a Node.js equivalent of this bash script:
// ```
// for file in data/*.pdf; do
//   curl -X POST -F "file=@$file" <api_url>/api/documents
// done
// ```
async function uploadDocuments(apiUrl, dataFolder) {
  try {
    const files = await fs.readdir(dataFolder);