LangChain Ujeebu Integration (Node.js/TypeScript)

Official LangChain integration for Ujeebu Extract API - Extract clean, structured content from news articles and blog posts for use with Large Language Models (LLMs) and AI applications.

Features

Easy Integration: Seamlessly integrate Ujeebu Extract API with LangChain agents and chains
Document Loaders: Load articles as LangChain Documents for use with vector stores and retrievers
Agent Tools: Use Ujeebu Extract as a tool in LangChain agents
Rich Metadata: Extract article text, HTML, author, publication date, images, and more
Quick Mode: Optional fast extraction mode (30-60% faster)
TypeScript Support: Full TypeScript types and interfaces

What is Ujeebu Extract?

Ujeebu Extract converts news and blog articles into clean, structured JSON data. It extracts:

Clean article text and HTML
Author and publication date
Title and summary
Images and media
RSS feeds
Site metadata

Perfect for RAG (Retrieval-Augmented Generation) applications, content analysis, and LLM training data.

Installation

npm install @ujeebu-org/langchain
# or
yarn add @ujeebu-org/langchain
# or
pnpm add @ujeebu-org/langchain

Requirements

Node.js 16.0 or higher
@langchain/core 0.3.0 or higher
An Ujeebu API key (Get one here)

Quick Start

Set up your API key

export UJEEBU_API_KEY="your-api-key"

Or set it programmatically:

process.env.UJEEBU_API_KEY = "your-api-key";

Using as an Agent Tool

npm install @ujeebu-org/langchain @langchain/core @langchain/openai @langchain/langgraph langchain

import { UjeebuExtractTool } from '@ujeebu-org/langchain';
import { createAgent } from 'langchain';
import { ChatOpenAI } from '@langchain/openai';

// Initialize the tool
const ujeebuTool = new UjeebuExtractTool();

// Create an agent
const model = new ChatOpenAI({ temperature: 0 });
const agent = createAgent({ model, tools: [ujeebuTool] });

// Use the agent
const response = await agent.invoke({
  messages: [{ role: 'user', content: 'Extract the article from https://example.com/article and summarize it' }],
});
console.log(response.messages[response.messages.length - 1].content);

Using the Document Loader

npm install @ujeebu-org/langchain @langchain/core @langchain/openai @langchain/community

import { UjeebuLoader } from '@ujeebu-org/langchain';
import { FaissStore } from '@langchain/community/vectorstores/faiss';
import { OpenAIEmbeddings } from '@langchain/openai';

// Load articles
const loader = new UjeebuLoader({
  urls: [
    'https://example.com/article1',
    'https://example.com/article2',
    'https://example.com/article3'
  ]
});
const documents = await loader.load();

// Create a vector store
const embeddings = new OpenAIEmbeddings();
const vectorStore = await FaissStore.fromDocuments(documents, embeddings);

// Query the documents
const results = await vectorStore.similaritySearch('What are the main topics?');

Usage Examples

Basic Article Extraction

import { UjeebuExtractTool } from '@ujeebu-org/langchain';

const tool = new UjeebuExtractTool();
const result = await tool.invoke({
  url: 'https://example.com/article',
  text: true,
  author: true,
  pub_date: true
});
console.log(result);

Extract with Images

import { UjeebuExtractTool } from '@ujeebu-org/langchain';

const tool = new UjeebuExtractTool();
const result = await tool.invoke({
  url: 'https://example.com/article',
  images: true  // Extract article images
});

Quick Mode for Faster Extraction

import { UjeebuLoader } from '@ujeebu-org/langchain';

const loader = new UjeebuLoader({
  urls: ['https://example.com/article'],
  quickMode: true  // 30-60% faster, slightly less accurate
});
const documents = await loader.load();

Load with HTML Content

import { UjeebuLoader } from '@ujeebu-org/langchain';

const loader = new UjeebuLoader({
  urls: ['https://example.com/article'],
  extractHtml: true,   // Include HTML content
  extractImages: true  // Include images
});
const documents = await loader.load();

// Access metadata
const doc = documents[0];
console.log(`Title: ${doc.metadata.title}`);
console.log(`Author: ${doc.metadata.author}`);
console.log(`Images: ${doc.metadata.images}`);

Build a QA System

import { UjeebuLoader } from '@ujeebu-org/langchain';
import { FaissStore } from '@langchain/community/vectorstores/faiss';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';

// Load articles
const loader = new UjeebuLoader({
  urls: [
    'https://example.com/article1',
    'https://example.com/article2'
  ]
});
const documents = await loader.load();

// Create vector store
const embeddings = new OpenAIEmbeddings();
const vectorStore = await FaissStore.fromDocuments(documents, embeddings);
const retriever = vectorStore.asRetriever();

// Retrieve and answer
const query = 'What are the main points?';
const relevantDocs = await retriever.invoke(query);
const context = relevantDocs.map((doc) => doc.pageContent).join('\n\n');

const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'Answer based on the following context:\n\n{context}'],
  ['human', '{question}'],
]);

const chain = prompt.pipe(new ChatOpenAI({ temperature: 0 })).pipe(new StringOutputParser());
const answer = await chain.invoke({ context, question: query });
console.log(answer);

API Reference

UjeebuExtractTool

A LangChain tool for extracting article content.

Constructor Parameters:

apiKey (string, optional): Ujeebu API key. Defaults to UJEEBU_API_KEY environment variable.
baseUrl (string, optional): Custom API endpoint URL.

Tool Parameters (UjeebuExtractInput):

url (string, required): URL of the article to extract
text (boolean, optional): Extract article text (default: true)
html (boolean, optional): Extract article HTML (default: false)
author (boolean, optional): Extract article author (default: true)
pub_date (boolean, optional): Extract publication date (default: true)
images (boolean, optional): Extract images (default: false)
quick_mode (boolean, optional): Use quick mode for faster extraction (default: false)

UjeebuLoader

A LangChain document loader for articles.

Constructor Parameters (UjeebuLoaderParams):

urls (string[], required): List of article URLs to load
apiKey (string, optional): Ujeebu API key
extractText (boolean, optional): Extract article text (default: true)
extractHtml (boolean, optional): Extract article HTML (default: false)
extractAuthor (boolean, optional): Extract author (default: true)
extractPubDate (boolean, optional): Extract publication date (default: true)
extractImages (boolean, optional): Extract images (default: false)
quickMode (boolean, optional): Use quick mode (default: false)
baseUrl (string, optional): Custom API endpoint URL

Methods:

load(): Promise<Document[]> - Load all documents

Document Metadata:

source: Original URL
url: Resolved URL
canonical_url: Canonical URL
title: Article title
author: Article author
pub_date: Publication date
language: Article language
site_name: Site name
summary: Article summary
image: Main image URL
images: List of all image URLs (if extractImages=true)

Advanced Usage

Custom API Endpoint

import { UjeebuLoader } from '@ujeebu-org/langchain';

const loader = new UjeebuLoader({
  urls: ['https://example.com/article'],
  baseUrl: 'https://custom-api.ujeebu.com/extract'
});

Error Handling

import { UjeebuLoader } from '@ujeebu-org/langchain';

const loader = new UjeebuLoader({
  urls: ['https://example.com/article']
});

try {
  const documents = await loader.load();
  console.log(`Loaded ${documents.length} documents`);
} catch (error) {
  if (error instanceof Error) {
    console.error(`Error: ${error.message}`);
  }
}

Testing

Run the test suite:

# Install dependencies
npm install

# Build the package
npm run build

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Run linting
npm run lint

# Format code
npm run format

Examples

Check out the examples directory for more usage examples:

agent-example.ts - Using Ujeebu with LangChain agents
document-loader-example.ts - Using the document loader with vector stores

Pricing

Ujeebu Extract API pricing is based on usage. Check the pricing page for details.

Support

Documentation: https://ujeebu.com/docs/extract
API Reference: https://ujeebu.com/docs
Support: support@ujeebu.com
GitHub Issues: Report a bug

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Related Projects

LangChain - Build applications with LLMs through composability
Ujeebu API - Web scraping and content extraction API

Changelog

0.2.0

Upgrade to LangChain 1.x compatibility
Import BaseDocumentLoader from @langchain/core instead of langchain
Only @langchain/core required as peer dependency (not langchain)
Update examples to use @langchain/langgraph for agents

0.1.0

Initial release
UjeebuExtractTool for LangChain agents
UjeebuLoader document loader
Full TypeScript support
Comprehensive test coverage
Complete documentation

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
examples		examples
scripts		scripts
src		src
tests		tests
.eslintrc.js		.eslintrc.js
.gitignore		.gitignore
.prettierrc		.prettierrc
LICENSE		LICENSE
README.md		README.md
jest.config.js		jest.config.js
package-lock.json		package-lock.json
package.json		package.json
tsconfig.cjs.json		tsconfig.cjs.json
tsconfig.esm.json		tsconfig.esm.json
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

LangChain Ujeebu Integration (Node.js/TypeScript)

Features

What is Ujeebu Extract?

Installation

Requirements

Quick Start

Set up your API key

Using as an Agent Tool

Using the Document Loader

Usage Examples

Basic Article Extraction

Extract with Images

Quick Mode for Faster Extraction

Load with HTML Content

Build a QA System

API Reference

UjeebuExtractTool

UjeebuLoader

Advanced Usage

Custom API Endpoint

Error Handling

Testing

Examples

Pricing

Support

Contributing

License

Related Projects

Changelog

0.2.0

0.1.0

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages