An application that recommends articles from a WordPress blog based on a user's query, using semantic search

Install Poetry - Version 1.4.2

Virtual environment management with Poetry

Create Python virtual environment using Poetry: poetry install
Start Poetry shell: poetry shell
Exit Poetry shell: exit

How to run the application

Run the make commands described in the Steps section below

Pre-requisites:

Python and Poetry are set up
make is installed so that the commands can be run from the terminal. If it is not, run the underlying commands from the Makefile directly in the terminal, supplying the required arguments.
Adjust the following environment variables, if required, before running any of the steps below:
- DATA_LOC: Location where all the application data is stored. Defaults to a data folder in the current directory; the folder is created if it does not already exist.
- HF_KEY: A HuggingFace key (free), used to call their inference endpoints to generate embeddings with Sentence Transformer models. Export the key to this environment variable before creating or querying the index, as shown in the steps below.
- HF_EMBEDDINGS_MODEL_NAME: Overrides the default HuggingFace embeddings model. If not set, the sentence-transformers/all-MiniLM-l6-v2 model is used to generate embeddings.
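As an illustration of how these three variables and their defaults fit together, here is a minimal Python sketch. The names `Settings` and `load_settings` are hypothetical and not taken from this repository; the actual application may read its configuration differently.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Default model name as documented in this README.
DEFAULT_MODEL = "sentence-transformers/all-MiniLM-l6-v2"


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment-driven configuration."""
    data_loc: Path
    hf_key: Optional[str]
    model_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DATA_LOC, HF_KEY and HF_EMBEDDINGS_MODEL_NAME, applying defaults."""
    return Settings(
        # Falls back to a "data" folder in the current directory.
        data_loc=Path(env.get("DATA_LOC") or "data"),
        # None means the key was not exported; indexing and querying need it.
        hf_key=env.get("HF_KEY") or None,
        model_name=env.get("HF_EMBEDDINGS_MODEL_NAME") or DEFAULT_MODEL,
    )


if __name__ == "__main__":
    settings = load_settings()
    # Create the data folder if it does not exist yet.
    settings.data_loc.mkdir(parents=True, exist_ok=True)
    print(settings.model_name)
```

Passing the environment in as a mapping keeps the function easy to exercise in tests without touching the real process environment.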

Executing tests:

Set the values of the environment variables in a .env file, created by copying the template file
Execute all tests via: poetry run pytest tests/

Steps

Download all content from the target WordPress blog, e.g.: make download-content sitemap_url="https://learnwoo.com/post-sitemap1.xml"
Create the index once the content is downloaded, e.g.: export HF_KEY="hf_ywerwe..." then make generate-index sitemap_url="https://learnwoo.com/post-sitemap1.xml"
Query the index, e.g.: export HF_KEY="hf_ywerwe..." then make query-index sitemap_url="https://learnwoo.com/post-sitemap1.xml" query="What is web scraping mostly used for?" top=3
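The download step starts from a sitemap URL. As a rough sketch of what reading such a sitemap can involve, the Python below extracts post URLs from sitemap XML using only the standard library. The function name `extract_post_urls` and the sample URLs are assumptions for illustration; this is not the repository's implementation, and it works on an XML string rather than fetching anything over the network.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_post_urls(sitemap_xml: str) -> List[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]


# Hypothetical sitemap content, standing in for a downloaded post sitemap.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/first-post/</loc></url>
  <url><loc>https://example.com/second-post/</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in extract_post_urls(SAMPLE_SITEMAP):
        print(url)
```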

TODO

Roadmap

Implement content download given a WordPress sitemap, and write the downloaded content to a CSV file for further processing
Use LangChain for content/text splitting, Sentence Transformers to generate embeddings, and Chroma to store the embeddings with metadata. Apply this to the downloaded content and store the Chroma index on disk for now.
Create a query application that, given some metadata filters and a text query, returns a ranked list of articles (e.g. top 5) from the blog
Evaluate the ranked list of articles returned by this approach
Enhance the application to create social media posts using Large Language Models that draw on the contents of the blog
Create a hosted product that others can use via WordPress plugins
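To make the split, embed, and rank idea in the roadmap concrete, here is a self-contained Python sketch. It deliberately swaps in simple stand-ins: a character-window splitter instead of LangChain, a bag-of-words vector instead of Sentence Transformer embeddings, and an in-memory dictionary instead of a Chroma index. All function names are hypothetical, and metadata filters are left out; the sketch only shows the shape of the ranking logic.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_articles(query: str, articles: Dict[str, str], top: int = 5) -> List[Tuple[str, float]]:
    """Score each article by its best-matching chunk and return the top results."""
    query_vec = embed(query)
    scores = {
        url: max((cosine(query_vec, embed(chunk)) for chunk in split_text(body)), default=0.0)
        for url, body in articles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical articles keyed by URL.
    articles = {
        "https://example.com/scraping/": "Web scraping is used for price monitoring and data collection.",
        "https://example.com/bread/": "How to bake sourdough bread at home.",
    }
    for url, score in rank_articles("What is web scraping used for?", articles, top=1):
        print(url, round(score, 3))
```

Scoring an article by its best chunk is one possible design choice; averaging chunk scores is an equally plausible alternative that favours articles relevant throughout.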

Technical enhancements

Publish as a package to PyPI
Add tests
Write a build pipeline and use GitHub Actions to trigger the build on code or tag pushes and on pull requests


ace-racer/wordpress-recommender
