Information Retrieval Project 📚🔍

Welcome to the Information Retrieval repository! This project scrapes data from Wildberries and uses it to build content vectors and multimodal embeddings.

🌟 Features

  • Wildberries Scraper: Utilizes web scraping techniques to extract data from Wildberries, as detailed in wb_scraper.ipynb.
  • Content Vectorization: Implements methods to convert textual content into numerical vectors for machine learning.
  • Multimodal Embeddings: Creates embeddings that combine different types of data (text, images, etc.) for richer representations.
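As an illustration of what "converting textual content into numerical vectors" can look like, here is a minimal TF-IDF sketch using only the Python standard library. It is an assumption for illustration, not necessarily the method used in the notebooks; the function names (`tokenize`, `tfidf_vectors`) and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors); each vector holds one TF-IDF weight per vocab term."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so a term found in every document keeps a non-zero weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Made-up texts, used purely for illustration.
docs = ["red cotton dress", "blue cotton shirt", "red leather bag"]
vocab, vectors = tfidf_vectors(docs)
```

Terms that occur in fewer documents (such as "dress" here) receive a higher weight than terms shared across documents (such as "cotton"), which is what makes the vectors useful for retrieval.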

🛠️ Getting Started

To get started with the Information Retrieval project, follow these steps:

  1. Clone the Repository:
    git clone https://github.com/ivanovsdesign/information_retrieval.git
    
  2. Navigate to the Project Directory:
    cd information_retrieval

  3. Explore the Notebooks:

    • Open wb_scraper.ipynb to learn how to scrape data from Wildberries.

    • Open wb_content_vect_colab.ipynb to understand the workflow for content vectorization and creating multimodal embeddings.
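To give a rough idea of how text and image vectors can be combined into one multimodal embedding, the sketch below normalizes each modality, weights it, and concatenates the results. This is one common fusion approach shown under stated assumptions; it is not confirmed to be what wb_content_vect_colab.ipynb does. The helpers (`l2_normalize`, `fuse`, `cosine`), the `text_weight` parameter, and all vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec, text_weight=0.5):
    """Concatenate weighted, L2-normalized text and image vectors."""
    text_part = [text_weight * x for x in l2_normalize(text_vec)]
    image_part = [(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    return text_part + image_part


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up text and image vectors, used purely for illustration.
item_a = fuse([0.2, 0.8, 0.0], [1.0, 0.0])
item_b = fuse([0.1, 0.9, 0.0], [0.9, 0.1])
item_c = fuse([0.9, 0.0, 0.1], [0.0, 1.0])

# Items whose text and image vectors both point the same way score higher.
sim_ab = cosine(item_a, item_b)
sim_ac = cosine(item_a, item_c)
```

Normalizing each modality before concatenation keeps one of them from dominating the similarity score simply because its raw values are larger; `text_weight` then makes the balance between the two explicit.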

📜 Disclaimer

This project is intended for educational and research purposes. The author and contributors do not condone or support the misuse of this scraper to violate the terms of service of Wildberries. Users are solely responsible for ensuring their use of this tool complies with all applicable laws and terms of service.

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on how to contribute to this project.

📄 License

This project is licensed under the MIT License.

📬 Contact

For questions or feedback, please open an issue on GitHub.

🌈 Thank you for visiting the repository! If you find this project helpful, please consider starring it to show your support. Happy coding! 🚀