feat: implement RandomForest-based sales forecasting with multi-format data extraction by VasanSoundararajan · Pull Request #44 · Vaibhav2154/PyEveryday

VasanSoundararajan · 2025-08-26T08:24:14Z

Added RandomForestRegressor as the primary model for sales forecasting
(replacing Linear Regression) to better capture non-linear patterns.
Implemented extraction functions for CSV, PDF, DOCX, and image files
(via pdfminer.six, python-docx, and pytesseract).
Added robust parsing with regex patterns for multiple date formats and sales numbers.
Introduced a CLI with commands for handling different file types.
Added forecast plotting with matplotlib (historical vs predicted sales).
Improved logging for better traceability of data extraction, parsing, and training steps.
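
As an illustration of the parsing step described above, here is a minimal sketch of regex-based extraction of a date and a sales figure from one line of text. The patterns, the supported formats, the day-first reading of `dd/mm/yyyy`, and the name `parse_line` are all assumptions made for this example; the PR's actual regexes are not shown in this thread.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical patterns, each paired with the strptime format that validates it.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),          # 2025-08-26
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%d/%m/%Y"),          # 26/08/2025 (day first: an assumption)
    (re.compile(r"\b\d{1,2} [A-Za-z]{3} \d{4}\b"), "%d %b %Y"),  # 26 Aug 2025
]

# Optional currency symbol, then digits with optional thousands separators and decimals.
SALES_PATTERN = re.compile(r"[$₹€]?\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?")


def parse_line(line: str) -> Optional[Tuple[date, float]]:
    """Return (date, sales) when the line has a valid date followed by a number, else None."""
    for pattern, fmt in DATE_PATTERNS:
        match = pattern.search(line)
        if match is None:
            continue
        try:
            day = datetime.strptime(match.group(0), fmt).date()
        except ValueError:
            continue  # right shape but not a real date, e.g. 2025-13-40
        # Only look for the sales figure after the date, so date digits are not reused.
        amount = SALES_PATTERN.search(line[match.end():])
        if amount is None:
            return None
        whole, fraction = amount.group(1), amount.group(2) or ""
        return day, float(whole.replace(",", "") + fraction)
    return None
```

Validating each regex match with `strptime` keeps the regexes simple while still rejecting impossible dates. Lines that yield `None` can be logged and skipped rather than aborting the whole extraction.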

github-actions · 2025-08-26T08:24:24Z

Thanks for creating a PR for your Issue! ☺️

We'll review it as soon as possible.
In the meantime, please double-check the file changes and ensure that all commits are accurate.

If there are any unresolved review comments, feel free to resolve them. 🙌🏼

vercel · 2025-08-26T08:24:29Z

@VasanSoundararajan is attempting to deploy a commit to the vaibhav2154's projects Team on Vercel.

A member of the Team first needs to authorize it.

aleenaharoldpeter

@VasanSoundararajan Hey! I think it might be cleaner to put the read functions in PyEveryday/tree/main/backend/scripts/data_tools/. That way, you can just call them in your ML code and focus only on the ML logic. @Vaibhav2154 Please correct me if I’m off here!

aleenaharoldpeter

Nice work on this refactor, @VasanSoundararajan 🙌, the split definitely makes SalesPredictor cleaner already.

Quick thought (@Vaibhav2154 ): do you think it might be even cleaner if we also moved PDF/DOCX/Image extractors into data_tools, so that all data ingestion lives in one place and SalesPredictor only handles training/prediction logic? That would make the separation of concerns super clear and keep ML logic focused.

That said, Vasan — please don’t make any changes yet 🙏 since @Vaibhav2154 hasn’t had a chance to weigh in. Just wanted to put this out there for discussion once everyone’s had a chance to look.
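
For discussion only, the split being proposed could look roughly like the sketch below. Everything in it is hypothetical: the function names, the `READERS` registry, the `sales` column, and the mean-based `SalesPredictorSketch` are stand-ins, not the PR's RandomForest code or the actual contents of data_tools.

```python
import csv
from pathlib import Path

# --- would live under data_tools/ (hypothetical layout) ---

def read_csv_rows(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Maps file suffix to reader; PDF/DOCX/image readers would be registered here too.
READERS = {".csv": read_csv_rows}


def load_rows(path):
    """Pick a reader by file suffix; raise ValueError if none is registered."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader registered for {suffix!r}") from None
    return reader(path)


# --- would live next to the ML code ---

class SalesPredictorSketch:
    """Stand-in model (a plain mean, not RandomForest): sees rows, never files."""

    def __init__(self):
        self.mean_sales = None

    def fit(self, rows):
        values = [float(row["sales"]) for row in rows]
        self.mean_sales = sum(values) / len(values)
        return self

    def predict(self):
        return self.mean_sales
```

With this shape, supporting a new format means adding one entry to the registry, and the predictor can be tested with in-memory rows without touching the file system.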

VasanSoundararajan · 2025-08-26T17:31:09Z

OK, I will do it soon, @aleenaharoldpeter, but I think data_tools is meant for small utilities that could otherwise live inside each file.

aleenaharoldpeter · 2025-08-26T17:34:38Z

Hey, btw I just noticed I had a normal CSV read in converter and chunked in processor — I’ll refactor that so it’s consistent. So give me a little time on that. And yeah, I think you’re right, just wanted @Vaibhav2154’s input too so we’re all on the same page.
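
One possible way to make the two read paths consistent is to build the whole-file read on top of the chunked one, so both share a single parser. This is a sketch under assumptions: `iter_csv_chunks` and `read_csv_all` are hypothetical names, and the real converter and processor code is not shown in this thread.

```python
import csv
from itertools import islice


def iter_csv_chunks(handle, chunk_size=1000):
    """Yield lists of up to chunk_size row dicts from an open text file."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    reader = csv.DictReader(handle)
    while True:
        # islice pulls at most chunk_size rows without loading the rest of the file.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def read_csv_all(handle, chunk_size=1000):
    """Whole-file read built on the chunked reader, so both paths parse identically."""
    rows = []
    for chunk in iter_csv_chunks(handle, chunk_size):
        rows.extend(chunk)
    return rows
```

Callers that need streaming iterate over `iter_csv_chunks`, and callers that want everything at once use `read_csv_all`, without a second CSV parsing routine to keep in sync.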

Add feature of ML Prediction for a sales data from Random forest regression

a606e95

aleenaharoldpeter reviewed Aug 26, 2025


feat: recorrected the file importations from predefined data_tools

92bac87

aleenaharoldpeter reviewed Aug 26, 2025


aleenaharoldpeter mentioned this pull request Aug 26, 2025

Data tools cleanup: consolidate read methods + chunking logic #47

Closed

Vaibhav2154 added 2 ⭐⭐ medium OSCI labels Aug 27, 2025

Vaibhav2154 merged commit 8998e5f into Vaibhav2154:main Aug 27, 2025
1 check failed

Vaibhav2154 merged 2 commits into Vaibhav2154:main from VasanSoundararajan:MLFeaures


3 participants
