
feat: implement RandomForest-based sales forecasting with multi-format data extraction #44

Merged
Vaibhav2154 merged 2 commits into Vaibhav2154:main from VasanSoundararajan:MLFeaures
Aug 27, 2025

@VasanSoundararajan
Contributor

  • Added RandomForestRegressor as the primary model for sales forecasting
    (replacing Linear Regression) to better capture non-linear patterns.
  • Implemented extraction functions for CSV, PDF, DOCX, and image files
    (via pdfminer.six, python-docx, and pytesseract).
  • Added robust parsing with regex patterns for multiple date formats and sales numbers.
  • Introduced a CLI with commands for handling different file types.
  • Added forecast plotting with matplotlib (historical vs predicted sales).
  • Improved logging for better traceability of data extraction, parsing, and training steps.
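
For illustration only (this is not the code from the PR), the regex-based parsing of dates and sales numbers described above could be sketched roughly as follows. The pattern list, the day-first reading of `DD/MM/YYYY`, and the `parse_line` helper are all assumptions made for the example; month abbreviations are matched in the default C locale.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical patterns: each regex is paired with the strptime format it implies.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),          # 2025-08-26
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%d/%m/%Y"),          # 26/08/2025 (day first, assumed)
    (re.compile(r"\b\d{1,2} [A-Za-z]{3} \d{4}\b"), "%d %b %Y"),  # 26 Aug 2025
]

# A sales figure: digits with optional thousands separators and optional decimals.
SALES_PATTERN = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def parse_line(line: str) -> Optional[Tuple[date, float]]:
    """Return (date, sales) from one line of text, or None if either is missing."""
    for pattern, fmt in DATE_PATTERNS:
        match = pattern.search(line)
        if match is None:
            continue
        try:
            day = datetime.strptime(match.group(0), fmt).date()
        except ValueError:
            continue  # looked like a date but is not a valid one, e.g. 31/02/2025
        # Cut the date out so its digits are not mistaken for the sales figure.
        rest = line[:match.start()] + " " + line[match.end():]
        amount = SALES_PATTERN.search(rest)
        if amount is None:
            return None
        return day, float(amount.group(0).replace(",", ""))
    return None
```

Trying the patterns in a fixed order keeps the behavior predictable when a line could match more than one format; text extracted from PDFs or OCR would likely need looser patterns than these.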

@github-actions

Thanks for creating a PR for your Issue! ☺️

We'll review it as soon as possible.
In the meantime, please double-check the file changes and ensure that all commits are accurate.

If there are any unresolved review comments, feel free to resolve them. 🙌🏼

@vercel

vercel bot commented Aug 26, 2025

@VasanSoundararajan is attempting to deploy a commit to the vaibhav2154's projects Team on Vercel.

A member of the Team first needs to authorize it.

@aleenaharoldpeter left a comment
Contributor

@VasanSoundararajan Hey! I think it might be cleaner to put the read functions in PyEveryday/tree/main/backend/scripts/data_tools/. That way, you can just call them in your ML code and focus only on the ML logic. @Vaibhav2154 Please correct me if I’m off here!

@aleenaharoldpeter left a comment
Contributor

Nice work on this refactor, @VasanSoundararajan 🙌 The split definitely makes SalesPredictor cleaner already.

Quick thought (@Vaibhav2154): do you think it might be even cleaner if we also moved the PDF/DOCX/image extractors into data_tools, so that all data ingestion lives in one place and SalesPredictor only handles training/prediction logic? That would make the separation of concerns clear and keep the ML logic focused.

That said, Vasan — please don’t make any changes yet 🙏 since @Vaibhav2154 hasn’t had a chance to weigh in. Just wanted to put this out there for discussion once everyone’s had a chance to look.
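
To make the proposed split concrete, here is a rough sketch of what it might look like. It is not the PR's code: the `EXTRACTORS` registry, `load_records`, the `date`/`sales` column names, and the `SalesPredictor` method names are assumptions for illustration, and a trivial mean baseline stands in for the RandomForestRegressor so the sketch has no third-party dependencies.

```python
import csv
from pathlib import Path
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[str, float]  # (date string, sales amount)


# --- ingestion side: would live in data_tools (hypothetical layout) ---
def extract_csv(path: Path) -> List[Record]:
    """Read (date, sales) pairs from a CSV with 'date' and 'sales' columns (assumed names)."""
    with path.open(newline="") as handle:
        return [(row["date"], float(row["sales"])) for row in csv.DictReader(handle)]


# PDF, DOCX, and image extractors would be registered here under their suffixes.
EXTRACTORS: Dict[str, Callable[[Path], List[Record]]] = {".csv": extract_csv}


def load_records(path: Path) -> List[Record]:
    """Pick an extractor by file suffix so callers never touch file formats."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return extractor(path)


# --- ML side: only training and prediction, no file handling ---
class SalesPredictor:
    def __init__(self) -> None:
        self._baseline: Optional[float] = None

    def fit(self, records: List[Record]) -> "SalesPredictor":
        if not records:
            raise ValueError("no records to train on")
        # Stand-in model: the mean of past sales, not a random forest.
        self._baseline = mean(amount for _, amount in records)
        return self

    def predict(self, periods: int) -> List[float]:
        if self._baseline is None:
            raise RuntimeError("call fit() before predict()")
        return [self._baseline] * periods
```

With this shape, adding a new input format means registering one more extractor, and the predictor only ever sees a list of records.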

@VasanSoundararajan
Contributor Author

VasanSoundararajan commented Aug 26, 2025

OK, I will do it soon, @aleenaharoldpeter, but I think data_tools is meant for implementing small pieces that could otherwise live inside each file.

@aleenaharoldpeter
Contributor

aleenaharoldpeter commented Aug 26, 2025

Hey, btw, I just noticed I had a plain CSV read in converter and a chunked read in processor. I’ll refactor that so it’s consistent, so give me a little time on that. And yeah, I think you’re right; I just wanted @Vaibhav2154’s input too so we’re all on the same page.
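
One possible shape for that consistency fix, as a sketch only: a single chunked reader that both call sites share, with the whole-file read built on top of it. The function names and the default chunk size are hypothetical, and the standard `csv` module is used here without assuming what the existing converter and processor scripts use.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List

Row = Dict[str, str]


def read_csv_chunks(path: str, chunk_size: int = 1000) -> Iterator[List[Row]]:
    """Yield lists of up to chunk_size rows from a CSV file that has a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def read_csv_all(path: str) -> List[Row]:
    """Read the whole file by draining the same chunked reader."""
    return [row for chunk in read_csv_chunks(path) for row in chunk]
```

Routing both paths through one reader means encoding, delimiter, or header handling only has to be fixed in one place.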

@Vaibhav2154 merged commit 8998e5f into Vaibhav2154:main on Aug 27, 2025
1 check failed

3 participants