feat: implement RandomForest-based sales forecasting with multi-format data extraction #44
Thanks for creating a PR for your Issue!
@VasanSoundararajan is attempting to deploy a commit to the vaibhav2154's projects Team on Vercel. A member of the Team first needs to authorize it.
aleenaharoldpeter
left a comment
@VasanSoundararajan Hey! I think it might be cleaner to put the read functions in PyEveryday/tree/main/backend/scripts/data_tools/. That way, you can just call them in your ML code and focus only on the ML logic. @Vaibhav2154 Please correct me if I’m off here!
Nice work on this refactor, @VasanSoundararajan 🙌, the split definitely makes SalesPredictor cleaner already.
Quick thought (@Vaibhav2154): do you think it might be even cleaner if we also moved the PDF/DOCX/Image extractors into data_tools, so that all data ingestion lives in one place and SalesPredictor only handles training/prediction logic? That would make the separation of concerns super clear and keep the ML logic focused.
That said, Vasan — please don’t make any changes yet 🙏 since @Vaibhav2154 hasn’t had a chance to weigh in. Just wanted to put this out there for discussion once everyone’s had a chance to look.
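For discussion, here is a minimal sketch of the split being proposed, not the PR's actual code. The module layout, the `load_rows` helper, the `READERS` registry, and the `sales` column name are all hypothetical, and the mean baseline inside `SalesPredictor` is only a stand-in for the real RandomForest model so the example runs with the standard library alone.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]

# --- would live in data_tools (hypothetical layout) -------------------------

def read_csv(path: Path) -> List[Row]:
    """Read a CSV file into a list of row dictionaries."""
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def read_json(path: Path) -> List[Row]:
    """Read a JSON file that contains a list of row objects."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


# One registry for every format; PDF/DOCX/image extractors would be added here.
READERS: Dict[str, Callable[[Path], List[Row]]] = {
    ".csv": read_csv,
    ".json": read_json,
}


def load_rows(path) -> List[Row]:
    """Pick a reader by file extension and return the parsed rows."""
    path = Path(path)
    try:
        reader = READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported file type: {path.suffix!r}") from None
    return reader(path)


# --- would live in the ML script --------------------------------------------

class SalesPredictor:
    """Training/prediction only; it never opens a file itself."""

    def __init__(self) -> None:
        self.mean_ = None

    def fit(self, rows: List[Row], target: str = "sales") -> "SalesPredictor":
        values = [float(row[target]) for row in rows]
        if not values:
            raise ValueError("no rows to train on")
        # Placeholder model: a mean baseline, NOT the PR's RandomForest.
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self, n_periods: int) -> List[float]:
        if self.mean_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.mean_] * n_periods
```

With this shape, the ML side reduces to something like `SalesPredictor().fit(load_rows("sales.csv")).predict(3)`, and adding a new input format only touches the registry.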
OK, I will do it soon, @aleenaharoldpeter, but I think data_tools is meant for small helper utilities that could otherwise live inside each file.
Hey, btw I just noticed I had a plain CSV read in the converter and a chunked read in the processor. I’ll refactor that so it’s consistent, so give me a little time on that. And yeah, I think you’re right, just wanted @Vaibhav2154’s input too so we’re all on the same page.
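One way to make the two CSV reads consistent is to have a single chunked reader and build the "read everything" path on top of it. This is only a sketch: the function names and the default chunk size are hypothetical, and it uses the standard library `csv` module because the thread does not show which reader the converter and processor actually use.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def read_csv_chunks(path, chunk_size: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield the rows of a CSV file in lists of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        while True:
            # islice pulls the next chunk_size rows without loading the rest.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk


def read_csv_all(path, chunk_size: int = 1000) -> List[Dict[str, str]]:
    """Read the whole file through the same chunked code path."""
    return [row for chunk in read_csv_chunks(path, chunk_size) for row in chunk]
```

The converter could then call `read_csv_all` while the processor iterates over `read_csv_chunks`, so both share one parsing path.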
(replacing Linear Regression) to better capture non-linear patterns.
(via pdfminer.six, python-docx, and pytesseract).