Python code to Generate Report on Validation and Credibility of Datasets #108

Open
Gladwin001 opened this issue Oct 2, 2023 · 9 comments

Comments

@Gladwin001
Collaborator

Description about Issue

As users download datasets for their projects, we want to give them a clearer overview of the datasets they are downloading, in the form of a report, so that they get an idea of how to use each dataset effectively in their own projects.

Expected Behavior

We expect:

  1. More statistical analysis of the datasets
  2. How their values are distributed, shown in plots
  3. Checks for corrupted and mismatched data
  4. Suggestions for which kinds of projects the dataset suits
  5. Suggestions on preprocessing the datasets for effective use in a project

The report should be generated according to the dataset's format, such as CSV, JSON, txt, etc.
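As a starting point for items 1 and 3, here is a minimal sketch using only the Python standard library. It is not the code in Main.py; the function names (`load_records`, `profile_column`, `build_report`) are hypothetical, only CSV and JSON are handled, and the "mostly numeric" rule used to flag mismatched cells is an assumption that can be tuned.

```python
import csv
import io
import json
import statistics


def load_records(text, fmt):
    """Parse raw text into a list of dict rows (hypothetical helper)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        # Assumption: the JSON document is an array of flat objects.
        return [dict(row) for row in json.loads(text)]
    raise ValueError(f"unsupported format: {fmt}")


def _as_float(value):
    """Return the value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def profile_column(values):
    """Summarize one column: missing cells, numeric stats, type mismatches."""
    present = [v for v in values if v not in (None, "")]
    numeric = [f for f in map(_as_float, present) if f is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "kind": "text",
        "mismatched": 0,
    }
    # Assumption: a column is numeric when more than half of its
    # non-missing cells parse as numbers; the rest count as mismatches.
    if present and len(numeric) * 2 > len(present):
        summary["kind"] = "numeric"
        summary["mismatched"] = len(present) - len(numeric)
        summary["mean"] = statistics.fmean(numeric)
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
    return summary


def build_report(rows):
    """Profile every column found in the first row."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


if __name__ == "__main__":
    sample = "age,city\n31,Paris\n,Rome\nabc,Oslo\n45,\n"
    for column, summary in build_report(load_records(sample, "csv")).items():
        print(column, summary)
```

Keeping the loader separate from the profiler means a new format only needs a new branch in `load_records`; the per-column checks stay the same.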

Current Behavior

In the Validation folder, Main.py implements some of the items listed above. You can also view Report.txt for a sample report we generated.

Contributions

You can implement the features one by one and then open a pull request.
We look forward to your valuable contributions and collaboration.

@Ayushlion8
Contributor

OK, from your issue description I understand that you want:

  1. Code that gives the user statistical information about all the features of the dataset, such as mean, count, etc.

  2. Visualization of the features against the target variable in a plot.

  3. Detection of missing values and format issues, basically the feature engineering needed to improve the dataset.

  4. Based on the features and the target, judging which projects the dataset would be useful for.

So if I have your intentions right, could you please assign this issue to me? :)
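Points 2 and 4 could be sketched roughly as follows. This is an illustration only, not code from the repository: `feature_by_target` and `suggest_task` are hypothetical names, rows are assumed to be dicts such as those produced by `csv.DictReader`, and the `max_classes` threshold is an arbitrary assumption. The per-label summaries are the numbers a plotting library would then draw.

```python
from collections import defaultdict
import statistics


def feature_by_target(rows, feature, target):
    """Group a numeric feature by target label; return count and mean per label."""
    groups = defaultdict(list)
    for row in rows:
        try:
            value = float(row[feature])
            label = row[target]
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric value
        groups[label].append(value)
    return {
        label: {"count": len(vals), "mean": statistics.fmean(vals)}
        for label, vals in groups.items()
    }


def _is_number(value):
    """True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def suggest_task(rows, target, max_classes=10):
    """Hypothetical heuristic: guess a project type from the target column."""
    labels = {row.get(target) for row in rows if row.get(target) not in (None, "")}
    if not labels:
        return "unsupervised (no usable target)"
    if len(labels) <= max_classes:
        return "classification"
    if all(_is_number(v) for v in labels):
        return "regression"
    return "unclear"


if __name__ == "__main__":
    rows = [
        {"len": "1.0", "species": "a"},
        {"len": "3.0", "species": "a"},
        {"len": "10", "species": "b"},
        {"len": "", "species": "b"},
    ]
    print(feature_by_target(rows, "len", "species"))
    print(suggest_task(rows, "species"))
```

A heuristic like this can only hint at suitable projects; a target with few distinct numeric values, for example, would still be reported as classification.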

@Gladwin001
Collaborator Author

Thank you for volunteering, @Ayushlion8. You can start with any single feature.

@Ayushlion8
Contributor

OK @Gladwin001, do you mean that I should do all the feature engineering and data preprocessing on one independent feature?

So from a dataset I'll choose one feature, write the code for it, and then either add that file to a folder or directly create a PR for it.

@VigneshRamanathan101
Contributor

> Thank you for volunteering, @Ayushlion8. You can start with any single feature.

I would suggest breaking this issue into small issues so it can be handled by 2 or 3 contributors.

I am also interested in contributing to this issue.

@neokd
Owner

neokd commented Oct 2, 2023

@VigneshRamanathan101 and @Ayushlion8 you can break this issue into smaller issues and proceed

@Bchass
Contributor

Bchass commented Oct 2, 2023

@Ayushlion8 @VigneshRamanathan101 I started on this before the issue was originally created. Feel free to build on what I've already done: #105

@neokd
Owner

neokd commented Oct 4, 2023

@Ayushlion8 @VigneshRamanathan101 any updates on the issue?

@Ayushlion8
Contributor

@neokd modifications are going on, will update you soon with the PR.
Thanks for your patience :)

@neokd
Owner

neokd commented Oct 6, 2023

@Ayushlion8 Yeah sure

Ayushlion8 mentioned this issue Oct 6, 2023
7 tasks

5 participants