feat: stabilize Table class #979

Merged
merged 78 commits into main from 877-improve-tests-for-table
Jan 12, 2025

Conversation

lars-reimann (Member) commented Jan 12, 2025

Closes #875
Closes #877
Closes partially #977

Summary of Changes

Stabilize the API of the Table class. This PR introduces several breaking changes to this class:

  • All optional parameters are now keyword-only, so we can reposition them later.
  • The data parameter of __init__ is now required.
  • remove_columns_except is renamed to select_columns.
    • The new method can also be called with a callback that determines which columns to select.
  • add_table_as_columns is renamed to add_tables_as_columns.
    • Multiple tables can now be passed at once.
  • add_table_as_rows is renamed to add_tables_as_rows.
    • Multiple tables can now be passed at once.
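
The breaking changes listed above can be illustrated with a small stand-in class. This is only a sketch: `MiniTable`, its dict-of-lists storage, the `ignore_unknown_names` flag, and the `(name, values)` callback signature are assumptions made for this example and are not taken from the Safe-DS source. It shows a required `data` argument, a keyword-only optional parameter (the bare `*`), a column selector that accepts either names or a callback, and a method that takes several tables at once.

```python
from __future__ import annotations

from collections.abc import Callable


class MiniTable:
    """Hypothetical stand-in for illustration; not the Safe-DS Table class."""

    def __init__(self, data: dict[str, list]) -> None:
        # `data` has no default value, mirroring the now-required parameter.
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def column_names(self) -> list[str]:
        return list(self._data)

    def to_dict(self) -> dict[str, list]:
        return {name: list(values) for name, values in self._data.items()}

    def remove_columns(self, names: list[str], *, ignore_unknown_names: bool = False) -> MiniTable:
        # Everything after the bare `*` must be passed by keyword, so optional
        # parameters can be reordered later without breaking callers.
        unknown = [name for name in names if name not in self._data]
        if unknown and not ignore_unknown_names:
            raise KeyError(f"Unknown columns: {unknown}")
        return MiniTable({n: v for n, v in self._data.items() if n not in names})

    def select_columns(self, selector: list[str] | Callable[[str, list], bool]) -> MiniTable:
        # Accept either explicit column names or a callback that decides per column.
        if callable(selector):
            names = [n for n, v in self._data.items() if selector(n, v)]
        else:
            names = list(selector)
        return MiniTable({n: self._data[n] for n in names})

    def add_tables_as_rows(self, others: list[MiniTable]) -> MiniTable:
        # Append the rows of several tables in one call; column names must match.
        result = self.to_dict()
        for other in others:
            if other.column_names != self.column_names:
                raise ValueError("Column names must match")
            for name in result:
                result[name].extend(other._data[name])
        return MiniTable(result)


table = MiniTable({"a": [1, 2], "b": ["x", "y"], "c": [0.5, 1.5]})
# Select columns with a callback: keep only columns whose values are all numeric.
numeric = table.select_columns(lambda name, values: all(isinstance(v, (int, float)) for v in values))
# Pass multiple tables at once.
stacked = table.add_tables_as_rows([table, table])
```

In this sketch, calling `table.remove_columns(["a"], True)` raises a `TypeError`, because the flag can only be given as `ignore_unknown_names=True`.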

It also adds new functionality throughout the library:

  • New method Table.add_index_column to add a new column with auto-incrementing integer values to a table.
  • New method Table.filter_rows to keep only the rows matched by some predicate.
  • New method Table.filter_rows_by_column to keep only the rows that have a value in a specific column that matches some predicate.
  • New parameter random_seed for Table.shuffle_rows and Table.split_rows to control the pseudorandom number generator. Previously, the methods were already deterministic, but the seed was hidden and could not be set by the caller.
  • New parameter missing_value_ratio_threshold of Table.remove_columns_with_missing_values to be able to keep columns with only a few missing values.
  • Various static factory methods under ColumnType to instantiate column types. This prepares for issue #754, "Overwrite specified schema (selectively)".
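
To make the intent of the new methods and parameters more concrete, here is a rough sketch using plain functions over a dict of lists. None of this is the Safe-DS implementation: the function bodies, the `first_index` parameter, the default values, and the "remove when the ratio exceeds the threshold" rule are assumptions chosen for illustration. Only the method and parameter names `add_index_column`, `filter_rows_by_column`, `shuffle_rows`, `random_seed`, `remove_columns_with_missing_values`, and `missing_value_ratio_threshold` come from the description above.

```python
from __future__ import annotations

import random
from collections.abc import Callable
from typing import Any


def _row_count(columns: dict[str, list]) -> int:
    # All columns are assumed to have the same length.
    return len(next(iter(columns.values()), []))


def add_index_column(columns: dict[str, list], name: str, *, first_index: int = 0) -> dict[str, list]:
    # Prepend a column with auto-incrementing integers (first_index is hypothetical).
    n = _row_count(columns)
    return {name: list(range(first_index, first_index + n)), **columns}


def filter_rows_by_column(
    columns: dict[str, list], name: str, predicate: Callable[[Any], bool]
) -> dict[str, list]:
    # Keep only rows whose value in column `name` satisfies the predicate.
    keep = [i for i, value in enumerate(columns[name]) if predicate(value)]
    return {col: [values[i] for i in keep] for col, values in columns.items()}


def shuffle_rows(columns: dict[str, list], *, random_seed: int = 0) -> dict[str, list]:
    # A dedicated generator seeded by the caller makes the order reproducible.
    order = list(range(_row_count(columns)))
    random.Random(random_seed).shuffle(order)
    return {col: [values[i] for i in order] for col, values in columns.items()}


def remove_columns_with_missing_values(
    columns: dict[str, list], *, missing_value_ratio_threshold: float = 0.0
) -> dict[str, list]:
    # Assumed rule: drop a column only if its share of None values exceeds the threshold.
    kept = {}
    for col, values in columns.items():
        ratio = sum(v is None for v in values) / len(values) if values else 0.0
        if ratio <= missing_value_ratio_threshold:
            kept[col] = list(values)
    return kept


data = {
    "id": [10, 20, 30, 40],
    "score": [0.9, None, 0.4, 0.7],
    "note": [None, None, None, "ok"],
}

indexed = add_index_column(data, "index")
large_ids = filter_rows_by_column(data, "id", lambda value: value > 15)
shuffled_once = shuffle_rows(data, random_seed=42)
shuffled_again = shuffle_rows(data, random_seed=42)
strict = remove_columns_with_missing_values(data)
lenient = remove_columns_with_missing_values(data, missing_value_ratio_threshold=0.25)
```

With the threshold left at its assumed default of 0, both `score` and `note` are dropped; raising it to 0.25 keeps `score`, which has one missing value in four rows.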

Finally, the methods Table.summarize_statistics and Column.summarize_statistics are now considerably faster.

lars-reimann requested a review from a team as a code owner January 12, 2025 18:17
lars-reimann linked an issue Jan 12, 2025 that may be closed by this pull request
56 tasks
lars-reimann changed the title from "feat; stabilize Table class" to "feat: stabilize Table class" Jan 12, 2025
github-actions bot (Contributor) commented Jan 12, 2025

🦙 MegaLinter status: ✅ SUCCESS

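| Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
|---|---|---|---|---|---|
| ✅ JSON | jsonlint | 3 | | 0 | 0.17s |
| ✅ JSON | npm-package-json-lint | yes | | no | 0.34s |
| ✅ JSON | prettier | 3 | 0 | 0 | 1.22s |
| ✅ JSON | v8r | 3 | | 0 | 2.94s |
| ✅ PYTHON | black | 277 | 0 | 0 | 5.96s |
| ✅ PYTHON | mypy | 277 | | 0 | 6.33s |
| ✅ PYTHON | ruff | 277 | 0 | 0 | 0.33s |
| ✅ REPOSITORY | git_diff | yes | | no | 0.38s |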

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security

codecov bot commented Jan 12, 2025

Codecov Report

Attention: Patch coverage is 96.08939% with 21 lines in your changes missing coverage. Please review.

Project coverage is 94.99%. Comparing base (29fdefa) to head (c973ead).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/safeds/ml/classical/regression/_regressor.py | 0.00% | 6 Missing ⚠️ |
| .../safeds/ml/classical/classification/_classifier.py | 16.66% | 5 Missing ⚠️ |
| src/safeds/ml/classical/_supervised_model.py | 50.00% | 4 Missing ⚠️ |
| src/safeds/ml/nn/_model.py | 50.00% | 2 Missing ⚠️ |
| src/safeds/_validation/_check_schema_module.py | 98.03% | 1 Missing ⚠️ |
| ...ds/data/tabular/transformation/_one_hot_encoder.py | 80.00% | 1 Missing ⚠️ |
| src/safeds/ml/metrics/_classification_metrics.py | 50.00% | 1 Missing ⚠️ |
| src/safeds/ml/metrics/_regression_metrics.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #979      +/-   ##
==========================================
+ Coverage   94.42%   94.99%   +0.57%     
==========================================
  Files         121      123       +2     
  Lines        7459     7696     +237     
==========================================
+ Hits         7043     7311     +268     
+ Misses        416      385      -31     
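
The percentages in the report can be re-derived from the hit and line counts in the diff. The sketch below does that arithmetic. Two points are inferences from the numbers, not documented behavior: the displayed project coverage appears to be truncated to two decimals (the head value is about 94.997%, which would round to 95.00%), and the patch size of 537 lines is back-calculated from "96.08939% with 21 lines missing".

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage of all tracked lines."""
    return 100.0 * hits / lines


def truncate(value: float, digits: int = 2) -> float:
    """Cut off (do not round) after `digits` decimal places."""
    factor = 10**digits
    return math.floor(value * factor) / factor


# Figures copied from the coverage diff: hits and misses add up to lines.
base_hits, base_misses, base_lines = 7043, 416, 7459
head_hits, head_misses, head_lines = 7311, 385, 7696

base = truncate(coverage_percent(base_hits, base_lines))
head = truncate(coverage_percent(head_hits, head_lines))

# Patch coverage: 21 missing lines at 96.08939% implies 537 changed lines,
# of which 516 are covered (derived here, not stated in the report).
patch_lines, patch_missing = 537, 21
patch = coverage_percent(patch_lines - patch_missing, patch_lines)

print(f"base {base:.2f}%  head {head:.2f}%  patch {patch:.5f}%")
```

The 21 missing patch lines also match the sum of the "Missing" counts in the per-file table (6 + 5 + 4 + 2 + 1 + 1 + 1 + 1).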

☔ View full report in Codecov by Sentry.

lars-reimann merged commit db85617 into main Jan 12, 2025
12 checks passed
lars-reimann deleted the 877-improve-tests-for-table branch January 12, 2025 19:48
Development

Successfully merging this pull request may close these issues.

  • Improve tests for Table
  • Improve tests for classes related to typing of tabular data
2 participants