Stabilize the API #977

Open
9 of 73 tasks
lars-reimann opened this issue Jan 6, 2025 · 0 comments
Assignees
Labels
breaking change ⚡ (May break client code), documentation 📖 (Improvements or additions to documentation), testing 🧪 (Additional automated tests)
Milestone

lars-reimann commented Jan 6, 2025

Stabilize the API to prepare for a 1.0.0 release. This includes checking the design, adding missing tests, and adding missing documentation. If a part of the API is not stable yet, it should be marked clearly in the documentation (#654).

Tables

  • Cell
  • Column
  • Row
  • Table
  • DataType -> ColumnType
  • Schema
  • TemporalCell -> DatetimeOperations
  • DurationOperations
  • MathOperations
  • StringCell -> StringOperations

➡️ New release

  • TableTransformer
  • InvertibleTableTransformer
  • FunctionalTableTransformer
  • Discretizer
  • KNearestNeighborsImputer
  • LabelEncoder
  • OneHotEncoder
  • RangeScaler
  • RobustScaler
  • SimpleImputer
  • StandardScaler
  • SequentialTableTransformer

➡️ New release

  • ColumnPlotter
  • TablePlotter

➡️ New release

ML on Tables

  • Dataset
  • TabularDataset
  • SupervisedModel
  • Classifier
  • AdaBoostClassifier
  • BaselineClassifier
  • DecisionTreeClassifier
  • GradientBoostingClassifier
  • KNearestNeighborsClassifier
  • LogisticClassifier
  • RandomForestClassifier
  • SupportVectorClassifier
  • Regressor
  • AdaBoostRegressor
  • BaselineRegressor
  • DecisionTreeRegressor
  • GradientBoostingRegressor
  • KNearestNeighborsRegressor
  • LinearRegressor
  • RandomForestRegressor
  • SupportVectorRegressor
  • Choice (self.elements should be a property)
  • ClassificationMetrics
  • ClassifierMetrics (confusing to have both)
  • RegressionMetrics
  • RegressorMetrics (confusing to have both)

Other

  • All exceptions

Everything that follows could be marked as experimental, and we could proceed with a 1.0.0 release.

Images

  • Image
  • ImageList
  • ImageSize
  • ImageDataset
  • ConstantImageSize
  • ModelImageSize
  • VariableImageSize

Time Series

  • TimeSeriesDataset
  • ArimaModelRegressor (should be separated from the rest)

NNs

  • NeuralNetworkClassifier
  • NeuralNetworkRegressor (should be combined with NeuralNetworkClassifier)
  • InputConversion + subclasses (should be hidden)
  • AveragePooling2DLayer
  • Convolutional2DLayer
  • ConvolutionalTranspose2DLayer
  • DropoutLayer
  • FlattenLayer
  • ForwardLayer
  • GRULayer
  • LSTMLayer
  • Layer (maybe add new subclasses RecurrentLayer, ConvolutionalLayer?)
  • MaxPooling2DLayer
@github-project-automation github-project-automation bot moved this to Backlog in Library Jan 6, 2025
@lars-reimann lars-reimann self-assigned this Jan 11, 2025
@lars-reimann lars-reimann moved this from Backlog to In Progress in Library Jan 11, 2025
lars-reimann added a commit that referenced this issue Jan 12, 2025
Closes #875
Closes #877
Closes partially #977

### Summary of Changes

Stabilize the API of the `Table` class. This PR introduces several
breaking changes to this class:

- All optional parameters are now keyword-only, so we can reposition
them later.
- The `data` parameter of `__init__` is now required.
- Rename `remove_columns_except` to `select_columns`
  - The new method can also be called with a callback that determines
which columns to select.
- Rename `add_table_as_columns` to `add_tables_as_columns`
  - Multiple tables can now be passed at once.
- Rename `add_table_as_rows` to `add_tables_as_rows`
  - Multiple tables can now be passed at once.
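The breaking changes listed above can be illustrated with a small, self-contained sketch. It does not use the library itself: `MiniTable`, `sort_by`, the `(name, values)` callback signature, and the variadic form of `add_tables_as_rows` are all hypothetical stand-ins chosen for illustration, and the real `Table` signatures may differ.

```python
class MiniTable:
    """Hypothetical stand-in for a table; NOT the library's `Table` class."""

    def __init__(self, data):
        # `data` has no default value, so it is required.
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def column_names(self):
        return list(self._data)

    def to_dict(self):
        return {name: list(values) for name, values in self._data.items()}

    def select_columns(self, selector):
        # `selector` is either a list of column names or a callback.
        # Assumption: the callback receives (name, values) and returns a bool.
        if callable(selector):
            names = [n for n, v in self._data.items() if selector(n, v)]
        else:
            names = list(selector)
        return MiniTable({n: self._data[n] for n in names})

    def sort_by(self, name, *, descending=False):
        # Hypothetical method. The bare `*` makes `descending` keyword-only,
        # so optional parameters can be reordered later without breaking callers.
        values = self._data[name]
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
        return MiniTable({n: [v[i] for i in order] for n, v in self._data.items()})

    def add_tables_as_rows(self, *others):
        # Assumption: several tables are passed as separate arguments.
        merged = self.to_dict()
        for other in others:
            if other.column_names != self.column_names:
                raise ValueError("All tables must have the same columns.")
            for name in merged:
                merged[name].extend(other.to_dict()[name])
        return MiniTable(merged)


table = MiniTable({"a": [3, 1, 2], "b": ["x", "y", "z"]})
only_b = table.select_columns(lambda name, values: name == "b")
sorted_desc = table.sort_by("a", descending=True)
tripled = table.add_tables_as_rows(table, table)
```

In this sketch, calling `table.sort_by("a", True)` raises a `TypeError`, which is the effect keyword-only parameters are meant to have.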

It also adds new functionality throughout the library:

- New method `Table.add_index_column` to add a new column with
auto-incrementing integer values to a table.
- New method `Table.filter_rows` to keep only the rows matched by some
predicate.
- New method `Table.filter_rows_by_column` to keep only the rows that
have a value in a specific column that matches some predicate.
- New parameter `random_seed` for `Table.shuffle_rows` and
`Table.split_rows` to control the pseudorandom number generator.
Previously, the methods were deterministic, but the seed was hidden.
- New parameter `missing_value_ratio_threshold` of
`Table.remove_columns_with_missing_values` to be able to keep columns
with only a few missing values.
- Various static factory methods under `ColumnType` to instantiate
column types. This prepares for #754.
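As a rough illustration of the `random_seed` and `missing_value_ratio_threshold` parameters described above, here is a minimal sketch on plain Python lists and dicts. The free functions below only borrow the names from this commit message; their signatures, the default values, and the inclusive comparison against the threshold are assumptions, not the library's implementation.

```python
import random


def shuffle_rows(rows, *, random_seed=0):
    # A dedicated generator seeded by the caller keeps the result reproducible
    # while making the seed visible instead of hidden.
    rng = random.Random(random_seed)
    shuffled = list(rows)  # copy, so the input is left untouched
    rng.shuffle(shuffled)
    return shuffled


def remove_columns_with_missing_values(columns, *, missing_value_ratio_threshold=0.0):
    # Keep a column if its share of missing values (None) does not exceed
    # the threshold. Assumption: the comparison is inclusive.
    kept = {}
    for name, values in columns.items():
        ratio = sum(v is None for v in values) / len(values) if values else 0.0
        if ratio <= missing_value_ratio_threshold:
            kept[name] = values
    return kept


rows = list(range(10))
first = shuffle_rows(rows, random_seed=42)
second = shuffle_rows(rows, random_seed=42)

columns = {"a": [1, None, 3, 4], "b": [1, 2, 3, 4]}
strict = remove_columns_with_missing_values(columns)
lenient = remove_columns_with_missing_values(columns, missing_value_ratio_threshold=0.25)
```

With the default threshold of `0.0`, any column containing a missing value is dropped; raising the threshold keeps columns with only a few gaps.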

Finally, the methods `Table.summarize_statistics` and
`Column.summarize_statistics` are now considerably faster.

---------

Co-authored-by: megalinter-bot <129584137+megalinter-bot@users.noreply.github.com>
lars-reimann added a commit that referenced this issue Jan 13, 2025
Closes partially #977

### Summary of Changes

- Narrow types of parameters and results for better type checking.
- Improve tests and documentation.
lars-reimann added a commit that referenced this issue Jan 14, 2025
Closes partially #754
Closes partially #977

### Summary of Changes

- Improve documentation for all methods of `Column`.
- Add the option to specify the column type when calling the
constructor. If omitted, it is still inferred from the data.
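The "explicit type, otherwise inferred" behavior can be sketched as follows. `MiniColumn`, `infer_type`, the parameter name `type_`, and the string type names are hypothetical; the library's `Column` and `ColumnType` are not used here and may behave differently.

```python
def infer_type(values):
    # Very small inference rule, for illustration only.
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return "null"
    if kinds == {bool}:
        return "boolean"
    if kinds == {int}:
        return "int"
    if kinds <= {int, float}:
        return "float"
    if kinds == {str}:
        return "string"
    return "object"


class MiniColumn:
    """Hypothetical stand-in for a column; NOT the library's `Column` class."""

    def __init__(self, name, data, *, type_=None):
        self.name = name
        self.data = list(data)
        # An explicitly passed type wins; otherwise fall back to inference.
        self.type = type_ if type_ is not None else infer_type(self.data)


inferred = MiniColumn("age", [1, 2, None])
explicit = MiniColumn("age", [1, 2, None], type_="float")
```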
@lars-reimann lars-reimann added this to the v1.0.0 milestone Jan 15, 2025
@lars-reimann lars-reimann added the labels enhancement 💡, breaking change ⚡ (May break client code), documentation 📖 (Improvements or additions to documentation), and testing 🧪 (Additional automated tests) Jan 15, 2025
lars-reimann added a commit that referenced this issue Jan 15, 2025
Closes partially #977

### Summary of Changes

- Add method `Cell.constant` to create a cell with a constant value from
a Python literal
- Add method `Cell.date` to create a cell with a date
- Add method `Cell.time` to create a cell with a time
- Add method `Cell.datetime` to create a cell with a datetime
- Add method `Cell.duration` to create a cell with a duration
- Add method `Cell.cast` to cast a cell to another type
- Improve type hints for cell operations

---------

Co-authored-by: megalinter-bot <129584137+megalinter-bot@users.noreply.github.com>
@lars-reimann lars-reimann changed the title from "Epic: Stabilize the API" to "Stabilize the API" Jan 16, 2025
lars-reimann added a commit that referenced this issue Jan 19, 2025
Closes partially #977

### Summary of Changes

- Operations are now wrapped into the classes
  - `DatetimeOperations` (formerly `TemporalCell`)
  - `DurationOperations`
  - `MathOperations`
  - `StringOperations` (formerly `StringCell`)
- Stabilize operations on datetime/date/time
- Stabilize operations on durations

---------

Co-authored-by: megalinter-bot <129584137+megalinter-bot@users.noreply.github.com>
lars-reimann added a commit that referenced this issue Jan 20, 2025
Closes partially #977

### Summary of Changes

Add many common mathematical operations (e.g. square root, logarithm,
sine) to the `Cell.math` namespace.
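The namespace pattern behind `Cell.math` (and the `*Operations` classes from the previous commit) can be sketched like this. `MiniCell` and `MathOps` are hypothetical, and they compute eagerly on a single float; the library's `Cell` and `MathOperations` are not used here, and their method names may differ.

```python
import math


class MathOps:
    """Hypothetical namespace object grouping math operations on a cell."""

    def __init__(self, cell):
        self._cell = cell

    def sqrt(self):
        return MiniCell(math.sqrt(self._cell.value))

    def log(self, base):
        return MiniCell(math.log(self._cell.value, base))

    def sin(self):
        return MiniCell(math.sin(self._cell.value))


class MiniCell:
    """Hypothetical stand-in for a cell holding a single number."""

    def __init__(self, value):
        self.value = value

    @property
    def math(self):
        # Operations live behind a property, so the cell class itself stays
        # small and related operations are grouped under one name.
        return MathOps(self)


root = MiniCell(9.0).math.sqrt()
sine = MiniCell(0.0).math.sin()
log2 = MiniCell(8.0).math.log(2)
```

Grouping operations this way keeps autocompletion on the cell short, at the cost of one extra attribute access per call.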