CU-8693htp2c: Add note to splitter regarding narrow scope
mart-r committed Jan 18, 2024
1 parent a451f50 commit dff513d
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions medcat/2_train_model/1_unsupervised_training/splitter.py
@@ -139,6 +139,17 @@ def split(self, in_file: str):


def split_file(in_file: str, nr_of_lines: int, out_file_format: str) -> None:
    """Splits a file into multiple files of the specified number of lines (or close to it).

    PS! This splitting is currently only designed for a narrow type of CSV files.
    This was created to split the MIMIC-III notes into parts. It may work with
    later MIMIC releases but is unlikely to work for other datasets.

    Args:
        in_file (str): _description_
        nr_of_lines (int): _description_
        out_file_format (str): _description_
    """
    opts = SplitOptions(lines_at_a_time=nr_of_lines,
                        out_file_format=out_file_format)
    split_identifier = SplitIdentifier()
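The docstring's "(or close to it)" caveat is easier to see with a small example. The sketch below is not the code in splitter.py; `SplitOptions` and `SplitIdentifier` are not shown in this diff, so nothing here reflects how they work. It is a hypothetical helper, `split_csv_text`, built on the standard `csv` module. It assumes that the part sizes are approximate because a record with a multi-line quoted field (as in free-text clinical notes) must not be cut in half, and it assumes the header row should be repeated in every part. It works on in-memory text rather than files, since the meaning of `out_file_format` is not documented in the commit.

```python
import csv
import io
from typing import List


def split_csv_text(text: str, nr_of_lines: int) -> List[str]:
    """Split CSV text into parts of roughly nr_of_lines physical lines each.

    Hypothetical illustration only. A part is closed only at a record
    boundary, so a quoted field spanning several lines is never divided
    between two parts. The header row is repeated at the top of each part.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)

    def new_part():
        # Start a fresh in-memory part that begins with the header row.
        buf = io.StringIO(newline="")
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        return buf, writer

    parts: List[str] = []
    buf, writer = new_part()
    lines_in_part = 0
    prev_line_num = reader.line_num  # physical lines consumed so far
    for row in reader:
        writer.writerow(row)
        # reader.line_num counts physical lines, so a multi-line record
        # contributes more than one line to the running total.
        lines_in_part += reader.line_num - prev_line_num
        prev_line_num = reader.line_num
        if lines_in_part >= nr_of_lines:
            parts.append(buf.getvalue())
            buf, writer = new_part()
            lines_in_part = 0
    if lines_in_part:
        parts.append(buf.getvalue())
    return parts


if __name__ == "__main__":
    sample = 'id,text\n1,"a\nb\nc"\n2,short\n3,"x\ny"\n'
    for i, part in enumerate(split_csv_text(sample, 3)):
        print(f"--- part {i} ---")
        print(part, end="")
```

Counting physical lines through `reader.line_num` rather than counting records is what makes the target approximate: a part can overshoot `nr_of_lines` when the last record it receives spans several lines.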
