This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Add Training Process for Nodule Detection and Classification - added customized datasets #300

Closed
wants to merge 11 commits

Conversation

swarm-ai
Contributor

Description

Using the process documented in Training/Readme, a developer can prepare custom datasets from radiologists who have annotated series of CT scans. The data should have lesion box annotations in a .csv file using the format specified, along with cancer/non-cancer labels. An example using a CT scan data set from a Taiwan-based clinic is included.
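Loading such an annotation file might look like the sketch below. The column names (`seriesuid`, `x_min`, …, `label`) are hypothetical placeholders, not the format actually specified in the Training/Readme:

```python
import csv
import io

# Hypothetical lesion-box annotation format: one bounding box per row.
# Column names are illustrative only; the real specified format may differ.
SAMPLE = """seriesuid,x_min,y_min,z_min,x_max,y_max,z_max,label
1.2.840.113,120,85,40,140,102,48,cancer
1.2.840.114,60,200,12,71,214,19,non-cancer
"""


def load_annotations(fileobj):
    """Parse lesion box annotations into dicts with numeric coordinates."""
    rows = []
    for row in csv.DictReader(fileobj):
        box = {k: float(row[k]) for k in
               ('x_min', 'y_min', 'z_min', 'x_max', 'y_max', 'z_max')}
        rows.append({'seriesuid': row['seriesuid'],
                     'label': row['label'],
                     'box': box})
    return rows


annotations = load_annotations(io.StringIO(SAMPLE))
print(len(annotations))         # 2
print(annotations[0]['label'])  # cancer
```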

Reference to official issue

Issue #130
Issue #131

Motivation and Context

The motivation is to increase the number of available training examples so that the concept-to-clinic classifier can handle complex lung cancer cases beyond those in the LUNA and LIDC data sets. We have seen improved model accuracy in a preliminary run using the additional data sets. A new model is currently being trained and is on epoch 80.

How Has This Been Tested?

We have run the training process using the LUNA, LIDC, and NSCLC-Radiomics data sets. The NSCLC-Radiomics data set contains 422 cases of non-small cell lung cancer. We labeled these data sets with lesion location information and cancer/non-cancer labels using the software Horos, then imported the data for training together with the LUNA16 and LIDC data sets. Here is a reference link to download the data sets: http://www.cibl-harvard.org/data

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

… the grt algorithm with improvements for handling custom data.

Using the documented process in the Readme, a developer can prepare custom datasets from radiologists who have annotated
series of CT scans. The data should have lesion box annotations in a .csv file using the format specified. An example
using a CT scan data set from a Taiwan-based clinic is included. The data should also have cancer/non-cancer labels.
def load_scan(dirpath):
    print('loading scan %s' % dirpath)

    if dirpath.startswith('s3://'):

Have you seen urlparse?
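For reference, `urllib.parse.urlparse` could replace the string-prefix check, which makes the scheme handling explicit. A minimal sketch (not the PR's actual code):

```python
from urllib.parse import urlparse


def is_s3_path(dirpath):
    """Return True when the path uses the s3:// scheme."""
    return urlparse(dirpath).scheme == 's3'


print(is_s3_path('s3://bucket/scans/case01'))  # True
print(is_s3_path('/data/scans/case01'))        # False
```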

@@ -0,0 +1,41 @@
function AddSegmentation(SegmentDataFolder, FolderDelimiter, BatchSize, ParFor_flag, IgnoreExisting_flag)

What's the reason for using another language here than Python?

return bw


def all_slice_analysis(bw, spacing, cut_num=0, vol_limit=[0.68, 8.2], area_th=6e3, dist_th=62):

Could you add a few docstrings so it's easier to grasp what the functions are expecting and doing? :)
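A docstring along these lines might help. The parameter descriptions below are guesses inferred from the names and default values, not taken from the implementation, so they would need to be verified against the actual code:

```python
def all_slice_analysis(bw, spacing, cut_num=0, vol_limit=[0.68, 8.2],
                       area_th=6e3, dist_th=62):
    """Filter connected components of a binary lung mask across all slices.

    Parameters (descriptions inferred from names/defaults; verify in code):
        bw: 3D boolean array, the candidate binary lung mask.
        spacing: voxel spacing in mm along each axis.
        cut_num: number of top slices to exclude from the analysis.
        vol_limit: [min, max] component volume (in liters) to keep.
        area_th: minimum per-slice component area threshold.
        dist_th: maximum allowed distance from the slice center, in mm.

    Returns:
        The filtered binary mask.
    """
```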

end_time = time.time()

print('elapsed time is %3.2f seconds' % (end_time - start_time))
print

I think you can achieve the same by appending two '\n' to the previous print statement :)


also, that last print statement will error in py3 since print is a function.
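Combining both suggestions, a Python-3-safe version might look like the sketch below. Since `print()` already appends one newline, a single extra `'\n'` in the format string reproduces the blank line that the bare py2 `print` statement produced:

```python
import time

start_time = time.time()
# ... the preprocessing / training work would run here ...
end_time = time.time()

# print is a function in Python 3; the trailing '\n' adds the blank
# line previously produced by the separate bare `print` statement.
print('elapsed time is %3.2f seconds\n' % (end_time - start_time))
```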

@WGierke
Contributor

WGierke commented Jan 26, 2018

Would you mind converting your code to comply with PEP8? There are a few things that need to be fixed according to flake8 and pycodestyle :)

@lamby
Contributor

lamby commented Jan 26, 2018

Thanks for the review @WGierke :)

@WGierke
Contributor

WGierke commented Jan 27, 2018

All data are resized to 1x1x1 mm, the luminance is clipped between -1200 and 600, scaled to 0-255, and converted to uint8. A mask that includes the lungs is calculated, and the luminance of every pixel outside the mask is set to 170. The results are stored in 'preprocess_result_path' defined in config_training.py along with their corresponding detection labels.

I think we already have those preprocessing steps. Converting the data to voxels, clipping the Hounsfield units that are soft tissue, and rescaling the image is a very common practice among the top solutions. Could you have a look at lung_segmentation.py and improved_lung_segmentation.py? There is already lots of logic there that might be useful for the steps you defined, I think :)
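The clipping/rescaling/masking step described in the quoted text can be sketched with NumPy as below. This is a generic illustration of the technique, not the code in lung_segmentation.py; the function name and signature are made up for the example:

```python
import numpy as np


def normalize_hu(volume, lung_mask, hu_min=-1200, hu_max=600, pad_value=170):
    """Clip HU values to [hu_min, hu_max], rescale to 0-255 uint8,
    and set voxels outside the lung mask to pad_value."""
    clipped = np.clip(volume, hu_min, hu_max)
    scaled = ((clipped - hu_min) / (hu_max - hu_min) * 255).astype(np.uint8)
    scaled[~lung_mask] = pad_value  # fill non-lung voxels with a constant
    return scaled


# Tiny 2x2 toy "volume" in Hounsfield units, with one voxel outside the mask.
vol = np.array([[-2000, -1200], [0, 600]], dtype=np.int16)
mask = np.array([[True, True], [True, False]])
print(normalize_hu(vol, mask))  # values below -1200 clip to 0; masked-out voxel becomes 170
```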

@isms
Contributor

isms commented Jan 28, 2018

@swarm-ai We'll need quite a bit more context for the PR. This is a big PR with very little reference to any of the pieces of the existing project.

The data should have lesion box annotations

What are these? Where do they come from? Could they be expected to come from new CT imagery without hand labeling?

An example using a CT scan data set from a Taiwan-based clinic is included.

This is extremely interesting, but it is hard to envision how to integrate this when it comes right before the end of the last phase.

@isms
Contributor

isms commented Jan 29, 2018

We've discussed internally, and have concluded that both of the following points are true:

  1. There is some really interesting and potentially helpful stuff in this PR.
  2. We can't accept the PR as-is and there is no apparent roadmap to acceptance.

We're going to close the PR but we encourage community members to use this as a resource to help inform model training and potentially other pieces of the application. The submission will be recognized for this aspect of contribution under the "Community" heading.

@isms isms closed this Jan 29, 2018
@swarm-ai
Contributor Author

Hi @isms, I only just saw these comments. Can you give me 1-2 days to work on resolving these issues?

@isms
Contributor

isms commented Jan 29, 2018

@swarm-ai You are more than welcome to keep working on the PR if you'd like but at this point it won't result in additional points. Feel free to email us directly if you have questions or concerns.
