Pandas multi index #11

robmarkcole · 2017-03-09T09:43:09Z

Hi, I have written a short script (pyEclipseDVH.py) to parse dvh.txt files exported from Eclipse into Pandas MultiIndex dataframe objects. I have found this to be a very efficient way to work with DVH data from multiple patients, as required for my study. I mention it because a convenience function that returns DVH data in the MultiIndex dataframe format would be a nice addition to dicompyler-core for these kinds of studies.
Example of its use here: https://github.com/robmarkcole/Useful-python-for-medical-physics/blob/master/Experiments%20in%20ipython%20notebooks/pyEclipseDVH/MultiIndex%203-3-17/Demo%20pyEclipseDVH_v2%203-3-2017.ipynb
Cheers


bastula · 2017-03-09T17:42:44Z

What a coincidence, I was just thinking about this last night. I'll definitely take a look at your code and see if we can roll it in.

Another question: what do you think is the best format for storing DVH data on disk from multiple sources (e.g. Eclipse txt, DICOM, RayStation DVH)? Should it be CSV, Feather (not recommended for long-term storage: https://github.com/wesm/feather), NumPy arrays on disk, or something else?
See http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization for an overview of options.

robmarkcole · 2017-03-09T18:07:38Z

I initially investigated pandas Panels, but a feature of multi-index that I prefer is that it is a flat structure which can be written to csv. There is a bit of an issue if you start merging multiindex dataframes with different indexes, which can result in lots of NaNs. Therefore, in my DVH import function I interpolate the DVH data to put them all on a common index.
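
As a rough sketch of the interpolation step described above (this is not the actual pyEclipseDVH code; the patient names, dose grids, values and the helper `to_common_index` are made up for illustration), resampling each curve onto one shared dose axis avoids the NaNs that appear when curves with different indexes are joined directly:

```python
import numpy as np
import pandas as pd


def to_common_index(dose, volume, common_dose):
    """Linearly interpolate one DVH curve onto a shared dose axis."""
    # Assumption: volume is taken as 0 beyond the last sampled dose point.
    return np.interp(common_dose, dose, volume, right=0.0)


# Hypothetical cumulative DVH curves sampled on different dose grids
# (dose in Gy, volume in %).
raw = {
    ("patient_1", "PTV"): (np.array([0.0, 10.0, 20.0]),
                           np.array([100.0, 80.0, 0.0])),
    ("patient_2", "PTV"): (np.array([0.0, 5.0, 15.0, 25.0]),
                           np.array([100.0, 95.0, 40.0, 0.0])),
}

# Joining the raw curves directly aligns on the union of both dose grids,
# so every dose point missing from one curve becomes NaN.
naive = pd.concat([pd.Series(v, index=d) for d, v in raw.values()], axis=1)

# Interpolating first puts every curve on the same index.
common_dose = np.arange(0.0, 30.0, 5.0)  # 0, 5, ..., 25 Gy
columns = pd.MultiIndex.from_tuples(list(raw), names=["patient", "structure"])
data = np.column_stack(
    [to_common_index(d, v, common_dose) for d, v in raw.values()]
)
dvh = pd.DataFrame(
    data, index=pd.Index(common_dose, name="dose_gy"), columns=columns
)

print(naive.isna().sum().sum(), "NaNs without interpolation")
print(dvh)
```

The spacing of the common grid is a free choice here; a finer grid keeps more of the original curve shape at the cost of larger files.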

I usually just write data to .csv and this has been fine for files containing data of up to 20 patients. Perhaps these files get unstable at much larger sizes and another format would be preferable, but I haven't had a need to investigate that. I like .csv since it is easy to review the data in Excel and to share with colleagues. I have used pickle for storing numpy data in the past.

bastula · 2017-03-09T18:14:10Z

Yes, I have done something like yours in the past and ended up interpolating onto a common index, which made it a lot easier to plot. I was thinking of storing one CSV per patient and doing the pd.merge or pd.concat within Python. So CSV is probably the way to go. Thanks!
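
A minimal sketch of the one-CSV-per-patient idea, assuming each file is named after the patient, has a `dose_gy` column and one column per structure, and that all files already share the same dose grid (the file layout, the values and the helper `load_patient_csvs` are assumptions for illustration, not an existing dicompyler-core function):

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_patient_csvs(folder):
    """Read every <patient_id>.csv in `folder` into one MultiIndex frame."""
    frames = {
        path.stem: pd.read_csv(path, index_col="dose_gy")
        for path in sorted(Path(folder).glob("*.csv"))
    }
    # The dict keys (patient ids) become the outer column level and the
    # structure names from each file become the inner level.
    combined = pd.concat(frames, axis=1)
    combined.columns.names = ["patient", "structure"]
    return combined


# Write two small hypothetical per-patient files, then combine them.
with tempfile.TemporaryDirectory() as tmp:
    dose = pd.Index([0.0, 10.0, 20.0], name="dose_gy")
    pd.DataFrame(
        {"PTV": [100.0, 90.0, 5.0], "Bladder": [100.0, 30.0, 0.0]}, index=dose
    ).to_csv(Path(tmp) / "patient_1.csv")
    pd.DataFrame(
        {"PTV": [100.0, 85.0, 2.0], "Bladder": [100.0, 45.0, 1.0]}, index=dose
    ).to_csv(Path(tmp) / "patient_2.csv")

    combined = load_patient_csvs(tmp)

print(combined)
```

If the per-patient files were on different dose grids, the concat would still work but would fill the gaps with NaN, which is where interpolating onto a common index first helps.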


robmarkcole · 2017-03-13T14:21:58Z

I think a nice way to go would be to have a function within dicompyler to export DVH as .csv in a common format regardless of whether the data was loaded from .dcm, .txt etc. That format should be a flat .csv which, when loaded into pandas, comes in as a multi-index. The user could choose whether to export the original data or the data interpolated onto a common index. I'm happy for you to use my code for parsing the Eclipse DVH.txt files if that functionality doesn't exist yet.
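
To illustrate what such a flat .csv could look like, here is a sketch of a round trip using plain pandas. The function names `export_dvh_csv` and `load_dvh_csv`, the level names and the values are hypothetical and not part of dicompyler-core; only `DataFrame.to_csv` and `pandas.read_csv` are real calls:

```python
import io

import numpy as np
import pandas as pd


def export_dvh_csv(dvh, target):
    """Write a frame with (patient, structure) columns as a flat CSV."""
    # pandas writes one header row per column level, then the index name row.
    dvh.to_csv(target)


def load_dvh_csv(source):
    """Read the flat CSV back with the two header rows as a MultiIndex."""
    return pd.read_csv(source, header=[0, 1], index_col=0)


# Hypothetical data: two patients, already on a common dose index.
columns = pd.MultiIndex.from_tuples(
    [("patient_1", "PTV"), ("patient_1", "Bladder"), ("patient_2", "PTV")],
    names=["patient", "structure"],
)
dvh = pd.DataFrame(
    [[100.0, 100.0, 100.0], [90.0, 30.0, 85.0], [5.0, 0.0, 2.0]],
    index=pd.Index([0.0, 10.0, 20.0], name="dose_gy"),
    columns=columns,
)

buffer = io.StringIO()  # stands in for a file on disk
export_dvh_csv(dvh, buffer)
buffer.seek(0)
restored = load_dvh_csv(buffer)

print(buffer.getvalue())
print(restored["patient_1"])
```

The file stays readable in a spreadsheet (two header rows, one row per dose point), and selecting `restored["patient_1"]` gives all structures for that patient.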
Cheers
