Pandas multi index #11

Open
robmarkcole opened this issue Mar 9, 2017 · 4 comments

Comments

@robmarkcole

robmarkcole commented Mar 9, 2017

Hi, I have written a short script (pyEclipseDVH.py) that parses dvh.txt files exported from Eclipse into pandas MultiIndex DataFrame objects. I have found this a very efficient way to work with DVH data from multiple patients, as my study requires. I mention it because a convenience function in dicompyler-core that returns DVH data as a MultiIndex DataFrame would be a nice addition for this kind of study.
Example of its use here: https://github.com/robmarkcole/Useful-python-for-medical-physics/blob/master/Experiments%20in%20ipython%20notebooks/pyEclipseDVH/MultiIndex%203-3-17/Demo%20pyEclipseDVH_v2%203-3-2017.ipynb
Cheers
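
For readers unfamiliar with the idea, here is a minimal sketch of DVH data held in a DataFrame with MultiIndex columns. It is not taken from pyEclipseDVH.py; the patient IDs, structure names, level names, and numbers are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative DVHs: dose axis in Gy, volume in % per structure.
dose = np.array([0.0, 10.0, 20.0, 30.0])
dvhs = {
    ("patient_01", "PTV"): [100.0, 100.0, 95.0, 10.0],
    ("patient_01", "Lung"): [100.0, 40.0, 15.0, 0.0],
    ("patient_02", "PTV"): [100.0, 99.0, 90.0, 5.0],
}

# Tuple keys in the dict become the levels of a column MultiIndex.
df = pd.DataFrame(dvhs, index=pd.Index(dose, name="dose_gy"))
df.columns.names = ["patient", "structure"]  # level names are an assumption

# Cross-section: the PTV curve of every patient, one column per patient.
ptv = df.xs("PTV", axis=1, level="structure")
print(ptv)
```

The appeal for multi-patient studies is that a single `xs` call pulls out the same structure across the whole cohort.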


@bastula
Member

bastula commented Mar 9, 2017

What a coincidence, I was just thinking about this last night. I'll definitely take a look at your code and see if we can roll it in.

Another question: what do you think is the best format for storing DVH data on disk from multiple sources (e.g. Eclipse txt, DICOM, Raystation DVH)? Should it be CSV, Feather (not recommended for long-term storage: https://github.com/wesm/feather), NumPy arrays saved to disk, or something else?
See: http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization for an overview of options.

@robmarkcole
Author

I initially investigated pandas Panels, but a feature of MultiIndex that I prefer is that it is a flat structure that can be written to CSV. One issue is that merging MultiIndex dataframes with different indexes can produce a lot of NaN values. My DVH import function therefore interpolates the DVH data onto a common index.
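
To illustrate the NaN issue and the interpolation fix, here is a rough sketch using `numpy.interp`. It is not the actual import function; the `resample` helper, the 5 Gy grid spacing, and the choice to treat volume as zero beyond the last sampled dose are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Two hypothetical cumulative DVHs sampled on different dose grids (Gy, % volume).
a = pd.Series([100.0, 80.0, 0.0], index=[0.0, 10.0, 20.0], name="patient_01")
b = pd.Series([100.0, 50.0, 0.0], index=[0.0, 15.0, 30.0], name="patient_02")

# Naive merge: the result uses the union of both indexes, so each curve
# has NaN wherever only the other one was sampled.
naive = pd.concat([a, b], axis=1)

# Common dose axis (assumed 5 Gy spacing, covering both curves).
common = np.arange(0.0, 31.0, 5.0)


def resample(dvh, grid):
    """Linearly interpolate a DVH onto `grid` (hypothetical helper)."""
    # right=0.0 assumes no volume receives more than the last sampled dose.
    values = np.interp(grid, dvh.index.to_numpy(), dvh.to_numpy(), right=0.0)
    return pd.Series(values, index=grid, name=dvh.name)


aligned = pd.concat([resample(a, common), resample(b, common)], axis=1)
print(naive.isna().sum().sum(), aligned.isna().sum().sum())
```

Linear interpolation is only one option; whether it is appropriate depends on how finely the original DVHs were sampled.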

I usually just write the data to .csv, and this has been fine for files containing data from up to 20 patients. Perhaps these files become unstable at much larger sizes and another format would be preferable, but I haven't needed to investigate that. I like .csv because it is easy to review the data in Excel and to share with colleagues. I have used pickle for storing NumPy data in the past.

@bastula
Member

bastula commented Mar 9, 2017 via email

@robmarkcole
Author

I think a nice way to go would be a function within dicompyler that exports the DVH as .csv in a common format, regardless of whether the data was loaded from .dcm, .txt, etc. That format should be a flat .csv which, when loaded into pandas, comes in as a multi-index. The user could choose whether to export the original data or the data interpolated onto a common index. I'm happy for you to use my code for parsing the Eclipse DVH.txt files if that functionality doesn't already exist.
Cheers
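
As one possible shape for such a file, here is a sketch of a flat, long-format CSV that pandas reads back as a MultiIndex. The column names (`patient`, `structure`, `dose_gy`, `volume_pct`) and the values are hypothetical, not an agreed dicompyler format, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# One row per (patient, structure, dose) sample; all values are made up.
records = [
    ("patient_01", "PTV", 0.0, 100.0),
    ("patient_01", "PTV", 20.0, 95.0),
    ("patient_01", "Lung", 0.0, 100.0),
    ("patient_01", "Lung", 20.0, 15.0),
    ("patient_02", "PTV", 0.0, 100.0),
    ("patient_02", "PTV", 20.0, 90.0),
]
flat = pd.DataFrame(
    records, columns=["patient", "structure", "dose_gy", "volume_pct"]
)

# Write a plain CSV that opens cleanly in a spreadsheet.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
buffer.seek(0)

# Reading with several index columns yields a MultiIndex on the rows.
dvh = pd.read_csv(buffer, index_col=["patient", "structure", "dose_gy"])
dvh = dvh.sort_index()

# Pull out a single curve by patient and structure.
ptv_02 = dvh.xs(("patient_02", "PTV"), level=["patient", "structure"])
print(ptv_02["volume_pct"])
```

A long layout like this tolerates structures sampled on different dose grids, at the cost of a larger file than one column per curve.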
