Pandas multi index #11
What a coincidence, I was just thinking about this last night. I'll definitely take a look at your code and see if we can roll it in. Another question: what do you think is the best format for storing DVH data on disk from multiple sources (i.e. Eclipse txt, DICOM, RayStation DVH)? Should it be CSV, Feather (not recommended for long-term storage: https://github.com/wesm/feather), NumPy arrays written to disk, or something else?
I initially investigated pandas DataFrame panels, but a feature of the multi-index that I prefer is that it is a flat structure which can be written to CSV. There is a bit of an issue if you start merging multi-index DataFrames with different indexes, which can result in lots of NaN values. Therefore, in my DVH import function I interpolate the DVH data to put it all on a common index. I usually just write data to .csv, and this has been fine for files containing data from up to 20 patients. Perhaps these files become unwieldy at much larger sizes and another format would be preferable, but I haven't had a need to investigate that. I like .csv since it is easy to review the data in Excel and to share with colleagues. I have used pickle for storing NumPy data in the past.
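The interpolate-onto-a-common-index approach described above can be sketched as follows. This is a minimal illustration, not code from pyEclipseDVH: the patient/structure labels, dose values, and grid spacing are all hypothetical, and `np.interp` stands in for whatever interpolation the real import function uses.

```python
import numpy as np
import pandas as pd

# Hypothetical raw cumulative DVH curves: each (patient, structure) pair
# has its own dose grid, so they cannot share one DataFrame index yet.
raw_dvhs = {
    ("pt01", "PTV"): (np.array([0.0, 10.0, 30.0, 60.0]),
                      np.array([100.0, 98.0, 60.0, 0.0])),
    ("pt02", "PTV"): (np.array([0.0, 20.0, 40.0, 62.0]),
                      np.array([100.0, 90.0, 40.0, 0.0])),
}

# Common dose index: 0 to 70 Gy in 0.1 Gy steps, rounded so that values
# like 65.0 can be looked up exactly despite float arithmetic.
common_dose = np.round(np.arange(0.0, 70.1, 0.1), 1)

columns = {}
for (patient, structure), (dose, volume) in raw_dvhs.items():
    # Interpolate each curve onto the shared grid; beyond the last dose
    # point the cumulative volume is clamped to 0%.
    columns[(patient, structure)] = np.interp(common_dose, dose, volume,
                                              right=0.0)

df = pd.DataFrame(columns, index=pd.Index(common_dose, name="dose_gy"))
df.columns = pd.MultiIndex.from_tuples(df.columns,
                                       names=["patient", "structure"])

# Because every curve now shares one index, merging introduces no NaN,
# and the frame stays flat enough to write straight to CSV.
df.to_csv("dvh_all.csv")
```

Keeping the dose axis as the row index and (patient, structure) as MultiIndex columns is what keeps the CSV flat and Excel-friendly.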
Yes, I have done something like yours in the past and ended up interpolating on a common index, which made it a lot easier to plot. Ah, I was thinking to store one CSV per patient and do the pd.merge or pd.concat within Python. So CSV is probably the way to go. Thanks!
I think a nice way to go would be to have a function within dicompyler to export DVHs as .csv in a common format, regardless of whether the data was loaded from .dcm, .txt, etc. That format should be a flat .csv which, when loaded into pandas, comes in as a multi-index. The user could choose whether to export the original data or data interpolated onto a common index. I'm happy for you to use my code for parsing the Eclipse DVH .txt files if that's functionality that doesn't exist.
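The flat-CSV-to-MultiIndex round trip mentioned above works with pandas out of the box: `to_csv` writes the two column levels as two header rows, and `read_csv` with `header=[0, 1]` recovers them. A sketch with invented values (not the proposed dicompyler export itself):

```python
import io
import pandas as pd

# Hypothetical export: a frame with (patient, structure) MultiIndex
# columns and a dose index, written to a flat CSV.
cols = pd.MultiIndex.from_tuples(
    [("pt01", "PTV"), ("pt01", "Rectum")],
    names=["patient", "structure"],
)
exported = pd.DataFrame(
    [[100.0, 100.0], [98.0, 80.0]],
    index=pd.Index([0.0, 10.0], name="dose_gy"),
    columns=cols,
)
buf = io.StringIO()
exported.to_csv(buf)  # two header rows, then one flat data row per dose

# ...and read back: two header rows rebuild the column MultiIndex, and
# column 0 becomes the dose index again.
buf.seek(0)
loaded = pd.read_csv(buf, header=[0, 1], index_col=0)
```

So the exported file stays reviewable in Excel, while a pandas user gets the multi-index back with one `read_csv` call.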
Hi, I have written a short script (pyEclipseDVH.py) to parse dvh.txt files exported from Eclipse into pandas multi-index DataFrame objects. I have found this to be a very efficient way to work with DVH data from multiple patients, as required for my study. I mention this since it would be a nice addition to dicompyler-core to provide a convenience function to return DVH data in the multi-index DataFrame format for these kinds of studies.
Example of its use here https://github.com/robmarkcole/Useful-python-for-medical-physics/blob/master/Experiments%20in%20ipython%20notebooks/pyEclipseDVH/MultiIndex%203-3-17/Demo%20pyEclipseDVH_v2%203-3-2017.ipynb
Cheers