Refactor evaluation metrics to output Legolas table rows #45
Conversation
Great---this would be a huge improvement! As discussed in person, we'll try to get to this at some point in the not too distant future, and in the interim count not having this implementation yet as tech debt.
working on this now
@@ -1,10 +1,12 @@
 name = "Lighthouse"
 uuid = "ac2c24cd-07f0-4848-96b2-1b82c3ea0e59"
 authors = ["Beacon Biosignals, Inc."]
-version = "0.13.4"
+version = "0.14.0"
Bumping the version b/c we've changed the expected output for "failure" cases from missing to NaN for any metrics output types that show up in vectors (to aid in Arrow serialization).
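(For illustration, a minimal hedged sketch of that convention; the variable name and values below are made up, not taken from the PR.)

# Failures that live inside a vector-valued metric are encoded as NaN rather
# than missing, so the vector keeps a concrete Float64 element type and
# round-trips cleanly through Arrow:
per_class_kappas = [0.83, NaN, 0.91]  # NaN marks a class whose kappa couldn't be computed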
 TensorBoardLogger = "0.1"
-julia = "1.5"
+julia = "1.6"
Since we're making a breaking version bump, taking the opportunity to additionally bump the lowest supported Julia version.
src/Lighthouse.jl
@@ -18,6 +20,9 @@ export confusion_matrix, accuracy, binary_statistics, cohens_kappa, calibration_
 include("classifier.jl")
 export AbstractClassifier

+include("row.jl")
+# TODO: export EvaluationRow ?
Should we export EvaluationRow?
Suggested change:
-# TODO: export EvaluationRow ?
+export EvaluationRow
I think so, because it's needed to fix Arrow serialization issues - maybe we should add a comment about that too. Normally it's not important to pass through the Row type after reading from Arrow, but in this case it is, since we're using it to work around serialization issues.
Will do, although I'm not sure I understand the rationale?
So normally, if you do

table = Legolas.read(path_to_table)

then you get a table that serves your needs; you don't need to do any further processing to it (though you could pass it to a DataFrame for convenience, for example). But for a table of evaluation rows, your confusion matrices will be vectors in this case! So to get a "good" table, you should do

table = (EvaluationRow(row) for row in Tables.rows(Legolas.read(path_to_table)))

Then you've got a (row-oriented) table whose confusion matrix column has actual matrices. Then you can pass that to a DataFrame if you want.

Usually you don't need to pass through the Row type, but since we're using it to fix stuff, you do in this case.
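To make that concrete, here's a hedged sketch of the full round trip; path_to_table and eval_rows are placeholders, and the Legolas.write call is an assumption about the API rather than something shown in this thread:

using Legolas, Tables, DataFrames, Lighthouse

# write out a table of evaluation rows under the Legolas schema (assumed API)
Legolas.write(path_to_table, eval_rows, Legolas.Schema("lighthouse.evaluation@1"))

# reading it back yields a valid table, but matrix-valued columns
# (e.g. the confusion matrix) come back flattened to vectors...
raw = Legolas.read(path_to_table)

# ...so re-wrap each row as an EvaluationRow to restore actual matrices,
# then materialize a DataFrame if that's convenient:
df = DataFrame(EvaluationRow(row) for row in Tables.rows(raw))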
Ah, gotcha! Thanks for the explanation.
-(isnothing(votes) || size(votes, 2) < 2) && return nothing # no votes given or only one expert
+# no votes given or only one expert:
+(isnothing(votes) || size(votes, 2) < 2) &&
+    return (; per_class_IRA_kappas=missing, multiclass_IRA_kappas=missing)
In this case it's still okay to return missing (not NaN) b/c we aren't serializing these as part of a vector---the schema accounts for them.
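A hedged sketch of that distinction (this schema declaration is illustrative only, not the PR's actual lighthouse.evaluation@1 definition):

using Legolas

# Whole fields may legitimately be `missing`; the schema allows it directly.
# Values *inside* a vector, by contrast, use NaN so the vector stays Vector{Float64}.
const SketchRow = Legolas.@row("example.evaluation-sketch@1",
                               per_class_IRA_kappas::Union{Missing,Vector{Float64}},
                               multiclass_IRA_kappas::Union{Missing,Float64},
                               per_class_kappas::Vector{Float64})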
@@ -303,11 +326,9 @@ end

 # Test NaN spearman due to unranked input
 votes = [1; 2; 2]
 predicted_soft = [
...juliaformatter got hold of these matrices, and I didn't have the heart to undo the updates.
@ericphanson, I can't officially request your review here (since you were the originating author!), but want to take a review anyway?
I'm very excited for this! I think this will make things much nicer. (Cannot approve bc it's technically my own PR, but ✅ )
I wanted something like this for an internal project but ended up just processing the output myself. But maybe something like this would be useful here.
The idea is to return a Legolas.@row("lighthouse.evaluation@1", ...) row instead of a Dict from evaluation_metrics. Then if you run Lighthouse evaluation on e.g. several models with the same dataset or one model with several datasets, you can easily collect the results as rows in a structured table. See also #42 / #41 for other comparison stuff.
Status: I think the tests are failing and this started to feel like a waste of time so I switched focus, but could be picked up.
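A rough sketch of the workflow described above, assuming the refactor lands as proposed; the evaluate_model helper, its arguments, and the output path are hypothetical stand-ins for however you compute Lighthouse's metrics for one model/dataset pair:

using DataFrames, Legolas, Lighthouse

# one EvaluationRow per (model, dataset) run, via a hypothetical helper
rows = [evaluate_model(model, dataset) for model in models]

# rows share the lighthouse.evaluation@1 schema, so they stack into one table
df = DataFrame(rows)

# and can be serialized for later comparison across runs
Legolas.write("evaluations.arrow", df, Legolas.Schema("lighthouse.evaluation@1"))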