
Refactor evaluation metrics to output Legolas table rows #45

Merged · 24 commits · Mar 18, 2022

Conversation

@ericphanson (Member) commented Dec 2, 2021

I wanted something like this for an internal project but ended up just processing the output myself. But maybe something like this would be useful here.

The idea is to return a Legolas.@row("lighthouse.evaluation@1",...) row instead of a Dict from evaluation_metrics. Then if you run Lighthouse evaluation on e.g. several models with the same dataset or one model with several datasets, you can easily collect the results as rows in a structured table.
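For concreteness, here's a rough sketch of what such a row type could look like; the field names, types, example values, and output path below are illustrative, not the exact set this PR ends up defining:

using Legolas

# Hypothetical schema fields; the real EvaluationRow would declare many more metrics.
const EvaluationRow = Legolas.@row("lighthouse.evaluation@1",
                                   multiclass_kappa::Union{Missing,Float64},
                                   per_class_kappas::Union{Missing,Vector{Float64}},
                                   confusion_matrix::Union{Missing,Matrix{Int64}})

# Evaluating e.g. two models on the same dataset then yields one row per model,
# and the rows can be collected into a single Arrow-serializable table:
rows = [EvaluationRow(; multiclass_kappa=0.7, per_class_kappas=[0.8, 0.6], confusion_matrix=[3 1; 0 2]),
        EvaluationRow(; multiclass_kappa=0.9, per_class_kappas=[0.95, 0.85], confusion_matrix=[4 0; 1 1])]
Legolas.write("evaluations.arrow", rows, Legolas.Schema("lighthouse.evaluation@1"))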

See also #42 / #41 for other comparison stuff.

Status: I think the tests are failing; this started to feel like a waste of time, so I switched focus, but it could be picked up.

@hannahilea (Contributor)

Great! This would be a huge improvement. As discussed in person, we'll try to get to this at some point in the not-too-distant future, and in the interim we'll count not having this implementation yet as tech debt.

@hannahilea self-assigned this Mar 15, 2022
@hannahilea (Contributor)

Working on this now.

@ericphanson mentioned this pull request Mar 15, 2022
@hannahilea (Contributor) commented Mar 17, 2022

Project.toml:

@@ -1,10 +1,12 @@
 name = "Lighthouse"
 uuid = "ac2c24cd-07f0-4848-96b2-1b82c3ea0e59"
 authors = ["Beacon Biosignals, Inc."]
-version = "0.13.4"
+version = "0.14.0"
@hannahilea (Contributor)
Bumping the version because we've changed the expected output for "failure" cases from missing to NaN for any metric output types that show up in vectors (to aid Arrow serialization).
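To illustrate the difference (values made up; per_class_kappas stands in for any vector-valued metric):

# Pre-0.14: a class that couldn't be scored produced `missing` inside the vector,
# forcing an element type of Union{Missing,Float64}:
per_class_kappas = [0.81, missing, 0.64]
# 0.14: the same failure is encoded as NaN, keeping a concrete Vector{Float64}
# that round-trips more cleanly through Arrow:
per_class_kappas = [0.81, NaN, 0.64]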

 TensorBoardLogger = "0.1"
-julia = "1.5"
+julia = "1.6"
@hannahilea (Contributor)
Since we're making a breaking version bump, taking the opportunity to additionally bump the lowest supported Julia version.

src/Lighthouse.jl:

@@ -18,6 +20,9 @@ export confusion_matrix, accuracy, binary_statistics, cohens_kappa, calibration_
 include("classifier.jl")
 export AbstractClassifier
 
+include("row.jl")
+# TODO: export EvaluationRow ?
@hannahilea (Contributor)
Should we export EvaluationRow?

Suggested change:

-# TODO: export EvaluationRow ?
+export EvaluationRow

@ericphanson (Member, Author)
I think so, because it's needed to fix Arrow serialization issues; maybe we should add a comment about that too. Normally it's not important to pass through the row type after reading from Arrow, but in this case it is, since we're using it to work around serialization issues.

@hannahilea (Contributor)
Will do, although I'm not sure I understand the rationale?

@ericphanson (Member, Author)
So normally, if you do

table = Legolas.read(path_to_table)

then you get a table that serves your needs; you don't need to do any further processing to it (though you could pass it to a DataFrame for convenience, for example). But for a table of evaluation rows, your confusion matrices will be vectors in this case! So to get a "good" table, you should do

table = (EvaluationRow(row) for row in Tables.rows(Legolas.read(path_to_table)))

Then you've got a (row-oriented) table whose confusion-matrix column has actual matrices. You can then pass that to a DataFrame if you want.

Usually you don't need to pass through the row type, but since we're using it to fix the serialization issue, you do in this case.
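Putting that together, a quick sketch of the round trip (here rows is assumed to be some collection of EvaluationRows, and path_to_table is as above):

using DataFrames, Legolas, Tables

Legolas.write(path_to_table, rows, Legolas.Schema("lighthouse.evaluation@1"))

raw = Legolas.read(path_to_table)                         # confusion-matrix column comes back as vectors
table = (EvaluationRow(row) for row in Tables.rows(raw))  # re-wrapping restores actual matrices
df = DataFrame(table)                                     # optional: materialize as a DataFrame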

@hannahilea (Contributor)
Ah, gotcha! Thanks for the explanation.

-(isnothing(votes) || size(votes, 2) < 2) && return nothing # no votes given or only one expert
+# no votes given or only one expert:
+(isnothing(votes) || size(votes, 2) < 2) &&
+    return (; per_class_IRA_kappas=missing, multiclass_IRA_kappas=missing)
@hannahilea (Contributor)
In this case, it's still okay to return missing rather than NaN, because we aren't serializing these as part of a vector; the schema accounts for them.

@@ -303,11 +326,9 @@ end

# Test NaN spearman due to unranked input
votes = [1; 2; 2]
predicted_soft = [
@hannahilea (Contributor)
...juliaformatter got hold of these matrices, and I didn't have the heart to undo the updates.

@hannahilea marked this pull request as ready for review March 18, 2022 14:50
@hannahilea (Contributor)
@ericphanson, I can't officially request your review here (since you were the originating author!), but do you want to review it anyway?

@ericphanson (Member, Author) left a comment

I'm very excited for this! I think this will make things much nicer. (I can't approve because it's technically my own PR, but ✅)

@hannahilea (Contributor) left a comment
