aDNA metadata calculations #374

Open
MattiasSealander opened this issue Sep 30, 2024 · 2 comments
Labels: help wanted, new

Comments

@MattiasSealander

Among the data supplied by SciLifeLab is a count of the number of libraries connected to each sample. Because the data for each library is given in the library data sheet, this count could be derived on the fly by counting the library IDs for a sample, rather than being stored as a value in the DB.

I have contacted SciLifeLab to confirm that we will be getting all libraries for a sample in every case.

Should we count this on the fly, or is it better to store it as a value? It is metadata rather than a result, although I guess it is something aDNA specialists consider when evaluating the results. This ties into a bigger question of what should be calculated on the fly and what should be stored as values in the DB.

For clarification, we will be storing results for samples as well as for libraries. Currently, sample results and library results would be different datasets, unless this needs to be reconsidered for some reason.
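To make the "count on the fly" option concrete, here is a minimal sketch in Python. It assumes the library data sheet can be read as rows of (library ID, sample ID); the IDs, the row layout and the function name `count_libraries_per_sample` are invented for illustration and are not taken from the SciLifeLab data.

```python
from collections import Counter

# Hypothetical rows from the library data sheet: (library_id, sample_id).
# The IDs and the two-column layout are assumptions for illustration only.
library_rows = [
    ("lib-001", "sample-A"),
    ("lib-002", "sample-A"),
    ("lib-003", "sample-B"),
]


def count_libraries_per_sample(rows):
    """Return a Counter mapping sample_id to its number of distinct libraries."""
    unique_rows = set(rows)  # drop exact duplicate (library_id, sample_id) rows
    return Counter(sample_id for _library_id, sample_id in unique_rows)


library_counts = count_libraries_per_sample(library_rows)
print(library_counts["sample-A"])  # 2
print(library_counts["sample-B"])  # 1
```

The count derived this way is only as complete as the set of libraries delivered for each sample, which is why the confirmation from SciLifeLab matters.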

@johanvonboer
Collaborator

It is an interesting question. I'm not sure what is meant by 'library' in this context, but in general I would say that if the libraries themselves are something that I would need to send to the client, then it makes sense to just store the link (and not the count) between the data/value and the libraries.

A similar example is how we handle feature_types. We don't store that a physical sample has X number of a certain feature_type; instead we store the link (an array of feature_type IDs) in the physical_sample, and the definition of each feature_type is sent to the browser along with the physical_sample data. When the website is rendered, the number of feature_types for each sample and their definitions are looked up in real time from the data that then exists in the browser.
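A rough sketch of that link-based pattern, written in Python for brevity even though the real lookup happens in the browser. The payload shape, the field name `feature_type_ids`, the example definitions and the helper `resolve_feature_types` are all assumptions made for illustration, not the actual schema.

```python
# Hypothetical payload: definitions are sent once, keyed by feature_type ID.
feature_type_definitions = {
    1: {"name": "Pit"},
    2: {"name": "Hearth"},
}

# Each physical_sample stores only the links (IDs), never a count.
physical_samples = [
    {"id": 10, "feature_type_ids": [1, 2]},
    {"id": 11, "feature_type_ids": []},
]


def resolve_feature_types(sample, definitions):
    """Look up the definition of every feature_type linked from a sample."""
    return [definitions[ft_id] for ft_id in sample["feature_type_ids"]]


for sample in physical_samples:
    resolved = resolve_feature_types(sample, feature_type_definitions)
    # The count is derived at render time from the links themselves.
    print(sample["id"], len(resolved), [ft["name"] for ft in resolved])
```

Because the count is just the length of the link array, it cannot drift out of sync with the linked records.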

Would it make sense to handle this in a similar way or am I misunderstanding what 'libraries' are here?

@MattiasSealander
Author

Yes, it sounds like handling it the same way would work. TJ has answered that we will be getting all libraries (incl. those that didn't work well). A library is similar to a sample in that it has a bunch of variables with results (some categories overlap with the sample categories). You could say they separate the results into "sample results" and "library results", and one sample can have multiple libraries, as far as I understand it.

Right now, it seems that separating samples and libraries into different datasets is the best way, potentially using data_type_id to distinguish between the two.
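As a sketch of how that separation might look, the Python below keeps sample results and library results as separate datasets tagged with a data_type_id, and derives the library count for a sample by filtering. The ID values (1 and 2), the `Dataset` fields, the variable names in `values` and both helper functions are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field

# Hypothetical data_type_id values; the real IDs are not decided in this thread.
SAMPLE_RESULTS = 1
LIBRARY_RESULTS = 2


@dataclass
class Dataset:
    dataset_id: int
    data_type_id: int
    sample_id: str  # the sample this dataset belongs to
    values: dict = field(default_factory=dict)


datasets = [
    Dataset(1, SAMPLE_RESULTS, "sample-A", {"endogenous_pct": 12.5}),
    Dataset(2, LIBRARY_RESULTS, "sample-A", {"library_id": "lib-001"}),
    Dataset(3, LIBRARY_RESULTS, "sample-A", {"library_id": "lib-002"}),
    Dataset(4, LIBRARY_RESULTS, "sample-B", {"library_id": "lib-003"}),
]


def datasets_of_type(all_datasets, data_type_id):
    """Return only the datasets carrying the given data_type_id."""
    return [d for d in all_datasets if d.data_type_id == data_type_id]


def library_count(all_datasets, sample_id):
    """Count library datasets linked to a sample instead of storing the number."""
    libraries = datasets_of_type(all_datasets, LIBRARY_RESULTS)
    return sum(1 for d in libraries if d.sample_id == sample_id)


print(library_count(datasets, "sample-A"))  # 2
```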
