How are we going to optimize model parameters? #44

Open · matsen opened this issue Oct 26, 2012 · 13 comments

@matsen (Contributor) commented Oct 26, 2012

Connor's option -- do MH moves that affect all particles at once.

Erick's option -- for each particle, with some probability, propose an MH move on the rate parameters. The particle is duplicated when a move is accepted. A rough sketch of this is below.
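
To make Erick's option concrete, here is a minimal Python sketch under stated assumptions: Particle objects with a .rates attribute, and the log_likelihood, log_prior, and propose_rates callables are all hypothetical placeholders, not the sts API, and the proposal is assumed symmetric so no Hastings correction appears.

```python
import copy
import math
import random

def per_particle_rate_mh(particles, log_likelihood, log_prior, propose_rates,
                         move_prob=0.2):
    """With probability move_prob, propose new rate parameters for a particle.
    An accepted proposal is applied to a copy, so the particle stops sharing
    rate parameters (and cached peels) with its siblings."""
    out = []
    for p in particles:
        if random.random() < move_prob:
            candidate = copy.deepcopy(p)          # duplicate on acceptance
            candidate.rates = propose_rates(p.rates)
            log_ratio = (log_likelihood(candidate) + log_prior(candidate.rates)
                         - log_likelihood(p) - log_prior(p.rates))
            if math.log(random.random()) < log_ratio:
                out.append(candidate)
                continue
        out.append(p)
    return out
```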

@koadman (Contributor) commented Oct 26, 2012

Aaron's option -- eating Chinese with Connor and Erick, quantum-superposition-style. For parameter X, keep a fixed number k of global parameter "tables" x_1...x_k. Each particle keeps a vector V of length k that encodes the probability it is sitting at each of the k tables. During the extension step, individual particles can propose changes to V. BEAGLE has support in calculateRootLogLikelihoods() for setting arbitrary weights on rate categories and state frequency categories, which I think allows particles with shared history to have different V, and updates to V will not require a full re-peel. In between generations we can do moves that change the x_1...x_k; these affect all particles equally and won't force us to delete shared history among particles, although we will have to re-peel every node in the shared history forest. A sketch of the mixture likelihood is below.
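
A sketch of how a particle's likelihood could be assembled from its table weights, assuming a hypothetical per_table_loglike(particle, table) helper that returns the log-likelihood of the particle conditioned on one table's parameter values. This mirrors, in spirit, the per-category weighting that calculateRootLogLikelihoods() exposes; it is not a BEAGLE call.

```python
import math

def mixture_loglike(particle, tables, V, per_table_loglike):
    """log( sum_j V[j] * L(particle | tables[j]) ), computed in log space.
    V is the particle's probability vector over the k global tables and is
    assumed to sum to one with at least one positive entry."""
    terms = [math.log(V[j]) + per_table_loglike(particle, tables[j])
             for j in range(len(tables)) if V[j] > 0.0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

A proposal that only changes V re-weights per-table terms that are already computed, which is why no re-peel is needed; a move on the x_j themselves invalidates every cached per-table likelihood, hence the full re-peel of the shared history forest.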

@matsen (Contributor, Author) commented Oct 26, 2012

Erick's other idea: Chinese musical chairs. There is a fixed set of k global parameter tables, and every particle is sitting at exactly one table (no quantum superposition here). When the music stops, every x generations, each particle moves to the table that has the highest likelihood for it. Tables without many particles are composted; to replace them, we sample from the active tables and add perturbations. A sketch of the reassignment step is below.

If we were clever with BEAGLE, would it save time to peel all of the particles at a given table simultaneously?
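
A rough sketch of the reassignment step, with hypothetical loglike(particle, table) and perturb(table) helpers (placeholders again, not sts code):

```python
import random
from collections import Counter

def musical_chairs(particles, tables, loglike, perturb, min_occupancy=2):
    """Move each particle to its best table, compost sparsely occupied
    tables, and refill them with perturbed copies of surviving tables."""
    assignment = {id(p): max(range(len(tables)),
                             key=lambda j: loglike(p, tables[j]))
                  for p in particles}
    counts = Counter(assignment.values())
    survivors = [j for j, c in counts.items() if c >= min_occupancy] or list(counts)
    for j in range(len(tables)):
        if j not in survivors:
            tables[j] = perturb(tables[random.choice(survivors)])  # compost + replace
    return assignment, tables
```

Grouping particles by their table assignment would also be the natural place to try batching the per-table peels in BEAGLE, if that turns out to save time.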


@koadman (Contributor) commented Oct 26, 2012

One thing I'm not sure about is what a "global" MH move looks like. Normally the MH acceptance ratio compares the probabilities of the current and proposed values for a single state. What would it look like for a population of states? The product of that ratio over all states (see the sketch below)?
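
For what it's worth, one literal reading of "product of that ratio over all states", written out as a sketch. The log_post(particle, theta) helper is hypothetical, the proposal is assumed symmetric, and whether this product is the right target for a weighted particle population is exactly the open question.

```python
import math
import random

def global_mh_accept(particles, theta_current, theta_proposed, log_post):
    """Accept a shared-parameter proposal using the product of per-particle
    ratios, i.e. the sum of per-particle log differences."""
    log_ratio = sum(log_post(p, theta_proposed) - log_post(p, theta_current)
                    for p in particles)
    return math.log(random.random()) < log_ratio
```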

@koadman (Contributor) commented Oct 26, 2012

In Chinese musical chairs, when two particles with shared history move to different tables, would that require us to duplicate their shared history?

@matsen (Contributor, Author) commented Oct 26, 2012

Yes. Which motivates quantum Chinese musical chairs?


@koadman (Contributor) commented Oct 26, 2012

Not necessarily; quantum Chinese demands calculating likelihoods for all tables at each node in the history, all the time -- which might be more work.

@koadman (Contributor) commented Dec 10, 2012

Thinking on this thread again. One further possibility occurred to me, though I'm not sure whether it's a good idea (feedback please!): at each generation it might be possible to make an ML estimate of the gamma (or other continuous) parameters. This would work by doing something like Brent optimization, similar to what happens in FastTree, and the ML parameter estimates would then be used for all particles at the next generation. The result would be a sort of SMC-within-EM algorithm, with EM for the continuous parameters; a sketch is below. Possible problems include early generations being too uninformative for continuous-parameter estimation, local-maximum traps, and no estimates of uncertainty. The potential advantage is greater efficiency by preserving shared particle history. This approach may be quite similar to what Connor was suggesting, but without the MH steps?
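
A minimal sketch of the per-generation optimization, assuming a hypothetical loglike(particle, alpha) that re-evaluates a particle under discrete-gamma shape alpha, and using SciPy's bounded Brent-style scalar minimizer in place of a hand-rolled Brent routine:

```python
from scipy.optimize import minimize_scalar

def ml_gamma_shape(particles, weights, loglike, bounds=(0.05, 20.0)):
    """Return the shape alpha maximizing the weighted particle log-likelihood;
    the result would be shared by all particles at the next generation."""
    def neg_weighted_loglike(alpha):
        return -sum(w * loglike(p, alpha) for p, w in zip(particles, weights))
    res = minimize_scalar(neg_weighted_loglike, bounds=bounds, method="bounded")
    return res.x
```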

@matsen (Contributor, Author) commented Dec 11, 2012

What bothers me about this is that all of the particles are constrained to have the same parameters. I can imagine a restaurant-style extension as a revision to this if it seems promising, though.


@koadman (Contributor) commented Dec 11, 2012

Yet another idea: a hierarchical prior on rates. Each branch would have its own set of discrete gamma rates, with parameters drawn from some prior distribution. This might reduce the total amount of peeling required, because particles at earlier generations could keep their own gamma rate parameters. But would this have the downside that poor rate estimates in early generations get fixed and cannot be improved at later generations? Maybe some MCMC moves would resolve that? A sketch of drawing branch-specific rates is below.
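
A sketch of drawing branch-specific discrete-gamma rates under such a hierarchy. The hyperprior parameters and the median-per-category discretization are my assumptions, not something fixed by the thread:

```python
import random
from scipy.stats import gamma

def draw_branch_rates(hyper_shape=2.0, hyper_rate=2.0, n_categories=4):
    """Draw a branch-specific gamma shape alpha from the hyperprior, then
    return the usual equal-probability discrete-gamma category rates."""
    alpha = random.gammavariate(hyper_shape, 1.0 / hyper_rate)
    # Median of each of the n equal-probability Gamma(alpha, alpha) bins.
    quantiles = [(i + 0.5) / n_categories for i in range(n_categories)]
    rates = [gamma.ppf(q, a=alpha, scale=1.0 / alpha) for q in quantiles]
    mean = sum(rates) / len(rates)
    return alpha, [r / mean for r in rates]  # normalize to mean rate 1
```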

@koadman (Contributor) commented Dec 11, 2012

^--------- this also requires computing lots of finite-time transition matrices with BEAGLE, unless we do it Chinese-restaurant style.

@matsen (Contributor, Author) commented Dec 11, 2012

Er, this idea scares me a little because:

  • the estimates of rate parameters at the beginning will have lots of variance
  • will we really report a branch-wise rate parameter for each branch at the end?

@koadman (Contributor) commented Dec 11, 2012

Wouldn't the degree of variance in early particles depend on how tight the prior is? If the prior is tight, it seems like it leaves less room for rate parameters to stray. Of course, the next question is how to set a tight prior. Maybe hyperpriors would help?

I agree that reporting branch-wise rate parameters would not be terribly helpful, but in a hierarchical Bayes model maybe these could be summarized over all branches into a posterior on the prior's parameters. That would be a single global rate estimate; a toy version of that summary is sketched below.
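
A toy illustration of that summary (my assumption, not something from the thread): pool the sampled per-branch shape parameters and fit the gamma hyperprior to them by method of moments, which yields a single global statement about rate variation rather than a per-branch report.

```python
def summarize_branch_shapes(branch_alphas):
    """Method-of-moments fit of a Gamma(k, theta) hyperprior to sampled
    branch-wise shape parameters."""
    n = len(branch_alphas)
    mean = sum(branch_alphas) / n
    var = sum((a - mean) ** 2 for a in branch_alphas) / max(n - 1, 1)
    var = max(var, 1e-12)                  # guard against a degenerate sample
    hyper_shape = mean ** 2 / var          # mean = k * theta
    hyper_scale = var / mean               # variance = k * theta^2
    return hyper_shape, hyper_scale
```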

@matsen (Contributor, Author) commented Dec 11, 2012

Right, I don't know how to set a tight prior without setting the wrong tight prior!

