How are we going to optimize model parameters? #44

Open · matsen opened this issue Oct 26, 2012 · 13 comments

@matsen (Contributor) commented Oct 26, 2012

Connor's option -- do MH moves that affect all particles at once.

Erick's option -- for each particle, with some probability, propose an MH move on the rate parameters. The particle is duplicated when a move is accepted. A rough sketch of this is below.
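
To make Erick's option concrete, here is a minimal Python sketch under stated assumptions: Particle objects with a .rates attribute, and the log_likelihood, log_prior, and propose_rates callables are all hypothetical placeholders, not the sts API, and the proposal is assumed symmetric so no Hastings correction appears.

```python
import copy
import math
import random

def per_particle_rate_mh(particles, log_likelihood, log_prior, propose_rates,
                         move_prob=0.2):
    """With probability move_prob, propose new rate parameters for a particle.
    An accepted proposal is applied to a copy, so the particle stops sharing
    rate parameters (and cached peels) with its siblings."""
    out = []
    for p in particles:
        if random.random() < move_prob:
            candidate = copy.deepcopy(p)          # duplicate on acceptance
            candidate.rates = propose_rates(p.rates)
            log_ratio = (log_likelihood(candidate) + log_prior(candidate.rates)
                         - log_likelihood(p) - log_prior(p.rates))
            if math.log(random.random()) < log_ratio:
                out.append(candidate)
                continue
        out.append(p)
    return out
```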

@koadman (Contributor) commented Oct 26, 2012

Aaron's option -- eating Chinese with Connor and Erick, quantum-superposition-style. For parameter X, keep a fixed number k of global parameter "tables" x_1...x_k. Each particle keeps a vector V of length k that encodes the probability it is sitting at each of the k tables. During the extension step, individual particles can propose changes to V. BEAGLE has support in calculateRootLogLikelihoods() for setting arbitrary weights on rate categories and state frequency categories, which I think allows particles with shared history to have different V, and updates to V will not require a full re-peel. In between generations we can do moves that change the x_1...x_k; these affect all particles equally and won't force us to delete shared history among particles, although we will have to re-peel every node in the shared history forest. A sketch of the mixture likelihood is below.
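
A sketch of how a particle's likelihood could be assembled from its table weights, assuming a hypothetical per_table_loglike(particle, table) helper that returns the log-likelihood of the particle conditioned on one table's parameter values. This mirrors, in spirit, the per-category weighting that calculateRootLogLikelihoods() exposes; it is not a BEAGLE call.

```python
import math

def mixture_loglike(particle, tables, V, per_table_loglike):
    """log( sum_j V[j] * L(particle | tables[j]) ), computed in log space.
    V is the particle's probability vector over the k global tables and is
    assumed to sum to one with at least one positive entry."""
    terms = [math.log(V[j]) + per_table_loglike(particle, tables[j])
             for j in range(len(tables)) if V[j] > 0.0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

A proposal that only changes V re-weights per-table terms that are already computed, which is why no re-peel is needed; a move on the x_j themselves invalidates every cached per-table likelihood, hence the full re-peel of the shared history forest.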

@matsen (Contributor, Author) commented Oct 26, 2012

Erick's other idea: Chinese musical chairs. There is a fixed set of k global parameter tables, and every particle is sitting at exactly one table (no quantum superposition here). When the music stops, every x generations, each particle moves to the table that has the highest likelihood for it. Tables without many particles are composted; to replace them, we sample from the active tables and add perturbations. A sketch of the reassignment step is below.

If we were clever with BEAGLE, would it save time to peel all of the particles at a given table simultaneously?
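
A rough sketch of the reassignment step, with hypothetical loglike(particle, table) and perturb(table) helpers (placeholders again, not sts code):

```python
import random
from collections import Counter

def musical_chairs(particles, tables, loglike, perturb, min_occupancy=2):
    """Move each particle to its best table, compost sparsely occupied
    tables, and refill them with perturbed copies of surviving tables."""
    assignment = {id(p): max(range(len(tables)),
                             key=lambda j: loglike(p, tables[j]))
                  for p in particles}
    counts = Counter(assignment.values())
    survivors = [j for j, c in counts.items() if c >= min_occupancy] or list(counts)
    for j in range(len(tables)):
        if j not in survivors:
            tables[j] = perturb(tables[random.choice(survivors)])  # compost + replace
    return assignment, tables
```

Grouping particles by their table assignment would also be the natural place to try batching the per-table peels in BEAGLE, if that turns out to save time.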


@koadman (Contributor) commented Oct 26, 2012

One thing I'm not sure about is what a "global" MH move looks like. Normally the MH acceptance ratio compares the probabilities of the current and proposed values for a single state. What would it look like for a population of states? The product of that ratio over all states (see the sketch below)?
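
For what it's worth, one literal reading of "product of that ratio over all states", written out as a sketch. The log_post(particle, theta) helper is hypothetical, the proposal is assumed symmetric, and whether this product is the right target for a weighted particle population is exactly the open question.

```python
import math
import random

def global_mh_accept(particles, theta_current, theta_proposed, log_post):
    """Accept a shared-parameter proposal using the product of per-particle
    ratios, i.e. the sum of per-particle log differences."""
    log_ratio = sum(log_post(p, theta_proposed) - log_post(p, theta_current)
                    for p in particles)
    return math.log(random.random()) < log_ratio
```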

@koadman (Contributor) commented Oct 26, 2012

In Chinese musical chairs, when two particles with shared history move to different tables, would that require us to duplicate their shared history?

@matsen (Contributor, Author) commented Oct 26, 2012

Yes. Which motivates quantum Chinese musical chairs?


@koadman (Contributor) commented Oct 26, 2012

Not necessarily; quantum Chinese demands calculating likelihoods for all tables at each node in the history, all the time -- which might be more work.

@koadman (Contributor) commented Dec 10, 2012

Thinking on this thread again. One further possibility occurred to me, though I'm not sure whether it's a good idea (feedback please!): at each generation it might be possible to make an ML estimate of the gamma (or other continuous) parameters. This would work by doing something like Brent optimization, similar to what happens in FastTree, and the ML parameter estimates would then be used for all particles at the next generation. The result would be a sort of SMC-within-EM algorithm, with EM for the continuous parameters; a sketch is below. Possible problems include early generations being too uninformative for continuous-parameter estimation, local-maximum traps, and no estimates of uncertainty. The potential advantage is greater efficiency by preserving shared particle history. This approach may be quite similar to what Connor was suggesting, but without the MH steps?
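
A minimal sketch of the per-generation optimization, assuming a hypothetical loglike(particle, alpha) that re-evaluates a particle under discrete-gamma shape alpha, and using SciPy's bounded Brent-style scalar minimizer in place of a hand-rolled Brent routine:

```python
from scipy.optimize import minimize_scalar

def ml_gamma_shape(particles, weights, loglike, bounds=(0.05, 20.0)):
    """Return the shape alpha maximizing the weighted particle log-likelihood;
    the result would be shared by all particles at the next generation."""
    def neg_weighted_loglike(alpha):
        return -sum(w * loglike(p, alpha) for p, w in zip(particles, weights))
    res = minimize_scalar(neg_weighted_loglike, bounds=bounds, method="bounded")
    return res.x
```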

@matsen (Contributor, Author) commented Dec 11, 2012

What bothers me about this is that all of the particles are constrained to have the same parameters. I can imagine a restaurant-style extension as a revision to this if it seems promising, though.


@koadman (Contributor) commented Dec 11, 2012

Yet another idea: a hierarchical prior on rates. Each branch would have its own set of discrete gamma rates, with parameters drawn from some prior distribution. This might reduce the total amount of peeling required, because particles at earlier generations could keep their own gamma rate parameters. But would this have the downside that poor rate estimates in early generations get fixed and cannot be improved at later generations? Maybe some MCMC moves would resolve that? A sketch of drawing branch-specific rates is below.
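
A sketch of drawing branch-specific discrete-gamma rates under such a hierarchy. The hyperprior parameters and the median-per-category discretization are my assumptions, not something fixed by the thread:

```python
import random
from scipy.stats import gamma

def draw_branch_rates(hyper_shape=2.0, hyper_rate=2.0, n_categories=4):
    """Draw a branch-specific gamma shape alpha from the hyperprior, then
    return the usual equal-probability discrete-gamma category rates."""
    alpha = random.gammavariate(hyper_shape, 1.0 / hyper_rate)
    # Median of each of the n equal-probability Gamma(alpha, alpha) bins.
    quantiles = [(i + 0.5) / n_categories for i in range(n_categories)]
    rates = [gamma.ppf(q, a=alpha, scale=1.0 / alpha) for q in quantiles]
    mean = sum(rates) / len(rates)
    return alpha, [r / mean for r in rates]  # normalize to mean rate 1
```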

@koadman (Contributor) commented Dec 11, 2012

^--------- this also requires computing lots of finite-time transition matrices with BEAGLE, unless we do it Chinese-restaurant style.

@matsen (Contributor, Author) commented Dec 11, 2012

Er, this idea scares me a little because:

  • the estimates of rate parameters at the beginning will have lots of variance
  • will we really report a branch-wise rate parameter for each branch at the end?

@koadman (Contributor) commented Dec 11, 2012

Wouldn't the degree of variance in early particles depend on how tight the prior is? If the prior is tight, it seems like it leaves less room for rate parameters to stray. Of course, the next question is how to set a tight prior. Maybe hyperpriors would help?

I agree that reporting branch-wise rate parameters would not be terribly helpful, but in a hierarchical Bayes model maybe these could be summarized over all branches into a posterior on the prior's parameters. That would be a single global rate estimate; a toy version of that summary is sketched below.
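
A toy illustration of that summary (my assumption, not something from the thread): pool the sampled per-branch shape parameters and fit the gamma hyperprior to them by method of moments, which yields a single global statement about rate variation rather than a per-branch report.

```python
def summarize_branch_shapes(branch_alphas):
    """Method-of-moments fit of a Gamma(k, theta) hyperprior to sampled
    branch-wise shape parameters."""
    n = len(branch_alphas)
    mean = sum(branch_alphas) / n
    var = sum((a - mean) ** 2 for a in branch_alphas) / max(n - 1, 1)
    var = max(var, 1e-12)                  # guard against a degenerate sample
    hyper_shape = mean ** 2 / var          # mean = k * theta
    hyper_scale = var / mean               # variance = k * theta^2
    return hyper_shape, hyper_scale
```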

@matsen (Contributor, Author) commented Dec 11, 2012

Right, I don't know how to set a tight prior without setting the wrong tight prior!

