Hidden Markov analysis of ion channels

From QuB

Jump to: navigation, search

Contents

Sunglasses on the Macro-Macromolecule

Suppose I like to sleep in class, but only when I'm wearing sunglasses. In this model I have three states:

  1. awake without glasses
  2. awake with glasses on
  3. asleep (with glasses).

There are direct transitions between 1 and 2, and 2 and 3, but not 1 and 3. You can tell if my sunglasses are on, but not if I'm asleep, so we say my state is hidden. We'll color state 1 black, and states 2 and 3 red. We call the color of a state its class.


Suppose also that I behave like a molecule -- that is, when I'm not wearing sunglasses there's a certain probability I'll put on my glasses, regardless of how long it's been since I last put them on. We'll express that probability as a mass action rate constant; it happens so many times per minute. Each transition has its own rate constant. For example, there are 1/4 transitions per minute from 1 to 2, so I put my sunglasses on after 4 minutes, on average.


It turns out you can estimate how much time I spend asleep (in state 3) just by keeping track of when I put on and take off the sunglasses.

This is our approach to single ion channels. A channel can have several conducting and several non-conducting states, usually called open and closed, and colored red and black. We record the sequence of open and closed "dwells" with their duration, and use QuB to find the most plausible model and the most likely rate constants.

Some rate constants could be sensitive to changes in voltage, or concentration of a drug. Maybe if I've had some coffee, instead of putting on my sunglasses for a nap, I start drumming my fingers on the desk. Let's call that state 4, connected to state 1, also black, for no sunglasses. The rate constant from 1 to 4 depends on the concentration of coffee in my blood. Here's the full formula for rate constants, including drug and voltage dependency:

k = Ligand * k0 * eVoltage * k1

The units for k are "per hour"; the units for k0 are "per millimolar, per hour" (or per micromolar, etc... as you've recorded it). You could record how much coffee I've had each day along with my dwells in class red or blue, and with a few days' record QuB can solve for the most likely k0. Then you could use QuB to simulate my behavior with ten times as much coffee. You could even simulate a class full of a hundred of me -- how many of us are asleep any given time? and what would be our response to a fresh pot of coffee?

I'm going to try to explain how QuB models relate to experimental data, and I'll also give you a glimpse inside the mathematics. Since I know more about coffee than acetylcholine, that's how I'm going to frame it, but keep in mind, instead of sunglass-wearing we could be talking about an ion channel opening and closing, or the motion of a fluorescent motor protein, or any number of macromolecules. Our program has even been used to model the sleep cycles of mice, with light as the variable instead of coffee.


Simulating with the A matrix

First, a little math. The "state probability vector," P(t), is the probability of being in each state at time t. At the beginning of class I'm in state 1 (awake, sunglasses off), so P(0) = [1 0 0]. We can find P(1) by multipling P(0) by a transition probability matrix (call it "A"): P(1) = P(0) * A.

P(1)_1 = 1 \cdot a_{12} + 0 \cdot a_{22} + 0 \cdot a_{32}

[P(1) = \left[ a_{11} \quad a_{12} \quad a_{13} \right]

Matrices can be overwhelming, but basically the probability of ending in state 2 is the sum over all possible states it could have come from, of the probability you were there, times the probability of that transition. The chance of going from state 1 to 2, in this case, is in "A", row 1, column 2, so that's convenient.

In our model, though, there's no A matrix. The closest thing we can make is called the "Q" matrix. Q_{12} is the rate constant -- the average number of transitions to 2, per minute spent in 1. You could also say it's the probability per minute of that transition, which means we can integrate it to get probability. We can take the calculus on faith, but here's the solution:

Q = \left[ \begin{array}{ccc}
 * & 1/4 & 0 \\
 1 & * & 1/2 \\
 0 & 1/5 & *
 \end{array} \right]; Q = \left[ \begin{array}{ccc}
 -1/4 & 1/4 & 0 \\
 1 & -3/2 & 1/2 \\
 0 & 1/5 & -1/5
 \end{array} \right]

A(t) = eQt

A(1) = \left[ \begin{array}{ccc}
 .847 & .118 & .035 \\
 .470 & .287 & .242 \\
 .056 & .097 & .847
 \end{array} \right]

And here's A(1 minute) for the sunglasses model. It turns out, with these rate constants, that if I'm in state 1 at the beginning of class, there's a 3 in 20 chance I'll have my sunglasses on a minute later. Let's round them all to 1/20, then we can simulate me, using this 20-sided die.

A(1) = \left[ \begin{array}{ccc}
 17/20 & 2/20 & 1/20 \\
 9/20 & 6/20 & 5/20 \\
 1/20 & 2/20 & 17/20
 \end{array} \right]

Same thing with the coffee model. We just re-compute the A matrix whenever the amount of coffee changes. Take a look at the transition rate from 1 to 4 (a1,4):

A_{1 cup}(1) = \left[ \begin{array}{cccc}
 .730 & .107 & .033 & .131 \\
 .427 & .284 & .242 & .047 \\
 .053 & .097 & .847 & .004 \\
 .327 & .029 & .006 & .638
 \end{array} \right]; A_{10 cups}(1) = \left[ \begin{array}{cccc}
 .240 & .051 & .020 & .688 \\
 .206 & .267 & .239 & .289 \\
 .033 & .096 & .847 & .025 \\
 .172 & .018 & .004 & .806
 \end{array} \right]

Here's a thousand of me simulated together, starting at 1 cup, increasing here to 10 cups. The graph shows the proportion with sunglasses on.

If you look at where it stabilizes, you might call that noise, but it's actually individuals moving their sunglasses -- just what we're interested in. Notice that it's "noisier" with less coffee, as more individuals are thinking of nodding off.

Also, notice the exponential rise and fall. In the old days they would curve fit it, and describe the effect of coffee on the parameter Tau. With Markov models we can use more structurally meaningful parameters.

Idealizing with Viterbi

This wouldn't be biophysics without a recording apparatus. For the coffee experiments we have a security camera trained on the subject's face. It takes a picture once a minute and sends it through a device which measures the amount of black in the frame, which is then recorded as a number between 0 (no black) and 1 (all black).

The subject fidgets, the sun comes out, ...; the data is noisy...

How do we clean it up into a sequence of on-or-off observations? We call this "idealizing" the data. When we analyze single-molecule data, generally we idealize it, then work with the sequence of "dwells" or "events", where it stays in one class for an interval of time.

The simplest way to idealize is with a threshold, say 1/2, above which the glasses are considered to be on. It's plainly wrong, though, when the sun glints off my glasses mid-nap.

  1. it barely crossed the threshold -- that is, it's far from 0
  2. it's way shorter than my typical wake-up time

So it would be good to use these expectations to idealize it better. It turns out we can build them into a hidden Markov model. You've seen the rate constants -- those say how frequently you expect transitions. In our models, you also give the amplitude distribution of each color (or "class") as a Gaussian, for example with mean 1.0 and standard deviation 0.2.

So if we have a data point that's exactly 0.5, the chance of sunglasses on, given that datapoint and our model, is Gaussian(1.0, 0.3, 0.5) = ...

And if it had previously been in state 1, the chance of now being in state 2 is a1,2 * Gaussian(1.0, 0.3, 0.5)

In other words, the probability of the transition, times the probability of the amplitude. Given a sequence of datapoints, and a model, we can calculate the probability of a particular state sequence, say 1, 1, 2, 3, 3, 3, 2, 1, by just chaining them together:

P(stateseq: 1, 1, 2, 3) = 
 P(start:1) \cdot P(amp_1, data_1) 
 \cdot P(1 \to 1) \cdot P(amp_1, data_2) 
 \cdot P(1 \to 2) \cdot P(amp_2, data_3) 
 \cdot P(2 \to 3) \cdot P(amp_3, data_4)

With this definition of the likelihood of a state sequence, what we want is the most likely sequence of states. There's a quick, slick way to find it, called Viterbi, that's been used in speech recognition for decades. We make this lattice, with states on the y axis and time on x,

and fill it in like a worksheet from left to right. First, the starting probabilities for each state. For simplicity, let's make them equally likely at 0.25. Next, the amplitude probability of the first datapoint (it's the same for states 1 and 4, and for states 2 and 3). Now multiply them together to get the likelihood of that one-state sequence. This is, trivially, the most likely state sequence ending in that state.

Now for the recursive step: we mark up the transitions with their probabilities, from the A matrix; and the next column of states with their amplitude probabilities. Then we look at each possible transition, let's say from state A to state B. Starting with the likelihood of having ended at A, we multiply by the transition probability and the amplitude probability to get the likelihood of the MLSS ending in A,B. We mark the most likely transition, and write down the most likely likelihood. This recursion repeats until the last datapoint, when we pick the state with the best likelihood and trace backwards through the most likely transitions.

This is theoretically the best idealization, but it depends on knowing the right rate constants and amplitude distributions. QuB handles this by alternating between Viterbi, to find the MLSS, and re-estimation, i.e. what is the population mean and std deviation of datapoints classified as red, and what proportion of datapoints classified in state 1 are followed by datapoints in state 2? This alternating procedure is called Segmental K-Means, and it's QuB's default. It's quite fast. Critically, SKM assumes noise is Gaussian, and transitions are instantaneous.

Maximizing Likelihood

A common theme in our software is "Maximizing Likelihood." With SKM we're finding the most likely state sequence, then re-estimating parameters to improve the most likely likelihood. We have another algorithm, Maximum Interval Likelihood (MIL) which uses the same principles to find the most likely rate constants, given a connection scheme and some data. MIL defines likelihood as the sum likelihood, over all state sequences that match the data, of that sequence being generated by the model. It uses a gradient search to adjust the rate constants until likelihood reaches a maximum. Another one, "Mac" optimizes rate constants and amplitude distributions from recordings of hundreds or thousands of (ion channels) together.

And that's great, if you know the model. But usually you have to come up with a model somehow, or decide between models. You need to decide how many states, and how they are connected. Sometimes it's hard even to know how many colors -- is it noise or a sub-level? In general, it's a better model if it has significantly higher likelihood with the data. And like any model, it's better if it's simpler. You can get a simpler model by simplifying the data: for example, no-one has a good model for desensitization in ion channels -- long periods of inactivity -- so they generally cut out the long dwells and analyze the rest. If the data responds to a stimulus, a good model has just one rate constant that changes with coffee. But in general this is kind of a dumb calculator. It will give you "the answer" no matter what model you give it.