MoreLikelihood

From Sean_Carver
Revision as of 17:24, 2 February 2009 by Carver (talk | contribs) (Probability Density with randn)
Jump to: navigation, search

Review

  • Random number generators (rand and randn) produced differently distributed pseudo-random numbers.
  • Setting the random seed allows you to reproduce the same random numbers.
  • Remember the help and type and lookfor commands.
  • Statistics: mean, median, mode, std, min, max. These MATLAB commands are functions of data, but they have theoretical counterparts which depend upon the PMF or PDF.
  • For discrete random variables: Probability Mass Function which is the function of the possible values of random variable equal to the fraction of throws that land on the value.
  • For continuous random variables: Probability Density Function which is the function of the possible values of the random variable equal to the fraction of throws that land in a bin divided by width of bin, or actually the limit of this fraction as the bin width (bin containing the value) converges to 0.
  • Not review but extension: The axiomatic theory of probability always has in mind a set of outcomes of the experiment at hand. A random variable is a "real valued" function of the possible outcomes of the experiment. The possible values of the random variable might be the same as the possible outcomes, or there might be more outcomes. Example: Say the experiment is to throw a dart at a dartboard and the random variable is the number of points awarded. The values of the random variable are the integers between 0 and 100. The possible outcomes of the experiment could be the number of points awarded or it could be the (Cartesian or polar) coordinates of the dart on the board. In the latter case, infinitely many outcomes yeild the same value of the random variable.
  • Following up the last 3 points, the PMF and PDF are always functions mapping all real numbers to probabilites (between 1 and 0) or probability densities (between 0 and infinity).
  • Functions of random variables lead to new random variables. This allows more possibilities than rand and randn. Also note that these functions of rand and randn can depend on other quantities (parameters).
  • Sometimes it is the case that you observe something random (a person's performance throwing darts) and you want a statistical model. You specify a PDF (for the height of the dart's landing point) but (since you are just guessing) you leave some parameters free. You have an assumed probability density (of the dart's height) that depends upon the dart's height and parameters. If you fix the parameters, you get a PDF, a function of just the dart's coordinates (the possible outcomes and/or values of the random variable). The integral of this function over its entire range must be 1. If you fix the cooridinates, you get a likelihood function, a function of the parameters. But is not a PDF: the parameters are not random they are just unknown. The integral of this function over the entire range need not be 1.
  • This likelihood function (for only one data point/dart throw) is not very useful. It becomes more useful when we consider many data points. Let's start with two.

Two Random Variables, Independence

Consider two examples of two random variables:

  • Two calls to randn (or two throws of the dice).
  • The membrane potential of a cell under study at 12:00 noon, yesterday, and the membrane potential of the same cell 10 microseconds later.

See the difference? In the first case, the value of the first random variable tells you nothing about the value of the second. In the second case, if you measure the membrane potential at 12:00 noon to be -60 mV, you can be quite sure that ten microseconds later that the membrane potential is close to (but not necessarily exactly) -60 mV. The point is, if you didn't know the value at 12:00 noon, you would be much less certain about the second value.

If the value of one random variable tells you nothing about the value of the other, we say the random variables are independent.

What does this mean mathematically? The mathematical definition of independence is written in terms of certain densities. There are three types of densities for two random variables X1, X2.

  • Marginal densities: p(X1), p(X2).
  • Joint density: p(X1,X2).
  • Conditional densities p(X1|X2) and p(X2|X1).

The difference is best described with a picture. First generate some data:

A = randn(100000,2)

This repeats the experiment 100000 times of calling randn twice. If you have access to the Statistics Toolbox (you don't on the machines in the classroom) you generate the plot below with this command:

hist3(A)

This is what you would see:

Click image for full size histogram

This is OK, but more resolution (more bins) is better. Try 30 x 30 bins:

hist3(A,[30,30])
Click image for full size histogram

If you don't have the Statistics toolbox, download the free version, hist2d here. Now type

hist2d(A)

I think this will work, but if it does not, you need to put the downloaded file in a place where matlab will see it. To see what MATLAB's current directory is, type

pwd

then move the file to that directory. I do not think this will be necessary in the classroom (I tried it already), but if it is necessary, there may be problems with permissions to write to MATLAB's current directory. Get help from me if this is the case.

What are we looking at: rows of bins correspond to the first call of randn and columns correspond to the second. The bins vary along two dimensions (first call to randn and second call to randn). The following explains the differences in the densities:

  • Marginals: p(X1) looks at the first call to randn separately from X2, as if X2 didn't exist. To generate the histogram for the marginals we add across columns to get the total numbers of points in the one-variable histogram bins for X1. We might do the same for X2 adding across rows. If we had a table instead of a graph, the resulting number from these additions could go in the margins. Hence the name. Note: The marginal density p(X1) is the density of X1 that we studied last week that does not take anything about X2 into account. Likewise for p(X2). To get marginal density from the histogram you would have to add rows or columns then normalize then take a limit. By normalize I mean divide by the total number of experiments so that the area under the resulting graph integrates to 1.
  • Joint: The histogram corresponding to the joint density is the graph you just generated, above. The height of the bin centered at (x1,x2) tells you how many times X1 ~= x1 (i.e the random variable X1 is falls in the row-bin containing the value x1) and X2 ~= x2. To get the joint density from the histogram you would have to noramalize then take a limit. Again by normalize I mean divide by the total number of experiments so that the volume under the resulting graph double-integrates to 1.
  • Conditional: The conditional density p(X2|X1) tells you the density of observing X2 after you have already observed X1. The histogram corresponding to the density p(X2|X1 ~= x1) is the row of the hist3 histogram corresponding to the bin containing x1. The difference between the conditional density and the joint density is the normalization. The conditional density p(X2|X1 = x1) is normalized with respect to only the x1 row. Consider the difference with the membrane potential example. Say the cell very rarely gets hyperpolarized down to -90 mV. Then p(X1 = -90 mV, X2 = -89.9 mV) would be very small. On the other hand, if it does get down to -90 mV, then it might not be so unlikely that a short time later it is -89.9 mV; the conditional density p(X2|X1) would be larger than the joint density. Note p(X2|X1) and p(X1|X2) are both defined, but they are not always the same thing. For example a hyperpolarized cell might be more likely to depolarize than further hyperpolarize. In this case, if b < a < Resting potential, then p(X2=a|X1=b) > p(X1=a|X2=b).

Let's try a histogram for the 10 microsecond experiment with the hogkin huxley model. First download HH1step.m, put it on your computer where MATLAB can see it (e.g. the current directory) then execute the following commands: V1 and V2 are the membrane potentials and N is the noise.

V1 = randn(100000,1);
N = randn(100000,1);
V2 = HH1step(V1,N);
hist3([V1,N],[30,30])
Click image for full size histogram

V1 Versus N is the same histogram as randn Versus randn. The only difference is that the axes are labeled. Here are the commands for this

xlabel('Initial Membrane Potential (mV)')
ylabel('Noise Current (nS)')
zlabel('Count')

Now let's try V1 versus V2:

hist3([V1,V2],[30,30])
Click image for full size histogram

V1 and N are independent and V1 and V2 are not.

  • Homework C.1: What is special about the joint density of independent random variables? How can you tell from the second 3D histogram that V1 and V2 are not independent? For this problem, think more about the definition of independence and use it to justify your answer. For more intuition, you could try plotting 3D histograms of other independent random variables [RV1, RV2] where the random variables are functions of different calls to rand or randn (as toward the end of Lab B). This extra work could be the basis of your bonus problem, if you want.

Joint Densities for Many Random Variables

Before we move on let's look a little closer at our Hodgkin-Huxley model and test to see if it is working. You would not ordinarily be interested in just one time step (10 microseconds). Usually you would want the membrane potential on a long sequence of equally spaced times. We can iterate HH1step to draw this picture. Cut and paste the following sets of commands into MATLAB:

N = 0.01;            % Set Noise level
I = 10;              % Set Injected current (mA)
V = zeros(10000,1);  % Allocate memory
V(1) = 0;            % Set Initial membrane potential

The symbol % tells MATLAB to ignore the rest of the line, so you can add a comment to a command. I is the injected current. Without specifying I, as above, the program sets I=0, but this doesn't make a nice voltage trace over 100 milliseconds because the squid axon is quiet without an injected current. Now we make a loop and plot:

[V(2),A] = HH1step(V(1),randn*N,I);
for i = 3:10000
[V(i),A] = HH1step(V(i-1),randn*N,I,A);
end
plot(1:10000,V)
Click image for full size plot

You should see a train of action potentials. The first command takes the first step, as before, but returns the ionic current activation variables (A) needed to continue the simulation. If you don't tell HH1step the values of A, it uses defaults; this is done on the first step. But if you continue the simulation you cannot continue to use the defaults: the components of A are changing quantities that, like the membrane potential, are needed for running the model and making predictions. The V's and the A's are the state variables. There are four state variables for the Hodgkin-Huxley equations (V and the three components of A). Notice what we have done with the noise: we add N*randn at each step. (Here is a technical point that is not too important: Inside the program I actually set the standard deviation of the added noise to be the noise level times the square root of the time step: N*randn*sqrt(10 microseconds), which is required for convergence as the time step (10 microseconds in this case) tends to zero.)

This is key: we have added noise at each step of the Hodgkin-Huxley simulation and arrived at a voltage trace. Thus each of the 10000 values of membrane potentials (at successive times separated by 10 microseconds) is a random variable. If the noise level N is large, this is more obivious. You can change I or N easily and rerun:

I = 11;
N = 10;

Now cut and paste the 5 commands above, ending with plot(1:10000,V), into MATLAB, several times, and see how the trajectory differs each time.

Click image for full size plot

Guess what? These random variables (V1,V2,...,V10000) have a joint density: p(V1,V2,...,V10000). You could plot this with a 10000 dimensional histogram! Well, OK, you probably couldn't do that, but theoretically speaking it's conceivable.

Each time you run the program you get V1,V2,...,V10000, a sequence of numbers. These will fall into bins of the 10000 dimensional histogram. The limiting fraction of times that the sequence falls into a small bin, divided by its 10000 dimensional volume, is the joint density, measuring how probable (i.e. probability density) the observed sequence is.

Another key point: If you change I and N, you get a different probability density. In other words, the probability density is a function of I and N. To express this fact it is written p(V1,V2,...,V10000|I,N) but this notation is an abuse of notation because I and N are not random variables.

What if you don't know what I and N are but you ran an experiment and had the numbers V1,V2,...,V10000. That is

I = 10*rand;
N = 10*rand;

Now repeat the 5 commands that run and plot the model. Now V1,V2,...,V10000 are fixed numbers (what you obtained from the calculation and plotted), but you think of I and N as variables. No more 10000 dimensions: there remain only two! p(N,I). This function is called the likelihood function.

Here is what it looks like (for I=11, N=10):

Click image for full size surface plot
Click image for full size contour plot

The peak this function is the maximum likelihood estimate of parameters.

There are somewhat subtle theorems that talk about the convergence of the estimates as the amount of data grows. Maximum likelihood estimation is one of the most well studied methods of parameter estimation in statistics.

Probability Density with randn

Calculating likelihood for the Hodgkin-Huxley model both conceptually difficult and takes a long time (minutes) on a computer. Drawing graphs takes hours. Lets do it for randn (much faster).

Remember the commands for drawing a histogram

A = randn(100000,1);
hist(A,100);
Click image for full size histogram

Compare this histogram with the theoretical density for the normal distribution:

x = -5:0.01:5;
density = exp(-x.^2/2)/sqrt(2*pi);
plot(x,density)
Click image for full size PDF

Remember what the difference between these two plots is. The histogram measures the count of how many rolls of randn fall into the bin. The pdf normalizes the histogram: it is the fraction of throws that fall into bins, not the count of throws; and it is the limit as the bin size goes to zero divided by the bin size.

It would be nice to have a function that returns the probability density. Type this into the editor and save as randnpdf.m:

function density = randnpdf(x)
% Return density of randn

density = exp(-x.^2/2)/sqrt(2*pi);

Now you can calculate the probability density of any value of randn. Try this

>> randnpdf(0)

ans =

    0.3989
 
>> randnpdf(1)

ans =

    0.2420

>> randnpdf(2)

ans =

    0.0540

Another thing you can do is get a data point, then see its density

x = randn
randnpdf(x)

Repeat these commands about 10 times (cut and paste or use the up arrow) to get some intuition about how much probability density points (i.e. calls to randn) tend to have and how much variation there is. Note that if the distribution changes (i.e. use rand instead of randn) this experiment can come out very differently. Specifically all points returned by rand have density 1: remember what the histogram looked like.