MoreLikelihood

From Sean_Carver
Revision as of 19:28, 29 January 2009 by Carver (talk | contribs) (Two Random Variables, Independence)
Jump to: navigation, search

Review

  • Random number generators (rand and randn) produced differently distributed pseudo-random numbers.
  • Setting the random seed allows you to reproduce the same random numbers.
  • Remember the help and type and lookfor commands.
  • Statistics: mean, median, mode, std, min, max. These MATLAB commands are functions of data, but they have theoretical counterparts which depend upon the PMF or PDF.
  • For discrete random variables: Probability Mass Function which is the function of the possible values of random variable equal to the fraction of throws that land on the value.
  • For continuous random variables: Probability Density Function which is the function of the possible values of the random variable equal to the fraction of throws that land in a bin divided by width of bin, or actually the limit of this fraction as the bin width (bin containing the value) converges to 0.
  • Not review but extension: The axiomatic theory of probability always has in mind a set of outcomes of the experiment at hand. A random variable is a "real valued" function of the possible outcomes of the experiment. The possible values of the random variable might be the same as the possible outcomes, or there might be more outcomes. Example: Say the experiment is to throw a dart at a dartboard and the random variable is the number of points awarded. The values of the random variable are the integers between 0 and 100. The possible outcomes of the experiment could be the number of points awarded or it could be the (Cartesian or polar) coordinates of the dart on the board. In the latter case, infinitely many outcomes yeild the same value of the random variable.
  • Following up the last 3 points, the PMF and PDF are always functions mapping all real numbers to probabilites (between 1 and 0) or probability densities (between 0 and infinity).
  • Functions of random variables lead to new random variables. This allows more possibilities than rand and randn. Also note that these functions of rand and randn can depend on other quantities (parameters).
  • Sometimes it is the case that you observe something random (a person's performance throwing darts) and you want a statistical model. You specify a PDF (for the height of the dart's landing point) but (since you are just guessing) you leave some parameters free. You have an assumed probability density (of the dart's height) that depends upon the dart's height and parameters. If you fix the parameters, you get a PDF, a function of just the dart's coordinates (the possible outcomes and/or values of the random variable). The integral of this function over its entire range must be 1. If you fix the cooridinates, you get a likelihood function, a function of the parameters. But is not a PDF: the parameters are not random they are just unknown. The integral of this function over the entire range need not be 1.
  • This likelihood function (for only one data point/dart throw) is not very useful. It becomes more useful when we consider many data points. Let's start with two.

Two Random Variables, Independence

Consider two examples of two random variables:

  • Two calls to randn (or two throws of the dice).
  • The membrane potential of a cell under study at 12:00 noon, yesterday, and the membrane potential of the same cell 10 microseconds later.

See the difference? In the first case, the value of the first random variable tells you nothing about the value of the second. In the second case, if you measure the membrane potential at 12:00 noon to be -60 mV, you can be quite sure that ten microseconds later that the membrane potential is close to (but not necessarily exactly) -60 mV. The point is, if you didn't know the value at 12:00 noon, you would be much less certain about the second value.

If the value of one random variable tells you nothing about the value of the other, we say the random variables are independent.

What does this mean mathematically? The mathematical definition of independence is written in terms of certain densities. There are three types of densities for two random variables X1, X2.

  • Marginal densities: p(X1), p(X2).
  • Joint density: p(X1,X2).
  • Conditional densities p(X1|X2) and p(X2|X1).

The difference is best described with a picture. First generate some data:

A = randn(100000,2)

This repeats the experiment 100000 times of calling randn twice. Now we plot:

hist3(A)

This is OK, but more resolution (more bins) is better. Try 30 x 30 bins:

hist3(A,[30,30])

What are we looking at: rows of bins correspond to the first call of randn and columns correspond to the second. If it happened that 1000 times both calls were near 0 (falling in the center bin) then the height of the center bin would be 1000. Maybe 5 times it was near (1,0), (i.e. first throw 1, second 0), 3 times near (0,1), and 1 time near (1,1). Those would be the heights of those bins.

  • Marginals: p(X1) looks at the first call to randn separately from X2, as if X2 didn't exist. To generate the histogram for the marginals we add across columns to get the total numbers of points in the one-variable histogram bins for X1. We might do the same for X2 adding across rows. If we had a table instead of a graph, the resulting number from these additions could go in the margins. Hence the name. Note: The marginal density p(X1) is the density of X1 that we studied last week that does not take anything about X2 into account. Likewise for p(X2). To get marginal density from the histogram you would have to add rows or columns then normalize then take a limit. By normalize I mean divide by the total number of experiments so that the area under the resulting graph integrates to 1.
  • Joint: The histogram corresponding to the joint density is the graph you just generated, above. The height of the bin centered at (x1,x2) tells you how many times X1 ~= x1 (i.e the random variable X1 is falls in the row-bin containing the value x1) and X2 ~= x2. To get the joint density from the histogram you would have to noramalize then take a limit. Again by normalize I mean divide by the total number of experiments so that the volume under the resulting graph double-integrates to 1.
  • Conditional: The conditional density p(X2|X1) tells you the density of observing X2 after you have already observed X1. The histogram corresponding to the density p(X2|X1 ~= x1) is the row of the hist3 histogram corresponding to the bin containing x1. The difference between the conditional density and the joint density is the normalization. The conditional density p(X2|X1 = x1) is normalized with respect to only the x1 row. Consider the difference with the membrane potential example. Say the cell very rarely gets hyperpolarized down to -90 mV. Then p(X1 = -90 mV, X2 = -89.9 mV) would be very small. On the other hand, if it does get down to -90 mV, then it might not be so unlikely that a short time later it is -89.9 mV; the conditional density p(X2|X1) would be larger than the joint density. Note p(X2|X1) and p(X1|X2) are both defined, but they are not always the same thing. For example a hyperpolarized cell might be more likely to depolarize than further hyperpolarize. In this case, if b < a < Resting potential, then p(X2=a|X1=b) > p(X1=a|X2=b).

Let's try a histogram for the 10 microsecond experiment with the hogkin huxley model. First download hh1step.m, put it on your computer where MATLAB can see it (e.g. the current directory) then execute the following commands: V1 and V2 are the membrane potentials and N is the noise.

V1 = randn(100000,1);
N = randn(100000,1);
V2 = hh1step(V1,N);
hist3([V1,N],[30,30])
hist3([V1,V2],[30,30])

V1 and N are independent and V1 and V2 are not.

  • Homework C.1: What is special about the joint density of independent random variables? How can you tell from the second 3D histogram that V1 and V2 are not independent? For this problem, think about more about the definition of independence and use it to justify your answer. For more intuition, you could try plotting 3D histograms of other independent random variables [RV1, RV2] where the random variables are functions of different calls to rand or randn (as toward the end of Lab C).