MoreLikelihood

From Sean_Carver
Revision as of 22:46, 28 January 2009 by Carver (talk | contribs) (Two Random Variables, Independence)
Jump to: navigation, search

Review

  • Random number generators (rand and randn) produced differently distributed pseudo-random numbers.
  • Setting the random seed allows you to reproduce the same random numbers.
  • Remember the help and type commands, these are useful for learning MATLAB. Here's another useful one:
lookfor random
  • Statistics: mean, median, mode, std, min, max. These MATLAB commands are functions of data, but they have theoretical counterparts which depend upon the PMF or PDF.
  • For discrete random variables: Probability Mass Function which is the function of the possible values of random variable equal to the fraction of throws that land on the value.
  • For continuous random variables: Probability Density Function which is the function of the possible values of the random variable equal to the fraction of throws that land in a bin divided by width of bin, or actually the limit of this fraction as the bin width (bin containing the value) converges to 0.
  • Not review but extension: The axiomatic theory of probability always has in mind a set of outcomes of the experiment at hand. A random variable is a "real valued" function of the possible outcomes of the experiment. The possible values of the random variable might be the same as the possible outcomes, or there might be more outcomes. Example: Say the experiment is to throw a dart at a dartboard and the random variable is the number of points awarded. The values of the random variable are the integers between 0 and 100. The possible outcomes of the experiment could be the number of points awarded or it could be the (Cartesian or polar) coordinates of the dart on the board. In the latter case, infinitely many outcomes yeild the same value of the random variable.
  • Following up the last 3 points, the PMF and PDF are always functions mapping all real numbers to probabilites (between 1 and 0) or probability densities (between 0 and infinity).
  • Functions of random variables lead to new random variables. This allows more possibilities than rand and randn. Also note that these functions of rand and randn can depend on other quantities (parameters).
  • Sometimes it is the case that you observe something random (a person's performance throwing darts) and you want a statistical model. You specify a PDF (for the height of the dart's landing point) but (since you are just guessing) you leave some parameters free. You have an assumed probability density (of the dart's height) that depends upon the dart's height and parameters. If you fix the parameters, you get a PDF, a function of just the dart's coordinates (the possible outcomes and/or values of the random variable). The integral of this function over its entire range must be 1. If you fix the cooridinates, you get a likelihood function, a function of the parameters. But is not a PDF: the parameters are not random they are just unknown. The integral of this function over the entire range need not be 1.
  • This likelihood function (for only one data point/dart throw) is not very useful. It becomes more useful when we consider many data points. Let's start with two.

Two Random Variables, Independence

Consider two examples of two random variables:

  • Two successive rolls of the dice: 1,2,3,4,5,6.
  • The membrane potential of a cell under study at 12:00 noon, yesterday, and the membrane potential of the same cell 1 microsecond later.

See the difference? In the first case, the value of the first random variable tells you nothing about the value of the second. In the second case, if you measure the membrane potential at 12:00 noon to be -60 mV, you can be quite sure that one microsecond later that the membrane potential is close to (but not necessarily exactly) -60 mV. The point is, if you didn't know the value at 12:00 noon, you would be much less certain about the second value.

If the value of one random variable tells you nothing about the value of the other, we say the random variables are independent.

What does this mean mathematically? There are two cases to consider: continuous (PDFs) and discrete (PMFs). But the results are analogous so I'll treat them interchangably. Starting with densities:

There are three types of densities for two random variables X1, X2.

  • Marginal densities: p(X1), p(X2).
  • Joint density: p(X1,X2) which is the same as p(X2,X1).
  • Conditional densities p(X1|X2) and p(X2|X1).

Here are the differences:

  • Marginals: p(X1 = -70 mV) is the probability density of observing the membrane potential to be -70 mV at 12:00 noon, irrespective of the other random variable. To get the marginals throw the other random variable away, and use the PDF of the remaining as in Lab B.