Difference between revisions of "Limma analysis"

From Organic Design wiki

Revision as of 01:07, 25 July 2006


Linear models for microarray analysis

Linear models for microarray analysis (Limma) is an R and Bioconductor package for organising and analysing cDNA and Affymetrix microarray data. It is written by Gordon Smyth at WEHI.

Algorithm details

For a p * n matrix of expression intensities, Limma fits p linear models (one for each row). The lmFit function does this by calling functions such as lm.series, which use lm.fit from the stats package. For two-channel spotted technologies (cDNA/oligo), the matrix of expression intensities is usually the matrix of M values (log-ratios) with respect to treatments. For Affymetrix single-channel arrays, the expression intensities are analysed directly, comparing two treatments.

y = Xβ + ε
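As a rough illustration of fitting one linear model per row, the sketch below solves the least-squares normal equations for every row of a small matrix in plain Python. It is not Limma's code (Limma relies on lm.fit in R). The names fit_rows and solve_linear, the data, and the design matrix are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations act on both sides.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_rows(y_matrix, design):
    """Fit y = X beta + eps separately for every row of y_matrix."""
    k = len(design[0])
    # X'X is computed once because every row (gene) shares the same design.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    coefficients = []
    for y in y_matrix:
        xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
        coefficients.append(solve_linear(xtx, xty))
    return coefficients


# Hypothetical data: 2 genes (rows) by 4 arrays (columns) of M values.
m_values = [[1.0, 1.2, 0.8, 1.0],
            [-0.5, -0.3, 0.1, 0.3]]
# Categorical design: arrays 1-2 belong to group A, arrays 3-4 to group B.
design = [[1, 0], [1, 0], [0, 1], [0, 1]]

coefs = fit_rows(m_values, design)
```

With a categorical design like this one, each fitted coefficient is simply the average M value of the arrays in that group.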

The linear model effectively reduces to estimating average M values using a categorical design matrix. If the correlation between rows (spots) is to be estimated, the function duplicateCorrelation is called. This fits a REML model to every gene, giving an estimated correlation rho for each gene. A Fisher transformation (identical to atanh(x)) is applied to these correlations, their trimmed mean is calculated (trim=0.15 by default), and the result is back-transformed with tanh to give a consensus correlation. This correlation can be used in lmFit, which then calls gls.series to fit a generalized least squares model.
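The consensus step can be sketched as follows, assuming the per-gene correlations have already been estimated. This is not the duplicateCorrelation source. The function names and correlation values are hypothetical, and the trimmed mean assumes R's convention of dropping floor(n * trim) values from each end.

```python
import math


def trimmed_mean(values, trim=0.15):
    """Mean after dropping floor(n * trim) values from each end of the sorted data."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    drop = int(math.floor(len(ordered) * trim))
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


def consensus_correlation(gene_correlations, trim=0.15):
    """Average correlations on the Fisher (atanh) scale, then back-transform."""
    # atanh requires every correlation to lie strictly between -1 and 1.
    z = [math.atanh(r) for r in gene_correlations]
    return math.tanh(trimmed_mean(z, trim))


# Hypothetical per-gene correlation estimates.
rhos = [0.2, 0.5, 0.55, 0.6, 0.62, 0.65, 0.7, 0.72, 0.8, 0.95]
consensus = consensus_correlation(rhos)
```

Averaging on the atanh scale and trimming both tails keeps a few extreme per-gene estimates from dominating the consensus value.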

List elements from lmFit

> names(fit)
[1] "coefficients"     "rank"             "assign"           "qr"              
[5] "df.residual"      "sigma"            "cov.coefficients" "stdev.unscaled"  
[9] "pivot"            "method"           "design"           "Amean"           
[13] "genes" 
  • coefficients = estimated M values
  • qr = QR decomposition of the design matrix
  • df.residual = residual degrees of freedom for each gene
  • Amean = estimated unweighted A values
  • stdev.unscaled = unscaled standard deviations of the coefficients (multiply by sigma to obtain standard errors)
  • method = model fitting method used
  • design = the design matrix X used
  • genes = list of gene names
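The relationship between coefficients, stdev.unscaled and sigma can be illustrated with a small calculation for a single gene. The numbers are hypothetical and the variable names merely mirror the lmFit list elements.

```python
# Hypothetical values for one gene, named after the lmFit list elements.
coefficient = 1.1       # estimated M value
stdev_unscaled = 0.5    # e.g. 1 / sqrt(4) when averaging four arrays
sigma = 0.4             # residual standard deviation for this gene
df_residual = 3         # four arrays minus one estimated coefficient

# The standard error of the coefficient is the unscaled value times sigma.
standard_error = stdev_unscaled * sigma

# Ordinary (unmoderated) t-statistic on df_residual degrees of freedom.
ordinary_t = coefficient / standard_error
```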

The addition of a correlation term, and the specification of technical or biological blocking, will alter sigma, stdev.unscaled and cov.coefficients, and remove rank and qr from the list. Different correlations combined with spot weighting will affect the coefficient estimates.

Empirical Bayes methods with conjugate priors are used to calculate posterior measures such as moderated t-statistics and B statistics. Prior information for the moderated t-statistics comes from the data, while the B statistics also depend on an assumed value of p, the proportion of genes which change.

The function eBayes, a wrapper for ebayes, calculates these statistics for each row (spot).

List elements from ebayes

> names(ebayes(fit))
[1] "df.prior"  "s2.prior"  "s2.post"   "t"         "p.value"   "var.prior"
[7] "lods"
  • df.prior = d0, the prior degrees of freedom
  • s2.prior = s0^2, the prior variance
  • s2.post = posterior variance for each gene
  • t = moderated t-statistics
  • p.value = p-values for the moderated t-statistics
  • lods = log-odds B statistics
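How these elements fit together can be sketched for a single gene, assuming the prior values are already known. The function moderated_t and all numbers are hypothetical, and the function is not part of Limma. The shrinkage formula is the standard posterior variance (d0 * s0^2 + dg * sg^2) / (d0 + dg), where dg and sg^2 are the gene's residual degrees of freedom and sample variance.

```python
import math


def moderated_t(coefficient, stdev_unscaled, sigma, df_residual,
                s2_prior, df_prior):
    """Return (s2_post, moderated t, total degrees of freedom) for one gene."""
    # Shrink the gene's sample variance sigma^2 towards the prior variance.
    s2_post = ((df_prior * s2_prior + df_residual * sigma ** 2)
               / (df_prior + df_residual))
    # Same form as the ordinary t-statistic, with sigma replaced by sqrt(s2_post).
    t = coefficient / (stdev_unscaled * math.sqrt(s2_post))
    return s2_post, t, df_prior + df_residual


# Hypothetical gene with an unusually small sample variance (sigma = 0.1).
s2_post, t_mod, df_total = moderated_t(
    coefficient=1.1, stdev_unscaled=0.5, sigma=0.1, df_residual=3,
    s2_prior=0.04, df_prior=4.0)

# The ordinary t-statistic for comparison.
t_ordinary = 1.1 / (0.5 * 0.1)
```

Because this gene's sample variance is well below the prior variance, the moderated statistic is smaller in magnitude than the ordinary one, which guards against genes that look significant only because their variance happened to be tiny.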

Moderated t-statistics

The function ebayes calculates two statistical measures: moderated t-statistics and log-odds B statistics. The moderated t-statistic part calls squeezeVar, which calls fitFDist. This assumes an inverse gamma prior distribution and a gamma likelihood function for the variances. The data are used to estimate the scale (s2.prior) and degrees of freedom (df.prior) of the prior distribution.
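The distributional assumptions can be written out explicitly. The notation below is an assumption, since the page does not define symbols: s_g^2 is the sample variance of gene g on d_g residual degrees of freedom, sigma_g^2 is its true variance, and s_0^2 and d_0 are the prior scale and degrees of freedom.

```latex
\begin{align}
  % Likelihood: a scaled chi-square (gamma) distribution for the sample variance
  s_g^2 \mid \sigma_g^2 &\sim \frac{\sigma_g^2}{d_g}\,\chi^2_{d_g} \\
  % Prior: a scaled inverse chi-square (inverse gamma) distribution for the true variance
  \frac{1}{\sigma_g^2} &\sim \frac{1}{d_0\, s_0^2}\,\chi^2_{d_0} \\
  % Marginal distribution of the sample variances
  s_g^2 &\sim s_0^2\, F_{d_g,\, d_0}
\end{align}
```

The last line is why a scaled F-distribution is fitted to the observed sample variances: matching it to the data gives estimates of s_0^2 and d_0.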

References

  • Lonnstedt, I., and Speed, T. P. (2002). Replicated microarray data. Statistica Sinica 12, 31–46.
  • Smyth G.K. (2002). Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments