Linear models for Microarray analysis

From Organic Design wiki


Overview of Limma package for R

  • Fits a linear model for each spot (gene)
  • An open source software package for the R programming environment
  • Focus on normalization and statistical analysis of cDNA microarray gene expression data
  • OOP environment for handling information in a microarray experiment
  • Statistical analysis approach can be used for Affymetrix microarray experiments

Origin

  • Written and maintained by Gordon Smyth with contributions from WEHI, Melbourne, Australia
  • Software made public at the Australian Genstat Conference, Perth, in Dec 2002
  • Became available in the Bioconductor open source bioinformatics project April 2003
  • Limma integrates with other Bioconductor software packages (e.g. affy, marray) via the convert package
  • Active development cycle

File:Limma versions.tiff


Statistical approach

  • Parallel inference for each gene
  • Computationally fast/robust
  • Handles missing information and user-defined flag information
  • For each spot/gene, the linear model essentially yields a t-statistic (signal/noise)
  • Also makes use of between gene information (moderated t-statistics)
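
The per-gene statistics above can be sketched in base R without loading limma. This is a minimal illustration on simulated log-ratios, not limma's own implementation: the data are made up, and the prior values d0 and s0_sq are hypothetical constants chosen for the example, whereas limma estimates the corresponding quantities from all genes together.

```r
# Simulated log-ratios: 5 genes (rows) on 4 slides (columns). Made-up data.
set.seed(1)
M <- matrix(rnorm(20, mean = 0, sd = 0.3), nrow = 5)
M[1, ] <- M[1, ] + 2   # gene 1 is given a true log2 fold change of 2
M[3, 2] <- NA          # a flagged or missing spot

# Ordinary one-sample t-statistic per gene: mean / (sd / sqrt(n)).
n     <- rowSums(!is.na(M))            # usable spots per gene
beta  <- rowMeans(M, na.rm = TRUE)     # signal
s2    <- apply(M, 1, var, na.rm = TRUE)  # noise (per-gene variance)
t_ord <- beta / sqrt(s2 / n)

# Moderated t-statistic: shrink each gene's variance toward a prior value.
# d0 and s0_sq are HYPOTHETICAL fixed values for illustration only.
d0     <- 4              # assumed prior degrees of freedom
s0_sq  <- 0.09           # assumed prior variance
d      <- n - 1          # residual degrees of freedom per gene
s2_mod <- (d0 * s0_sq + d * s2) / (d0 + d)
t_mod  <- beta / sqrt(s2_mod / n)

print(round(cbind(n, beta, t_ord, t_mod), 2))
```

Genes with an unusually small sample variance get an inflated ordinary t-statistic. Pooling information between genes pulls their variance toward the prior value and tempers this.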

Object-oriented programming environment

File:OOP.png

  • Reading data into the R programming language automatically populates the elements of an RGList
    • R (Red foreground)
    • G (Green foreground)
    • Rb (Red background)
    • Gb (Green background)
    • genes (Spot annotation list)
    • weights (prior weights given to each spot)
  • Foreground intensities (R, G) range ~ 1 → 65535
  • Background intensities (Rb, Gb) range ~ 1 → 1000
  • MAList data transformation
    • M = log2(R) - log2(G) (minus)
    • A = (log2(R) + log2(G))/2 (add - abundance)
  • Back-transforming to normalized R', G' values
    • log2(R') = A + M/2
    • log2(G') = A - M/2
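
The transformation and its inverse can be checked with a few lines of base R. This is a sketch on made-up intensities, using a plain list in place of a real RGList. No background correction or normalization is applied, so the back-transformed values simply reproduce the inputs. In limma itself the functions MA.RG and RG.MA do these conversions on RGList and MAList objects.

```r
# Toy two-colour intensities for three spots (made-up numbers, not real data).
# Component names follow the RGList elements; a plain list stands in for the class.
rg <- list(
  R  = c(1200, 800, 15000),  # red foreground
  G  = c(600, 1600, 15000),  # green foreground
  Rb = c(100, 120, 90),      # red background (unused in this sketch)
  Gb = c(110, 100, 95)       # green background (unused in this sketch)
)

# Forward transform on the raw foreground values.
M <- log2(rg$R) - log2(rg$G)        # log-ratio ("minus")
A <- (log2(rg$R) + log2(rg$G)) / 2  # mean log-intensity ("add")

# Back-transform: recover log2 intensities from M and A.
log2_R <- A + M / 2
log2_G <- A - M / 2

# The round trip returns the original intensities up to floating-point error.
stopifnot(isTRUE(all.equal(2^log2_R, rg$R)))
stopifnot(isTRUE(all.equal(2^log2_G, rg$G)))

print(round(cbind(M, A), 3))
```

The three spots have M values of about 1, -1 and 0: two-fold higher in red, two-fold higher in green, and no difference.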

Advantages using Limma

  • Nice organisational framework for handling cDNA expression data using object-oriented programming
  • Flexible methods to handle weighting of poor quality spots
  • Incorporates cDNA normalization routines with a proven track record
  • Robust statistical analysis approach
    • Can analyze cDNA microarray slides possessing large amounts of missing information
  • Analysis methods able to incorporate duplicate spots from either technical or biological sources

Limitations

  • Experiments with different spotting templates cannot easily be combined for analysis
  • Statistical analysis cannot pool information when the same spot is replicated a variable number of times
    • Spots for the same transcript must then be analyzed independently
  • Linear models cannot incorporate error model structures from time series designs

Microarray workshop experiment

  • Dye swap experiment
  • Directed graph

File:Dyeswap.png

FileName SlideNumber   Cy3   Cy5 Design
BE34.gpr          34  Leaf Fruit     -1
BE35.gpr          35 Fruit  Leaf      1
BE36.gpr          36 Fruit  Leaf      1
BE37.gpr          37  Leaf Fruit     -1
  • For the Fruit versus Leaf comparison, the M values of the four slides are multiplied by -1, 1, 1, -1 (the Design column)
  • Determining design questions of interest is the hardest part
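
How the Design multipliers combine the four slides can be sketched in base R. The targets table is copied from above. The M values for geneA and geneB are hypothetical, and the sketch assumes M = log2(Cy5/Cy3). In limma itself the fit would normally go through lmFit and eBayes; the code below only reproduces the least-squares estimate for a one-column design.

```r
# Targets table from the workshop experiment.
targets <- data.frame(
  FileName    = c("BE34.gpr", "BE35.gpr", "BE36.gpr", "BE37.gpr"),
  SlideNumber = c(34, 35, 36, 37),
  Cy3         = c("Leaf", "Fruit", "Fruit", "Leaf"),
  Cy5         = c("Fruit", "Leaf", "Leaf", "Fruit"),
  Design      = c(-1, 1, 1, -1)
)

# HYPOTHETICAL M values (assumed log2 Cy5/Cy3) for two genes on the four slides.
M <- rbind(
  geneA = c(1.1, -0.9, -1.0, 1.2),  # sign flips with the dye swap: real difference
  geneB = c(0.5,  0.4,  0.6, 0.5)   # same sign on every slide: dye bias only
)
colnames(M) <- targets$FileName

# Least-squares fit of M = Design * beta for each gene.
# Design^2 = 1 on every slide, so beta is the mean of Design * M.
design <- targets$Design
beta <- as.vector(M %*% design) / sum(design^2)
names(beta) <- rownames(M)

# Under the Cy5/Cy3 assumption, this Design column puts beta on the
# log2(Leaf/Fruit) scale; negative values mean higher expression in Fruit.
print(beta)
```

For geneA the slides agree once the swapped arrays are sign-flipped, giving an estimate of about -1.05. For geneB the constant dye effect cancels, giving an estimate of about zero. This is why the dye-swap layout is used.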