Linear models for Microarray analysis

From Organic Design wiki
Latest revision as of 21:53, 11 November 2007


Overview of Limma package for R

  • Fits a linear model for each spot (gene)
  • An open source software package for the R programming environment
  • Focus on normalization and statistical analysis of cDNA microarray gene expression data
  • OOP environment for handling information in a microarray experiment
  • Statistical analysis approach can be used for Affymetrix microarray experiments

Origin

  • Written and maintained by Gordon Smyth, with contributions from WEHI, Melbourne, Australia
  • Software made public at the Australian Genstat Conference, Perth, in Dec 2002
  • Became available in the Bioconductor open source bioinformatics project April 2003
  • Limma integrates with other Bioconductor software packages, such as affy and marray, using the convert package
  • Active development cycle

File:Limma versions.tiff


Statistical approach

  • Parallel inference for each gene
  • Computationally fast/robust
  • Handles missing information and user-defined flag information
  • The linear model fit gives essentially a t-statistic for each spot/gene (signal/noise)
  • Also makes use of between-gene information (moderated t-statistics)
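
The difference between an ordinary and a moderated t-statistic can be sketched in a few lines. The Python below is an illustration only, not limma's R code: the function names and example numbers are hypothetical, and the prior values d0 and s0_sq are supplied by hand here, whereas limma estimates them from all genes on the array.

```python
import math
from statistics import mean, variance


def ordinary_t(m_values):
    """One-sample t-statistic for one gene: mean log-ratio over its standard error."""
    n = len(m_values)
    return mean(m_values) / math.sqrt(variance(m_values) / n)


def moderated_t(m_values, d0, s0_sq):
    """t-statistic with the gene's variance shrunk towards a prior variance.

    d0 (prior degrees of freedom) and s0_sq (prior variance) are assumed
    inputs in this sketch; in practice they come from all genes together.
    """
    n = len(m_values)
    d = n - 1  # residual degrees of freedom for this gene
    s_sq = variance(m_values)
    shrunk = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return mean(m_values) / math.sqrt(shrunk / n)


# Hypothetical log-ratios (M values) for two genes on four arrays.
genes = {
    "geneA": [1.0, 1.2, 0.8, 1.0],      # clear signal, moderate variance
    "geneB": [0.50, 0.51, 0.49, 0.50],  # small signal, very small variance
}
for name, m in genes.items():
    print(name, round(ordinary_t(m), 2), round(moderated_t(m, d0=4, s0_sq=0.05), 2))
```

For geneB the ordinary statistic is inflated by an accidentally tiny variance; borrowing information from the other genes pulls it back to a more believable size.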

Object orientated programming environment

File:OOP.png

  • Uploading data into the R programming language automatically populates elements of RGList
    • R (Red foreground)
    • G (Green foreground)
  • Foreground intensities range ~ 1 → 65535
    • Rb (Red background)
    • Gb (Green background)
  • Background intensities range ~ 1 → 1000
    • genes (Spot annotation list)
    • weights (prior weights given to each spot)
  • MAList data transformation
    • M = log2(R) - log2(G) (minus)
    • A = (log2(R) + log2(G))/2 (add - abundance)
  • Back-transforming to normalized R', G' values
    • log2(R') = A + M/2
    • log2(G') = A - M/2
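
The transformation and its inverse can be checked with a short round trip. This is a minimal Python sketch of the formulas above, not limma's own code; the function names and the spot intensities are hypothetical.

```python
import math


def to_ma(r, g):
    """Convert red/green intensities to (M, A) on the log2 scale."""
    m = math.log2(r) - math.log2(g)
    a = (math.log2(r) + math.log2(g)) / 2
    return m, a


def from_ma(m, a):
    """Back-transform (M, A): log2(R') = A + M/2, log2(G') = A - M/2."""
    return 2 ** (a + m / 2), 2 ** (a - m / 2)


# Hypothetical spot: red is four times brighter than green.
m, a = to_ma(r=4096, g=1024)
print(m, a)           # 2.0 11.0
print(from_ma(m, a))  # (4096.0, 1024.0)
```

M = 2 means a four-fold difference between channels, and A = 11 is the average log2 intensity of the spot. When M has been normalized, the same back-transformation gives the normalized R' and G' values.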

Advantages using Limma

  • Nice organisational framework for handling cDNA expression data using object orientated programming
  • Flexible methods to handle weighting of poor quality spots
  • Incorporates cDNA normalization routines with a proven track record
  • Robust statistical analysis approach
    • Can analyze cDNA microarray slides possessing large amounts of missing information
  • Analysis methods able to incorporate duplicate spots from either technical or biological sources
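
How spot weights and missing values interact can be shown with a weighted average over duplicate spots. This Python sketch is a simplified stand-in, not limma's fitting routine; the function name, the use of None for a missing value, and the numbers are all assumptions made for illustration.

```python
def weighted_mean_m(m_values, weights):
    """Weighted average of log-ratios, ignoring missing (None) values."""
    pairs = [(m, w) for m, w in zip(m_values, weights) if m is not None and w > 0]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None  # nothing usable for this gene
    return sum(m * w for m, w in pairs) / total_weight


# Hypothetical gene spotted four times: one spot missing, one flagged as poor.
m_values = [1.0, None, 1.5, 3.0]
weights = [1.0, 1.0, 1.0, 0.0]  # weight 0 drops the flagged spot entirely
print(weighted_mean_m(m_values, weights))  # 1.25
```

A weight between 0 and 1 would down-weight a doubtful spot rather than discard it.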

Limitations

  • Experiments with different spotting templates cannot easily be combined for analysis
  • Statistical analysis cannot pool information together when there are variable numbers of the same replicated spots
    • Must analyze spot information about the same transcript independently
  • Linear models cannot incorporate error model structures from time series designs

Microarray workshop experiment

  • Dye swap experiment
  • Directed graph

File:Dyeswap.png

FileName SlideNumber   Cy3   Cy5 Design
BE34.gpr          34  Leaf Fruit     -1
BE35.gpr          35 Fruit  Leaf      1
BE36.gpr          36 Fruit  Leaf      1
BE37.gpr          37  Leaf Fruit     -1
  • For the Fruit versus Leaf comparison, the M values of slides 34, 35, 36 and 37 are multiplied by -1, 1, 1, -1 respectively
  • Determining design questions of interest is the hardest part
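
The effect of the Design column can be sketched for a single gene. This is illustrative Python rather than limma code: the M values are invented, the function name is hypothetical, and M = log2(Cy5) - log2(Cy3) is an assumed convention. Under that assumption the multipliers put every slide on the same Leaf-over-Fruit scale; with the opposite convention the sign of the estimate reverses.

```python
# Design column from the targets table: one multiplier per slide.
design = {"BE34.gpr": -1, "BE35.gpr": 1, "BE36.gpr": 1, "BE37.gpr": -1}


def dye_swap_estimate(m_by_slide, design):
    """Least-squares coefficient for a single-column design of +1/-1 values."""
    numerator = sum(design[s] * m for s, m in m_by_slide.items())
    denominator = sum(design[s] ** 2 for s in m_by_slide)
    return numerator / denominator


# Hypothetical M values for one gene on the four slides.
m_by_slide = {"BE34.gpr": -1.5, "BE35.gpr": 1.25, "BE36.gpr": 1.75, "BE37.gpr": -1.5}
print(dye_swap_estimate(m_by_slide, design))  # 1.5
```

Because the multipliers sum to zero, a constant dye bias added to every slide's M value cancels out of the estimate, which is the point of a balanced dye swap.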