Difference between revisions of "Introduction to Microarray analysis"
From Organic Design wiki
[[Image:Graph channels.tiff|right|thumb|250px|''Density plots from a 16-bit scanner'']]
Revision as of 21:16, 14 March 2006
Overview of experimental process
- Competitive hybridization to spotted oligo/cDNA transcripts
- Interested in genes that change between treatments
- That is, differential expression versus equivalent expression
Statistical analysis process
- Raw data (GPR file format)
- Each GPR intensity file is typically >8 megabytes
- Each TIFF image file is typically >30 megabytes
- A microarray experiment consists of several to many slides
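As a rough illustration of the raw-data step above, the sketch below reads a GPR file as tab-delimited text. It assumes the usual ATF-style layout (an `ATF` version line, a line giving the number of header records and data columns, the header records, then a row of column names). The function name `read_gpr`, the sample content, and the column names used in the sample are illustrative assumptions, not taken from this page.

```python
import csv
import io


def read_gpr(handle):
    """Parse an ATF-style GPR stream into (header dict, list of row dicts)."""
    first = handle.readline().split()
    if not first or first[0].strip('"') != "ATF":
        raise ValueError("not an ATF-style file")
    # Second line: number of header records, number of data columns
    n_header, _n_columns = (int(x) for x in handle.readline().split()[:2])
    header = {}
    for _ in range(n_header):
        # Header records are assumed to look like "Key=Value"
        line = handle.readline().strip().strip('"')
        key, _, value = line.partition("=")
        header[key] = value
    # The next line holds the column names; the remaining lines are spots
    rows = list(csv.DictReader(handle, delimiter="\t"))
    return header, rows


# Tiny made-up sample standing in for a real (>8 MB) GPR file
sample = (
    "ATF\t1.0\n"
    "2\t5\n"
    '"Type=GenePix Results 3"\n'
    '"Scanner=hypothetical"\n'
    '"Name"\t"ID"\t"F635 Median"\t"F532 Median"\t"Flags"\n'
    "spot1\tg001\t1200\t600\t0\n"
    "spot2\tg002\t300\t310\t0\n"
)
header, rows = read_gpr(io.StringIO(sample))
```

For a real file, the same function could be called on `open(path, newline="")`. Values are returned as strings, so intensities need converting to numbers before analysis.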
Statistical issues
- Classical statistics was developed for n >> p
- n observations, p variables
- Gene expression data have n << p
- Thousands of measured genes (p)
- Small number of biological replicate slides (n)
- Gene expression data can be highly correlated
- groups of genes are regulated in the same way
- Data are not normally distributed
- Log transform the highly skewed intensity data
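The effect of the log transform can be sketched with simulated data. The intensities below are random draws from a log-normal distribution clipped to the 16-bit range, chosen only for illustration; they are not real scanner output. The helper names (`skewness`, `log_ratio`, `mean_log_intensity`) are hypothetical.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed z-score (about 0 for symmetric data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_ratio(red, green):
    """Log2 ratio of the two channel intensities for one spot."""
    return math.log2(red / green)


def mean_log_intensity(red, green):
    """Average log2 intensity of the two channels for one spot."""
    return 0.5 * (math.log2(red) + math.log2(green))


rng = random.Random(0)
# Simulated (hypothetical) intensities, clipped to the 16-bit range 1..65535
red = [min(65535.0, max(1.0, rng.lognormvariate(7.0, 1.2))) for _ in range(5000)]

raw_skew = skewness(red)                            # strongly right-skewed
log_skew = skewness([math.log2(v) for v in red])    # close to symmetric

print(f"skewness of raw intensities:  {raw_skew:.2f}")
print(f"skewness of log2 intensities: {log_skew:.2f}")
```

On the log2 scale a two-fold change in either direction is symmetric around zero: `log_ratio(2000, 1000)` is 1 and `log_ratio(1000, 2000)` is -1.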
Analysis wish list
- Ideally, results should have an unambiguous interpretation
- Large amounts of data to analyse can be overwhelming and make interpretation subjective
- Independent reproducibility of results by another colleague
- Keep a record (log) of what was done
Analysis aim
- It is easier to rank genes in order of evidence of differential expression than to select a specific cutoff
- If we do select a cutoff, a False Discovery Rate (FDR) cutoff is usually used
- The FDR threshold is the expected proportion of genes in the selected list that are false positives (not truly differentially expressed)
TODO: Picture of a gene list
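Ranking with an FDR cutoff can be sketched as follows. This uses the Benjamini-Hochberg adjustment, which is one common way to control the FDR; the page does not say which method is intended, so treat that choice as an assumption. The gene names and p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(p_by_gene, fdr=0.05):
    """Rank genes by p-value and select those with adjusted p <= fdr."""
    genes = list(p_by_gene)
    raw = [p_by_gene[g] for g in genes]
    adj = benjamini_hochberg(raw)
    ranked = sorted(zip(genes, raw, adj), key=lambda row: row[1])
    selected = [gene for gene, _, q in ranked if q <= fdr]
    return ranked, selected


# Hypothetical p-values from a per-gene test of differential expression
p_by_gene = {
    "geneC": 0.039,
    "geneA": 0.001,
    "geneE": 0.6,
    "geneB": 0.008,
    "geneD": 0.041,
}
ranked, selected = rank_genes(p_by_gene, fdr=0.05)
for gene, p, q in ranked:
    print(f"{gene}\tp={p:.3f}\tadjusted={q:.3f}")
```

The ranked list is useful on its own, as the notes above suggest; the cutoff only decides how far down the list to read. With these made-up values, only the two top-ranked genes fall under a 5% FDR.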