Latest revision as of 21:53, 11 November 2007

Overview of experimental process

File:Expt.png

(Courtesy Mik Black)

Competitive hybridization to spotted oligo/cDNA transcripts
Interested in genes whose expression changes between treatment conditions

→ differential expression versus equivalent expression

Statistical analysis process

File:Process.png

Raw data (GPR file format)

http://www.moleculardevices.com/pages/software/gn_gpr_format_history.html

Each GPR intensity file is typically >8 megabytes
Each TIFF image file is typically >30 megabytes
A microarray experiment consists of several → many slides

Statistical issues

Classical statistical methods were developed for n >> p

n observations, p variables

Gene expression data have n << p

Thousands of measured genes (p)

Small number of biological replicate slides (n)

Gene expression data can be highly correlated

groups of genes are regulated in the same way

Data are not normally distributed

Log-transform the highly skewed intensity data

File:Graph channels.png
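
As a minimal sketch of the log transform, the snippet below converts two-channel spot intensities to log2 scale and derives the M (log-ratio) and A (average log-intensity) values that appear as columns in the gene list further down. It assumes the usual convention M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2; the function name and the example intensities are hypothetical and not taken from the original data.

```python
import math


def log_ratio_and_average(red, green):
    """Return (M, A) for one spot from two positive channel intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive before taking logs")
    m = math.log2(red) - math.log2(green)          # log-ratio of the channels
    a = 0.5 * (math.log2(red) + math.log2(green))  # average log-intensity
    return m, a


# Hypothetical raw intensities from a 16-bit scanner (range 0..65535).
spots = [(40000.0, 1250.0), (800.0, 800.0), (150.0, 2400.0)]
for red, green in spots:
    m, a = log_ratio_and_average(red, green)
    print(f"R={red:8.0f} G={green:8.0f}  M={m:6.2f}  A={a:6.2f}")
```

On the log2 scale a 32-fold increase and a 16-fold decrease become M values of about +5 and -4, so up- and down-regulation are treated symmetrically and the long right tail of the raw intensities is compressed.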

Analysis wish list

Ideally we would like an unambiguous interpretation of results
Large amounts of data to analyse can be overwhelming and make interpretation subjective
Independent reproducibility of results by another colleague

→Keep a record (log file) of what was done
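
One possible way to keep such a record is to write each analysis step to a log file as it happens. The sketch below uses Python's standard `logging` module; the logger name, the file name `analysis.log`, the slide file names and the logged steps are all hypothetical placeholders.

```python
import logging
import platform


def start_analysis_log(path):
    """Create a logger that appends timestamped analysis steps to `path`."""
    logger = logging.getLogger("microarray_analysis")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    # Record the software environment so a colleague can reproduce the run.
    logger.info("Python %s on %s", platform.python_version(), platform.platform())
    return logger


log = start_analysis_log("analysis.log")
# Hypothetical steps; log whatever was actually done, with its parameters.
log.info("Loaded %d GPR files: %s", 2, ["slide1.gpr", "slide2.gpr"])
log.info("Transform: log2 of both channels")
```

Recording input files, parameters and software versions alongside each step is what lets another person repeat the analysis and arrive at the same gene list.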

Analysis aim

Obtain a list of genes which we think are differentially expressed

      Block Row Column            ID   Name         M        A         t      P.Value        B
10396    20  15     23 171121_390_49 171121  5.035364 13.25087  49.62425 3.220044e-05 11.27486
4517      9  13      9  20264_118_53  20264  4.396719 11.11976  47.06004 3.220044e-05 11.05671
16881    32  21     22 165415_634_53 165415  4.645384 12.65872  43.40359 3.220044e-05 10.70650
16086    31  10      9 185903_436_49 185903  5.146504 11.36911  42.75724 3.220044e-05 10.63926
6508     13   7     22 197386_457_55 197386  4.621024 13.20426  42.09902 3.220044e-05 10.56899
5471     11   8     20 142178_355_53 142178  4.795734 12.07427  41.23346 3.220044e-05 10.47374
8395     16  20     23   251706_1_53 251706 -5.003475 13.04571 -38.61325 3.220044e-05 10.16421
4330      9   5      6 297409_340_47 297409  4.421922 12.27208  38.52215 3.220044e-05 10.15284
12479    24  14     13 163360_396_47 163360  4.367943 11.10478  38.21662 3.220044e-05 10.11439
15024    29  10      5 149243_674_53 149243  4.372419 11.36572  37.86362 3.220044e-05 10.06935

It is easier to rank genes in order of evidence of differential expression than to select a specific cutoff

If we do select a cutoff, a False Discovery Rate (FDR) cutoff is usually used

The FDR threshold is the expected proportion of genes in the selected list that are false positives
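
The page does not say which FDR procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg adjustment, one widely used FDR-controlling method, and selects genes at a 5% FDR cutoff. The gene names, raw p-values and the 0.05 cutoff are hypothetical and are not taken from the table above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for six genes.
raw = {
    "geneA": 0.0001,
    "geneB": 0.0004,
    "geneC": 0.019,
    "geneD": 0.03,
    "geneE": 0.2,
    "geneF": 0.74,
}
adj = benjamini_hochberg(list(raw.values()))
fdr_cutoff = 0.05  # assumed threshold: about 5% of selected genes expected to be false
selected = [gene for gene, q in zip(raw, adj) if q <= fdr_cutoff]
print(selected)
```

With these made-up values, geneA to geneD pass the cutoff. Note that tied adjusted values across the top of a list, such as the identical P.Value entries in the table above, are a normal consequence of the monotonicity step in this kind of adjustment.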

@@ Line 1: / Line 1: @@
+{{#security:edit|Sven}}
+{{#security:*|Sven}}
+[[Category:Sven/Rosaceae]]
 __NOTOC__
-====Overview of experimental process====
-[[Image:expt2.tiff|thumb|500px|''Courtesy Mik Black'']]
+= Overview of experimental process =
+[[Image:expt.png]]
+:<font color="blue">(''Courtesy Mik Black'')</font>
 *Competitive hybridization to spotted oligo/cDNA transcripts
-*Interested in genes that change between treatments
+*Interested in genes that change between treatment conditions
 :<font color="blue">&rarr; ''differential expression versus equivalent expression''</font>
 ----
-====Statistical analysis process====
-[[Image:overview2.tiff|thumb|500px|''Analysis workflow from scanner to results'']]
+= Statistical analysis process =
+[[Image:process.png]]
 * Raw data (''GPR file format'')
 :''http://www.moleculardevices.com/pages/software/gn_gpr_format_history.html''
-* Each GPR intensity file is typically >8 megabytes
+* Each GPR intensity file is typically >8  megabytes
 * Each TIFF image file is typically >30 megabytes
 * A microarray experiment consists of several &rarr; many slides
 ----
-====Statistical issues====
+= Statistical issues =
 *In the past statistics was developed for n >>p
 :<font color="blue">''n observations, p variables''</font>
@@ Line 28: / Line 34: @@
 *Data not normally distributed
 :<font color="blue">''log transform highly skewed intensity data''</font>
-[[Image:Graph channels.tiff|right|thumb|250px|''Density plots from a 16-bit scanner'']]
+[[Image:Graph channels.png]]
 ----
-====Analysis wish list====
+= Analysis wish list =
 * Ideally would like unambiguous interpretation of results
 * Large amounts of data to analyse can be overwhelming and make interpretation subjective
 * Independent reproducibility of results by another collegue
-*<font color="blue">''Keep a record (''log'') of what was done'' </font>
+:<font color="blue">&rarr;''Keep a record (''log file'') of what was done'' </font>
 ----
-====Analysis aim====
+= Analysis aim =
-* Easier to rank genes in order of evidence of differential expression than it is to select a specific cutoff
+* Obtain a list of genes which we think are differentially expressing
+       Block Row Column            ID   Name         M        A         t      P.Value        B
+    20  15     23 [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?submit=y&db=Nucleotide&term=CN882776 171121_390_49] 171121  5.035364 13.25087  49.62425 3.220044e-05 11.27486
+      9  13      9  20264_118_53  20264  4.396719 11.11976  47.06004 3.220044e-05 11.05671
+    32  21     22 165415_634_53 165415  4.645384 12.65872  43.40359 3.220044e-05 10.70650
+    31  10      9 185903_436_49 185903  5.146504 11.36911  42.75724 3.220044e-05 10.63926
+     13   7     22 197386_457_55 197386  4.621024 13.20426  42.09902 3.220044e-05 10.56899
+     11   8     20 142178_355_53 142178  4.795734 12.07427  41.23346 3.220044e-05 10.47374
+     16  20     23   251706_1_53 251706 -5.003475 13.04571 -38.61325 3.220044e-05 10.16421
+      9   5      6 297409_340_47 297409  4.421922 12.27208  38.52215 3.220044e-05 10.15284
+    24  14     13 163360_396_47 163360  4.367943 11.10478  38.21662 3.220044e-05 10.11439
+    29  10      5 149243_674_53 149243  4.372419 11.36572  37.86362 3.220044e-05 10.06935
+* <font color="blue">Easier to rank genes in order of evidence of differential expression than it is to select a specific cutoff</font>
 *If we do select a cutoff, False Discovery Rate (FDR) cutoff is usually used
-**FDR threhold is the expected proportion of genes in a list that are likely to be incorrect
+:<font color="blue">''FDR threhold is the expected proportion of genes in a list that are likely to be incorrect''</font>
-TODO: Picie of  a gene list
 ----
-[[Category:Sven/Rosaceae]]
+[[Category:Microarray]]

Difference between revisions of "Introduction to Microarray analysis"
