Overview of experimental process

File:Expt.png

(Courtesy Mik Black)

Competitive hybridization to spotted oligo/cDNA transcripts
We are interested in genes whose expression changes between treatment conditions

→ differential expression versus equivalent expression

Statistical analysis process

File:Process.png

Raw data (GPR file format)

http://www.moleculardevices.com/pages/software/gn_gpr_format_history.html

Each GPR intensity file is typically >8 megabytes
Each TIFF image file is typically >30 megabytes
A microarray experiment consists of several to many slides
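
As a rough illustration of what reading one of these files involves, the sketch below parses a GPR-style stream. It assumes the ATF layout described at the link above (an "ATF" line, a line giving the number of header records and data columns, the header records, then a tab-delimited table). The function name `read_gpr`, the sample text, and the spot names are hypothetical and made up for illustration; real files have many more columns and thousands of rows.

```python
import csv
import io


def read_gpr(handle):
    """Parse an ATF-style GPR stream into a list of dicts, one per spot."""
    first = handle.readline().split()
    if not first or first[0] != "ATF":
        raise ValueError("not an ATF/GPR file")
    # Second line: number of optional header records, number of data columns.
    n_header, _n_columns = (int(x) for x in handle.readline().split()[:2])
    for _ in range(n_header):
        handle.readline()  # skip the 'Key=Value' header records
    # The next line holds the column names; the rest are spot records.
    return list(csv.DictReader(handle, delimiter="\t"))


# Tiny made-up example, not real data.
sample = (
    "ATF\t1.0\n"
    "2\t4\n"
    '"Type=GenePix Results 3"\n'
    '"Creator=hypothetical"\n'
    '"Block"\t"Name"\t"F635 Median"\t"F532 Median"\n'
    '1\t"spot_a"\t1200\t300\n'
    '1\t"spot_b"\t150\t600\n'
)
spots = read_gpr(io.StringIO(sample))
```

All values come back as strings, so intensities need converting to numbers before any analysis.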

Statistical issues

Historically, statistical methods were developed for n >> p

n observations, p variables

Gene expression data have n << p

Thousands of measured genes (p)

Small number of biological replicate slides (n)

Gene expression data can be highly correlated

groups of genes are regulated in the same way

Data are not normally distributed

Log-transform the highly skewed intensity data

File:Graph channels.png
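
A minimal sketch of the log transform for a two-channel spot follows. It assumes the M and A columns in the gene list further down follow the usual definitions, M = log2(R/G) and A = the average of log2(R) and log2(G); the notes do not state this explicitly. The function name `ma_values` is hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    M is the log2 ratio of the two channels, A is the mean log2 intensity.
    Intensities must be positive; background-corrected values that are
    zero or negative need handling before this step.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logs")
    log_r = math.log2(red)
    log_g = math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


m, a = ma_values(1024, 256)  # M = 2.0 (a four-fold difference), A = 9.0
```

On the log2 scale a two-fold increase and a two-fold decrease are symmetric about zero (M = +1 and M = -1), which is one reason the transformed data are easier to work with.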

Analysis wish list

Ideally we would like an unambiguous interpretation of results
Large amounts of data to analyse can be overwhelming and make interpretation subjective
Independent reproducibility of results by another colleague

→ Keep a record (log file) of what was done

Analysis aim

Obtain a list of genes that we think are differentially expressed

      Block Row Column            ID   Name         M        A         t      P.Value        B
10396    20  15     23 171121_390_49 171121  5.035364 13.25087  49.62425 3.220044e-05 11.27486
4517      9  13      9  20264_118_53  20264  4.396719 11.11976  47.06004 3.220044e-05 11.05671
16881    32  21     22 165415_634_53 165415  4.645384 12.65872  43.40359 3.220044e-05 10.70650
16086    31  10      9 185903_436_49 185903  5.146504 11.36911  42.75724 3.220044e-05 10.63926
6508     13   7     22 197386_457_55 197386  4.621024 13.20426  42.09902 3.220044e-05 10.56899
5471     11   8     20 142178_355_53 142178  4.795734 12.07427  41.23346 3.220044e-05 10.47374
8395     16  20     23   251706_1_53 251706 -5.003475 13.04571 -38.61325 3.220044e-05 10.16421
4330      9   5      6 297409_340_47 297409  4.421922 12.27208  38.52215 3.220044e-05 10.15284
12479    24  14     13 163360_396_47 163360  4.367943 11.10478  38.21662 3.220044e-05 10.11439
15024    29  10      5 149243_674_53 149243  4.372419 11.36572  37.86362 3.220044e-05 10.06935

It is easier to rank genes in order of evidence for differential expression than it is to select a specific cutoff
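
Ranking can be sketched in a few lines. The example below takes three rows from the table above and orders them by the absolute value of the t column, so that strongly down-regulated genes (negative t) rank alongside strongly up-regulated ones. Using |t| as the ranking key is an assumption made for illustration; the table itself appears to be ordered by B.

```python
# Three rows copied from the gene list above (ID, t and B columns only).
genes = [
    {"ID": "20264_118_53", "t": 47.06004, "B": 11.05671},
    {"ID": "251706_1_53", "t": -38.61325, "B": 10.16421},
    {"ID": "171121_390_49", "t": 49.62425, "B": 11.27486},
]

# Sort by strength of evidence, ignoring the direction of change.
ranked = sorted(genes, key=lambda g: abs(g["t"]), reverse=True)
ranked_ids = [g["ID"] for g in ranked]
```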

If we do select a cutoff, a False Discovery Rate (FDR) cutoff is usually used

The FDR threshold is the expected proportion of genes in the list that are false positives (i.e. not truly differentially expressed)
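
To make the FDR cutoff concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, one common way of controlling the FDR. The notes do not say which procedure is used, so treating it as Benjamini-Hochberg is an assumption. The function name and the four p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # illustrative values only
adj = benjamini_hochberg(raw)
# Genes kept at an FDR cutoff of 0.05 (by index into `raw`).
selected = [i for i, q in enumerate(adj) if q <= 0.05]
```

With a cutoff of 0.05, about 5% of the genes in the selected list are expected to be false positives.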

Introduction to Microarray analysis
