Rosaceae
From Organic Design wiki
Microarray analysis workshop
Time schedule: 8:30 - 10:30am, 11:00am - 12:30pm (3.5 hours)
Workflow
- Introduction to Microarray analysis (15 minutes)
- Normalization (15 minutes)
- BioConductor/R framework (15 minutes)
- R tutorial (60 minutes)
- Linear models for Microarray analysis (15 minutes)
- BioConductor analysis tutorial (60 minutes)
Overview of experimental process
Statistical analysis process
Statistical issues
Analysis wish list
Analysis aim
      Block Row Column            ID   Name         M        A         t      P.Value        B
10396    20  15     23 171121_390_49 171121  5.035364 13.25087  49.62425 3.220044e-05 11.27486
 4517     9  13      9  20264_118_53  20264  4.396719 11.11976  47.06004 3.220044e-05 11.05671
16881    32  21     22 165415_634_53 165415  4.645384 12.65872  43.40359 3.220044e-05 10.70650
16086    31  10      9 185903_436_49 185903  5.146504 11.36911  42.75724 3.220044e-05 10.63926
 6508    13   7     22 197386_457_55 197386  4.621024 13.20426  42.09902 3.220044e-05 10.56899
 5471    11   8     20 142178_355_53 142178  4.795734 12.07427  41.23346 3.220044e-05 10.47374
 8395    16  20     23   251706_1_53 251706 -5.003475 13.04571 -38.61325 3.220044e-05 10.16421
 4330     9   5      6 297409_340_47 297409  4.421922 12.27208  38.52215 3.220044e-05 10.15284
12479    24  14     13 163360_396_47 163360  4.367943 11.10478  38.21662 3.220044e-05 10.11439
15024    29  10      5 149243_674_53 149243  4.372419 11.36572  37.86362 3.220044e-05 10.06935
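The M and A columns in the table above follow, as far as can be told from the column names, the usual two-colour convention: M is the log2 ratio of the two channels and A is their average log2 intensity. A minimal sketch of that calculation, assuming hypothetical red (Cy5) and green (Cy3) intensities that are not taken from the workshop slides:

```r
# Hypothetical background-corrected intensities for three spots (illustration only)
R <- c(8000, 1200, 500)    # Cy5 (red) channel
G <- c(250, 1200, 4000)    # Cy3 (green) channel

M <- log2(R / G)           # log-ratio: positive when red > green
A <- 0.5 * log2(R * G)     # average log2 intensity of the spot

ma <- data.frame(R, G, M = round(M, 3), A = round(A, 3))
ma
```

Under this convention an M value near 5, as in the top rows of the table, corresponds to roughly a 2^5 = 32-fold difference between the channels.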
What is BioConductor? (http://www.bioconductor.org)
BioConductor Goals
The broad goals of the project are:
library(tkWidgets)   # Load the tkWidgets package
vExplorer()          # Open the vignette explorer
Object oriented class method design (OOP)
Advantages
Disadvantages
Accessing BioConductor
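Sys.setenv(http_proxy="http://proxy.hort.net.nz:8080")   # Setting proxy variable (Sys.putenv() is defunct in current R)
source("http://www.bioconductor.org/getBioC.R")          # Downloading installation script
getBioC()                                                # Running script
# Note: current Bioconductor releases are installed with BiocManager::install() rather than getBioC()
Installing vs loading packages
library(limma)   # Loads the limma package (it must already be installed)
Documentation and Help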
Resources
Obtaining help in R
help.start()               # Browser based help documentation
help()                     # Help on a topic (note: help pages have a set format)
? ls                       # Alternative help method on ls function
apropos("mean")            # Find objects by (partial) name
example(mean)              # Run the Examples section from the online help
demo()                     # Demonstrations of R functionality
demo(graphics)             # Demonstration of graphics functionality
RSiteSearch("microarray")  # Searches the online help and mailing list archives for a query string and opens the results in a browser
Useful commands in the R environment
search()        # Give search path for R objects
searchpaths()   # Give full search path for R objects
ls()            # List objects
objects()       # Alternate function to list objects
data()          # Publicly available datasets
rm()            # Remove objects from a specified environment
save.image()    # Save R objects
q()             # Terminate an R session → prompted to Save workspace image? [y/n/c]:
Command prompt
> x <- 1:10     # Assignment of 1 to 10 to an object called 'x'
> x             # Returning the x object to the screen
 [1]  1  2  3  4  5  6  7  8  9 10
> x <- 1:       # Partial command → parser is expecting more information
+ 10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
Basic (atomic) data types
T   # TRUE
F   # FALSE
3.141592654 # Any number [0-9\.]
"Putative ATPase" # Any character [A-Za-z] must be single or double quoted
NA # Label for missing information in datasets
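The four kinds of value listed above can be inspected with class(), and NA behaves differently from an ordinary value in arithmetic. A small sketch; the vector `expr` holds made-up numbers for illustration:

```r
# Each atomic value is a vector of length one with its own class
class(TRUE)                # "logical"
class(3.141592654)         # "numeric"
class("Putative ATPase")   # "character"
class(NA)                  # "logical" (NA takes on the type of the vector it sits in)

# NA propagates through arithmetic unless it is explicitly removed
expr <- c(5.2, NA, 4.8)    # hypothetical expression values
is.na(expr)                # FALSE TRUE FALSE
mean(expr)                 # NA
mean(expr, na.rm = TRUE)   # 5
```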
Assignment of objects
x <- 42                       # Assignment to the left
x
x = 42                        # Equivalent assignment (not recommended)
x
42 -> x                       # Assignment to the right
x
Saving objects
getwd()                       # Returns the current directory where R is running
setwd("C:/DATA/Microarray")   # Set the working directory to another location
getwd()                       # Check the directory has changed
x <- 42
save.image()                  # Saves a snapshot of objects to file .RData
y <- x * 2                    # Make a new object called 'y'
y                             # Return value of 'y'
q()                           # Quit R (answer 'n' when prompted to save the workspace, so 'y' is not stored)
Restart R by double clicking on the file .RData in C:/DATA/Microarray
x                             # Returns 'x' as it was saved to .RData in "C:/DATA/Microarray"
y                             # 'y' should not exist
Object data types
a <- 3.14      # Assign an approximation of pi to object 'a'
length(a)      # The scalar is actually a vector of length 1
pi             # R already has a built in object for pi
search()       # Print the search path for all objects
find("pi")     # "pi" is located in package:base
x <- c(2,3,5,2,7,1)                        # Numbers put into a vector using the 'c' (concatenate) function
x
y <- c(10,15,12)
y
names(y) <- c("first","second","third")    # Elements can be given names
z <- c(y,x)
z
zmat <- cbind(x,y)   # cbind joins vectors together by column
zmat
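Here 'x' has six elements and 'y' only three, so cbind relies on R's recycling rule to fill the second column. A short sketch of what happens, reusing the same two vectors:

```r
x <- c(2, 3, 5, 2, 7, 1)
y <- c(10, 15, 12)
names(y) <- c("first", "second", "third")

zmat <- cbind(x, y)   # 'y' (length 3) is recycled to match 'x' (length 6)
dim(zmat)             # 6 2
zmat[, "y"]           # 10 15 12 10 15 12
x + y                 # 12 18 17 12 22 13 ('y' is recycled in arithmetic too)
```

Recycling is silent when the longer length is a multiple of the shorter one, so it is worth checking lengths before combining vectors.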
mat <- matrix(1:20, nrow=5, ncol=4)                      # Constructing a matrix
mat
colnames(mat) <- c("Col1","Col2", "Col3", "Col4")        # Adding column names
mat
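matrix() fills its values column by column unless told otherwise, which is easy to miss when reading the printed result. A brief sketch using the same 5 x 4 matrix; `mat_by_row` is a name introduced here for illustration:

```r
mat <- matrix(1:20, nrow = 5, ncol = 4)
colnames(mat) <- c("Col1", "Col2", "Col3", "Col4")

dim(mat)          # 5 4
mat[1, ]          # first row: 1 6 11 16 (values fill column by column)

# byrow = TRUE fills row by row instead
mat_by_row <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
mat_by_row[1, ]   # 1 2 3 4

t(mat)            # transpose: 4 rows, 5 columns
```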
mylist <- list(1:4,c("a","b","c"), LETTERS[1:10])
mylist
mylist <- list("element 1" = 1:4,"second vector" = c("a","b","c"), "Capitals" = LETTERS[1:10])
mylist
Indexing
x[c(1,2,3)]     # Selecting the first three elements of 'x'
x[1:3]          # Same subset using ':' sequence generation → see help(":")
y[2]            # Selecting the second element of 'y'
y["second"]     # Selecting the second element of 'y' (by name)
mat[,1:2]       # Selecting the first two columns of 'mat'
mat[1:2, 2:4]   # Selecting a subset matrix of 'mat'
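Besides positions and names, vectors and matrices can be indexed with negative numbers (to drop elements) and with logical conditions (to filter). A short sketch using the same 'x' and 'mat' objects:

```r
x <- c(2, 3, 5, 2, 7, 1)
mat <- matrix(1:20, nrow = 5, ncol = 4)

x[-1]                    # drop the first element: 3 5 2 7 1
x[x > 2]                 # keep elements greater than 2: 3 5 7
which(x > 2)             # their positions: 2 3 5

mat[mat[, 1] > 3, ]      # rows whose first column exceeds 3 (rows 4 and 5)
mat[2, , drop = FALSE]   # keep a single row as a 1 x 4 matrix
```

Logical indexing of this kind is the usual way to pull out, for example, spots above an intensity threshold.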
mylist[[1]]              # Subsetting list 'mylist' by index
mylist[["element 1"]]    # Subsetting list 'mylist' by name 'element 1'
mylist$"element 1"       # Alternate way of subsetting
mylist$Capitals[1:5]     # Selecting the first five elements of 'Capitals' in 'mylist' (case sensitive)
Plotting data
help(plot)
help(par)
example(plot)
par(ask=TRUE)            # Set the graphics device to prompt user before displaying next graph
example(hist)
Reading / writing files
help(scan)
help(read.table)
dataDir <- "C:/DATA/Microarray/GPR"
mydata <- scan(file.path(dataDir, "BE34.gpr"), what="", nlines=29)   # Get first 29 rows of data
mydata
colClasses <- rep("NULL", 82)
colClasses[c(1:5, 9,12, 18, 21)] <- NA                               # Set colClasses to ignore unwanted columns
mydata <- read.table(file.path(dataDir,"BE34.gpr"), header=TRUE, sep="\t",
                     nrows=20, skip=31, colClasses=colClasses)       # Get first 20 lines of data after 31st row
mydata
help(write)
help(write.table)
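The colClasses trick used for the GPR file can be tried without the workshop data. The sketch below writes a small tab-delimited file with write.table and reads back only some of its columns; the data frame `spots` and its column names are hypothetical stand-ins, not the contents of BE34.gpr:

```r
# Hypothetical spot table (not the BE34.gpr data)
spots <- data.frame(Block = c(1L, 1L, 2L),
                    ID    = c("s1", "s2", "s3"),
                    F635  = c(812.5, 90.0, 1500.25),
                    F532  = c(400.0, 95.5, 700.0),
                    Flags = c(0L, -50L, 0L))

tmp <- tempfile(fileext = ".txt")
write.table(spots, file = tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# "NULL" skips a column, NA lets read.table choose the class
colClasses <- rep("NULL", ncol(spots))
colClasses[c(2, 3, 4)] <- NA
subset_read <- read.table(tmp, header = TRUE, sep = "\t", colClasses = colClasses)
subset_read

unlink(tmp)   # remove the temporary file
```

Skipping columns this way saves memory on wide files such as the 82-column GPR export read above.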
User defined functions
myfun <- function( arglist ){ body }
myfun <- function(x){x}        # Creating identity function
myfun("foo")                   # Running the function
myfun()                        # Fails: no input argument provided
square <- function(x){x * x}   # Square the input number
square(10)                     # Returns 10 squared
square(1:4)                    # Underlying arithmetic is vectorized
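Functions can also take default argument values and check their own input, which is how the longer function that follows exposes its smoothing parameters. A minimal sketch; `logRatio` is a hypothetical helper written for this example and is not part of any package:

```r
# Hypothetical helper: log-ratio of two channels with a default base of 2
logRatio <- function(red, green, base = 2) {
  if (missing(red) || missing(green)) {
    stop("both 'red' and 'green' must be supplied")
  }
  log(red / green, base = base)
}

logRatio(8, 2)                # default base 2: returns 2
logRatio(100, 1, base = 10)   # named argument overrides the default: returns 2
logRatio(c(4, 8), c(1, 1))    # vectorized: 2 3

# tryCatch captures the error message instead of stopping the script
tryCatch(logRatio(8), error = function(e) conditionMessage(e))
```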
biasVar <- function(df1=4, df2=15, N=100, seed=1) {
  set.seed(seed)
  # 1) Data setup
  ylim <- c(-2,2)
  xlim <- c(-3,3)
  par(mfrow=c(2,2), mar=c(5,4,4-2,2)+0.1, mgp=c(2,.5,0))
  x <- rnorm(80, 0, 1)
  y <- sin(x) + rnorm(80, 0, 1/9)
  xno <- 500
  sim <- matrix(NA, ncol=N, nrow=xno)
  xseq <- seq(min(x), max(x), length.out=xno)

  plot(x, y, main=paste("df=",df1,sep=""), xlim=xlim, ylim=ylim)
  # Using Splines
  truex <- seq(min(x), max(x), length.out=80)
  lines(truex, sin(truex), lty=5)
  splineobj <- smooth.spline(x, y, df=df1)
  lines(splineobj, lty=1)

  plot(x, y, main=paste("df=",df2,sep=""), xlim=xlim, ylim=ylim)
  # Using Splines
  truex <- seq(min(x), max(x), length.out=80)
  lines(truex, sin(truex), lty=5)
  splineobj <- smooth.spline(x, y, df=df2)
  lines(splineobj, lty=1)

  plot(x, y, main=paste("Bias-Variance tradeoff, df=",df1,sep=""), type="n", xlim=xlim, ylim=ylim)
  for(i in seq(N)) {
    x <- rnorm(80, 0, 1)
    y <- sin(x) + rnorm(80, 0, 1/9)
    splineobj <- smooth.spline(x, y, df=df1)
    sim[,i] <- predict(splineobj, xseq)$y
  }
  ci <- qt(0.975, N) * sqrt(apply(sim, 1, var))
  bias <- apply(sim, 1, mean)
  rect(xseq, bias-ci, xseq, bias+ci, border="grey")
  rect(xseq, sin(xseq), xseq, bias, border="black")
  lines(truex, sin(truex))

  plot(x, y, main=paste("Bias-Variance tradeoff, df=",df2,sep=""), type="n", xlim=xlim, ylim=ylim)
  for(i in seq(N)) {
    x <- rnorm(80, 0, 1)
    y <- sin(x) + rnorm(80, 0, 1/9)
    splineobj <- smooth.spline(x, y, df=df2)
    sim[,i] <- predict(splineobj, xseq)$y
  }
  ci <- qt(0.975, N) * sqrt(apply(sim, 1, var))
  bias <- apply(sim, 1, mean)
  rect(xseq, bias-ci, xseq, bias+ci, border="grey")
  rect(xseq, sin(xseq), xseq, bias, border="black")
  lines(truex, sin(truex))
}
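The function above shows the trade-off graphically. The same idea can be summarised as two numbers per smoothing level: squared bias and variance of the fitted curve, each averaged over a grid. This is a rough sketch that uses the same simulation settings (80 points, a sine curve, noise sd 1/9) but a fixed grid and fewer replicates; `simulate_fits`, `summarise_fits`, `xgrid` and `nsim` are names introduced here:

```r
set.seed(1)
xgrid <- seq(-1.5, 1.5, length.out = 50)   # fixed grid where fits are compared
nsim  <- 50                                 # number of simulated data sets

simulate_fits <- function(df) {
  # Each column holds one fitted curve evaluated on xgrid
  replicate(nsim, {
    x <- rnorm(80, 0, 1)
    y <- sin(x) + rnorm(80, 0, 1/9)
    predict(smooth.spline(x, y, df = df), xgrid)$y
  })
}

summarise_fits <- function(fits) {
  c(sq_bias  = mean((rowMeans(fits) - sin(xgrid))^2),  # squared bias, averaged over grid
    variance = mean(apply(fits, 1, var)))              # variance, averaged over grid
}

res <- sapply(c(df4 = 4, df15 = 15),
              function(df) summarise_fits(simulate_fits(df)))
res
```

A more flexible fit (larger df) should show the larger variance, which is the widening grey band in the plots.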
biasVar()                 # Generates data from a sine curve looking at bias variance tradeoff
biasVar(df1=2, df2=30)    # Let's change the smoothing parameters in the function arguments
Quitting R
rm(list=ls())             # Cleaning up: remove all objects from the current environment
q()
Links
Overview of Limma package for R
Origin
Statistical approach
Object orientated programming environment
Advantages using Limma
Limitations
Microarray workshop experiment
FileName  SlideNumber  Cy3    Cy5    Design
BE34.gpr  34           Leaf   Fruit  -1
BE35.gpr  35           Fruit  Leaf    1
BE36.gpr  36           Fruit  Leaf    1
BE37.gpr  37           Leaf   Fruit  -1
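The Design column appears to record dye orientation: 1 when Leaf is labelled with Cy5 and -1 when the dyes are swapped. A sketch of how that column can be derived from the table and used to average log-ratios across a dye swap, assuming M = log2(Cy5/Cy3); the M values below are invented for one imaginary gene:

```r
targets <- data.frame(
  FileName    = c("BE34.gpr", "BE35.gpr", "BE36.gpr", "BE37.gpr"),
  SlideNumber = c(34, 35, 36, 37),
  Cy3         = c("Leaf", "Fruit", "Fruit", "Leaf"),
  Cy5         = c("Fruit", "Leaf", "Leaf", "Fruit"),
  stringsAsFactors = FALSE
)

# +1 when Leaf is in the Cy5 channel, -1 when the dyes are swapped
targets$Design <- ifelse(targets$Cy5 == "Leaf", 1, -1)
targets

# Hypothetical log-ratios M = log2(Cy5/Cy3) for one gene on the four slides
M <- c(-2.1, 1.9, 2.0, -2.2)
mean(targets$Design * M)   # sign-corrected average Leaf vs Fruit log-ratio
```

Multiplying by the design vector flips the sign of the swapped slides so that all four estimate the same Leaf versus Fruit contrast; a linear model fit with this design does the equivalent for every gene at once.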
Analysis of dyeswap experiment
      Block Row Column            ID   Name         M        A         t      P.Value        B
10396    20  15     23 171121_390_49 171121  5.035364 13.25087  49.62425 3.220044e-05 11.27486
 4517     9  13      9  20264_118_53  20264  4.396719 11.11976  47.06004 3.220044e-05 11.05671
16881    32  21     22 165415_634_53 165415  4.645384 12.65872  43.40359 3.220044e-05 10.70650
16086    31  10      9 185903_436_49 185903  5.146504 11.36911  42.75724 3.220044e-05 10.63926
 6508    13   7     22 197386_457_55 197386  4.621024 13.20426  42.09902 3.220044e-05 10.56899
 5471    11   8     20 142178_355_53 142178  4.795734 12.07427  41.23346 3.220044e-05 10.47374
 8395    16  20     23   251706_1_53 251706 -5.003475 13.04571 -38.61325 3.220044e-05 10.16421
 4330     9   5      6 297409_340_47 297409  4.421922 12.27208  38.52215 3.220044e-05 10.15284
12479    24  14     13 163360_396_47 163360  4.367943 11.10478  38.21662 3.220044e-05 10.11439
15024    29  10      5 149243_674_53 149243  4.372419 11.36572  37.86362 3.220044e-05 10.06935
See also