BioConductor/R framework

From Organic Design wiki
Revision as of 04:20, 16 March 2006 by Sven (talk | contribs) (Accessing BioConductor)
What is BioConductor? (http://www.bioconductor.org)
  • BioConductor is an open source software development project
  • Provides tools for the analysis and comprehension of genomic data
  • Used extensively for Affymetrix and cDNA microarray technologies
  • The project started in the autumn of 2001
  • Includes 23 core collaborating developers
  • Project Growth
v1.0: May 2nd, 2002, 15 packages
v1.1: November 18th, 2002, 20 packages
v1.2: May 28th, 2003, 30 packages
v1.3: October 29th, 2003, 49 packages
v1.4: May 18th, 2004, 82 packages
v1.5: November 1st, 2004, 98 packages

BioConductor Goals

The broad goals of the project are:

  • To enable sound and powerful statistical analyses in genomics
  • To provide a computing platform that allows the rapid design and deployment of high-quality software
  • To develop a computing environment for both biologists and statisticians
  • To promote high-quality dynamic documentation and reproducible research
  • Uses LaTeX, the Sweave system and Tcl/Tk to deliver interactive step-by-step PDF tutorials, e.g.
library(tkWidgets)
vExplorer()

File:Sweave2.png


Object-oriented class/method design (OOP)
  • Organized approach to handling large amounts of experimental data
  • Class structure encapsulates the data required for microarray analysis → object
  • Allows efficient representation and manipulation (including subsetting) of data from many microarray slides in an experiment
  • A method is a function that performs an action on data (objects) throughout analysis
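The class/method idea above can be sketched with R's S4 system from the methods package. This is only an illustration: the class name ToyArraySet, its slots, the slideMeans generic and the data are all made up for this example and are not BioConductor's own classes.

```r
library(methods)  # S4 classes and generics

# Hypothetical container: one row per probe, one column per slide.
setClass("ToyArraySet",
         representation(intensities = "matrix", slides = "data.frame"),
         validity = function(object) {
           if (ncol(object@intensities) != nrow(object@slides))
             return("need one row of slide annotation per intensity column")
           TRUE
         })

# Method: subsetting keeps the intensities and the slide annotation in step.
setMethod("[", "ToyArraySet", function(x, i, j, ..., drop = TRUE) {
  if (missing(i)) i <- seq_len(nrow(x@intensities))
  if (missing(j)) j <- seq_len(ncol(x@intensities))
  new("ToyArraySet",
      intensities = x@intensities[i, j, drop = FALSE],
      slides = x@slides[j, , drop = FALSE])
})

# Method: an action performed on the data held by the object.
setGeneric("slideMeans", function(object) standardGeneric("slideMeans"))
setMethod("slideMeans", "ToyArraySet",
          function(object) colMeans(object@intensities))

# Made-up data: 4 probes measured on 3 slides.
slide_ids <- paste0("slide", 1:3)
eset <- new("ToyArraySet",
            intensities = matrix(1:12, nrow = 4,
                                 dimnames = list(paste0("probe", 1:4), slide_ids)),
            slides = data.frame(treatment = c("control", "treated", "treated"),
                                row.names = slide_ids))

sub <- eset[1:2, c(1, 3)]   # first two probes, slides 1 and 3
slideMeans(sub)
```

Because subsetting is a method of the class, the slide annotation cannot drift out of step with the intensity columns, which is the main benefit of encapsulating an experiment in one object.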

Advantages
  • Newest cutting edge statistical methods available
  • Modern programming language
  • Powerful graphical tools available
  • It's freely available

Disadvantages
  • Steep learning curve
  • Need to have experience programming in the R programming environment (http://www.r-project.org)
  • Like all software, it has bugs

Accessing BioConductor
  • BioConductor tools are accessed using the R programming language
  • R is a programming environment for statistical computing and graphics
  • Initially written by Robert Gentleman and Ross Ihaka (University of Auckland)
  • Download R from a Comprehensive R Archive Network (CRAN) mirror (http://cran.stat.auckland.ac.nz)
  • Install R (available for Unix, Windows, and Mac OS X)
  • R version 2.2.1 was released on 2005-12-20
  • R is also the environment used to develop and distribute the software; packages can be installed from:
    • Locally downloaded files
    • The internet, e.g. commands in R:
# Set the proxy variable (only needed behind a proxy; later versions of R call this function Sys.setenv)
Sys.putenv("http_proxy"="http://proxy.hort.net.nz:8080")
source("http://www.bioconductor.org/getBioC.R")            # Download the installation script
getBioC()                                                  # Run the script

Installing vs loading packages
  • Packages only need to be installed once onto a computer
  • Packages must be loaded with each new R session
  • The R function library is used to load packages e.g. command in R:
library(limma)     # Loads the (already installed) limma package
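The difference between installed and loaded can be checked from within a session. The sketch below uses only base R; is_installed and is_loaded are hypothetical helper names introduced for this example, not functions supplied by R or BioConductor.

```r
# TRUE when the package is installed in any library on this computer.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# TRUE when the package is attached in the current R session.
is_loaded <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

is_installed("stats")   # stats ships with every R installation
is_loaded("limma")      # FALSE in a fresh session, even if limma is installed

# Installation happens once; loading must be repeated in each new session.
if (is_installed("limma")) {
  library(limma)
}
```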

Documentation and Help
  • R manuals and tutorials are available from the R website or on-line in an R session
  • The R on-line help system provides detailed, high-quality documentation in text, HTML, PDF, and LaTeX formats
  • Frequently asked questions for BioConductor / R
  • Email mailing lists for BioConductor / R
Note: Read the posting guides before submitting questions
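As a rough illustration of the on-line help system, the following base R calls look up documentation from inside a session. The variable names are arbitrary, and the results are assigned rather than printed so that the sketch also runs non-interactively; printing topic at the prompt would display the help page.

```r
# Locate the help page for a function (equivalent to typing ?library).
topic <- help("library")

# Find visible objects whose names match a pattern.
matches <- apropos("^lib")

# List the vignettes (PDF tutorials) of all installed packages.
vigs <- vignette()

# In an interactive session, help.start() opens the HTML manuals in a browser.
```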

TODO

  • BioConductor resources/vignettes(including downloading)
  • BioConductor basics (any resources for limma out there?)
  • usingR-2.pdf Chapter 1 → Starting up.
  • MGEDI → installing R/Bioconductor p30 - p34 (documentation/help)