Using Bioconductor for Annotation

Bioconductor has extensive facilities for mapping between microarray probe, gene, pathway, gene ontology, homology and other annotations.

Bioconductor has built-in representations of GO, KEGG, vendor, and other annotations, and can easily access NCBI, Biomart, UCSC, and other sources.

Sample Workflow

The following pseudo-code illustrates a typical R / Bioconductor session. It continues the differential expression workflow, taking a 'top table' of differentially expressed probesets and discovering the genes probed and the Gene Ontology terms annotated to them.

## Affymetrix HG-U95Av2 array probe set IDs of interest; these might be
## obtained from
##
##   tbl <- topTable(efit, coef=2)
##   ids <- tbl[["ID"]]
##
## as part of a more extensive workflow.
> ids <- c("39730_at", "1635_at", "1674_at", "40504_at", "40202_at")

## load libraries as sources of annotation
> library("hgu95av2.db")

## map probe ids to ENTREZ gene ids...
> entrez <- hgu95av2ENTREZID[ids]
> toTable(entrez)
  probe_id gene_id
1  1635_at      25
2  1674_at    7525
3 39730_at      25
4 40202_at     687
5 40504_at    5445
## ... and to GENENAME
> genename <- hgu95av2GENENAME[ids]
## ... and merge results
> merge(toTable(entrez), toTable(genename))
  probe_id gene_id                                          gene_name
1  1635_at      25         c-abl oncogene 1, receptor tyrosine kinase
2  1674_at    7525 v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1
3 39730_at      25         c-abl oncogene 1, receptor tyrosine kinase
4 40202_at     687                              Kruppel-like factor 9
5 40504_at    5445                                      paraoxonase 2

## find and extract the GO ids associated with the first id
> goIds <- mappedRkeys(hgu95av2GO[ids[1]])

## use GO.db to find the Terms associated with the goIds, displaying
## the head (first six entries) of the result as a data frame
> library("GO.db")
> head(as.data.frame(Term(goIds)))
                                                                     Term(goIds)
GO:0000115 regulation of transcription involved in S-phase of mitotic cell cycle
GO:0006298                                                       mismatch repair
GO:0006355                            regulation of transcription, DNA-dependent
GO:0006464                                          protein modification process
GO:0007155                                                         cell adhesion
GO:0007165                                                   signal transduction
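
The merge step in the session above works because both tables carry a probe_id column, which merge() uses as the join key by default. A minimal base R sketch of the same join follows. It assumes plain data frames, built from the values printed above, as stand-ins for the toTable() results, so it runs without the annotation packages installed.

```r
## Stand-ins for toTable(entrez) and toTable(genename); values copied
## from the session output above.
entrez <- data.frame(
    probe_id = c("1635_at", "1674_at", "39730_at", "40202_at", "40504_at"),
    gene_id = c("25", "7525", "25", "687", "5445"),
    stringsAsFactors = FALSE)
genename <- data.frame(
    probe_id = c("1635_at", "1674_at", "39730_at", "40202_at", "40504_at"),
    gene_name = c("c-abl oncogene 1, receptor tyrosine kinase",
                  "v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1",
                  "c-abl oncogene 1, receptor tyrosine kinase",
                  "Kruppel-like factor 9",
                  "paraoxonase 2"),
    stringsAsFactors = FALSE)

## With no 'by' argument, merge() joins on the column names the two
## data frames share, here only probe_id.
merged <- merge(entrez, genename)
merged
```

Naming the key explicitly, as in merge(entrez, genename, by = "probe_id"), gives the same result and guards against accidental joins if the tables later gain other shared columns.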

Installation and Use

Follow the installation instructions to start using these packages. To install the annotations associated with the Affymetrix Human Genome U95 V 2.0 array, and with Gene Ontology, use

> source("http://bioconductor.org/biocLite.R")
> biocLite(c("hgu95av2.db", "GO.db"))

Package installation is required only once per R installation. View a full list of available software and annotation packages.
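
The biocLite.R script shown above was the installation route when this page was written. In Bioconductor 3.8 and later it has been replaced by the BiocManager package from CRAN. A sketch of the equivalent installation, assuming a current R installation and network access:

```r
## Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## Install the same two annotation packages used on this page
BiocManager::install(c("hgu95av2.db", "GO.db"))
```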

To use the AnnotationDbi and GO.db packages, evaluate the commands

> library("AnnotationDbi")
> library("GO.db")

These commands are required once in each R session.

Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

> help(package="GO.db")
> ?GOTERM

to obtain an overview of help on the GO.db package and on the GOTERM mapping. The AnnotationDbi package is used by most .db packages. View its vignettes, which provide a more comprehensive introduction to package functionality, with

> browseVignettes(package="AnnotationDbi")

Use

> help.start()

to open a web page containing comprehensive help resources.

Annotation Resources

The following guides the user through key annotation packages. Users interested in creating custom chip packages should see the vignettes in the AnnotationDbi package. The annotate package contains additional information on how to use some of the extra tools provided. You can also refer to the complete list of annotation packages, optionally broken down by category.

Key Packages

* AnnotationDbi Almost all annotations require the AnnotationDbi package. This package will be automatically installed for you if you install another ".db" annotation package using biocLite(). It contains the code to allow annotation mapping objects to be made and manipulated as well as code to generate custom chip platforms.
* Category This is the base level package for dealing with annotation questions that involve categorical data.
* GOstats This builds on what is found in Category so that you can do hypergeometric testing using the Gene Ontology found in the GO.db package.
* annotate This package contains many helpful tools for making use of annotations.
* biomaRt This package is a great way to pull annotation data directly from web based annotation resources. Such data is extremely "current", so it is a good idea to save and locally manage the data that you pull down from biomaRt so that your code will be reproducible.
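
The advice to save and locally manage data pulled from web resources can be sketched in base R. This is a minimal sketch, not a biomaRt facility: the data frame annot is a stand-in for a table returned by a biomaRt query (its values are copied from the sample workflow on this page), and the file name is hypothetical.

```r
## Stand-in for a table retrieved from a web resource such as biomaRt
annot <- data.frame(
    probe_id = c("1635_at", "1674_at", "40202_at"),
    gene_id = c("25", "7525", "687"),
    stringsAsFactors = FALSE)

## Record the retrieval date alongside the data, then save a snapshot.
## tempdir() keeps this sketch free of side effects; in practice, use a
## directory that is kept with the analysis.
attr(annot, "retrieved") <- Sys.Date()
snapshot <- file.path(tempdir(), "annotation-snapshot.rds")  # hypothetical file name
saveRDS(annot, snapshot)

## Later runs read the snapshot instead of querying the web resource,
## so results do not change when the remote data are updated.
cached <- readRDS(snapshot)
attr(cached, "retrieved")
```

Storing the retrieval date with the table makes it possible to report which version of the annotation an analysis used.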

Types of Annotation Packages

Fred Hutchinson Cancer Research Center