Human Microbiome Project

Heather Creasy1, Catherine Jordan1, Mark Mazaitis1, Noam Davidovicz1, Michelle Giglio1, Joshua Orvis1, Ken Chu2, Konstantinos Liolios3, Amy Chen2, Victor Felix1, Nikos
Kyrpides3, Victor Markowitz2, Anup Mahurkar1, Jennifer Wortman1, Owen White1
1Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD; 2Lawrence Berkeley National Laboratory, Berkeley, CA; 3Joint Genome Institute, Walnut Creek, CA


Data Organization & Display

Tools Organization & Display
We currently offer a BLAST server for searching against all or subsets of
annotated Reference Genomes. Over time, we will expand this to include
additional datasets.

The HMP Data Analysis and Coordination Center (DACC) plays the role of collecting, integrating & standardizing
different data types from diverse sources and presenting this data back to the community in a clear & useful
manner via our web resource at www.hmpdacc.org. The DACC website is constantly evolving with regard to data
presentation and functionality. The long term value of the HMP DACC website is in facilitating access to data and
tools, making it easy for the user to (1) download HMP data, (2) combine data with a tool, and (3) perform
analyses; building a community of researchers; and providing unique and novel content to all users, both HMP
consortium members & members of the larger scientific community.

As new tools, resources &
protocols become available as
part of the HMP, they will be
included here. Future features
will include searchability &
sortability to generate
customized tool sets with

Users will soon see a
new look to the
Reference Genome
Project Catalog, with
improved searchability
and graphing. Next, we
will enable users to
select strain sets of
interest and link directly
to the corresponding
data set for download.

In order to accomplish this, we have elected to take a three-phase approach:
Phase 1: address usability issues of the site
Phase 2: address the needs of the research community by highlighting data as it becomes available and
facilitating access to data, tools, resources & protocols in use by the HMP consortium
Phase 3: enable the community to perform analyses on their own, by demonstrating and documenting pipelines
and intermediate data sets that can be used to replicate or expand existing research questions
We have completed Phase 1 and are now transitioning to Phase 2. Here we present concepts and mock-ups
demonstrating our goals to continue improving usability, increase the level of interactivity, and add data
sets, tools and protocols as they become available. Our approach emphasizes working with metagenomic users
& focus groups to identify their data needs and how they expect to interact with the site. To that end, we are
actively soliciting feedback via a user survey available throughout this conference.

The DACC hosts an SVN
repository for use by all HMP
participants, for convenient
maintenance of source code
& documentation

Data Analysis

A Data-centric Home Page

The Analysis page has 3 focus areas:

Browse summarized data for projects and/
or data types of interest, e.g.
+ All DACC HMP Data
+ Reference Genomes
- 16S rRNA Sequence
- Production Phase I Sequence
+ By Center
+ By Body Site
+ Metagenomic wgs Sequence
+ Demonstration Projects

Query available data to compile a customized
dataset of interest, e.g.
Show all human screened Metagenomic wgs reads
and 16S rRNA reads from anterior nares
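
The example query above can be sketched as a simple filter over a catalog of dataset records. This is only an illustration of the idea, not the DACC query interface: the record fields, the sample entries, and the function name are all hypothetical, and the sketch assumes one reading of the query (human-screened Metagenomic wgs reads plus 16S rRNA reads, both from anterior nares).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a hypothetical catalog; field names are illustrative only."""
    name: str
    data_type: str        # e.g. "Metagenomic wgs" or "16S rRNA"
    body_site: str        # e.g. "anterior nares"
    human_screened: bool  # True if human reads have been screened out


def select_example_query(records):
    """Return human-screened Metagenomic wgs reads plus 16S rRNA reads,
    both restricted to anterior nares (one reading of the example query)."""
    selected = []
    for rec in records:
        if rec.body_site != "anterior nares":
            continue
        is_screened_wgs = rec.data_type == "Metagenomic wgs" and rec.human_screened
        is_16s = rec.data_type == "16S rRNA"
        if is_screened_wgs or is_16s:
            selected.append(rec)
    return selected


# Made-up records, for illustration only.
catalog = [
    DatasetRecord("sample_a_wgs", "Metagenomic wgs", "anterior nares", True),
    DatasetRecord("sample_b_wgs", "Metagenomic wgs", "anterior nares", False),
    DatasetRecord("sample_c_16s", "16S rRNA", "anterior nares", False),
    DatasetRecord("sample_d_16s", "16S rRNA", "stool", False),
]

for rec in select_example_query(catalog):
    print(rec.name)
```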

NIH Program
We will expand upon the current data browsing capabilities by
providing comprehensive data organization. New features will include
the ability to query to create customized data sets for download.

Project Catalog
Tools & Protocols
FTP Data Site
NCBI HMP Home Page

High Priority Links

See Data Analysis Panel
For HMP Collaborator Use:
Access to resources used for
uploading data sets to the DACC,
e.g. FTP upload, DACC LBL

Datasets acquired above can be downloaded
for personal use or integration into
workbenches (See Analysis Panel)
Project Statistics
Sampling Statistics

Data Submission
Assembly Statistics
Annotation Statistics

Easy access to the Workbench, Tools & Protocols, and Analyses pages
(described elsewhere on this poster) will facilitate additional analysis.

Taking the customized tool set concept (described above) one
step further, the workbench will allow users to create their own
pipelines by stringing together available tools, for use on
datasets downloaded from the DACC website or on their own
data. (While we hope to introduce this concept in Phase 2, this is
largely a Phase 3 improvement)
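
As a rough sketch of what "stringing together available tools" could mean, the snippet below chains plain functions so that each one consumes the previous one's output. The workbench design is not specified on this poster; `make_pipeline` and the two stand-in tools are hypothetical names chosen for illustration.

```python
from functools import reduce


def make_pipeline(*tools):
    """Chain tools so that each one receives the previous tool's output."""
    def run(data):
        return reduce(lambda current, tool: tool(current), tools, data)
    return run


# Two hypothetical stand-in tools operating on a list of read sequences.
def drop_short_reads(reads, min_length=4):
    """Keep only reads with at least min_length bases."""
    return [r for r in reads if len(r) >= min_length]


def remove_duplicates(reads):
    """Drop repeated reads, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(reads))


pipeline = make_pipeline(drop_short_reads, remove_duplicates)
print(pipeline(["ACGTAC", "AC", "ACGTAC", "GGCATT"]))
```

The same pipeline object could be applied either to a dataset downloaded from the site or to a user's own reads, since it only depends on the input being a list of sequences.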

Analyses - download datasets &
analysis results coming out of HMP
publications, as well as pipelines to
reproduce these analyses, where
Case Studies - by working with users of
genomic & metagenomic data, we will
identify common needs and assemble
pipelines specific to those needs to allow
fast access to frequently used tools &
analysis sets
Analysis tracks (Phase 3) - structured,
interactive guides with highly detailed
documentation will walk users through
common analysis pipelines relevant to
specific research goals.
Integration of HMP data sets, tools, case
studies and analysis tracks will result in a
smooth, user friendly interface for
performing complex analyses.

Site Usage
Phase 2 of our approach to the DACC web redesign emphasizes working with user groups to identify their
data needs and how they expect to interact with the site. Please help us by completing our online user
experience survey available throughout the conference. The survey can be found at
www.surveymonkey.com/s/hmpdacc_phase1 through Wednesday, September 8.
In addition, we are looking at various site usage metrics, some of which are displayed to the right, to help
identify areas of improvement. By improving UI navigation and site organization, and focusing on user needs
& expectations, we hope to decrease our bounce rate (a measure of visit quality) dramatically over the next
year. We recognize that the majority of users currently reach our site from a referring source. However, 21%
of site visits arrive via search engines. Looking at the most commonly used keywords bringing users to our
site is another means of identifying areas of interest to the public, allowing us to address them appropriately.
Other metrics reflecting the value of the site, such as the number of inbound links, will also be assessed.

Number of site visitors by month over the past year

Google Analytics metrics measured from July 2009 to August 2010
Total number of visits: 26,232
New visitors: 12,435
Non-US Visits: 8,113

Average time spent on site: 00:03:25
Average number of pages viewed per visit:
Bounce rate: 48.61%
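
For readers who want the figures above as proportions, here is a small arithmetic sketch. It assumes that the new-visitor and non-US counts are both counted out of the total number of visits, that the non-US figure is 8,113 visits, and that the bounce rate applies to all visits; the variable names are ours.

```python
# Figures reported in the Google Analytics panel (July 2009 to August 2010).
total_visits = 26_232
new_visitors = 12_435
non_us_visits = 8_113   # assumption: the reported figure is 8,113 visits
bounce_rate = 0.4861    # 48.61%, assumed to apply to all visits


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


new_visitor_share = share(new_visitors, total_visits)
non_us_share = share(non_us_visits, total_visits)

# Bounce rate is the fraction of visits that view a single page and leave,
# so multiplying by total visits estimates the number of bounced visits.
estimated_bounced_visits = round(bounce_rate * total_visits)

print(f"New visitors: {new_visitor_share:.1f}% of visits")
print(f"Non-US visits: {non_us_share:.1f}% of visits")
print(f"Estimated bounced visits: {estimated_bounced_visits}")
```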

Funded by NIH Common Fund

