A Brief History of Data Sharing in the U.S. LTER Network
John Porter

Scientists in a number of disciplines are recognizing that our ability to manage and assimilate massive quantities of data is key to understanding our world. In September 2009 a special issue of Nature addressed data sharing. Some quotes from the lead-off editorial:
- "More and more often these days, a research project's success is measured not just by the publications it produces, but also by the data it makes available to the wider community."
- "... universities and individual disciplines need to undertake a vigorous programme of education and outreach about data"

Sharing Data is Needed:
- To address complex, large-scale and long-term environmental challenges
- Global and regional studies require data that are often beyond the ability of a single researcher to collect
- Replication is a fundamental part of science
- Data used to parameterize models need to be available

Data Sharing
- Improves data quality
  - Fresh eyes detect problems that previously went unnoticed
  - If you doubt this, consider the changes made in a draft of a manuscript as it is viewed by reviewers and editors
- Enables new science
  - Makes possible comparisons between systems
  - Enhances regional, global-scale and long-term science
  - Multiple investigators, who may be working independently

Scientific Use of Data
- The traditional model of using data
- A new model, incorporating sharing and archiving
- Archiving and sharing data provides new opportunities for better understanding our environment

Sharing Data
- We may all agree that sharing data is a good thing and advances the cause of science
- But why is sharing of data so rare?
- What can we do to increase data sharing?

The U.S. LTER Network has been sharing data since 1994 and currently shares more than 6,800 datasets.
The experience there may provide some helpful insights.

LTER sites:
AND  H.J. Andrews Experimental Forest LTER, Oregon
ARC  Arctic Tundra LTER, Alaska
BES  Baltimore Ecosystem Study LTER, Maryland
BNZ  Bonanza Creek Experimental Forest LTER, Alaska
CAP  Central Arizona-Phoenix LTER, Arizona
CCE  California Current Ecosystem LTER, California
CDR  Cedar Creek Natural History Area LTER, Minnesota
CWT  Coweeta LTER, North Carolina
FCE  Florida Coastal Everglades LTER, Florida
GCE  Georgia Coastal Ecosystem LTER, Georgia
HBR  Hubbard Brook LTER, New Hampshire
HFR  Harvard Forest LTER, Massachusetts
JRN  Jornada Basin LTER, New Mexico
KBS  Kellogg Biological Station LTER, Michigan
KNZ  Konza Prairie LTER, Kansas
LUQ  Luquillo Experimental Forest LTER, Puerto Rico
MCM  McMurdo Dry Valleys LTER, Antarctica
MCR  Moorea Coral Reef LTER, French Polynesia
NTL  North Temperate Lakes LTER, Wisconsin
NWT  Niwot Ridge LTER, Colorado
PAL  Palmer Station LTER, Antarctica
PIE  Plum Island Ecosystem LTER, Massachusetts
SBC  Santa Barbara Coastal Ecosystem LTER, California
SEV  Sevilleta LTER, New Mexico
SGS  Shortgrass Steppe LTER, Colorado
VCR  Virginia Coast Reserve LTER, Virginia
LNO  LTER Network Office, University of New Mexico, Albuquerque, NM

[Figure: LTER Timeline and Funding Sources. Site codes plotted against a time axis from 1980 to 2010, with funding-source labels DEB, GEO-OCE, Polar, SBE, and EHR.]

LTER and Data
- At its founding in 1980 LTER was almost unique in that NSF required sites to include management of data in proposals
- Reason: Long-term studies and experiments require data to be managed, otherwise you lose old data as fast as you gain new data
- Analysis of a 20-year experiment requires data from year 1 as well as year 20
LTER's First Decade 1980-1989
- LTER did substantial work on developing best practices for managing data at the level of the individual LTER site
- This was the dawn of the microcomputer/PC era
- Merging practices from mainframe computing with emerging technologies
- 1986: Research Data Management volume published
- Focus was almost entirely on the site
- Little sharing between sites of information on what data were being archived
- No formal mechanisms for sharing data
- 1989: LTER Network Office (LNO) established

2nd Decade 1990-2000

1990 - an important year!
- First LTER-wide Data Catalog
  - 10 datasets per site were listed
- First Network Guidelines for Site Data Access Policies
  - Described elements that should be included in individual site policies

1990 Guidelines for Site Data Management Policies
General Guidelines - The management policy should include provisions that assure:
- The timely availability of data to the scientific community;
- That researchers and LTER sites contributing data to LTER databases receive adequate acknowledgement for the use of their data by other researchers and that sites receive copies of any publication using that data;
- That documentation and transformation of data are adequate to permit data to be used by researchers not involved in its original collection;
- That data must continue to be available even if an investigator leaves the project through transfer or death;
- That standards of quality assurance and quality control are adhered to;
- That long-term archival storage of data is maintained;
- That researchers have an obligation both to contribute data collected with LTER funding to the LTER site database and to publish the data in the open literature in a timely fashion;
- That costs of making data available should be recovered directly or by reciprocal sharing and collaborative research;
- That LTER data sets not be resold or distributed by the recipient.

Example Policy (1990)
- Data Type I. Published data and metadata (i.e., data about data). Policy: Data are available upon request without review.
- Data Type II. Collective data of the LTER site (usually routine measurements generated by technical staff). Policy: Data are available for specific scientific purposes one year after generation.
- Data Type III. Original measurements by individual researchers. Policy: Data are available for specific scientific purposes two years after generation. Data can be released earlier with permission of the researcher.
- Data Type IV. Unusual long-term data collected by individual researchers.

Why Guidelines for Site Policies?
Why not just adopt a uniform policy?
1. We had no example policies to work from, so guidelines let us test a wide variety of options
2. Most researchers were not yet comfortable with sharing data - site policies could be crafted to address the specific concerns of researchers at the sites
By 1994 most sites had published data policies that could then be compared to discern best practices

1992
- First easy-to-use Internet downloading tools - Gopher
- Demonstration of the power of structured metadata
- Start of work on developing a content standard for exchange of metadata between sites
- Looked for common elements in existing site metadata
- This effort paved the way for development of Ecological Metadata Language a decade later

1994
- With the release of the first widely used graphical web browser in 1993, the World Wide Web became practical
- With substantial input from NSF, the LTER Coordinating Committee mandated that each site should make at least one dataset available online
  - Demonstration of feasibility
- In fact, most LTER sites put more than one dataset online, often all their datasets
- Competition developed between sites over who had the best data online

Rapid Growth of LTER Data
[Figure: LTER Network Online Data. Number of data sets (y-axis, 0 to 7,000) by year (x-axis, 1980 to 2002); bar labels include 6, 156, 450, 2619, and 6036.]

1997
- Michener et al. paper on Non-Geospatial Metadata published
  - Set initial content standards for ecological metadata that were used to create Ecological Metadata Language
- LTER Network formally adopts a network-wide standard for data sharing
  - Data can be held back for 2 years
  - Exceptions must be rare, justified and documented

Data access policy for the LTER Network (1997)
1) There are two types of data: Type I (data that are freely available within 2-3 years) with minimum restrictions, and Type II (exceptional data sets that are available only with written permission from the PI/investigator(s)). Implied in this timetable is the assumption that some data sets require more effort to get online and that no "blanket policy" is going to cover all data sets at all sites. However, each site would pursue getting all of their data online in the most expedient fashion possible.
2) The number of data sets that are assigned Type II status should be rare in occurrence and the justification for exceptions must be well documented and approved by the lead PI and site data manager. Some examples of Type II data may include: locations of rare or endangered species, data that are covered by copyright laws (e.g. TM and/or SPOT satellite data) or some types of census data involving human subjects.
Addition of Data to LTER Goals 2001
- In January 2001 a meeting of LTER Lead Investigators was convened to revise the goals for the LTER Network. Only one completely new goal was added:
  - Information: To inform the LTER and broader scientific community by creating well-designed and well-documented databases.
- Thus in little more than a decade the U.S. LTER went from not sharing data to having data sharing as one of its goals

Lessons Learned
- Research communities need to own their data policies
  - Difficult to do if policies are imposed from without
  - Incentives and provisions must make sense to the community involved
- Experience with data sharing generally makes people more willing to share
  - Myths get dispelled

Myths About Sharing Data
If I share my data, there are lots of people who will steal it by creating publications with it and not acknowledging my contribution
- Not true:
  - Data sharing policies dictate that users must acknowledge or cite data
  - By having your data in an archive you establish clear priority: no one else can make a credible claim that they collected the data, not you

2006 Survey
A survey of LTER information managers sought to identify problems that had occurred due to data sharing. In aggregate, those who responded reported on the results of 31,789 data set downloads and identified a grand total of four instances where problems occurred:
1. where a litigator requested unpublished data for courtroom use,
2. where a data requestor lied about their identity (circumstantial indications are that it was a K-12 student),
3. where different researchers downloaded the same data to work on similar papers without knowing that the other was doing so, and finally
4. where a researcher disagreed with a subsequent interpretation of their data.
Taken together these problems occurred in <0.1% of the requests.
Myths About Sharing Data
Other researchers may analyze or interpret my data in different ways that contradict my conclusions
- True:
  - Honest disagreements are inevitable
  - Such disagreements are a critical part of the scientific process and have often led to important new understandings
  - Withholding data just makes you look as if you have something to hide
  - Journals are increasingly requiring that data used for publications be archived

Myths About Sharing Data
So many researchers will download my data that I'll be asked to spend my valuable time answering their questions
- Usually false:
  - Only a few, incredibly valuable datasets are used frequently
  - You should be more worried that no one will think your data are worth downloading
  - Often users are the subsequent graduate students of the professor who initiated the data collection
  - Good-quality metadata means that people won't be bothering you
  - Some researchers may contact you about collaboration or possible co-authorship

Improving Incentives for Sharing Data
For scientists the following incentives may exist for sharing data:
Money
- Increased likelihood of grant funding (common)
  - US National Science Foundation now requires data management plans for all proposals
- Direct payments for data (rare)
Scientific Credit
- Often data sharing leads to co-authorship on papers
- Citations of datasets (increasingly common)
- Acknowledgments
Posterity
- Valuable, well-documented data will long outlive their creator

[Figure: Increasing value of data over time (slide from James Brunt). Data value (y-axis) plotted against time (x-axis), with labels: Serendipitous Discovery; Inter-site Synthesis; Gradual Increase in Data Equity; Methodological Flaws, Instrumentation Obsolescence; Non-scientific Monitoring.]

Final Thoughts
- Developing a culture of data sharing takes time, but when the culture starts to shift, it can move incredibly fast
- Sharing still requires time and effort, so incentives for sharing need to be as strong as possible