Data Management: From Field to Analysis
Estimating Population Size using Small Mammal Trapping Data from the National Ecological Observatory Network

What is data management?
- Organized, efficient data collection
- From the field to the computer and beyond!

Why do it?
Good Data Support the Process of Science
- Data: informs, justifies, clarifies
- Question or Observation, Hypothesis, Experiment, Conclusion
- And leads to progress!
Image credits: CC image by momboleum on Flickr; CC image by Sharyn Morrow on Flickr

Data under threat
- Changes in format
- Changes in a project or staff

Replication of Research
- Published methods: 1 paragraph
- Lab protocols: 6 pages

We measured the activities of PO and GLD in hemolymph from 4^0 and 4^48 larvae. Larvae from both cohorts were inoculated orally with an LD50 of LdMNPV-7H5 occlusion bodies in 60% glycerol (165 OB/µL for 4^0 larvae and 4000 OB/µL for 4^48 larvae to obtain similar mortality levels). Two types of controls of each age group were prepared: one group was mock-inoculated with 60% glycerol only and the other group was untreated. Inoculations were performed with a syringe and a 30-gauge, blunt needle. The needle was inserted into the mouth and 1 µL of inoculum was delivered into the anterior midgut using a microinjector, as described above for intrahemocoelic inoculation. After inoculation, the larvae were kept in individual plastic cups and maintained at 25 °C with a 18:6 h (L:D) photoperiod. Hemolymph samples were collected at 36, 48, and 72 h postinoculation (hpi) from 8 to 10 larvae of each age from viral, mock, and uninoculated control treatments. These time points were chosen to reflect the time period in LdMNPV pathogenesis when infection is beginning to escape the midgut through the tracheal system and into the hemolymph (McNeil, 2008), and thus is when humoral defenses are most likely to play a role in anti-viral responses. For all samples, 10 µL of hemolymph were diluted in 60 µL of Grace's insect medium (Lonza, Walkersville, MD) in the well of a 96-well plate (Cellstar, Greiner Bio-one, Monroe, NC) and gently agitated for 3 min. To measure baseline PO activity, 10 µL of a hemolymph dilution was mixed with 10 µL of de-ionized water and 200 µL of 0.2 M L-3-(3,4-dihydroxyphenyl)alanine (L-DOPA, TCI America, Portland, OR) in 0.1 M phosphate buffer and the absorbance was read at 490 nm for 20 min using a Spectramax 250 spectrophotometer (GMI Inc., Ramsey, MN). Potential (total activatable) PO activity was measured by adding 10 µL of hemolymph dilution to 10 µL of 10% cetylpyridinium chloride (CPC, MP Biomedicals Inc., Solon, OH) (Hall et al., 1995) and 200 µL of 0.2 M L-DOPA; absorbance was measured at 490 nm as described above.
McNeil et al. 2010

How is it done?
- Field Collection
- Data Sheets
- Data File: Raw, Clean
- Analyses
- Metadata

Data Collection
- What data are collected?
- How will they be recorded?
- Safety: People, organisms, data

Data Sheets
- WHAT is the content of the data?

Data Files: Best Practices of Data

Columns (variables) & Rows (data)
- Single row of descriptive headers
- Avoid spaces in headers, and do not start headers with numbers
- Data disaggregation: one cell per variable (e.g., toe length and tail length in separate columns)
- Each cell has one type of data: a cell should contain only numbers or only letters. Not "3 eggs" -> Header: EggNumber, Data: 3
- Plain text
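
The header and cell rules above can also be applied in code when a data sheet is typed up. The following is a minimal Python sketch and is not part of the original slides. The helper names `clean_header` and `split_count`, the CamelCase header style, and the `X` prefix for headers that start with a digit are all illustrative assumptions.

```python
import re


def clean_header(raw: str) -> str:
    """Turn a free-form column label into a header with no spaces or symbols.

    Hypothetical helper: the CamelCase style and the "X" prefix are
    illustrative choices, not rules taken from the slides.
    """
    words = re.findall(r"[A-Za-z0-9]+", raw)
    if not words:
        raise ValueError(f"no usable characters in header: {raw!r}")
    name = "".join(word[0].upper() + word[1:] for word in words)
    # Headers should not start with a number, so prefix those that do.
    if name[0].isdigit():
        name = "X" + name
    return name


def split_count(cell: str) -> int:
    """Extract the leading number from a mixed cell such as "3 eggs"."""
    match = re.match(r"\s*(\d+)", cell)
    if match is None:
        raise ValueError(f"cell does not start with a number: {cell!r}")
    return int(match.group(1))


print(clean_header("egg number"))      # EggNumber
print(clean_header("2nd toe length"))  # X2ndToeLength
print(split_count("3 eggs"))           # 3
```

Keeping the unit or label in the header ("EggNumber") and only the number in the cell means the column can be read as numeric data without further cleaning.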

Use standardized formats for date/time
- Date: YYYY-MM-DD (Year-Month-Day)
- Time: hh:mm:ss (use 24-hour time)
- Date & Time: YYYY-MM-DDThh:mm:ss

Use full taxonomic names
- Genus and Genus species (Genus species names are italicized in writing but not in data tables in .csv format)

Retain raw data, separate clean files for analysis

Use easily transferable file formats & hardware
- .csv format, not .xls
- Non-proprietary formats
- Internet/cloud storage & backup

Adapted from Borer, E.T., Seabloom, E.W., Jones, M.B., and Schildhauer, M. (2009). Some simple guidelines for data management.
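
As an illustration of the standardized date and time formats listed above, here is a minimal Python sketch that converts a hand-written timestamp into the YYYY-MM-DD, hh:mm:ss, and YYYY-MM-DDThh:mm:ss forms. It is not part of the original slides. The function name `to_iso` and the assumed field-sheet format (month/day/year with a 12-hour clock) are hypothetical; adjust the format string to match how dates were actually recorded.

```python
from datetime import datetime

# Assumed field-sheet format, e.g. "07/04/2015 2:05 PM"
# (month/day/year, 12-hour clock). This is an illustrative assumption.
FIELD_FORMAT = "%m/%d/%Y %I:%M %p"


def to_iso(raw: str, fmt: str = FIELD_FORMAT) -> dict:
    """Parse a raw timestamp and return standardized date/time strings."""
    stamp = datetime.strptime(raw, fmt)
    return {
        "date": stamp.date().isoformat(),  # YYYY-MM-DD
        "time": stamp.time().isoformat(),  # hh:mm:ss, 24-hour time
        "datetime": stamp.isoformat(),     # YYYY-MM-DDThh:mm:ss
    }


record = to_iso("07/04/2015 2:05 PM")
print(record["date"])      # 2015-07-04
print(record["time"])      # 14:05:00
print(record["datetime"])  # 2015-07-04T14:05:00
```

Year-first dates sort correctly as plain text and cannot be misread as day/month versus month/day, which is one reason to convert at the cleaning step while leaving the raw file untouched.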

Best Practices for Data Organization
- Descriptive file names (no spaces)
- Long-term data storage/archiving

Metadata are data reporting
- WHO created the data?
- WHAT are the contents of the data?
- WHEN were the data created?
- WHERE is it geographically?
- HOW were the data developed?
- WHY were the data developed?
Photo by Michelle Chang. All Rights Reserved

Metadata
- Synthesis of field notebook and data sheet
- Organization: Ecological Metadata Language
- Several broad metadata categories:
  - General Dataset
  - Geographic (if appropriate)
  - Temporal
  - Taxonomic (if appropriate)
  - Methods
  - Data Table

What is the National Ecological Observatory Network (NEON)?
NEON is a continental-scale ecological observatory funded by the National Science Foundation as a large science facility.
NEON provides:
- Free and open data on the drivers of and responses to ecological change
- A standardized and reliable framework for research and experiments
- Data interoperability for integration with other national and international network science projects
NEON data portal: http://data.neonscience.org

Small Mammal Trapping
Technicians sample organisms and record data
Small mammal sampling by SCA/NPS in Denali National Park: https://youtu.be/KvGvS8pApFE

NEON small mammal data

Mark-Recapture Analysis
Estimating abundance: Lincoln-Petersen

$$N = \frac{n_1 \, n_2}{m_2}$$

- $N$ = total population size estimate
- $n_1$ = number of individuals captured and marked in the first sampling bout
- $n_2$ = number of individuals captured in the second sampling bout
- $m_2$ = number of marked (recaptured) individuals in the second sampling bout

Assumptions:
- Individuals are randomly distributed between captures
- There is no change in the population (i.e., no births, deaths, immigration, or emigration) between sampling bouts
- Marking individuals does not affect their likelihood of being captured again in the future
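
The estimator above, N = (n1 × n2) / m2, can be computed directly once the three counts are tallied from the trapping data. The following is a minimal Python sketch and is not part of the original slides. The function name `lincoln_petersen` and the counts in the usage example are hypothetical and do not come from NEON data.

```python
def lincoln_petersen(n1: int, n2: int, m2: int) -> float:
    """Estimate total population size N = (n1 * n2) / m2.

    n1: individuals captured and marked in the first sampling bout
    n2: individuals captured in the second sampling bout
    m2: marked (recaptured) individuals in the second sampling bout
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("n1 and n2 must be positive counts")
    if m2 <= 0:
        # With no recaptures the ratio is undefined (division by zero).
        raise ValueError("m2 must be greater than zero")
    if m2 > min(n1, n2):
        # Recaptures cannot exceed the animals marked or the animals caught.
        raise ValueError("m2 cannot exceed n1 or n2")
    return n1 * n2 / m2


# Hypothetical counts for illustration only.
estimate = lincoln_petersen(n1=40, n2=50, m2=10)
print(estimate)  # 200.0
```

In this made-up example, 10 of the 50 animals caught in the second bout were marked, so marked animals make up one fifth of the second sample. If the 40 marked animals are likewise one fifth of the population, the population is about 200.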