Chemistry on the Internet: A Revolution In Chemical Information

Validation of chemical data on Wikipedia
Martin A. Walker
Dept. of Chemistry, SUNY Potsdam
Member of the Wikipedia Chemistry Project

Overview
Introduction
Raising general quality in Wikipedia
Validating chemical data in Wikipedia
Recent developments in Wikipedia Chemistry
The future?
Questions?

INTRODUCTION
What is Wikipedia and what is it not?

Wikipedia is
  An encyclopedia
  A useful resource for chemistry
  Written by volunteers
  Editable by anyone
  Free to be copied, re-used
  Free as in no cost

Wikipedia is not
  A database
  A place to publish original research
  An authoritative resource for chemistry
  Written mainly by kids, or by paid professionals
  Free to re-use without attribution
  Run by a

Types of chemistry article
WIKIPROJECT CHEMISTRY
  Chemical concepts
  Chemical reactions & processes


WIKIPROJECT CELL & MOLECULAR BIOLOGY

WikiProject Chemistry
  General chemistry content
  Reactions & processes, concepts, chemists

WikiProject Chemicals
  ~60 members (~20 active)
  Collaborates on writing quality articles and standards for:
    developing data boxes for articles
    chemical naming, structure drawing
    article assessment
  Data validation
  Collaboration with CAS

Wim Van Dorst, a Dutch member of WP:Chem since March 2005.

Most articles have a Chembox
Chembox is designed to be machine readable and database friendly

WikiProject Pharmacology
Most articles have a Drugbox

Traffic can be very high, even for specialized topics
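To illustrate what "machine readable" can mean for a Chembox, here is a minimal Python sketch that pulls flat `| key = value` parameters out of infobox-style wikitext. The sample text, the parameter names shown (`Name`, `CASNo`, `BoilingPtC`) and the helper `parse_infobox_fields` are assumptions made for illustration; real Chembox markup nests sub-templates and would need a proper wikitext parser.

```python
import re

# Hypothetical, simplified Chembox-style wikitext (an assumption for this
# sketch); real articles nest sub-templates and are more complex.
SAMPLE_WIKITEXT = """{{Chembox
| Name = Water
| CASNo = 7732-18-5
| BoilingPtC = 100
}}"""

# Matches one flat "| key = value" parameter per line.
PARAM_PATTERN = re.compile(
    r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def parse_infobox_fields(wikitext: str) -> dict:
    """Return {parameter name: raw value} for flat parameter lines."""
    return {name: value for name, value in PARAM_PATTERN.findall(wikitext)}


fields = parse_infobox_fields(SAMPLE_WIKITEXT)
print(fields["CASNo"])  # 7732-18-5
```

Once the fields are available as a plain mapping, they can be compared against an external source or loaded into a database, which is the kind of reuse the Chembox design is aiming at.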

RAISING GENERAL QUALITY IN WIKIPEDIA

WMF: Long term strategy
Expand the virtuous circle
Diagram by User:Randomran, Creative Commons license

Article assessment by editors
Assessment guides article improvement priorities

Article ratings by users

Pending changes (flagged revisions)
Articles under PC protection are open for editing, but changes will be visible to readers who are not logged in only after being checked for obvious vandalism and clear errors.

WikiTrust
Downloadable as an extension to Firefox, this adds a tab above the article:

VALIDATION OF WIKIPEDIA CHEMICAL DATA

How I use the key terms
Validation = How can I be sure the data are correct?
Curation = fixing errors

Content validation

In 2008 a data validation drive was initiated for basic chemical identifiers
Led to a collaboration with CAS, to ensure that the CAS Registry Numbers (CAS RNs) on Wikipedia are correct
Around 3500 substances have now been validated against CAS Common Chemistry as having the correct name, structure & CAS RN
Other fields are now being validated
Validated content indicated with

CommonChemistry
Launched in April 2009
Came about as a result of a collaboration between CAS & Wikipedia
Offered as a free CAS RN service for members of the public
Organized by WP:Chemicals
Moderate participation from members of WP:Pharmacology
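A CAS RN carries a built-in check digit, so a cheap first-pass test can catch typos before any comparison against CAS Common Chemistry. The sketch below uses the published check-digit rule (weight the digits from right to left by 1, 2, 3, ..., sum them, and take the result modulo 10). The function name `is_valid_cas_rn` is an assumption for illustration, and passing this test only shows that the number is internally consistent, not that it belongs to the substance in the article.

```python
import re

# CAS RN layout: 2 to 7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_RN_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted_sum = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


print(is_valid_cas_rn("7732-18-5"))  # True (water)
print(is_valid_cas_rn("7732-18-4"))  # False (wrong check digit)
```

Confirming that a well-formed number is the right one for a given name and structure still needs an authoritative source, which is what the collaboration with CAS provides.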

The approach to validation
Every old version of an article (identified by a RevID) is preserved for posterity, and can potentially serve as a permanent record of a validated version.
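Because each revision has its own ID, a validated version can be cited with a link that keeps pointing at that exact text. The following sketch builds such a link using MediaWiki's `index.php?title=...&oldid=...` URL form. The helper name `permanent_link` and the revision ID in the usage line are made up for illustration.

```python
from urllib.parse import urlencode


def permanent_link(title: str, rev_id: int, host: str = "en.wikipedia.org") -> str:
    """Build a URL that points at one specific revision of an article."""
    query = urlencode({"title": title.replace(" ", "_"), "oldid": rev_id})
    return f"https://{host}/w/index.php?{query}"


# 123456789 is a placeholder, not the RevID of any validated article.
print(permanent_link("Water", 123456789))
# https://en.wikipedia.org/w/index.php?title=Water&oldid=123456789
```

Storing the RevID alongside the validated data is enough to rebuild this link later.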

Protecting validated fields
PROBLEM: This is the encyclopedia anyone can edit, so anyone can change the BP of water to 200 °C.
SOLUTION: A bot patrols the pages and watches for edits to key fields. Any dubious edits are flagged with a red X (next to the data), and logged.
System developed by Dirk Beetstra (Eindhoven University of

Validation protected by bot
If anyone tries to vandalize a validated field, this will be flagged by a bot soon afterwards. This example received a red X 11 minutes after it was vandalized.

Validated revisionIDs

Checking structures

In 2008-2010, around 3000 chemical structures were informally checked against CAS Common Chemistry
PROBLEM: Structures are loaded from an external file on Wikimedia Commons, which can be invisibly changed
Since fall 2010, the bot has been modified to watch changes to the RevID of the Wikimedia Commons structure image
A few hundred images now validated
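The patrol logic described on the preceding slides (watch key fields, and also watch the RevID of the structure image) can be sketched roughly as a comparison between a stored validated record and the current state of a page. This is not the actual bot code: the `ValidatedRecord` class, the `find_dubious_changes` function, the field names and the revision IDs are all hypothetical, and a real bot would fetch the current values from the wiki instead of receiving them as arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedRecord:
    """Snapshot taken when an article was validated (hypothetical structure)."""
    fields: dict       # validated field name -> validated value
    image_rev_id: int  # RevID of the structure image at validation time


def find_dubious_changes(validated: ValidatedRecord,
                         current_fields: dict,
                         current_image_rev_id: int) -> list:
    """Return the names of watched items that differ from the validated snapshot."""
    flagged = [
        name
        for name, value in validated.fields.items()
        if current_fields.get(name) != value
    ]
    # The image file lives outside the article, so its RevID is compared separately.
    if current_image_rev_id != validated.image_rev_id:
        flagged.append("structure image")
    return flagged


# Placeholder values for illustration only.
validated = ValidatedRecord(
    fields={"CASNo": "7732-18-5", "BoilingPtC": "100"},
    image_rev_id=1001,
)
vandalized = {"CASNo": "7732-18-5", "BoilingPtC": "200"}
print(find_dubious_changes(validated, vandalized, 1001))  # ['BoilingPtC']
```

Each flagged name would then correspond to a red X next to that item and an entry in the log; deciding whether the change is vandalism or a legitimate correction remains a human task.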

Drugboxes
Drugboxes are patrolled by the bot, but at present WP:PHARM is not active in formal validation.
Most work done by Dirk Beetstra, using official lists from data sources (e.g., ChEBI).

THE FUTURE?

Validation of melting points
Physical properties are much harder; they require human validation
Collaboration beginning with JC Bradley (Drexel) & A Lang (Oral Roberts) on MPs.

Supplementary data pages

Supplementary data pages can host MP validation sources
These pages have room to list all sources, with linked refs providing a paper trail to original sources

Other future developments
New formats for content: books, cellphones (Kiwix, Wikipock, Okawix)
Offline versions that use quality checks and vandalism checks, for use in schools, developing countries, etc.
More validated data fields, with paper trails and realtime checks
Mashups with other sites

Acknowledgements

Antony Williams (RSC ChemSpider)
Dirk Beetstra (Tech Univ Eindhoven)
User:Physchim62 and many other Wikipedians
JC Bradley and Andrew Lang

Thank you for your attention
ANY QUESTIONS?

