DIGITAL HUMANITIES FOR DUMMIES: CROWDSOURCING AND TRANSCRIPTION

Dr Alana Jayne Piper
Twitter: @alana_piper
Website: criminalcharacters.com

What is crowdsourcing?

A portmanteau of "crowd" and "outsourcing", crowdsourcing is the practice of undertaking a task or series of tasks by enlisting a large number of people, each of whom performs a small amount of work to help complete it, typically via the Internet (the term was coined in 2006 by Jeff Howe in Wired).

The model was initially developed for industry, as in the marketplace Amazon Mechanical Turk, where participants are paid small amounts, or receive some other form of compensation, for completing simple online tasks.

Academic crowdsourcing also began developing in the 2000s. It involves researchers inviting members of the public to contribute to projects by performing a simple research task, such as annotating or transcribing records, where the volume of data involved would prevent an individual researcher from completing the task alone.

Academic crowdsourcing typically involves volunteers rather than paid labour, which has led to it also being referred to as citizen science or citizen history.

Sometimes those engaged in such work are more of a small, tight-knit community than a large public crowd.

History of academic crowdsourcing

The idea of academic crowdsourcing precedes the Internet.

In 1848, scientist Matthew Fontaine Maury distributed 5000 free copies of his Wind and Current Charts on the condition that sailors submit a standardized log of their voyage to the US Naval Observatory.

In 1884, publication began of the first edition of the Oxford English Dictionary, to which around 800 volunteers had submitted information or historical quotations showing the usages of different words.

In 1937, the Mass-Observation social research project was launched by three Cambridge graduates, who over the next thirty years asked volunteers to record data about everyday life in Britain.

In 1999, the University of California, Berkeley, launched SETI@home, which allows volunteers to help search for signals that might come from extraterrestrial intelligence by installing a program that uses idle computer time to analyze chunks of data recorded by radio telescopes.

In 2008, the Australian Newspaper Digitisation Program began asking the general public to correct the OCR text of articles generated by its digitization work, making it one of the earliest and most successful crowdsourcing projects in the digital humanities.

Crowdsourcing: Volunteer motivations

Learning processes play an important role in volunteer motivation; participating in crowdsourcing has also been shown to encourage deep learning about specific topics.

Competitiveness, or a sense of individual or group accomplishment, is another motivation for volunteers. However, some research suggests that too much gamification can have a negative impact on the quality of transcriptions, and can serve to exclude the casual volunteer.

Perceived value to the community can also be a strong motivation; this requires researchers to communicate the intentions and outcomes of the research to volunteers.

Volunteers also often expect that the data will eventually be made publicly available in some form.

While many volunteers may contribute to a project, research has shown that most crowdsourcing projects ultimately succeed because of a small group of dedicated super contributors.

Crowdsourcing: Some critical reflections

Crowdsourcing raises its own ethical questions, particularly whether the labour being asked of volunteers is exploitative, and whether its use takes opportunities away from students or research assistants (RAs) who would otherwise be paid to do the work.

On the first objection, it is important to consider what you as a researcher are giving back to the community of crowdsourcing volunteers, and perhaps how you can make participation a form of serious leisure. On the second, it has been suggested that crowdsourcing be restricted to projects that could not otherwise be undertaken, or that money be used to employ RAs to work with the crowdsourced data.

Further reading about crowdsourcing

Estelles-Arolas, Enrique, and Fernando Gonzalez-Ladron-de-Guevara. "Towards an Integrated Crowdsourcing Definition." Journal of Information Science 38, no. 2 (2012): 189-200.

Grayson, Richard S. "A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army's Day-to-Day Life on the Western Front." British Journal for Military History 2, no. 2 (2016): 160-85.

Hedges, Mark, and Stuart Dunn. Academic Crowdsourcing in the Humanities: Crowds, Communities and Co-Production. Cambridge, MA: Chandos Publishing, 2018.

Rockwell, Geoffrey. "Crowdsourcing the Humanities: Social Research and Collaboration." In Collaborative Research in the Digital Humanities, edited by Willard McCarty and Marilyn Deegan, 135-54. Farnham, England; Burlington, VT: Routledge, 2012.

Terras, Melissa. "Crowdsourcing in the Digital Humanities." In A New Companion to Digital Humanities, edited by Susan Schreibman, Raymond George Siemens and John Unsworth. Malden, MA; Chichester, West Sussex, UK: John Wiley & Sons, 2015.

Crowdsourcing research: What is your data?

Whether crowdsourcing is the right method for your project depends on what types of data you are working with, and what you want to do with them.

Types of data                        Types of research tasks
Handwritten records (unstructured)   Transcription
Handwritten records (structured)     Transcription with coding of certain material
OCR texts                            Verification of transcriptions
Images                               Annotation/classification of records
Sound files                          Marking up of records
Audio-visual content                 Linkage of records

Transkribus

Transkribus is a platform for enabling transcription, automated recognition and searching of historical documents. It can be adapted to crowdsourcing projects, or used within a project team. It relies on you locally hosting the site and records.

Pros:
- Open source (mostly)
- Uses OCR technology so that the machine in turn learns to read examples of particular handwriting better, based on manual transcriptions
- Excellent capabilities to transcribe and mark up texts, including busy ones

Cons:
- Tricky to set up (requires time and/or technical know-how)
- Requires time and training to learn all features of the tool
- Works best for transcribing long-form documents; less adaptable to structured data such as tables

Example Project: Transcribe Bentham
http://transcribe-bentham.ucl.ac.uk/td/Transcribe_Bentham

From The Page

From The Page is a platform for manuscript transcription. It is designed for crowdsourcing, but can be used for private collaborative research projects as well. Site and records are hosted on a server.

Pros:
- Easy to use
- More features than Transkribus, particularly for automatic mark up, annotations on records, collaborator engagement and quality review
- Control over who can work on transcription (the general public or select collaborators), with the option to switch from one to the other

Cons:
- Less successful than Transkribus at handling hyper-dense text (e.g. where the writer has written both ways across a page)
- Paid software; the minimum rate is $80 a month

Example Project: Indiana WWI
https://fromthepage.com/indianaarchives/indiana-wwi-service-record-cards

Digivol

Digivol is an online crowdsourcing platform developed by the Atlas of Living Australia. Site and records are hosted on a server.

Pros:
- Open source
- Can be used for different types of data, including images
- Set up so that once a transcription is done, another volunteer verifies it to ensure accuracy

Cons:
- Does not allow for mark up of records
- Projects/records for transcription must be submitted to the platform for approval before they are hosted

Example Project: City of Parramatta Minute Books
https://digivol.ala.org.au/project/index/30417196

Zooniverse

Zooniverse is a crowdsourcing tool that allows transcription and other types of research tasks to be performed. Projects can be kept private, be made public without being submitted to Zooniverse for approval, or be made public and launched on the Zooniverse platform. Site and records are hosted on a server.

Pros:
- Open source
- Works with different types of data, including images, sound and video files (these have to be in particular formats)
- Projects can be made private or public, and do not have to be submitted to the official Zooniverse platform
- Very easy to use

Cons:
- Works better with structured than long-form data
- Does not have all the features present in other transcription tools, such as OCR software or mark up

Example Project: Criminal Characters
https://www.zooniverse.org/projects/ajpiper/criminalcharacters

Zooniverse Project Builder

The first step in building a crowdsourcing project on Zooniverse is to register on the site. Pick a sensible username, as it will appear as part of the URL of any project you create on the site and cannot easily be changed.

Once you are registered, select Build a Project. You will be prompted to sign in with your newly created username and password.

From here, you will find a number of help guides to using the Project Builder application, along with a button to Create a New Project. Click this now.

Now you can give your project a name and a description that will appear on your site. All these details can be updated later, so for now just come up with a simple name.

Now that you have named the project, you will be taken to the Lab, the area in which projects are built.

View Project

The View Project button can be used at any time to see what the production version of your site currently looks like.

Project Details

The Project Details button can be used to control what volunteers will see on the landing page of your site. You can add introductory text, a background image, an avatar and links to other websites or social media accounts.

About

The About button lets you build up more in-depth information that volunteers will be able to access from a dedicated About page. You can have different sections for information about the research, research team, research results, education and FAQs.

Collaborators

The Collaborators button lets you add other people to the project team in different roles to give them more access to the site. Anyone you add will need to be registered on Zooniverse so that you can add them by their Zooniverse username. General project volunteers do not need to be added here if you decide to take the project public.

Field Guide

The Field Guide button is an optional feature you can set up to help volunteers understand your records as they are viewing them. It creates a guide that volunteers can pull up at any time while working on your records. For instance, if your records contain lots of acronyms, your field guide might include a list of those acronyms and their meanings. If your records consist of images of different types of historical weaponry, and you want volunteers to annotate which weaponry is in each image, your field guide might contain a list of different weapons with example images of each one.

Tutorial

The Tutorial button lets you build a tutorial that volunteers can access so they can learn how to undertake the work you expect of them. Tutorials can incorporate images or videos. If you have multiple workflows or tasks that volunteers can choose to perform, you can set up a different tutorial for each one.

Media

The Media button lets you upload images, sound or video files that you might want to include in the tutorial or project details section of your site. Zooniverse then gives these files a URL that you can paste into the portions of the tutorial or project details sections where you want them to appear. (You can also add URLs from sites like YouTube or Flickr to these sections, but files from your computer can only be uploaded via the Media section.)

Visibility

The Visibility button lets you control the status of your project. Classifications of data are not counted while the project is in development, only once it is live. A project can be made live but still be kept private, meaning that only those added to the project under the Collaborators tab will be able to work on it. Once a project is made public, it has a web address that anyone with the link can access. You also have the option of applying for Zooniverse to review and beta test the project, with the aim of having it included on the list of official Zooniverse projects that are promoted to over 1 million registered volunteers.

Talk

Once a project is public, you can set up different discussion boards about the project from the Talk button. Volunteers can use these to ask for help in completing the tasks asked of them, or to point out interesting things they have found in the records. It also lets you communicate directly with the volunteer group.

Data Exports

Once your project is live, this is where you come to export the results. All the classifications or research tasks performed by volunteers can be downloaded in CSV format and viewed as a spreadsheet. You will likely want to do some further work on this data using OpenRefine, Python or R to get it ready for analysis. Zooniverse has a code repository with suggested ways to clean or aggregate collected data.

Workflows

The Workflows button is where you set up what you actually want the volunteer to do. You can set up multiple workflows if there are a variety of tasks you want a volunteer to perform on the record, or you can create one workflow that consists of multiple tasks/actions.

Subject Sets

The Subject Sets button is where you upload the records you want volunteers to work on by transcribing, annotating or classifying data. You can upload multiple subject sets.
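Each subject set can be accompanied by a CSV manifest listing the uploaded files and any metadata to attach to them. The following is a minimal sketch of generating such a manifest in Python. The column names (`filename`, `year`), the file extensions and the helper name `build_manifest` are illustrative assumptions rather than anything Zooniverse prescribes, so check the Project Builder help guides for the layout it currently expects.

```python
import csv
from pathlib import Path


def build_manifest(image_dir, manifest_path, extra_metadata=None):
    """Write a CSV manifest with one row per image file found in image_dir.

    extra_metadata maps a filename to a dict of additional columns.
    The column names are hypothetical examples, not a required schema.
    Returns the number of image files listed.
    """
    extra_metadata = extra_metadata or {}
    image_files = sorted(
        p.name for p in Path(image_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    # Collect every metadata column that appears for any file.
    extra_columns = sorted({key for meta in extra_metadata.values() for key in meta})
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["filename"] + extra_columns, restval=""
        )
        writer.writeheader()
        for name in image_files:
            row = {"filename": name}
            row.update(extra_metadata.get(name, {}))
            writer.writerow(row)
    return len(image_files)
```

Calling `build_manifest("scans", "manifest.csv", {"page_001.jpg": {"year": "1901"}})` would list every image in a (hypothetical) `scans` folder and leave the `year` column blank for files without metadata.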

Adding a subject set

Click Create a new subject set in the Subject Sets view. This will bring up a page where you can give the subject set a name and upload the files. The interface can handle between 500 and 1000 files being added to a subject set at a time. Users are capped at an allowance of 10,000 records, but you can apply to have this allowance increased. When you upload your records into the subject set, you can also add a CSV-formatted manifest containing information about each record in the set, which will then attach permanently to each record as metadata.

Adding a workflow

Once you have a subject set of records, you can create a workflow for what you want volunteers to do with them. In the Workflows view, click New Workflow and give it a name. Then click Add a Task, which will give you four options for what you would like volunteers to do: answer a question about the record; draw/mark something on the record; do a text transcription of the record; or survey the record to indicate whether multiple types of things that you are looking for are present within it.

Setting up a task

Once you have chosen a task that you want volunteers to perform, it is a simple matter of entering the information that will help them perform it. For a transcription task, you will need to enter instructions about what they are transcribing. You can also add further information that will appear as Help Text if they click on the help button located underneath the task. You can also check boxes that enable extra buttons allowing volunteers to mark where a record contains text that has been deleted (crossed out), inserted into the record, or is unclear.

Perhaps there is only one task that you want volunteers to perform on the record. On the other hand, there may be multiple actions or tasks you want to combine into one workflow. You can keep adding tasks to your workflow and dictate the order in which they appear to the volunteer by setting one task as the First Task and then selecting, within each task, which Next Task is to be performed. For the last task, this should be Submit classification and load next subject.
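The First Task / Next Task chaining described above can be pictured as a simple linked sequence. The sketch below is purely illustrative: the dictionary layout, the task keys (`T0`, `T1`, `T2`) and the function name `task_order` are hypothetical and are not Zooniverse's own storage format. It walks the chain from the first task and flags a workflow that loops back on itself or points at a task that does not exist.

```python
def task_order(tasks, first_task):
    """Return task keys in the order a volunteer would see them."""
    order = []
    current = first_task
    while current is not None:
        if current in order:
            raise ValueError(
                f"task {current!r} is reached twice; check the Next Task settings"
            )
        if current not in tasks:
            raise KeyError(f"task {current!r} is referenced but not defined")
        order.append(current)
        current = tasks[current].get("next")
    return order


# Hypothetical workflow of three tasks. A "next" of None stands in for
# "Submit classification and load next subject".
example_tasks = {
    "T0": {"instruction": "Is this page handwritten?", "next": "T1"},
    "T1": {"instruction": "Transcribe the name on the record.", "next": "T2"},
    "T2": {"instruction": "Transcribe the date on the record.", "next": None},
}

print(task_order(example_tasks, "T0"))  # ['T0', 'T1', 'T2']
```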

Setting up a workflow

Scrolling further down the page, you can select additional options to enable with this workflow, such as which subject set of records it is associated with, which tutorial will be shown to volunteers with it, and whether volunteers can zoom in and pan across records.

At the very bottom of the page you have one more important option. Zooniverse allows you to dictate how many times a workflow must be performed on a record before it is considered complete. For instance, you may want the same record transcribed 3 times in order to ensure the accuracy of the transcription. You can pick the number of times you want a workflow performed on the record in the Subject Retirement box.

Once you have set up the workflow, you can test it to see whether it appears and performs the way you expect. You can then refine or change the functionality from there. All changes made in Zooniverse are automatically saved.
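When each record is transcribed several times, the exported CSV has to be reconciled into one reading per record. The following is a rough sketch of one way to do this in Python, using a simple majority vote; it is not the method used by Zooniverse's own aggregation code. It assumes the export has a `subject_ids` column and an `annotations` column holding a JSON list of objects with `task` and `value` keys, so check these names against your own export. The sample data and the task key `T0` are made up for illustration.

```python
import csv
import io
import json
from collections import Counter, defaultdict


def collect_transcriptions(export_file, task_id):
    """Group volunteers' text answers for one task by subject.

    Assumes each row has a "subject_ids" column and an "annotations"
    column containing a JSON list of {"task": ..., "value": ...} objects.
    """
    by_subject = defaultdict(list)
    for row in csv.DictReader(export_file):
        for annotation in json.loads(row["annotations"]):
            value = annotation.get("value")
            if annotation.get("task") == task_id and isinstance(value, str):
                # Normalise whitespace and case so trivial differences
                # do not split the vote.
                by_subject[row["subject_ids"]].append(" ".join(value.split()).lower())
    return by_subject


def consensus(by_subject, min_agreement=2):
    """Return the most common answer per subject, or None if agreement is too low."""
    result = {}
    for subject, answers in by_subject.items():
        text, votes = Counter(answers).most_common(1)[0]
        result[subject] = text if votes >= min_agreement else None
    return result


# Tiny made-up export: subject 101 transcribed three times, subject 102 twice.
sample_rows = [
    ("101", "John Smith"), ("101", "john  smith"), ("101", "Jon Smith"),
    ("102", "Mary Ann"), ("102", "Mary Anne"),
]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["subject_ids", "annotations"])
for subject, text in sample_rows:
    writer.writerow([subject, json.dumps([{"task": "T0", "value": text}])])
buffer.seek(0)

print(consensus(collect_transcriptions(buffer, "T0")))
# {'101': 'john smith', '102': None}
```

Records that come back as `None` are the ones where volunteers disagreed, and are good candidates for manual review by the project team.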
