Computer Systems - UMIACS

Discovery and Delivery
Week 7
LBSC 671 Creating Information Infrastructures

Tonight
  Access points
  Discovery
  Delivery
  Midterm exam review

Authority Control
  Unify references to the same entity (synonyms)
    Samuel Clemens, Mark Twain
  Distinguish references to different entities (homonyms)
    Michael Jordan (basketball), Michael Jordan (computers)
  Establish access points
    Canonical and variant forms, to better support "find it" tasks

Access Points
  Originally designed for card catalogs
    One card for every authorized access point
  Four types of dictionary catalog access points
    Title (uniform titles)
    Author (name authority)
    Subject (controlled vocabulary)
    Series
  Other things can serve a similar purpose
    Call number (shelf order)
    Keywords (full-text search)

Functional Requirements for Authority Data (FRAD)
  Name: canonical form for display to users
  Identifier: canonical form for use by systems
  Controlled access points: forms that can be used as a basis for access
  Rules: for creating access points
  Agency: organization responsible for creating access points

FRBR Bibliographic User Tasks
  Find it
    Search (to find)
    Recognize (to identify)
    Choose (to select)
  Serve it
    Location (to obtain)

FRAD Authority Control User Tasks
  Searcher tasks
    Find
    Identify
  Authority control tasks
    Contextualize
    Justify
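The authority control ideas above (canonical and variant forms, a display name versus a system identifier) can be sketched as a small lookup table. This is a minimal illustration, not how any real authority file is implemented: the class names, the `normalize` helper, and the `ex-...` identifiers are all made up for the example.

```python
from dataclasses import dataclass, field


def normalize(form):
    """Fold case and collapse whitespace so trivially different spellings match."""
    return " ".join(form.lower().split())


@dataclass
class AuthorityRecord:
    identifier: str   # canonical form for use by systems (made-up IDs below)
    name: str         # canonical form for display to users
    variants: list = field(default_factory=list)  # other forms that lead here


class AuthorityFile:
    """Toy authority file: every form (canonical or variant) is an access point."""

    def __init__(self):
        self._by_form = {}

    def add(self, record):
        for form in [record.name, *record.variants]:
            bucket = self._by_form.setdefault(normalize(form), [])
            if record not in bucket:
                bucket.append(record)

    def lookup(self, form):
        """Return every record reachable from this form (more than one = homonyms)."""
        return list(self._by_form.get(normalize(form), []))


authority = AuthorityFile()
authority.add(AuthorityRecord("ex-0001", "Mark Twain", ["Samuel Clemens"]))
authority.add(AuthorityRecord("ex-0002", "Michael Jordan (basketball)", ["Michael Jordan"]))
authority.add(AuthorityRecord("ex-0003", "Michael Jordan (computers)", ["Michael Jordan"]))

# Synonyms unify: both forms reach the same record.
print([r.name for r in authority.lookup("Samuel Clemens")])
# Homonyms stay distinct: one form reaches two records the user must choose between.
print([r.identifier for r in authority.lookup("michael jordan")])
```

Looking up a variant form returns the record with its canonical name, while an ambiguous form returns several records, which is where the qualifiers in the canonical names help the searcher identify the right one.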

http://authorities.loc.gov/

Hands On
  Find the authoritative LC name for one of ...
    http://ischool.umd.edu/faculty-staff/jennifer-j-preece
    http://www.umiacs.umd.edu/~jimmylin/
    http://terpconnect.umd.edu/~pwang/
    http://en.wikipedia.org/wiki/Robert_S._Taylor
    http://en.wikipedia.org/wiki/Hans_Peter_Luhn

Entity Linking
  [Figure labels: Knowledge Base, Query]

Entity Linking
  Given
    A mention of a person's name in a document
    A knowledge base containing information about a set of known entities
  Determine
    Whether the mentioned person is in the knowledge base
    If so, where
  Match unstructured text to structured knowledge source
  Related to:
    Record linkage: structured to structured
    Co-reference resolution: unstructured to unstructured

Entity Linking Task
  Michael Phelps
    "Debbie Phelps, the mother of swimming star Michael Phelps, who won a record eight gold medals in Beijing, is the author of a new memoir, ..."
  Michael Phelps
    "Michael Phelps is the scientist most often identified as the inventor of PET, a technique that permits the imaging of biological processes in the organ systems of living individuals. Phelps has ..."
  Knowledge base (818k+ entries)
    Michael Phelps, swimmer, 1985-
    Michael E Phelps, biophysicist, 1939-
    Mike Phelps, basketball player, 1961-
    Edmund Phelps, economist, 1933-
  Identify matching entry, or determine that entity is missing from KB.
  Non-trivial due to name ambiguity, name variation, & KB absence.

Technical Approach
  According to the CDC the prevalence of H1N1 influenza in California prisons has increased ...
  Query = CDC

    California Dept. of Corrections
    Cedar City Regional Airport
    Cheerdance Competition
    Communicable Disease Centre
    Congress for Democratic Change
    Consumers for Dental Choice
    Control Data Corporation
    Cult of the Dead Cow
    NIL (Absence from KB)
    US Center for Disease Control
    ...
  Several phases
    1. Candidate identification (triage) based on target name

Technical Approach
  According to the CDC the prevalence of H1N1 influenza in California prisons has...
  Query = CDC
    1. California Dept. of Corrections
    2. US Center for Disease Control
    3. Cedar City Regional Airport (IATA code)
    4. Communicable Disease Centre (Singapore)
    5. Congress for Democratic Change (Liberian political party)
    6. Cult of the Dead Cow (Hacker organization)
    7. Control Data Corporation
    8. NIL (Absence from KB)
    9. Consumers for Dental Choice (non-profit)
    10. Cheerdance Competition (Philippine organization)
  Several phases
    1. Candidate identification (triage) based on target name
    2. Candidate selection (ranking) exploiting document features using supervised machine learning
    3. Possibly choosing absence (NIL)

Supervised Machine Learning
  [Figure] Steven Bird et al., Natural Language Processing, 2006
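The three phases of the technical approach (triage, ranking, possible NIL) can be sketched end to end. This is only a toy under stated assumptions: the entry names come from the slide, but the aliases and keywords are invented, and simple keyword overlap stands in for the supervised ranker, which this sketch does not attempt to reproduce.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    aliases: frozenset      # surface forms that may refer to the entry (invented)
    keywords: frozenset     # invented context words standing in for KB text


# Names come from the slide; aliases and keywords are made up for illustration.
KB = [
    Entry("California Dept. of Corrections", frozenset({"CDC"}),
          frozenset({"california", "prisons", "inmates"})),
    Entry("US Center for Disease Control", frozenset({"CDC"}),
          frozenset({"disease", "influenza", "h1n1", "prevalence"})),
    Entry("Control Data Corporation", frozenset({"CDC"}),
          frozenset({"computers", "mainframe"})),
    Entry("Cult of the Dead Cow", frozenset({"cDc"}),
          frozenset({"hacker", "security"})),
]


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def identify_candidates(query, kb):
    """Phase 1 (triage): keep entries whose name or alias matches the query."""
    q = query.lower()
    return [e for e in kb if q == e.name.lower() or q in {a.lower() for a in e.aliases}]


def rank_candidates(candidates, document):
    """Phase 2 (ranking): score by keyword overlap with the document."""
    doc = tokens(document)
    scored = [(len(e.keywords & doc), e) for e in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def link(query, document, kb, nil_threshold=1):
    """Phase 3: return the best entry, or None (NIL) if nothing scores high enough."""
    ranked = rank_candidates(identify_candidates(query, kb), document)
    if not ranked or ranked[0][0] < nil_threshold:
        return None
    return ranked[0][1]


doc = ("According to the CDC the prevalence of H1N1 influenza "
       "in California prisons has increased")
best = link("CDC", doc, KB)
print(best.name if best else "NIL")
```

Which entry wins here depends entirely on the invented keywords, so the output says nothing about how a trained ranker would order the candidates. The point is the shape of the pipeline: a cheap name-based filter, a context-based score, and an explicit way to answer "not in the knowledge base".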

Cross-Language Entity Linking

Cross-Language Entity Linking
  [Figure labels: Knowledge Base, Query]

One-Best Person Linking Accuracy
  [Figure] Dawn Lawrie et al., Cross-Language Person-Entity Linking from Twenty Languages, under review (2013)

Classification
  Classification: a system for organizing knowledge
  Notation: expressing the classification in a systematic way

Library of Congress Subject Headings
  Controlled vocabulary for subject access points
  Most commonly applied to books and serials
  Used when a subject describes at least 20% of the work
  Choose the most specific appropriate headings
    But if more than 3 subtopics, choose a broader heading

LCSH Subdivisions
  Topical: Archaeology -- Methodology
  Form: Archaeology -- Fiction
  Chronological: Archaeology -- History -- 18th century
  Geographic: Archaeology -- Egypt

Hands On
  Find the LCSH for one of:
    http://www.mayoclinic.com/health/heart-attack/DS00094
    http://en.wikipedia.org/wiki/AS-204
    http://www.apollotheater.org/
    http://www.flickr.com/photos/usnationalarchives/4153755504/
    http://en.wikipedia.org/wiki/Operation_Entebbe

Tonight

  Access points
  Discovery
  Delivery
  Midterm exam review

Two Ways of Searching
  [Figure: two parallel paths, each ending in a Retrieval Status Value]
  Content-based query-document matching
    Free-text searcher: construct query from terms that may appear in documents (Query Terms)
    Author: write the document using terms to convey meaning (Document Terms)
  Metadata-based query-document matching
    Controlled vocabulary searcher: construct query from available concept descriptors (Query Descriptors)
    Indexer: choose appropriate concept descriptors (Document Descriptors)

Supporting the Search Process
  [Figure labels: Source Selection, IR System, Query Formulation, Query, Search, Ranked List, Selection, Indexing, Document, Index, Examination, Acquisition, Document, Collection, Delivery]
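The contrast between content-based and metadata-based matching can be made concrete with one scoring function applied to two different representations of the same document. This is a rough sketch: the fraction-of-query-items-matched score is a stand-in chosen for simplicity, not the matching function of any particular IR system, and the sample document and descriptor are invented.

```python
def retrieval_status_value(query_items, document_items):
    """Fraction of query items found in the document's representation."""
    query = {item.lower() for item in query_items}
    document = {item.lower() for item in document_items}
    if not query:
        return 0.0
    return len(query & document) / len(query)


# One document, two representations (both invented for illustration).
document_terms = "chest pain can be a warning sign of a heart attack".split()
document_descriptors = ["Myocardial infarction"]   # assigned by an indexer

# Free-text searcher: guesses terms the author may have written.
free_text_rsv = retrieval_status_value(["heart", "attack"], document_terms)

# Controlled vocabulary searcher: picks from the available descriptors.
descriptor_rsv = retrieval_status_value(["Myocardial infarction"], document_descriptors)

# A free-text query phrased in the indexer's vocabulary misses the author's words.
mismatch_rsv = retrieval_status_value(["myocardial", "infarction"], document_terms)

print(free_text_rsv, descriptor_rsv, mismatch_rsv)
```

The third query shows why the two paths are kept separate: a term that works as a descriptor may never appear in the text the author wrote.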

Online Public Access Catalog (OPAC)
  Known-item search
    Author, Title
  Topic search
    Title, subject headings
  Result display
    Sort by publication date, relevance, ...
  Navigation
    Broader/narrower headings, other editions, ...
  Delivery
    Call number or (digital content) direct delivery

Tonight
  Access points
  Discovery
  Delivery
  Midterm exam review

Delivery (Serve It)
  Assigning a shelf order
  Moving physical materials
  Controlling access to digital materials
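"Assigning a shelf order" amounts to defining a sort key over call numbers. The sketch below handles only a simplified Library of Congress call number shape (class letters, class number, one Cutter, a year); real call numbers have more variants than this pattern covers. The `shelf_key` function and the regular expression are illustrative assumptions, and two of the sample call numbers are made up.

```python
import re

# Simplified shape: class letters, class number, one Cutter (.H35), then a year.
CALL_NUMBER = re.compile(
    r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)\s*\.([A-Z])(\d+)\s+(\d{4})$"
)


def shelf_key(call_number):
    """Turn a call number into a tuple that sorts in shelf order."""
    match = CALL_NUMBER.match(call_number.strip())
    if match is None:
        raise ValueError(f"unsupported call number: {call_number!r}")
    letters, number, cutter_letter, cutter_digits, year = match.groups()
    return (
        letters,                        # class letters sort alphabetically
        float(number),                  # class number sorts numerically
        cutter_letter,
        float("0." + cutter_digits),    # Cutter digits sort as a decimal: .H35 < .H4
        int(year),
    )


books = [
    "HM846 .F74 2005",      # appears in this lecture
    "DS559.46 .H35 1986",   # appears in this lecture
    "DS559.5 .A2 1990",     # made up
    "DS559.46 .H4 1980",    # made up
]
shelf_order = sorted(books, key=shelf_key)
for call_number in shelf_order:
    print(call_number)
```

Treating the Cutter digits as a decimal fraction is what lets a new book be shelved between two existing ones without renumbering either.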

Library of Congress Classification
  Book title: Uncensored War: The Media and Vietnam
  Author: Daniel C. Hallin
  Call Number: DS559.46 .H35 1986
  The first two lines describe the subject of the book.
    DS559.45 = Vietnamese Conflict
      D               History
      DS1-937         History of Asia
      DS520-560.72    Southeast Asia
      DS556-559.93    Vietnam. Annam
      DS557-559.9     Vietnamese Conflict
  The third line often represents the author's last name.
    H = Hallin
    After other initial consonants
      for the second letter:  a    e    i    o    r    u    y
      use number:             3    4    5    6    7    8    9
    For expansion
      for the letter:         a-d  e-h  i-l  m-o  p-s  t-v  w-z
      use number:             3    4    5    6    7    8    9
  The last line represents the date of publication.
  http://www.usg.edu/galileo/skills/unit03/libraries03_04.phtml

The World Is Flat (in LCC)
  HM846 .F74 2005
    H        Social sciences
    HM       Sociology
    HM831    Social change
             Causes
    HM846    Technological Innovations. Technology.
    .F74     Cutter number for Friedman, Thomas

The World Is Flat (in Dewey)
  303.4833
    300         Social science
    300         Social sciences, sociology, & anthropology
    303         Social processes
    303.4       Social change
    303.48      Causes of change
    303.483     Development of science and technology
    303.4833    Communication (Information technology)

Inter-Library Loan
  Users search union catalog to find books
  Remote library ships it to local library
    Often by scanning it, where practical
    Someone pays for this (local library or user)

  Local library manages circulation
    Limited access period
    Some return mechanism

E-Book Distribution
  [Figure] OECD, E-Books: Development and Policy Considerations (2011)

Copyright
  Balances two public interests
    Incentivizing production of new information
      Through owners' interest in monetizing assets
    Fostering use of information
      First sale doctrine
      Fair use doctrine

First Sale Doctrine
  Owner may transfer access of the owned copy
    But may not make a copy then transfer the copy
    This is what permits lending libraries
    Exception: no commercial lending of audio recordings
  Licensing can apply more restrictive rules
    Establishes a conditional right of access
    This is what permits limited-

Fair Use Doctrine
  Balance two desirable characteristics
    Financial incentives to produce content
    Desirable uses of existing information
  Safe harbor agreement
    Book chapter, magazine article, picture, ...
  Developed in an era of physical documents
    Perfect copies/instant delivery alter the balance

Recent Copyright Laws
  Copyright Term Extension Act (CTEA)
    Ruled constitutional (Jan 2003, Supreme Court)
  Digital Millennium Copyright Act (DMCA)
    Prohibits circumvention of technical measures
    Implements WIPO treaty database protection

Digital Rights Management (DRM)
  Goal: protect intellectual property rights
    Copyright relies on cost and quality of analog copies
  Three interlocking strategies
    Make it difficult to produce an exact digital copy
    Encrypt the content and then control decryption
    Enforce policies to rebalance costs and benefits

Digital Rights Management
  No standards, so proliferation of one-off solutions
    Many of which have caused unintended problems
  Unilateral implementation can result in imbalance
    Establishing balance is a political process
  The analog hole is technically intractable
    Unless interaction is needed

Midterm Exam
  Posted by 5 AM on Tuesday October 28
  Due at 11 PM on Saturday November 2
  3 Hours, same process as the quiz (email, no talking, ...)

  Comprehensive
    Nature of information institutions
    Have it, find it, serve it
  One question will be to create + represent a bibliographic description (w/authority control)
    One RDA+MARC, MODS or BIBFRAME option
    One DACS+EAD option

Before You Go!
  On a sheet of paper (no names), answer the following question:
    What was the muddiest point in today's class?
