EPortfolio Presentation - Virginia Tech

Supply and Demand Analysis in NDLTD Based on Patron Specialty and Contents Statistics

The 9th International Symposium on Electronic Theses and Dissertations
Quebec City, Quebec, Canada, June 7-10, 2006

Seonho Kim, Seungwon Yang, Edward A. Fox
Digital Library Research Laboratory, Virginia Tech
Blacksburg, VA 24061 USA

Overview
- Purpose of Study
- Data Set (ETDs, patrons, queries)
- Our Approach
- Data Analysis
- Conclusions and Future Work

ETD 2006, Quebec, Canada

Purpose of Study
- Analysis of ETD subjects (supply)
- Analysis of users and queries (demand)
- Comparison of the above two
- Users' years of experience in their field
- Distribution of date stamps of ETDs

Data Set - ETDs
- Up-to-date union archive harvested from the Online Computer Library Center (OCLC)
- Using the OAI/ODL Harvester [2] by Hussein Suleman: http://oai.dlib.vt.edu/odl/software/harvest/
- Total of 242,688 records

Example ETD Metadata
Title: Composer-Centered Computer-Aided Soundtrack Composition
Author: Vane, Roland Edwin
Subjects: Computer Science; human computer interaction; music composition; soundtracks; creativity
Description: For as long as computers have been around, people have looked for ways to involve them in music.
Publisher: University of Waterloo
Date: 2006
Type: Electronic Thesis or Dissertation
Format: application/pdf
Identifier: http://etd.uwaterloo.ca/etd/revane2006.pdf
Language: en
Rights: Copyright: 2006

Patrons, Queries
- User profile data (Oct. 2005 - May 2006)
  - Online user survey [3] as part of a user modeling study
- Total of 1,100 user data, which include:
  - Majors, specialties, years of experience, and demographic information
  - Queries and detailed research interests

Figure: User Profile Form

Example User Profile Data
shk [email protected] Sh King
CS, Digital Library, User interface
8,2
digital library computer science virginia tech artificial intelligence digital library.
Digital Library Electronic Theses and Dissertations Digital Library Data

Overview
- Purpose of Study
- Data Set (ETDs, patrons, queries)
- Our Approach
- Data Analysis
- Conclusions and Future Work

Categorization of Academic Subjects
- Built our own classification categories
  - Based on colleges / faculties in Virginia Tech, University of Virginia, George Mason University, VCU, and Virginia State University
- Identified
  - 7 categories and 77 subcategories
  - Word patterns for each subcategory

Categorization of Academic Subjects: 7 categories and selected subcategories (of 77)
Category: selected subcategories
1 Architecture and Design: ArchitectureConstruction, LandscapeArchitecture
2 Law: Law
3 Medicine, Nursing and Veterinary Medicine: Dentistry, Medicine, Pharmacy, Nursing
4 Arts and Science: Agriculture, AnimalPoultry, Biology, ...
5 Engineering and Applied Science: ComputerScience, Materials, Electronics
6 Business and Commerce: Business, Economics, Management
7 Education: Education
8 Others (unclassifiable)

Categorization of Academic Subjects
- Each subcategory has a set of word patterns
  - Matching table developed
- Process of word matching table development
  1. Run our subject-matching classifier program.
  2. Count each unclassified subject and sort them.
  3. If num. > 10, add the unclassified subject to the matching table.
  4. Repeat 1-3 until num. < 10 for all unclassified subjects.

Categorization of Academic Subjects: Matching Table
Subcategory (of 77): word patterns
Education: /bildung/, /pedagog/, /fakul/, /educa/, /teaching/
Geology: /geolog/, /geoscience/
LibraryScience: /librari/, /library/, /informatik/
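The matching-table process above can be sketched in Python. This is a minimal illustration, not the authors' classifier program: the word patterns are the ones listed on the slide, while the function names, the first-match rule, and the way unclassified subjects are normalized before counting are assumptions made for the example.

```python
import re
from collections import Counter

# Ordered matching table: (subcategory, word patterns). The patterns come from
# the slide; the list order and the first-match rule are assumptions.
MATCHING_TABLE = [
    ("Education", ["bildung", "pedagog", "fakul", "educa", "teaching"]),
    ("Geology", ["geolog", "geoscience"]),
    ("LibraryScience", ["librari", "library", "informatik"]),
]


def classify(subject, table=MATCHING_TABLE):
    """Return the first subcategory whose pattern occurs in the subject, else None."""
    text = subject.lower()
    for subcategory, patterns in table:
        if any(re.search(p, text) for p in patterns):
            return subcategory
    return None


def frequent_unclassified(subjects, table=MATCHING_TABLE, threshold=10):
    """Steps 1-3: classify, count unclassified subjects, sort, keep counts > threshold."""
    counts = Counter(
        s.strip().lower() for s in subjects if classify(s, table) is None
    )
    return [(s, n) for s, n in counts.most_common() if n > threshold]


# Hypothetical toy input, not data from the study.
subjects = ["Geosciences", "Library and Information Science"] + ["Chemistry"] * 11 + ["pulsars"]
print(classify("Geosciences"))
print(frequent_unclassified(subjects))
```

Each subject returned by `frequent_unclassified` is a candidate for a new row in the matching table; step 4 corresponds to re-running the function after the table has been extended.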

Unclassified ETD Subjects
- Approx. 85% unique
- Approx. 10% with only two occurrences

Measuring Supply and Demand
- ETD supply (number of resources)
  - 242,688 ETDs classified into 7 categories and counted
- Patrons' demand (number of queries)
  - 4,519 queries (in 1,100 user data) classified into 7 categories
  - Sum of all queries in each category calculated as the demand of that category:
    $\text{Demand of a category} = \sum_{\text{user} \in \text{category}} \text{number of queries}$
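As a rough illustration of the two measures above, the sketch below turns category labels into supply shares and per-user query counts into demand shares, so the two can be compared on the same percentage scale. The function names and the toy data are hypothetical; the study's actual counts are not reproduced here.

```python
from collections import Counter


def supply_shares(etd_categories):
    """Fraction of ETDs that fall into each category."""
    counts = Counter(etd_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def demand_shares(users):
    """Sum the number of queries over the users in each category, as fractions."""
    demand = Counter()
    for category, num_queries in users:
        demand[category] += num_queries
    total = sum(demand.values())
    return {category: n / total for category, n in demand.items()}


# Hypothetical toy data: one label per ETD, and (category, number of queries) per user.
etd_categories = ["Engineering", "Engineering", "Arts and Science", "Education"]
users = [("Engineering", 3), ("Education", 5), ("Education", 2)]

supply = supply_shares(etd_categories)
demand = demand_shares(users)
for category in sorted(set(supply) | set(demand)):
    print(category, supply.get(category, 0.0), demand.get(category, 0.0))
```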

ETD Classification
- Based on the first subject field
- Example: the record for "Composer-Centered Computer-Aided Soundtrack Composition" (Vane, Roland Edwin; University of Waterloo, 2006) lists the subjects Computer Science, human computer interaction, music composition, soundtracks, and creativity; the first subject field is Computer Science.
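A small sketch of reading the first subject field from a harvested record follows. It assumes the record is serialized as OAI-PMH Dublin Core (`oai_dc`) XML, which is an assumption because the element markup is not preserved in the slide text; `first_subject` and the abbreviated `RECORD` are illustrative names, and the parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OAI-PMH Dublin Core records.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated record; the oai_dc serialization is assumed for illustration.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Composer-Centered Computer-Aided Soundtrack Composition</dc:title>
  <dc:creator>Vane, Roland Edwin</dc:creator>
  <dc:subject>Computer Science</dc:subject>
  <dc:subject>human computer interaction</dc:subject>
  <dc:subject>music composition</dc:subject>
</oai_dc:dc>"""


def first_subject(record_xml):
    """Return the text of the first dc:subject element, or None if absent or empty."""
    root = ET.fromstring(record_xml)
    node = root.find("dc:subject", NS)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


print(first_subject(RECORD))
```

Returning `None` for a missing or empty subject keeps null entries distinguishable from subjects that simply fail to match any pattern.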

User Classification
- Based on the major, broadresearch, and specific fields in each user profile

Example user profile data:
shk [email protected] Sh King
CS, Digital Library, User interface
8,2
digital library computer science virginia tech artificial intelligence digital library.
Digital Library Electronic Theses and Dissertations Digital Library Data

Challenges
- Varieties in describing research subjects
  - Solution: we built a subject matching table

Subcategory (of 77): decision patterns
Education: /bildung/, /pedagog/, /fakul/, /educa/, /teaching/
Geology: /geolog/, /geoscience/
LibraryScience: /librari/, /library/, /informatik/
Arts: /music/

Challenges
- Interdisciplinary ETDs, e.g., Music Education
  - Solution: adjust matching order
- Unclassifiable ETDs
  - Null entry (no subject field data)
  - Erroneous entries (e.g., Ph.D, Georgia, [email protected])
  - Typos (e.g., edcuation, poluition)
  - Too much detail (e.g., pulsars, muon, cytochrome)
  - Abbreviations (e.g., MOCVD, OFDM)
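To see why matching order matters for an interdisciplinary subject such as Music Education, consider a first-match classifier over an ordered table. This is a hedged sketch: the patterns are the ones listed above, but the first-match rule and the names `classify`, `EDUCATION`, and `ARTS` are assumptions, and the slides do not state which order the authors finally chose.

```python
def classify(subject, table):
    """Return the first subcategory in `table` with a pattern found in the subject."""
    text = subject.lower()
    for subcategory, patterns in table:
        if any(p in text for p in patterns):
            return subcategory
    return None


EDUCATION = ("Education", ["bildung", "pedagog", "fakul", "educa", "teaching"])
ARTS = ("Arts", ["music"])

# The same subject lands in a different subcategory depending on table order.
print(classify("Music Education", [EDUCATION, ARTS]))  # Education
print(classify("Music Education", [ARTS, EDUCATION]))  # Arts
```

Under this rule, reordering the rows of the table is enough to change how a subject that matches several subcategories is resolved.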

Category key for the figures below: 1 Architecture and Design; 2 Law; 3 Medicine, Nursing and Veterinary Medicine; 4 Arts and Science; 5 Engineering and Applied Science; 6 Business and Commerce; 7 Education; 8 Others (unclassifiable)

Resource Distribution
Figure: Resource Distribution in NDLTD (categories 1-8)

User Distribution
Figure: User Distribution in NDLTD (categories 1-8)

Query Distribution
Figure: Query Distribution in NDLTD (categories 1-8)

Supply-Demand Comparison
Figure: ETD Resources and User Demands (Number of Queries) in NDLTD (x-axis: Academic Categories 1-8; y-axis: 0% to 50%; series: ETDs, Demands)

Supply-Demand of 77 Subcategories (1/2)
Figure: Supply/Demand, 77 Subcategories (1/2) (y-axis: 0% to 12%; series: ETD Supply, User Demand)

Supply-Demand of 77 Subcategories (2/2)
Figure: Supply/Demand, 77 Subcategories (2/2) (y-axis: 0% to 12%; series: ETD Supply, User Demand)

User Expertise Years
Figure: Users' Expertise in Years (x-axis: Years; y-axis: Users, 0 to 200)

Expertise Years and Demand
Figure: Expertise Years and Demand (x-axis: Years; y-axis: 0% to 25%; series: Users, Demand)

Date Stamp of ETD
Figure: Date Stamp of ETD (x-axis: Year; y-axis: 0 to 60,000)

Date Stamp of ETD
- ETDs from the seventeen hundreds?
  - Some are scanned copies from European universities
  - The oldest ETDs are from British universities
  - Some of the older dates are typos; each one would have to be checked to know for sure
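Since old date stamps may be either genuine scanned theses or typos, one way to organize the manual check is to list the records whose year looks implausible. The sketch below is only an illustration: the function name, the date formats, and the cutoff year of 1900 are assumptions, and the function only flags entries for review rather than deciding which ones are errors.

```python
import re


def suspicious_years(date_stamps, earliest_plausible=1900):
    """Flag date stamps with no leading 4-digit year or a year before the cutoff."""
    flagged = []
    for stamp in date_stamps:
        match = re.match(r"\s*(\d{4})", stamp or "")
        if match is None:
            flagged.append((stamp, "no 4-digit year"))
        elif int(match.group(1)) < earliest_plausible:
            flagged.append((stamp, "earlier than cutoff"))
    return flagged


# Hypothetical date stamps, not values from the harvested archive.
print(suspicious_years(["2006", "1796-05-01", "", "2004-12"]))
```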

Conclusions
- Analysis of ETD subjects (supply)
- Analysis of user queries (demand)
- Comparison of the above two
- Users' years of experience in their field
- Date stamp of ETDs
- Identified future directions

Future Work
- Use of a widely used classification system
  - e.g., Dewey Decimal Classification 22
- More detailed classification of ETDs
  - Include title, abstract, and other subject field data
  - Utilize discipline in ETD_MS format records (but only approx. 7,000 records)
- Use of user behavior data
  - e.g., clicking of query results in NDLTD

References
[1] NDLTD, Networked Digital Library of Theses and Dissertations, available at http://www.ndltd.org, 2006.
[2] Hussein Suleman, OAI/ODL Harvester, available at http://oai.dlib.vt.edu/odl/software/harvest/.
[3] Seonho Kim, Uma Murthy, Kapil Ahuja, Sandi Vasile, Edward A. Fox, Effectiveness of Implicit Rating Data on Characterizing Users in Complex Information Systems, 9th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2005), Springer-Verlag LNCS 3652, 2005, 186-194.
[4] Search Interface Embedded User Tracking System, available at http://boris.dlib.vt.edu:8080/controller/index.jsp, 2006.

Appendix: 7 Categories and 77 Subcategories
Category: subcategories
1 Architecture and Design: ArchitectureConstruction, LandscapeArchitecture
2 Law: Law
3 Medicine, Nursing and Veterinary Medicine: Dentistry, Medicine, Nursing, Pharmacy, Veterinary
4 Arts and Science: Agriculture, AnimalPoultry, Anthropology, ApparelHousing, Archaeology, Art, Astronomy, Biochemistry, Biology, Botany, Chemistry, Communication, CropSoilEnvSciences, DairyScience, Ecology, EngineeringScience, English, Entomology, Family, Food, ForeignLanguageLiterature, Forestry, Geography, Geology, GovernmentInternationalAffair, History, Horticulture, HospitalityTourism, HumanDevelopment, HumanNutritionExercise, Informatics, Interdisciplinary, LibraryScience, Linguistics, Literature, Meteorology, Mathematics, Music, Naval, Philosophy, Physics, Plant, Politics, Psychology, PublicAdministrationPolicy, PublicAffair, Sociology, Statistics, UrbanPlanning, Wildlife, Wood, Zoology
5 Engineering and Applied Science: Aerospace, BiologicalEngineering, Chemical, ComputerScience, Electronics, Environment, Industrial, Materials, Mechanics, MiningMineral, Nuclear, OceanEngineering
6 Business and Commerce: AccountingFinance, Business, Economics, Management
7 Education: Education
8 Others (unclassifiable): (Unclassifiable)

Thank You

Questions or Comments?

User Data - Fields
- : entered by the user
- : ETD results clustered and displayed
- : cluster labels clicked by the user
