Google and Google Scholar
Roger Mills and Judy Reading
May 2007

Welcome to the Web
The world's biggest haystack

What can you do in a haystack?
Romp
Get hay fever
Have unexpected encounters
Sleep
Not do research
So what do you fancy?

Finding needles
Google helps you find needles in haystacks
But:
Google is an index of web pages
A journal article is not a web page
So Google is not good at finding journal articles
However:
An image of a journal article may be placed on a web page
So Google may find it
If it's free and not behind a firewall
How do you know?

Google is fast
Very fast
Proudly fast
Tells you how fast
Found OUCS home page in 0.09 secs
Also found 350,000 other relevant pages
But put home page first

Brilliant - how does it do it?
Not telling.
Did I need 350,000 references?
Nobody looks at all the references Google retrieves
So why display them?

Algorithm takes into account links made by other pages
And click-throughs
So the top result for a given search is determined over time by the people who make that search
Is that the same as the best result?
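
Google does not publish its algorithm, so the following is only a toy illustration of the general idea that links made by other pages can push a page up the results. It is a simplified PageRank-style calculation; the page names, the link graph, the damping value and the function name link_rank are all invented for this sketch, and click-throughs are not modelled at all.

```python
# Toy link-based ranking (a simplified PageRank-style iteration).
# Everything here is illustrative: the pages and links are made up,
# and this is not a description of what Google actually runs.

def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that pages linked to by well-scored pages score higher."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split equally among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical site: three pages link to "home", nothing links to "blog".
links = {
    "home": ["courses", "help"],
    "courses": ["home"],
    "help": ["home"],
    "blog": ["home", "courses"],
}
scores = link_rank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The page that most others link to comes out on top, and the page nobody links to comes last, whatever it actually says. That is the gap between "most linked" and "best" that the slide asks about.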
OK, how would you do it?
To index a document, I'd read it first.
Google can't read
We don't read the web, we view it
We remember references visually: "that red book on the third shelf down"
If Google can list all the red books on all the third shelves down in all the world, I'm bound to find it, right?
Actually I remember I saw it in Oxford, so I just need to list all the red books in Oxford. Doddle!
That's not really how Google works, is it?

So you read the article, and then?
Give it some index terms
Not ones I've just made up, but ones from a standard list.
That way, everyone will know what the article's about, and every article on the same topic can be found.
Provided everyone agrees what the article's about.
Then I'd list the authors in a standard form: so everything by Roger Mills, Roger Anthony Mills, Roger A Mills, R Anthony Mills, Anthony Mills, R A Mills can be found in one go.
That's a controlled vocabulary.
Works for journal titles too.
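
A minimal sketch of the author example above, assuming a small hand-made authority file that maps each variant form of a name to one standard heading. The heading "Mills, Roger A.", the record titles and the function name normalise are hypothetical; a real bibliographic database maintains such a file editorially.

```python
from collections import defaultdict

# Hypothetical authority file: every variant points to one standard heading.
AUTHORITY = {
    "roger mills": "Mills, Roger A.",
    "roger anthony mills": "Mills, Roger A.",
    "roger a mills": "Mills, Roger A.",
    "r anthony mills": "Mills, Roger A.",
    "anthony mills": "Mills, Roger A.",
    "r a mills": "Mills, Roger A.",
}


def normalise(name):
    """Return the standard heading for a name, or the name itself if unknown."""
    # Ignore full stops, case and extra spaces when looking a name up.
    key = " ".join(name.replace(".", " ").lower().split())
    return AUTHORITY.get(key, name)


# Invented records, as they might arrive from different publishers.
records = [
    ("R. A. Mills", "Paper one"),
    ("Roger Mills", "Paper two"),
    ("Anthony Mills", "Paper three"),
    ("J. Reading", "Paper four"),
]

# Index the records under the controlled form of each author's name.
index = defaultdict(list)
for author, title in records:
    index[normalise(author)].append(title)

print(index["Mills, Roger A."])
```

One lookup under the standard heading now finds all three papers. Without the authority file, the searcher has to think of every variant and search for each in turn.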
Google doesn't do that
No controlled terms
So you must think of synonyms, different forms of name, title abbreviations etc.
You must define the context that matters.

Knitting according to Google

OK, we get it. So let's invent Google Scholar
Let's team up with publishers so they let us search behind their firewalls
Let's modify our algorithm so it excludes non-scholarly material (how do we define that?)
Let's look at citations so when one article we index cites another one we index, we can move it higher up the relevance ranking
Let's link together different versions of the same article
Let's include library locations for full-text access
Let's see how it goes

But let's not allow:
Creation of sets
Or controlled vocabularies
Or combining of searches
Or hit rate figures for individual search terms
Or proximity searching
Or saving and e-mailing results
Or creation of alerts
Or standardisation of journal names/abbreviations
Or info on what is included and what is not
Or info on how the system decides what is scholarly
Or an indication of update frequency (seems slower than normal Google)
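
The citation idea in the list above (move an article up the ranking when other indexed articles cite it) can be sketched as follows. This is an assumption-laden illustration, not Google Scholar's undisclosed method: the paper identifiers and reference lists are invented, and ranking purely by citation count is the simplest possible reading of the idea.

```python
from collections import defaultdict

# Hypothetical index: paper id -> ids of the papers it cites.
references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B", "X"],  # "X" is cited but is not itself in the index
}


def cited_by(references):
    """Invert the reference lists: for each cited item, list the citing papers."""
    index = defaultdict(list)
    for paper, cited in references.items():
        for target in cited:
            index[target].append(paper)
    return index


index = cited_by(references)

# Rank the indexed papers by how many other indexed papers cite them.
by_citations = sorted(references, key=lambda p: len(index[p]), reverse=True)

for paper in by_citations:
    print(paper, "cited by", len(index[paper]))
```

Counts produced this way only reflect citations from papers that are themselves in the index, which is one reason citation figures differ from one service to another.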
Which of these statements is true?
Google is comprehensive
Google is all I need
Google is up-to-date
Google is not evil
Google is commercial
Google is independent
Google is secretive
Google wants to rule the world
Google wants to beat Microsoft
Google loves me
I love Google

Google is a family
A range of products under a common brand
Some add value to the basic search engine; others are nothing to do with searching
Google Scholar is a variant of the standard search engine
It uses a different algorithm, but we don't know how it differs

What's in Google Scholar?
Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations. Google Scholar helps you identify the most relevant research across the world of scholarly research.
NB: only in Beta
Features may change
Developing in tandem with Google Books, which will include digitised texts from Oxford collections and others
In competition with WoK, ScienceDirect, SCOPUS, Scirus etc.

Content
Algorithm to identify scholarly materials crawled by Google from the open web
Access to materials locked behind subscription barriers
Must include abstract
Full-text access requires institutional subscriptions or individual payment
Includes peer-reviewed papers, theses, books, preprints, abstracts, full-text, citations, etc.
Library links
Includes OpenURL links to local library holdings
In Oxford displays as "Oxford Full Text" beside title

Includes citation data
Uses citation extraction to build connections between papers
"Cited by" link lists items (known to Google Scholar) that cite the original paper
Cited items not available online are listed with prefix [citation]
Citation analysis puts the most-cited papers at the top of the results list

Searching
AND implied between words as in normal Google
+ to include common words, letters or numbers that Google's search technology generally ignores
Quote marks to search for a phrase
Minus sign to exclude from a search
OR for either search term
author: for author search
intitle: to search document title
Restrict by date and publication
Advanced search screen available

Exercise
Try searching for: French national identity
In Google and Google Scholar
With and without quotation marks
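
For anyone who wants to see the two versions of the exercise search side by side, here is a small sketch that assembles query strings from the operators listed under Searching. The helper build_query and its parameter names are hypothetical conveniences for this handout, not part of any Google interface; the operators themselves are the ones described above.

```python
from urllib.parse import urlencode


def build_query(words=(), phrase=None, either=(), exclude=(),
                author=None, title_word=None):
    """Assemble a search string from the operators described in the slides."""
    parts = list(words)                       # plain words: AND is implied
    if phrase:
        parts.append(f'"{phrase}"')           # quote marks: exact phrase
    if either:
        parts.append(" OR ".join(either))     # OR: either search term
    parts.extend(f"-{w}" for w in exclude)    # minus sign: exclude a word
    if author:
        parts.append(f"author:{author}")      # author search
    if title_word:
        parts.append(f"intitle:{title_word}") # word must be in the title
    return " ".join(parts)


# The exercise search, without and with quotation marks.
keyword_query = build_query(words=["French", "national", "identity"])
phrase_query = build_query(phrase="French national identity")

# A combined example using several operators at once (terms are illustrative).
combined_query = build_query(words=["identity"], either=["French", "France"],
                             exclude=["cuisine"], author="mills")

print(keyword_query)
print(phrase_query)
print(combined_query)

# How the phrase search would be encoded as a q= parameter in a web address.
print(urlencode({"q": phrase_query}))
```

The keyword version matches pages containing all three words anywhere; the phrase version matches only the exact sequence, so it normally returns far fewer results.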
Now try searching in Web of Science (or other relevant database)
Is it clear why results differ?
What approach provides the most useful results:
For writing a paper for publication
For quoting in a thesis
For preparing a speech
For preparing for a pub quiz
Or any other purpose

Help screens
Earlier version

Alternatives to Google
Google it!
See Charles Knight's up-to-date Top 100 list in Read/Write Web:
http://www.readwriteweb.com/archives/top_100_alternative_search_engines_mar07.php
Use Intute (www.intute.ac.uk) for reputable human-selected sites, chosen for a UK academic audience
Check OxLIP (www.ouls.ox.ac.uk/oxlip) for complete listing and subject guide to university-subscribed databases. Most list the sources they cover and use controlled vocabularies for indexing

An example of Google's strengths - and weaknesses - in finding a specific article: a search done in 2005 and repeated in Nov 2006:
Biology search: glutathione in green Arabidopsis
WoS: exact article in one step
Scholar phrase search 2005: 15 results, this one at 7
Scholar phrase search 2006: 16 results, this one first
Scholar keyword search 2005: 2420 results, this one at 10
Scholar keyword search 2006: 4800 results, this one first
Google keyword search 2005: 17600 results, this one first
Google keyword search 2006: 169000 articles, this one first
Google phrase search 2005: 59 results, this first
Google phrase search 2006: 86 results, this first
Scholar 2005: all 7 versions
Scholar 2005: cited by 2
Scholar 2006: cited by 14
WoS 2005: cited by 3
WoS 2006: cited by 15

Comparing citations data: 2005 [figure]
Comparing citations data: 2006 [figure]
Citations arranged by most cited

SCIRUS phrase search: 2 journals, this first; 8 other web sources (inc previous versions of this talk!)
SCIRUS keyword search: 735 journals, this first; 6996 others
Biological Abs phrase search: exact match in 1 (note controlled keywords)

SCIRUS
Very similar to Scholar but can also:
Mark records
Save records
E-mail records
Export set in RIS format (for Endnote)

Search on controlled terms in Biological Abstracts
Omitting green, 14 results
Not including this one, first on Scholar
Need wildcard arabidopsis-*

Conclusion
Maintain a balanced diet!
Five a day: WoK, Scopus, Intute, subject-specific database, Google Scholar