Using Context to Support Searchers in Searching
Susan Dumais, Microsoft Research
http://research.microsoft.com/~sdumais
ACL/HLT June 18, 2008

Using Context to Support Searchers
- Search today: Query Words, Ranked List
- Using context to support searchers: User Context, Query Words, Ranked List, Document Context, Task/Use Context

Web Info through the Years
- What's available; how it's accessed
- Number of pages indexed
  7/94 (Lycos): 54,000 pages
  1995: 10^6 (millions)
  1997: 10^7
  1998: 10^8
  2001: 10^9 (billions)
  2005: 10^10
- Types of content
  Web pages, newsgroups
  Images, videos, maps
  News, blogs, spaces
  Shopping, local, desktop
  Books, papers
  Health, finance, travel

Some Support for Searchers
- The search box
- Spelling suggestions
- Query suggestions
- Advanced search operators and options (e.g., +/-, site:, language:, filetype:, intitle:)
- Richer snippets
- But, we can do better using context

Key Contexts
- Users:
  Individual, group (topic, time, location, etc.)
  Short-term or long-term models
  Explicit or implicit capture
- Documents/Domains:
  Document-level metadata, usage/change patterns
  Relations among documents
- Tasks/Uses:
  Information goal: navigational, fact-finding, informational, monitoring, research, learning, social, etc.
  Physical setting: device, location, time, etc.

Using Contexts
- Identify: What context(s) are of interest?
- Accommodate: What do we do differently for different contexts?
  Outcome (Q|context) >> Outcome (Q)
- Influence points within the search process
  Articulating the information need: initial query, subsequent interaction/dialog
  Selecting and/or ranking content
  Presenting results
  Using and sharing results

Context in Action
- Research prototypes: provide insights about algorithmic, user experience, and policy challenges
- User Contexts:
  Finding and Re-Finding (Stuff I've Seen)
  Personalized Search (PSearch)
  Novelty in News (NewsJunkie)
- Document/Domain Contexts:
  Metadata and search (Phlat)
  Visualizing patterns in results (GridViz)
- Task/Use Contexts:
  Pages as context (Community Bar, IQ)
  Richer collections as context (NewsJunkie, PSearch)
  Working, understanding, sharing (SearchTogether, InkSeine)

SIS: Stuff I've Seen (Dumais et al., SIGIR 2003)
- Unified index of stuff you've seen
  Many info silos (e.g., files, email, calendar, contacts, web pages, rss, im)
  Unified index, not storage
  Index of content and metadata (e.g., time, author, title, size, access)
- Re-finding vs. finding
- Vista Desktop Search (and Live Toolbar)
  Also, Spotlight, GDS, X1
[Figure: Stuff I've Seen; Windows Live-DS]
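
The "unified index, not storage" idea above can be illustrated with a small sketch: one index over items from several silos that records content terms and metadata while the items themselves stay where they are. This is only an illustration under stated assumptions, not the SIS implementation; the `Item` fields, the `UnifiedIndex` class, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Item:
    """A pointer to something the user has seen (hypothetical schema)."""
    uri: str      # where the item lives: file path, message id, URL
    silo: str     # e.g. "file", "email", "web"
    author: str
    seen: date    # the date that matters for this item type
    text: str     # content that gets indexed; the item itself is not copied


class UnifiedIndex:
    """One inverted index across all silos, plus per-item metadata."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # term -> set of uris
        self._items = {}                   # uri -> Item

    def add(self, item: Item) -> None:
        self._items[item.uri] = item
        for term in item.text.lower().split():
            self._postings[term].add(item.uri)

    def search(self, query: str, silo: Optional[str] = None,
               author: Optional[str] = None):
        """AND the query terms, apply metadata filters, sort newest first."""
        terms = query.lower().split()
        if not terms:
            return []
        uris = set.intersection(*(self._postings.get(t, set()) for t in terms))
        hits = [self._items[u] for u in uris]
        if silo is not None:
            hits = [h for h in hits if h.silo == silo]
        if author is not None:
            hits = [h for h in hits if h.author == author]
        return sorted(hits, key=lambda h: h.seen, reverse=True)


# Made-up sample data for illustration only.
index = UnifiedIndex()
index.add(Item("file:///talks/acl.ppt", "file", "susan", date(2008, 6, 1), "context search talk"))
index.add(Item("mail:42", "email", "ed", date(2008, 6, 10), "phlat search demo"))
index.add(Item("http://example.org/search", "web", "unknown", date(2008, 5, 2), "web search history"))

print([h.uri for h in index.search("search")])
print([h.uri for h in index.search("search", silo="email")])
```

Sorting by date rather than by a relevance score is a deliberate choice in this sketch: it mirrors the re-finding emphasis of the slide, where the user is looking for a specific item already seen.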

SIS Demo

SIS Usage Experiences
- Internal deployment: ~3000 internal Microsoft users
- Analyzed: free-form feedback, questionnaires, structured interviews, log analysis (characteristics of interaction), UI experiments, lab experiments
- Personal store characteristics: 5k to 500k items
- Query characteristics
  Short queries (1.6 words)
  Few advanced operators or fielded search in query box (~7%)
  Many advanced operators and query iteration in UI (48%): filters (type, date); modify query; re-sort results

Susan's (Laptop) World
  Type    Items   Size
  Web     3k      0.2 GB
  Files   28k     23.0 GB
  Mail    60k     2.2 GB
  Total   91k     25.4 GB
  Index           190 MB (+1.5 MB/week)

SIS Usage Data, cont'd
Importance of people, time, and memory
- People
  25% of queries contained names
  People in roles (to:, from:) vs. people as entities in text
- Time
  Age of items opened:
  Log(Freq) = -0.68 * log(DaysSinceSeen) + 2
  5% today; 21% last week
  50% of the cases within 36 days; by type: Web (11); Mail (36); Files (55)
- Date most common sort field, even when Rank was the default
  Support for episodic memory
  Few searches for best topical match; many other criteria
[Figure: frequency of items opened vs. Days Since Item First Seen]
[Figure: Number of Queries Issued by sort field (Date, Rank, Other) for each Starting Default Sort Order (Date, Rank)]
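
The age-of-items fit quoted above, Log(Freq) = -0.68 * log(DaysSinceSeen) + 2, is a power law, and a short sketch makes its shape concrete. The slide does not state the log base or the units of Freq, so the code below assumes base-10 logarithms and treats Freq as an unnormalized count; the function names are hypothetical.

```python
import math

SLOPE = -0.68     # coefficient from the slide
INTERCEPT = 2.0   # constant from the slide


def predicted_open_frequency(days_since_seen: float) -> float:
    """Freq implied by the fit, ASSUMING base-10 logs (not stated on the slide)."""
    if days_since_seen <= 0:
        raise ValueError("days_since_seen must be positive")
    return 10 ** (SLOPE * math.log10(days_since_seen) + INTERCEPT)


def relative_to_day_one(days_since_seen: float) -> float:
    """Frequency relative to a one-day-old item: days ** -0.68."""
    return predicted_open_frequency(days_since_seen) / predicted_open_frequency(1)


for days in (1, 7, 30, 365):
    print(days, round(relative_to_day_one(days), 3))
```

The ratio returned by `relative_to_day_one` equals `days ** -0.68` whatever the log base, because the base only changes the constant factor. Each tenfold increase in item age therefore cuts the predicted open frequency to about 21% of its previous value (10 ** -0.68 is roughly 0.209).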

SIS Usage Data, cont'd
Observations about unified access
- Metadata quality is variable
  Email: rich, pretty clean
  Web: little, available to application
  Files: some, but often wrong
- Memory depends on abstractions
  Useful date is dependent on the object!
    Appointment, when it happens
    File, when it is changed
    Email and Web, when it is seen
  People attribute vs. contains
    To, From, Cc, Attendee, Author, Artist

Ranked list vs. Metadata (for personal content)
Why Rich Metadata?
- People remember many attributes in re-finding
  - Often: time, people, file type, etc.
  - Seldom: only general overall topic
- Rich client-side interface
  - Support fast iteration/refinement
  - Fast filter-sort-scroll vs. next-next-next

Re-finding on the Web (Teevan et al., SIGIR 2007)
- 50-80% of URL visits are revisits
- 30-40% of queries are re-finding queries

Phlat: Search and Metadata (Cutrell et al., CHI 2006)
- Shell for WDS; publicly available
- Features:
  Search / Browse (faceted metadata)
  Unified Tagging
  In-Context Search

Phlat: Faceted metadata
- Tight coupling of search and browse
  Query -> results and associated metadata, with query previews
  Property filters integrated with query
- 5 default properties to filter on (extensible)
  Includes tags
- Query = words and/or properties
- No stuck filters
- Search == Browse

Phlat: Tagging
- Apply a single set of user-generated tags to all content (e.g., files, email, web, rss, etc.)
- Tagging interaction: tag widget or drag-to-tag
- Tag structure: allow but do not require hierarchy
- Tag implementation: tags directly associated with files as NTFS or MAPI properties

Phlat: In-Context Search
- Selecting a result
  Linked view to show associated tags
- Rich actions
  Open, drag-drop, etc.
- Pivot on metadata
  "Sideways search"
  Refine or replace query

Phlat
- Phlat shell for Windows Desktop Search
- Tight coupling of searching/browsing
- Rich faceted metadata support
  Including unified tagging across data types
- In-context search and actions
- Download: http://research.microsoft.com/adapt/phlat

Web Search using Metadata
- Many queries include implicit metadata
  portrait of barak obama
  recent news about midwest floods
  good painters near redmond
  starbucks near me
  overview of high blood pressure
- Limited support for users to articulate this

Search in Context
- Search is not the end goal
- Support information access in the context of ongoing activities (e.g., writing talk, finding out about, planning trip, buying, monitoring, etc.)
- Search always available
- Search from within apps (keywords, regions, full doc)
- Show results within app
  Maintains "flow" (Csikszentmihalyi)
  Can improve relevance

Documents as (a simple) Context
- Proactive query specification depending on current document content and activities
- Recommendations: "People who bought this also bought"
- Contextual Ads: ads relevant to page
- Community Bar: Notes, Chat, Tags, Inlinks, Queries
- Implicit Queries (IQ)
  Also Y!Q, Watson, Remembrance Agent

Document Contexts (Implicit Query, IQ) (Dumais et al., SIGIR 2004)
- Proactively find info related to item being read/created
  Quick links for People and Subject
  Top matches for this Implicit Query (IQ)
- Challenges
  Relevance, fine
  When to show? (useful)
  How to show? (peripheral awareness)
- Background search on top k terms, based on user's index
  Score = tf_doc / log(tf_corpus + 1)
[Figure: IQ interface showing Quick links and Related content]

PSearch: Personalized Search (Even Richer Context) (Teevan et al., SIGIR 2005)
- Today: People get the same results, independent of current session, previous search history, etc.
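
Returning to the Implicit Query slide's scoring rule, Score = tf_doc / log(tf_corpus + 1): the sketch below picks the top k terms of a document for a background search against the user's own index. It is a minimal illustration under assumptions the slide does not settle: natural logarithms, whitespace tokenization, and skipping terms absent from the user's index (where log(0 + 1) = 0 would make the score undefined). The function name and the sample counts are hypothetical.

```python
import math
from collections import Counter


def implicit_query_terms(doc_text, corpus_tf, k=3):
    """Return the k highest-scoring terms of doc_text.

    corpus_tf maps term -> frequency in the user's index (assumed given).
    """
    doc_tf = Counter(doc_text.lower().split())
    scores = {}
    for term, tf_doc in doc_tf.items():
        tf_corpus = corpus_tf.get(term, 0)
        if tf_corpus == 0:
            # log(0 + 1) == 0 would divide by zero; skipping unseen terms
            # is an assumption of this sketch, not something the slide states.
            continue
        scores[term] = tf_doc / math.log(tf_corpus + 1)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up term frequencies standing in for a user's personal index.
user_index_tf = {"the": 5000, "meeting": 40, "phlat": 3, "demo": 12}
print(implicit_query_terms("the phlat demo and the phlat meeting", user_index_tf))
```

Dividing by the log of the corpus frequency favors terms that are frequent in the current document but rare in the user's index, so a common word such as "the" scores low even when it appears often in the document.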
- PSearch: Uses rich client-side info to personalize results
  Building a user profile
  Personalized ranking
  When to personalize?
  How to personalize display?
[Figure: example result for sigir.org, "ACM SIGIR Special Interest Group on Information Retrieval Home Page"]

Building a User Profile
- Type of information:
  Explicit: Judgments, categories
  Content: Past queries, web pages, desktop
  Behavior: Visited pages, dwell time
- Time frame: Short term, long term
- Who: Individual, group
- Where the profile resides:
  Local: Richer profile, improved privacy
  Server: Richer communities, portability
[Callout: PSearch]

Personalized Ranking
- Personal Rank = f(Cont, Beh, Web)
  Pers_Content Match: sim(result, user_content_profile)
  Pers_Behavior Match: visited URLs
  Web Match: web rank

When to Personalize?
- Personalization works well for some queries, but not for others
- Framework for understanding when to personalize
  Personal ranking: personal relevance (explicit or implicit)
  Group ranking: decreases as you add more people
  Gap is "potential for personalization" (p4p)
[Figure: Potential for Personalization; DCG vs. Number of People (1 to 6); series: Individual, Group]

More Personalized Search
- PSearch: rich long-term context; single individual
- Short-term session/task context
  Session analysis
  Query: ACL, ambiguous in isolation
    Natural language summarization -> ACL
    Knee surgery, orthopedic surgeon -> ACL
- Groups of similar people
  Groups: location, demographics, interests, behavior, etc.
  Mei & Church (2008)
    H(URL) = 22.4
    Search: H(URL|Q) = 2.8
    Personalization: H(URL|Q, IP) = 1.2
- Many models smooth individual, group, global models

Beyond Search - Gathering Info
- Support for more than retrieving documents
- Retrieve -> Analyze -> Use
- ScratchPad: lightweight scratchpad or workspace support
  Iterative and evolving nature of search
  Resuming at a later time or on other device
  Sharing with others

Beyond Search - Sharing & Collaborating (Morris et al., UIST 2007)
- SearchTogether
  Collaborative web search prototype
  Sync. or async. sharing w/ others or self
- Collaborative search tasks
  E.g., planning travel, purchases, events; understanding medical info; researching joint project or report
- Today little support
  Email links, instant messaging, phone
- SearchTogether adds support for
  Awareness (history, metadata)
  Coordination (IM, recommend, split)
  Persistence (history, summaries)

Looking Ahead
- Continued advances in scale of systems, diversity of resources, ranking, etc.
- Tremendous new opportunities to support searchers by
  Understanding user intent
    Modeling user interests and activities over time
    Representing non-content attributes and relations
  Supporting the search process
    Developing interaction and presentation techniques that allow people to better express their information needs
    Supporting understanding, using, sharing results
  Considering search as part of richer landscape

Using Context to Support Searchers
Think Outside the IR Box(es)
- User Context, Query Words, Ranked List, Document Context, Task/Use Context

Thank You!
- Questions/Comments
- More info, http://research.microsoft.com/~sdumais
- Windows Live Desktop Search, http://toolbar.live.com
- Phlat, http://research.microsoft.com/adapt/phlat
- Search Together, http://research.microsoft.com/searchtogether/

References
- Stuff I've Seen
  S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin & D. C. Robbins (2003). Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.
  Download: http://toolbar.live.com and Vista Search
- Phlat
  E. Cutrell, D. C. Robbins, S. T. Dumais & R. Sarin (2006). Fast, flexible filtering with Phlat - Personal search and organization made easy. CHI 2006.
  Download: http://research.microsoft.com/adapt/phlat
- InkSeine
  K. Hinckley, S. Zhao, R. Sarin, P. Baudisch, E. Cutrell & M. Shilman (2007). InkSeine: In situ search for active note taking. CHI 2007.
  Download: http://research.microsoft.com/inkseine/
- Revisitation on Web
  J. Teevan, E. Adar, R. Jones & M. Potts (2007). Information re-retrieval. SIGIR 2007.
- Implicit Queries
  S. T. Dumais, E. Cutrell, R. Sarin & E. Horvitz (2004). Implicit queries (IQ) for contextualized search. SIGIR 2004.
- Personalized Search
  J. Teevan, S. T. Dumais & E. Horvitz (2005). Personalizing search via automated analysis of interests and activities. SIGIR 2005.
- Memory Landmarks
  M. Ringel, E. Cutrell, S. T. Dumais & E. Horvitz (2003). Milestones in time: The value of landmarks in retrieving information from personal stores. Interact 2003.
- Search Together
  M. Morris & E. Horvitz (2007). Search Together: An interface for collaborative web search. UIST 2007.
  Download: http://research.microsoft.com/searchtogether/