Recommender Systems

Recommender Systems

Content-based recommendation -1- Content-based recommendation While CF methods do not require any information about the items, it might be reasonable to exploit such information; and recommend fantasy novels to people who liked fantasy novels in the past What do we need: some information about the available items such as the genre ("content") some sort of user profile describing what the user likes (the preferences) The task: learn user preferences locate/recommend items that are "similar" to the user preferences "show me more of the same what I've liked" -2- What is the "content"? Most CB-recommendation techniques were applied to recommending text documents. Like web pages or newsgroup messages for example. Content of items can also be represented as text documents. With textual descriptions of their basic characteristics. Structured: Each item is described by the same set of attributes Title Genre

Author Type Price Keywords The Night of the Gun Memoir David Carr Paperback 29.90 Press and journalism, drug addiction, personal memoirs, New York The Lace Reader Fiction, Mystery Brunonia Barry Hardcover 49.90 American contemporary fiction, detective, historical

Into the Fire Romance, Suspense Suzanne Brockmann Hardcover 45.90 American fiction, murder, neo-Nazism Unstructured: free-text description. -3- Content representation and item similarities Item representation Title Genre Author Type Price Keywords The Night of the Gun Memoir

David Carr Paperback 29.90 Press and journalism, drug addiction, personal memoirs, New York The Lace Reader Fiction, Mystery Brunonia Barry Hardcover 49.90 American contemporary fiction, detective, historical Into the Fire Romance, Suspense Suzanne Brockmann Hardcover 45.90 American fiction, murder,

neo-Nazism Title Genre Author Type Price Keywords Fiction Brunonia, Barry, Ken Follett Paperback 25.65 Detective, murder, New York User profile Simple approach Compute the similarity of an unseen item with the user profile based on the keyword overlap (e.g. using the Dice coefficient) describes Book with a set of keywords | ( ) ( )|

| ( )|+| ( )| Or use and combine multiple metrics -4- Term-Frequency - Inverse Document Frequency () Simple keyword representation has its problems in particular when automatically extracted as not every word has similar importance longer documents have a higher chance to have an overlap with the user profile Standard measure: TF-IDF Encodes text documents in multi-dimensional Euclidian space weighted term vector TF: Measures, how often a term appears (density in a document) assuming that important terms appear more often normalization has to be done in order to take document length into account IDF: Aims to reduce the weight of terms that appear in all documents -5- TF-IDF II Given a keyword and a document term frequency of keyword in document inverse document frequency calculated as : number of all recommendable documents : number of documents from in which keyword appears is calculated as: -

-6- Example TF-IDF representation Term frequency: Each document is a count vector in Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth Antony 157 73 0 0 0 0 Brutus 4 157

0 1 0 0 Caesar 232 227 0 2 1 1 Calpurnia 0 10 0 0 0 0 Cleopatra 57 0

0 0 0 0 mercy 1.51 0 3 5 5 1 worser 1.37 0 1 1 1 0 Vector with dimension Example taken from -7-

Example TF-IDF representation Combined TF-IDF weights Each document is now represented by a real-valued vector of -weights Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth Antony 157 73 0 0 0 0 Brutus 4 Caesar Calpurnia

Cleopatra mercy worser 157 Antony 0 and Cleopatra 232 227 0 Antony 5.25 0 10 0 Brutus 1.21 57 0 0 Caesar 8.59 1.51 0 3 Calpurnia 0 1.37 0 1 Cleopatra 2.85 Julius 1 Caesar The 0 Tempest 2 3.18

6.1 2.54 1.54 0 1 0 0 0 5 1 Hamlet 0 0 0 0 0 0 5 1 Macbeth 0 0.35 0 0 0.25 0 0

0 0 0 1 0 0 Othello 1 0 0 1.51 1 0 0 0 mercy 1.51 0 1.9 0.12 5.25 0.88 worser 1.37

0 0.11 4.15 0.25 1.95 Example taken from -8- Improving the vector space model Vectors are usually long and sparse remove stop words They will appear in nearly all documents. e.g. "a", "the", "on", use stemming Aims to replace variants of words by their common stem e.g. "went" "go", "stemming" "stem", size cut-offs only use top n most representative words to remove "noise" from data e.g. use top 100 words -9- Improving the vector space model II Use lexical knowledge, use more elaborate methods for feature selection Remove words that are not relevant in the domain Detection of phrases as terms

More descriptive for a text than single words e.g. "United Nations" Limitations semantic meaning remains unknown example: usage of a word in a negative context "there is nothing on the menu that a vegetarian would like.." The word "vegetarian" will receive a higher weight then desired an unintended match with a user interested in vegetarian restaurants - 10 - Cosine similarity Usual similarity metric to compare vectors: Cosine similarity (angle) Cosine similarity is calculated based on the angle between the vectors Adjusted cosine similarity take average user ratings into account (), transform the original ratings U: set of users who have rated both items a and b - 11 - Recommending items Simple method: nearest neighbors Given a set of documentsalready rated by the user (like/dislike) Either explicitly via user interface Or implicitly by monitoring user's behavior Find the nearest neighbors of an not-yet-seen item in Use similarity measures (like cosine similarity) to capture similarity of two documents Take these neighbors to predict a rating for e.g. most similar items to. 4 of items were liked by current user item will also be liked by this user Variations:

Varying neighborhood size k lower/upper similarity thresholds to prevent system from recommending items the user already has seen Good to model short-term interests / follow-up stories Used in combination with method to model long-term preferences - 12 - Recommending items Retrieval quality depends on individual capability to formulate queries with right keywords. Query-based retrieval: Rocchio's method The SMART System: Users are allowed to rate (relevant/irrelevant) retrieved documents (feedback) The system then learns a prototype of relevant/irrelevant documents Queries are then automatically extended with additional terms/weight of relevant documents - 13 - Rocchio details Document collections D+ (liked) and D- (disliked) Calculate prototype vector for these categories. Computing modified query Qi+1 from current query Qi with: , , used to fine-tune the feedback weight for original query weight for positive feedback

weight for negative feedback + = + Often only positive feedback is used More valuable than negative feedback - 14 - Practical challenges of Rocchio's method Certain number of item ratings needed to build reasonable user model Can be automated by trying to capture user ratings implicitly (click on document) Pseudorelevance Feedback: Assume that the firstdocuments match the query best. The set is not used until explicit negative feedback exists. User interaction required during retrieval phase Interactive query refinement opens new opportunities for gathering information and Helps user to learn which vocabulary should be used to receive the information he needs - 15 - Probabilistic methods Recommendation as classical text classification problem long history of using probabilistic methods Simple approach: 2 classes: hot/cold simple Boolean document representation calculate probability that document is hot/cold based on Bayes theorem Doc-ID

recommender intelligent learning school 1 1 1 1 0 2 0 0 1 1 3 1 1 0 0 4 1

0 1 1 5 0 0 0 1 6 1 1 0 0 Label 3212 (|=1)=( =1|=1)( =1|=1)( =0|=1)(h =0|=1)= 0.149 3333 1 0 1 1 0 ? - 16 -

Linear classifiers Most learning methods aim to find coefficients of a linear model A simplified classifier with only two dimensions can be represented by a line The line has the form and correspond to the vector representation of a document (using e.g. TF-IDF weights) and are parameters to be learned Classification of a document based on checking In n-dimensional space the classification function is Relevant Nonrelevant Other linear classifiers: Naive Bayes classifier, Rocchio method, Windrow-Hoff algorithm, Support vector machines - 17 - Improvements Side note: Conditional independence of events does in fact not hold "New York", "Hong Kong" Still, good accuracy can be achieved

Boolean representation simplistic positional independence assumed keyword counts lost More elaborate probabilistic methods e.g., estimate probability of term v occurring in a document of class C by relative frequency of v in all documents of the class Other linear classification algorithms (machine learning) can be used Support Vector Machines, .. Use other information retrieval methods (used by search engines..) - 18 - Explicit decision models Decision tree for recommendation problems inner nodes labeled with item features (keywords) used to partition the test examples existence or non existence of a keyword in basic setting only two classes appear at leaf nodes interesting or not interesting decision tree can automatically be constructed from training data works best with small number of features use meta features like author name, genre, ... instead of TF-IDF representation. - 19 - Explicit decision models II

Rule induction built on RIPPER algorithm good performance compared with other classification methods eloborate postpruning techniques of RIPPER extension for e-mail classification takes document structure into account main advantages of these decision models: inferred decision rules serve as basis for generating explanations for recommendation existing domain knowledge can be incorporated in models - 20 - On feature selection process of choosing a subset of available terms different strategies exist for deciding which features to use feature selection based on domain knowledge and lexical information from WordNet (Pazzani and Billsus 1997) frequency-based feature selection to remove words appearing "too rare" or "too often" (Chakrabarti 2002) Not appropriate for larger text corpora Better to evaluate value of individual features (keywords) independently and construct a ranked list of "good" keywords. Typical measure for determining utility of keywords: e.g. , mutual information measure or Fisher's discrimination index - 21 -

Limitations of content-based recommendation methods Keywords alone may not be sufficient to judge quality/relevance of a document or web page up-to-date-ness, usability, aesthetics, writing style content may also be limited / too short content may not be automatically extractable (multimedia) Ramp-up phase required Some training data is still required Web 2.0: Use other sources to learn the user preferences Overspecialization Algorithms tend to propose "more of the same" Or: too similar news items - 22 - Discussion & summary In contrast to collaborative approaches, content-based techniques do not require user community in order to work Presented approaches aim to learn a model of user's interest preferences based on explicit or implicit feedback Deriving implicit feedback from user behavior can be problematic Evaluations show that a good recommendation accuracy can be achieved with help of machine learning techniques These techniques do not require a user community

Danger exists that recommendation lists contain too many similar items All learning techniques require a certain amount of training data Some learning methods tend to overfit the training data Pure content-based systems are rarely found in commercial environments - 23 - Literature [Michael Pazzani and Daniel Billsus 1997] Learning and revising user profiles: The identification of interesting web sites, Machine Learning 27 (1997), no. 3, 313-331. [Soumen Chakrabarti 2002] Mining the web: Discovering knowledge from hyper-text data, Science & Technology Books, 2002. - 24 -

Recently Viewed Presentations

  • Structural Mechanics 4 Shear Force, Bending Moment and ...

    Structural Mechanics 4 Shear Force, Bending Moment and ...

    CHECK VERTICAL FORCES. UPWARD REACTION = R B + R A . = 35 + 25 = 60 kN. DOWNWARD FORCE = 10+20+30 =60 kN. So likely that calculations are correct!
  • Lower Extremity H&P: Knee Exam

    Lower Extremity H&P: Knee Exam

    Have the patient flex his quadricep, then apply a posteriorly-directed force to the patella. Apley's test. With the patient prone, flex the affected knee to 90 degrees, grasp the foot and rotate the knee, applying a downward force. Reproduction of...
  • Student-Performed Demonstrations: Laboratory Component for ...

    Student-Performed Demonstrations: Laboratory Component for ...

    Times Palatino Blank Presentation Student-Performed Demonstrations: Laboratory Component for Non-Science Majors - January Term Overview Goals Goals Objectives Courses CHM 290 - Science Demonstrations Lab Format Demonstration Selection Types of Demonstrations Hydrogen Balloon Polyurethane Foam Nylon Rope Liquid Nitrogen Solid...
  • Maximum Throughput Capacity

    Maximum Throughput Capacity

    MCT is a measure of the capacity of the runway. It defines the average number of arrivals and/or departures that can be performed on the runway in one hour. MCT makes several assumptions, that are listed on the screen. Because...
  • Demonstration cum Training Election Information System ...

    Demonstration cum Training Election Information System ...

    The CUs/BUs are then randomized by the DEO and sent to the AC Strong Room type warehouse from where they will be sent to the polling stations for the elections after SLC and second level randomization. 2.FLC is done by...


    what lies ahead? we are reviewing the 2018 season . we are seeking a new c grade coach. our agm is coming up next month - we encourage everyone to attend- there will be a q&a session for members- new...
  • Health At Every Size® -

    Health At Every Size® -

    A celebrity cook was diagnosed with diabetes. Four things happened: ... lead by Dr. Linda Bacon at UC Davis involved randomly assigning obese female chronic dieters ages 30-45 to a HAES intervention or a traditional diet group intervention. ... Sarah...
  • Romanticism - Socorro Independent School District

    Romanticism - Socorro Independent School District

    Romanticism. Revolutions. Revolutions occurring in France, and in America, thus many in England saw this as a turning point in history for a more ideal and civilized society. ... Romantics were interested in the Medieval past, the supernatural, the mystical,...