A Syntactic Justification of Occam's Razor

A Syntactic Justification of Occam's Razor

1 A Syntactic Justification of Occams Razor John Woodward, Andy Evans, Paul Dempster Foundations of Reasoning Group University of Nottingham Ningbo, China Email: [email protected] [email protected] [email protected] Overview 2

Occams Razor Sampling of Program Spaces (Langdon) Definitions Assumptions Proof Further Work Context Occams Razor 3

Occams Razor says has been adopted by the machine learning community to mean; Given two hypotheses which agree with the observed data, pick the simplest, as this is more likely to make the correct predictions Definitions 4 Program

Hypothesis Size Function Set of predictions (concept) Complexity 5 6 Langdon 1 (Foundation of Genetic Programming) 1. The limiting distribution of functions is independent of program size! There is a correlation between the frequency in the limiting distribution and the complexity of a function.

7 Langdon 2 (Foundation of Genetic Programming) Hypothesis-Concept Spaces 8 Notation 9

P is the hypothesis space (i.e. a set of programs). |P| is the size of the space (i.e. the cardinality of the set of programs). F is the concept space (i.e. a set of functions represented by the programs in P). |F| is the size of the space (i.e. the cardinality of the set of functions). If two programs pi and pj map to the same function (i.e. they are interpreted as the same function, I(pi)=f=I(pj)), they belong to the same equivalence class (i.e. pi is in [pj] I(pi)=I(pj)). The notation [p] denotes the equivalence class which contains the program p (i.e. given I(pi)=I(pj), [pi]=[pj]). The size

of an equivalence class [p] is denoted by |[p]|. Two assumptions 10 1. 2. Uniformly sample the hypothesis space, probability of sampling a given program is 1/|P|. There are fewer hypotheses that represent

complex functions |[p1]|>|[p2]|c(f1)

starting from a statement of the assumption |[p1]|>|[p2]| c(f1)< c(f2) Dividing the left hand side by |P|, |[p1]|/|P|>|[p2]|/|P| c(f1)< c(f2) As |[p1]|/|P| = p(I(p1)) =p(f1), we can rewrite as p(f1)>p(f2) c(f1)< c(f2) a mathematical statement of Occams razor. 12

Restatement of Occams Razor Often stated as prefer the shortest consistent hypothesis Restatement of Occams Razor: The preferred function is the one that is represented most frequently. The equivalence class which contains the shortest program is represented most frequently. Summary 13

Occams razor states pick the simplest hypothesis consistent with data We agree, but for a different reason. Restatement. Pick the function that is represented most frequently (i.e. belongs to the largest equivalence class). Occams razor is concerned with probability, and we present a simple counting

argument. Unlike many interpretations of Occams razor we do not throw out more complex hypotheses we count them in [p]. We offer no reason to believe the world is simple, our razor only gives a reason to predict using the simplest hypothesis. 14 further work To prove Assumption 2

there are fewer hypotheses that represent complex functions: |[p1]|>|[p2]| c(f1)

Further work -> to prove out assumptions. Does it depend on the primitive set??? How are the primitive linked together (e.g. tree, lists, directed acyclic graphs) How does nature compute? 16

Heuristics such as Occams razor need not be explicitly present as rules. Random searches of an agents generating capacity may implicitly carry heuristics. Axiomatic reasoning probably comes late. Thanks & Questions? 17 1)

2) 3) 4) 5) 6) 7) Thomas M. Cover and Joy A. Thomas. Elements of information theory. John Wiley and Sons 1991. Michael J. Kearns and Umesh V. Vazirani. An introduction to computational learning theory. MIT Press, 1994. William B. Langdon. Scaling of program fitness spaces. Evolutionary Computation, 7(4):399-428,1999. Tom M. Mitchell. Machine Learning. McGraw-Hill 1997.

S. Russell and P. Norvig. Artificial Intelligence: A modern approach. Prentice Hall, 1995. G. I. Webb. Generality is more significant than complexity: Toward an alternative to occams razor. In 7th Australian Joint Conference on Artificial Intelligence Artificial Intelligence: Sowing the Seeds for the Future, 6067, Singapore, 1994, World Scientific. Ming Li and Paul Vitanyi . An Introduction to Kolmogorov Complexity and Its Applications (2nd Ed.). Springer Verlag.

Recently Viewed Presentations

  • WAWF Overview Slides

    WAWF Overview Slides

    CHECK THE STATUS OF A DOCUMENT. Self-Registration Tips. Self-Register at . https://wawf.eb.mil. For initial registration (including CAC users), use the User ID and Password logon method. CAC users can convert to CAC login after user ID is activated by the...
  • LEE COUNTY SECOND BUDGET WORKSHOP August 31, 2010

    LEE COUNTY SECOND BUDGET WORKSHOP August 31, 2010

    Arial Times New Roman Default Design Microsoft Office Excel Worksheet Microsoft Graph Chart Microsoft Office Excel Chart LEE COUNTY Overarching Issues Reserves/Deficits Slide 4 Slide 5 REDUCED BUDGET SUMMARY RECOMMENDATION Slide 8 Slide 9 Board Reductions All Funds Slide 11...
  • Aprendizagem Significativa

    Aprendizagem Significativa

    Teoria desenvolvida por David Ausubel É uma teoria cognitiva: Procura explicar o processo de aprendizagem e como o ser humano compreende, transforma, armazena e usa as informações É uma teoria construcionista: O ser humano aprende a partir daquilo que já...
  • EOC Review - Leon County Schools

    EOC Review - Leon County Schools

    You need to know the basic anatomy and physiology of the human reproductive system. You need to know the process of human development from zygote to birth. For the male reproductive system, you will need to know the seminal vesicle,...
  • Email Management Objectives To review the impact of

    Email Management Objectives To review the impact of

    Email Management When receiving an email, you can ask yourself the following questions and then move it on fairly quickly: Delete it: if the content of the email does not relate to a meaningful objective your're involved in or working...
  • Understanding how media producers define ... - Faustina Starrett

    Understanding how media producers define ... - Faustina Starrett

    Understanding how media producers define audiences for their product. Types of Research. ... Understanding how media producers define audiences for their product Last modified by: Starrett, Faustina ...
  • Geometric Formulas PERIMETER OF A SQUARE P= 4s

    Geometric Formulas PERIMETER OF A SQUARE P= 4s

    Geometric Formulas PERIMETER OF A SQUARE P= 4s AREA OF A SQUARE A=s 2 3 in. 3 in. Area of a Square A=s A= 3 in.x3 in. A=9 sq. in. Perimeter of a Square P=4s P=4(3 in ...
  • Assessing Potentially Violent Students

    Assessing Potentially Violent Students

    Understanding Verbally Aggressive Students: Anger and frustration with situation becomes displaced to faculty. Verbal abuse may follow when frustrating situation seems beyond the student's control. Students may feel they will be rejected, and want to reject you first. They often...