Class Separation and Parameter Estimation with Neural Nets

Statistical Learning Methods in HEAP
Jens Zimmermann, Christian Kiesling
Max-Planck-Institut für Physik, München / MPI für extraterrestrische Physik, München / Forschungszentrum Jülich GmbH
ACAT03 Workshop, KEK, Japan, Dec. 2003

Outline:
- Statistical learning: introduction with a simple example
- Occam's razor
- Decision trees
- Local density estimators
- Methods based on linear separation
- Examples: triggers in HEP and astrophysics
- Conclusion

Statistical Learning

- Does not use prior knowledge: no theory required, learns only from examples (trial and error, learning by reinforcement).
- Two classes of statistical learning: discrete output 0/1 (classification) and continuous output (regression).
- Applications in high-energy and astrophysics: background suppression (purification of event samples), estimation of parameters that are not directly measured.

A Simple Example: Preparing a Talk

[Scatter plot: # formulas (×10) vs. # slides (×10) of the submitted talks, experimentalists vs. theorists.]
Database established by Jens during the Young Scientists Meeting at MPI.

Discriminating Theorists from Experimentalists: A First Analysis

[Two scatter plots in the # formulas vs. # slides plane: talks a week before the meeting, and the first talks handed in, for experimentalists and theorists.]

First Problems

- Simple model, but no complete separation.
- Completely separable, but only via a complicated boundary.
- New talk by Ludger: 28 formulas on 31 slides.
- At this point we cannot know which feature is real! Use train/test or cross-validation!

See Overtraining - Want Generalization - Need Regularization

[Left: training set and test set in the # formulas vs. # slides plane. Right: error E vs. training epochs for training and test samples; the test error rises again once overtraining sets in.]

E = \frac{1}{N} \sum_{i=1}^{N} \left( t_i - \mathrm{out}(x_i) \right)^2

Want to tune the parameters of the learning algorithm depending on the overtraining seen!
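The test-set error in the plot above is simply the same mean-squared error E evaluated on events not used for training. A minimal sketch of that bookkeeping, assuming NumPy and a toy sample of (# formulas, # slides) pairs; the single-sigmoid model, labels, learning rate, and epoch count are illustrative and not the model used in the talk:

```python
# Track E = (1/N) sum_i (t_i - out(x_i))^2 on train and test sets during training.
import numpy as np

rng = np.random.default_rng(0)

# toy data: columns = (# formulas / 10, # slides / 10), target 1 = theorist, 0 = experimentalist
X = rng.uniform(0, 6, size=(60, 2))
t = (X[:, 0] > X[:, 1]).astype(float)          # made-up labels for illustration

X_train, t_train = X[:40], t[:40]              # training set
X_test,  t_test  = X[40:], t[40:]              # independent test set

w = rng.normal(scale=0.1, size=3)              # weights + bias of a single sigmoid unit

def out(X, w):
    return 1.0 / (1.0 + np.exp(-(X @ w[:2] + w[2])))   # logistic output

def error(X, t, w):
    return np.mean((t - out(X, w)) ** 2)       # the mean-squared error E

for epoch in range(200):                        # gradient descent on the training error
    o = out(X_train, w)
    grad_a = -2 * (t_train - o) * o * (1 - o) / len(t_train)
    w[:2] -= 0.5 * X_train.T @ grad_a
    w[2]  -= 0.5 * grad_a.sum()
    if epoch % 50 == 0:
        print(epoch, error(X_train, t_train, w), error(X_test, t_test, w))
```

With a flexible enough model the training error keeps falling while the test error eventually rises again; that divergence is the overtraining signature used to stop or regularize the fit.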

See Overtraining - Want Generalization - Need Regularization

[Same train/test picture as before, now with the model parameters w made explicit.]

E = \frac{1}{N} \sum_{i=1}^{N} \left( t_i - \mathrm{out}(x_i, w) \right)^2

Regularization will ensure adequate generalization performance (e.g. via VC dimensions): limit the complexity of the model.
Factor-10 rule ("Uncle Bernie's Rule #2"): use roughly ten times more training examples than free parameters.

Philosophy: Occam's Razor

"Pluralitas non est ponenda sine necessitate." - Do not make assumptions unless they are really necessary. (14th century)
From theories which describe the same phenomenon equally well, choose the one which contains the fewest assumptions.
- First razor: given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. Yes! But not of much use.
- Second razor: given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. No! "No free lunch" theorem (Wolpert 1996).

Decision Trees

[Scatter plot with the axis-parallel cuts of the tree drawn in the # formulas vs. # slides plane.]
- all events: #formulas < 20 → exp; #formulas > 60 → th; 20 < #formulas < 60 → rest subset
- rest subset: #slides < 40 → th; #slides > 40 → exp
Classify Ringaile: 31 formulas on 32 slides → theorist.
Regularization: pruning.
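The tree above can be written down directly as nested cuts. A minimal sketch, assuming the cut values (20 and 60 formulas, 40 slides) read off the slide; the function name and class labels are illustrative:

```python
# Hand-built version of the decision tree from the slide above.
def classify_talk(n_formulas: int, n_slides: int) -> str:
    """First split on the number of formulas, then on the number of slides."""
    if n_formulas < 20:
        return "experimentalist"
    if n_formulas > 60:
        return "theorist"
    # 20 <= n_formulas <= 60: decide on the number of slides
    return "theorist" if n_slides < 40 else "experimentalist"

print(classify_talk(31, 32))   # Ringaile's talk -> "theorist", as on the slide
```

In a real analysis the cut values are chosen by the learning algorithm (e.g. by maximizing the purity of the resulting subsets), and the grown tree is then regularized by pruning, as noted on the slide.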

Local Density Estimators

[Scatter plots in the # formulas vs. # slides plane with a region drawn around the point to be classified.]
Search for similar events already classified within a specified region and count the members of the two classes in that region.

Maximum Likelihood

[Binned 1D projections of the training sample onto the # formulas and # slides axes, evaluated for the new talk (31 formulas, 32 slides).]

p_{Th} = \frac{3}{5} \cdot \frac{2}{5} = 0.24, \qquad p_{Exp} = \frac{1}{5} \cdot \frac{1}{5} = 0.04

Regularization: binning.
Correlation gets lost completely by the projection!
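A minimal sketch of the projected (binned) likelihood, using the bin fractions quoted on the slide for the new talk; combining p_Th and p_Exp into a single output via the ratio p_Th / (p_Th + p_Exp) is a common convention and an assumption here, not taken from the slide:

```python
# Projected ("naive") likelihood: each input variable is histogrammed separately
# per class, and the per-bin fractions are multiplied.
import numpy as np

def projected_likelihood(bin_fractions):
    """Product of the per-variable bin fractions for one class."""
    return float(np.prod(bin_fractions))

p_th  = projected_likelihood([3 / 5, 2 / 5])   # theorists: fractions in the two projected bins
p_exp = projected_likelihood([1 / 5, 1 / 5])   # experimentalists

out = p_th / (p_th + p_exp)                    # assumed normalization to [0, 1]
print(p_th, p_exp, round(out, 3))              # 0.24 0.04 0.857
```

Because each variable is projected and binned separately, any correlation between # formulas and # slides is ignored, which is exactly the weakness pointed out above.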

k-Nearest-Neighbour

[Scatter plot showing the classification of the new talk for k = 1, ..., 5 nearest neighbours.]
Regularization: the parameter k.
For every evaluation position the distances to all training positions need to be determined!

Range Search

[Figure: ten numbered training points stored in a 2D search tree, with a small and a large query box around the new point.]
Small box: checked points 1, 2, 4, 9. Large box: checked all points.
Regularization: box size.
The tree needs to be traversed only partially if the box size is small enough!
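A minimal sketch of the two local density estimators, assuming SciPy is available and using made-up points and labels: brute-force k-NN, which needs the distance to every training point, and a box-type range search answered by a k-d tree (SciPy's cKDTree with the Chebyshev metric, so the query region is an axis-aligned box):

```python
import numpy as np
from scipy.spatial import cKDTree

X = np.array([[2.0, 3.5], [1.0, 4.5], [5.2, 2.8], [0.8, 1.9],
              [4.4, 4.1], [2.5, 0.7], [5.8, 5.0], [1.7, 3.3]])   # (# formulas, # slides) / 10
y = np.array([0, 0, 1, 0, 1, 1, 1, 0])                           # 1 = theorist, 0 = experimentalist
query = np.array([3.1, 3.2])                                     # the new talk

# k-nearest neighbours: all distances are computed, then the k smallest are kept
d = np.linalg.norm(X - query, axis=1)
for k in (1, 3, 5):
    nearest = np.argsort(d)[:k]
    print(f"k={k}: out = {y[nearest].mean():.2f}")               # fraction of theorists among the k

# range search: count the two classes inside an axis-aligned box of half-width r
tree = cKDTree(X)
for r in (1.5, 3.0):                                             # small box vs. large box
    inside = tree.query_ball_point(query, r, p=np.inf)
    print(f"box half-width {r}: checked {len(inside)} points, out = {y[inside].mean():.2f}")
```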

Methods Based on Linear Separation

[Two scatter plots with a separating straight line (hyperplane) drawn in the # formulas vs. # slides plane.]
Divide the input space into regions separated by one or more hyperplanes. Extrapolation is done!
LDA (Fisher discriminant).
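A minimal sketch of a Fisher discriminant on toy data (the class means, widths, and threshold choice are illustrative): the separating direction is w = S_W^{-1}(m_1 - m_2), and events are classified by a cut on w·x:

```python
import numpy as np

rng = np.random.default_rng(1)
X_th  = rng.normal([4.0, 3.0], 0.8, size=(30, 2))   # theorists: many formulas
X_exp = rng.normal([1.5, 3.5], 0.8, size=(30, 2))   # experimentalists: few formulas

m1, m2 = X_th.mean(axis=0), X_exp.mean(axis=0)
S_w = np.cov(X_th, rowvar=False) + np.cov(X_exp, rowvar=False)   # within-class scatter (up to a constant)

w = np.linalg.solve(S_w, m1 - m2)                    # Fisher direction
threshold = w @ ((m1 + m2) / 2)                      # cut halfway between the projected means

def classify(x):
    return "theorist" if w @ x > threshold else "experimentalist"

print(classify(np.array([3.1, 3.2])))                # the new talk, projected onto w
```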

Neural Networks

\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad y = \sigma\!\left( \sum_i w_i x_i - s \right), \qquad E = \frac{1}{N} \sum_{i=1}^{N} \left( t_i - \mathrm{out}(x_i) \right)^2

Arbitrary numbers of inputs and hidden neurons.
Network with two hidden neurons (trained by gradient descent):
[Network diagram with the trained weights (-1.8, +3.6, +3.6, -50, +20, +1.1, -1.1, +0.2, +0.1) and the resulting decision boundary in the # formulas vs. # slides plane.]
Regularization: number of hidden neurons, weight decay.
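A minimal sketch of such a network, assuming NumPy and toy data: two inputs, two sigmoid hidden neurons, one sigmoid output, trained by plain gradient descent on E. The initialization, learning rate, epoch count, and labels are illustrative and unrelated to the trained weights shown in the diagram:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# toy sample: (# formulas, # slides) / 10, label 1 = theorist
X = rng.uniform(0, 6, size=(80, 2))
t = (X[:, 0] > 0.8 * X[:, 1]).astype(float)

W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)   # input -> 2 hidden neurons
W2 = rng.normal(scale=0.5, size=2);      b2 = 0.0           # hidden -> output

lr = 0.5
for epoch in range(2000):
    h   = sigmoid(X @ W1 + b1)                  # hidden activations
    out = sigmoid(h @ W2 + b2)                  # network output
    # backpropagation of the mean-squared error E
    d_out = -2 * (t - out) * out * (1 - out) / len(t)
    d_h   = np.outer(d_out, W2) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum()
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(f"training error after gradient descent: {np.mean((t - out) ** 2):.3f}")
```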

Support Vector Machines

Separating hyperplane with maximum distance to each data point: maximum margin classifier.
Found by requiring correct classification, y_i (w \cdot x_i + b) \ge 1, and minimizing \frac{1}{2}\|w\|^2, which leads to the Lagrangian

L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]

A necessary condition for the minimum is w = \sum_i \alpha_i y_i x_i, so the output becomes

\mathrm{out} = \mathrm{sgn}\!\left( \sum_i \alpha_i y_i (x_i \cdot x) + b \right)

Only linear separation? No! Replace the dot products: x \cdot y \to \Phi(x) \cdot \Phi(y). The mapping to feature space, \Phi: R^d \to F, is hidden in a kernel K(x, y) = \Phi(x) \cdot \Phi(y).
Non-separable case: \frac{1}{2}\|w\|^2 \to \frac{1}{2}\|w\|^2 + C \sum_i \xi_i.
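A minimal sketch of the kernelized output formula, assuming scikit-learn and toy data: after fitting, the decision function is rebuilt by hand as sum_i alpha_i y_i K(x_i, x) + b over the support vectors and compared with the library's value. The RBF kernel and its parameters are illustrative choices, not the kernel used in the talk:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.uniform(0, 6, size=(100, 2))                    # (# formulas, # slides) / 10
y = np.where(X[:, 0] > 0.8 * X[:, 1], 1, -1)            # toy labels in {-1, +1}

clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))        # K(x, y) for the chosen kernel

# rebuild the decision function by hand from the support vectors
x_new = np.array([3.1, 3.2])
manual = sum(coef * rbf(sv, x_new)                      # coef = alpha_i * y_i
             for coef, sv in zip(clf.dual_coef_[0], clf.support_vectors_))
manual += clf.intercept_[0]

print(manual, clf.decision_function([x_new])[0])        # identical up to float precision
```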

Physics Applications: Neural Network Trigger at HERA (H1)

Keep the physics, reject the background.

Trigger for J/ψ Events (H1)

Signal efficiency at 95% background rejection:
  NN    99.6%
  SVM   98.3%
  k-NN  97.7%
  RS    97.5%
  C4.5  97.5%
  ML    91.2%
  LDA   82%

Triggering Charged Current Events

[Diagrams: the charged current signal process e p → ν X (W exchange) and the background process.]
Signal efficiency at 80% background rejection:
  NN    74%
  SVM   73%
  C4.5  72%
  RS    72%
  k-NN  71%
  LDA   68%
  ML    65%

Astrophysics: MAGIC - Gamma/Hadron Separation

Photon vs. hadron showers. Training with data and MC, evaluation with data.
Signal (photon) enhancement factor: Random Forest 93.3, Neural Net 96.5.

Future Experiment XEUS: Position of X-ray Photons (Application of Statistical Learning to Regression Problems)

[Sketch: XEUS geometry (~10 m, ~300 m); transfer direction and electron potential of the pixel detector.]

x_{COM} = \frac{\sum_i c_i x_i}{\sum_i c_i}, \qquad x_{CCOM} = x_{COM} + \eta(x_{COM})

σ of the position reconstruction in µm:
  NN    3.6
  SVM   3.6
  k-NN  3.7
  RS    3.7
  ETA   3.9
  CCOM  4.0

Conclusion

- Statistical learning theory is full of subtle details (models and statistics).
- Widely used statistical learning methods were studied: decision trees; local density estimators (ML, k-NN, RS); linear separation (LDA, neural nets, SVMs).
- Neural networks were found superior in the HEP and astrophysics applications (classification, regression) studied so far.
- Further applications (trigger, offline analyses) are under study.

From Classification to Regression

[Figure: for k-NN and range search the regression output is built from the target values of the selected training points.]
- k-NN / RS: out = average of the target values of the training points inside the chosen k or box.
- NN: E = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \mathrm{out}(x_i) \right)^2
- Fit to a Gaussian with two hidden neurons: a = \sigma(-2.1\,x - 1), \; b = \sigma(+2.1\,x - 1), \; \mathrm{out} = -12.7\,a - 12.7\,b + 9.4
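A minimal sketch of the small regression network quoted above, with the weights taken from the slide; that a and b are passed through the logistic sigmoid is an assumption (the activation symbol did not survive the slide extraction):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gauss_fit_net(x):
    a = sigmoid(-2.1 * x - 1.0)          # hidden neuron 1
    b = sigmoid(+2.1 * x - 1.0)          # hidden neuron 2
    return -12.7 * a - 12.7 * b + 9.4    # linear output, weights from the slide

xs = np.linspace(-3, 3, 7)
print(np.round(gauss_fit_net(xs), 2))    # peaks near x = 0 and falls off to both sides
```

Evaluating the network shows the Gaussian-like bump the two sigmoids produce: the output is largest near x = 0 and decreases symmetrically towards both tails.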
