Epitope prediction based on peptide array data

Predicting immune response from peptide microarrays

Mitja Luštrek 1 (2), Peter Lorenz 2, Felix Steinbeck 2, Georg Füllen 2, Hans-Jürgen Thiesen 2
1 Department of Intelligent Systems, Jožef Stefan Institute
2 University of Rostock

1. Introduction
2. Immune response prediction
3. Interpretation

Peptide
Peptide = part of protein = short sequence of amino acids
Example: SNDIVLT = string of letters from a 20-letter alphabet (1 letter = 1 amino acid, 20 standard amino acids)
[Image taken from the EMBL website]
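Treating a peptide as a string over the 20-letter amino-acid alphabet can be sketched in a few lines. The names `STANDARD_AMINO_ACIDS` and `is_valid_peptide` are illustrative and do not come from the slides; the one-letter codes are the standard ones.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_peptide(sequence, length=None):
    """Return True if `sequence` uses only standard amino-acid letters.

    If `length` is given, the sequence must also have exactly that length.
    """
    if not sequence:
        return False
    if length is not None and len(sequence) != length:
        return False
    return all(letter in STANDARD_AMINO_ACIDS for letter in sequence)


print(is_valid_peptide("SNDIVLT"))             # the example peptide
print(is_valid_peptide("SNDIVLT", length=15))  # too short for a 15-mer
```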

Epitope
[Figure labels: antigen protein, epitope, antibody, antibody binding, peptide]

Peptide arrays
Peptide array: peptides (15 amino acids) on a glass slide
IVIg antibody mixture
Antibody against antibody + dye
Red = epitopes (bind antibodies)
Black = non-epitopes

Peptide           Class
PGIGFPGPPGPKGDQ   non-ep.
PNMVFIGGINCANGK   non-ep.
DGIGGAMHKAMLMAQ   non-ep.
REDNLTLDISKLKEQ   non-ep.
TPLAGRGLAERASQQ   non-ep.
DQVHPVDPYDLPPAG   non-ep.
...
RRMISRMPIFYLMSG   epitope
LPPGFKRFTCLSIPR   epitope
EFSQMESYPEDYFPI   epitope
...

2. Immune response prediction

Our task
Input: peptides (RRKGGLEEPQPPAEQ, SEDLENALKAVINDK, EDHVKLVNEVTEFAK, GEKIIQEFLSKVKQM, ILVSRSLKMRGQAFV, YTCQCRAGYQSTLTR, ...)
Machine learning assigns a class to each peptide:

Peptide           Class
RRKGGLEEPQPPAEQ   non-ep.
SEDLENALKAVINDK   non-ep.
EDHVKLVNEVTEFAK   non-ep.
GEKIIQEFLSKVKQM   non-ep.
ILVSRSLKMRGQAFV   epitope
YTCQCRAGYQSTLTR   epitope
...

Training set: 13,638 peptides (3,420 epitopes)
Test set: 13,640 peptides (3,421 epitopes)
Balanced until the final testing
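The class ratio implied by these set sizes, and one way to balance a set, can be sketched as follows. The slides do not say how the sets were balanced, so random undersampling of the larger class is an assumption here, and `balance_by_undersampling` is a hypothetical helper.

```python
import random

# Set sizes from the slide.
TRAIN_TOTAL, TRAIN_EPITOPES = 13_638, 3_420
TEST_TOTAL, TEST_EPITOPES = 13_640, 3_421


def non_epitopes_per_epitope(total, epitopes):
    """Number of non-epitopes for every epitope in a set."""
    return (total - epitopes) / epitopes


def balance_by_undersampling(examples, seed=0):
    """Return a 1 : 1 subset of (peptide, label) pairs.

    Assumption: balancing is done by randomly discarding examples of the
    larger class; the original work may have balanced differently.
    """
    rng = random.Random(seed)
    epitopes = [ex for ex in examples if ex[1] == "epitope"]
    others = [ex for ex in examples if ex[1] != "epitope"]
    smaller, larger = sorted((epitopes, others), key=len)
    balanced = smaller + rng.sample(larger, len(smaller))
    rng.shuffle(balanced)
    return balanced


print(round(non_epitopes_per_epitope(TRAIN_TOTAL, TRAIN_EPITOPES), 2))
print(round(non_epitopes_per_epitope(TEST_TOTAL, TEST_EPITOPES), 2))
```

Both ratios come out close to 3, so in the unbalanced data roughly one peptide in four is an epitope.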

Machine learning

Single attribute representation:
Peptide (e.g. PGIGFPGPPGPKGDQ)
-> attribute representation (Attribute 1 = value 1, Attribute 2 = value 2, ...)
-> ML -> classifier
-> probability for epitope p
-> class (non-ep. / epitope)

Eight attribute representations combined:
Peptide (e.g. PGIGFPGPPGPKGDQ)
-> attribute representations 1 ... 8
-> ML: SVM (SMO), logistic regression -> classifiers 1 ... 8
-> probabilities for epitope p1, p2, p3, p4, p5, p6, p7, p8
-> ML: linear regression -> meta classifier
-> final probability for epitope p
-> class (non-ep. / epitope)
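The final combination step can be pictured as a weighted sum of the eight base probabilities. This is only a sketch of how a linear-regression meta classifier produces its output: the weights and intercept below are made-up placeholders (in a real system they would be fitted on training data), and both the clipping to [0, 1] and the 0.5 threshold are assumptions.

```python
def final_probability(base_probabilities, weights, intercept=0.0):
    """Combine base-classifier probabilities p1..p8 into one value p.

    A fitted linear regression is a weighted sum plus an intercept; the
    result is clipped to [0, 1] so that it can be read as a probability.
    """
    if len(base_probabilities) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    p = intercept + sum(w * q for w, q in zip(weights, base_probabilities))
    return min(1.0, max(0.0, p))


def predict_class(base_probabilities, weights, intercept=0.0, threshold=0.5):
    """Turn the final probability into a class label."""
    p = final_probability(base_probabilities, weights, intercept)
    return "epitope" if p >= threshold else "non-ep."


# Hypothetical values: eight base probabilities and equal weights.
p1_to_p8 = [0.91, 0.85, 0.88, 0.93, 0.80, 0.79, 0.90, 0.86]
equal_weights = [1 / 8] * 8
print(final_probability(p1_to_p8, equal_weights))
print(predict_class(p1_to_p8, equal_weights))
```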

Attribute representation 1: Amino-acid counts
RRMISRMPIFYLMSG
Count of:
A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
0  0  0  0  1  1  0  2  0  1  3  0  1  0  3  2  0  0  0  1
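The amino-acid count vector for the example peptide can be computed as below. This is a minimal sketch; the function name and the alphabetical attribute order are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, alphabetical


def amino_acid_counts(peptide):
    """Return a 20-element count vector, ordered as in AMINO_ACIDS."""
    counts = Counter(peptide)
    return [counts[aa] for aa in AMINO_ACIDS]


vector = amino_acid_counts("RRMISRMPIFYLMSG")
for aa, n in zip(AMINO_ACIDS, vector):
    if n:  # show only the amino acids that occur in the peptide
        print(aa, n)
```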

Attribute representation 2: Amino-acid count differences
RRMISRMPIFYLMSG
Difference in counts of:
FG  FI  FL  FM  FP  FR  FS  FY  GF  GI  ...
0   1   0   2   0   2   1   0   0   1   ...

Attribute representation 3: Subsequence counts
RRMISRMPIFYLMSG
Count of:
RR  RM  MI  ...  RRM  RMI  MIS  ...  ACDE  ...  ACDEF  ...
1   2   1   ...  1    1    1    ...  0     ...  0      ...

Attribute representation 4: Amino-acid class counts
Size class:    l l l l t l l s l l l l l t t   (t = tiny, s = small, l = large)
Peptide:       R R M I S R M P I F Y L M S G
Charge class:  b b n n n b n n n n n n n n n   (b = basic, a = acidic, n = neutral)
Count of:
tiny  small  large  basic  acidic  neutral  ...
3     1      11     3      0       12       ...

Attribute representation 5: Amino-acid class subsequence counts
Size class:    l l l l t l l s l l l l l t t
Peptide:       R R M I S R M P I F Y L M S G
Charge class:  b b n n n b n n n n n n n n n
Count of:
ll  lt  tl  ls  sl  tt  ...  bb  bn  nb  nn  ...
8   2   1   1   1   1   ...  1   2   1   10  ...

Attribute representation 6: Amino-acid pair counts
Rationale: antibodies may bind in two places due to their two-chain structure.
RRMISRMPIFYLMSG
Count of pairs at distance:
(R,R) at 1   1
(R,M) at 2   1
(R,I) at 3   2
...
(A,C) at 1   0
(A,C) at 2   0
...

Attribute representation 7: Amino acids at distances from the first + first amino acid
Rationale: antibodies may bind in two places; the first amino acid is the most accessible on the peptide array.
R RMISRMPIFYLMSG
Count of amino acids at distance from the first:
...
R at 1   1
...
M at 2   1
...
A at 3   0
C at 3   0
...
First    R

Attribute representation 8: Average amino-acid properties
RRMISRMPIFYLMSG
Hydrophobicity  Size   Polarity  Flexibility  Accessibility  ...
0.448           0.596  0.306     0.231        0.376          ...

Attribute representation 9 (not used): Amino-acid counts with a difference
RRMISRMPIFYLMSG
RRMISRMPIWYLMSG
Equivalent for epitope prediction?
Count F as: 1 F, 0.8 W, 0.4 Y, ...
Count W as: 1 W, 0.7 F, 0.3 Y, ...

Amino-acid substitution matrix:
      A    C    D    ...  F    W    Y
A     1
C          1
D               1
...
F                         1    0.8  0.4
W                         0.7  1    0.3
Y                                   1
Optimize with a genetic algorithm to maximize classification accuracy
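Representations 2 to 7 can be reproduced for the example peptide with a few small functions. This is a sketch under stated assumptions, not the authors' implementation: the class tables are only verified for the letters that occur in the example peptide, the count difference is taken as an absolute value, and pairs are treated as ordered. All function names are hypothetical.

```python
from collections import Counter

PEPTIDE = "RRMISRMPIFYLMSG"

# Assumed class tables: only the letters occurring in PEPTIDE can be checked
# against the slides; the assignment of the remaining letters is a guess.
SIZE_CLASS = {aa: "t" for aa in "ACGST"}               # tiny
SIZE_CLASS.update({aa: "s" for aa in "DNPV"})          # small
SIZE_CLASS.update({aa: "l" for aa in "EFHIKLMQRWY"})   # large
CHARGE_CLASS = {aa: "n" for aa in SIZE_CLASS}          # neutral by default
CHARGE_CLASS.update({aa: "b" for aa in "HKR"})         # basic
CHARGE_CLASS.update({aa: "a" for aa in "DE"})          # acidic


def count_difference(peptide, first, second):
    """Representation 2: absolute difference between two amino-acid counts."""
    counts = Counter(peptide)
    return abs(counts[first] - counts[second])


def subsequence_counts(sequence, min_len=2, max_len=5):
    """Representations 3 and 5: counts of contiguous subsequences."""
    return Counter(
        sequence[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(sequence) - n + 1)
    )


def to_classes(peptide, table):
    """Representations 4 and 5: rewrite a peptide as a string of class letters."""
    return "".join(table[aa] for aa in peptide)


def pair_counts(peptide, max_distance=3):
    """Representation 6: counts of ordered pairs (a, b, d), b being d positions after a."""
    return Counter(
        (peptide[i], peptide[i + d], d)
        for d in range(1, max_distance + 1)
        for i in range(len(peptide) - d)
    )


def from_first(peptide):
    """Representation 7: first amino acid plus counts of (amino acid, distance from first)."""
    return peptide[0], Counter((aa, d) for d, aa in enumerate(peptide[1:], start=1))


size_string = to_classes(PEPTIDE, SIZE_CLASS)
print(count_difference(PEPTIDE, "F", "M"))    # attribute FM
print(subsequence_counts(PEPTIDE)["RM"])      # attribute RM
print(Counter(size_string))                   # tiny / small / large counts
print(subsequence_counts(size_string, 2, 2))  # ll, lt, tl, ...
print(pair_counts(PEPTIDE)[("R", "I", 3)])    # attribute (R,I) at 3
print(from_first(PEPTIDE)[1][("M", 2)])       # attribute M at 2
```

Whether the original subsequence counts also include single letters, or longer subsequences than five letters, is not stated on the slides; the `min_len` and `max_len` defaults simply match the examples shown.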

Results: training set

Attribute representation                   AUC     Accuracy
Amino-acid counts                          0.870   80.7 %
Amino-acid count differences               0.868   80.3 %
Subsequence counts                         0.867   80.5 %
Amino-acid class counts                    0.873   81.2 %
Amino-acid class subsequence counts        0.866   80.5 %
Amino-acid pair counts                     0.865   80.6 %
Amino acids at distances from the first    0.873   81.2 %
Average amino-acid properties              0.863   80.3 %
Combined                                   0.881   83.3 %

Results: test set

Attribute representation / dataset         AUC     Accuracy
Best single / training set (balanced)      0.873   81.2 %
Combined / training set (balanced)         0.881   83.3 %
Combined / test set (balanced)             0.883   83.7 %
Combined / test set (original)             0.884   85.9 %
EL-Manzalawy / test set (balanced)         0.868   82.0 %
EL-Manzalawy / test set (original)         0.874   83.9 %

Balanced: epitope : non-epitope = 1 : 1
Original: epitope : non-epitope = 1 : 3
State of the art: SVM + string kernel (EL-Manzalawy et al., 2008). Trained and tested on our data.

Summary (AUC / accuracy)
Our results:   balanced 0.883 / 83.7 %, original 0.884 / 85.9 %
EL-Manzalawy:  balanced 0.868 / 82.0 %, original 0.874 / 83.9 %
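For readers less familiar with the two metrics reported here, a minimal sketch of how AUC and accuracy can be computed from predicted epitope probabilities follows. The toy scores and labels are made up, and the 0.5 decision threshold is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen epitope (label 1) gets a
    higher score than a randomly chosen non-epitope (label 0); ties count half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


# Made-up example: five peptides, two of them epitopes.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))
print(accuracy(toy_scores, toy_labels))
```

AUC depends only on how the scores rank epitopes against non-epitopes, whereas accuracy also depends on the class ratio and the threshold, which is one reason the balanced and original test sets can show nearly the same AUC but different accuracies.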

3. Interpretation

Rules
Interpretable classifier:
  Interpretable attributes (frequencies, properties of amino acids)
  RIPPER (JRip) to induce rules

Property                  Low/high   Applies to peptides
Aromaticity               High       53.8 %
Polarity                  Low        27.7 %
Frequency of tyrosine     High       26.2 %
Hydrophobicity            Low        22.5 %
Frequency of arginine     High       19.7 %
Summary factor 2          High       16.7 %
Acidity                   Low        11.4 %
Preference for β-sheets   Low        4.3 %
Summary factor 5          High       3.0 %

Example (first row): if a peptide has a high aromaticity, it binds antibodies. This applies to 53.8 % of the peptides that bind antibodies. (Aromaticity is the percentage of aromatic amino acids in the peptide.)

Epitope propensity
Frequency in peptides with epitopes, divided by frequency in peptides without epitopes
[Figures: epitope propensity for aromatic amino acids, non-polar amino acids, and tyrosine]
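The epitope propensity ratio can be sketched as follows. Taking "frequency" to mean the fraction of all residues that belong to a given group of amino acids is an assumption, as is the choice of F, W and Y as the aromatic group. The nine peptides are the examples shown earlier in the talk and are far too few to give meaningful values; they only make the sketch runnable.

```python
EPITOPES = ["RRMISRMPIFYLMSG", "LPPGFKRFTCLSIPR", "EFSQMESYPEDYFPI"]
NON_EPITOPES = [
    "PGIGFPGPPGPKGDQ", "PNMVFIGGINCANGK", "DGIGGAMHKAMLMAQ",
    "REDNLTLDISKLKEQ", "TPLAGRGLAERASQQ", "DQVHPVDPYDLPPAG",
]


def group_frequency(peptides, group):
    """Fraction of all residues in `peptides` that belong to `group`."""
    residues = "".join(peptides)
    return sum(aa in group for aa in residues) / len(residues)


def epitope_propensity(epitopes, non_epitopes, group):
    """Frequency among epitopes divided by frequency among non-epitopes.

    Raises ZeroDivisionError if the group never occurs in the non-epitopes.
    """
    return group_frequency(epitopes, group) / group_frequency(non_epitopes, group)


print(epitope_propensity(EPITOPES, NON_EPITOPES, "Y"))    # tyrosine
print(epitope_propensity(EPITOPES, NON_EPITOPES, "FWY"))  # aromatic (assumed group)
```

A value above 1 means the group is more frequent in epitopes than in non-epitopes.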

(Un)classifiable peptides
Simplified classifier:
  Interpretable attributes (frequencies, properties of amino acids)
  Logistic regression to train the classifier
Classifiable = classified correctly; unclassifiable = classified incorrectly

Peptides         AUC     Accuracy
All              0.860   83.0 %
Classifiable     0.999   98.8 %    Expected
Unclassifiable   0.956   91.5 %    Strange?

(Un)classifiable rules

                                 Classifiable        Unclassifiable
Attribute                        L/h    Applies      L/h    Applies
Aromaticity                      High   74.3 %       Low    53.3 %
Polarity                         Low    58.7 %       High   27.5 %
Frequency of arginine            High   31.5 %       Low    34.0 %
Frequency of tyrosine            High   20.7 %       Low    16.9 %
Summary factor 5                 High   15.1 %       Low    15.2 %
Antigenicity                     High   7.3 %        Low    8.7 %
Hydrophobicity                   Low    4.7 %        High   6.5 %
Frequency of histidine           Low    3.9 %
Frequency of cysteine                                Low    10.4 %
Preference for reverse turns                         High   10.4 %
Occurrence in turns                                  Low    10.4 %
Frequency of alanine                                 High   8.7 %

For comparison, on all peptides: Aromaticity High 53.8 %, Polarity Low 27.7 %

[Figure: (un)classifiable epitope propensity]

Unclassifiable peptides at 0.956 / 91.5 %: strange? Not really! Inevitable, or does it mean something?

2nd degree (un)classifiable peptides
Unclassifiable peptides only, simplified classifier
Classifiable unclassifiable = classified correctly; unclassifiable unclassifiable = classified incorrectly

Peptides                        AUC     Accuracy
All unclassifiable              0.956   91.5 %
Classifiable unclassifiable     0.992   97.8 %
Unclassifiable unclassifiable   0.683   65.0 %

Inevitable, or does it mean something? Not inevitable!

[Figure: 2nd degree (un)classifiable epitope propensity]
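The split into classifiable and unclassifiable peptides can be sketched as below. The function name and the predictions are hypothetical; the slides do not say how the predictions were obtained (for example, whether by cross-validation).

```python
def split_by_outcome(peptides, true_classes, predicted_classes):
    """Split peptides into (classifiable, unclassifiable).

    A peptide counts as classifiable if the simplified classifier predicted
    its true class, and as unclassifiable otherwise.
    """
    classifiable, unclassifiable = [], []
    for peptide, truth, predicted in zip(peptides, true_classes, predicted_classes):
        target = classifiable if predicted == truth else unclassifiable
        target.append(peptide)
    return classifiable, unclassifiable


# Hypothetical predictions for four peptides shown earlier in the talk.
peptides = ["RRKGGLEEPQPPAEQ", "SEDLENALKAVINDK", "ILVSRSLKMRGQAFV", "YTCQCRAGYQSTLTR"]
truth = ["non-ep.", "non-ep.", "epitope", "epitope"]
predicted = ["non-ep.", "epitope", "epitope", "non-ep."]

classifiable, unclassifiable = split_by_outcome(peptides, truth, predicted)
print(classifiable)
print(unclassifiable)
```

The 2nd-degree analysis would apply the same split to the unclassifiable peptides alone, using predictions from a classifier trained only on them; that reading of the slides is an assumption.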

Conclusions
Epitopes have common characteristics
  Epitopes are parts of antigens that bind antibodies
  Our peptides mostly did not come from known antigens
  Probably partly general and partly antibody-specific binding
Epitope characteristics are not unexpected
Two groups of epitopes:
  around 80 % typical (classifiable): mostly general-purpose antibodies?
  around 20 % atypical (unclassifiable): mostly antigen-specific antibodies?
