From patterns to pathways. Causal analysis of gene expression data

Alexander Kel
BIOBASE GmbH
Halchtersche Strasse 33
D-38304 Wolfenbuettel
Germany
[email protected]
www.biobase.de

Tools and databases: Pathway builder, Array analyser, TRANSPATH (mechanistic, semantic), S/MARt DB, Patho DB, TRANSFAC, Match, Patch, Catch, CMFinder, TRANSCompel, Cytomer, TRANSGenome, TRANSPLORER

The TRANSFAC System comprises 7 databases (TRANSFAC Professional Suite):
TRANSFAC Professional: transcription factor database
TRANSCompel Professional: composite elements database
PathoDB Professional: pathologically altered transcription factors
TRANSPRO Professional: collection of human promoter sequences
S/MARt DB Professional: scaffold or matrix attached regions database
Cytomer: ontology of cells, structures, organs
TRANSPATH Professional: signal transduction pathways

TRANSFAC Professional: transcription factor database

trans / cis

Human genes: sequences and positions of AP-1 binding sites

[Table: AP-1 binding sites in human genes. Genes: glutathione P-transferase, hemoglobin epsilon, Akt-2, IFN-, Apo II, melanotransferrin, collagenase, proto-oncogene c-myc, porphobilinogen deaminase, GM-CSF. Site sequences: TGATTTTTT, TGACATC, TGTCACC, TGACTCA, TGAGTCA, TGATTTA. Positions: -80, -100, -89, -792, -2013, -72, -335, -162, and enhancers at -2500 and -3500.]

Structure of regulatory regions of eukaryotic genes

[Figure: GM-CSF, Homo sapiens. T-cell specific inducible enhancer at 3500 bp and promoter up to the start of transcription (ST, +1). Labels: AP-1, CBF, NF-κB (c-Rel/p65, p50/p65), NFAT, HMG Y(I), TATTT, composite elements (CE), CD28 response element; marked positions -114, -88, -54.]

Protein-DNA and protein-protein interactions in gene transcriptional regulation.

[Figure: sites S1, S2, S3 bound by TF1, TF2, TF3; TFIID, TFIIF, TFIIA, TFIIB, TFIIE, TFIIH and RNA pol II; histone acetylase (HAT). Labels: transcription factors (sequence-specific DNA binding, non-DNA binding), coactivator, adapter, DNA, Layer I, Layer II, Layer III, TF1 to TF4.]

TRANSFAC: relational scheme

[Figure: tables CLASS, SPECIES, FEATURES, SYNONYMS, FACTOR, MATRIX, CELL, METHOD, SITE, SEQUENCE, FUNCTIONAL ELEMENT and GENE; relation labels: interacting factor, Q, gene expression, regulatory region, coding region.]

Manual annotation of the databases: input client
TRANSFAC: GENE table
TRANSFAC: SITE table
Structure of transcription factors

[Figure: USF-1, dimer.]

Structure of transcription factors

[Figure: domains of a transcription factor: oligomerization domain, ligand-binding domain, activation domain, protein-protein interaction domain, DNA binding domain.]

TRANSFAC: FACTOR table, protein sequence
TRANSFAC: FACTOR table, protein domains
TRANSFAC: FACTOR table, structural and functional features
TRANSFAC: FACTOR table, links to other databases
TRANSFAC: classification of transcription factors
TRANSFAC: CLASS table

TRANSFAC 8.1 (2004-03-31): number of factor entries for different species
[Bar chart, scale 0 to 1400. Categories: human, mouse, rat, other vertebrates, fruit fly, plants, fungi, other.]

TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5' regions of genes.
[Histogram, scale 0 to 800.]

TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions
TRANSFAC: MATRIX table

TRANSCompel Professional: composite elements database

Mouse Interleukin-2 gene promoter

[Figure: composite element COMPEL:C00050 (NF-ATp and AP-1 sites) between positions -96 and -79 of the sequence
....... tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt
with the AP-1 consensus TGAGTCA and ST marked.]

Composite elements

Minimal functional units where both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways.

[Figure: F1 alone: low level of transcription. F2 alone: low level of transcription. F1 and F2 together: synergistic activation of transcription.]

Combinatorial regulation by the composite elements

[Table: scheme of CE in seven genes (N 1 to 7). Genes: IgH **, Mus musculus; IgH **, Homo sapiens; IL-2, Homo sapiens (two entries); IL-2, Mus musculus; serum amyloid 1, Rattus norv.; IRF-1, Mus musculus. Factors: Ets, AP-1, NF-κB, NFAT, STAT-1, CBF, C/EBP, Oct-2. Site positions: -283, -167, -268, -167, -142, -142, -117, -73, -123, -113, -49, -40.]

Ternary complex NFATp - AP1 - DNA

Flat files
Description of an evidence (experiment, cell type, two individual interactions)
Link to the TRANSFAC GENE table
Link to EMBL
Link to the TRANSFAC FACTOR table

Cross-coupling of signal transduction pathways

[Figure: membrane receptor and Ca2+ dependent channel; Src, Ras, SH2 and SH3 adaptors, GTP/GDP, PLC, PI3-K, IP3, Ca2+, calcineurin and PKB/Akt in the cytoplasm; phosphorylation of NFATp, ERK, JNK and p38 MAPK; in the nucleus, c-Fos, c-Jun and ATF-2 acting on a composite element of the IL-2 gene.]

Inducible/inducible
19 CEs ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways;
15 CEs NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways;
14 CEs NF-κB / C/EBP: NF-κB is inducible by IL-1 and TNF-α; C/EBP is inducible by IL-6.

[Table, shown with each class: number of composite elements by functional class of F1 and F2 (tissue-specific, inducible, cell-cycle dependent, developmental-stage dependent, ubiquitous constitutive). Cell values: 32, 44, 119, 1, 2, 39, 2, 3, 60, 2, 12.]

Inducible/constitutive
9 CEs ETS / Sp1: ETS factors are inducible through the Ras/Raf-dependent signalling pathway;
5 CEs Smad / TEF3: Smads are inducible by TGF-β signalling.

Inducible/tissue-restricted
CEs Pit-1 / AP-1: Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.

Mechanisms of functioning of synergistic composite elements
(F1, F2: factors; S1, S2: their sites)
1) Cooperative binding to DNA and ternary complex formation
2), 3) Simultaneous interaction of activation domains with the components of the basal complex; a new protein surface for DNA recognition could be formed
4) Forming a new protein surface for interaction with the basal complex
5) Relief of autoinhibition as a result of protein-protein interactions
6) DNA bending by one of the transcription factors
7) DNA wrapping around a nucleosome allows transcription factors to interact
8) Recruitment of a HAT complex by one of the transcription factors

Mechanisms of functioning of antagonistic composite elements
(HAT complex versus HDAC complex)
1) Mutually exclusive binding of factor F1 (activator) and F2 (repressor)
2) Binding of F2 (repressor) results in conformational changes of F1 (activator)

TRANSPATH Professional: database on signal transduction pathways

TRANSPATH: map of IFN pathway

TRANSPATH / TRANSFAC

TRANSPATH: molecules
Extracellular ligand, membrane receptor, adaptor, second messenger, kinase(s), transcription factor, target gene

TRANSPATH: molecule hierarchy
[Figure: hierarchy levels: family, ortholog, basic, isoform, modified form, complexes. Entries: IL-1/Toll receptor family; TLRs; TLR4, TLR5; TLR4(h), TLR4(m), TLR5(h); TLR4a(h), TLR4b(m); TLR4(h)pp; TLR4(h):MyD88(h).]

TRANSPATH: reactions
Enzyme, educts, products
Binding, phosphorylation, dephosphorylation, degradation, acetylation, dissociation, transregulation, expression, activation, ...

The elementary reaction step
[Figure: A to B via reaction R, with catalyst C.]
Reaction R, catalyzed by catalyst C, converts substance A into substance B.

Pathway steps depict the signaling in a more biochemical way.
[Figure: pathway steps R1 to R5. Labels: TGF-β1, TGF-β R-II, T:TR2p, NTP, NDP, TGF-β R-I, T:TR2p:TR1p, Smad2, Smad2p, Smad4, S2P:S4, gene tc.]

In a semantic reaction, just individual key molecules are given.
Semantic: TGF-β1 (R1) TGF-βRII (R2) TGF-βRI (R3) Smad2 (R4) Smad4 (R5) gene

Info about a specific molecule
Many synonyms make sure that you find your protein.
Parts of a molecule entry
External database links allow easy identification of proteins.

Specific molecule (cont.)
Disease information and GO terminology; localization of human APP
Opens data entry of a specific reaction
Parts of a molecule entry

Specific reaction of APP(h)
Evaluation of this reaction is based on experimental evidence.
Part of a reaction entry

Signal transduction pathways
Extracellular ligand, membrane receptor, adaptor, second messenger, kinase(s), transcription factor, target gene

Connecting path between two molecules
Connection between one specific molecule (magenta) and a group of molecules (transcription factors in blue)

Oncostatin M pathway
B-cell antigen receptor pathway
PDGF pathway
Insulin pathway
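The "connecting path between two molecules" view can be read as a shortest-path search over reactions. The following is a minimal sketch of that idea, not TRANSPATH's own implementation: it uses the five semantic reactions of the TGF-beta example from the slides, while the tuple layout, the ASCII molecule names and the function name `connecting_path` are assumptions made for illustration.

```python
from collections import deque

# Semantic reactions R1..R5 of the TGF-beta example (names written in ASCII).
# Each tuple is (reaction id, upstream molecule, downstream molecule);
# this layout is a hypothetical one chosen for the sketch.
REACTIONS = [
    ("R1", "TGF-beta1", "TGF-beta RII"),
    ("R2", "TGF-beta RII", "TGF-beta RI"),
    ("R3", "TGF-beta RI", "Smad2"),
    ("R4", "Smad2", "Smad4"),
    ("R5", "Smad4", "gene"),
]


def connecting_path(reactions, source, target):
    """Breadth-first search: shortest list of reaction ids leading from
    source to target, or None when no directed path exists."""
    graph = {}
    for rid, upstream, downstream in reactions:
        graph.setdefault(upstream, []).append((rid, downstream))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        molecule, path = queue.popleft()
        if molecule == target:
            return path
        for rid, nxt in graph.get(molecule, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rid]))
    return None


if __name__ == "__main__":
    print(connecting_path(REACTIONS, "TGF-beta RII", "Smad4"))
```

Connecting one molecule to a group of molecules (for example all transcription factors) amounts to running the same search once per member of the group.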

Overview of a pathway: hand-drawn map

TRANSPATH: number of entries
[Bar chart, scale 0 to 12000: molecules, reactions and references in Release Professional 2.1, 2.4 and 3.1.]

Statistics: TRANSPATH 5.1 and NetPro 1.1

Main tables                     TRANSPATH    + NetPro
Molecule                        18029        + 7333
Reaction                        20199        + 30316
Reference                       8258         + 9582

Molecules of mammalian origin   TRANSPATH    + NetPro
Human                           2503         3521
Mouse                           1653         2025
Rat                             810          1224

Prediction: 26 588 predicted human gene products, of which 30.8% (~9000) seem to be signal transduction relevant (Venter et al., 2001) => 28% coverage of predicted proteins in TRANSPATH

TRANSFAC System

From patterns to pathways

Array analysis
The starting point: a set of induced genes from microarray experiments.

Array analysis, KEGG
The conventional analysis: deduce the gene products and map them to the network of metabolic pathways (biochemical effects).

Array analysis, TRANSPATH
Extension of conventional analysis:

map the induced gene products to the network of regulatory pathways (biological effects).

Array analysis: identification of new targets (KEGG, TRANSPATH)
Reasoning of experimental findings: promoter analysis of induced genes connected to network mapping.

Array analysis
Promoter analysis identifies additional target genes and extends the affected network: promoter model, TRANSGENOME database, additional predicted genes, extended predicted network.

Array analysis
[Workflow. Start: microarray, set of induced genes. Effects: assignment of gene products; metabolic network mapping (KEGG); regulatory network mapping (TRANSPATH); modeling of effects; indirect hints on causes. Causes: retrieval of upstream sequences (TRANSGENOME); promoter analysis (TRANSFAC); network analysis (TRANSPATH); new target.]

trans / cis ?

Position weight matrix (counts per position, with consensus):

 A   C   G   T   consensus
 9   8   4   8   N
 2   3   2  22   T
 1   1   2  25   T
 0   1   2  26   T
 1  13  15   0   S
 0   3  26   0   G
 0  29   0   0   C
 0   0  29   0   G
 0  22   7   0   C
 1   8  17   3   S
15   9   3   2   M
13   4   9   3   R
 7   8   8   6   N
13   1   7   8   D

$$q = \frac{\sum_{i=1}^{l} I(i)\, f(b_i, i) - \sum_{i=1}^{l} I(i)\, f_i^{\min}}{\sum_{i=1}^{l} I(i)\, f_i^{\max} - \sum_{i=1}^{l} I(i)\, f_i^{\min}} \qquad (1)$$

$$I(i) = \sum_{b \in \{A, T, G, C\}} f(b, i)\, \ln\bigl(4 f(b, i)\bigr) \qquad (2)$$

TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWM). It can use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC Professional database, other matrix libraries, and any user-developed matrix libraries. It therefore makes it possible to search for a great variety of different transcription factor binding sites. A search can be made using all matrices or subsets of matrices from the libraries.

Search for most probable binding sites regulating gene expression
Search for binding sites coinciding with SNPs

Mouse c-fos promoter (Matrix search for TF binding sites)
[Figure: matrix hits drawn as arrows (strand, matrix name, score) above three 60-nt rows of MMCFOS_1:
MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420
MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480
MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540
Hits include V$CREB_01 (0.96), V$MZF1_01 (0.99), V$AP1FJ_Q2 (0.95), V$CREB_Q2 (0.95), V$ATF_01 (0.94), V$USF_Q6 (0.93), V$TATA_01 (0.95), V$AP4_Q6 (0.99), V$MYOD_Q6 (0.96), V$CP2_01 (0.97), V$GATA1_02 (0.93), V$STAT_01 (0.93) and E2F (0.80, 0.82); the transcription start is marked in the second row.]

Exon 2 sequence of human thyroid transcription factor-1 (TTF-1) gene (HS198161) (Matrix search for TF binding sites)
[Figure: matrix hits drawn as arrows above four 60-nt rows of HS198161_1:
HS198161_1 ACGCGCAGCAGCAGGCGCAGCACCAGGCGCAGGCCGCGCAGGCGGCGGCAGCGGCCATCT 540
HS198161_1 CCGTGGGCAGCGGTGGCGCCGGCCTTGGCGCACACCCGGGCCACCAGCCAGGCAGCGCAG 600
HS198161_1 GCCAGTCTCCGGACCTGGCGCACCACGCCGCCAGCCCCGCGGCGCTGCAGGGCCAGGTAT 660
HS198161_1 CCAGCCTGTCCCACCTGAACTCCTCGGGCTCGGACTACGGCACCATGTCCTGCTCCACCT 720
Hits include V$NF1_Q6 (0.96), V$AP4_Q5 (0.92), V$AP2_Q6 (0.95), V$DELTAEF1_01 (0.99), V$LMO2COM_01 (0.94), V$USF_C (0.93), V$GATA1_03 (0.92) and E2F (0.80, 0.81, 0.90).]

Enhanceosome

Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery. Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66
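The matrix searches shown earlier rest on the position weight matrix score q, built from the information vector I(i). The sketch below is one way to compute that score in Python, under stated assumptions: it uses the 14-position count matrix from the slides, converts counts to frequencies without pseudocounts, scores the forward strand only, and accepts only the letters A, C, G, T. The function names (`information`, `matrix_score`, `scan`) are hypothetical, and this is not the TRANSPLORER or Match code itself.

```python
import math

# Count matrix from the slides: one row per position, columns A, C, G, T.
COUNTS = [
    (9, 8, 4, 8), (2, 3, 2, 22), (1, 1, 2, 25), (0, 1, 2, 26),
    (1, 13, 15, 0), (0, 3, 26, 0), (0, 29, 0, 0), (0, 0, 29, 0),
    (0, 22, 7, 0), (1, 8, 17, 3), (15, 9, 3, 2), (13, 4, 9, 3),
    (7, 8, 8, 6), (13, 1, 7, 8),
]
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def frequencies(counts):
    """Turn each row of counts into relative frequencies f(b, i)."""
    return [[c / sum(row) for c in row] for row in counts]


def information(freq_row):
    """I(i) = sum over b of f(b, i) * ln(4 * f(b, i)); zero terms are skipped."""
    return sum(f * math.log(4 * f) for f in freq_row if f > 0)


def matrix_score(counts, window):
    """Score q in [0, 1]: (current - min) / (max - min), weighted by I(i)."""
    if len(window) != len(counts):
        raise ValueError("window length must equal matrix length")
    current = lowest = highest = 0.0
    for row, base in zip(frequencies(counts), window.upper()):
        info = information(row)
        current += info * row[BASE_INDEX[base]]
        lowest += info * min(row)
        highest += info * max(row)
    return (current - lowest) / (highest - lowest)


def scan(counts, sequence, cutoff):
    """Slide the matrix along the forward strand; keep windows with q >= cutoff."""
    width = len(counts)
    hits = []
    for start in range(len(sequence) - width + 1):
        q = matrix_score(counts, sequence[start:start + width])
        if q >= cutoff:
            hits.append((start, round(q, 3)))
    return hits


if __name__ == "__main__":
    # Example sequence taken from the MMCFOS_1 row ending at position 540.
    seq = "CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA"
    print(scan(COUNTS, seq, cutoff=0.80))
```

Because I(i) is close to zero at poorly conserved positions, mismatches there barely lower q, whereas a mismatch in the conserved core is heavily penalised. A full search would also score the reverse complement of each window.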

Recognition method for T-cell specific Composite Elements NFAT/AP-1

5' ..WRGAAAA.. (8-12 bp) ..TGASTCA.. 3'
Labels: NFATp, AP-1

[Matrix: NFAT weight matrix, positions 1 to 8, rows A, C, G, T. Values in extraction order: 5 5 8 8 12 1 2 11 2 0 26 0 0 0 23 26 0 1 0 0 25 0 1 0 25 1 0 0 15 5 2 4]
NFAT axis: -log(1 - score_NFAT)

[Matrix: AP-1 weight matrix, positions 1 to 9, rows A, C, G, T. Values in extraction order: 19 3 16 9 4 2 5 36 4 36 3 2 4 13 33 2 29 8 5 2 0 0 0 47 2 44 0 1 47 0 0 0 2 8 24 13]
AP-1 axis: -log(1 - score_AP-1)

[Scatter plot: NFAT/AP-1 (training) versus random; axes from 0,7 to 4,7 and from 0,7 to 6,7. Annotated values: composite score 1.47, 4.7, wCE 17,0; NFAT 0.88; AP-1 3.5.]

Selection of motifs with high frequency in a window

[Figure: motif WSG in a window [ ] next to the site TTTGGCGCGAAA; frequency of the motifs in the window compared between promoters of cell-cycle genes and exon 2 sequences.]

Motifs found in the local context of E2F sites in promoters of cell cycle-related genes

Positive characteristics (N 1 to 7), negative characteristics (N 8 to 11).

N    Window (w)   f(Y) / f(N) = Utility       (last column)
1    [27,34]      0.0048/0.0041 = 1.179      0.80
2    [39,41]      0.0112/0.0032 = 3.536      0.75
3    [17,38]      0.0851/0.0341 = 2.499      0.90
4    [13,16]      0.0675/0.0095 = 7.071      0.79
5    [17,46]      0.1233/0.0536 = 2.299      0.72
6    [21,26]      0.0337/0.0000              0.80
7    [3,69]       0.0980/0.0559 = 1.754      0.82
8    [7,66]       0.1258/0.1932 = 0.651      0.91
9    [26,65]      0.0413/0.0813 = 0.508      0.79
10   [19,34]      0.0427/0.1354 = 0.315      0.71
11   [7,65]       0.0274/0.0614 = 0.447      0.78

Motifs, N 1 to 7 (concatenated): MGCGTTTCGSKHKCGVDWWDWTTGSDM; N 8: VWS; N 9 to 11 (concatenated): HSWYVTVBAY.
Further column, N 1 to 11: 0.394, 0.9618, 0.5353, 0.5904, 0.223, 0.5036, 0.595, 0.095, 0.2297, 0.261, 0.566 (= 5.6767).

Score of context:

$$d(X) = \sum_{i=0}^{k} \alpha_i \, f(\varphi_i, w_i, X)$$

Human uracil DNA-glycosylase (E2F sites)
[Figure: two profiles along the gene, positions -1000 to 9000 relative to +1: site score alone, and site score + score of context. Known site ttTTTGCCGCGAAAag, q=0.92, d=2.8.]

SITEVIDEO system: building of E2F site recognition program (step 2)
SITEVIDEO system: building of E2F site recognition program (step 3)

Composite modules

[Figure: window w upstream of the start of transcription containing sites s_1^(1); s_1^(2), s_2^(2); ...; s_1^(k), ..., s_nk^(k).]

Parameters of the model to be estimated (by genetic algorithms): the cut-offs q_cut-off^(1), q_cut-off^(2), ..., q_cut-off^(k), the weights φ^(1), φ^(2), ..., φ^(k), and the window w.

$$C = \max_{w} \sum_{k=1}^{K} \varphi^{(k)} \, q_{avr}^{(k)}(w), \qquad K = \text{number of TF matrices}$$

$$q_{avr}^{(k)}(w) = \sum_{i=1}^{n_k} q\bigl(s_i^{(k)}\bigr), \qquad q\bigl(s_i^{(k)}\bigr) > q_{cut\text{-}off}^{(k)}, \quad s_i^{(k)} \in w$$

Composite module in promoters of cell cycle-related genes

Weight     q_cut-off   TF matrix
1.000000   0.840072    V$E2F_19
0.954483   0.737637    V$TATA_01
0.888064   0.939687    V$CREB_01
0.816179   0.941583    V$SP1_Q6
0.039746   0.839702    V$TAL1BETAE47_01

[Histogram: number of sequences (0 to 40) against the composite module score summed over k = 1 to 5 (x-axis from -0,5 to 4,0); series: exon-2 sequences, cell cycle-related promoters.]

Mouse c-fos promoter: cell cycle composite module
[Figure: the matrix search of MMCFOS_1 shown again for positions 361 to 580, with the cell cycle composite module and the E2F sites (0.80, 0.82) marked around the transcription start. The additional last row is
MMCFOS_1 GCAGTGACCGCGCTCCCACCCAGCTCTGCTCTGCAGCTCC 580
with hits V$ER_Q6 (0.86), V$TCF11_01 (0.87), V$AP4_Q5 (0.91), V$AP4_Q6 (0.87), V$AP1FJ_Q2 (0.93), V$AP1_Q2 (0.90), V$AP1_Q4 (0.87), V$IK2_01 (0.94).]

Computationally predicted E2F target genes confirmed by in vivo footprint

Columns: gene; EMBL; sequence of the potential sites; position rel. start of transcription; score, q; score of context, d; positions of PCR primers.

Methods: chromatin crosslinking, immunoprecipitation, PCR.

Genes (EMBL): c-fos, Hs (HSFOS); JunB, Hs (HS207341); tgf-β1, Hs (HSTGFB1P); p14ARF, Hs (AF082338); Mcm4 (Cdc21), Hs (HSU63630); mcm5 (P1cdc46), Hs (HS286B10); Von Hippel-Lindau (VHL), Hs (AF010238); B-myb, Hs (HSBMYBD); NA; nucleolin, Hs (HSNUCLEO); nucleolin, Cg (CSNUCLEO); nucleolin, Ms (MMNUCLE).

Nucleolin rows:
Gene (EMBL)                Site                    Position        q      d      PCR primers
nucleolin, Hs (HSNUCLEO)   (-) ttTTTGGCGCCGGCtg    -297 .. -308    0.97   2.91   -407 -> -41 <-
                           (-) ccGTGGGCGCGCGGgt    -256 .. -267    0.81
nucleolin, Cg (CSNUCLEO)   (-) cgTTTGGCGCGGCTtg    -296 .. -307    0.97   6.67   -538 -> -198 <-
nucleolin, Ms (MMNUCLE)    (-) agTTTGGCGCGGCTtg    -306 .. -317    0.97   1.76   -531 -> -232 <-

Remaining rows, listed column by column:
Sequences of the potential sites: (+) aaGCTCGCGCCACTgc; (-) gcAGTGGCGCGAGCtt; (-) gtCTTCGCGCGCGCtc; (-) gcCTTGGCGCGTGTcc; (-) ggGGTGGCGCGCGGgc; (+) ccTCTGGCGCCACCgt; (-) acGGTGGCGCCAGAgg; (+) gcTATCGCGCCAGAga; (-) tcTCTGGCGCGATAgc; (-) ggGCTGGCGCGGGCgg; (+) ctGTTTGCGGGGCGga; (+) ccCTTCGCGCCCTGgg; (+) ctCTTGGCGCGACGct; (-) agCGTCGCGCCAAGag; (+) ccTTTGCCGCCGGGga; (-) ctCTCCGCGCGCGGga; (-) gtCTTGGCGACCGTtg; (-) ggCCTGGCGCCGGAct; (+) tgATTGGCGGATAGag; (-) acTTTCCCGCCCTGtg; (-) gtTTTCGCGGGAAAac; (-) ctTTCAGCGCCCGTgc; (+) gcAGTGGCGCCTCCcg; (+) ggCGTGGCGCGGAGcc; (+) ctTGTCGCGCAGGTac; (+) agTTTCGCGCCAAAtt; (-) aaTTTGGCGCGAAAct; (+) ttTTTCCCGCGAAAct; (-) agTTTCGCGGGAAAaa; (-) gtCCTGGCGCGCGGgc; (+) cgCTTGGCGGGAGAta
Positions rel. start of transcription: -165 .. -176; -92 .. 103; -90 .. 79; -78 .. 89; 79 .. 90; 91 .. 80; 169 .. 158; -513 .. -502; -298 .. -287; 28 .. 39; 40 .. 29; 85 .. 96; -1384 .. -1395; -1009 .. -1020; -739 .. -750; -589 .. -578; -265 .. -276; -491 .. -502; -409 .. -420; -377 .. -366; -175 .. -164; -93 .. -82; -187 .. -176; -175 .. -186; 8 .. 19; 20 .. 9; -270 .. -259; -258 .. -269; -28 .. 39; -72 .. 83; -53 .. -42
Scores, q: 0.92 0.84 0.88 0.83 0.89 0.91 0.82 0.80 0.91 0.93 0.83 0.85 0.81 0.81 0.81 0.83 0.86 0.93 0.82 0.80 0.83 0.86 0.99 1.00 0.89 0.93 0.81 0.84 0.92 0.83 0.87
Scores of context, d: 2.92; 3.17; 2.03; 4.11; 3.53; 4.39; 4.91; 3.01; 4.21; 2.22; 1.18
Positions of PCR primers: -201 -> +96 <-; -27 -> +313 <-; -122 -> +210 <-; -404 -> -143 <-; -667 -> -330 <-; -211 -> +88 <-; -137 -> +123 <-; -296 -> +14 <-

[Figure: cell cycle phases G1, G1/S, S, G2 with the promoter groups G1/S-growth and G1/S-cycle.]

Results of selection of specific combinations of sites that distinguish G1/S cycle and G1/S growth promoters (microarray data).

a)
Relative importance   Cut-off value q_cut-off^(k)   Matrix AC   Matrix ID
 0.141420             0.923077                      M10009      V$E2F_19
 0.389941             0.947434                      M00175      V$AP4_Q5
 0.905325             0.838106                      M00088      V$IK3_01
-0.595259             0.856055                      M00098      V$PAX2_01
-0.982593             0.997639                      M00253      V$CAP_01
-0.814943             0.734697                      M00137      V$OCT1_03

E2F and a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA binding domain is similar in structure to those of E2F and Myc, the main cell cycle regulators. IK3 (Ik-1 ... Ik-5) belongs to a family of zinc finger TFs that play a role in the development of lymphocytes. Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-3 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of the G1/S growth promoters. As for the Cap site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.

b) Histogram of G1/S cycle vs. G1/S growth
[Histogram: No of obs (0 to 5) against site combination score (x-axis from -1,8 to 1,6).]

[Figure: Jun and Fos on the AP-1 site TGASTCA; AP-1 and NFAT.]

[Figure: human TNF promoter, positions -107 to -74. Labels: AP-1, mast cells; NFAT, T-cells; NF-kB, dendritic cells; VDR, AP-1, C/EBP; T-cells + ?]

Fuzzy puzzle hypothesis of the multipurpose structure of the eukaryotic promoters: coding multiple regulatory messages in the same DNA sequence.
[Figure: A, B, C and D, E, F: two sets of TF; 1, 2: two sites in DNA; BC: basal complex.]
There's More Than One Way To Do It (convergent evolution)

AXX list of genes

RefSeq; LocusLink; symbol; synonyms; description
NM_002421; 4312; MMP1; CLG, CN2; matrix metalloproteinase 1 (interstitial collagenase)
NM_004530; 4313; MMP2; CLG4, CLG4A; matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase)
NM_000611; 966; CD59; MSK21, MIC11, MIN2, MIN1, MIN3; CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
NM_001972; 1991; ELA2; -; elastase 2, neutrophil
NM_005317; 3004; GZMM; LMET1, MET1; granzyme M (lymphocyte met-ase 1)
NM_005532; 3429; IFI27; P27; interferon, alpha-inducible protein 27
NM_001548; 3434; IFIT1; GARG-16, IFNAI1, G10P1, IFI56; interferon-induced protein with tetratricopeptide repeats 1
NM_000565; 3570; IL6R; -; interleukin 6 receptor
NM_001565; 3627; SCYB10; -; chemokine (C-X-C motif) ligand 10
NM_001572; 3665; IRF7; IRF-7A; interferon regulatory factor 7
NM_005564; 3934; LCN2; NGAL; lipocalin 2 (oncogene 24p3)
NM_005567; 3959; LGALS3BP; 90K, MAC-2-BP; lectin, galactoside-binding, soluble, 3 binding protein
NM_002422; 4314; MMP3; STMY, STMY1; matrix metalloproteinase 3 (stromelysin 1, progelatinase)
NM_002423; 4316; MMP7; MPSL1, PUMP-1; matrix metalloproteinase 7 (matrilysin, uterine)
NM_004994; 4318; MMP9; CLG4B; matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase)
NM_004995; 4323; MMP14; MT1-MMP; matrix metalloproteinase 14 (membrane-inserted)
NM_002428; 4324; MMP15; MT2-MMP; matrix metalloproteinase 15 (membrane-inserted)
NM_002534; 4938; OAS1; IFI-4, OIASI, OIAS; 2',5'-oligoadenylate synthetase 1 (40-46 kD)
NM_002787; 5683; PSMA2; -; proteasome (prosome, macropain) subunit, alpha type, 2
NM_004586; 6197; RPS6KA3; -; ribosomal protein S6 kinase, 90kD, polypeptide 3
NM_007315; 6772; STAT1; STAT91; signal transducer and activator of transcription 1, 91kD
NM_003254; 7076; TIMP1; CLGI, EPO, TIMP; tissue inhibitor of metalloproteinase 1 (erythroid potentiating activity, collagenase inhibitor)
NM_003255; 7077; TIMP2; -; tissue inhibitor of metalloproteinase 2
NM_000362; 7078; TIMP3; SFD; tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory)
NM_003684; 8569; MKNK1; MNK1; MAP kinase-interacting serine/threonine kinase 1
NM_006417; 10561; IFI44; p44, MTAP44; interferon-induced protein 44

Extract promoters using TRANSGENOME

AXX promoter set

>ELA2 elastase 2, neutrophil; chrom=19p13.3; LocusLink=1991; 15-AUG-2002;length=1200

ggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactga ggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccg tatcacagggccctgggtaaactgaggcaggtgacacagctgcatgtggccggtatcacggggccctggataaacagagg caggcgacacagctgcatgtggccggtatcacggggccctgggtaaactgaggcaggcgaggccacccccatcaagtccc tcaggtctaggtttggcaggtttggcaaaaacacagcaacgctcggttaaatctgaatttcgggtaagtatatcctgggc ctcatttggaagagacttagattaaaaaaaaaacgtcgagaccagcccggccaacacggtgaaaccccgtctctactaaa aatacaaaaaattagccaggcgcagtggctcacgcctgtgatcccagcactctgggaggctgaggcaggcggatcacccg aggtcagatgttcaagaccagcctggccgacagggcgaaacactgtctctactacaaatacaaaaattagccgggagtgg tggcaggtgcctgtaatctcagctattcaggaggctgaggcaggagaatcacttgaacctgggaggcggaggttgccgtg agccgggatcacgccaccgcactccagcctgggcgatagagcaagactctgtctccaaaaaaataaattaaaaaacccac attgattatctgacatttgaatgcgattgtgcatcctgaattttgtctggaggccccacccgagccaatccagcgtcttg tcccccttctcccccttttcatcaacgccctgtgccaggggagaggaagtggagggcgctggccggccgtggggcaatgc aacggcctcccagcacagggctataagaggagccgggcgggcacggaggggcagagaccccggagccccagccccaccat gaccctcggccgccgactcgcgtgtcttttcctcgcctgtgtcctgccggccttgctgctggggggtgagtttttgagtc caacctcccgctgctccctctgtcccgggttctgttcccacctctccatagagggccccaccagtgtgggtccctcatcc >MMP3 matrix metalloproteinase 3 (stromelysin 1, progelatinase); chrom=11q22.3; LocusLink=4314; 15-AUG-2002;length=1200 aaagttttacaaaatgtcttcctctgaatatgtttagagtcttgcattcaagcatttattatacaccaataatgtgagca acactttacttgacaaagaaacagaaaagaaaggaaaggaagaaaacagaagagcatgaagagaaaatttaggatggatt ctgttcttcaacttcaaagcatctgctaatttgaatttagggaggaggggaaaaggttgaaagagaataagacatgtgta gaagacaaggacagagagaatttcagtccggtaagcaatgtaattcatttcagttctacaactatttatggagcagctac gtgggcccatcacccattaataaattggttacagaattaaaaccaacccaaagggaatatacttccttctttttcacaga ccctctttgttctattctgcccatgaggttttcctcctcaagaaccagcaaatccaacgacagtcaatagcaggcattac aaatcagattcagaaaaataaatcaccccttctaaatttcttctagatattatcttttatgttttgagtataattgtata tagtatagactatagctatgtatgtacactttccacttacatcttttatttgcttttataatgtctttcttaaaataaaa 
ctgcttttagaagttctgcacaattctgatttttaccaagtcaacctacttcttctctcaaaaggacaaacataaattgt ctagtgaattccagtcaatttttccagaagaaaaaaaatgctccagttttctcctctaccaagacaggaagcacttcctg gagattaatcactgtgttgccttgcaaaattgggaaggttgagagaaattagtaaagtaggttgtatcatcctactttga atttggaatgtttggaaatggtcctgctgccatttggatgaaagcaaggatgagtcaagctgcgggtgatccaaacaaac actgtcactctttaaaagctgcgctcccgaggttggacctacaaggaggcaggcaagacagcaaggcatagagacaacat agagctaagtaaagccagtggaaatgaagagtcttccaatcctactgttgctgtgcgtggcagtttgctcagcctatcca ttggatggagctgcaaggggtgaggacaccagcatgaaccttgttcaggtaattaacactaactgacctggccaggtggg >IL6R interleukin 6 receptor; chrom=1; LocusLink=3570; 15-AUG-2002;length=1200 ttctctccttcctttccttccttcccctctatccctccttccctccctccctccctcctcccttccttttctttctttct tttctttttttttttttctttccagacagggtctcactgtcatccaggctggagtagcagcccccaatcacggctcactg taccctggatctcccggactcaagcaattttcccacctcagcttccctagtagctgggactataggtgtgtaccaccaca cccagctaatttttaaatttttttatagaaatgggggtctcactttgttacacaggctggtctagaattcctggactgaa gcaatccacccacccggctctcccaaagtgttggggttacaggcgtgagccactgcccctggtgttagtgtctgtctgtc aagtcaggagggcagccatgaacgttctgatgtctactgagcacgtgtggcccagaccgtgtgtcaggtgtttaggtgcc atccacagaaccttcctaataaccctgggcagcataggctttcttatctctgacagatgaggaaatggagactcagattc tgaaccgaagtcacagacacagtagatggtaggtctaaatggggacccaggtctatctgactgcaaagtccaaaccgttt

ccttgcctctgctgcagcctgcgaggagcagctgggcagaaagactgtgcctttacggtggtgagtcttccgatgcccaa gcctcaccccagaccgatgaaatcagaatctctggagacccgacccagacattggtgggttttagggctcctggctgatt

Composite module found in the AXX promoters

Importance  Core cut-off  Matr. cut-off  AC      Matrix      Factor
---------------------------------------------------------------------------------------------
0.917751    0.877000      0.930000       M00062  V$IRF1_01   Interferon regulatory factor 1
0.323077    1.000000      0.948000       M00339  V$ETS1_B    Ets factors
0.640828    0.989000      0.982000       M00199  V$AP1_C     AP-1
0.276923    0.840000      0.853000       M00037  V$NFE2_01   NF-E2, an erythroid-specific factor
1.000000    0.756000      0.760000       M00481  V$AR_01     Androgen receptor
0.159172    0.869000      0.866000       M00699  V$ICSBP_Q6  Interferon Consensus Sequence binding protein

[Figure: Histogram (tt1.STA 2v*188c) of VAR1. y-axis: percent of obs (0% to 100%). Bins: <= ,423; (,423;,847]; (,847;1,27]; (1,27;1,694]; (1,694;2,117]; > 2,117. Fitted curve: y = 13 * 0,42348 * normal (x; 1,503956; 0,895746)]

Sites in the AXX promoter set: Yes

[Table: site scores for promoters 0-24 in the columns V$IRF1_01, V$ETS1_B, V$AP1_C, V$NFE2_01, V$AR_01, V$ICSBP_Q6]

Promoter  Score
0         0.78964
1         1.50025
2         0.77200
3         1.68100
4         1.59327
5         2.52852
6         0.63057
7         1.85648
8         2.59763
9         1.78836
10        2.44492
11        0.78000
12        1.49608
13        0.00000
14        2.33563
15        0.80200
16        2.08100
17        2.00731
18        1.87015
19        0.76000
20        2.54087
21        0.76500
22        0.54803
23        1.87725
24        0.00000

Sites in the other human promoters: Not

[Table: promoters 0-15 in the columns V$IRF1_01, V$ETS1_B, V$AP1_C, V$NFE2_01, V$AR_01, V$ICSBP_Q6; score 0.00000 for each of the 16 promoters]

ELA2      elastase 2, neutrophil
MMP3      matrix metalloproteinase 3
IL6R      interleukin 6 receptor
MMP2      matrix metalloproteinase 2
OAS1      2',5'-oligoadenylate synthetase 1
MMP1      matrix metalloproteinase 1
TIMP1     tissue inhibitor of metalloproteinas
STAT1     signal transducer and activator of t
MMP9      matrix metalloproteinase 9
MMP15     matrix metalloproteinase 15
MMP7      matrix metalloproteinase 7
MMP14     matrix metalloproteinase 14
CD59      CD59 antigen p18-20
LCN2      lipocalin 2 (oncogene 24p3)
GZMM      granzyme M (lymphocyte met-ase 1)
IFI27     interferon, alpha-inducible protein
TIMP3     tissue inhibitor of metalloproteinas
IFIT1     interferon-induced protein with tetr
IFI44     interferon-induced protein 44
MKNK1     MAP kinase-interacting serine/threon
IRF7      interferon regulatory factor 7
TIMP2     tissue inhibitor of metalloproteinas
LGALS3BP  lectin, galactoside-binding, solu
SCYB10
PSMA2

InsR: Insulin pathway?

Signaling network analysis

[Figure: Part of the insulin signaling network in TRANSPATH (Insulin, InsR, STAT1, Ras)]

[Figure: AhR targets. Gene expression, Log(Experiment/Control)]
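The composite module for the AXX promoters above gives each matrix an importance weight and a matrix cut-off. A minimal Python sketch of how such a module could be scored on one promoter follows. It assumes the score is a weighted sum of the match scores that reach the matrix cut-off, and that the cut-offs pair with the matrices in the order listed. This is an illustration, not necessarily the exact procedure of the original tool. The function name `module_score` is hypothetical. The assumption is at least consistent with the listed scores: a single V$AP1_C site scoring 0.984 gives 0.640828 * 0.984, about 0.63057, a value that appears among the scores of the AXX promoter set.

```python
# Importance weight and matrix cut-off per matrix, as listed for the
# AXX composite module (pairing by listed order is an assumption).
MODULE = {
    "V$IRF1_01": (0.917751, 0.930),
    "V$ETS1_B": (0.323077, 0.948),
    "V$AP1_C": (0.640828, 0.982),
    "V$NFE2_01": (0.276923, 0.853),
    "V$AR_01": (1.000000, 0.760),
    "V$ICSBP_Q6": (0.159172, 0.866),
}


def module_score(site_scores):
    """Weighted sum of match scores for one promoter.

    site_scores maps a matrix name to the list of match scores found
    in the promoter. Scores below the matrix cut-off are ignored.
    (Hypothetical helper; the weighted-sum form is an assumption.)
    """
    total = 0.0
    for matrix, scores in site_scores.items():
        importance, cutoff = MODULE[matrix]
        total += importance * sum(s for s in scores if s >= cutoff)
    return total


if __name__ == "__main__":
    # One AP-1 site at 0.984
    print(round(module_score({"V$AP1_C": [0.984]}), 5))
    # One Ets site at 0.948 plus one NF-E2 site at 0.873
    print(round(module_score({"V$ETS1_B": [0.948], "V$NFE2_01": [0.873]}), 5))
```

A promoter without any site above the cut-offs scores 0 under this sketch, which matches the 0.00000 rows shown for the promoters outside the AXX set.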

Composite model correlates with the expression level, log(Experiment/Control)

[Figure: Promoter map from -1000 to +1000 around the TSS with V$AHRARNT_01 and V$AHR_Q5 sites]

S41 distance = 0.417599 D2:0.658627 SIG:0.000000 MIN_LENGTH 300 0.000000

Importance  Core cut-off  Matr. cut-off  AC      Matrix
-----------------------------------------------------------------
 3.581248   1.000000      0.933000       M00026  V$AHR_Q5
 2.942371   1.000000      0.917000       M00639  V$HNF6_Q6
 0.798865   0.844000      0.900000       M00220  V$SREBP1_01
 0.409376   0.962000      0.926000       M00173  V$AP1_Q2
 0.055716   0.959000      0.989000       M00726  V$USF2_Q6
-1.329975   1.000000      0.959000       M00235  V$AHRARNT_01
-0.713625   1.000000      0.918000       M00156  V$RORA1_01
-0.668375   0.903000      0.854000       M00201  V$CEBP_C

[Figure: Scatter plot of predicted expression against real expression, both axes from -4 to 10]

Composite module found in promoters of differentially expressed genes in liver of growth hormone-deficient mice (Sma1).

0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)
0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)

[Figure: Histogram, x-axis from -0.1 to 0.5, y-axis: No of obs (two scales, 0 to 40 and 0 to 450). Series: non-changed genes; differentially expressed genes (Sma1, Norm)]

The ArrayAnalyzer search upstream of the TFs identified growth hormone (GH) and receptor tyrosine kinases (RTK) as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).

4 TRANSPATH and tools, ArrayAnalyzer and PathwayBuilder

In the next step, the transcription factors found in the previous step are mapped onto the signaling network of TRANSPATH. If these factors are parts of the same cascades that were suggested in step 1, it becomes more probable that they are responsible for the coordinated gene regulation.

[Figure: Feedback loops in activating immune cells through NF-AT/AP-1. Levels: cytokines, chemokines; membrane receptors; adaptor proteins; PI3K; Ras, Raf; Calcineurin, Ca2+ binding proteins; ERK, JNK, MAPK; NF-ATs; Jun, Fos; NF-AT/Jun:Fos. Legend: groups that are statistically enriched in potential target genes for Jun:Fos and NFATs (as shown in the table above); other groups that contain potential target genes for Jun:Fos and NFATs.]
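The mapping step described above, in which transcription factors are placed on a signaling network and the search then moves upstream to shared key molecules, can be sketched as a breadth-first search over reversed edges. The sketch below is only an illustration under stated assumptions. The edge list is a hypothetical toy network loosely named after the cascade levels in the feedback-loop figure, not TRANSPATH content. The function names `upstream` and `common_upstream` are invented here and are not part of ArrayAnalyzer.

```python
from collections import deque

# Toy directed signaling edges (source -> targets). Hypothetical example
# data, not taken from TRANSPATH.
EDGES = {
    "receptor": ["adaptor"],
    "adaptor": ["Ras", "Calcineurin"],
    "Ras": ["Raf"],
    "Raf": ["ERK", "JNK"],
    "ERK": ["Fos"],
    "JNK": ["Jun"],
    "Calcineurin": ["NF-AT"],
}


def upstream(node, edges, max_steps):
    """Return {molecule: distance} for everything upstream of node."""
    # Build the reversed graph: target -> set of direct regulators.
    parents = {}
    for src, targets in edges.items():
        for tgt in targets:
            parents.setdefault(tgt, set()).add(src)
    found = {}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == max_steps:
            continue  # do not walk further than max_steps
        for parent in parents.get(current, ()):
            if parent not in found:
                found[parent] = dist + 1
                queue.append((parent, dist + 1))
    return found


def common_upstream(tfs, edges, max_steps=5):
    """Molecules upstream of every TF, closest (summed distance) first."""
    maps = [upstream(tf, edges, max_steps) for tf in tfs]
    shared = set(maps[0]).intersection(*maps[1:])
    return sorted(shared, key=lambda n: (sum(m[n] for m in maps), n))


if __name__ == "__main__":
    print(common_upstream(["Fos", "Jun", "NF-AT"], EDGES))
    print(common_upstream(["Fos", "Jun"], EDGES))
```

Ranking by summed distance is one simple choice among several. A real analysis would also have to weigh how many other targets each candidate molecule reaches, which this sketch ignores.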

[Figure: Network controlling S phase entry in response to a proliferative signal. Nodes (genes and proteins): c-ras, Ras, htf9a, RanBP1, Ran, Raf, MEK, erk-1, Erk-1, JNK, c-myc, c-Myc, B-myb, B-Myb, c-ets, c-Ets, c-fos, c-Fos, c-jun, c-Jun, cycD1, cycD3, cycE, cdk2, cdk4, cdc2, rb1, pRB, e2f-1, E2F-1, DP-1. Edges are marked + or _. Target groups: enzymes of nucleotide metabolism (dhfr, tk, cad, ada, odc, ts); factors and enzymes of replication (DNA pol, cdc6, ori1, cdc21, cdc46, p1 co-factor); histones (H1, H2B-143, H3-143).]

Nucleolines S-phase entry TFBS identification via pattern search Phylogenetic footprint of promoter regions of nucleolin genes 1 <===========V$CREB_02(0.85) ============================================================================= 2 <=======V$CREB_01(0.82) MMNUCLEO GGCCCGCTCATCAGCCCGAGGGAACCCTAGG--CC------TTCCGGCGTTCT------423 MMNUCLEO TCTCCCCAC-CACACCAGGAAGTCACCTCTCTCA----------ACCTG---GAGTTATA 225 RNNUCIA1 GGCCCACTAAACGGCCCGAATGAACTCTAGG--CC------TTCCGGCGCTCT------435 1 <===========V$CREB_02(0.85) CSNUCLEO GGCC-GCGAGCTGGCCCCAGTGG-CTCTAGG--CCCTCAACTTCCGGCGCTCTCCGGCTC 450 2 <=======V$CREB_01(0.82) HSNUCLEO TGCCTCCAAAAGGGCCAACGGGAACTCCGCGGTCCCTGAACTTCCGGTGCTGGAGG---A 448 RNNUCIA1 TCTCCCACCACACACCAGGAAGTCACCTCTCTGA----------ACCTG---GAGTTATA 221 *** * *** * * * * ** ****** * * 1 <===========V$CREB_02(0.85) =============================================================================

2 <=======V$CREB_01(0.82) MMNUCLEO -TCAGCAGGACCACGCGGCG---------------------------------------442 CSNUCLEO CCTCC-AGCACACACCAGGAAGTCACCTCTCCGAGACCGTCCCCATCAG---GAGTTAAA 229 RNNUCIA1 -CCAGCTCTTCAGCGCGGCGAACGTTCTAGGCCCCTGAGAAGTCCACCGGGAGGCGCAGG 494 1 <===============V$TH1E47_01(0.85) CSNUCLEO CTCAGCGGGAACGCGCGGCGAGCAGTTGAGGCCGCCGCGGATTCCAACGGGTTGGGGACG 510 HSNUCLEO TGGCCCTGT-GAGGCCAGAAAGTTACTTCTCCGAGGCCAGTTCCCCATGTCTGAGAAATA 229 HSNUCLEO CTCCTCGCTCCAGGGCCACCAGGAGCCGCGGC---------------------GTGAGTG 487 ** * **** **** ** **** * * *** * * * * ** * ============================================================================= ============================================================================= MMNUCLEO --------------GGGGGAAA-----GCACCGAGAAACGCCCAGACCACCTGAGCATCG 483 1 <==========V$DELTAEF1_01(0.82) RNNUCIA1 TTTCCGCTACGCGAGGGGGAAA-----TCCCCGAGAAATGCCCAGACCACCTAAGCACAG 549 MMNUCLEO CCTACCG-CGAGAGGTCACCGACATTACATGGATCGCTTGTGCACTGCTCGTA--CACAC 282 CSNUCLEO

TTCGC----AGCGCGGGGGATGCTCGGGCCACCCACCACCCCCCCACCCCCCCGGCCACG 566 1 <======== ==V$DELTAEF1_01(0.87) HSNUCLEO CGTGCCGGAACCGAGGGCGGGG-----TCTCTGAGGAACTCCAAGGCTGCCCAAGCCTAC 542 RNNUCIA1 CCTACCG-CGTGAGGTCA--GAGATTAAATGGACTGTTTGTGCACTGCTCACA--CACAC 276 *** * * * ** * ** ** 1 <======== ==V$DELTAEF1_01(0.84) ============================================================================= CSNUCLEO TCTACCG-CGCGAGGTTG--GACATTAAGCGAGCTGTTTGAGCACTGCACACAGGCGCGC 286 MMNUCLEO CCGCCC--------ATGCTGCCTCGGAACACCTGAGGGAATCCGGGCCACGCCGCCACCT 535 1 <========= =V$DELTAEF1_01(0.84) RNNUCIA1 ACGTCC--------ATGCGGCGTACGGATACCTGAGGGAATCCGGGCCATACCGCCACCT 601 HSNUCLEO TCTCCCAACTTGAGGTTCT-GTGGGGTAGGGGAGGGTTCGTGACTTTCTCACAGAAAACC 288 CSNUCLEO AGGCCCGGAGCTCCAGGTAGCAGTGCAGCACTAGGCGGCGTCCGGGCCACGCCGCCCAAT 626 ** ** * ***** * * * * * * * * * * *

HSNUCLEO GGACCC---------AGCCACATTGGCGAACC----GGAGACCGCCCGATTCCACCACC588 ============================================================================= ** * * ** ** *** * * ** ** 1 <=======V$NKX25_02(0.84) 2 =========>V$CETS1P54_01(0.87)============================================================================= 1 <=======V$E2F_02(1.00) MMNUCLEO ACACACGCAC------------AACTGCTTTTATTAGGAGCT----CTCAGGAAAGCGGG 326 MMNUCLEO ACCCGCG--CCTCACACACAAGCCGCGCCAAACTCGCCCGTCCCACTGCGCAGGCGTGGG 593 1 <=======V$NKX25_02(0.84) 1 <=======V$E2F_02(1.00) 2 =========>V$CETS1P54_01(0.87) RNNUCIA1 ACTCGCG--CCTCACTC--AAGCCGCGCCAAACTCGCGCGTTTCACTGCGCAGGCGTGTA 657 RNNUCIA1 ACACACGCGCGCGCGCGCGCGAAATTGCTTTTATTAGGAGCT----CTCAGGAAAGTGGT 332 1 <=======V$E2F_02(1.00) 1 =======>V$NKX25_02(0.82) TCCCCCGAGCCCCTTCCACAAGCCGCGCCAAACGGGTCTG---CACCGCGCAGGCG--GC 681 2 <==========V$DELTAEF1_01(0.81)CSNUCLEO

1 <=======V$E2F_02(1.00) 3 =========>V$CETS1P54_01(0.84) HSNUCLEO -CCCGCGCTCCCCTCAC--AGCCGGCGCCAAAAACGCCAGTCCCACGACGCAGGC----640 CSNUCLEO ACACACGCACGC----------AACTGCCTTTATTGGGAGCTGTCTCTCAGGAGAACAGC 336 * * ** ** * * * * ******** * * *** ******* 1 <=======V$NKX25_02(0.83) 2 <==========V$DELTAEF1_01(0.81) 3 =========>V$CETS1P54_01(0.86) HSNUCLEO TCGTACAGACCC-------CGCCACTGCCTTTATTAACAGCT----CTCAGGAGACTGCC 337 * ** * * *** ****** **** ******* * HSNUCLEO - Homo sapiens; ============================================================================= CSNUCLEO - Cricetulus griseus; MMNUCLEO GACTCGCATCA---TAGCCAAG----AAGCCGTTCGCGAC-TCCGCGGAGAACAGGCCGA 378 RNNUCIA1 GGCTCGCATCAGGCTACCACAGCC--AAGAGGACCGCCACCTCTACCGAGGGCAGGCCAA 390 MMNUCLEO - Mus musculus; CSNUCLEO GGCCCGCGGCGCAACACTAGAGCCCCGGGATGTTCTCGGC-TCTGCCGAGGGCAG-CCGA 394

RNNUCIA1 Rattus norvegicus HSNUCLEO TGCAGGAGGGGGGTCGCTCCGGCC---CCATGCTCGCGGG-CAAGCAGGGATAAG--CTG 391 * * * * * * * * * ** * A T G C 1) A T 2) A T 3) A T G G G C C C

Comparison of four pattern discovery programs (Kernel, MEME, CONSENSUS, GIBBS) on sets of simulated sequences with implanted TF binding sites for one matrix.

[Figure: y-axis: the averaged sum of squared differences between the revealed matrix and the original one (0,000 to 1,000); x-axis: the probability of the consensus nucleotide in each position of the matrix (0,65 to 0,95). Curves: Kernel, MEME, CONSENSUS, GIBBS.]

Table 1. Comparison of the 3 programs that perform best at low values of this probability.

               0,65   0,7
Kernel         0,205  0,165
MULTIPROFILER  0,208  0,255
PROJECTION     0,260  0,304

Three mechanisms of biopolymer evolution

Gradual evolution by fixation of multiple substitutions (protein functional centres)
Edited biopolymer by fixation of a small number of substitutions (protein folding)
Evolution at once by fixation of single substitutions (regulatory regions of eukaryotic genes)

Thank you!
www.biobase.de
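The accuracy measure used in the pattern discovery comparison above, the averaged sum of squared differences between the revealed matrix and the original one, can be written out in a few lines. The Python sketch below assumes that the average is taken over matrix positions, and that the original matrix puts the stated probability on the consensus nucleotide and spreads the rest evenly over the other three. The slides do not spell out either detail, so both are assumptions. The helper names `consensus_matrix` and `averaged_squared_difference` are hypothetical.

```python
BASES = "ACGT"


def consensus_matrix(consensus, p):
    """Frequency matrix with probability p on the consensus base and
    (1 - p) / 3 on each other base (assumed form of the implanted matrix)."""
    return [[p if b == base else (1.0 - p) / 3.0 for b in BASES]
            for base in consensus]


def averaged_squared_difference(found, original):
    """Sum of squared frequency differences, averaged over positions."""
    if len(found) != len(original):
        raise ValueError("matrices must have the same number of positions")
    total = sum((f - o) ** 2
                for found_col, orig_col in zip(found, original)
                for f, o in zip(found_col, orig_col))
    return total / len(original)


if __name__ == "__main__":
    original = consensus_matrix("TGACTCA", 0.70)
    revealed = consensus_matrix("TGACTCA", 0.65)  # a slightly blurred recovery
    print(averaged_squared_difference(revealed, original))
```

A perfectly recovered matrix gives 0 under this measure, so lower values in the comparison mean a closer match to the implanted matrix.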
