# Image and Multimedia Mining

CMU SCS Image and Multimedia Mining Christos Faloutsos CMU CMU SCS Outline Problem definition and motivation Indexing and feature extraction Features for images (DWT, morphology, etc) Tools and case studies: PCA, ICA, Random Walks Conclusions MLD school 06 Copyright: C. Faloutsos, 2006 2 CMU SCS Problem #1: Similarity search

[Bob Murphy] Sub-cellular protein localization patterns MLD school 06 Copyright: C. Faloutsos, 2006 3 CMU SCS Problem #1: Similarity search Mitoch. Nucleolar Actin Endosomal Tubulin MLD school 06 Copyright: C. Faloutsos, 2006 4 CMU SCS

\$price Problem#1: Similarity search: \$price 1 365 day \$price 1 365 day distance function? 1 365 day MLD school 06

Copyright: C. Faloutsos, 2006 5 CMU SCS Problem #2: Mining [Bob Murphy] Sub-cellular protein localization patterns MLD school 06 Copyright: C. Faloutsos, 2006 6 CMU SCS \$price Problem#2: Mining \$price 1 365

day \$price 1 365 day distance function? 1 365 day MLD school 06 Copyright: C. Faloutsos, 2006 7 CMU SCS Problem definition Given: many sequences (images, video-clips)

x1 , x2 , , xt , (y1, y2, , yt, ) Find similar sequences / images / video-clips patterns; clusters; outliers MLD school 06 Copyright: C. Faloutsos, 2006 8 CMU SCS Motivation - applications: Images medical/biological images (training; research)

biometrics/security satellite image analysis photo collections museum images logos ... MLD school 06 Copyright: C. Faloutsos, 2006 9 CMU SCS Motivation - applications: Video surveillance TV news processing (eg., summarization www.informedia.cs.cmu.edu) 3-d images (medical / biological) 3-d and 4-d datasets (x,y,z, time, temp., humidity etc) - weather/environment monitoring MLD school 06

Copyright: C. Faloutsos, 2006 10 CMU SCS Motivation - applications: Time sequences Financial, sales, economic series Medical (ECGs/EKGs, monitoring) civil infrastructure; automobile traffic monitoring MLD school 06 Copyright: C. Faloutsos, 2006 11 CMU SCS Motivation chlorine concentrations Phase 1

Phase 2 Phase 3 : : : : : : : : : : : : water distribution network (w/ Jeanne Vanbriesen, CMU) MLD school 06 Copyright: C. Faloutsos, 2006 12

CMU SCS Motivation - Applications (contd) Weather, environment/anti-pollution computer network traffic monitoring data-center traffic monitoring (self-* system at CMU) MLD school 06 Copyright: C. Faloutsos, 2006 13 CMU SCS Outline Problem definition and motivation Indexing and feature extraction Features for images (DWT, morphology, etc) Tools and case studies: PCA, ICA, Random Walks Conclusions

MLD school 06 Copyright: C. Faloutsos, 2006 14 CMU SCS Indexing Problem: given a set of time sequences, find the ones similar to a desirable query sequence MLD school 06 Copyright: C. Faloutsos, 2006 15 CMU SCS \$price \$price

1 365 day \$price 1 365 day distance function: by expert 1 365 day MLD school 06 Copyright: C. Faloutsos, 2006 16 CMU SCS

Idea: GEMINI Eg., find stocks similar to MSFT Seq. scanning: too slow How to accelerate the search? [Faloutsos96] MLD school 06 Copyright: C. Faloutsos, 2006 17 CMU SCS GEMINI - Pictorially eg,. std S1 F(S1) 1 365 day

F(Sn) Sn eg, avg 1 MLD school 06 365 day Copyright: C. Faloutsos, 2006 18 CMU SCS GEMINI Solution: Quick-and-dirty' filter: extract n features (numbers, eg., avg., etc.) map into a point in n-d feature space organize points with off-the-shelf spatial access method (SAM) discard false alarms

MLD school 06 Copyright: C. Faloutsos, 2006 19 CMU SCS Examples of GEMINI Time sequences: DFT (up to 100 times faster) [SIGMOD94]; [Kanellakis+], [Mendelzon+] MLD school 06 Copyright: C. Faloutsos, 2006 20 CMU SCS Examples of GEMINI Also, on many other data types: Images (QBIC) [JIIS94] tumor-like shapes [VLDB96]

video [Informedia + S-R-trees] automobile part shapes [Kriegel+97] MLD school 06 Copyright: C. Faloutsos, 2006 21 CMU SCS Indexing - SAMs Q: How do Spatial Access Methods (SAMs) work? A: they group nearby points (or regions) together, on nearby disk pages, and answer spatial queries quickly (range queries, nearest neighbor queries etc) For example: MLD school 06 Copyright: C. Faloutsos, 2006 22

CMU SCS R-trees adv [Guttman84] eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> I disk page AC F B D MLD school 06 E G H J Copyright: C. Faloutsos, 2006

23 CMU SCS adv R-trees eg., w/ fanout 4: P1 P3 AC F B P2 D MLD school 06 E I G H

P4 J A B C D E Copyright: C. Faloutsos, 2006 H I J F G 24 CMU SCS adv R-trees eg., w/ fanout 4: P1 P3 AC

F B P2 D MLD school 06 E I G P1 P2 P3 P4 H P4 J A B C D E Copyright: C. Faloutsos, 2006 H I J

F G 25 CMU SCS adv R-trees - range search? P1 P3 AC F B P2 D MLD school 06 E I G P1 P2 P3 P4

H P4 J A B C D E Copyright: C. Faloutsos, 2006 H I J F G 26 CMU SCS adv R-trees - range search? P1 P3

AC F B P2 D MLD school 06 E I G P1 P2 P3 P4 H P4 J A B C D E Copyright: C. Faloutsos, 2006 H I

J F G 27 CMU SCS Conclusions Fast indexing: through GEMINI feature extraction and (off the shelf) Spatial Access Methods [Gaede+98] MLD school 06 Copyright: C. Faloutsos, 2006 28 CMU SCS Outline Problem definition and motivation Indexing and feature extraction Features for images (DWT, morphology,

etc) Tools and case studies: PCA, ICA, Random Walks Conclusions MLD school 06 Copyright: C. Faloutsos, 2006 29 CMU SCS Outline Problem definition and motivation Indexing and feature extraction DFT, DWT etc SVD/PCA MDS and FastMap ... MLD school 06 Copyright: C. Faloutsos, 2006

30 CMU SCS DFT and cousins very good for compressing real signals more details on DWT: later MLD school 06 Copyright: C. Faloutsos, 2006 31 CMU SCS DFT and stocks Fourier appx actual Dow Jones Industrial index, 6/18/200112/21/2001 12000.00

10000.00 8000.00 6000.00 4000.00 2000.00 0.00 1 11 21 31 MLD school 06 41 51 61 71 81

91 101 111 121 Copyright: C. Faloutsos, 2006 32 CMU SCS DFT and stocks Fourier appx actual 12000.00 10000.00 8000.00 6000.00 4000.00 2000.00 0.00 1 11

21 31 41 51 Log(ampl) 61 71 81 91 101 111 121 81 91 101 111 121 Series1 10000000

Dow Jones Industrial index, 6/18/200112/21/2001 just 3 DFT coefficients give very good approximation 1000000 100000 10000 1000 100 10 1 1 11 21 31 MLD school 06 41

51 61 71 freq Copyright: C. Faloutsos, 2006 33 CMU SCS Outline Problem definition and motivation Indexing and feature extraction DFT, DWT etc SVD/PCA MDS and FastMap ... MLD school 06

Copyright: C. Faloutsos, 2006 34 CMU SCS SVD THE optimal method for dimensionality reduction (under the Euclidean metric) MLD school 06 Copyright: C. Faloutsos, 2006 35 CMU SCS brightness SVD - Motivation area

MLD school 06 Copyright: C. Faloutsos, 2006 36 CMU SCS brightness SVD - Motivation area MLD school 06 Copyright: C. Faloutsos, 2006 37 CMU SCS SVD: gives best axis to project brightness

SVD dim. reduction v1 minimum RMS error MLD school 06 area Copyright: C. Faloutsos, 2006 38 CMU SCS Singular Value Decomposition (SVD) SVD (~LSI ~ KL ~ PCA ~ spectral analysis...) LSI: S. Dumais; M. Berry KL: eg, Duda+Hart PCA: eg., Jolliffe Details: [Press+], [Faloutsos96] MLD school 06

Copyright: C. Faloutsos, 2006 39 CMU SCS SVD Extremely useful tool (also behind PageRank/google and Kleinbergs algorithm for hubs and authorities) But may be slow: O(N * M * M) if N>M any approximate, faster method? MLD school 06 Copyright: C. Faloutsos, 2006 40 CMU SCS SVD shorcuts random projections (Johnson-Lindenstrauss

thm [Papadimitriou+ pods98]) MLD school 06 Copyright: C. Faloutsos, 2006 41 CMU SCS Random projections pick enough random directions (will be ~orthogonal, in high-d!!) distances are preserved probabilistically, within epsilon (also, use as a pre-processing step for SVD [Papadimitriou+ PODS98]) MLD school 06 Copyright: C. Faloutsos, 2006 42 CMU SCS

Outline Problem definition and motivation Indexing and feature extraction DFT, DWT etc SVD MDS and FastMap ... MLD school 06 Copyright: C. Faloutsos, 2006 43 CMU SCS MDS / FastMap but, what if we have NO points to start with? (eg. Time-warping distance) A: Multi-dimensional Scaling (MDS) ; FastMap

MLD school 06 Copyright: C. Faloutsos, 2006 44 CMU SCS MDS/FastMap O1 O2 O3 O4 O1 0 1 1

100 100 O2 1 0 1 100 100 O3 1 1 0 100 100 O4

100 100 100 0 1 O5 100 100 100 1 0 MLD school 06 O5 Copyright: C. Faloutsos, 2006 ~100 ~1 ~1 45 CMU SCS

MDS Multi Dimensional Scaling MLD school 06 Copyright: C. Faloutsos, 2006 46 CMU SCS FastMap Multi-dimensional scaling (MDS) can do that, but in O(N**2) time FastMap [Faloutsos+95] takes O(N) time MLD school 06 Copyright: C. Faloutsos, 2006 47 CMU SCS

FastMap: Application VideoTrails [Kobla+97] scene-cut detection (about 10% errors) MLD school 06 Copyright: C. Faloutsos, 2006 48 CMU SCS Conclusions - Practitioners guide Similarity search in time sequences 1) establish/choose distance (Euclidean, timewarping,) 2) extract features (SVD, DWT, MDS), and use an SAM (R-tree/variant) or a Metric Tree (Mtree) 2) for high intrinsic dimensionalities, consider sequential scan (it might win) MLD school 06 Copyright: C. Faloutsos, 2006 49

CMU SCS Books William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery: Numerical Recipes in C, Cambridge University Press, 1992, 2nd Edition. (Great description, intuition and code for SVD) C. Faloutsos: Searching Multimedia Databases by Content, Kluwer Academic Press, 1996 (introduction to SVD, and GEMINI) MLD school 06 Copyright: C. Faloutsos, 2006 50 CMU SCS References Agrawal, R., K.-I. Lin, et al. (Sept. 1995). Fast Similarity Search in the Presence of Noise, Scaling and Translation in Time-Series Databases. Proc. of VLDB, Zurich, Switzerland.

Babu, S. and J. Widom (2001). Continuous Queries over Data Streams. SIGMOD Record 30(3): 109-120. Breunig, M. M., H.-P. Kriegel, et al. (2000). LOF: Identifying Density-Based Local Outliers. SIGMOD Conference, Dallas, TX. Berry, Michael: http://www.cs.utk.edu/~lsi/ MLD school 06 Copyright: C. Faloutsos, 2006 51 CMU SCS References Ciaccia, P., M. Patella, et al. (1997). M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB. Foltz, P. W. and S. T. Dumais (Dec. 1992). Personalized Information Delivery: An Analysis of Information Filtering Methods. Comm. of ACM (CACM) 35(12): 5160. Guttman, A. (June 1984). R-Trees: A Dynamic Index Structure for Spatial Searching. Proc. ACM SIGMOD, Boston, Mass. MLD school 06

Copyright: C. Faloutsos, 2006 52 CMU SCS References Gaede, V. and O. Guenther (1998). Multidimensional Access Methods. Computing Surveys 30(2): 170-231. Gehrke, J. E., F. Korn, et al. (May 2001). On Computing Correlated Aggregates Over Continual Data Streams. ACM Sigmod, Santa Barbara, California. MLD school 06 Copyright: C. Faloutsos, 2006 53 CMU SCS References Gunopulos, D. and G. Das (2001). Time Series Similarity Measures and Time Series Indexing. SIGMOD

Conference, Santa Barbara, CA. Hatonen, K., M. Klemettinen, et al. (1996). Knowledge Discovery from Telecommunication Network Alarm Databases. ICDE, New Orleans, Louisiana. Jolliffe, I. T. (1986). Principal Component Analysis, Springer Verlag. MLD school 06 Copyright: C. Faloutsos, 2006 54 CMU SCS References Keogh, E. J., K. Chakrabarti, et al. (2001). Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. SIGMOD Conference, Santa Barbara, CA. Kobla, V., D. S. Doermann, et al. (Nov. 1997). VideoTrails: Representing and Visualizing Structure in Video Sequences. ACM Multimedia 97, Seattle, WA. MLD school 06

Copyright: C. Faloutsos, 2006 55 CMU SCS References Oppenheim, I. J., A. Jain, et al. (March 2002). A MEMS Ultrasonic Transducer for Resident Monitoring of Steel Structures. SPIE Smart Structures Conference SS05, San Diego. Papadimitriou, C. H., P. Raghavan, et al. (1998). Latent Semantic Indexing: A Probabilistic Analysis. PODS, Seattle, WA. Rabiner, L. and B.-H. Juang (1993). Fundamentals of Speech Recognition, Prentice Hall. MLD school 06 Copyright: C. Faloutsos, 2006 56 CMU SCS

References Traina, C., A. Traina, et al. (October 2000). Fast feature selection using the fractal dimension,. XV Brazilian Symposium on Databases (SBBD), Paraiba, Brazil. MLD school 06 Copyright: C. Faloutsos, 2006 57 CMU SCS References Dennis Shasha and Yunyue Zhu High Performance Discovery in Time Series: Techniques and Case Studies Springer 2004 Yunyue Zhu, Dennis Shasha ``StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time' VLDB, August, 2002. pp. 358-369. Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The Design of an Acquisitional Query Processor for Sensor Networks. SIGMOD, June 2003, San Diego, CA.

MLD school 06 Copyright: C. Faloutsos, 2006 58 CMU SCS Outline Problem definition and motivation Indexing and feature extraction Features for images (DWT, morphology, etc) Tools and case studies: PCA, ICA, Random Walks Conclusions MLD school 06 Copyright: C. Faloutsos, 2006 59 CMU SCS

How to extract features? Eg.: [Bob Murphy]: protein localization images ER giantin LAMP Mito Actin TfR MLD school 06 gpp130 Nucleolin Tubulin Copyright: C. Faloutsos, 2006 DNA

60 CMU SCS How to extract features? Eg.: ER giantin LAMP Mito Actin TfR MLD school 06 gpp130 area(?) ER Mit

Nucleolin Tubulin Copyright: C. Faloutsos, 2006 brightness(? DNA 61 CMU SCS (Natural) image features [Flickner+, 95] from IBMs QBIC: Color Shape Texture MLD school 06 Copyright: C. Faloutsos, 2006 62

CMU SCS (Natural) image features [Flickner+, 95] from IBMs QBIC: Color: average R/G/B; color histograms Shape Texture MLD school 06 Copyright: C. Faloutsos, 2006 63 CMU SCS Images - shapes distance function: Euclidean, on the area, perimeter, and 20 moments [QBIC, 95] Q: other features / distance functions? MLD school 06 Copyright: C. Faloutsos, 2006

64 CMU SCS Images - shapes (A1: turning angle) A2: wavelets A3: morphology: dilations/erosions ... MLD school 06 Copyright: C. Faloutsos, 2006 65 CMU SCS Images - shapes

angle (A1: turning angle) A2: wavelets A3: morphology: dilations/erosions ... MLD school 06 Copyright: C. Faloutsos, 2006 length 66 CMU SCS Wavelets - example

ttp://grail.cs.washington.edu/projects/query/ Wavelets achieve *great* compression: 20 MLD school 06 400 100 # coefficients Copyright: C. Faloutsos, 2006 16,000 67 CMU SCS Wavelets - intuition Edges (horizontal; vertical; diagonal) MLD school 06 Copyright: C. Faloutsos, 2006

68 CMU SCS Wavelets - intuition Edges (horizontal; vertical; diagonal) recurse MLD school 06 Copyright: C. Faloutsos, 2006 69 CMU SCS Wavelets - intuition Edges (horizontal; vertical; diagonal) http://www331.jpl.nasa.gov/public/ wave.html MLD school 06 Copyright: C. Faloutsos, 2006

70 CMU SCS Wavelets Many wavelet basis: Haar Daubechies (-4, -6, -20) Gabor ... MLD school 06 Copyright: C. Faloutsos, 2006 71 CMU SCS Gabor Function

We can extend the function to generate Gabor filters by rotating and dilating MLD school 06 Copyright: C. Faloutsos, 2006 72 CMU SCS Wavelets: Extremely useful Excellent compression / feature extraction, for natural images fast to compute ( O(N) ) MLD school 06 Copyright: C. Faloutsos, 2006 73 CMU SCS

Crush intro to Haar wavelets The simplest to code, explain and understand MLD school 06 Copyright: C. Faloutsos, 2006 74 CMU SCS Wavelets - DWT Similarly, DFT suffers on short-duration waves (eg., baritone, silence, soprano) value time MLD school 06 Copyright: C. Faloutsos, 2006 75 CMU SCS

Haar Wavelets subtract sum of left half from right half repeat recursively for quarters, eight-ths, ... MLD school 06 Copyright: C. Faloutsos, 2006 76 CMU SCS Wavelets - construction adv x0 x1 x2 x3 x4 x5 x6 x7 MLD school 06 Copyright: C. Faloutsos, 2006 77 CMU SCS

Wavelets - construction level 1 d1,0 MLD school 06 s1,0 d1,1 s1,1 + adv ....... x0 x1 x2 x3 x4 x5 x6 x7 Copyright: C. Faloutsos, 2006 78 CMU SCS Wavelets - construction level 2 d2,0 d1,0 MLD school 06

adv s2,0 s1,0 d1,1 s1,1 + ....... x0 x1 x2 x3 x4 x5 x6 x7 Copyright: C. Faloutsos, 2006 79 CMU SCS Wavelets - construction adv etc ... d2,0 d1,0 MLD school 06

s2,0 s1,0 d1,1 s1,1 + ....... x0 x1 x2 x3 x4 x5 x6 x7 Copyright: C. Faloutsos, 2006 80 CMU SCS Wavelets - construction Q: map each coefficient on the time-freq. plane d2,0 adv f s2,0

t d1,0 MLD school 06 s1,0 d1,1 s1,1 + ....... x0 x1 x2 x3 x4 x5 x6 x7 Copyright: C. Faloutsos, 2006 81 CMU SCS Wavelets - construction Q: map each coefficient on the time-freq. plane d2,0 adv f

s2,0 t d1,0 MLD school 06 s1,0 d1,1 s1,1 + ....... x0 x1 x2 x3 x4 x5 x6 x7 Copyright: C. Faloutsos, 2006 82 CMU SCS Haar wavelets - code #!/usr/bin/perl5 # expects a file with numbers # and prints the dwt transform # The number of time-ticks should be a power of 2 # USAGE

# haar.pl my @vals=(); my @smooth; # the smooth component of the signal my @diff; # the high-freq. component # collect the values into the array @val while(<>){ @vals = ( @vals , split ); } MLD school 06 adv my \$len = scalar(@vals); my \$half = int(\$len/2); while(\$half >= 1 ){ for(my \$i=0; \$i< \$half; \$i++){ \$diff [\$i] = (\$vals[2*\$i] - \$vals[2*\$i + 1] )/ sqrt(2); print "\t", \$diff[\$i]; \$smooth [\$i] = (\$vals[2*\$i] + \$vals[2*\$i + 1] )/ sqrt(2); } print "\n"; @vals = @smooth; \$half = int(\$half/2); }

print "\t", \$vals[0], "\n" ; # the final, smooth component Copyright: C. Faloutsos, 2006 83 CMU SCS Images - shapes (A1: turning angle) A2: wavelets A3: morphology: dilations/erosions ... MLD school 06 Copyright: C. Faloutsos, 2006 84

CMU SCS Other shape features Morphology (dilations, erosions, openings, closings) [Korn+, VLDB96] structuring element shape (B/W) R=1 MLD school 06 Copyright: C. Faloutsos, 2006 85 CMU SCS Other shape features Morphology (dilations, erosions, openings, closings) [Korn+, VLDB96] structuring element

R=0.5 R=1 shape R=2 MLD school 06 Copyright: C. Faloutsos, 2006 86 CMU SCS Other shape features Morphology (dilations, erosions, openings, closings) [Korn+, VLDB96] structuring element R=0.5 R=1 shape R=2

MLD school 06 Copyright: C. Faloutsos, 2006 87 CMU SCS Morphology: closing fill in small gaps very similar to alpha contours MLD school 06 Copyright: C. Faloutsos, 2006 88 CMU SCS Morphology: closing fill in small gaps closing, with R=1

MLD school 06 Copyright: C. Faloutsos, 2006 89 CMU SCS Morphology: opening closing, for the complement = trim small extremities MLD school 06 Copyright: C. Faloutsos, 2006 90 CMU SCS Morphology: opening closing, for the complement = trim small extremities opening

with R=1 MLD school 06 Copyright: C. Faloutsos, 2006 91 CMU SCS Morphology Closing: fills in gaps Opening: trims extremities All wrt a structuring element: MLD school 06 Copyright: C. Faloutsos, 2006 92 CMU SCS Morphology Features: areas of openings (R=1, 2, ) and closings

MLD school 06 Copyright: C. Faloutsos, 2006 93 CMU SCS Morphology resulting areas: pattern spectrum translation ( and rotation) independent As described: on b/w images can be extended to grayscale ones (eg., by thresholding) MLD school 06 Copyright: C. Faloutsos, 2006 94 CMU SCS

Conclusions Features for color, shape, texture Shape: wavelets; math. morphology texture: wavelets we didnt elaborate: color: color histograms; avg R/G/B and many more MLD school 06 Copyright: C. Faloutsos, 2006 95 CMU SCS References Faloutsos, C., R. Barber, et al. (July 1994). Efficient and Effective Querying by Image Content. J. of Intelligent Information Systems 3(3/4): 231-262. Faloutsos, C. and K.-I. D. Lin (May 1995). FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. Proc. of ACM-SIGMOD, San Jose, CA. Faloutsos, C., M. Ranganathan, et al. (May 25-27, 1994). Fast Subsequence Matching in Time-Series Databases.

Proc. ACM SIGMOD, Minneapolis, MN. MLD school 06 Copyright: C. Faloutsos, 2006 96 CMU SCS References Christos Faloutsos, Searching Multimedia Databases by Content, Kluwer 1996 MLD school 06 Copyright: C. Faloutsos, 2006 97 CMU SCS References Flickner, M., H. Sawhney, et al. (Sept. 1995). Query by Image and Video Content: The QBIC System. IEEE Computer 28(9): 23-32.

Goldin, D. Q. and P. C. Kanellakis (Sept. 19-22, 1995). On Similarity Queries for Time-Series Data: Constraint Specification and Implementation (CP95), Cassis, France. MLD school 06 Copyright: C. Faloutsos, 2006 98 CMU SCS References Charles E. Jacobs, Adam Finkelstein, and David H. Salesin. Fast Multiresolution Image Querying SIGGRAPH '95, pages 277-286. ACM, New York, 1995. Flip Korn, Nikolaos Sidiropoulos, Christos Faloutsos, Eliot Siegel, Zenon Protopapas: Fast Nearest Neighbor Search in Medical Image Databases. VLDB 1996: 215-226 MLD school 06 Copyright: C. Faloutsos, 2006 99

CMU SCS Resources - software and urls http://www.dsptutor.freeuk.com/jsanalyser/ FFTSpectrumAnalyser.html : Nice java applets for FFT http://www.relisoft.com/freeware/freq.html voice frequency analyzer (needs microphone) MLD school 06 Copyright: C. Faloutsos, 2006 100 CMU SCS Resources: software and urls xwpl: open source wavelet package from Yale, with excellent GUI http:// monet.me.ic.ac.uk/people/gavin/java/wave letDemos.html

: wavelets and scalograms MLD school 06 Copyright: C. Faloutsos, 2006 101 CMU SCS Books William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery: Numerical Recipes in C, Cambridge University Press, 1992, 2nd Edition. (Great description, intuition and code for DFT, DWT) C. Faloutsos: Searching Multimedia Databases by Content, Kluwer Academic Press, 1996 (introduction to DFT, DWT) MLD school 06 Copyright: C. Faloutsos, 2006 102

CMU SCS Outline Problem definition and motivation Indexing and feature extraction Features for images (DWT, morphology, etc) Tools and case studies: PCA, ICA, Random Walks Conclusions MLD school 06 Copyright: C. Faloutsos, 2006 103 CMU SCS ICA Independent Component Analysis better than PCA/SVD also known as blind source separation (the `cocktail discussion problem) MLD school 06

Copyright: C. Faloutsos, 2006 104 CMU SCS Intuition behind ICA: area perimeter MLD school 06 Copyright: C. Faloutsos, 2006 105 CMU SCS Intuition behind ICA: area PCA perimeter

MLD school 06 Copyright: C. Faloutsos, 2006 106 CMU SCS Intuition behind ICA: area ICA PCA perimeter MLD school 06 Copyright: C. Faloutsos, 2006 107 CMU SCS Intuition behind ICA:

MLD school 06 Copyright: C. Faloutsos, 2006 108 CMU SCS Intuition behind ICA: PCA finds the hyperplane. MLD school 06 Copyright: C. Faloutsos, 2006 109 CMU SCS Intuition behind ICA: Dimensionality reduction PCA finds the hyperplane.

MLD school 06 ICA finds the correct patterns. Copyright: C. Faloutsos, 2006 110 CMU SCS ICA in action stock prices - find groups / outliers! MLD school 06 Copyright: C. Faloutsos, 2006 111 CMU SCS Eg.: Hidden variables in stock prices DJIA (29 companies) Unimodal data

Alcoa American Express Boeing Caterpill ar Citi Group MLD school 06 (Q) What patterns do you see? Copyright: C. Faloutsos, 2006 112 CMU SCS Eg.: Hidden variables in stock prices Alcoa Discovery non-obvious hidden variables.

(How to do this automatically?) American Express Boeing Caterpillar Citi Group MLD school 06 Copyright: C. Faloutsos, 2006 113 CMU SCS adv Hidden variables Dow Jones Industrial Average Alcoa Americ an

Express Goal: Boeing Find meaningful hidden variables Caterpil lar Data point Time tick (week) Weekly DJIA closing prices (01/02/1990-08/05/2002) n=660 data points, m=29 companies MLD school 06 Copyright: C. Faloutsos, 2006

114 CMU SCS Math (A1) How did stock prices change? 29 AA1, , XOM1 H11, H12, , H1m B11, B12, , B1m = AAn, , XOMn Hn1, Hn2, , Hnm

Time tick ? ? Bm1, Bm2, , Bmm Hidden variable n=660 Time tick MLD school 06 Copyright: C. Faloutsos, 2006 115 CMU SCS Hidden variables by AutoSplit

Caterpil lar B1,CAT B1,INTC Intel 0.94 0.64 0.63 0.03 Hidden v#1 (meaning??) MLD school 06 B2,INTC B2,CAT Hidden v#2 (meaning??) Copyright: C. Faloutsos, 2006 More

116 CMU SCS Hidden variables by AutoSplit Caterpil lar B1,CAT B1,INTC Intel 0.94 General trend MLD school 06 0.64 0.63 0.03 Copyright: C. Faloutsos, 2006

Internet bubble More B2,INTC B2,CAT 117 CMU SCS Making the most of ICA/AutoSplit Pattern discovery (hidden variables) Clusters (tech vs non-tech stocks) Outlier detection MLD school 06 Copyright: C. Faloutsos, 2006 118 CMU SCS

Companies related to hidden variable 1 B1,j Highest Lowest Caterpillar 0.938512 AT&T 0.021885 Boeing 0.911120 WalMart 0.624570 MMM

0.906542 Intel 0.638010 Coca Cola 0.903858 Home Depot 0.647774 Du Pont 0.900317 Hewlett-Packard 0.658768 MLD school 06

Hidden variable 1 : General trend Copyright: C. Faloutsos, 2006 119 CMU SCS Outlier detection General trend Outli er AT&T United Technologies Walmart Exxon Mobil MLD school 06

Copyright: C. Faloutsos, 2006 120 CMU SCS Conclusions for ICA Better than PCA Actually, uses PCA as a first step! MLD school 06 Copyright: C. Faloutsos, 2006 121 CMU SCS References Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press. Jolliffe, I. T. (1986). Principal Component Analysis, Springer Verlag. Aapo Hyvarinen, Juha Karhunen, and Erkki Oja Independent Component Analysis, John Wiley &

Sons, 2001. MLD school 06 Copyright: C. Faloutsos, 2006 122 CMU SCS References Jia-Yu Pan, Hiroyuki Kitagawa, Christos Faloutsos, and Masafumi Hamamoto. AutoSplit: Fast and Scalable Discovery of Hidden Variables in Stream and Multimedia Databases. PAKDD 2004 MLD school 06 Copyright: C. Faloutsos, 2006 123 CMU SCS

Outline ... Tools and case studies: PCA, ICA, Random Walks ICA; case study: visual vocabulary (ViVo) Random Walks Conclusions MLD school 06 Copyright: C. Faloutsos, 2006 124 CMU SCS ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB) Jia-Yu Tim Pan, HJ Yang (CMU) MLD school 06 Copyright: C. Faloutsos, 2006

125 CMU SCS Retina, its image, and the detachment retina stained by 3 antibodies (R,G,B) Layers of tissues MLD school 06 Copyright: C. Faloutsos, 2006 126 CMU SCS Computer Scientists View of Retinal Detachment normal

MLD school 06 detachment Copyright: C. Faloutsos, 2006 7 days after 127 CMU SCS Detachment Development Normal 1 day after detachment 7 days after detachment MLD school 06

28 days after detachment Copyright: C. Faloutsos, 2006 3 days after detachment 3 months after detachment 128 CMU SCS Data and Problem (Problem) What happens in retina after detachment? What tissues (regions) are involved? How do they change over time? How will a program convey this info? More than classification we want to learn what classifier learned MLD school 06 Copyright: C. Faloutsos, 2006

129 CMU SCS Visual vocabulary? news: president, minister, economic sports: baseball, score, penalty MLD school 06 Copyright: C. Faloutsos, 2006 130 CMU SCS Visual Vocabulary (ViVo)

generation Visual vocabulary Step 2: Extract tile features MLD school 06 Copyright: C. Faloutsos, 2006 Feature 2 8x12 tiles Step 3: ViVo generation Step 1: Tile image V1 V2 131

Feature 1 CMU SCS ID Biological interpretation ViVo V1 V10 V8 V11 MLD school 06 Description Condition GFAP in inner retina (Mller cells) Healthy Healthy outer segments of rod

photoreceptors Healthy Redistribution of rod opsin into cell bodies of rod photoreceptors Detached Co-occurring processes: Mller cell Detached hypertrophy and rod opsin redistribution Copyright: C. Faloutsos, 2006 132 CMU SCS Which tissue is significant on 7-day? MLD school 06 Copyright: C. Faloutsos, 2006 133

CMU SCS Automatic ViVo-annotation of images MLD school 06 A tile represents a ViVo vk if the largest coefficient of the tile is along the kth basis vector A ViVo vk represents a class ci if the majority of its tiles are in that class For each image, the representative ViVos for the class are automatically highlighted Copyright: C. Faloutsos, 2006 134 CMU SCS Conclusion

ICA can help us find ``good hidden variables (= visual vocabulary words) MLD school 06 Copyright: C. Faloutsos, 2006 135 CMU SCS Outline ... Tools and case studies: PCA, ICA, Random Walks ICA image captioning - Random Walks Conclusions MLD school 06 Copyright: C. Faloutsos, 2006 136

CMU SCS Image captioning: correlation between images and terms sea, sun, sky, water MLD school 06 cat, forest, grass, tiger Copyright: C. Faloutsos, 2006 ?, ?, ?, ? 137 CMU SCS Idea

each modality into a set of nodes (could have many more modalities: audio, motion, etc etc) random walk, on the resulting m-partite graph MLD school 06 Copyright: C. Faloutsos, 2006 138 CMU SCS Image captioning: extract regions sea, sun, sky, water MLD school 06 cat, forest, grass, tiger Copyright: C. Faloutsos, 2006

?, ?, ?, ? 139 CMU SCS Mixed-Media Graph Object-Attribute-Value (OAV) links region . r1 image term r2 . i1

. t1 r3 r4 r5 i2 t2 t3 r7 r6 . t4 t5 t6

r8 ij . t7 t8 From training images MLD school 06 r9 Copyright: C. Faloutsos, 2006 . ri ri+1 itest

. Image to caption 140 CMU SCS Mixed-Media Graph region . r1 image term r2 . i1 . t1

r3 r4 r5 i2 t2 t3 r7 r6 . t4 t5 t6 r8

ij . t7 t8 From training images MLD school 06 r9 Copyright: C. Faloutsos, 2006 . ri ri+1 itest . Image to caption

141 CMU SCS Mixed-Media Graph Nearest Neighbor (NN) links region . r1 image term r2 . i1 . t1

r3 r4 r5 i2 t2 t3 r7 r6 . t4 t5 t6 r8

ij . t7 t8 From training images MLD school 06 r9 Copyright: C. Faloutsos, 2006 . ri ri+1 itest . Image to caption

142 CMU SCS Random Walk with Restart (at Image to caption t=0) region . r1 image term r2 . i1 . t1 MLD school 06

r3 r4 r5 i2 t2 t3 r7 r6 . t4 t5 t6 r8

r9 ij . t7 t8 Copyright: C. Faloutsos, 2006 . ri ri+1 itest . 143 CMU SCS

Random Walk with Restart (at Image to caption t=1) region . r1 image term r2 . i1 . t1 MLD school 06 r3

r4 r5 i2 t2 t3 r7 r6 . t4 t5 t6 r8 r9 ij

. t7 t8 Copyright: C. Faloutsos, 2006 . ri ri+1 itest . 144 CMU SCS Random Walk with Restart (at Image to caption

t=2) region . r1 image term r2 . i1 . t1 MLD school 06 r3 r4 r5

i2 t2 t3 r7 r6 . t4 t5 t6 r8 r9 ij .

t7 t8 Copyright: C. Faloutsos, 2006 . ri ri+1 itest . 145 CMU SCS Random Walk with Restart (at Image to caption t=3) region

. r1 image term r2 . i1 . t1 MLD school 06 r3 r4 r5 i2 t2

t3 r7 r6 . t4 t5 t6 r8 r9 ij . t7 t8

Copyright: C. Faloutsos, 2006 . ri ri+1 itest . 146 CMU SCS Random Walk with Restart (at Image to caption t=0) region . r1 image

term r2 r3 r4 r5 r7 r6 r8 r9 . ri ri+1 (1-c)/2

. . i1 t1 i2 t2 t3 ij . t4 t5 t6 t7 . t8

(1-c)/2 itest . c Query/Restart node Blue step: restart at the query node (with probability c) Red step: randomly walk among one link (with prob. 1-c) MLD school 06 Copyright: C. Faloutsos, 2006 147 CMU SCS Random Walk with Restart (at Image to caption t>0) (1-c)/3 region

. r1 image term r2 r3 r4 (1-c)/3 r5 r7 r6 r8 r9 i1

i2 ij . ri ri+1 c (1-c)/3 . . . itest Query/Restart node . t1

t2 t3 t4 t5 t6 t7 t8 . Blue step: restart at the query node (with probability c) Red step: randomly walk among one link (with prob. 1-c) MLD school 06 Copyright: C. Faloutsos, 2006 148

CMU SCS Steady state probability of RWR Image to caption region . r1 image term . . r2 r3 r4 i1 t1 r5

i2 t2 t3 r7 r6 ij . t4 t5 r8 t6 t7 r9

. . t8 ri ri+1 itest . Query/Restart node Dark green: higher probability to be visited Light green: lower probability to be visited MLD school 06 Copyright: C. Faloutsos, 2006 149 CMU SCS

Steady state probability of RWR Image to caption region . r1 image term . . r2 r3 r4 i1 t1 r5 i2

t2 t3 r7 r6 ij . t4 t5 r8 t6 t7 r9 .

. t8 ri ri+1 itest . Query/Restart node Dark green: higher probability to be visited Light green: lower probability to be visited MLD school 06 Copyright: C. Faloutsos, 2006 150 CMU SCS Compute the steady state probability

Math u (1 c) Au cv 1 u c(I (1 c) A) v u : steady state probability v : restart vector A : transition matrix c : restart probability MLD school 06 Copyright: C. Faloutsos, 2006 151 CMU SCS Putting MMG+RWR to work

Tasks: Image captioning: GCap [Pan+, MDDE 2004] ( Multi-modal retrieval: MMSS [Pan+, ICDM 2004] ) MLD school 06 Copyright: C. Faloutsos, 2006 152 CMU SCS GCap: MMG+RWR for image captioning Image to caption region . r1 image term r2

. i1 . t1 r3 r5 i2 t2 Predicted caption: cat MLD school 06 r4 t3 r7 r6

. t4 t5 t6 r8 r9 ij . t7 t8 grass tiger Copyright: C. Faloutsos, 2006 .

ri ri+1 itest . Query/Restart node 153 CMU SCS Examples of captioning results Image Truth cat, grass, tiger, water grass, cat, tiger, Our caption water

mane, cat, lion, grass sun, water, tree, sky lion, grass, cat, mane tree, water, building, sky Predicted terms are listed in the order of likeliness. MLD school 06 Copyright: C. Faloutsos, 2006 154 CMU SCS Extension: other cross-modal queries e.g., text-to-text . r1 r2 .

i1 . t1 MLD school 06 r3 r4 r5 i2 t2 t3 r7 r6 . t4

t5 t6 r8 r9 ij . t7 t8 Copyright: C. Faloutsos, 2006 . ri ri+1

itest . 155 CMU SCS (Extension) Other cross-modal queries For example, the term-to-term correlation Given a term, find other similar terms. Term 1 Branch Birds Bridge Water Car Tracks MLD school 06 2

Night Arch Street 3 Owl Sky 4 Nest Stone Buildings Turn Copyright: C. Faloutsos, 2006 5 Hawk Boats prototype 156

CMU SCS Extension: group captioning . r1 r2 . i1 . t1 MLD school 06 r3 r4 r5 i2

t2 t3 r6 r7 . t4 t5 Copyright: C. Faloutsos, 2006 r8 ij r9 . .

ri ri+1 itest . 157 CMU SCS (Extension) Captioning images in groups Image Truth sun, water, tree, sky sun, clouds, sky, horizon sun, water

GCap caption tree, people, sky, water water, tree, people, sky sky, sun Group caption MLD school 06 sky, water, tree, sun Copyright: C. Faloutsos, 2006 158 CMU SCS Summary GCap = MMG+RWR, for image captioning

Easy to apply to all kinds of cross-modal data Outperforms the best previous image captioning results MLD school 06 Copyright: C. Faloutsos, 2006 159 CMU SCS OVERALL CONCLUSIONS Powerful tools for image/multimedia mining: GEMINI + R-trees for similarity search Feature extraction with Wavelets, PCA, ICA Cross-modal mining / auto-captioning: graph-based methods + Random Walks MLD school 06 Copyright: C. Faloutsos, 2006

160 CMU SCS christos cs.cmu.edu www.cs.cmu.edu/~christos MLD school 06 Copyright: C. Faloutsos, 2006 161

## Recently Viewed Presentations

• Bactrim needs large volumes of diluent because it is very insoluble. Potentiates warfarin. Bacteriostatic . Inhibit sequential steps in folic acid synthesis. The higher doses of Bactrim that cause bone marrow suppression are the doses that are used to treat...
• Linking Threats to Assets in Complex Ecological and Socio-Economic Systems: Qualitative Modelling for Tourism Development in North Western Australia
• Ceramics. Ceramic comes from Keramos (Greek for pottery) Early forms of ceramics were formed by the heat treatment of Clay. E.g. Kaolinite clay after firing above 1000°C forms Porcelain a ceramic
• Rome was the site of many famous temples, but one of the most impressive is the Pantheon with its enormous open dome. It still stands today, reconsecrated and redecorated as a church. Living in Rome. There were two basic types...
• Cheminformatics for Computational Chemistry and Computer-Aided Molecular Discovery R. S. Pearlman1, Y. Wu1, K. M. Smith2, and B. B. Masek2 1 Laboratory for the Development of CADD Software, College of Pharmacy, University of Texas, Austin TX
• Thematic . evaluations normally include: Analysis of the Bank's relevant policies and strategies. Analysis of the portfolio of projects. Interviews with relevant parties. In-depth evaluation of a sample of individual operations (below) Preparation of a thematic synthesis report including recommendations....
• TED 2013 : Feb 25 ~ Mar 01 - 2013, Longbeach, CA, USA Technology Build a school in the Cloud Sugata Mitra Lets teach kids to code Mitch Resnick A universal translator for surgeo
• Briseis is led from Achilles' tent to Agamemnon's. Reading packet: 281-82. Ajax. After Achilles, Ajax son of Telamon was the mightiest of the Greek heroes in the Trojan War. Ajax was a huge man, head and shoulders larger than the...