# Segmentation of Time-Varying Images (and Tracking)

Stanford CS 223b, J. Kosecka. Some slides from the Computer Vision book by D. Forsyth and J. Ponce.

## Technique: Shot Boundary Detection

Find the shots in a sequence of video: shot boundaries usually result in big differences between

succeeding frames.

Strategy: compute interframe distances and declare a boundary wherever the distance is large. Possible distances include frame differences, histogram differences, block comparisons, and edge differences.

Applications:

- a representation for movies or video sequences
- finding shot boundaries
- obtaining the most representative frame of a shot
- supporting search

## Technique: Background Subtraction

If we know what the background looks like, it is easy to identify the interesting bits.
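The shot-boundary strategy above can be sketched with one of the listed distances, the histogram difference. This assumes grayscale frames stored as NumPy arrays; the function names, bin count, and threshold are illustrative, not from the slides:

```python
import numpy as np

def histogram_distance(frame_a, frame_b, bins=16):
    """L1 distance between normalized gray-level histograms of two frames."""
    ha = np.histogram(frame_a, bins=bins, range=(0, 256))[0] / frame_a.size
    hb = np.histogram(frame_b, bins=bins, range=(0, 256))[0] / frame_b.size
    return np.abs(ha - hb).sum()

def shot_boundaries(frames, threshold=0.5):
    """Declare a boundary wherever the interframe distance is big."""
    return [t for t in range(1, len(frames))
            if histogram_distance(frames[t - 1], frames[t]) > threshold]
```

Histogram distances are largely insensitive to small motion within a shot but spike at a cut, which is why they are a popular interframe distance.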

Applications:

- a person in an office
- tracking cars on a road
- surveillance

Approach: use a moving average to estimate a background image, then subtract it from the current frame. Pixels with large absolute differences are the interesting ones. A useful trick: clean up the resulting pixels with morphological operations.

## Image Differencing: Results

Results are shown for a 1-frame difference

and a 5-frame difference.

## Motion Detection: Background Subtraction

- Create an image of the stationary background by averaging a long sequence: for any pixel, most measurements will come from the background.
- Computing the median of the measurements at each pixel, for example, will with high probability assign that pixel the true background intensity; a fixed threshold on the difference image is then used to find foreground pixels.
- Alternatively, compute a distribution of background values by fitting a mixture of Gaussians to the set of intensities at each pixel, assuming the largest population is the background; foreground pixels are then found by adaptive thresholding.

## A 300-Frame Sequence with a Busy Background

[video]
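The median-based background estimate and the fixed-threshold foreground test described above can be sketched as follows (function names and the threshold value are illustrative):

```python
import numpy as np

def median_background(frames):
    """Per-pixel median over a long sequence: since most measurements at a
    pixel come from the background, moving objects are outvoted."""
    return np.median(np.stack(frames), axis=0)

def foreground_mask(frame, background, threshold=30):
    """Fixed threshold on the absolute difference from the background."""
    return np.abs(frame.astype(float) - background) > threshold
```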

## Motion Detection

Difference a frame from the known background frame:

- even for interior points of homogeneous objects, a difference is likely to be detected
- objects that are stationary but different from the background will also be detected
- this is the typical algorithm used in surveillance systems

Motion detection algorithms such as these only work if the camera is stationary and objects are moving against a fixed background.
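A minimal sketch of the moving-average approach, assuming grayscale frames: the same exponential blend also maintains a per-pixel variance, so the foreground test can use a 3-sigma rule. The class name, alpha, and the camera-noise floor are illustrative assumptions:

```python
import numpy as np

class RunningBackground:
    """Exponentially weighted per-pixel background model:
    mu <- a*mu + (1-a)*z, with a matching running variance update."""

    def __init__(self, first_frame, alpha=0.95, sigma_cam=2.0):
        self.alpha = alpha
        self.sigma_cam = sigma_cam          # camera-noise floor (assumed)
        self.mu = first_frame.astype(float)
        self.var = np.full(first_frame.shape, sigma_cam ** 2)

    def update(self, frame):
        z = frame.astype(float)
        mu_old = self.mu
        self.mu = self.alpha * mu_old + (1 - self.alpha) * z
        self.var = (self.alpha * (self.var + (self.mu - mu_old) ** 2)
                    + (1 - self.alpha) * (z - self.mu) ** 2)

    def foreground(self, frame):
        # Large absolute deviations from the mean are the interesting pixels.
        sigma = np.maximum(np.sqrt(self.var), self.sigma_cam)
        return np.abs(frame.astype(float) - self.mu) > 3 * sigma
```

In a full system, the returned mask would then be cleaned up with morphological operations, as noted above.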

## Background Subtraction: Results

Confidence corresponds to gray-level value: high-confidence pixels are bright, low-confidence pixels are dark.

## Background Modeling: Color-Based

At each pixel, model the colors (r, g, b) or the gray-level value g. The following equations are used to recursively estimate the mean and the variance at each pixel:

$$\mu_{t+1} = \alpha\,\mu_t + (1-\alpha)\,z_{t+1}$$

$$\sigma^2_{t+1} = \alpha\left(\sigma^2_t + (\mu_{t+1}-\mu_t)^2\right) + (1-\alpha)\,(z_{t+1}-\mu_{t+1})^2$$

where $z_{t+1}$ is the current measurement. The mean and the variance can both be time varying. The constant $\alpha$ is set empirically to control the rate of adaptation ($0 < \alpha < 1$). A pixel is marked as foreground if, given a red measurement $r$ (or similarly for g or b), we have

$$|r - \mu_{r,t}| > 3\max(\sigma_{r,t},\,\sigma_{r,\mathrm{cam}})$$

where $\sigma_{r,\mathrm{cam}}$ is the camera noise, which can be estimated from image differences of any two frames. If we compute differences for all channels, we can set a pixel as foreground if any of the

differences is above the preset threshold. Noise can be cleaned up using connected-component analysis and ignoring small components. Similarly, we can model the chromaticity values rc and gc and use them for background subtraction: rc = r/(r+g+b), gc = g/(r+g+b).

## Background Model: Edge-Based

Model the edges in the image. This can be done in two different ways: compute models for the edges in the average background image, or subtract the background (model) image from

the new frame, compute edges in the difference image, and mark all edges that are above a threshold. The threshold can be learned from examples. The edges can be combined (color edges) or computed separately for each of the three color channels.

## Foreground Model

Use color histograms (4 bits per color), texture features, or edge histograms to model the foreground. Matching the foreground objects between

frames is the tracking step. We can compare foreground regions directly by shifting and subtracting, using SSD or correlation; with M and N the two foreground regions, the normalized correlation is

$$C = \sum_{i=1}^{n}\sum_{j=1}^{n} M(i,j)\,N(i,j) \,\Big/\, \left[\sum_{i=1}^{n}\sum_{j=1}^{n} M(i,j)^2 \;\sum_{i=1}^{n}\sum_{j=1}^{n} N(i,j)^2\right]^{1/2}$$

## Histogram Matching

Histogram intersection:

$$I(h_c, h_b) = \frac{\sum_i \min\{h_c(i), h_b(i)\}}{\sum_i \max\{h_c(i), h_b(i)\}}$$

Chi-squared formula:

$$\chi^2(h_c, h_b) = \sum_i \frac{(h_c(i) - h_b(i))^2}{h_c(i) + h_b(i)}$$

## Surveillance: Interacting People

Background subtraction results.

Courtesy G. Hager
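The matching measures above, direct region comparison (SSD, normalized correlation) and histogram comparison (intersection, chi-squared), can be sketched as below. All function names are mine; hc and hb are assumed to be bin-count arrays:

```python
import numpy as np

def ncc(M, N):
    """Normalized correlation C between two equal-size regions
    (1.0 when the regions are identical up to a scale factor)."""
    M, N = M.astype(float), N.astype(float)
    return (M * N).sum() / np.sqrt((M ** 2).sum() * (N ** 2).sum())

def ssd(M, N):
    """Sum of squared differences: small when the regions match."""
    return ((M.astype(float) - N.astype(float)) ** 2).sum()

def histogram_intersection(hc, hb):
    """sum_i min / sum_i max; equals 1.0 for identical histograms."""
    return np.minimum(hc, hb).sum() / np.maximum(hc, hb).sum()

def chi_squared(hc, hb):
    """Chi-squared distance; bins empty in both histograms are skipped."""
    s = hc + hb
    nz = s > 0
    return (((hc - hb) ** 2)[nz] / s[nz]).sum()
```

Region comparison is sensitive to alignment (hence the shift-and-subtract search), while histogram measures are invariant to where pixels sit inside the region.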

## Adaptive Human-Motion Tracking

[System block diagram: acquisition (decimation by a factor of 5) feeds two parallel detectors. The motion detector performs grayscale conversion, image differencing, a motion history image, and segmentation; the skin-color detector performs RGB-to-HSV conversion, a hue-saturation limiter, a skin-color binary image, image closing, and segmentation. A validation stage checks motion presence, skin-color presence, and big-contour presence. The tracking stage performs motion initialization, distance scoring, contour-to-target assignment, and continuous adaptation (average travelled distance), producing event creation and narrative-level output.]
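The hue-saturation limiter in the skin-color branch of the pipeline can be sketched as below. The band limits are illustrative assumptions, not values from the slides, and the input is assumed to be an HSV image with channels scaled to [0, 1]:

```python
import numpy as np

def skin_color_mask(hsv, hue_band=(0.0, 0.1), sat_band=(0.2, 0.7)):
    """Hue-saturation limiter: keep pixels whose hue and saturation fall in
    a skin-like band, producing the skin-color binary image."""
    h, s = hsv[..., 0], hsv[..., 1]
    return ((hue_band[0] <= h) & (h <= hue_band[1]) &
            (sat_band[0] <= s) & (s <= sat_band[1]))
```

In the pipeline, this binary image would then be cleaned by image closing and segmented into contours.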
