The Kinect body tracking pipeline

Oliver Williams, Mihai Budiu
Microsoft Research, Silicon Valley
With slides contributed by Johnny Lee, Jamie Shotton
NASA Ames, February 14, 2011

Outline
- Hardware overview
- The body tracking pipeline
- Learning a classifier from large data
- Conclusions

What is Kinect?

- ~2000 people
- Caveat: we only have knowledge about a small part of this process.

Input device

The Innards
Source: iFixit

The vision system
- RGB camera
- IR camera
- IR laser projector
Source: iFixit

RGB Camera
- Used for face recognition
- Face recognition requires training
- Needs good illumination

The audio sensors
- 4-channel multi-array microphone
- Time-locked with console to remove game audio

Prime Sense Chip
- Xbox Hardware Engineering dramatically improved upon Prime Sense reference design performance
- Micron-scale tolerances on large components
- Manufacturing process to yield ~1 device / 1.5 seconds

Projected IR pattern

Depth computation

Depth map

Kinect video output
- 30 Hz frame rate
- 57° field of view
- 8-bit VGA RGB, 640 x 480
- 11-bit monochrome, 320 x 240
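A depth pixel becomes useful to a tracker once it is placed in 3D. The following is a minimal sketch of that back-projection, assuming a pinhole camera, that the 57° figure is the horizontal field of view of the 320 x 240 stream, that the principal point sits at the image centre, and that raw 11-bit readings have already been converted to metres. None of these assumptions is stated in the slides, and the function names are illustrative rather than part of any Kinect API.

```python
import math

# Figures taken from the "Kinect video output" slide.
WIDTH, HEIGHT = 320, 240
H_FOV_DEG = 57.0  # assumed to be the horizontal field of view


def focal_length_px(width=WIDTH, h_fov_deg=H_FOV_DEG):
    """Focal length in pixels for a pinhole camera with the given horizontal FOV."""
    return (width / 2.0) / math.tan(math.radians(h_fov_deg) / 2.0)


def back_project(u, v, depth_m, width=WIDTH, height=HEIGHT, h_fov_deg=H_FOV_DEG):
    """Map pixel (u, v) with depth in metres to camera-space (x, y, z).

    Assumes square pixels and a principal point at the image centre.
    """
    f = focal_length_px(width, h_fov_deg)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    x = (u - cx) * depth_m / f
    y = (v - cy) * depth_m / f
    return (x, y, depth_m)


if __name__ == "__main__":
    print(focal_length_px())
    print(back_project(0, 0, 2.0))
```

Under these assumptions the focal length comes out a little under 300 pixels, so a point one focal length to the right of the image centre lies as far sideways as it is deep.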

Xbox 360 Hardware
- Triple-core PowerPC 970, 3.2 GHz
- Hyperthreaded, 2 threads/core
- 500 MHz ATI graphics card
- DirectX 9.5
- 512 MB RAM
- 2005 performance envelope
- Must handle real-time vision AND a modern game

THE BODY TRACKING PIPELINE

Generic Extensible Architecture
- Raw sensor data feeds Expert 1, Expert 2, and Expert 3 (stateless)
- Each expert produces skeleton estimates
- The arbiter (stateful, probabilistic) fuses the hypotheses into the final estimate

One Expert: Pipeline Stages
Sensor -> depth map -> background segmentation -> player separation -> body part classifier -> body part identification -> skeleton
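The slides say only that the arbiter fuses the experts' hypotheses probabilistically and keeps state; they do not give the fusion rule. As a rough sketch under that caveat, one simple possibility is a confidence-weighted average of the per-expert estimates, blended with the previous frame's result. The `Hypothesis` and `Arbiter` names, and the smoothing parameter, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Hypothesis:
    """One stateless expert's estimate of a joint position."""
    position: Vec3
    confidence: float  # non-negative weight; larger means more trusted


class Arbiter:
    """Stateful fusion: weighted mean of hypotheses plus temporal smoothing.

    This is an illustrative stand-in, not the fusion rule used in Kinect.
    """

    def __init__(self, smoothing: float = 0.5):
        self.smoothing = smoothing          # weight given to the previous estimate
        self.previous: Optional[Vec3] = None

    def fuse(self, hypotheses: Sequence[Hypothesis]) -> Optional[Vec3]:
        total = sum(h.confidence for h in hypotheses)
        if total <= 0:
            return self.previous            # nothing trustworthy this frame
        fused = tuple(
            sum(h.confidence * h.position[k] for h in hypotheses) / total
            for k in range(3)
        )
        if self.previous is not None:
            a = self.smoothing
            fused = tuple(a * p + (1 - a) * f for p, f in zip(self.previous, fused))
        self.previous = fused
        return fused
```

The experts stay stateless in this sketch: all memory of earlier frames lives in `Arbiter.previous`, which mirrors the stateless/stateful split on the slide.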

Sample test frames

Constraints
- No calibration
  - no start/recovery pose
  - no background calibration
  - no body calibration
- Minimal CPU usage
- Illumination-independent

The test matrix
- body size
- hair
- FOV
- body type
- clothes
- angle
- pets
- furniture

Preprocessing
- Identify ground plane
- Separate background (couch)
- Identify players via clustering

Two trackers
- Hands + head tracking
- Body tracking
Not exposed through SDK

The body tracking problem
- Input: depth map
- Classifier: runs on GPU @ 320x240
- Output: body parts
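Before the classifier sees a depth map, the preprocessing stage listed earlier separates the background and groups the remaining pixels into players. The slides do not say which clustering method is used, so the following is only a sketch of one plausible approach: treat everything nearer than a fixed background distance as foreground, then flood-fill connected regions whose neighbouring depths differ by less than a threshold. The millimetre units, both thresholds, and the function name are assumptions.

```python
from collections import deque


def segment_players(depth, background_mm, max_jump_mm=100):
    """Label connected foreground regions of a depth map.

    depth: 2D list of depths in millimetres (0 means "no reading").
    Returns (labels, count), where labels[r][c] is 0 for background
    and 1..count for each connected foreground region.
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0

    def is_foreground(r, c):
        return 0 < depth[r][c] < background_mm

    for r in range(rows):
        for c in range(cols):
            if labels[r][c] or not is_foreground(r, c):
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over 4-connected neighbours
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and is_foreground(nr, nc)
                            and abs(depth[nr][nc] - depth[cr][cc]) <= max_jump_mm):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count
```

A fixed background distance would not satisfy the "no background calibration" constraint on its own; it stands in here for whatever ground-plane and background model the real system fits per frame.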

Training the classifier
- Start from ground-truth data: depth paired with body parts
- Train classifier to work across:
  - pose
  - scene position
  - height, body shape

Getting the Ground Truth (1)
- Use synthetic data (3D avatar model)
- Inject noise
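Rendered avatar depth maps are cleaner than real sensor output, which is presumably why the slide adds "inject noise". The slides do not describe the noise model, so the sketch below assumes two simple effects: additive Gaussian jitter on each depth value and random dropout to zero, imitating pixels with no reading. The parameter names and default values are illustrative only.

```python
import random


def inject_noise(depth, sigma_mm=10.0, dropout_prob=0.02, seed=None):
    """Return a noisy copy of a synthetic depth map (2D list, millimetres).

    Each pixel is either dropped to 0 with probability dropout_prob, or
    perturbed by zero-mean Gaussian noise with standard deviation sigma_mm.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    noisy = []
    for row in depth:
        out_row = []
        for d in row:
            if rng.random() < dropout_prob:
                out_row.append(0.0)                      # simulated missing reading
            else:
                out_row.append(max(0.0, d + rng.gauss(0.0, sigma_mm)))
        noisy.append(out_row)
    return noisy


if __name__ == "__main__":
    clean = [[2000.0] * 4 for _ in range(3)]
    print(inject_noise(clean, seed=0))
```

Because the body-part labels come from the avatar model, they stay valid for the noisy copy; only the depth values change.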

Getting the Ground Truth (2)
Motion capture:
- Unrealistic environments
- Unrealistic clothing
- Low throughput

Getting the Ground Truth (3)
Manual tagging:
- Requires training many people
- Potentially expensive
- Tagging tool influences biases in data
- Quality control is an issue
- 1000 hrs @ 20 contractors ~= 20 years

Getting the Ground Truth (4)
Amazon Mechanical Turk:
- Build web-based tool
- Tagging tool is 2D only
- Quality control can be done with redundant HITs
- 2000 frames/hr @ $0.04/HIT -> 6 yrs @ $80/hr

Classifying pixels
- Compute P(c_i | w_i)
  - pixels i = (x, y)
  - body part c_i
  - image window w_i
- Example image windows
- Window moves with classifier
- Learn classifier P(c_i | w_i) from training data
  - randomized decision forests

Features
- d_I(x): depth of pixel x in image I
- θ = (u, v): parameter describing offsets u and v

From body parts to joint positions
- Compute 3D centroids for all parts
- Generates (position, confidence) per part
- Multiple proposals for each body part
- Done on GPU

From joint positions to skeleton
- Tree model of skeleton topology
- Has cost terms for:
  - Distances between connected parts (relative to body size)
  - Bone proximity to body parts
  - Motion terms for smoothness

Where is the skeleton?
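Two steps from this section lend themselves to a short sketch. The first is the per-pixel feature: the slide's formula did not survive, so the code below follows the depth-comparison feature published by Shotton et al. (CVPR 2011), in which two probe offsets u and v are scaled by the inverse depth at the pixel and the feature is the difference of the probed depths. The second is the per-part centroid; here it is a plain probability-weighted mean with the mean probability as confidence, which is an assumption and yields only one proposal per part, whereas the slide mentions multiple proposals. All function names are illustrative.

```python
def depth_feature(depth, x, y, u, v, background=1e6):
    """Depth-comparison feature at pixel (x, y) for offsets u and v.

    depth: 2D list indexed as depth[row][col], with positive depths.
    u, v:  (dx, dy) offsets, divided by the depth at (x, y) so that the
           probe pattern shrinks for far-away bodies.
    Probes that fall outside the image read as a large background depth.
    """
    d = depth[y][x]
    if d <= 0:
        raise ValueError("feature is undefined where depth is not positive")

    def probe(offset):
        px = x + int(round(offset[0] / d))
        py = y + int(round(offset[1] / d))
        if 0 <= py < len(depth) and 0 <= px < len(depth[0]):
            return depth[py][px]
        return background

    return probe(u) - probe(v)


def part_centroid(points, probabilities):
    """Probability-weighted 3D centroid of the pixels assigned to one body part.

    points:        list of (x, y, z) camera-space positions.
    probabilities: classifier probability of this part at each point.
    Returns (centroid, confidence), or None if no point has any weight.
    """
    total = sum(probabilities)
    if total <= 0:
        return None
    centroid = tuple(
        sum(p * pt[k] for pt, p in zip(points, probabilities)) / total
        for k in range(3)
    )
    confidence = total / len(probabilities)  # assumed definition: mean probability
    return centroid, confidence
```

A decision tree in the forest would compare `depth_feature` against a learned threshold at each node; the slides do not give the tree depth, the number of trees, or the thresholds.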

LEARNING THE BODY PARTS CLASSIFIER FROM A MOUNTAIN OF DATA

Learn from Data
Training examples -> machine learning -> classifier

Cluster-based training
Training examples -> machine learning (DryadLINQ on Dryad) -> classifier
- > Millions of input frames
- > 10^20 objects manipulated
- Sparse, multi-dimensional data
- Complex datatypes (images, video, matrices, etc.)

Data-Parallel Computation
Layers: application, language, execution, storage
- Parallel databases: SQL
- MapReduce: Sawzall, Java, FlumeJava; storage in GFS, BigTable
- Hadoop: Pig, Hive (SQL); storage in HDFS, S3
- Dryad: DryadLINQ (LINQ), Scope (SQL); storage in Cosmos, Azure, SQL Server

Dryad = 2-D Piping
- Unix pipes, 1-D: grep | sed | sort | awk | perl

- Dryad, 2-D: grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50

Virtualized 2-D Pipelines
- 2-D DAG
- Multi-machine
- Virtualized

Fault Tolerance
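The "2-D piping" idea, where each pipeline stage runs as many parallel copies over partitions of the data, can be imitated in a few lines. The sketch below runs everything sequentially in one process, so it shows only the shape of the computation (stage width and repartitioning between stages), not Dryad's scheduling, multi-machine execution, or fault tolerance. The helper names are hypothetical and unrelated to the Dryad API.

```python
def split(items, n):
    """Deal items round-robin into n partitions."""
    return [items[i::n] for i in range(n)]


def run_stage(partitions, fn):
    """Apply one stage to every partition (each call stands in for one vertex)."""
    return [fn(part) for part in partitions]


def repartition(partitions, n, key):
    """Shuffle items into n new partitions according to key(item) % n."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for item in part:
            out[key(item) % n].append(item)
    return out


def grep(pattern):
    """Build a stage that keeps only lines containing pattern."""
    return lambda part: [line for line in part if pattern in line]


def pipeline(lines):
    """Analogue of grep^4 | sort^2: four filter vertices, then two sort vertices."""
    parts = split(lines, 4)
    parts = run_stage(parts, grep("kinect"))
    parts = repartition(parts, 2, key=len)   # edges between the two stage widths
    return run_stage(parts, sorted)


if __name__ == "__main__":
    print(pipeline(["kinect a", "xbox", "kinect bb", "kinect c"]))
```

The repartition step is what makes the graph two-dimensional: every filter vertex may send data to every sort vertex.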

LINQ => DryadLINQ
Dryad

LINQ = .NET + Queries

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    // Keep records with a legal key; project each to (hash, value).
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };

DryadLINQ Data Model
- .NET objects
- Partition
- Collection

DryadLINQ = LINQ + Dryad

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };

- Query plan (Dryad job) with C# vertex code
- Data: collection -> results

Language Summary
- Where
- Select
- GroupBy
- OrderBy
- Aggregate
- Join

Highly efficient parallelization
Axes: machine, time
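For readers without a .NET background, the where/select query from the LINQ slides has a close Python analogue. The sketch below is not DryadLINQ and uses no DryadLINQ API: `is_legal` and `hash_key` are hypothetical stand-ins for the slide's `IsLegal` and `Hash`, whose bodies the slides do not show, and `query_partitioned` imitates only the idea that the same vertex code runs independently on each partition of the collection.

```python
import hashlib
from collections import namedtuple

# Stand-in for the records held in the slide's collection.
Record = namedtuple("Record", ["key", "value"])


def is_legal(key):
    """Hypothetical predicate; the slide's IsLegal is not specified."""
    return key.isalnum()


def hash_key(key):
    """Hypothetical hash; the slide's Hash is not specified."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def query(collection):
    """where IsLegal(c.key) select new { Hash(c.key), c.value }"""
    return [(hash_key(c.key), c.value) for c in collection if is_legal(c.key)]


def query_partitioned(partitions):
    """Run the same query on each partition, as each vertex would."""
    return [query(part) for part in partitions]


if __name__ == "__main__":
    data = [Record("abc", 1), Record("a b", 2), Record("xyz9", 3)]
    print(query(data))
    print(query_partitioned([data[:2], data[2:]]))
```

Because the filter and projection look at one record at a time, the partitioned results concatenate to the same answer as the single-collection query, which is the property that lets such a query be spread across machines.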

CONCLUSIONS

Huge Commercial Success

Tremendous Interest from Developers

Consumer Technologies Push The Envelope
- Price: $6000
- Price: $150

Unique Opportunity for Technology Transfer

I can finally explain to my son what I do for a living
