IBM Research Extensible Language Interface for Robot Manipulation
Jonathan Connell, Exploratory Computer Vision Group
Etienne Marcheret, Speech Algorithms & Engines Group
Sharath Pankanti (IBM Yorktown)
Michiharu Kudoh (IBM Tokyo)
Risa Nishiyama (IBM Tokyo)
IBM Research

Much of Intelligence Based on Two Illusions
- Animal part = mobility, perception, and reaction
  - People flock around robots and readily anthropomorphize them
  - Real-world action seems to convey a feeling of aliveness
  - Responsiveness to changes in environment conveys a sense of mind
  - Key point in the embodied / situated agents viewpoint
- Human part = learning by being told
  - Bulk of human knowledge is contained in culture, largely passed on verbally
  - No one discovers how to cook macaroni and cheese; someone explains
  - Lack of communication makes even people (e.g. foreigners) seem less human
- Goal is to fuse these two parts into a harmonious whole
- Analogy to a Turing machine
  - Core is a simple finite state machine controller (= language interpreter)
  - Addition of tape vastly increases computational power (= learning from language)

Required Innate Mechanisms
- Segmentation
  - Division of the world into spatial regions (partial segmentation okay)
  - Positive space regions are objects, people, and surfaces
  - Negative space regions are places and passages
- Comparison
  - Objects have properties, like color and size, that differ
  - Objects have relations to other objects, such as position
- Actions
  - Operators can be indexed to operate on certain objects
  - Most have expected continuation and / or end conditions
- Time
  - Physical motions have expected durations
  - Actions can be sequenced based on completion
  - More complex actions can be built from simpler ones
- Language interpretation ties into all these pre-existing (animal) abilities
  - Nouns, adjectives, prepositions, verbs, adverbs, conjunctions

ELI: A Fetch-and-Carry Robot
- Use speech, language, and vision to learn objects & actions
  - But not from the lowest level, like what is a word or what visual properties signal an object
- Build in as much as is practical
- Save learning for terms not knowable a priori
  - Names for particular items or rooms in a house
  - How to perform special tasks like "clean up"
- Example dialog (illustrating command following, verb learning, noun learning, and advice taking):
    User:  Round up my mug.
    Robot: I don't know how to round up your mug.
    User:  Walk around the house and look for it. When you find it, bring it back to me.
    Robot: I don't know what your mug looks like.
    User:  It is like this but sort of orange-ish.
    Robot: OK. I could not find your mug.
    User:  Try looking on the table in the living room.
    Robot: OK. Here it is!
- Potential use in eldercare scenario: a service dog with less slobber

Capabilities Illustrated Through 4 Part Video
- Arm and camera removed from robot and mounted on table
- Simplifies problem by reducing the degrees of freedom
- [Figure labels: camera; arm; OTC medications (Advil & Gaviscon)]

Multi-Modal Interaction (video part 1)
Features:
- Automatically finds objects
- Selects by position, size, color
- Grabs selected object
- Understands pronoun reference
- Can ask clarifying questions
- Handles user pointing
- Robot points for emphasis

Noun Learning Scenario (video part 2)
Features:
- Builds visual models
- Adds new nouns to grammar
- Identifies objects from models
- Passes object to/from user
- Model = size + shape + colors
- Matching = nearest neighbor
  - dist = Σ_i w[i] * | v[i] - m[i] |
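
The matching rule on the noun learning slide can be sketched as a weighted L1 nearest-neighbor lookup. This is a minimal illustration, not the ELI implementation: the `VisualModel` class, the function names, and the feature values and weights in the usage note are all assumptions, and the distance is read as a sum over features of w[i] * |v[i] - m[i]|.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class VisualModel:
    """Hypothetical stored model: a noun plus a feature vector m
    (for example size, shape, and color descriptors)."""
    name: str
    features: List[float]


def weighted_l1(v: Sequence[float], m: Sequence[float], w: Sequence[float]) -> float:
    """dist = sum_i w[i] * |v[i] - m[i]|."""
    if not (len(v) == len(m) == len(w)):
        raise ValueError("v, m, and w must have the same length")
    return sum(wi * abs(vi - mi) for vi, mi, wi in zip(v, m, w))


def nearest_model(
    v: Sequence[float], models: Sequence[VisualModel], w: Sequence[float]
) -> Optional[Tuple[str, float]]:
    """Return (name, distance) of the closest stored model, or None if no models exist."""
    best: Optional[Tuple[str, float]] = None
    for model in models:
        d = weighted_l1(v, model.features, w)
        if best is None or d < best[1]:
            best = (model.name, d)
    return best
```

For example, with made-up models `VisualModel("mug", [0.5, 0.8, 0.1])` and `VisualModel("bottle", [0.9, 0.2, 0.6])`, weights `[1.0, 1.0, 2.0]`, and an observed vector `[0.6, 0.7, 0.2]`, `nearest_model` selects "mug". A real system would likely also apply a distance threshold so that unfamiliar objects are reported as unknown rather than forced onto the nearest name.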

- Once objects have names, more properties are available
- Oversee operation of physical robot to provide more intelligent action
- [Architecture diagram. Labels: Eli Robot at Watson; Vision; ASR; Parser; Vocabulary; Visual models; Reasoning; Semantic memory; Action models; Talk; Kinematics; Brainy Response System at Tokyo; Objects; Sequencer; Network; Lifelog; Archive; Retrieve; arrows labeled "context update" and "vetoes, recommendation"]
- Could envision a similar extension using the RoboEarth online resource

Manipulation with Intelligent Backend (video part 3)
Features:
- Vetoes actions based on DB
- Picks alternates using ontology
- Checks for valid dose interval
- Real-time cloud connection
- [Figure labels: Alice; aspirin; DB; "Gavagai" problem; lifelog history; NO; antacid; Rolaids; Tums; (requested); (present); lifelog entries: 7:14 AM xxxxx, 8:39 AM zzzzz, 9:01 AM took Tylenol]

Verb Learning Scenario (video part 4)
Features:
- Learns action sequences
- Handles relative motion commands
- Responds to incremental positioning
- Applies new actions to other objects
- [Figure labels: poke; point 1.0; out 1.0; out -1.0]

ELI Arm Demos Video
- Also available on YouTube: http://www.youtube.com/watch?v=M2RXDI3QYNU
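
The verb learning scenario (video part 4) describes naming a sequence of actions and then applying it to other objects. A rough sketch of that idea follows, assuming a taught verb is stored as a list of (primitive, amount) steps and bound to a target object only when it is used. The `ActionLibrary` class and its methods are hypothetical, and reading the "poke" figure labels as the step list point 1.0, out 1.0, out -1.0 is an interpretation of the slide rather than something the slide states.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, float]            # (primitive name, amount), e.g. ("out", 1.0)
BoundStep = Tuple[str, str, float]  # (primitive name, target object, amount)


class ActionLibrary:
    """Hypothetical store of taught verbs as named sequences of primitive steps."""

    def __init__(self) -> None:
        self.verbs: Dict[str, List[Step]] = {}

    def teach(self, verb: str, steps: List[Step]) -> None:
        """Record a new verb as an ordered list of primitive steps."""
        self.verbs[verb] = list(steps)

    def expand(self, verb: str, target: str) -> List[BoundStep]:
        """Bind a taught verb to a target object, yielding concrete steps."""
        if verb not in self.verbs:
            raise KeyError(f"I don't know how to {verb}")
        return [(prim, target, amount) for prim, amount in self.verbs[verb]]


lib = ActionLibrary()
lib.teach("poke", [("point", 1.0), ("out", 1.0), ("out", -1.0)])
plan_for_mug = lib.expand("poke", "mug")
plan_for_bottle = lib.expand("poke", "bottle")
```

Because the stored steps do not name a particular object, the same taught sequence can be reused on any object the robot can identify, which is one way to read "applies new actions to other objects".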

Summary of Abilities
- Perception
  - Automatically detects and counts visual objects
  - Understands colors, sizes, and overall positions
- Action
  - Can successfully reach for seen objects
  - Can grasp and deposit objects in real world
- Language
  - Parses and responds appropriately to speech commands
  - Understands pointing and uses pointing itself
  - Properly interprets object passing interactions
- Reasoning
  - Knows its limitations on what it can see, reach, and grab
  - Asks clarifying questions when there are ambiguities
  - Can alter actions based on known facts, histories, and ontologies
- Learning
  - Acquires new visual object models and corresponding words
  - Can verbally train and name a sequence of indexical actions
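
The reasoning ability of altering actions based on histories and ontologies, shown in video part 3 as a dose-interval veto and an ontology-based alternate, might look roughly like the sketch below. Everything here is assumed for illustration: the function names, the lifelog layout as (timestamp, item) pairs, the ontology as a category-to-items mapping, and the four-hour interval in the usage note, which is a placeholder and not medical guidance.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Set, Tuple

LifelogEntry = Tuple[datetime, str]  # (time taken, item name)


def check_dose(
    item: str, now: datetime, lifelog: Iterable[LifelogEntry], min_interval: timedelta
) -> Tuple[bool, str]:
    """Veto a request if the same item was logged more recently than min_interval."""
    last = max((t for t, name in lifelog if name == item), default=None)
    if last is not None and now - last < min_interval:
        return False, f"{item} was last taken at {last:%H:%M}; too soon to repeat"
    return True, "ok"


def pick_alternate(
    category: str, ontology: Dict[str, List[str]], present: Set[str]
) -> Optional[str]:
    """Return the first item of the category that is actually on the table, if any."""
    for candidate in ontology.get(category, []):
        if candidate in present:
            return candidate
    return None
```

For instance, with a lifelog entry of Tylenol at 9:01 AM and a placeholder interval of `timedelta(hours=4)`, a request at 10:00 AM would be vetoed, while `pick_alternate("antacid", {"antacid": ["Rolaids", "Tums"]}, {"Tums"})` would return "Tums". In the deck this kind of check runs on a remote backend; the sketch ignores the network connection entirely.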
- Differences from some AGI work
  - Complete approach attacking core problem (language as tape)
  - Concrete, physical, and implemented system (all integrated)

Extensions
- What is still missing?
  - Acquiring new data by observation & interaction
  - Filling in holes in learned representations & procedures
  - Fixing inaccuracies in taught knowledge
- Free the robot from top-down imperatives!
  - Add initiative: a smart assistant will look for answers himself
  - Improvisation: if something does not match perfectly, try a variation
  - Experiential learning: better to pick up a cup by rim instead of base