Wednesday, February 18, 2009

See What I'm Saying

Developments in Gesture Recognition
Contributed editorial appearing in
Scientific Computing & Instrumentation 21:10, September 2004, pg. 12.

The most important course I have ever taken was Business Typing, in my junior year of high school. At first it was a very slow go. I had previously pecked around the keys a few times, but I knew if I was headed to college I had better get all 10 fingers working in concert. The first few weeks were used to build up the strength in our little fingers to get solid q's, a's, and right-brackets out of the old manual typewriters. Soon, the entire class was ablaze with ringing bells and noisy platen returns as we all struggled toward the coveted 75-words-per-minute rating.

Touch-typing immediately seemed like child's play on the electronic keyboards of our new IBM PCs in FORTRAN class, and we Business Typing graduates ripped through hundreds of punch cards in COBOL with nary a hanging chad. As word-processing, spreadsheet and procedural programming applications emerged, the keyboard/mouse combination felt like a natural interface. Soon after, notebook computers integrated the keyboard and mouse into something more compact with a touch pad or joystick, but the solution remained virtually unchanged.

This is the fourth year our program has supplied handheld PDAs to our majors. Not having an application in mind, we wanted to explore their integration into the laboratory and classroom in situ. The most surprising initial shortcoming is the PDA's near-zero input bandwidth. If you don't have a PDA to appreciate this, simply unplug the keyboard from your desktop computer and then accomplish your normal tasks using only your left mouse button. We are developing WiFi data acquisition and voice messaging applications to utilize wireless input modes, but informing students they can take notes in class and organize their schedules using only a stylus is definitely false advertising.

Handhelds and applications that are not primarily driven by word processing are bleating for alternative, non-keyboard input modes. At the 1997 Comdex computer exhibition in Las Vegas, IBM Research demonstrated its "DreamSpace" system, which integrated the ViaVoice voice recognition application with digital body motion and gesture capture hardware, running on a multiprocessor 200-MHz Pentium Pro under Windows NT 4. The system responded to natural commands such as "move this graphic object from here, to here" as the user gestured and spoke in a conversational voice, with no mouse and no keyboard. IBM predicted the cost/performance ratio of available hardware would make gesture and voice recognition commercially feasible by 2002. Unfortunately, the lead researcher for the team, Dr. Mark Lucente, left IBM in 1999 and, apparently, the project lost its champion.

Leading universities, including the Hand Gesture Tracking and Recognition team at Stanford, have addressed a large number of technical challenges, including the determination of location, attitude, and shape of hands and fingers in a cluttered image. The current state of the art is past the technical impediments and on to the "how do we use the information" stage. Stanford's Interactive Workspaces Project is incorporating its gesture technology into workspaces called "iRooms" equipped with large displays, cameras, microphones and wireless transceivers that permit users to share data and images and manipulate the displays while evaluating new input and collaboration technologies.

Commercially, one of the first applications to utilize gesture recognition is televised weather forecasting. Cybernet Systems Corp. in Ann Arbor, MI, introduced "GestureStorm" in late 2003. Television meteorologists often stand before a green screen and their image is merged with maps and animations for broadcast. Typically, the meteorologist must hit time cues during the presentation, advance maps using a remote, or even request the next map from off-camera system controllers. GestureStorm utilizes both body tracking and gesture recognition technology to allow the forecaster to control the pace and position of maps and animations using simple hand motions. Simply gesturing a circle around an area of interest zooms in on geographic regions containing late-breaking violent storms and posts their latest conditions from the weather database. Using other casual gestures, the forecaster can pop up radar loops and advance, pause, or rewind them during the impromptu forecast, much as the Tom Cruise character does in the movie Minority Report.

Cybernet's other products for the defense and medical industries will undoubtedly result in technology transfer into the scientific visualization sector. I'm not in a big hurry to mothball my keyboard, but utilizing the large amount of hand waving I do while lecturing is very appealing.