Motion Gestures for Mobile Interaction

Hand motion -- pointing, gesturing, grasping, shaking, tapping -- is a rich channel of communication. We point and gesture while we talk; we grasp tools to extend our capabilities; we grasp, rotate, and shake items to explore them. Yet, the rich repertoire of hand motion is largely ignored in interfaces to mobile computation: the user of a modern smartphone generally holds the device stationary while tapping or swiping its surface. Why are so many possible affordances ignored? Certainly not for technical reasons, as smartphones contain an evolving set of sensors for recognizing movement of the phone, including accelerometers, gyroscopes and cameras. However, beyond rotating to change screen orientation or shaking to shuffle songs, little has been done to enable rich gestural input through device motion.

This page describes some of our ongoing work on the design, recognition, and characterization of motion gestures for controlling modern smartphones.

Designing Motion Gestures

Little is known about best practices in motion gesture design for the mobile computing paradigm. To address this issue, we performed an elicitation study that drew natural motion gestures from end-users as follows: given a task to perform with the device (e.g. answer the phone, navigate East in a map), participants were asked to specify a motion gesture that would execute that task. Results from our study demonstrated that a consensus exists among our participants on parameters of movement and on mappings of motion gestures onto commands. We use this consensus to develop a taxonomy for motion gestures and to specify an end-user inspired motion gesture set.

The implications of this research for the design of smartphone appliances are two-fold. First, from the perspective of smartphone application designers, the taxonomy of physical gestures and our understanding of agreement for user-defined gestures allow the creation of a more natural set of user gestures. They also allow a more effective mapping of motion gestures onto commands invoked on the system. Second, from the perspective of companies that create smartphones and smartphone operating systems, this study provides guidance in the design of sensors (i.e. what features of three-dimensional motion must we distinguish between) and toolkits (i.e. what gestures should be recognized and accessible to application context) to support motion gesture interaction at both the application and the system level.

Audio Cues to Support Motion Gesture Interaction

Our work on motion gestures found that the lack of feedback on attempted motion gestures made it difficult for participants to diagnose and correct errors, resulting in poor recognition performance and user frustration. As a result, we developed a training and feedback technique, Glissando, which uses audio characteristics to provide feedback on the system’s interpretation of user input. The technique verbally confirms correct gestures and notifies users of errors. It also provides continuous feedback by mapping distinct musical notes to each of three axes and manipulating pitch to specify both spatial and temporal information.
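As a rough illustration of the continuous feedback idea, the sketch below maps each of three axes to its own base note and shifts the pitch with displacement along that axis. It is not Glissando's actual mapping: the base notes (C4, E4, G4), the one-octave range, the normalized displacement, and the function name are all assumptions made for this example, and temporal information is not modeled.

```python
# Hypothetical axis-to-pitch mapping; not the mapping used by Glissando.
# Assumed base notes: C4 for x, E4 for y, G4 for z (frequencies in Hz).
BASE_NOTE_HZ = {"x": 261.63, "y": 329.63, "z": 392.00}

# Assumed range: a full displacement of 1.0 raises the pitch by one octave.
SEMITONES_PER_UNIT = 12.0


def axis_pitch_hz(axis: str, displacement: float) -> float:
    """Return the tone frequency for a displacement along one axis.

    `displacement` is assumed to be normalized to [-1.0, 1.0]; values
    outside that range are clamped.
    """
    clamped = max(-1.0, min(1.0, displacement))
    semitones = clamped * SEMITONES_PER_UNIT
    # Equal temperament: each semitone multiplies frequency by 2^(1/12).
    return BASE_NOTE_HZ[axis] * 2.0 ** (semitones / 12.0)
```

With these assumed values, a device at rest on the x axis sounds the base note, and moving it to full positive displacement raises that note by an octave.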

Results from our studies demonstrate that Glissando provides adequate feedback to users both with and without continuous feedback, though users prefer having continuous feedback. Additionally, we show that while users have difficulty with strict time limits, temporal information can be provided via Glissando’s continual audio feedback by manipulating the tempo of the reference gesture. Finally, our studies show that adding audio feedback conveys temporal information better than visual demonstration alone.

Improving the Reliability of Motion Gesture Input

A major technical barrier to adopting motion-gesture-based interaction is the need for high recognition rates with low false positive rates. This obstacle limits the potential of good interaction design since many proposed physical motions are indistinguishable from everyday motions (e.g., walking). Hence, these gestures inherently generate a large number of false positives.

To overcome this obstacle, we have proposed two methods to distinguish input motion from everyday motion: a delimiter and bi-level thresholding.

DoubleFlip: A Motion Gesture Delimiter for Mobile Interaction

Our initial work highlighted a major technical barrier to adopting motion-gesture-based interaction: the need to distinguish motion input from everyday motion. This problem is exacerbated by the fact that the results from the study demonstrated that motion input should mimic normal movement and thus have very similar kinematic profiles. To address the problem of segmenting motion gestures from everyday movement, I created DoubleFlip, a unique motion gesture designed to act as an input delimiter for mobile motion gestures (i.e., signaling to the system that the following motion should be considered as a gesture). Based on a collection of 2,100 hours of motion data captured from 99 users, I demonstrated that DoubleFlip is distinct from everyday motion, can easily be recognized, and can be performed quickly using minimal physical space. Therefore, the DoubleFlip gesture provides an "always on" input event for mobile interaction.
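The role of a delimiter can be sketched as a small state machine: motion is ignored until the delimiter is seen, and only the motion that follows it is passed on as a gesture. The sketch below is a minimal illustration of that gating idea, not the DoubleFlip recognizer itself. It assumes a hypothetical upstream classifier that labels each motion segment with a string, using the label "delimiter" for the delimiter gesture; the class and method names are invented for this example.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    IDLE = auto()       # everyday motion is ignored
    LISTENING = auto()  # delimiter seen; the next segment counts as a gesture


class DelimitedInput:
    """Hypothetical gate that only accepts a gesture right after a delimiter."""

    def __init__(self) -> None:
        self.mode = Mode.IDLE

    def on_segment(self, label: str) -> Optional[str]:
        """Process one classified motion segment.

        `label` comes from an assumed upstream classifier. Returns the
        gesture label to act on, or None if the segment is ignored.
        """
        if label == "delimiter":
            # Assumption: a repeated delimiter simply keeps the gate open.
            self.mode = Mode.LISTENING
            return None
        if self.mode is Mode.LISTENING:
            self.mode = Mode.IDLE  # one gesture per delimiter
            return label
        return None  # no delimiter seen, so treat as everyday motion
```

For example, a "shake" segment is ignored on its own but is returned as a gesture when it directly follows a "delimiter" segment.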

Bi-Level Threshold Recognition for Mobile Motion Gestures

Another technique for discriminating motion gestures from random device movement is to create a threshold, i.e. a criterion value, that optimizes the trade-off between false positives (accidental activations) and false negatives (failed attempts to perform a gesture). If the criterion value is too permissive, many false positives will occur. However, if the criterion value is too restrictive, it may become very difficult for the system to reliably identify intentional user gestures from its input stream. System designers frequently use visualization techniques like receiver operating characteristic (ROC) curves to identify the best criterion value for a recognizer. Despite this, a majority of motion gesture research uses delimiters, not criterion values, presumably because of the difficulty of selecting a criterion value that appropriately balances false positives and false negatives.
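The trade-off described above can be made concrete with a small sketch that sweeps candidate criterion values, as one would when reading an ROC curve. It assumes a hypothetical recognizer that outputs a similarity score per motion sample (higher means more gesture-like), and it picks the criterion that minimizes the sum of the false positive and false negative rates. That cost function, the function names, and the sample scores used to exercise it are assumptions for illustration, not data or methods from our studies.

```python
from typing import Sequence, Tuple


def error_rates(threshold: float,
                gesture_scores: Sequence[float],
                everyday_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a criterion value.

    A sample is accepted as a gesture when its score >= threshold.
    """
    false_positives = sum(1 for s in everyday_scores if s >= threshold)
    false_negatives = sum(1 for s in gesture_scores if s < threshold)
    return (false_positives / len(everyday_scores),
            false_negatives / len(gesture_scores))


def best_criterion(gesture_scores: Sequence[float],
                   everyday_scores: Sequence[float]) -> float:
    """Pick the observed score that minimizes FPR + FNR (assumed cost)."""
    candidates = sorted(set(gesture_scores) | set(everyday_scores))

    def cost(threshold: float) -> float:
        fpr, fnr = error_rates(threshold, gesture_scores, everyday_scores)
        return fpr + fnr

    return min(candidates, key=cost)
```

A very low criterion accepts every everyday motion (false positive rate of 1.0), while a very high one rejects every intentional gesture (false negative rate of 1.0); the sweep looks for the value between those extremes. A different weighting of the two error types would move the chosen criterion.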

We address the challenge of non-activations by creating a novel, bi-level thresholding technique for selecting a criterion value that is both appropriately restrictive and yet does not yield a prohibitively high number of false negatives. Our bi-level thresholding technique works as follows: if a user-performed gesture does not meet a strict threshold, we then consider the gesture using a relaxed threshold – a more permissive criterion value – and wait to see if a similar motion follows it. The system will recognize a gesture either if the end-user performs a tightly thresholded motion gesture (i.e. success in the first instance), or if the user performs two relaxed thresholded gestures within a short period of time.
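The decision rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not our implementation: it assumes a hypothetical recognizer that produces a similarity score per candidate gesture (higher is better), treats two relaxed matches against the same gesture as "similar motions", and uses invented threshold and time-window values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BiLevelRecognizer:
    """Hypothetical bi-level threshold rule; all default values are assumed."""
    strict_threshold: float = 0.8   # tight criterion value
    relaxed_threshold: float = 0.6  # more permissive criterion value
    window_seconds: float = 3.0     # assumed "short period of time"
    # Time of the most recent candidate that passed only the relaxed threshold.
    _pending_time: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, score: float, timestamp: float) -> bool:
        """Return True if this candidate should activate the gesture."""
        if score >= self.strict_threshold:
            # Success in the first instance.
            self._pending_time = None
            return True
        if score >= self.relaxed_threshold:
            recent = (self._pending_time is not None and
                      timestamp - self._pending_time <= self.window_seconds)
            if recent:
                # Second relaxed match within the window: recognize.
                self._pending_time = None
                return True
            # First relaxed match: remember it and wait for a follow-up.
            self._pending_time = timestamp
            return False
        return False  # fails both thresholds
```

In this sketch a single relaxed match never activates a command on its own, which keeps accidental activations low, while a user whose first attempt narrowly misses the strict threshold succeeds by simply repeating the gesture.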

Our results show that, when available, the bi-level thresholding technique frequently catches input gestures that an optimized criterion function misses. The end result is that end-users need fewer attempts to successfully activate motion gestures using bi-level thresholding.

Characterizing Motion Gestures as an Input Modality

Motion gestures have attractive features that recommend them as a mechanism for issuing commands on a smartphone. First, these motion gestures expand the input bandwidth of modern smartphones. For example, motion gestures can either serve as modifiers of surface gestures, or they can be mapped to specific device commands. Second, alongside the increase in bandwidth, motion gestures can represent a set of shortcuts for smartphone commands. For actions performed using the touchscreen, the phone must typically be in a specific state, e.g. a specific application must be running, or a specific toolbar must be invoked, whereas for motion gestures, the commands mapped to the gestures can be always available. Finally, motion gestures may require less visual attention than taps or gestures on the touchscreen because the physical location of the smartphone can be sensed via proprioception. As a result of the potential advantages of motion gestures for smartphone input, researchers have explored various aspects of the design of motion gesture interaction.

One specific advantage of proprioceptive sensing is that motion gestures may be particularly beneficial as an input modality in a subset of tasks where the user is distracted while using the smartphone. There are many examples of distracted input on smartphones. For example, users frequently access email and text messages on their smartphone while walking. Therefore, users must split their attention between the task of navigating their physical environment and navigating information on the smartphone screen. As another example, users frequently invoke brief commands on their smartphones while driving. While it may be undesirable to have a user interact with their device while driving, users will continue to perform short commands.

While motion gestures have many theoretical advantages as an input technique for distracted users, little is known about how motion gestures compare to on-screen input for distracted interaction. Our current work compares and analyzes the relative cognitive cost of motion gestures, taps, and surface gestures as input for smartphone devices under conditions of light distraction.