The future of smart environments is likely to involve both passive and active interactions on the part of users. Depending on what sensors are available in the space, users may interact through modalities such as hand gestures or voice commands. There is a shortage of robust yet controlled multimodal interaction datasets for smart environment applications. One application domain of interest, based on the current state of the art, is authentication for sensitive or private tasks such as banking and email. We present a novel, large multimodal dataset for authentication interactions in both gesture and voice, collected from 106 volunteers, each of whom performed 10 examples of every hand gesture and spoken voice command in a set chosen from prior literature (10,600 gesture samples and 13,780 voice samples in total). We present the data collection method, the raw data and common extracted features, and a case study illustrating how this dataset could be useful to researchers. Our goal is to provide a benchmark dataset for testing future multimodal authentication solutions, enabling comparison across approaches.
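
As a quick sanity check, the reported totals are consistent with 106 participants each performing 10 repetitions per command. The short Python sketch below only verifies that arithmetic; the counts of 10 distinct gestures and 13 distinct voice commands are assumptions inferred from the totals rather than stated above, and the variable names are illustrative only.

# Sanity check of the sample totals reported in the abstract.
# Assumed (not stated above): 10 distinct gesture commands and
# 13 distinct voice commands, inferred from the reported totals.
participants = 106        # volunteers
repetitions = 10          # examples of each command per volunteer
num_gestures = 10         # assumed number of gesture commands
num_voice_commands = 13   # assumed number of voice commands

gesture_samples = participants * repetitions * num_gestures        # 10,600
voice_samples = participants * repetitions * num_voice_commands    # 13,780

print(f"gesture samples: {gesture_samples}")  # prints 10600
print(f"voice samples:   {voice_samples}")    # prints 13780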

Sarah Morrison-Smith, Aishat Aloba, Hangwei Lu, Brett Benda, Shaghayegh Esmaeili, Gianne Flores, Jesse Smith, Nikita Soni, Isaac Wang, Rejin Joy, Damon L. Woodard, Jaime Ruiz, and Lisa Anthony. 2020. MMGatorAuth: A Novel Multimodal Dataset for Authentication Interactions in Gesture and Voice. In Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI ’20). Association for Computing Machinery, New York, NY, USA, 370–377. DOI:https://doi.org/10.1145/3382507.3418881

@inproceedings{10.1145/3382507.3418881,
author = {Morrison-Smith, Sarah and Aloba, Aishat and Lu, Hangwei and Benda, Brett and Esmaeili, Shaghayegh and Flores, Gianne and Smith, Jesse and Soni, Nikita and Wang, Isaac and Joy, Rejin and Woodard, Damon L. and Ruiz, Jaime and Anthony, Lisa},
title = {MMGatorAuth: A Novel Multimodal Dataset for Authentication Interactions in Gesture and Voice},
year = {2020},
isbn = {9781450375818},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3382507.3418881},
doi = {10.1145/3382507.3418881},
booktitle = {Proceedings of the 2020 International Conference on Multimodal Interaction},
pages = {370–377},
numpages = {8},
keywords = {gesture, biometrics, authentication, multimodal, voice, datasets},
location = {Virtual Event, Netherlands},
series = {ICMI '20}
}