TY - JOUR
T1 - A multimodal LIBRAS-UFOP Brazilian sign language dataset of minimal pairs using a Microsoft Kinect sensor
AU - Cerna, Lourdes Ramirez
AU - Cardenas, Edwin Escobedo
AU - Miranda, Dayse Garcia
AU - Menotti, David
AU - Camara-Chavez, Guillermo
N1 - Funding Information:
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001, the Graduate Program in Computer Science (PPGCC) at the Federal University of Ouro Preto (UFOP), the FAPEMIG (proc. CEX APQ 01517-17) and CNPq (Brazilian Funding Agencies).
Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/4/1
Y1 - 2021/4/1
AB - Sign language recognition has made significant advances in recent years. Many researchers are interested in developing applications that simplify the daily life of deaf people and integrate them into the hearing society. The use of the Kinect sensor (developed by Microsoft) for sign language recognition is steadily increasing. However, few publicly available RGB-D and skeleton joint datasets provide complete information for dynamic signs captured by a Kinect sensor; most of them lack effective and accurate labeling or are stored in a single data format. Given the limitations of existing datasets, this article presents a challenging public dataset named LIBRAS-UFOP. The dataset is based on the concept of minimal pairs and follows specific categorization criteria; the signs are correctly labeled and validated by an expert in sign language; and the dataset provides complete RGB-D and skeleton data. It consists of 56 different signs with high similarity grouped into four categories. In addition, a baseline method is presented that generates dynamic images from each data modality, which serve as the input to two-stream CNN architectures. Finally, we propose an experimental protocol for conducting evaluations on the proposed dataset. Due to the high similarity between signs, the experimental results using the baseline method report a recognition rate of 74.25% on the proposed dataset. This result highlights how challenging this dataset is for sign language recognition and leaves room for future research to improve the recognition rate.
KW - CNN
KW - Dynamic images
KW - Minimal pairs
KW - RGB-D data
KW - Sign language dataset
KW - Sign language recognition
UR - http://www.scopus.com/inward/record.url?scp=85096427678&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2020.114179
DO - 10.1016/j.eswa.2020.114179
M3 - Article (Journal contribution)
AN - SCOPUS:85096427678
SN - 0957-4174
VL - 167
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 114179
ER -