Real-Time Sign Language Recognition and Translation in Humanoid Robots Using Transformer-Based Model with a Knowledge Graph
DOI: https://doi.org/10.51519/journalisi.v7i1.992
Keywords: Sign Language Translation, Human-Robot Interaction, NAO Robot, Transformer Model, Gesture Recognition, Knowledge Graph, Webots Simulation
Abstract
For millions of deaf-mute individuals, sign language is the only means of communication; this creates barriers in daily interactions with non-signers and excludes them from many areas of daily life. To address this, we propose a real-time sign language translation system that uses a Transformer model enhanced with a knowledge graph, designed for Human-Robot Interaction (HRI) with NAO robots. Our system bridges the communication gap by translating gestures into natural-language text. Trained initially on the RWTH-PHOENIX-Weather 2014T dataset, the model achieved a BLEU score of 29.1 and a Word Error Rate (WER) of 18.2%, surpassing the baseline model. Because of the domain shift between human gestures and NAO robot gestures, we created a NAO-specific dataset and fine-tuned the model using transfer learning to account for the robot's kinematic constraints and deployment environment. This reduced the WER to 17.6% and increased the BLEU score to 29.9. We validated the model in dynamic, practical HRI scenarios through comparative experiments in Webots. Integrating the knowledge graph improved contextual disambiguation, significantly enhancing translation accuracy for ambiguous gestures. By effectively translating gestures into natural language, our system demonstrates strong potential for practical robotic applications that promote social accessibility.
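The abstract reports results with two standard translation metrics, BLEU and Word Error Rate (WER). As a minimal illustration of how such scores are computed (this is a generic sketch, not the authors' evaluation code, which likely uses established toolkits), the snippet below implements WER as word-level edit distance normalized by reference length, and a simple sentence-level BLEU with uniform n-gram weights and a brevity penalty:

```python
from collections import Counter
import math


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance)
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[len(r)][len(h)] / max(len(r), 1)


def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty."""
    r, h = reference.split(), hypothesis.split()
    if not h:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        hyp_ngrams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
        # clip each hypothesis n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    bp = 1.0 if len(h) >= len(r) else math.exp(1 - len(r) / len(h))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Corpus-level BLEU (as reported in the paper) aggregates n-gram counts over all sentence pairs before taking the geometric mean, so scores from this per-sentence sketch will differ slightly from toolkit numbers.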
References
P. Markellou, M. Rigou, and S. Sirmakessis, "A Web Adaptive Educational System for People with Hearing Difficulties," Educ. Inf. Technol., vol. 5, pp. 189–200, 2000, doi: 10.1023/A:1009606818900.
D. Avola, M. Bernardi, L. Cinque, G. L. Foresti, and C. Massaroni, "Exploiting Recurrent Neural Networks and Leap Motion Controller for the Recognition of Sign Language and Semaphoric Hand Gestures," IEEE Trans. Multimedia, vol. 21, no. 1, pp. 234–245, Jan. 2019, doi: 10.1109/TMM.2018.2856094.
J. Li, J. Zhong, and N. Wang, "A Multimodal Human-Robot Sign Language Interaction Framework Applied in Social Robots," Front. Neurosci., vol. 17, 2023, doi: 10.3389/fnins.2023.1168888.
O. Koller, N. C. Camgoz, H. Ney, and R. Bowden, "Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 9, pp. 2306–2320, Sep. 2020, doi: 10.1109/TPAMI.2019.2911077.
N. C. Camgoz, S. Hadfield, O. Koller, H. Ney, and R. Bowden, "Neural Sign Language Translation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, 2018, pp. 7784–7793, doi: 10.1109/CVPR.2018.00812.
J. Forster, C. Schmidt, and O. Koller, "Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather," in Proc. Int. Conf. Lang. Resour. Eval. (LREC), 2014, pp. 1911–1916.
S. Tamura and S. Kawasaki, "Recognition of Sign Language Motion Images," Pattern Recognit., vol. 21, no. 4, pp. 343–353, Jan. 1988, doi: 10.1016/0031-3203(88)90048-9.
T. Starner, J. Weaver, and A. Pentland, "Real-Time American Sign Language Recognition Using Desk and Wearable Computer-Based Video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1371–1375, Dec. 1998, doi: 10.1109/34.735811.
T. W. Chong and B. G. Lee, "American Sign Language Recognition Using Leap Motion Controller with Machine Learning Approach," Sensors, vol. 18, no. 10, Oct. 2018, doi: 10.3390/s18103554.
W. Qi, S. E. Ovur, Z. Li, A. Marzullo, and R. Song, "Multi-Sensor Guided Hand Gesture Recognition for a Teleoperated Robot Using a Recurrent Neural Network," IEEE Robot. Autom. Lett., vol. 6, no. 3, pp. 6039–6045, Jul. 2021, doi: 10.1109/LRA.2021.3089999.
P. Kumar, H. Gauba, P. P. Roy, and D. P. Dogra, "A Multimodal Framework for Sensor-Based Sign Language Recognition," Neurocomputing, vol. 259, pp. 21–38, Oct. 2017, doi: 10.1016/j.neucom.2016.08.132.
J. J. Bird, A. Ekárt, and D. R. Faria, "British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language," Sensors, vol. 20, no. 18, Sep. 2020, doi: 10.3390/s20185151.
D. Wu et al., "Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 8, pp. 1583–1597, Aug. 2016, doi: 10.1109/TPAMI.2016.2537340.
Y. Wu and T. S. Huang, "Vision-Based Gesture Recognition: A Review," in Proc. Int. Conf. Comput. Vis., 1999, pp. 103–115, doi: 10.1007/3-540-46616-9_10.
J. F. Lichtenauer, E. A. Hendriks, and M. J. T. Reinders, "Sign Language Recognition by Combining Statistical DTW and Independent Classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 2040–2046, 2008, doi: 10.1109/TPAMI.2008.123.
R. Kaluri and C. H. P. Reddy, "An Enhanced Framework for Sign Gesture Recognition Using Hidden Markov Model and Adaptive Histogram Technique," Int. J. Intell. Eng. Syst., vol. 10, no. 3, pp. 11–19, Jun. 2017, doi: 10.22266/ijies2017.0630.02.
A. Tharwat, T. Gaber, A. E. Hassanien, M. K. Shahin, and B. Refaat, "SIFT-Based Arabic Sign Language Recognition System," Adv. Intell. Syst. Comput., vol. 334, pp. 359–370, 2015, doi: 10.1007/978-3-319-13572-4_30.
R. Cui, H. Liu, and C. Zhang, "A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training," IEEE Trans. Multimedia, vol. 21, no. 7, pp. 1880–1891, Jul. 2019, doi: 10.1109/TMM.2018.2889563.
W. Jintanachaiwat et al., "Using LSTM to Translate Thai Sign Language to Text in Real Time," Discover Artif. Intell., vol. 4, no. 1, Dec. 2024, doi: 10.1007/s44163-024-00113-8.
B. Saunders, N. C. Camgoz, and R. Bowden, "Progressive Transformers for End-to-End Sign Language Production," arXiv Preprint, Apr. 2020.
X. Hei, C. Yu, H. Zhang, and A. Tapus, "A Bilingual Social Robot with Sign Language and Natural Language," in Proc. ACM/IEEE Int. Conf. Human-Robot Interact., IEEE Comput. Soc., Mar. 2024, pp. 526–529, doi: 10.1145/3610978.3640549.
S. Wang, X. Zuo, R. Wang, and R. Yang, "A Generative Human-Robot Motion Retargeting Approach Using a Single RGBD Sensor," IEEE Access, vol. 7, pp. 51499–51512, 2019, doi: 10.1109/ACCESS.2019.2911883.
B. Zhang, M. Müller, and R. Sennrich, "SLTUNET: A Simple Unified Model for Sign Language Translation," arXiv Preprint, May 2023.
P. Xie, T. Peng, Y. Du, and Q. Zhang, "Sign Language Production with Latent Motion Transformer," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024, pp. 3024–3034.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
Y. Hamidullah, J. van Genabith, and C. España-Bonet, "Sign Language Translation with Sentence Embedding Supervision," in Proc. 62nd Annu. Meeting Assoc. Comput. Linguistics (ACL), Bangkok, Thailand, Aug. 2024, pp. 425–434, doi: 10.18653/v1/2024.acl-short.40.
T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, "Complex Embeddings for Simple Link Prediction," in Proc. 33rd Int. Conf. Mach. Learn., 2016, vol. 48, pp. 2071–2080.
M. Gochoo et al., "Fine-Tuning Vision Transformer for Arabic Sign Language Video Recognition on Augmented Small-Scale Dataset," in Proc. IEEE Int. Conf. Syst. Man Cybern., IEEE, 2023, pp. 2880–2885, doi: 10.1109/SMC53992.2023.10394501.
M. Q. Li, B. C. M. Fung, and S.-C. Huang, "On the Effectiveness of Incremental Training of Large Language Models," in Proc. 12th Int. Conf. Large-Scale AI Systems (LSAIS), Nov. 2024, pp. 456–468, doi: 10.1145/lsais.2024.00113.
D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in Proc. 3rd Int. Conf. Learn. Representations (ICLR), Dec. 2014, pp. 1–15, doi: 10.48550/arXiv.1412.6980.
T. Sellam, D. Das, and A. P. Parikh, "BLEURT: Learning Robust Metrics for Text Generation," in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics (ACL), Apr. 2020, pp. 7881–7893, doi: 10.18653/v1/2020.acl-main.704.
C. Camargo, J. Gonçalves, M. Conde, F. J. Rodríguez-Sedano, P. Costa, and F. J. García-Peñalvo, "Systematic Literature Review of Realistic Simulators Applied in Educational Robotics Context," Sensors, vol. 21, no. 12, Art. no. 4031, Jun. 2021, doi: 10.3390/s21124031.
L. H. Juang, "The Cooperation Modes for Two Humanoid Robots," Int. J. Soc. Robot., vol. 13, no. 7, pp. 1613–1623, Nov. 2021, doi: 10.1007/s12369-021-00753-1.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not received prior publication, is not under consideration for publication elsewhere, and has not been previously published.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.