eISSN: 2093-8462 http://jesk.or.kr
Open Access, Peer-reviewed
Kewei Zhang, Younghwan Pan
10.5143/JESK.2025.44.2.95 Epub 2025 May 02
Abstract
Objective: The aim of this study is to investigate the key user experience factors between graphical user interface (GUI) and conversational user interface (CUI) in human-vehicle interaction systems.
Background: Vehicle interface systems have evolved significantly over the past few decades. Early systems primarily relied on graphical interfaces, using touchscreens, physical buttons, and knobs to control functions such as navigation, climate, and media. However, with the increasing development of artificial intelligence (AI) and voice recognition technologies, vehicle manufacturers have introduced intelligent dialogue systems that allow for voice-driven control. These intelligent systems are seen as a promising solution to enhance safety, usability, and convenience by reducing cognitive load and manual distractions for drivers. Despite this evolution, systematic usability evaluations comparing these two interaction paradigms remain limited. Investigating the characteristics, key user experience factors, and challenges of GUI and CUI systems is critical for improving human-vehicle interaction design.
Method: A systematic literature review was conducted to identify and analyze studies on user experience factors, interaction devices, and evaluation methods related to both graphical and voice-based interactions in vehicle interface systems. Research articles from 2014 to 2024 were included in this review. A total of 43 studies were selected and analyzed.
Results: This literature review identified and categorized key user experience factors for both graphical and intelligent dialogue interfaces in vehicle systems. Graphical interfaces emphasize usability, efficiency, learnability, responsiveness, cognitive load, and safety, whereas conversational interfaces focus on naturalness, efficiency, responsiveness, usability, accuracy of recognition, trust, safety, and anthropomorphism. The review highlights the importance of considering shared user experience factors in vehicle interface design.
Conclusion: This study systematically identified and analyzed key user experience factors of GUI and CUI in human-vehicle interaction. While GUI offers a robust platform for visually intensive tasks, CUI introduces a more natural and conversational interaction model. The findings highlight the need for hybrid systems that integrate the strengths of both interaction paradigms to improve user experience. However, the studies reviewed did not provide extensive performance metrics for all interaction types, suggesting a need for further research that includes standardized evaluation criteria across different scenarios.
Application: The findings from this study can inform the design and evaluation of vehicle interface systems, guiding manufacturers in the development of both graphical and intelligent dialogue systems. Understanding the key factors will assist in creating interfaces that are more intuitive, efficient, and satisfying for drivers, ultimately contributing to improved usability in modern vehicles.
Keywords
Human-vehicle interaction; Graphical User Interface (GUI); Conversational User Interface (CUI); User experience factors
In recent years, advancements in cutting-edge technologies such as autonomous driving, artificial intelligence (AI), and 5G connectivity have profoundly influenced the automotive industry. According to IHS Markit, the global market for intelligent vehicle cockpits is projected to reach $68.1 billion by 2030. Modern vehicles increasingly integrate features such as advanced driver-assistance systems (ADAS) and intelligent conversational assistants, making driving experiences more efficient, seamless, and engaging (Garikapati and Shetiya, 2024). Intelligent human-vehicle interaction systems are key to future automotive development, and the evolution of the in-vehicle human-machine interface (iHMI) has been pivotal in redefining how users engage with vehicles (DanNuo et al., 2019). Traditional interaction paradigms, dominated by physical controls such as buttons, dials, and touchscreens, are gradually giving way to technology-driven systems powered by motion capture and speech recognition (Tan et al., 2022).
GUI offers an information-intensive visual interaction method, while CUI relies on voice commands to enable hands-free operation. This transition from traditional visual-driven interfaces to speech-driven systems introduces new opportunities and challenges for user experience, which is also the focus of human-vehicle interaction design research (Li et al., 2021). However, existing research primarily explores GUI or CUI technologies and performance independently (Chen et al., 2024; Jianan and Abas, 2020) and lacks systematic comparative analysis of their usability.
This study searches and integrates the literature in related fields to explore the user experience factors of GUI and CUI in driving scenarios. The focus is to analyze their characteristics from the perspective of the in-vehicle interactive interface and to extract the key UX factors. A clear understanding of the user experience factors that should be considered in GUI and CUI interaction systems supports the optimization of user experience in the next generation of in-vehicle interaction systems, and these factors can be applied as scales in future iHMI interaction evaluation studies. The following detailed research questions were set to achieve the research objectives.
RQ1: What research on GUI and CUI has been conducted from a user experience perspective?
RQ2: What are the key UX factors that influence GUI and CUI?
For research question 1, the theoretical background of this investigation arises from Human-Computer Interaction (HCI) theory, which suggests that user satisfaction is significantly impacted by how well a system aligns with user expectations in terms of usability, efficiency, and cognitive load (Norman, 2013). The shift from GUI to CUI represents a frontier in human-vehicle interaction research, as the new interface paradigm addresses new user needs. The necessity of examining both systems from a UX perspective lies in identifying their distinct challenges, and the method adopted here is to organize and summarize the existing literature. Addressing this gap in the literature is critical, as current studies often examine these interfaces independently, limiting insights into how they may complement one another within the same vehicle system.
For research question 2, user experience (UX) is an important consideration in the design and evaluation of interactive systems, especially in high-risk environments such as driving. In the context of vehicular interfaces, a growing body of literature identifies various factors that influence the effectiveness of GUIs and CUIs, such as usability, responsiveness, safety, and efficiency (Farooq et al., 2019). However, there is still a lack of comprehensive understanding of the key user experience factors that influence both interfaces in vehicle systems. Identifying key user experience factors through a combination of qualitative and quantitative research methods can help maximize the user experience by seamlessly integrating graphical and conversational user interfaces. In addition, understanding these factors will allow for the development of more standardized evaluation criteria for future in-vehicle systems, which will be critical for comparative performance evaluations across platforms and technologies.
This study employed the following research methodology (Figure 1). First, based on the work of Ruijten et al. (2018), intelligent vehicle interfaces were categorized into GUI (Graphical User Interface) and CUI (Conversational User Interface). This categorization was used to determine the relevant database search keywords.
Subsequently, a systematic literature review was conducted using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology, a widely cited framework that provides a checklist of essential items for reporting systematic reviews and meta-analyses (Moher et al., 2009). The review focused on three major international academic databases: ACM Digital Library, Web of Science (WOS), and IEEE Xplore. ACM and IEEE Xplore include specialized journals on human-computer interaction, providing extensive resources on human-vehicle interaction, while WOS covers broader research in user experience design as a supplement. The search query was constructed using Boolean operators, combining keywords from three categories: GUI, CUI, and IHMI (keywords listed in Table 1).
Table 1. Search keywords by category

| Context | Keywords |
|---|---|
| GUI | "In-Vehicle graphical user interface" OR "In-Vehicle visual user interface" |
| CUI | "In-Vehicle conversational user interface" OR "In-Vehicle speech-based Interface" |
| IHMI | "In-Vehicle human-machine interface" OR "IHMI" OR "human vehicle interface" OR "Driver-Vehicle Interaction" OR "in-vehicle interaction" |
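To make the construction of the search string concrete, the sketch below assembles the Table 1 keywords into a Boolean query. It is a minimal illustration only: the operator used to join the three categories and the field syntax expected by ACM Digital Library, Web of Science, and IEEE Xplore are assumptions rather than details reported above.

```python
# Hypothetical sketch: assembling a Boolean search string from the Table 1
# keyword categories. The operator combining categories (OR vs. AND) and any
# database-specific field tags are assumptions, not reported settings.

KEYWORDS = {
    "GUI": ['"In-Vehicle graphical user interface"',
            '"In-Vehicle visual user interface"'],
    "CUI": ['"In-Vehicle conversational user interface"',
            '"In-Vehicle speech-based Interface"'],
    "IHMI": ['"In-Vehicle human-machine interface"', '"IHMI"',
             '"human vehicle interface"', '"Driver-Vehicle Interaction"',
             '"in-vehicle interaction"'],
}

def or_group(terms):
    """Join one category's keywords with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# One OR-group per category; the three groups are themselves joined with OR
# so that a paper matching any category is retrieved (an assumption).
query = " OR ".join(or_group(terms) for terms in KEYWORDS.values())
print(query)
```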
The search was conducted in December 2024. The process of paper retrieval and selection is illustrated in Figure 2. A total of 264 papers were initially collected, including 116 from Web of Science, 63 from ACM Digital Library, and 85 from IEEE Xplore. Only research articles were included, excluding conference reports, review papers, and workshop proposals. After removing 23 duplicate papers, only studies focusing on IHMI (In-Vehicle Human-Machine Interface) systems were retained, excluding those related to eHMI (External Human-Machine Interface), resulting in 112 eligible papers. These 112 papers were further screened based on their titles and abstracts. Articles irrelevant to human-computer interaction and user experience—such as those focusing solely on algorithms, software, or toolkits—were excluded, leaving 37 papers that met all inclusion criteria for detailed review.
3.1 RQ1 Analysis of related research
To address Research Question 1, we conducted a systematic literature review. Following the study by Kim et al. (2015), which classified iHMI (In-Vehicle Human-Machine Interface) from a UX perspective, this paper categorizes GUI interaction types into three categories: physical buttons, touch screens, and gestures, covering a total of 27 studies across the three categories. CUI interaction is represented by speech, covering a total of 13 studies. Three of the articles contain studies of both GUI and CUI interactions.
3.1.1 Graphical interface
The graphical interface in in-vehicle systems refers to the visual display of information to drivers, using icons, text, colors, and other graphical elements as visual cues. These interfaces provide intuitive and efficient prompts for information and operations (Gao et al., 2024). The core objective of such interfaces is to visually communicate vehicle status and operational feedback to drivers, allowing them to interact via screens, HUDs, or physical controls (Tan et al., 2022). In recent years, graphical interfaces have expanded their applications to areas such as dynamic information display (Trivedi, 2007), multitasking management (Harrison and Hudson, 2009), and user behavior feedback (Brouet, 2015). They demonstrate significant advantages in enhancing drivers' cognitive efficiency and operational accuracy (Jansen et al., 2022). However, potential cognitive load remains a key challenge, especially in highly dynamic environments where drivers frequently shift their attention, which can result in operational errors or distractions (Beringer and Maxwell, 1982). Therefore, the impact of graphical interfaces on safety is a critical aspect of user experience in human-vehicle interaction. This study focuses on three main GUI interaction modes: physical buttons, touch screens, and gestures.
Physical button
Physical buttons, a long-established form of interaction in in-vehicle systems, are characterized by their simple structure, reliable operation, and highly intuitive tactile feedback (Morvaridi Farimani, 2020). This traditional interaction method has demonstrated excellent applicability in driving environments, particularly in scenarios requiring quick actions or low attentional demands (Beringer and Maxwell, 1982). Research indicates that physical button design should emphasize functional modularization and interface simplification, enabling drivers to perform operations quickly with limited attention. For example, arranging frequently used buttons within natural reach of the steering wheel or central console can significantly reduce operation time and attention shifts (Harrison and Hudson, 2009).
However, the static nature of traditional physical buttons limits their functional scalability, making it challenging to meet the complex demands of modern in-vehicle interaction systems (Yan and Xie, 1998). To address this challenge, recent advancements in dynamic button technologies have emerged. These integrate tactile feedback modules into visual displays, enabling real-time layout adjustments based on task requirements, thereby enhancing interface flexibility and adaptability (Jansen et al., 2022). For instance, designs incorporating deformable buttons can simulate the tactile sensation of physical buttons, allowing users to experience realistic feedback during screen operations, effectively merging the functions of touch screens and traditional buttons (Fumelli et al., 2024). Furthermore, Brouet's research indicates that physical buttons exhibit shorter response times in emergency scenarios (Brouet, 2015). In high-speed driving or complex traffic environments, drivers can quickly activate emergency braking buttons, ensuring greater safety. Although physical buttons have a low learning curve, making them easier for drivers to master, excessive button design can lead to interface clutter, increasing cognitive load and the risk of operational errors (Zaletelj et al., 2007). Therefore, physical button layouts should align with drivers' natural motion habits, such as placing frequently used buttons within the driver's natural arm extension range. Additionally, secondary functions should include appropriate visual indicators and feedback to support rapid operations (Yeo et al., 2015).
Touch-based interaction
A touch screen presents graphical representations of buttons that can be resized to fit the available screen space (Lee and Zhai, 2009). With technological advancements, this highly efficient interaction method has been applied to functions such as navigation, multimedia, and climate control, becoming a mainstream iHMI interaction method over the past decade (Zhang et al., 2023).
The core advantage of touchscreens lies in their ability to offer highly customizable interfaces, enabling users to perform various tasks through single or multi-touch gestures, such as zooming maps, swiping menus, and selecting options. This intuitive interaction significantly reduces the learning curve for users (Talbot, 2023). Touchscreens can display layered interfaces to present various types of information, optimizing multitasking management in driving environments. For instance, navigation information, vehicle status, and entertainment system data can be displayed simultaneously, allowing drivers to quickly access critical data (Mandujano-Granillo et al., 2024). Multitouch technology is a notable innovation for touchscreens, enabling complex tasks through gesture recognition. For example, users can adjust map scales with two-finger zooming or switch playlists with swipe gestures, substantially enhancing operational efficiency and user experience (Leftheriotis and Chorianopoulos, 2011).
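As a simple illustration of the gesture-to-function mapping described above, the sketch below dispatches recognized multi-touch gestures (two-finger pinch, swipe) to infotainment actions. The gesture names and handler functions are hypothetical and not drawn from any cited system.

```python
# Minimal sketch of a gesture-to-action dispatch table for an in-vehicle
# touchscreen, using the examples mentioned above (two-finger zoom to scale
# the map, swipe to switch playlists). All names are illustrative.

def zoom_map(scale: float) -> str:
    return f"Map zoomed by factor {scale:.1f}"

def next_playlist(direction: str) -> str:
    return f"Switched playlist ({direction})"

GESTURE_ACTIONS = {
    "pinch": lambda params: zoom_map(params.get("scale", 1.0)),
    "swipe": lambda params: next_playlist(params.get("direction", "right")),
}

def handle_gesture(name: str, params: dict) -> str:
    """Dispatch a recognized gesture to its handler; ignore unknown gestures."""
    handler = GESTURE_ACTIONS.get(name)
    return handler(params) if handler else "No action (unrecognized gesture)"

if __name__ == "__main__":
    print(handle_gesture("pinch", {"scale": 2.0}))
    print(handle_gesture("swipe", {"direction": "left"}))
```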
While touch-based interfaces have advantages such as simplicity, rich information display, and entertainment value over traditional physical buttons (Large et al., 2019), their use for common secondary vehicle controls and infotainment services often demands excessive visual attention. This may adversely affect driving performance and vehicle control, increasing risks for drivers and other road users. Therefore, the design of touch screens is important for user safety. Bae et al. (2023) indicated that the larger touch buttons and higher screen positions in in-vehicle information systems reduce distraction and improve performance.
In comparison to physical buttons, touchscreens lack tactile feedback, leading to decreased task performance and increased cognitive load as drivers must visually confirm their actions more frequently (Duolikun et al., 2023; Ferris et al., 2016; Wang et al., 2024). To address this issue, researchers propose integrating tactile feedback technology into touchscreens. For example, embedding deformable buttons or vibration cues into the touchscreen surface can provide realistic tactile feedback during operation, reducing visual dependency (Harrison and Hudson, 2009). Furthermore, Sharma et al. suggest integrating augmented reality (AR) technology into in-vehicle touchscreens to overlay navigation and other data directly onto the interface, significantly enhancing information accessibility and intuitiveness (Sharma et al., 2024). This integration aims to improve drivers' situational awareness, interaction quality, and overall driving experience. However, AR interface design should prioritize clarity and relevance to ensure that the displayed information does not overwhelm or distract drivers. This includes careful consideration of visual elements, such as boundary shapes and symbols for object detection and navigation (Merenda et al., 2018).
Gesture-based interaction
Gesture-based interaction aims to achieve more intuitive, natural, and direct interactions while reducing visual distractions and enhancing safety. By leveraging machine vision or sensor technologies to capture drivers' hand movements, gesture-based systems facilitate touch-free interaction, effectively minimizing physical contact with the interface. Various technologies have been explored to achieve these goals. Lee et al. introduced a steering wheel finger-extension gesture interface combined with a heads-up display, increasing emergency response speed by 20% (Lee and Yoon, 2020). This system allows users to control audio and climate functions while keeping both hands on the steering wheel. Another interface method proposed by Lee et al. integrates gesture control with heads-up displays, projecting frequently used audio and climate controls from the central console onto a heads-up display menu (Lee et al., 2015). Drivers can operate these controls using specific hand gestures while maintaining their grip on the steering wheel. This approach effectively addresses operational failures or accidental triggers when drivers' hands are slippery or gloved. D'Eusanio et al. (2020) developed a natural user interface (NUI) based on dynamic gestures captured using RGB, depth, and infrared sensors. This system is designed for challenging automotive environments, aiming to minimize driver distraction during operation.
For gesture recognition systems, accuracy and reliability remain significant technical challenges. Complex backgrounds, varying lighting conditions, and environmental interference can reduce sensor accuracy, resulting in misrecognition or non-responsiveness (Kareem Murad and H. Hassin Alasadi, 2024). Differentiating true user intent from accidental actions remains a critical obstacle (Li et al., 2024). The selection of sensors, particularly visual sensors, plays a pivotal role in system performance and design (Berman and Stern, 2012). To address these challenges, researchers are exploring adaptive models and personalized approaches to improve the accuracy and stability of gesture recognition (Li et al., 2024). For instance, Čegovnik and Sodnik (2016) proposed a prototype recognition system based on LeapMotion controllers for in-vehicle gesture interactions. Additionally, advancements in hand modeling, feature extraction, and machine learning algorithms continue to enhance the functionality of gesture recognition systems (Kareem Murad and H. Hassin Alasadi, 2024).
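The recognition pipeline discussed above, feature extraction followed by classification with rejection of unintended movements, can be summarized in a toy sketch. The feature dimensions, gesture labels, template values, and rejection threshold below are assumptions for illustration, not parameters from any cited system.

```python
import numpy as np

# Illustrative sketch (not any cited system): classify a hand-landmark feature
# vector into a small set of in-vehicle gestures with a nearest-centroid rule,
# and reject low-confidence inputs so accidental movements are ignored.

GESTURE_TEMPLATES = {               # mean feature vectors learned offline (toy values)
    "swipe_left":  np.array([0.9, 0.1, 0.2]),
    "swipe_right": np.array([0.1, 0.9, 0.2]),
    "open_palm":   np.array([0.5, 0.5, 0.9]),
}
REJECT_DISTANCE = 0.35              # larger distances are treated as "no gesture"

def classify(features: np.ndarray):
    """Return (label, distance), or (None, distance) if nothing is close enough."""
    label, dist = min(
        ((name, float(np.linalg.norm(features - tmpl)))
         for name, tmpl in GESTURE_TEMPLATES.items()),
        key=lambda item: item[1],
    )
    return (label, dist) if dist <= REJECT_DISTANCE else (None, dist)

if __name__ == "__main__":
    print(classify(np.array([0.85, 0.15, 0.25])))   # close to "swipe_left"
    print(classify(np.array([0.4, 0.4, 0.4])))      # ambiguous input -> rejected
```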
3.1.2 Conversational user interface
The development of conversational user interfaces (CUI) in HMI aims to deliver a safer, more intuitive, and connected driving experience. CUIs enable drivers to perform hands-free and voice-controlled interactions for navigation (Large et al., 2019), infotainment (Jakus et al., 2015), and diagnostics (Ruijten et al., 2018), thereby reducing cognitive load and enhancing safety. According to Large et al. (2019), CUI equipped with AI-powered conversational agents offering empathy and a sense of control during autonomous driving journeys are particularly effective in building trust and improving user experience. With advancements in AI and its deeper integration into human-vehicle interaction systems, CUIs are anticipated to become indispensable tools for bridging the gap between humans and autonomous driving systems, leveraging their human-centric adaptive designs (Bastola et al., 2024).
Speech-based interaction
Speech interaction is the most widely adopted auditory interaction mode in automotive HMI systems. Rooted in natural language processing (NLP) technology, it facilitates seamless communication between drivers and vehicle systems via voice commands (Politis et al., 2018). Compared to visual interfaces, the primary advantage of speech interaction lies in eliminating the need for manual operations and visual attention, significantly reducing cognitive load during driving (Murali et al., 2022). For instance, drivers can directly adjust navigation routes, control multimedia playback, or make phone calls via voice commands, enhancing operational efficiency and driving safety (Mandujano-Granillo et al., 2024).
Driven by AI, conversational human-vehicle interactions have become increasingly natural and human-like. Ruijten et al. (2018) indicate that conversational interfaces mimicking human behavior can significantly boost trust in and acceptance of autonomous vehicles. Users often prefer interfaces with confident and human-like characteristics, which enhance their overall journey experience and sense of control. Integrating generative AI tools to create empathetic user interfaces that understand and respond to human emotions further strengthens the interaction between users and vehicles. This approach aims to make autonomous driving more convenient and enjoyable by designing context-responsive systems that cater to users' emotional states (Choe et al., 2023).
However, speech interaction also has limitations. Studies highlight noise interference and speech recognition errors as major issues. For example, environmental noise during high-speed driving or on busy urban streets can disrupt voice commands, leading to unresponsive or erroneous system responses (Sokol et al., 2017). Additionally, variations in drivers' linguistic habits, accents, and speech speeds may affect the system's comprehension capabilities (Jonsson and Dahlbäck, 2014). Stier et al. (2020) reveal differences in speech patterns and syntactic complexity between human-human and human-machine interactions under varying driving complexities. To enhance the efficiency and safety of in-car speech interactions, developing adaptive speech output systems that consider individual user needs, personality traits, and contextual demands is crucial for optimizing user experience. To address these challenges, researchers have proposed speech recognition models enhanced by deep learning technologies. These models incorporate context-aware functions and real-time semantic analysis, significantly improving the adaptability of speech systems (Guo et al., 2021; Tyagi and Szénási, 2024).
Future designs for speech interaction can further optimize user experience by integrating multimodal technologies, such as speech combined with touchscreens or gestures. For instance, speech systems can dynamically adjust interaction strategies based on real-time detection of drivers' intentions and operational needs, providing smarter and more personalized services (Farooq et al., 2019; Kaplan, 2009). By enhancing the precision, adaptability, and multilingual support of speech recognition, speech interaction is expected to become an increasingly efficient and safe core technology in future human-vehicle interaction systems (Fumelli et al., 2024).
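To illustrate one way recognition uncertainty might be handled in practice, the sketch below routes a transcribed utterance to one of the command domains mentioned above (navigation, media, phone) and asks for clarification instead of acting when the match confidence is low. The intent names, phrases, matching method, and threshold are illustrative assumptions, not components of any reviewed system.

```python
import difflib

# Hedged sketch of keyword-based intent routing for in-vehicle voice commands.
# Real systems use statistical speech recognition and NLU models; everything
# below (intents, phrases, threshold) is purely illustrative.

INTENT_PHRASES = {
    "navigation": ["navigate to", "take me to", "change route"],
    "media":      ["play music", "next song", "pause playback"],
    "phone":      ["call", "dial", "phone"],
}
CONFIDENCE_THRESHOLD = 0.6   # below this, ask the driver to confirm instead of acting

def route_command(utterance: str):
    """Return (intent, confidence); low-confidence matches are deferred."""
    best_intent, best_score = None, 0.0
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            score = difflib.SequenceMatcher(None, utterance.lower(), phrase).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < CONFIDENCE_THRESHOLD:
        return ("clarify", best_score)   # e.g. "Did you mean ...?" rather than a wrong action
    return (best_intent, best_score)

if __name__ == "__main__":
    print(route_command("take me to the office"))
    print(route_command("xzkq rrrgh"))   # noisy/garbled input -> clarification
```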
3.2 RQ2 What are the key UX factors?
3.2.1 Extraction of user experience factors
To answer the second research question, we standardized the various UX-related terms used in the 23 studies to ensure consistency. The UX factors mentioned in each paper were then extracted (see Table 2).
Table 2. UX factors extracted from the reviewed studies

| Interface | Type of interaction | Author (Year) | UX factors |
|---|---|---|---|
| GUI | Physical button | Jung et al. (2021) | Efficiency, Usability, Recognizability |
| GUI | Physical button | Tan et al. (2022) | Usability, Efficiency, Learnability |
| GUI | Physical button | Zhong et al. (2022) | Usability, Safety |
| GUI | Physical button | Huo et al. (2024) | Responsiveness, Usability |
| GUI | Physical button | Yi et al. (2024) | Usability, Safety |
| GUI | Physical button | Detjen et al. (2021) | Responsiveness, Recognizability |
| GUI | Touch screen | Jung et al. (2021) | Learnability, Usability, Entertainment, Safety |
| GUI | Touch screen | Murali (2022) | Responsiveness, Usability |
| GUI | Touch screen | Huo et al. (2024) | Usability, Responsiveness |
| GUI | Touch screen | Nagy et al. (2023) | Cognitive load, Efficiency |
| GUI | Touch screen | Kim et al. (2014) | Usability, Safety, Responsiveness |
| GUI | Touch screen | Čegovnik et al. (2020) | Responsiveness, Cognitive load, Stimulation |
| GUI | Touch screen | Zhang et al. (2023) | Usability, Safety |
| GUI | Touch screen | Farooq et al. (2019) | Usability, Responsiveness, Safety, Efficiency |
| GUI | Touch screen | Zhong et al. (2022) | Usability, Safety |
| GUI | Touch screen | Detjen et al. (2021) | Safety |
| GUI | Gesture | Tan et al. (2022) | Error rate, Usability |
| GUI | Gesture | Čegovnik et al. (2020) | Error rate, Usability, Cognitive load, Efficiency |
| GUI | Gesture | Zhang et al. (2023) | Safety, Novelty |
| GUI | Gesture | Bilius and Vatavu (2020) | Cognitive load, Novelty, Usability, Efficiency |
| GUI | Gesture | Zhang et al. (2022) | Error rate, Cognitive load, Safety |
| GUI | Gesture | Graichen and … | Trust, Usability, Novelty, Stimulation |
| GUI | Gesture | Detjen et al. (2021) | Efficiency |
| CUI | Speech | Jakus et al. (2015) | Efficiency, Novelty |
| CUI | Speech | Tan et al. (2022) | Responsiveness, Usability, Accuracy of recognition |
| CUI | Speech | Ruijten et al. (2018) | Naturalness, Anthropomorphism, Trust |
| CUI | Speech | Deng et al. (2024) | Accuracy of recognition, Trust |
| CUI | Speech | Xie et al. (2024) | Responsiveness, Usability |
| CUI | Speech | Murali et al. (2022) | Entertainment, Naturalness, Safety |
| CUI | Speech | Banerjee et al. (2020) | Accuracy of recognition, Safety |
| CUI | Speech | Johnson (2021) | Accuracy of recognition, Safety, Personalization |
| CUI | Speech | Ruijten et al. (2018) | Anthropomorphism, Trust |
| CUI | Speech | Detjen et al. (2021) | Naturalness, Anthropomorphism, Accuracy of recognition |
3.2.2 Key user experience factors and analysis
After summarizing the UX factors in the literature above, we further extracted the factors commonly mentioned for GUI and CUI, treating any factor cited in two or more studies as important and summarizing them in Table 3. In GUI, Usability appeared 17 times, Efficiency 6 times, Learnability 2 times, Cognitive load 5 times, Responsiveness 7 times, and Safety 11 times. In CUI, Naturalness appeared 3 times, Efficiency 2 times, Responsiveness 2 times, Usability 3 times, Accuracy of recognition 5 times, Trust 3 times, Safety 3 times, and Anthropomorphism 3 times. The key UX factors are as follows:
Table 3. Key UX factors and their considerations

| Interface | Key UX factors | Consideration |
|---|---|---|
| GUI | Usability | Driver can use and interact with the system easily, effectively, and conveniently |
| GUI | Efficiency | Ability to present critical information and enable task completion quickly |
| GUI | Learnability | Driver can understand and master the visual system's functionality |
| GUI | Cognitive load | Mental effort required by drivers to process and understand information |
| GUI | Responsiveness | Ability to provide immediate and accurate feedback to user inputs |
| GUI | Safety | Ability to minimize driver distraction and support safe driving behaviors |
| CUI | Naturalness | Ability to enable intuitive communication that aligns with natural speech patterns |
| CUI | Efficiency | Quickly and accurately process user inputs and deliver the desired outcomes, minimizing the driver's effort and time required for interaction |
| CUI | Responsiveness | Ability to provide timely, contextually relevant, and seamless feedback to user inputs |
| CUI | Usability | Construction of voice commands needs to cover rich scenarios to support smooth interaction |
| CUI | Accuracy of recognition | Ability to correctly interpret and process user commands or speech inputs without errors |
| CUI | Trust | The conversational system's reliability, performance, and ability to provide accurate and relevant results during interaction |
| CUI | Safety | Interaction through dialogue supports safe vehicle operation by reducing visual and manual demands |
| CUI | Anthropomorphism | The extent to which the system mimics human-like qualities such as tone of voice and conversational behavior |
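The frequency counts reported above follow directly from tallying the factor lists in Table 2. The sketch below reproduces the procedure on a small subset of the entries; the printed counts therefore cover only this subset, not the full totals reported in the text.

```python
from collections import Counter

# Sketch of the tallying procedure behind Table 3: count how often each UX
# factor appears per interface type in the Table 2 extractions, then keep
# factors mentioned at least twice. Only a few Table 2 rows are included here,
# so the output is illustrative rather than the paper's totals.

EXTRACTED = [
    ("GUI", ["Efficiency", "Usability", "Recognizability"]),   # Jung et al. (2021)
    ("GUI", ["Usability", "Efficiency", "Learnability"]),      # Tan et al. (2022)
    ("GUI", ["Usability", "Safety"]),                          # Zhong et al. (2022)
    ("CUI", ["Naturalness", "Anthropomorphism", "Trust"]),     # Ruijten et al. (2018)
    ("CUI", ["Accuracy of recognition", "Trust"]),             # Deng et al. (2024)
]

def key_factors(records, min_count=2):
    """Tally factors per interface and keep those cited at least min_count times."""
    counts = {"GUI": Counter(), "CUI": Counter()}
    for interface, factors in records:
        counts[interface].update(factors)
    return {iface: {f: n for f, n in c.items() if n >= min_count}
            for iface, c in counts.items()}

print(key_factors(EXTRACTED))
# e.g. {'GUI': {'Efficiency': 2, 'Usability': 3}, 'CUI': {'Trust': 2}}
```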
Through a review of the literature, the key factors for GUI were identified as usability, efficiency, learnability, responsiveness, cognitive load, and safety. In the context of iHMI systems, usability is a critical factor for drivers' acceptance of in-vehicle technology (Stevens and Burnett, 2014). Most studies emphasize the usability of various technologies and functionalities in visual interfaces. Usability evaluations often rely on standardized tools such as the System Usability Scale (SUS), which provides a subjective and quantitative assessment of users' overall experience in terms of ease of use, learnability, and operational satisfaction with the interface (Brooke, 1996). Additionally, studies have demonstrated a direct relationship between improved usability and driving safety. In particular, a well-designed interface layout can significantly reduce the distraction caused by interacting with the interface, especially in complex driving environments (Li et al., 2017).
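For reference, the SUS mentioned above is scored from ten 5-point items using a fixed rule (Brooke, 1996): odd-numbered items contribute the rating minus one, even-numbered items contribute five minus the rating, and the sum is multiplied by 2.5 to give a 0-100 score. The responses in the example below are invented purely to show the calculation.

```python
# Standard System Usability Scale (SUS) scoring (Brooke, 1996): ten items rated
# 1-5; odd items contribute (rating - 1), even items (5 - rating); the sum is
# scaled by 2.5 to a 0-100 score. The example responses are invented.

def sus_score(responses):
    """responses: list of ten ratings (1-5), item 1 through item 10 in order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)   # index 0, 2, 4, ... are odd-numbered items
        for i, r in enumerate(responses)
    ]
    return 2.5 * sum(contributions)

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```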
Efficiency is another critical factor in human-vehicle interaction. Its core goal is to ensure that drivers can complete tasks with minimal time and cognitive effort. This principle is reflected in the continuous iteration and optimization of interaction methods, evolving from physical buttons to touchscreens and gesture-based interactions, aiming to enhance system responsiveness and information transmission efficiency. For instance, the efficiency of touch-based interactions depends on the consistency of layout and logical arrangement of buttons in UI design, while gesture-based interactions focus on minimizing operational steps and reducing the risk of accidental triggers (Zhang et al., 2023). Fast and accurate interactions enable drivers to complete necessary tasks in the shortest possible time, reducing attention diversion and enhancing the overall driving experience.
As smart vehicle technologies advance, human-vehicle interaction functionalities are becoming increasingly complex, which imposes higher demands on the learnability of user interfaces. Learnability, defined as the ease with which users can master new interfaces and technological features, is regarded as a key factor in user acceptance of new systems (Noel et al., 2005). In multimodal interaction contexts, such as the integration of gestures and voice, interfaces must simplify operational logic and provide clear feedback to reduce the initial learning cost (Schmidt et al., 2010). Furthermore, incorporating step-by-step guidance and intelligent hints into interface design has been shown to significantly enhance learnability (Li and Sun, 2021).
Cognitive load refers to the mental effort required by users to process information, and minimizing it is a central goal of iHMI visual interface design. Interfaces with high cognitive load can lead to driver distraction, delayed information processing, or operational errors, thereby increasing the risk of traffic accidents (Strayer, 2015). Constantine and Windl (2009) emphasized that, when designing in-vehicle visual interfaces, prioritizing the simplification of interaction processes over complex workflows is essential for improving user recognition, interpretation, and task completion speed while reducing driver distraction and cognitive load.
Responsiveness, defined as the system's timely feedback to user inputs, is another critical factor influencing driving experience and user satisfaction. Delays in system feedback can increase feelings of frustration, anger, and agitation while reducing satisfaction (Szameitat et al., 2009). To enhance safety and user experience, unobtrusive and function-specific feedback methods have been proposed to communicate system uncertainty and encourage appropriate use (Kunze et al., 2017). Therefore, timely and accurate feedback plays a vital role in maintaining user trust, ensuring safe driving performance, and optimizing the overall driving experience.
The increasing complexity of automotive user interfaces poses challenges to driver safety due to potential distractions and information overload (Kern and Schmidt, 2009). To address this core safety concern, researchers have proposed multimodal interaction technologies that support attention switching between the road and in-vehicle systems while minimizing visual distractions (Pfleging et al., 2012). Driver-based interface design spaces have been developed to analyze and compare different UI configurations, potentially improving interaction methods (Kern and Schmidt, 2009). Green (2008) introduced established standards and guidelines for evaluating driver interfaces, such as those of the Society of Automotive Engineers (SAE) and the International Organization for Standardization (ISO), to minimize distractions and information overload. The aforementioned factors are relevant across all three GUI interaction methods. However, some factors are specific to a single interaction method: for instance, button recognizability pertains to physical buttons, entertainment value to touchscreen interaction, and novelty to gesture interaction. Such factors are essential for evaluating their respective interaction methods but are not classified as key factors for GUI as a whole.
The key UX factors obtained for CUI through literature collation include naturalness, efficiency, responsiveness, usability, accuracy of recognition, trust, safety, and anthropomorphism. Naturalness is a core factor for conversational voice interfaces, referring to the intuitive communication enabled by interfaces that align with natural language patterns and behaviors. This natural interaction lowers the barrier to using the system by eliminating the need for users to learn complex voice command formats. Studies have shown that improving natural language processing capabilities significantly enhances user acceptance of CUI (M et al., 2023). Rosekind et al. (1997) highlighted that active participation in dialogue is more effective than passive listening in maintaining driver alertness. Hence, the design of CUIs should prioritize conversational naturalness over rigid command-based interactions.
Efficiency is a critical attribute of iHMI systems, referring to the system's ability to process voice inputs quickly and accurately while providing timely feedback. This minimizes the driver's workload and interaction time. In driving contexts, the recognition and processing of user commands by conversational voice systems directly impact user satisfaction and interaction frequency. Moreover, compared to traditional manual systems, voice systems have been proven to significantly reduce cognitive distraction and enhance driving safety (Carter and Graham, 2000).
Responsiveness is a critical factor for the success of CUI (Conversational User Interfaces), emphasizing the system's ability to provide timely and contextually relevant feedback in response to user queries or commands. High responsiveness not only enhances user experience but also reduces operational anxiety caused by waiting because delays may require participants to engage in interaction management (Danilava et al., 2013), thereby increasing risks and insecurity during the driving process. In conversational agent interactions, unnaturally long delays may be perceived as errors, while excessively fast responses may come across as rude (Funk et al., 2020). Therefore, CUI design must carefully balance engaging dialogue with effective, timely responses to improve usability and acceptability in automotive environments.
Usability reflects the comprehensiveness of CUI design, ensuring that voice commands can accommodate a wide range of scenarios and that the system delivers smooth interaction capabilities. Highly usable systems must not only be easy to learn and use but also support diverse functionalities, such as multilingual adaptation or voice operations in complex scenarios. With the widespread application of large language models (LLMs) in CUI, particularly in speech recognition and natural language understanding, studies indicate that improving the recognition of non-standard language inputs, adapting to noisy environments (Sokol et al., 2017), and providing personalized, context-aware interactions (Lin et al., 2018) can significantly enhance the usability of human-vehicle systems.
Accuracy of recognition is one of the fundamental performance metrics of voice interfaces, determining whether user speech inputs can be correctly identified and understood by the system. This encompasses not only the recognition of speech content but also the ability to interpret intonation, dialects, and speech in noisy environments. Notably, emotional content in speech provides additional information about the speaker's psychological state (Elkins and Derrick, 2013). For instance, Krajewski et al. (2008) demonstrated that fatigue states could be detected through acoustic features of speech, which greatly contributes to improving safety during driving.
Trust is a decisive factor in shaping how individuals interact with technology (Hoff and Bashir, 2015). For intelligent vehicle CUIs, the level of trust a driver has in the system is regarded as a critical determinant of efficiency and safety in highly automated driving (Ekman et al., 2018). Trust in CUIs is influenced by the system's anthropomorphism, perceived controllability, and intelligence (Ruijten et al., 2018). However, excessive trust in the system can lead to misuse of automation. For example, if drivers overly rely on voice feedback, they may fail to make correct judgments and timely responses in unexpected driving situations.
Safety is a core element of concern for CUI. Distraction during driving is one of the primary causes of traffic accidents, and the prevalence of traditional in-car technologies that rely on visual interfaces has raised concerns about driver distraction and safety (Rakotonirainy, 2003; Stevens, 2000). The most significant feature of CUI is its ability to reduce the need for visual and manual operation. By allowing drivers to keep their hands on the steering wheel and their eyes on the road, CUIs substantially decrease driver distraction and visual cognitive load, thereby enhancing driving safety (Huang and Huang, 2018).
Anthropomorphism is often defined as the degree to which human-like characteristics are attributed to non-human agents (Bartneck et al., 2009). Compared to graphical interfaces, anthropomorphic CUIs that mimic human behaviors and apply conversational principles have been proven to enhance trust, likability, and perceived intelligence (Ruijten et al., 2018). In a study by Waytz et al. (2014), the effects of anthropomorphism on trust and likability were examined, revealing that when a car was anthropomorphized (e.g., assigned a name, gender, and voice), people tended to like and trust it more. However, for CUI systems, achieving trust through human-like traits requires more than a superficial resemblance. It is more critical for automated systems to demonstrate human-level understanding, operation, and feedback behaviors during communication than to merely exhibit human-like appearances.
4.1 Discussion on research question 1
In addressing RQ1, the study found that the existing literature primarily categorizes GUI into three types: physical buttons, touch screens, and gestures, while CUI mainly focuses on speech-based interaction. As a traditional visual-driven interaction method, GUI excels in improving drivers' cognitive efficiency and operational accuracy due to its rich information and intuitive design. However, GUI also faces challenges such as high cognitive load and the potential to distract drivers in dynamic driving environments. In contrast, CUI enables hands-free operation through voice commands, significantly reducing drivers' cognitive load and visual distraction. However, its performance is still constrained by the accuracy of voice recognition technologies and interference from environmental noise. Gesture-based interaction, as a complement to GUI, provides a more natural and intuitive interaction method. Nonetheless, its accuracy and reliability are limited by the current state of sensor technology. In particular, the error rate of gesture recognition remains a challenge in complex in-vehicle environments and requires further improvement.
4.2 Discussion on research question 2
In addressing RQ2, the study identified key user experience (UX) factors for GUI and CUI. For GUI, the main UX factors include usability, efficiency, learnability, responsiveness, cognitive load, and safety. These factors highlight the need for interface design to balance ease of operation with information presentation to ensure driving safety and operational efficiency. For example, physical buttons have advantages in emergency operations due to their intuitiveness and low learning cost, while touch screens enhance user experience through high customizability and multitasking management. However, attention should be paid to reducing visual distraction to maintain driving safety. For CUI, the critical UX factors include naturalness, efficiency, responsiveness, usability, recognition accuracy, trust, safety, and anthropomorphism. Natural and anthropomorphic designs can enhance users' trust and acceptance of the system, while high efficiency and responsiveness ensure smooth and timely voice interactions. Recognition accuracy directly affects the user's operational experience and the reliability of the system, while safety is improved by reducing the need for visual and manual operations, thereby enhancing overall driving safety.
4.3 Limitations and future directions
Despite the comprehensive analysis of the existing literature through systematic review, this study has certain limitations. First, the literature screening process focused only on the primary interaction methods of GUI and CUI, potentially overlooking other interaction methods in human-vehicle interaction systems, such as eye-tracking interactions. Second, the research primarily concentrated on technical analyses, lacking empirical studies on actual user behavior and long-term usage experiences. Future research can incorporate both quantitative and qualitative methods to further validate and expand the findings of this study. Additionally, as artificial intelligence and big data technologies continue to evolve, exploring intelligent, personalized iHMI designs and the integration of multimodal interactions will be important directions for future research.
This study reviewed relevant literature in the iHMI field, analyzing the user experience (UX) factors of graphical user interfaces (GUI) and conversational user interfaces (CUI) in in-vehicle human-machine interfaces. It revealed the application status and critical influencing factors of GUI and CUI in modern smart cars. By gaining a deeper understanding of these UX factors, designers can more effectively optimize iHMI systems to enhance driving experience and safety. Future research should further explore the integrated application of multimodal interaction methods and the potential of intelligent technologies in iHMI to drive the continuous development of human-machine interaction technology in smart cars.
References
1. Bae, S.Y., Cha, M.C., Yoon, S.H. and Lee, S.C., Investigation of Touch Button Size and Touch Screen Position of IVIS in a Driving Context. Journal of the Ergonomics Society of Korea, 42(1), 39-55, 2023. doi.org/10.5143/JESK.2023.42.1.39