Mara G. da Silva
Masters of Human-Computer Interaction and Design
University of California, Irvine
August, 2022
Imagine... you are invited to a big event. The event is in a large open space. You see people walking all around, pockets of conversation here and there. If you want to talk to someone, all you have to do is walk up to them to start a conversation, and walk away when you are done.
Now imagine... this is not an in-person event, it is a virtual event. Figure 1 illustrates what this event could look like. Here everyone is represented by an avatar. You can move your avatar in any direction in this large open space. When you look at your screen you can see all the avatars moving around, you can understand who is near you, and whether someone is approaching you. Mimicking the real world, if you want to talk to someone you move your avatar near the other person’s avatar to start a conversation.
Now imagine... you are a blind person participating in this virtual event. How can you know who is near you? How do you know where to go if you want to talk to someone? How do you know if someone is approaching you to start a conversation? A person with sight can look at the screen and instantly understand what is happening, since this information is visually available. Blind users, however, don’t know what is happening in their surroundings in this dynamic virtual space.
Participating in virtual events with avatars moving around is not part of our everyday experiences. At least not yet.
The COVID-19 pandemic accelerated the adoption of many technologies that otherwise might have taken years to gain traction.
Nowadays most people are used to virtual meetings on Zoom, Microsoft Teams, and other platforms. These platforms are constantly pushing novel ideas. The Metaverse, for example, is the idea of merging the physical and digital worlds. Microsoft has been working on how to use the Metaverse to improve meetings in its business platform Teams (https://www.microsoft.com/en-us/mesh), as shown in figure 2. Slowly but surely these concepts will seep into our everyday experiences.
People with visual impairments frequently rely on screen reader software to interact with technology. On a computer, for example, they use the keyboard to move the cursor focus, and the screen reader reads aloud the text at the cursor’s position. It reads text only, not visual content. This seems like simple technology, but many websites and applications are not built following accessibility guidelines, so screen readers don’t work well with them. As a result, people with vision impairments are essentially denied access to services, education, information, opportunities for career growth, socialization, and more.
Virtual spaces present additional accessibility challenges because much of the information is presented only visually. How will people with visual disabilities understand this virtual environment and get information from it? It is becoming clear that in the Metaverse the traditional screen reader interaction won’t be enough.
This new digital frontier, the Metaverse, is like the “Wild West”: everyone is still trying to figure out the best way to move forward, and there are no set rules or standards yet. If people with vision impairments are expected to use this type of technology in their everyday jobs, how can they have the same opportunities as everyone else? In our rush to push boundaries there is a risk of leaving many people behind.
That is why it is important for the UX and HCI communities to be thinking about this now, while this “new world” is still being defined and taking shape.
The challenge might look intimidating, but we can start with baby steps. There are two things we can do to begin:
(1) Simplify the problem. Instead of looking at the problem as a whole, can we choose one of its aspects and try to improve it?
(2) Don’t reinvent the wheel. Even though the Metaverse is an emerging technology, many of the challenges it will face can probably already be found in other types of applications. There is significant accessibility research in other areas of computing. You don’t need to start from scratch; instead, look for solutions or ideas in what has already been done to solve similar problems.
Lately I have been reflecting on accessibility problems that will arise with the adoption of the Metaverse. Applying my own advice, I simplified the problem and looked for existing accessibility solutions from other areas.
I’m simplifying the problem by looking at one aspect of it at a time. There are accessibility issues with navigation, interactions between users, and more. I chose to look only into “awareness of surroundings”.
An additional way of simplifying the problem is reducing the complexity of the application. There is a platform for virtual events called Wonder (https://wonder.me), which is shown in figure 3. The way Wonder works matches the description of the virtual event at the beginning of this article: avatars move around in a large area, and when two avatars are close to each other they start a video/audio chat. Wonder, however, is a 2D environment, not a 3D immersive environment, and the types of objects and interactions are limited. By looking at an application like Wonder I’m simplifying the scope.
With both simplifications in place it is easier to focus on the problem. How can blind users be aware of their surroundings in a dynamic 2D virtual space, where most of the information is visual?
I looked at accessibility research and practices in the areas of Human-Computer Interaction (HCI), Virtual Reality (VR), Augmented Reality (AR), and video games. These are some approaches to increase accessibility:
(1) Structures and descriptions
(2) Sensory substitution
(3) Helper feature
Structures and descriptions. This approach is about enhancing the environment. It is the same concept as alt-text on an image: the screen reader cannot read visual content, so we add alt-text to the image and the screen reader reads that to the user. In a virtual scene we need to add information to objects that helps users understand what the objects are and what their meaning is in the scene [3] [6].
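To make the idea concrete, here is a minimal sketch of what such object descriptions could look like, assuming a hypothetical TypeScript scene model; none of the names or fields come from an existing platform.

```typescript
// Hypothetical scene model: each object carries a short textual description,
// analogous to alt-text on an image, that a screen reader can announce.
interface SceneObject {
  id: string;
  kind: "avatar" | "stage" | "table" | "exit";
  name: string;
  // What the object is and what it means in this scene.
  description: string;
  position: { x: number; y: number };
}

// Example: the description carries meaning that only the visuals convey today.
const stage: SceneObject = {
  id: "stage-1",
  kind: "stage",
  name: "Main stage",
  description: "Main stage at the north end of the room, where keynotes happen.",
  position: { x: 0, y: -40 },
};

// A screen reader integration could expose this text, for example
// through an ARIA live region in a web-based platform.
function describe(obj: SceneObject): string {
  return `${obj.name}: ${obj.description}`;
}
```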
Sensory substitution. This approach refers to the natural process of allowing one or more senses to replace another [2]. The most common approach is to use audio and haptics to replace vision. In video games, for example, audio and haptics are frequently used to enhance players’ sensory experience (players see an explosion, hear the sound, and the game controller vibrates, all at the same time). For accessibility we can use sound and haptics as channels to deliver information to users. Sound can help users understand their own location in relation to other objects in the environment through techniques such as spatial sound, echolocation, and passive sonar [1]. Another technique is to convey information about objects in a scene through variations in sound (for example, using low and high pitches and different types of sounds to give different meanings) [3] [4]. Haptic feedback, which draws on the sense of touch, can also be used; for example, we can give different meanings to variations in the location, frequency, and intensity of a vibration [5].
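As one illustration of the haptic side of this idea, the sketch below maps a few meanings to distinct vibration patterns using the browser Vibration API; the meanings and the patterns are assumptions made for illustration, not an established mapping.

```typescript
// Minimal sketch: different meanings mapped to different vibration patterns.
// Pattern arrays alternate vibration and pause durations in milliseconds;
// all values here are illustrative.
const HAPTIC_PATTERNS = {
  someoneApproaching: [80, 60, 80],             // two short pulses
  conversationStarted: [250],                   // one long pulse
  objectOfInterestNearby: [40, 40, 40, 40, 40], // rapid buzz
};

function playHaptic(meaning: keyof typeof HAPTIC_PATTERNS): void {
  // The Vibration API is not available on every device or browser,
  // so do nothing when it is missing.
  if ("vibrate" in navigator) {
    navigator.vibrate(HAPTIC_PATTERNS[meaning]);
  }
}
```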
Helper feature. This approach means creating specific tools and commands to address difficulties that are specific to blind and low vision users [6] [1].
The problem to tackle is: How can blind users be aware of their surroundings in a dynamic 2D virtual space, where most of the information is visual?
In the real world a blind person might get cues from the environment around them. They can probably hear the voice of a friend and walk in that direction to meet them. They might hear the sound of footsteps and know that someone is approaching. Can we bring the same cues to the digital world?
By adding three features to the system we can improve the accessibility of the virtual space: scanning, “who is nearby?”, and footsteps.
The virtual event platform is a dynamic space, so the system will “scan” the space around the user, always keeping track of what is happening in their vicinity. Figure 4 illustrates the concept of constantly scanning a radius around the user.
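A minimal sketch of what this scanning loop could look like, assuming a simple 2D position model; the radius, the interval, and all names are illustrative choices, not part of any real platform’s API.

```typescript
// Hypothetical types and values for illustration.
interface Avatar {
  name: string;
  position: { x: number; y: number };
}

const SCAN_RADIUS = 50;       // in virtual-space units (assumption)
const SCAN_INTERVAL_MS = 500; // how often the vicinity is refreshed

function distance(a: { x: number; y: number }, b: { x: number; y: number }): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Collect every avatar currently within the radius of the user.
function scanNearby(me: Avatar, others: Avatar[]): Avatar[] {
  return others.filter((other) => distance(other.position, me.position) <= SCAN_RADIUS);
}

// Keep an up-to-date snapshot of the vicinity that other features can query.
let nearby: Avatar[] = [];
function startScanning(me: Avatar, getOthers: () => Avatar[]): void {
  setInterval(() => {
    nearby = scanNearby(me, getOthers());
  }, SCAN_INTERVAL_MS);
}
```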
At any time a blind user might want to know what is happening around them. We can provide a helper feature so the user can ask the system “who is nearby?”, as shown in figure 5. The system will say, one by one, the names of the users within a radius.
Using spatial sound, users will hear each name from the direction where the other avatar is in relation to them. If John is behind the user, they will hear “John” from behind. If Shaimaa is to the right of the user, they will hear “Shaimaa” from the right.
We can also apply variations to the sound. If the other avatar is very close, the user will hear the name loud and clear. If the other avatar is far away, the volume will be low. This change in volume gives blind users additional information about the environment around them.
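Here is a minimal sketch of how this spatialized, distance-attenuated announcement could be built with the Web Audio API in a web-based platform. It assumes each name is already available as an AudioBuffer (for example, pre-rendered by a text-to-speech step), which is not shown, and it keeps the listener at the origin with the default orientation.

```typescript
const audioCtx = new AudioContext();

// Play a spoken-name clip from the direction of the other avatar,
// with the volume falling off as the distance grows.
function sayNameFrom(
  nameClip: AudioBuffer,
  me: { x: number; y: number },
  other: { x: number; y: number }
): void {
  const source = audioCtx.createBufferSource();
  source.buffer = nameClip;

  // HRTF panning gives a left/right/front/back impression over headphones.
  const panner = new PannerNode(audioCtx, {
    panningModel: "HRTF",
    distanceModel: "linear",
    maxDistance: 50, // matches the assumed scan radius
    // Position the sound relative to the listener at the origin;
    // the 2D plane of the event space is mapped onto the audio X/Z plane.
    positionX: other.x - me.x,
    positionY: 0,
    positionZ: other.y - me.y,
  });

  // The distance model attenuates volume automatically, so names from
  // avatars farther away sound quieter.
  source.connect(panner).connect(audioCtx.destination);
  source.start();
}
```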
In the dynamic environment of the virtual event platform avatars are constantly moving around. The user might have asked the system “Who is nearby?” a minute ago, but things might have already changed.
In the virtual event platform someone can approach the blind user and start a conversation with them via video/audio chat. This can catch a blind user off guard, and it can be disconcerting when someone starts talking to them and they have no idea who is talking.
In the real world a blind person might hear footsteps approaching. This makes them aware that someone might start talking with them soon.
Mimicking the real world, I explored the use of footstep sounds when the system detects that someone within a certain distance of the user is walking towards them, as figure 6 illustrates. Using spatial sound, the footsteps are heard from the direction of the avatar that is approaching the blind user. The volume of the sound provides clues about the avatar’s distance: it starts low when the avatar is farther away and gets louder as they approach.
When implementing this feature it is important to keep in mind that it adds more information to the auditory channel. Footstep sounds should stay in the background, creating awareness of the surroundings without interfering with other sounds the user might be hearing from the screen reader or from a conversation.
This is also a helper feature, but of a different kind. The “who is nearby?” feature is active: the user decides when to ask the system for this information. Footstep sounds, in contrast, are passive: the user does not need to take any action to receive this information. The system continually informs the user when there is relevant information to share.
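A minimal sketch of the footsteps trigger, building on the scanning idea above; the radius, the approach test (distance shrinking between two scans), and the callback are all illustrative assumptions.

```typescript
const APPROACH_RADIUS = 30;                      // illustrative threshold
const lastDistances = new Map<string, number>(); // distance seen on the previous scan

// On each scan, flag avatars that are within range and getting closer,
// and hand them to a callback that plays a quiet, spatialized footstep sound.
function checkApproaching(
  me: { x: number; y: number },
  others: { name: string; position: { x: number; y: number } }[],
  playFootsteps: (from: { x: number; y: number }, distance: number) => void
): void {
  for (const other of others) {
    const d = Math.hypot(other.position.x - me.x, other.position.y - me.y);
    const previous = lastDistances.get(other.name);
    lastDistances.set(other.name, d);

    // Approaching: within range and closer than on the previous scan.
    if (previous !== undefined && d < APPROACH_RADIUS && d < previous) {
      // The callback should spatialize the sound (as in the earlier sketch)
      // and keep the volume low so it stays in the background, behind the
      // screen reader and any ongoing conversation.
      playFootsteps(other.position, d);
    }
  }
}
```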
The simple features suggested above show that it is possible to find ways to improve accessibility in applications such as a virtual event space. The Metaverse will require more than simply trying to make applications work with a screen reader. We, the UX/HCI community, need to look for novel approaches and find new ways to make applications accessible. And the time to start is now.
[1] Aaron Gluck and Julian Brinkley. 2020. Implementing ‘The Enclosing Dark’: A VR Auditory Adventure. The Journal on Technology and Persons with Disabilities (2020), 149.
[2] Jack M Loomis, Roberta L Klatzky, and Nicholas A Giudice. 2018. Sensory Substitution of Vision: Importance of Perceptual and Cognitive Processing. In Assistive Technology for Blindness and Low Vision. CRC Press, 179–210.
[3] Shachar Maidenbaum and Amir Amedi. 2015. Non-visual virtual interaction: Can Sensory Substitution generically increase the accessibility of Graphical virtual reality to the blind?. In 2015 3rd IEEE VR International Workshop on Virtual and Augmented Assistive Technology (VAAT). 15–17. https://doi.org/10.1109/VAAT.2015.7155404
[4] Shachar Maidenbaum, Daniel Robert Chebat, Shelly Levy-Tzedek, and Amir Amedi. 2014. Depth-to-audio sensory substitution for increasing the accessibility of virtual environments. In International Conference on Universal Access in Human-Computer Interaction. Springer, 398–406.
[5] Alexander Marquardt, Christina Trepkowski, Tom David Eibich, Jens Maiero, Ernst Kruijff, and Johannes Schöning. 2020. Comparing Non-Visual and Visual Guidance Methods for Narrow Field of View Augmented Reality Displays. IEEE Transactions on Visualization and Computer Graphics 26, 12 (2020), 3389–3401. https://doi.org/10.1109/TVCG.2020.3023605
[6] Shari Trewin, Vicki L. Hanson, Mark R. Laff, and Anna Cavender. 2008. PowerUp: An Accessible Virtual World. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (Halifax, Nova Scotia, Canada) (Assets ’08). Association for Computing Machinery, New York, NY, USA, 177–184. https://doi.org/10.1145/1414471.1414504
