A blue raccoon with the text Screen Reader under it, representing the project logo.

godot-screenreader

[Source Code] - [Download] - [Documentation]

Hey there, everyone!

Over the last few weeks I’ve been working on the start of a really cool project that I wanted to show off. I know, I promised my next video would be more theoretically oriented, but this is a really cool little thing. My next video will be on Pokémon and Gender, which I’m positive will not lead to any sort of negative feedback or upset comments. In the meantime, I want to show you what I built!

Screenreader footage of the highlighter navigating the interface.

Huh?

This is a tool called a “Screenreader” for the Godot game engine. Godot is a popular open source option for making video games, but it is infamous for its poor implementation of both control elements and accessibility. Let me start by explaining what a screenreader is. It’s a tool that reads visual content on the screen aloud to users, explains what it is, and organizes it in a way that makes sense to the user. This way, users who have low vision or other disabilities that make it hard to interact with a video game user interface get access to content they’d otherwise be unable to use.

I based my work on a tool made by LightsOutGames for Godot 3.x and an updated version by rodolpheh. I wanted to take that interface and build a much more customizable screenreader that is friendlier to both developers and users, taking advantage of a few tricks I learned in my travels with Godot. A strong motivation for this project is to improve the structural framework that LightsOutGames and rodolpheh developed, while also providing an easier-to-use interface for developers, since adding the previous implementations to your game could take quite a bit of work.

Now, what’s cool is that I discovered while I was working on this project, there have actually been active developments on the Godot repository for adding screenreader accessibility, so my project will act as a temporary bridge until this development enters the main branch and is compiled into a future version of the game engine. I’ve reviewed the development of it, and unlike previous accessibility measures within the project, this seems like the real deal and an excellent implementation. I really hope this works out when it rolls out since accessibility in game engine IDEs has historically been a problem, and the transition towards these kinds of IDEs in game development has had a significant impact on indie games produced by blind people. However, it may take a while for it to enter the main branch, so my work is still a decent patch for now.

Why Develop a Screenreader From Scratch?

Note: This section was rewritten after NVDA support was added. Older revisions can be viewed in the website’s repository.

Godot screenreader support is being worked on in a development branch of Godot and may exist in a future release, but for now there is no way to support screenreader usage out of the box. Godot 4.x does support the DisplayServer.tts methods, but these do not detect or connect to a screenreader. Additionally, Godot Controls are inadequate for keyboard navigation, let alone screenreader navigation. In short, a screenreader has no way to interact with a stock Godot application.

Historical implementations of screenreaders for Godot 3.x did integrate NVDA support. However, the build process was poorly documented and I could not figure out how it was implemented, and Godot 4.x changed its model for extensions, requiring a recompile of DLLs and other libraries. Thus, when creating the prototype, I opted to use the DisplayServer.tts functions as a baseline and focus more on Control navigation and on providing an interface for developers to build UI elements on, following rodolpheh’s implementation. Fortunately, after the initial release, I was pointed to the work of developer NightBlade9, who had already implemented an NVDA interface I could connect to within Godot 4.2+, so I was able to easily integrate NVDA support into my project - as it turned out, we had developed two different parts of the same process. Later work includes analyzing the work of another developer, EricBomb, for further screenreader integration, and possibly looking into compatibility with Linux screenreader technology, though I am not well versed in Orca, the most common Linux screenreader.

One thing I wanted to avoid in my design was forcing a web-like design onto video games. Compared to writing a game from scratch, web design carries extensive bloat in both design and implementation from adapting to market pressures for 30 years. Extending its design principles to other environments typically results in either frustrating, poorly designed environments such as Android, or walled gardens that restrict how consumers can customize their hardware and software, and neither actually delivers on the promise of integrated accessibility. And despite all of these automated hooks and whistles, complex custom tasks such as video games still require customization, so a developer still needs to engage with the process of developing accessible games. As a result, developers can often be hostile to this approach to accessibility and design. Instead, I opted for a design that allows for easy integration of expected behavior for solved problems like menus and tabs, while leaving room for open-ended design for more unique game design challenges.

Additionally, my design philosophy stresses that control over the application’s behavior should belong to the application, not to external applications or the operating system, both to avoid unpredictable edge cases and to keep the software design encapsulated. Historically, software design will attempt to escape the confines of hierarchical design expectations - a process that has motivated many developments and updates to web standards in general to adapt to the desires of developers - and so control should be managed as much as possible within the software’s own space, as permitted by the operating system. Godot is a prime example of this process: its justified technical decision to make custom implementations of UI Control elements led to gaps in accessibility. It is impossible to predict every way a project’s needs may conflict with the standard implementations, which leads to gaps such as those observed with Godot, and it is unreasonable to expect all software developers to conform all software to such expectations.

Instead, having multiple well documented, open source implementations of accessibility allows it to be developed properly for self-contained applications, which can sometimes even use these implementations themselves, with accessibility considered from the beginning. Additionally, while researching NVDA’s source code and its implementation of SSML, a markup language for speech synthesis, it became apparent to me how important it is to encapsulate this functionality and allow for a less hierarchical design. NVDA manages its own SSML strings because this lets users control whether voice modifiers are applied, which prevents developers from forcing inaccessible speech settings onto users. This kind of encapsulated design gives screenreader developers full control over the application’s behavior without interference from Microsoft’s design, which has grown concerningly invasive in Windows 11, and ensures that the screenreader’s core behavioral pipeline remains in the open source space and fully controlled by the user. In fact, Microsoft’s implementation of SSML seems more interested in AI speech generation than in the JIT TTS used for accessibility. Additionally, the user interface for screenreader settings is not directed by corporate pressures that have nothing to do with accessibility. This also allows for the development of intermediate libraries such as Tolk or AccessKit that can interface with multiple screenreaders, providing unified access to all of them while still supporting custom features.

All in all, I think this experience highlights the value in open source accessibility projects and gradual integration of accessibility into video games.

Implementation

Now with all of that out of the way, we can discuss the design of the screenreader itself.

This actually isn’t the first attempt at a screenreader in Godot. LightsOutGames, an audiogame developer, developed his first screenreader for Godot 3.x around 2018. It was a really rough attempt to make games and the Editor more accessible, since the Godot UI engine was completely inaccessible without a plugin. This plugin, like my own screenreader, doesn’t connect directly to the screenreader for organizing Controls, but reads strings through the screenreader instead while navigating through Godot. Unlike my screenreader though, it does not attempt to override the function of the UI, and instead uses the in-game focus system to navigate. It is incomplete and only offers limited baseline compatibility. It was built as both a usable TTS kit (godot-tts) and an accessibility plugin (godot-accessibility). Later, GitHub user rodolpheh created a fork that is compatible with Godot 4.x, but it removes the direct screenreader connections and simply uses Godot’s embedded TTS functions. This seemed fine at first, but upon closer inspection, it doesn’t read strings to the screenreader, only to the OS TTS service, so the solution is not complete. Unfortunately, LightsOutGames’ original plugin can’t be imported, so an extension for Tolk or a similar library will have to be implemented.

What’s important to understand here is that a screenreader doesn’t just read text on the screen. It also transforms selecting and navigating UI elements into a new, more controlled format that allows specialized players to navigate the screen. This way, players know exactly what element is selected and exactly how to navigate it, even without vision - but it could potentially be useful in all sorts of situations. In practice, this means that when interacting with the keyboard, Controls are focused individually - when I use inputs like the select or increment/decrement keys, the input stays within the Control and I know which Control I’m changing. By default, I can use the up and down keys to navigate between Controls. What’s nice is that it only navigates between Controls that are currently visible, so if I use a Control that changes the visibility of other Controls, such as by scrolling through tabs, the nodes I can access change to match what is available to sighted players.

So how does someone implement that in Godot anyways?

First, it’s important to build what Godot doesn’t - a sane navigation tree for Control nodes. Not all interfaces actually work well with navigation trees, so it’s important to allow for some flexibility here. So instead of trying to predict every case of when and where developers will need the screenreader, I give them a one-line function that takes the root Control node as its argument. The idea is that by passing this root Control node, the screenreader only processes nodes that are relevant to typical screen navigation. The function recursively digs through the nodes and constructs a tree-like structure that allows the nodes to be organized into various topologies that can later be navigated by the screenreader.
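To make the idea concrete, here is a rough sketch of that kind of recursive tree building, plus up/down navigation that skips hidden Controls. It is written in Python rather than GDScript so it can run outside the engine, and everything in it (`Control`, `NavNode`, `build_nav_tree`, `visible_order`, `step`) is a hypothetical stand-in, not the addon’s real code or Godot’s API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Control:
    """Hypothetical stand-in for a Godot Control node."""
    name: str
    visible: bool = True
    children: List["Control"] = field(default_factory=list)


@dataclass(eq=False)
class NavNode:
    """One entry in the screenreader's own navigation tree."""
    control: Control
    children: List["NavNode"] = field(default_factory=list)


def build_nav_tree(control: Control) -> NavNode:
    """Recursively mirror the Control hierarchy below the given root."""
    return NavNode(control, [build_nav_tree(child) for child in control.children])


def visible_order(node: NavNode) -> List[NavNode]:
    """Depth-first list of nodes, skipping hidden Controls and their subtrees."""
    if not node.control.visible:
        return []
    ordered = [node]
    for child in node.children:
        ordered.extend(visible_order(child))
    return ordered


def step(root: NavNode, current: NavNode, direction: int) -> NavNode:
    """Move up (-1) or down (+1) through the visible nodes, clamping at the ends."""
    order = visible_order(root)
    if not order:
        return current
    if current not in order:
        # The current node was hidden in the meantime: fall back to the first one.
        return order[0]
    index = order.index(current) + direction
    return order[max(0, min(index, len(order) - 1))]
```

Because visibility is checked every time `step` runs, showing or hiding a tab’s contents changes what can be reached without rebuilding the tree.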

To isolate the screenreader functionality as much as possible from the developer - thus reducing how many things the developer depends on to make it work - I had to write separate, individual functions to manage everything related to the screenreader system. They still work within the game’s engine, but the scripts are designed so that their behavior is self-contained. All the sound effects, themes and other assets are contained completely within the screenreader addon, and the developer doesn’t need any extra assets to get baseline functionality working - however, they can attach scripts to their nodes, or extend them, to add functionality and features like alt text, which are not native to Godot’s UI system.

Then, it is important to disable the focus system completely. You might think there is an easy way to do this, like some kind of game-wide setting. Oh, sweet summer child - as far as I understand, there is no way to disable this system other than setting the focus_mode property of every Control to either FOCUS_NONE or, if you still want mouse interactivity, FOCUS_CLICK. This is incredibly annoying, because it means I need to manage the focus mode of every single node that is processed by the screenreader’s interface. Furthermore, it also means all signals, theme changes and other features normally available with Controls are disabled, so they must be manually controlled. Signals are supported, but honestly I said “fuck it” for the first implementation of the themes because it would require me to figure out how to redraw every single fucking control within the screenreader logic and I would simply rather not do that right now. Did I mention that Godot didn’t have the most well designed UI system?
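For illustration, the “touch every single Control” chore looks roughly like the sketch below. It is Python with a stand-in `Control` class, not GDScript. The three constants mirror the values of Godot 4’s `Control.FocusMode` enum (`FOCUS_NONE`, `FOCUS_CLICK`, `FOCUS_ALL`), while `disable_focus`, `restore_focus` and the idea of saving the old modes are my own assumptions rather than what the addon necessarily does.

```python
from typing import Dict, List, Optional

# Values mirror Godot 4's Control.FocusMode enum.
FOCUS_NONE = 0
FOCUS_CLICK = 1
FOCUS_ALL = 2


class Control:
    """Hypothetical stand-in for a Godot Control node."""

    def __init__(self, name: str, focus_mode: int = FOCUS_ALL,
                 children: Optional[List["Control"]] = None) -> None:
        self.name = name
        self.focus_mode = focus_mode
        self.children = children if children is not None else []


def disable_focus(root: Control, keep_mouse: bool = False) -> Dict[Control, int]:
    """Turn off built-in focus on every node under root, remembering the old modes."""
    saved: Dict[Control, int] = {}
    stack = [root]
    while stack:
        control = stack.pop()
        saved[control] = control.focus_mode
        control.focus_mode = FOCUS_CLICK if keep_mouse else FOCUS_NONE
        stack.extend(control.children)
    return saved


def restore_focus(saved: Dict[Control, int]) -> None:
    """Put every Control back to the focus mode it had before."""
    for control, mode in saved.items():
        control.focus_mode = mode
```

Keeping the saved modes around means the screenreader can be switched off again without leaving the game’s UI in a broken state.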

Anyways, once the beast that is input control is finally tamed, we can start designing the control interface. For now, I’ve only tested it with keyboard, but the way it’s designed, I should only need to add a few extra features to support controllers as well. To avoid any potential conflicts with any sneaky embedded behavior, I separated the controls for the screenreader from the standard ui Control inputs. This means I made like 10 extra input actions, all starting with DOM_, making them easy to distinguish. To add to the barrels of fun I’m having here, because of the way that the Godot asset store works, it doesn’t seem to be possible to submit assets that require unique key bindings set in the Project Settings tab. So I have to add the input actions manually too! On the bright side, this means less setup for developers to get the screenreader running, so I guess it’s a win.

Finally, I can start writing the inputs. I actually separated screenreader input into two groups - first, the Control input runs, and if there was no input, the screenreader navigation input runs. This way, whenever input is received, it first tries to operate the Control itself, and if nothing happens, it navigates the interface, giving consistent behavior instead of this linked-list crap. This also keeps the code for managing each kind of Control and its inputs logically organized. Additionally, it’s important that I write code to tell the screenreader how to read off every Control when it’s selected. Borrowing from LightsOutGames’ design, I insert string “tokens” into a list that is read off as a combined string later. This makes it easy to organize the order in which information is presented to the player, and makes things like translations easier to manage. It’s important to organize tokens so the most important information is presented first, so blind players aren’t waiting a million years to read their stat screens.
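A minimal sketch of the two-stage input handling and the token list might look like this. Again, it is Python with stand-in classes, not the addon’s GDScript. The action names (“increment”, “decrement”, “up”, “down”), the `Slider` class and the `spoken` list standing in for the TTS queue are all made up for the example.

```python
from typing import List


class Slider:
    """Hypothetical stand-in for a slider-like Control."""

    def __init__(self, label: str, value: int, maximum: int) -> None:
        self.label = label
        self.value = value
        self.maximum = maximum

    def handle_input(self, action: str) -> bool:
        """Stage 1: return True only if this Control consumed the action."""
        if action == "increment" and self.value < self.maximum:
            self.value += 1
            return True
        if action == "decrement" and self.value > 0:
            self.value -= 1
            return True
        return False

    def tokens(self) -> List[str]:
        """Most important information first: name, then value, then type."""
        return [self.label, f"{self.value} of {self.maximum}", "slider"]


class Screenreader:
    def __init__(self, controls: List[Slider]) -> None:
        self.controls = controls
        self.index = 0
        self.spoken: List[str] = []  # stands in for the TTS queue

    def speak_current(self) -> None:
        """Join the current Control's tokens into one string and queue it."""
        self.spoken.append(", ".join(self.controls[self.index].tokens()))

    def handle_input(self, action: str) -> None:
        # Stage 1: let the selected Control try to use the input.
        if self.controls[self.index].handle_input(action):
            self.speak_current()
            return
        # Stage 2: nothing happened, so treat the input as navigation.
        if action == "down":
            self.index = min(self.index + 1, len(self.controls) - 1)
        elif action == "up":
            self.index = max(self.index - 1, 0)
        else:
            return
        self.speak_current()
```

Since each token is its own string, translating or reordering them only touches the `tokens` method, not the code that does the speaking.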

And don’t forget to add extra sound effects! They’re not always necessary but they help distinguish the UI even more, especially associating certain sounds with certain actions.

“But” - I hear the game developers cry - “what if I want to develop my OWN interfaces? You can’t possibly predict the accessibility behavior of ALL game Control interfaces!” And it’s true! It’s impossible to account for every possibility. But it is possible to account for the possibility that you can’t account for every possibility… or something. By that, I mean that my screenreader also supports a means of extending functionality to Controls through scripts, and through these scripts it is easy to not only override the function of any Control, but also to create your own Controls and manage their navigation schemes. The interface for building these is just as simple as the interface for building regular input code in your game. So really, instead of restricting developer creativity, it opens the doors for developers to easily experiment with a new kind of way to interface their games to players.

Another important part of accessibility is having the ability to customize the experience. Even though this screenreader is really bare bones in its current state, it still has some useful options that players can set to customize their experience. For example, sound effects can be disabled for the screenreader, and a mode called “verbose mode” can be toggled on or off - adjusting how much information is displayed to the reader while navigating an interface, which could be useful if a player is adept in using an interface and only needs to know critical information. A high contrast theme can be set, and subtitles and TTS Audio description can be enabled or disabled.
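As a sketch of how a verbose mode could trim what gets read, the snippet below tags each token as essential or not and drops the non-essential ones when verbose mode is off. It is Python, and the `Settings` class, the `describe` function and the essential flag are all assumptions made for illustration, not the addon’s actual settings system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Settings:
    """Hypothetical player-facing options."""
    verbose: bool = True
    sound_effects: bool = True


def describe(tokens: List[Tuple[str, bool]], settings: Settings) -> str:
    """Build the spoken string from (text, essential) pairs, already in priority order."""
    kept = [text for text, essential in tokens if essential or settings.verbose]
    return ", ".join(kept)
```

A player who already knows the interface can turn verbose mode off and only hear the name and value of each Control.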

Oh yeah, I forgot to mention - it also has a high contrast theme changer and support for adding subtitles to videos. But I’m sleepy and don’t want to write any more. I literally slept all week like goddamn sleeping beauty. Just read the documentation, okay? I wrote like 6,000 words to make sure that things are explained clearly and that developers don’t get overwhelmed by this shit. Part of accessibility integration is also clear communication with developers.

Anyways, as a result of this, I had to write another system that may not even seem related to the screenreader - a menu manager that is designed to only be used specifically with the screenreader. It is used to display… well, its own menus. Things like options or extra tools that the user can use. But it can be used to display all sorts of controls I may want to use with the screenreader. This way, developers meddling around with their own menus won’t interfere with the core functionality of these menus and tutorials. That’s right, build your own menu manager! Either way, I used this design to allow for some additional extended functionality with the screenreader, such as the ability to find all the buttons in a user interface.

It took around 2-3 weeks to develop the whole thing, and it’s around 5,000 lines of code in the GDScript files. Not too shabby! Unfortunately, by the end my dreaded sleep spells started to take over, making it hard to wrap things up for the initial version, but thankfully I was able to muster enough energy to complete the final lap.

Summary

This is an early release of this software, and it was a lot of fun to sprint out this first edition of the project, but it can hardly be called complete. It currently doesn’t have much in the way of mouse and controller support, and it needs to be integrated further with the screenreader capabilities of the OS, such as through Tolk. But I think it’s definitely a step in the right direction. The dirty work of organizing UI Controls in a sane manner is now complete. My hope is that it can also be embedded as an Editor plugin like the previous screenreaders, so that it can greatly improve accessibility for blind players who want to use the Godot game engine themselves, and perhaps take control over screenreader development in the future. LightsOutGames’ screenreader plugin does exist, and it has been updated for version 4.x, so this project offers an upgrade to the existing solutions.

All in all, I don’t really think that Godot is a long term solution for audio gaming or accessible gaming, because frankly the way it handles user interfaces in general is trash, both for players and for developers. But a project like this still benefits accessibility as a design philosophy: it provides a free and open source model of how to approach this kind of problem in the future, and makes those solutions more accessible for players and developers alike. Hopefully this can encourage further development of open source accessibility solutions in the indie game development sphere over time, and continue to move the problem of accessibility into an open space. I really encourage anyone watching, especially those with specialized technical knowledge of cross-platform screenreader compatibility, to contribute to the repository with optimizations, improved designs and work on desired features.

Credits:

posted on 12:39:05 PM, 12/27/24 filed under: game tech