Browsing with speech recognition
Posted by Henny Swan in User experience
Tags: Assistive Technology
In our fifth and final post from our browsing with assistive technology series, we discuss browsing with speech recognition.
You can also explore browsing with a desktop screen reader, browsing with a mobile screen reader, browsing with a keyboard, and browsing with screen magnification.
Speech recognition software listens to human speech, transcribes it into text, and executes spoken commands that operate your computer or device. People commonly use it as an alternative to using a keyboard, mouse or touch gestures. Speech recognition provides access to the entire operating system and applications, including browsers and web content.
Who uses speech recognition
People with physical disabilities, dexterity limitations, and cognitive or learning disabilities use speech recognition software.
People who have paraplegia or quadriplegia, have one arm, or have limited use of their arms due to osteoarthritis or arthritis may use speech recognition. Some people use speech recognition software combined with other assistive technologies such as switch devices or ergonomic keyboards.
People with temporary limitations may turn to speech recognition when needed. This includes people with conditions such as Repetitive Strain Injury (RSI), carpal tunnel syndrome or tendinitis, and people with a broken or fractured wrist.
Speech recognition is helpful for people with situational limitations that make using a keyboard, mouse or touch gestures difficult or impossible. This includes using speech recognition on phones and voice assistants to look up information while cooking or driving a car.
Commonly used speech recognition software
Speech recognition software is built into all popular platforms, including Windows, macOS, iOS and Android. Speech recognition is also available through voice assistants such as Amazon Alexa, Apple's Siri, and Google Assistant.
Standalone software can offer more robust support for browsing. Dragon is supported on Windows, and Dragon Anywhere is supported on Android and iOS. The Dragon family of products is probably the most popular available.
How speech recognition works
Speech recognition uses a combination of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) to convert spoken words and sentences into text or actionable commands.
When using speech recognition, it can take time to train the software to recognise your voice accurately. It is also essential to have a good quality microphone that can clearly pick up what you say.
Navigating with speech recognition
As well as dictating text, filling out forms, and opening and closing applications, you can browse the web and completely control websites with voice commands.
Core navigation verbally mirrors how you navigate with a keyboard. For example, rather than pressing the Tab key on a keyboard, you say "Tab" to move focus to the next item and "Shift-Tab" to move to the previous item. To activate a link or button, say "Click" together with the text used in the link or button, such as "Click Home".
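To make the idea concrete, here is a minimal sketch of how a transcribed phrase might be turned into a command. It is an illustration only, not how Dragon or any built-in tool is implemented: the `Command` class, the `KEY_PHRASES` table and the `parse_command` function are hypothetical names, and the only phrases handled are the ones mentioned in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    action: str                   # "key" or "click"
    target: Optional[str] = None  # key name, or the spoken label text


# Hypothetical table of spoken key phrases and the keys they stand for.
KEY_PHRASES = {
    "tab": "Tab",
    "shift tab": "Shift+Tab",
    "enter": "Enter",
}


def parse_command(utterance: str) -> Optional[Command]:
    """Turn a transcribed phrase into a Command, or None if unrecognised."""
    words = utterance.lower().replace("-", " ").split()
    if not words:
        return None
    if words[0] == "click":
        # "Click Home" -> activate the control whose visible text is "home"
        return Command("click", " ".join(words[1:]) or None)
    if words[0] == "press":
        # "Press Enter" and "Enter" are treated the same way in this sketch
        words = words[1:]
    key = KEY_PHRASES.get(" ".join(words))
    return Command("key", key) if key else None


print(parse_command("Shift-Tab"))   # key command for Shift+Tab
print(parse_command("Click Home"))  # click command targeting "home"
```

Real software does far more than this, including handling dictation, but the sketch shows why "Click" only works well when there is visible text to say.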
Things can get more complex depending on how well the content is designed and coded. If there are multiple links on the same page with the same link text, the software won't know which link to click. Instead, it will highlight all the links with the same link text and number them. You then select the link you want by saying the number.
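The numbering behaviour described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the logic of any particular product: `match_links` and `number_choices` are hypothetical helpers, and the list of link texts is made up.

```python
def match_links(spoken_text, link_texts):
    """Return (index, text) pairs for links whose text matches what was said.

    One match can be clicked straight away. Several matches need
    numbering so the person can choose between them.
    """
    wanted = spoken_text.strip().lower()
    return [
        (index, text)
        for index, text in enumerate(link_texts)
        if text.strip().lower() == wanted
    ]


def number_choices(matches):
    """Give each ambiguous match a 1-based number, as an overlay would."""
    return {number: index for number, (index, _text) in enumerate(matches, start=1)}


# Made-up page with two links that share the same text.
links = ["Home", "Read more", "Services", "Read more", "Contact"]

matches = match_links("read more", links)
choices = number_choices(matches)

print(matches)  # [(1, 'Read more'), (3, 'Read more')]
print(choices)  # {1: 1, 2: 3}, so saying "2" picks the link at index 3
```

Unique link text avoids the extra step entirely: "Services" matches one link, so no numbers are needed.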

Buttons must have visible text labels so they can be activated by voice. The label in the source code must match what is shown on screen, so it is clear which speech command will activate a control. If a control doesn't have a visible label, you can say "MouseGrid", which overlays a numbered grid on the page. Saying the number of a box focuses the grid on that part of the page. This is repeated until the button or link you want is focused.
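A simple way to picture the matching requirement is a check that the visible label appears within the control's name in the code. The sketch below assumes you already have both strings for each control; `normalise` and `label_in_name` are hypothetical helpers, the sample controls are invented, and plain substring matching is a simplification of how real tools compare names.

```python
def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so formatting doesn't affect the match."""
    return " ".join(text.lower().split())


def label_in_name(visible_label: str, accessible_name: str) -> bool:
    """True when the visible label appears within the accessible name."""
    label = normalise(visible_label)
    return bool(label) and label in normalise(accessible_name)


# Invented controls: (visible text, accessible name from the markup)
controls = [
    ("Search", "Search"),             # match: "Click Search" should work
    ("Send", "Submit contact form"),  # mismatch: "Click Send" may fail
    ("", "Email us"),                 # icon only: nothing visible to say
]

for visible, name in controls:
    print(repr(visible), repr(name), label_in_name(visible, name))
```

The second and third controls are the kind that push people towards slower fallbacks such as MouseGrid.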

While strategies such as numbering repeated links and MouseGrid are useful, they slow down navigation and do not offer as good a user experience as clearly labelled content.
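The way MouseGrid narrows down to a target can be sketched as repeated subdivision of a rectangle. This sketch assumes a three by three grid numbered 1 to 9 from left to right and top to bottom, and a made-up screen size. The `Rect`, `refine`, `mouse_grid` and `centre` names are hypothetical, and real software may number or size its grid differently.

```python
from typing import Iterable, NamedTuple, Tuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def refine(rect: Rect, cell: int) -> Rect:
    """Return the sub-rectangle for cell 1-9 of a 3x3 grid laid over rect."""
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    row, col = divmod(cell - 1, 3)  # assumed numbering: row by row from top left
    width, height = rect.width / 3, rect.height / 3
    return Rect(rect.x + col * width, rect.y + row * height, width, height)


def mouse_grid(screen: Rect, spoken_cells: Iterable[int]) -> Rect:
    """Apply each spoken number in turn, shrinking the grid every time."""
    region = screen
    for cell in spoken_cells:
        region = refine(region, cell)
    return region


def centre(rect: Rect) -> Tuple[float, float]:
    """Point where a final "click" would land in this sketch."""
    return (rect.x + rect.width / 2, rect.y + rect.height / 2)


screen = Rect(0, 0, 1800, 900)        # made-up screen size
target = mouse_grid(screen, [9, 5])   # say "nine", then "five"
print(target)          # Rect(x=1400.0, y=700.0, width=200.0, height=100.0)
print(centre(target))  # (1500.0, 750.0)
```

Each spoken number shrinks the area to a ninth of its previous size, which is why reaching a small unlabelled button usually takes several commands where "Click" plus a label takes one.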
Transcript
[The TetraLogical logo whooshes into view on a white background. The logo flashes and stops with a sonar-like 'ping'. It then magnifies and fades out.]
[A dark purple background appears with the TetraLogical logo faintly overlaid]
Browsing with speech recognition
Speech recognition software listens to human speech, transcribes it into text, and executes spoken commands that operate your computer or device.
As well as dictating text, filling out forms, and opening and closing applications, you can browse the web and completely control websites with voice commands.
[The TetraLogical homepage appears with a horizontal list of links for main navigation at the top, a heading, and the body of the page content below]
Core navigation verbally mirrors how you navigate with a keyboard. For example, rather than using keys on a keyboard, you say "Tab" to move focus to the next item, "Shift Tab" to move to the previous item, and "Press Enter" to activate a control.
[A purple button with the text "Skip to main content" appears. As the user interacts with the page, the visible focus indicator moves too]
[User voice] tab, tab, tab four times, press shift tab, press, shift tab, press enter.
[On the final command, the "Services" page opens. The page then fades back to the homepage]
To activate a link or button, you can say "Click" together with the text used in the link or button. For example, "Click Services" to activate a link labeled "Services".
[User voice] click services
[The "Services" page opens as before]
If you just say "Click link", the software will highlight and number all links in the current page. You then select the link you want by saying the number.
[User voice] click link
[A series of six green numbers appear dotted throughout the page. These are attached to each separate link, such as the logo and each individual menu option]
[User voice] choose 3
[The visible focus moves to the "Services" menu option, which has the number three above it. This then opens the "Services" page]
[The homepage appears again, this time with gridlines across the entirety of the page, marking out six distinct areas]
In situations where a control lacks a visible text label, or where the visible text doesn't match the actual accessible name of the control in the underlying markup, people using speech recognition can use alternative approaches such as MouseGrid, which overlays a grid on the page.
[The user moves the mouse cursor which changes the size and location of the grid. As the user homes in on the menu options, the grid keeps resizing to display a smaller, more precise area]
Each box has a number. By saying a number in a box, the grid focuses on that part of the page.
This is repeated until the button or link you want is focused.
[The bottom of the TetraLogical homepage is displayed in front of a bright pink background]
In this recording, we're using MouseGrid to set focus to a graphical control that lacks visible text.
[User voice] MouseGrid
[Lines appear across the screen marking out nine areas of equal size on screen. Each one is numbered]
[User voice] seven
[A new grid appears in the area that was previously marked as seven. This is much smaller and now focuses on the bottom right of the screen.]
[User voice] six
[Again, a new grid appears in the area that was previously marked as six]
[User voice] six
[A very small grid is now displayed. The majority of the grid is over a button with an "email" icon displayed]
[User voice] click
These are some of the high level details about speech recognition, and common strategies that people browsing with speech recognition use.
[The screen fades to white and the TetraLogical logo appears again]
To find out more about accessibility visit tetralogical.com.
Summary
People use speech recognition as an alternative to using a keyboard, mouse or touch gestures. It is used by people with physical disabilities, dexterity limitations, and cognitive or learning disabilities.
Speech recognition software uses algorithms to recognise spoken language and follow verbal commands.
As well as dictating text, filling out forms, and opening and closing applications, you can browse the web and completely control websites with voice commands.
Speech recognition software depends on the quality of the design and code used to create the application or the web content. Repeated link text and missing labels for form inputs or buttons can make content harder to navigate.
Next steps
Read an inclusive approach to video production to learn more about how to produce accessible videos and how our embedded accessibility service can help you achieve sustainable accessibility.
Updated Thursday 2 March 2023.