Browsing with speech recognition

Posted on Monday, 15 November 2021 by Henny Swan in User experience

In the fifth and final post in our browsing with assistive technology series, we discuss browsing with speech recognition.

You can also explore browsing with a desktop screen reader, browsing with a mobile screen reader, browsing with a keyboard, and browsing with screen magnification.

Speech recognition software listens to human speech, transcribes it into text, and executes spoken commands that operate your computer or device. People commonly use it as an alternative to using a keyboard, mouse or touch gestures. Speech recognition provides access to the entire operating system and applications, including browsers and web content.

Who uses speech recognition

People with physical disabilities, dexterity limitations, or cognitive and learning disabilities use speech recognition software.

People who have paraplegia or quadriplegia, have one arm, or have limited use of their arms due to osteoarthritis or other forms of arthritis may use speech recognition. Some people combine speech recognition software with other assistive technologies such as switch devices or ergonomic keyboards.

People with temporary limitations may turn to speech recognition when needed. This includes people with conditions such as Repetitive Strain Injury (RSI), carpal tunnel syndrome or tendinitis, and people with a broken or fractured wrist.

Speech recognition is also helpful for people with situational limitations that make using a keyboard, mouse or touch gestures difficult or impossible. This includes using speech recognition on phones and voice assistants to look up information while cooking or driving a car.

A black and white photo of Josh, a younger white man with short brown hair. He is sitting in his wheelchair and laughing at the camera, holding an Ordnance Survey map. Josh, a sportsman and wheelchair user with Spinal Muscular Atrophy, says: "I use my laptop, sensors, Alexa, and phone to train for the Paralympics and find accessible outdoor routes for hikes."

Commonly used speech recognition software

Speech recognition software is built into all popular platforms, including Windows, macOS, iOS and Android. Speech recognition is also available through voice assistants such as Amazon Alexa, Apple's Siri, and Google Assistant.

Standalone software can offer more robust support for browsing. Dragon is supported on Windows, and Dragon Anywhere is supported on Android and iOS. The Dragon family of products is probably the most widely used.

How speech recognition works

Speech recognition uses a combination of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) to convert spoken words and sentences into text or actionable commands.

When using speech recognition, it can take time to train the software to recognise your voice accurately. It is also essential to have a good quality microphone that can clearly pick up what you say.

Navigating with speech recognition

As well as dictating text, filling out forms, and opening and closing applications, you can browse the web and completely control websites with voice commands.

Core navigation mirrors keyboard navigation, using spoken commands in place of key presses. For example, rather than pressing Tab on a keyboard, you say "Tab" to move focus to the next item and "Shift-Tab" to move to the previous item. To activate a link or button, say "Click" together with the text used in the link or button, such as "Click Home".

Things can get more complex depending on how well the content is designed and coded. If there are multiple links on the same page with the same link text, the software won't know which link to click. Instead, it will highlight all the links with the same link text and number them. You then select the link you want by saying the number.
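To make the numbering behaviour concrete, here is a minimal Python sketch of how a "Click" command might be resolved against the links on a page. It is an illustration under stated assumptions, not how Dragon or any other product is implemented: the `Link` class, the `resolve_click` function and the exact, case-insensitive text matching are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    text: str  # visible link text
    href: str  # destination of the link


def resolve_click(spoken_text: str, links: list[Link]) -> dict:
    """Resolve a hypothetical "Click <text>" command against a page's links."""
    wanted = spoken_text.strip().lower()
    matches = [link for link in links if link.text.strip().lower() == wanted]

    if not matches:
        return {"action": "none"}
    if len(matches) == 1:
        # Unique link text: the command can be carried out straight away.
        return {"action": "activate", "link": matches[0]}
    # Repeated link text: number the candidates so the user can say a number.
    numbered = {number: link for number, link in enumerate(matches, start=1)}
    return {"action": "choose", "options": numbered}


links = [
    Link("Home", "/"),
    Link("Read more", "/news/budget"),
    Link("Read more", "/news/weather"),
]

print(resolve_click("Home", links)["action"])       # activate
print(resolve_click("Read more", links)["action"])  # choose
```

In this sketch, "Click Home" completes in one step, while "Click Read more" needs a second spoken command to pick between options 1 and 2. Unique, descriptive link text avoids that extra step.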

Buttons must have visible labels so they can be activated by name. Labels in the source code must match their visual presentation, so it is clear which speech command will activate a control. If a control has no label, you can say "MouseGrid", which overlays a numbered grid on the page. When you say the number of a box, the grid narrows to that part of the page. You repeat this until the button or link you want is in focus.
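One way to check that labels in the code match what people see is to compare each control's visible label with its accessible name, in the spirit of WCAG 2.5.3 Label in Name. The Python sketch below is a simplified illustration: the `label_in_name` function and the sample data are hypothetical, and a real audit would take both values from the browser's accessibility tree rather than from hand-written pairs.

```python
def label_in_name(visible_label: str, accessible_name: str) -> bool:
    """Return True if the visible label appears within the accessible name."""

    def normalise(text: str) -> str:
        # Compare case-insensitively and ignore extra whitespace.
        return " ".join(text.lower().split())

    label = normalise(visible_label)
    # An empty visible label can never be spoken, so treat it as a failure.
    return bool(label) and label in normalise(accessible_name)


# (visible label, accessible name) pairs for some hypothetical controls
controls = [
    ("Search", "Search"),            # names match
    ("Search", "Search this site"),  # visible label is contained in the name
    ("Send", "Submit form"),         # mismatch: "Click Send" may not work
    ("", "Close"),                   # icon-only button with no visible label
]

for visible, name in controls:
    status = "ok" if label_in_name(visible, name) else "check"
    print(f"{visible!r} / {name!r}: {status}")
```

The substring comparison is deliberately crude. It is only meant to flag controls worth a manual look, such as the last two pairs above.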

While strategies such as numbering repeated links and MouseGrid are useful fallbacks, they slow down navigation and do not give as good a user experience as well-labelled content does.
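A rough way to see the cost of MouseGrid is to count how many numbers you would need to say before a grid box is no bigger than the control you are aiming for. The Python sketch below assumes a three by three grid that is subdivided evenly each time. The `mousegrid_steps` function and the screen and button sizes are hypothetical, and real software may behave differently.

```python
def mousegrid_steps(
    screen_w: float,
    screen_h: float,
    target_w: float,
    target_h: float,
    divisions: int = 3,
) -> int:
    """Estimate the spoken numbers needed until a grid box fits the target.

    Assumes each spoken number replaces the current area with one box of a
    divisions-by-divisions grid (three by three by default).
    """
    if min(screen_w, screen_h, target_w, target_h) <= 0 or divisions < 2:
        raise ValueError("sizes must be positive and divisions at least 2")

    steps = 0
    box_w, box_h = screen_w, screen_h
    # Keep subdividing while the box is larger than the target in either direction.
    while box_w > target_w or box_h > target_h:
        box_w /= divisions
        box_h /= divisions
        steps += 1
    return steps


# A 120 by 48 pixel button on a 1920 by 1080 pixel screen
print(mousegrid_steps(1920, 1080, 120, 48))  # 3
```

Under these assumptions, reaching an unlabelled button takes "MouseGrid" plus three numbers and a final click command, compared with a single "Click" command for a button with a matching visible label.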

British Sign Language (BSL) version of "Browsing with speech recognition"

Summary

People use speech recognition as an alternative to using a keyboard, mouse or touch gestures. It is used by people with physical disabilities, dexterity limitations, or cognitive and learning disabilities.

Speech recognition software uses Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) to convert spoken language into text and to follow verbal commands.

As well as dictating text, filling out forms, and opening and closing applications, you can browse the web and completely control websites with voice commands.

Speech recognition software depends on the quality of the design and code used to create the application or the web content. Repeated link text and missing labels for form inputs or buttons can make content harder to navigate.

Next steps

Read an inclusive approach to video production to learn more about how to produce accessible videos and how our embedded accessibility service can help you achieve sustainable accessibility.

Updated Thursday 2 March 2023.