Community Wishlist/Wishes/Software for turning articles into spoken Wikipedia audios using T2S AI voice

Status: Submitted

Description

I'm proposing a web UI tool for creating spoken Wikipedia audios using a modern AI voice (text-to-speech that sounds nearly natural).

Examples:

  • Arch Mission Foundation
  • Elephant communication
  • Heraclitus
  • 2022 in science#August (this one still has some issues with abbreviations and brackets)

Currently, one has to carry out all 11 steps of c:Help:Spoken Wikipedia using AI#SoniTranslate by hand. These steps make sure, for example, that the narration does not read out "[1]" for refs, that image captions and tables are skipped for the time being, and that the file gets the spoken-audio category for its language.
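To illustrate the kind of cleanup such a tool could automate, here is a minimal sketch in Python of removing reference markers before the text is passed to a text-to-speech voice. The function name `prepare_for_narration` and the marker pattern are assumptions made for this example; they are not taken from SoniTranslate or from the help page, and real articles would need more cases (captions, tables, abbreviations).

```python
import re

# Bracketed markers such as "[1]", "[23]", "[a]", "[note 2]" or "[citation needed]".
# This pattern is an illustrative assumption, not an exhaustive list.
CITATION_MARKER = re.compile(r"\[(?:\d+|[a-z]|note \d+|citation needed)\]")


def prepare_for_narration(text: str) -> str:
    """Remove reference markers and tidy whitespace so they are not read aloud."""
    without_markers = CITATION_MARKER.sub("", text)
    # Collapse runs of spaces or tabs left behind by removed markers.
    collapsed = re.sub(r"[ \t]{2,}", " ", without_markers)
    # Remove whitespace that now sits directly before punctuation.
    return re.sub(r"\s+([.,;:!?])", r"\1", collapsed).strip()


if __name__ == "__main__":
    sample = "The library was sent to the Moon.[1][2] It is small [a]."
    print(prepare_for_narration(sample))
```

A regular expression is enough for this one step because the markers have a fixed shape. Skipping captions and tables would more likely need the article's HTML structure rather than plain text.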

Lots of people listen to podcasts or audiobooks. Wikipedia articles are often really interesting, but not that many people read a lot, either in general or on screens. Turning articles into spoken Wikipedia audios would be very impactful and of interest to many readers and listeners. Of course it would also be useful for blind people and people with vision or reading problems. The way most of these audios are currently created means that the vast majority of articles, even in English Wikipedia, do not have a spoken Wikipedia audio, and where one exists it is years or even a decade out of date. The quality of text-to-speech has improved so much recently that a separate term, 'AI-generated voice', seems fitting. Note also the item "Build the necessary technology to make free knowledge content accessible in various formats" in the strategy.

I originally meant to propose only a web UI for this, so that one doesn't have to use things like Firefox add-ons that alter the Wikipedia article CSS to turn the article into a narration view (similar to the print view), and so that these audios could be created more quickly. Now I'd also like to propose that, at some point, the spoken audio files are created automatically and that the tool is mainly used to improve them, e.g. to update an audio after recent major changes to its article or to fix misnarrations in the audio. The tool would be created first so that people can improve the audio creation process over time, until audios created with it are generally high-quality and free of issues. Once this conversion process is in good shape, that part of the tool could be used to create the audios at scale.
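As a rough illustration of how the tool might decide that an audio needs updating after major changes, the sketch below compares the text that was narrated with the current article text. The function name `needs_new_audio` and the 0.9 threshold are hypothetical choices for this example; what counts as a "major change" would have to be settled by the community.

```python
from difflib import SequenceMatcher


def needs_new_audio(narrated_text: str, current_text: str,
                    threshold: float = 0.9) -> bool:
    """Return True when the article has drifted far from the narrated version.

    The comparison is word-based; `threshold` is a hypothetical cut-off,
    where 1.0 means the two texts are identical.
    """
    similarity = SequenceMatcher(
        None, narrated_text.split(), current_text.split()
    ).ratio()
    return similarity < threshold


if __name__ == "__main__":
    old = "Heraclitus was an ancient Greek philosopher from Ephesus"
    new = old + " who is known for his doctrine of change and much more"
    print(needs_new_audio(old, old))  # identical text, no update needed
    print(needs_new_audio(old, new))  # large addition, update suggested
```

A word-level similarity ratio is only one possible signal. A real tool could instead look at the revision history of the article since the audio was uploaded.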

Also needed is a proper audio player that for example has the feature to skip back by 5 or 10 seconds – that is a separate proposal.

Related wish: A tool for auto-transcription to speed up the creation of TimedTexts subtitles for videos on Commons

Assigned focus area

Unassigned.

Type of wish

Feature request

Related projects

Wikimedia Commons, Wikipedia

Affected users

Wikipedia content consumers, Wikipedia contributors

Other details

  • Created: 13:45, 16 October 2024 (UTC)
  • Last updated: 13:45, 16 October 2024 (UTC)
  • Author: Prototyperspective (talk)