Talk:Community Wishlist Survey 2022/Generate Audio for IPA

The following Wikimedia Foundation staff monitor this page:

In order to notify them, please link their username when posting a message.
This note was updated on 11/2023

Project Announcement and Feedback

Contributors who engaged with this Wish's proposal

Rollo Rosewood Akathelollipopman Eptalon Noé Xavier Dengra Akathelollipopman Noé Pigsonthewing Ainali Modest Genius Pigsonthewing 1234qwer1234qwer4 Nachtbold Xaosflux Femkemilene Wskent Bischnu Akathelollipopman Vis M Yodin Matě MrMeAndMrMe UV Daud I.F. Argana Huji Sdkb Ottawajin Lectrician1 Tmv Tranhaian130809 Celerias Meiræ Spiros71 NguoiDungKhongDinhDanh Javiermes Aca Dexxor Ed6767 Lollipoplollipoplollipop Omnilaika02 ToBeFree


Thank you for all of your feedback and for engaging with the original proposal for this wish. I wanted to make you aware that we have begun our work on this wish and, if your capacity allows, we would love any input you have on our Open Questions as well as our initial investigations into the engines.

Here's a corpus of IPA audio we have tested. Please let us know if you have any words you would like to test in this testing corpus. We will work on adding those words to our corpus!
Here's the technical investigation of the IPA options and the languages supported by each option.


Thanks again for engaging with this impactful wish and for participating on the wishlist.
Best, NRodriguez (WMF) (talk) 18:01, 20 May 2022 (UTC)

Contributors who engaged with this Wish's proposal

Nw520 Pelagic Wostr Gusfriend Ali Imran Awan TheInternetGnome Minorax Man77 NightWolf1223 HynekJanac L235 Libcub Teratix Penalba2000 JAn Dudí Lrkrol Sadads Bencemac Mbkv717 Stwalkerster Dave Braunschweig Trey314159 Labdajiwa Thingofme Pppery Hià Paradise Chronicle Serg! Camillu87 Geertivp Amorymeltzer Aimwin66166 Rotavdrag Paucabot WikiAviator Daniel Case Wutsje Ninepointturn Bilorv Pi.1415926535 DarwIn Feoffer Tomastvivlaren Kpjas SD0001 Lambsbridge Paul2520 Waldyrious Bestoernesto Michael Barera Vulphere Ericliu1912 Emaus KnowledgeablePersona Beta16 Bodhisattwa Pbsouthwood DaxServer Cybularny Quiddity Sunpriat Gaurav Jl sg Evrifaessa Valerio Bozzolan Brainulator9

NRodriguez (WMF) (talk) 18:08, 20 May 2022 (UTC)

Open Questions

Can you help us build out the corpus of IPA words we will use to test the different libraries?

  • Have any tonal languages been included? I don’t think I see Swedish or any Chinese language, for example, but maybe there are some tonal languages in the corpus that I don’t recognize. Also, does the current corpus include unusual consonants or vowels? I have tested eSpeak myself and know that it cannot handle Cantonese (it cannot pronounce the syllabic m; I tried to figure out how to fix it but there’s really no documentation). Al12si (talk) 14:44, 12 November 2022 (UTC)

Do you know of any open source libraries that we should consider while we investigate our options?

Do you see any risks to introducing the video files inside the reader experiences?

  • "Video"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:21, 27 May 2022 (UTC)
    I believe this is regarding the software extension used to play media files. There's a specific task for making the player display in a desirable way, at phab:T122901 (versus the full audio-player as currently used at d:shibboleth, or the icon+"listen" links as used at w:Shibboleth).
    The only risk I see is making sure the design is good: i.e. everyone (incl. screenreaders?) can access the audio-clip without leaving the page, but also still have access to the file/license info if desired. (@TheDJ:FYI) HTH. Quiddity (talk) 17:27, 27 May 2022 (UTC)
  • I think the main issue with this feature is that it could present a false standard accent, making English projects sound more USA-centred, French projects more France-centred, Spanish projects more Madrid-centred, and so on. A written transcription can be prototypical, with approximate sounds for each consonant and vowel; audio can't. Audio fixes one version, with subtle traits such as the length, height, and openness of vowels, pitch, and others. There is no generic or neutral pronunciation. One way to deal with this issue may be to offer several audio clips for each IPA transcription, with regional distinctions. Combined with a preference that lets users hear their own local variety first, that could be interesting and less oppressive. Anyway, I am interested in this feature and I really hope you will make your UX tests public -- Noé (talk) 15:55, 7 November 2022 (UTC)

Let us know any other thoughts you may have on the initial problem statement...

The Wikivoyages have phrasebooks. They don't use IPA – see voy:en:Wikivoyage:Phrasebook article template#Pronunciation guide for the English version; the other languages are similar – but they might be a useful source of words, and it's possible that getting IPA-based audio would encourage people to add IPA there. In the past, we've talked about both the value of IPA to some readers and the need for audio (specifically, being able to hear the IPA without loading another page or covering up the text you're reading). Whatamidoing (WMF) (talk) 18:15, 30 May 2022 (UTC)

Google Cloud dependency?

Is it the case that this feature is dependent on closed-source software in the Google Cloud, or is it independent and self-hosted? HLHJ (talk) 16:56, 15 October 2022 (UTC)

Currently, yes, it depends on Google's service. The open source solutions we found only supported a handful of languages, and didn't sound remotely as accurate as Google's TTS service. Rest assured this is all done through the backend, and even then through a proxy, so no user data ever gets to Google. Longer-term we hope to switch back to open source once language support and quality are good enough. That is being tracked at phab:T317274. MusikAnimal (WMF) (talk) 03:13, 17 November 2022 (UTC)

Schedule

@MusikAnimal (WMF) and @Whatamidoing (WMF) and @NRodriguez (WMF), can you please fill in/update Community Wishlist Survey 2022/Generate Audio for IPA#Release timeline? —TheDJ (talkcontribs) 12:35, 23 November 2022 (UTC)

@TheDJ: I've made a start and will do some poking ~TheresNoTime-WMF (talk) 20:43, 23 November 2022 (UTC)

Am I missing something?

Am I misunderstanding something? I have just tried this in my af.wiktionary sandbox and the markup:

<phonos ipa="ˈbɜːrmɪŋəm" text="test" lang="en-GB" />

is pronounced as "test"

Both of these alternatives:

<phonos ipa="'bɜːrmɪŋəm" text="" lang="en-GB" />
<phonos ipa="'bɜːrmɪŋəm" lang="en-GB" />

generate an error: "The generated audio appears to be empty. The given IPA may be invalid, or is not supported by the engine. Using the 'text' parameter may help.".

How can a user ensure that the IPA is parsed and pronounced? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:04, 2 February 2023 (UTC)

@Pigsonthewing: the "text" parameter is not a label, but is the written word in the language that is specified in the lang= parameter. See also mw:Help:Extension:Phonos. What is the word you are trying to produce? I can try to show you an example. — xaosflux Talk 19:53, 2 February 2023 (UTC)
@Pigsonthewing I think I figured it out; see testwiki:Birmingham. Is that what you were trying to achieve? — xaosflux Talk 20:02, 2 February 2023 (UTC)
Thank you, but no. My point is that the template is, apparently, not parsing the IPA, but the value of the "text" parameter. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:10, 2 February 2023 (UTC)
@Pigsonthewing I think the documentation needs a lot of work and opened phab:T328705 about it. — xaosflux Talk 20:48, 2 February 2023 (UTC)

Could we have a response, here, please, from User:NRodriguez (WMF), User:Whatamidoing (WMF), User:MusikAnimal (WMF), User:TheresNoTime-WMF, or one of the other WMF folk working on this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:26, 26 February 2023 (UTC)

Try ˈbɜːmɪŋəm or (even though en-GB is based on a non-rhotic accent) ˈbɜːɹmɪŋəm. The list of accepted phonemes is here and <r> is not one of them. Nardog (talk) 01:25, 27 February 2023 (UTC)
Accepted by whom? The IPA I quoted above was copied from en:Birmingham. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:51, 27 February 2023 (UTC)
By Google's text-to-speech engine, which Phonos relies on. So the description of Phonos as IPA-to-audio is somewhat misleading: it's really text-to-speech that sometimes accepts IPA as a bonus. The Google TTS supports IPA as input for only a subset of all supported languages (18 out of 53 to be exact). For Mandarin and Cantonese it accepts not IPA but Pinyin and Jyutping, respectively. I've been advocating for renaming ipa="", making it optional, and supporting other phoneme schemes (Pinyin, Jyutping, and X-SAMPA), but they haven't made it clear whether they're doing it, which is super weird because doing so would allow them to support 35 more languages at no extra cost, including the 2nd, 6th, 7th, 8th, 9th, and 10th most widely spoken languages. Nardog (talk) 13:42, 27 February 2023 (UTC)
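One way to picture the relationship between ipa= and text= described above: in standard SSML, which Google's TTS accepts, a pronunciation is supplied through a phoneme element whose ph attribute holds the IPA and whose text content is the ordinary written form. The sketch below assumes Phonos builds a request roughly along these lines; build_ssml is a hypothetical helper written for this illustration, not the extension's actual code.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(ipa: str, text: str = "") -> str:
    """Wrap an IPA string in an SSML <phoneme> element.

    Hypothetical helper: `text` becomes the element content, which a
    speech engine may read aloud when it cannot use the `ph` value.
    """
    # quoteattr() adds the surrounding quotes and escapes the value.
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'
        "</speak>"
    )


print(build_ssml("ˈbɜːmɪŋəm", "Birmingham"))
```

If the engine rejects the ph value, the only thing left for it to speak is the element content, which would be consistent with "test" being read aloud in the sandbox example.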
@Pigsonthewing: as Nardog mentions, the voice models provided by Google (our currently-selected text-to-speech engine) only support certain phonemes and as such will "fall back" to reading the text parameter if an unsupported phoneme is provided in the IPA.
Unfortunately, we don't know how Google's voice models are implemented, but the current standard seems to be VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech)[1]. As we skip the step of phonemization (converting text to phonemes) by directly supplying the phonemes in the IPA, we need to ensure we input only phonemes which the voice model has been trained on. Additionally, when a model is trained, we don't always know the exact use certain phonemes are assigned; ə, for example, is often used in at least 3 conflicting ways.
When building a tool such as this, we are limited by both the international phonetic alphabet (which, as I only recently learnt from an impromptu chat with computational linguist Dr. Angus Andrea Grieve-Smith, can be considered to "fall short of the ideal consistent representation that was sold to people"[2]) and the publicly available voice models.
As an aside, I recently spoke to Alan Pope, on whom a fairly robust voice model has been trained[3]; his blog post on the matter is a wonderful read for anyone interested in this part of the process! Of note is his voice model's supported phonemes.
I hope this goes a little way to highlighting the complexity, and resultant limitations, of what we're trying to do and I'd be more than happy to answer any further questions you may have. — TheresNoTime-WMF (talk • they/them) 14:39, 27 February 2023 (UTC)
P.S. Way out of scope here, but wouldn't it be awesome to train our own voice model using a dataset provided by LinguaLibre? — TheresNoTime-WMF (talk • they/them) 14:54, 27 February 2023 (UTC)
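The fallback behaviour reported in this thread (an unsupported phoneme causes the text to be read, and an empty text produces an error) can be sketched as a simple decision. Everything here is an assumption for illustration: the SUPPORTED inventory is made up, the check is per character rather than per phoneme, and the real decision is made inside the engine, not by code like this.

```python
# Hypothetical phoneme inventory for one voice; real inventories are
# published per language by the engine vendor and are larger than this.
BIRMINGHAM_NON_RHOTIC = "ˈbɜːmɪŋəm"
BIRMINGHAM_WITH_R = "ˈbɜːrmɪŋəm"
SUPPORTED = set(BIRMINGHAM_NON_RHOTIC) | {"ɹ"}


def choose_input(ipa: str, text: str = "", supported=SUPPORTED):
    """Return ("ipa", ipa) when every symbol is supported, else fall back."""
    unsupported = sorted(set(ipa) - set(supported))
    if not unsupported:
        return ("ipa", ipa)
    if text:
        # Mirrors the reported behaviour: the text is read instead.
        return ("text", text)
    raise ValueError(
        f"unsupported symbols {unsupported} and no text to fall back on"
    )


print(choose_input(BIRMINGHAM_NON_RHOTIC))
print(choose_input(BIRMINGHAM_WITH_R, "test"))
```

Under these assumptions the transcription with a plain r never reaches the IPA path, which matches the advice above to try the non-rhotic form or ɹ.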
Though it is a common misconception that the IPA is "the ideal consistent representation" (so common that my enwp user page dedicates a section to it), it was never sold as such by the IPA (the association) itself. It was already telling you to "leave out everything that can be explained once for all" in 1904!
Out of curiosity, can you tell me the three conflicting ways in which ə is used by Google? It might simply be that they correctly understand what a phoneme is: an abstract category encompassing multiple sounds (aka phones) in complementary distribution. But if not, it has implications for template implementation when it's rolled out to major wikis. Nardog (talk) 15:56, 28 February 2023 (UTC)
Maybe we should say the opposite, that Wikipedia doesn’t know what a phoneme is. The telling thing is that on Wikipedia most IPA is notated as phonetic, not phonemic. I have no idea who made this decision and why. Al12si (talk) 01:30, 23 March 2023 (UTC)
Yes, ok, it's complex, but the Wish is called Generate Audio for IPA and the team claimed that they were working on that when attempting to cover the total failure of the Wishlist system some months ago. Theklan (talk) 21:42, 10 July 2023 (UTC)

Not what was required

The proposal was for an IPA-to-audio renderer. It is apparent that what is being built is largely a plain-text-to-audio renderer. This is not what was requested, nor what is required. Rendering a text value will not allow anyone to know whether the IPA is correct, nor what the IPA is intended to sound like. It will not allow comparison of two different IPA representations of the same text lexeme. If an IPA-to-audio renderer is not possible, the request should have been - and indeed still should be - declined. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:06, 22 June 2023 (UTC)

@Pigsonthewing You mentioned: "It is apparent that what is being built is largely a plain-text-to-audio renderer." Is this bold conclusion drawn solely from the update posted today, 22 June 2023? Or is it from something you have observed so far, including on the pilot wikis? Please let me know, so this can be cleared up.
This project is still about Generating Audio for IPA. ––– STei (WMF) (talk) 13:59, 22 June 2023 (UTC)
Both today's update and the section above this one. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:31, 22 June 2023 (UTC)
And also the current usage and examples. The deployment status is not about IPA rendering; it is about an inline player, which is another wish. Theklan (talk) 21:32, 10 July 2023 (UTC)
That never made sense anyway. The vast majority of IPA transcriptions are phonemic or allophonic transcriptions, which are language-specific and convey only selective information about exact articulatory configurations, omitting specifics that are either predictable according to the phonology of the language or irrelevant to the discussion at hand (see Handbook of the IPA, pp. 29–30). That means speech synthesis that directly derives audio from symbols is not an option (I guess unless you painstakingly recreate all the omitted parts in input for the audio to accompany each simpler, more legible transcription). So the only way that's humanly possible is language-specific text-to-speech. And it so happens that the only kinds of text-to-speech that don't sound horrendous are machine-trained ones, which typically accept IPA as input for only a portion of the supported languages (Google's, which CommTech initially went by, supports it for less than a half of all supported languages).
Then there are competing conventions. As the Handbook (p. 30) points out, /iː/ and /ɪ/, /iː/ and /i/, and /i/ and /ɪ/ are all valid ways to represent the vowels in heed and hid that are all "in accord with the principles of the IPA". So you can't tell whether /i/ is supposed to sound like the vowel in heed or hid just by looking at it. That means, even if you know what language is being transcribed, you can never tell if the resultant audio is correct without knowing the underlying context and conventions.
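The ambiguity in the paragraph above can be made concrete with a small lookup. This is only an illustration of the three conventions just listed; the labels A, B and C and the readings helper are invented for the example.

```python
# Three transcription conventions for the vowels of "heed" and "hid",
# as listed above. The labels A, B, C are made up for this example.
CONVENTIONS = {
    "A": {"heed": "iː", "hid": "ɪ"},
    "B": {"heed": "iː", "hid": "i"},
    "C": {"heed": "i", "hid": "ɪ"},
}


def readings(symbol: str):
    """List every (convention, word) pair that writes its vowel as `symbol`."""
    return [
        (name, word)
        for name, vowels in CONVENTIONS.items()
        for word, vowel in vowels.items()
        if vowel == symbol
    ]


print(readings("i"))  # [('B', 'hid'), ('C', 'heed')]
```

A bare /i/ therefore maps to the hid vowel under one convention and the heed vowel under another, so a renderer has no way to pick the intended sound without knowing which convention the transcriber followed.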
The very premise of the CWS wish was an untenable one, which is why I didn't vote for it and I suspect why (AFAICS) nobody who is actually a frequent editor of IPA transcriptions did. But CommTech didn't know that when they began working on it. Nardog (talk) 16:45, 22 June 2023 (UTC)
"If an IPA-to-audio renderer is not possible, the request should have been - and indeed still should be - declined.". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:10, 22 June 2023 (UTC)
You asked, as a reader, for a feature that made reading IPA redundant. You proposed automatic generation of audio from IPA, which is infeasible, as the means to accomplish it. That doesn't mean there aren't other means that can make reading IPA redundant for readers, like human editors manually inputting a prompt to generate audio, judging its quality, and adding it. Nardog (talk) 19:05, 22 June 2023 (UTC)
I did not. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:26, 22 June 2023 (UTC)
You didn't what? And whether my summary of your proposal is accurate or not, voters and CommTech certainly seem to have interpreted it that way. Nardog (talk) 01:59, 23 June 2023 (UTC)
Sorry, but we are discussing a wish called "Generate Audio for IPA", which is not being done. Also, in the discussion we had this year about the lack of wishes fulfilled, the WMF team said that the "Generate Audio for IPA" was coming. Which it is not. Theklan (talk) 21:34, 10 July 2023 (UTC)
I've been watching this ad nauseam over the last couple of months... and there are a few editors here who are disproportionately represented and attempting to influence what this feature should or should not be. I heavily advise inviting those who voted for the feature to give their opinion on what they want, with the information and experience that has been collected, as otherwise what has been built will likely not be accepted by those who asked for it.
Secondly, while personally I fear this is going to turn into a tool to fight the American vs British vs Canadian English wikiwars, I think it's important to realise that the general public probably won't care at all about IPA. It's my opinion that they only need a pronunciation and the whole IPA business can be removed from the lead as far as they are concerned. So even if we have gathered better feedback from more than the 4 people on this page, it is probably worth it to ask the general public what THEY want.
All in all, this seems a very good demonstration of why the Community Wishlist survey should be limited to smaller projects instead of these massive complicated projects that generally make it into the top 10 and why editors should not be doing product development. —TheDJ (talkcontribs) 13:42, 28 June 2023 (UTC)
"The general public probably won't care at all about IPA": that's exactly why I advocated for making Phonos about generic text-to-speech rather than strictly about IPA-to-audio, which they turned down on the grounds that it was "not in the roadmap". It's alarming to me that they're still saying it's "about Generating Audio for IPA" despite the fact that, according to this page, the project is supposed to address readers' inability to read IPA markup, so generic TTS that supports more languages would clearly be a better solution. I hope they only mean that the CWS project is about IPA-to-audio and that the Language team picks it up to make something that makes more sense. Nardog (talk) 16:36, 28 June 2023 (UTC)
  • I voted for this, and think the primary benefit is that readers may want to know how a word should be pronounced. Many projects have spent considerable effort annotating these words with IPA - so an IPA-->sound solution could be useful, but I think the core benefit to the reader is just being able to hear the word without contributors recording and uploading audio files manually for each word to be announced. So perhaps an IPA renderer isn't being delivered, and maybe one day it could be - but working on a text-to-audio rendering solution isn't useless. — xaosflux Talk 14:26, 29 June 2023 (UTC)
    It doesn't have to be either-or. If whatever engine you're relying on supports IPA for some languages, go for it, but it makes no sense to then preclude all other supported languages from being heard. Nardog (talk) 17:37, 29 June 2023 (UTC)

@User:NRodriguez (WMF): Please see mw:Help talk:Extension:Phonos. The announcement has faulty examples. The "help page" is misleading. What are "some engines"? What is this extension supposed to do? The predominant effects I can see are inappropriate error messages and useless tracking categories. Community_Wishlist_Survey_2022/Reading/IPA_audio_renderer. Taylor 49 (talk) 21:27, 16 September 2023 (UTC)

Please scrap this "Phonos" immediately

Yesterday I switched the pronunciation template at Swedish Wiktionary to Phonos. I had to partially revert the change because it malfunctioned. Most likely I will remove it completely. I propose to scrap Phonos completely. Reasons:

  • it's dysfunctional: if "ipa=" is fed in but "file=" not then it causes an error and puts the page into a tracking cat, it cannot "read" IPA
  • it does not provide anything beyond the capabilities of the old templates
  • the look/layout is bad and hard to improve
  • it uses "Google API" phab:T317274 (I do not want to end up with public WMF wikis accessible from ChromeBook only and only after logging into "your" Google account after having consented to Google's TOS; also the attitude "let's bet on proprietary software until free software is available and good enough" is inherently wrong: it has been applied again and again during the past 25 years, and the outcome was bad again and again (MNG vs Macromedia, Theora vs H.264, ...); there is no need to have public WMF wikis dependent on (and paying) Google)
  • it converts Vorbis files to MP3 phab:T346508 (there is really no reason to do so, waste of resources, and promotion of proprietary "technologies")
  • the documentation is incomprehensible, the announcements cross-posted to all wikis have faulty examples, it's obscure what the "PhonosInlineAudioPlayerMode" does or how to enable or disable it
  • difficult to invoke from Lua, has to be launched through hacky "extensionTag" leaving behind "striptease markers"

@User:Nardog @User:Pigsonthewing @User:Xaosflux @User:TheresNoTime-WMF @User:TheDJ @User:Theklan @User:Al12si @User:Whatamidoing (WMF) @User:NRodriguez (WMF) @User:STei (WMF) @User:MusikAnimal (WMF) @[[User:Noé 1]] @User:HLHJ @User:Samwilson @User:Quiddity: I mean it should get deprecated on all WMF wikis, and deactivated on all WMF wikis soon after. Taylor 49 (talk) 15:46, 17 September 2023 (UTC)

As the Status Updates section makes clear, installations of Phonos on WMF wikis are in the inline audio player mode so ipa= is not available, and the Language team plans to expand the offering of open language services with Text-to-Speech, creating a stable technological foundation for projects such as the IPA Audio Renderer, which indicates it won't rely on Google when/if the IPA-to-audio generation becomes available. Nardog (talk) 15:49, 17 September 2023 (UTC)
These comments mostly make me want to just not work on MediaWiki. —TheDJ (talkcontribs) 17:02, 17 September 2023 (UTC)
Hi, apologies for the misunderstandings! As noted above, IPA rendering is still coming and without use of a proprietary API. It's worth mentioning however that all requests to Google were made on the backend, so there's no TOS for you to agree to, nor was anyone's data ever shared with Google. Even the backend request itself goes through an anonymized proxy.
We apologize if our status updates and the relevant Tech News announcement were unclear or misleading. Both link to mw:Help:Extension:Phonos, which we hope sufficiently describes how inline audio player works. I realize a lot of the other information on that page is written with the assumption IPA transcription works, but this is because mediawiki.org is intended for audiences in and outside Wikimedia, so while we don't have IPA rendering yet, third-party wikis can still enable it. For added clarity, I've added a note to the top of the page explaining the current situation at Wikimedia.
See the replies at phab:T346508 on why we are using MP3 – namely that it has wider support than other formats and is now non-proprietary.
I'm not sure why use of the Lua extensionTag is considered hacky. Phonos should be no different than any other extension-supplied tag such as <ref>...</ref>. I will note we originally had implemented Phonos as a parser function, but ran into issues like phab:T317112 that forced us to move it to a tag. In our case, we only work with unprocessed wikitext, so a tag makes more sense than a parser function, anyway.
We spent considerable time building Phonos, so I don't think it should be scrapped. It provides unique functionality even while only in inline audio player mode, and we're confident our friends on the Language team will deliver with an IPA renderer given their expertise in this area.
We appreciate your feedback and patience on this project. Warm regards, MusikAnimal (WMF) (talk) 18:23, 17 September 2023 (UTC)
I disagree with your message, @Taylor 49. Having an inline player that plays sounds directly on the page, without opening another file, is a great advancement, and will make the reader's experience way better. I agree that the documentation is complex and misleading, and that it doesn't do what was wished for (and that's a huge hole). Nevertheless, I hope it will do it in the future, and I hope that the future is near. Theklan (talk) 23:51, 17 September 2023 (UTC)
Indeed "having an inline player that plays sounds directly on the page, without opening another file" is a great benefit ... but this privilege existed already before Phonos. Taylor 49 (talk) 13:37, 18 September 2023 (UTC)
Yes, you can add a file and play it, but it will have quite a large play bar, which is not practical when adding it inline. Theklan (talk) 13:05, 19 September 2023 (UTC)
As above, the short answer is that the current Phonos tool is NOT an IPA engine, and you shouldn't try to use it for that purpose. That doesn't mean it is useless. — xaosflux Talk 23:58, 17 September 2023 (UTC)
Hello @Taylor 49,
Apologies for the confusion about Phonos. @MusikAnimal (WMF) has clarified most of the concerns you raised, and we would like to state again that what was recently enabled is the inline audio player mode, which makes it easy to play audio files on wikis without leaving the page, not IPA rendering.
You also pointed out that the privilege of playing audio without leaving the page existed before Phonos; for context, the main plan is for Phonos to provide audio for IPA. We wanted to allow for more granularity by allowing audio upload rather than generating it in order to accommodate different pronunciations and dialects.
So, Phonos can now render audio as an alternative option until we have an IPA engine, which the Language team will work on.
On your comment: "the documentation is incomprehensible, the announcements cross-posted to all wikis have faulty examples, it's obscure what the "PhonosInlineAudioPlayerMode" does or how to enable or disable it". We have made some changes to the page and will continue improving the documentation to eliminate any ambiguity. As for the examples shown in the announcement, they illustrate how the inline audio player is used, unless there is something we need to include that you can help us understand.
To conclude, thank you to everyone who contributed to this conversation, and I hope we have clarified things for you @Taylor 49. Please let us know if you still have any follow-up questions.
UOzurumba (WMF) (talk) 12:42, 26 September 2023 (UTC) On behalf of the CommTech team.
Thanks for the answer. It's the claim "only audio files will play" that is incomprehensible (together with the inverted logic: disabled -> more features). From what has been said here and elsewhere, it probably means:

For now the extension is in the "PhonosInlineAudioPlayerMode" on all WMF wikis, which means that the only feature available is inline playing of audio files (similar to what has already been possible for a long time), whereas feeding in IPA but no audio file is a bad idea, since it will admittedly show up, but not play, and instead show a red error and add a tracking category complaining about "rendering error".

Taylor 49 (talk) 14:48, 26 September 2023 (UTC)
Yes, @Taylor 49, the "PhonosInlineAudioPlayerMode" is enabled on all WMF wikis, and it only works with the file= and wikibase= parameters (feeding in IPA won’t work) as explained in this part of the Phonos help page. Thank you! UOzurumba (WMF) (talk) 19:18, 26 September 2023 (UTC)
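For editors who want to find affected pages before they land in the tracking category, a rough check is to look for phonos tags that carry neither file= nor wikibase=. The sketch below is an unofficial helper using simple regular expressions; it assumes attributes are written with double quotes and contain no ">" character, and the file name in the sample is made up.

```python
import re

# Matches an opening or self-closing <phonos ...> tag and captures its attributes.
PHONOS_TAG = re.compile(r"<phonos\b([^>]*?)/?>", re.IGNORECASE)
# Matches name="value" pairs inside the captured attribute string.
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def tags_without_audio_source(wikitext: str):
    """Return the attributes of each <phonos> tag lacking file= and wikibase=."""
    flagged = []
    for match in PHONOS_TAG.finditer(wikitext):
        attrs = {name.lower(): value for name, value in ATTR.findall(match.group(1))}
        # An empty value counts as missing, like text="" in the earlier example.
        if not attrs.get("file") and not attrs.get("wikibase"):
            flagged.append(attrs)
    return flagged


sample = (
    '<phonos ipa="x" lang="en-GB" /> '
    '<phonos file="Example.ogg" ipa="y" />'
)
print(tags_without_audio_source(sample))
```

Only the first tag in the sample is reported, since the second one names an audio file.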
I don’t think WMF’s approach will work. Tl;dr, I don’t think any existing TTS system understands enwiki’s idea of IPA.
I know neither espeak nor Google supports generic IPA; they’re all language-specific. I happened to have tested Phonos on the test wiki (not knowing it’s configured differently — IPA is enabled over there) and found that Phonos actually understands phonemic IPA, although it’s useless for my language since it’s one of the two languages where Google has explicitly said IPA is not supported.
If WMF is going to make something that supports generic, phonetic IPA transcriptions, I believe they’ll have to train their own language model; that will be a huge amount of work. And Mimic, not espeak, is probably better suited as a base. I’ve tested Mimic3 for my own use recently and its quality is comparable to what I’ve seen on macOS or on Android (although, of course, they don’t support generic IPA either because AFAIK nothing does). Al12si (talk) 01:46, 10 October 2023 (UTC)
I don't believe WMF is working on something that aims to support generic IPA. But if they are, that is indeed something I would also, and in the strongest possible terms, recommend against developing.
(Generic, language-neutral IPA is certainly not "enwiki's idea of IPA", if that's your insinuation. Narrowness in phonetic transcriptions is a vast spectrum; they can be as broad as phonemic transcriptions and as narrow as impressionistic ones. enwiki uses diaphonemic transcriptions for English and broad phonetic transcriptions for other languages, except in specialized discussions about the relevant phonetics or phonology, which all "conform fully to the principles of the IPA" as described in the Handbook. Building software that generates accurate audio from such varyingly narrow transcriptions is, as you point out, impossible, because the same transcription can represent completely different sounds depending on the intent of the transcriber.) Nardog (talk) 02:56, 10 October 2023 (UTC)