<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>tts &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/tts/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Thu, 29 Jan 2026 09:19:14 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>tts &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[1KB JS Numbers Station]]></title>
		<link>https://shkspr.mobi/blog/2025/07/1kb-js-numbers-station/</link>
					<comments>https://shkspr.mobi/blog/2025/07/1kb-js-numbers-station/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 20 Jul 2025 11:34:53 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[tts]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=62005</guid>

					<description><![CDATA[Code Golf is the art/science of creating wonderful little demos in an artificially constrained environment. This year the js1024 competition was looking for entries with the theme of &#34;Creepy&#34;.  I am not a serious bit-twiddler. I can&#039;t create JS shaders which produce intricate 3D worlds in a scrap of code. But I can use slightly obscure JavaScript APIs!  There&#039;s something deliciously creepy about…]]></description>
										<content:encoded><![CDATA[<p>Code Golf is the art/science of creating wonderful little demos in an artificially constrained environment. This year the <a href="https://js1024.fun/">js1024 competition</a> was looking for entries with the theme of "Creepy".</p>

<p>I am not a serious bit-twiddler. I can't create JS shaders which produce intricate 3D worlds in a scrap of code. But I <em>can</em> use slightly obscure JavaScript APIs!</p>

<p>There's something deliciously creepy about <a href="https://priyom.org/number-stations">Numbers Stations</a> - the weird radio frequencies which broadcast seemingly random numbers and words. Are they spies communicating? Commands for nuclear missiles? Long range radio propagation tests? Who knows!</p>

<p>So I decided to build one. <a href="https://js1024.fun/demos/2025/24/bar">Play with the demo</a>.</p>

<p>Obviously, even the <a href="https://shkspr.mobi/blog/2020/09/a-floppy-disk-mp3-player-using-a-raspberry-pi/">most extreme opus compression</a> can't fit much audio into 1KB. Luckily, JavaScript has you covered! Most modern browsers have a built-in Text-To-Speech (TTS) API.</p>

<p>Here's the most basic example:</p>

<pre><code class="language-js">m = new SpeechSynthesisUtterance;
m.text = "Hello";
speechSynthesis.speak(m);
</code></pre>

<p>Run that JS and your computer will speak to you!</p>

<p>In order to make it creepy, I played about with the rate (how fast or slow it speaks) and the pitch (how high or low).</p>

<pre><code class="language-js">m.rate=Math.random();
m.pitch=Math.random()*2;
</code></pre>

<p>It worked disturbingly well! High pitched drawls, rumbling gabbling, the languid cadence of a chattering friend. All rather creepy.</p>
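<p>If you want to tinker with the same effect, here is a rough sketch which keeps both values inside the ranges MDN documents for <code>SpeechSynthesisUtterance</code>: rate from 0.1 to 10, pitch from 0 to 2. The helper <code>randomVoiceSettings</code> is hypothetical. Capping the rate at 1 mirrors the <code>Math.random()</code> call above, while raising the floor to 0.1 is an assumption.</p>

<pre><code class="language-js">// Hypothetical helper: choose a random speaking rate and pitch.
// MDN documents rate as 0.1 to 10 and pitch as 0 to 2 (1 is the default for both).
// The 0.1 to 1 rate range is an assumption that favours slow, drawling speech.
function randomVoiceSettings(rng = Math.random, minRate = 0.1, maxRate = 1) {
  return {
    rate: minRate + rng() * (maxRate - minRate), // never drops below minRate
    pitch: rng() * 2                             // anywhere from 0 (low) up to 2 (high)
  };
}

// In a browser you might then apply it to an utterance:
// const m = new SpeechSynthesisUtterance("Hello");
// Object.assign(m, randomVoiceSettings());
// speechSynthesis.speak(m);
</code></pre>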

<p>But <em>what</em> could I make it say? Getting it to read out numbers is pretty easy - this will generate a random integer:</p>

<pre><code class="language-js">s = Math.ceil( Math.random()*1000 );
</code></pre>

<p>But a list of words would be tricky. There's not much space in 1,024 bytes for anything complex. The rules say I can't use any external resources; so are there any <em>internal</em> sources of words? Yes!</p>

<pre><code class="language-js">Object.getOwnPropertyNames( globalThis );
</code></pre>

<p>That gets all the properties of the global object which are available to the browser! Depending on your browser, that's over 1,000 words!</p>

<p>But there's a slight problem. Many of them are quite "computery" words like "ReferenceError", "URIError", "Float16Array". I wanted all the <em>single</em> words - that is, anything which only has one capital letter and that's at the start.</p>

<pre><code class="language-js">const l = (n) =&gt; {
    return ((n.match(/[A-Z]/g) || []).length === 1 &amp;&amp; (n.charAt(0).match(/[A-Z]/g) || []).length === 1);
};

//   Get a random result from the filter
s = Object.getOwnPropertyNames( globalThis ).filter( l ).sort( ()=&gt;.5-Math.random() )[0]
</code></pre>

<p>Rather pleasingly, that brings back creepy words like "Event", "Atomics", and "Geolocation".</p>
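<p>For anyone playing along, the same filter can be written as a single regular expression: a capital letter at the start, followed by no further capitals. This sketch also picks one entry by random index rather than sorting with a random comparator, which is shorter and avoids the bias of sort-based shuffling. The names <code>isSingleWord</code> and <code>randomWord</code> are hypothetical, and the list of names you get back will vary between browsers.</p>

<pre><code class="language-js">// True when a name has exactly one capital letter (A-Z) and it is the first character,
// e.g. "Event" or "Atomics", but not "URIError", "Float16Array", or "globalThis".
const isSingleWord = (n) => /^[A-Z][^A-Z]*$/.test(n);

// Pick one matching global property name at random.
function randomWord(names = Object.getOwnPropertyNames(globalThis)) {
  const words = names.filter(isSingleWord);
  return words[Math.floor(Math.random() * words.length)];
}
</code></pre>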

<p>Of course, Numbers Stations don't just broadcast in English.  The TTS system can vocalise in multiple languages.</p>

<pre><code class="language-js">//   Set the language to Russian
m.lang = "ru-RU";
</code></pre>

<p>OK, but where do we get all those language strings from? Again, they're built in and can be retrieved randomly.</p>

<pre><code class="language-js">var e = window.speechSynthesis.getVoices();
//   Each voice is an object, so take its language tag
m.lang = e[ (Math.random()*e.length) |0 ].lang;
</code></pre>
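<p>One caveat: in some browsers <code>getVoices()</code> returns an empty list until the voices have loaded, at which point a <code>voiceschanged</code> event fires. A slightly more defensive sketch, assuming you only need the language tag of a random voice, might look like this. The function <code>randomVoiceLang</code> and its <code>"en-GB"</code> fallback are hypothetical.</p>

<pre><code class="language-js">// Return the language tag (e.g. "ru-RU") of a randomly chosen voice,
// or an assumed fallback tag if the voice list is empty or missing.
function randomVoiceLang(voices, fallback = "en-GB") {
  if (!voices || voices.length === 0) return fallback;
  return voices[Math.floor(Math.random() * voices.length)].lang;
}

// In a browser, pick again once the voice list has loaded:
// speechSynthesis.addEventListener("voiceschanged", () => {
//   m.lang = randomVoiceLang(speechSynthesis.getVoices());
// });
</code></pre>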

<p>If you pass the TTS the number 555 and ask it to speak German, it will read out <i lang="de">fünfhundertfünfundfünfzig</i>.</p>

<p>And, if you tell the TTS to speak an English word like "Worker" in a foreign language, it will pronounce it with an accent.</p>

<p>Randomly altering the pitch, speed, and voice to read out numbers and dissociated words produces, I think, a rather creepy effect.</p>

<script>const l = (n) => {
    return ((n.match(/[A-Z]/g) || []).length === 1 && (n.charAt(0).match(/[A-Z]/g) || []).length === 1);
};
m = new SpeechSynthesisUtterance;

function g() {
    setInterval(() => {
        s = Object.getOwnPropertyNames(globalThis).filter(l).sort(() => .5 - Math.random())[0]
        if (Math.random() > .3) {
            s = Math.ceil(Math.random() * 1000);
        }
        var e = window.speechSynthesis.getVoices();
        m.rate = Math.random(), m.pitch = Math.random() * 2, m.text = s, m.lang = e[(Math.random() * e.length) | 0]["lang"];
        speechSynthesis.speak(m);
    }, 2501);
}</script>

<p>If you want to test it out, you can press this button. I find that it works best in browsers with a good TTS engine - let me know how it sounds on your machine.</p>

<p><button onclick="g()">🅝🅤🅜🅑🅔🅡🅢 🅢🅣🅐🅣🅘🅞🅝</button></p>

<p>With the remaining few bytes at my disposal, I produced a quick-and-dirty random pattern using Unicode drawing blocks. It isn't very sophisticated, but it does have a little random animation to it.</p>

<p>You can <a href="https://js1024.fun/demos/2025">play with all the js1024 entries</a> - I would be delighted if you voted <a href="https://js1024.fun/demos/2025/24/bar">for mine</a>.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=62005&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/07/1kb-js-numbers-station/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Unicode Roman Numerals and Screen Readers]]></title>
		<link>https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/</link>
					<comments>https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 15 Mar 2023 12:34:02 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[a11y]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[Latin]]></category>
		<category><![CDATA[romans]]></category>
		<category><![CDATA[tts]]></category>
		<category><![CDATA[unicode]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=45103</guid>

					<description><![CDATA[How would you read this sentence out aloud?  &#34;In Hamlet, Act Ⅳ, Scene Ⅸ...&#34;  Most people with a grasp of the interplay between English and Latin would say &#34;In Hamlet, Act four, scene nine&#34;.  And they&#039;d be right!  But screen-readers - computer programs which convert text into speech - often get this wrong.  Why? Well, because I didn&#039;t just type &#34;Uppercase Letter i, Uppercase Letter v&#34;. Instead, I u…]]></description>
										<content:encoded><![CDATA[<p>How would you read this sentence out aloud?</p>

<p>"In Hamlet, Act Ⅳ, Scene Ⅸ..."</p>

<p>Most people with a grasp of the interplay between English and Latin would say "In Hamlet, Act four, scene nine".  And they'd be right!  But screen-readers - computer programs which convert text into speech - often get this wrong.</p>

<p>Why? Well, because I didn't just type "Uppercase Letter i, Uppercase Letter v". Instead, I used the Unicode symbol for the Roman numeral 4 - <code>Ⅳ</code>.  And, it turns out, lots of screen-readers have a problem with those characters.</p>

<h2 id="dont-know-much-about-history"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#dont-know-much-about-history">Don't Know Much About History</a></h2>

<p>Unicode contains the Roman numerals from 1 to 10, plus the compound numerals 11 and 12, plus 50, 100, 500, and 1000 - in a variety of forms.</p>
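<p>As a sketch of how those characters are laid out: the uppercase numerals one to twelve sit consecutively from U+2160 in the Number Forms block, with the lowercase set starting at U+2170, so a single character can be generated from an integer. The helper <code>romanChar</code> is hypothetical.</p>

<pre><code class="language-js">// Return the single Unicode character for a Roman numeral from 1 to 12.
// Uppercase forms start at U+2160 (Ⅰ), lowercase forms at U+2170 (ⅰ).
// 50, 100, 500, and 1000 follow separately at U+216C to U+216F.
function romanChar(n, lowercase = false) {
  if (!Number.isInteger(n) || n < 1 || n > 12) {
    throw new RangeError("Only 1 to 12 have single-character forms");
  }
  return String.fromCodePoint((lowercase ? 0x2170 : 0x2160) + n - 1);
}
</code></pre>

<p>So <code>romanChar(4)</code> gives the "Ⅳ" used at the top of this post, which is one character rather than the two letters I and V.</p>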

<img src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Screenshot-2023-03-04-at-00-05-15-Numerals-in-Unicode-Wikipedia.png" alt="Screenshot of a Table of Roman numerals in Unicode." width="927" height="244" class="aligncenter size-full wp-image-45110">

<p>Why does Unicode contain these numbers which, to most people, are just squashed-together Latin letters?  As ever with Unicode, it is a mix of legacy and practicality.</p>

<p>The <a href="https://www.unicode.org/versions/Unicode6.0.0/ch15.pdf">Unicode standard says</a>:</p>

<blockquote><p><strong>Roman Numerals.</strong> For most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters. However, the uppercase and lowercase variants of the Roman numerals through 12, plus L, C, D, and M, have been encoded for compatibility with East Asian standards. Unlike sequences of Latin letters, these symbols remain upright in vertical layout. Additionally, in certain locales, compact date formats use Roman numerals for the month, but may expect the use of a single character.</p></blockquote>

<p>Far be it from me to disagree with the learned authors of the spec, but I think they may have erred slightly on this one.  While it may be <em>preferable</em> to re-use Latin letters, doing so creates ambiguity which can confuse a screen-reader.</p>

<h2 id="practical-examples"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#practical-examples">Practical Examples</a></h2>

<p>Let's write out the numbers using regular letters. Suppose you were talking about "Romeo and Juliet, Act III, Scene I".  Most screen readers will see the "III" and correctly speak aloud "Roman three" or similar. But when they get to the "I" it becomes ambiguous. Most will read out "Eye".</p>

<p>Screen-readers rarely look at the whole sentence for context. Which means they get confused. It's fairly obvious that XIV should be "fourteen" as there's no English word "xiv"<sup id="fnref:scrabble"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#fn:scrabble" class="footnote-ref" title="I'm sure there's some obscure Scrabble word, but we're talking everyday use here." role="doc-noteref">0</a></sup>. But what about "MIX" - is that 1009 or the word "mix"?</p>

<p>Anyone who has watched the BBC knows about their fondness for displaying, in Roman numerals, the year a programme was made. MCMXCVI is particularly challenging for a screen-reader!</p>
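<p>To see why context matters, here is a rough sketch of a strict Roman numeral parser. It accepts only well-formed numerals (using a common validation pattern for 1 to 3999) and returns <code>null</code> for anything else. The function name <code>parseRoman</code> is hypothetical.</p>

<pre><code class="language-js">const VALUES = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };
// Well-formed numerals from 1 to 3999, e.g. "XIV" or "MCMXCVI", but not "IIII" or "VX".
const STRICT = /^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/;

function parseRoman(text) {
  if (text === "" || !STRICT.test(text)) return null;
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    const value = VALUES[text[i]];
    const next = VALUES[text[i + 1]] ?? 0;
    // A smaller value before a larger one is subtracted (the I in IV), otherwise added.
    total += value < next ? -value : value;
  }
  return total;
}
</code></pre>

<p>With this sketch, <code>parseRoman("XIV")</code> gives 14 and <code>parseRoman("MCMXCVI")</code> gives 1996. But <code>parseRoman("MIX")</code> also happily returns 1009: a parser on its own cannot tell the number from the word.</p>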

<h2 id="testing-it"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#testing-it">Testing It</a></h2>

<p>I took the following sample sentence - using both letters and Roman numerals.</p>

<blockquote><p>Text. In Hamlet, Act I, Scene XI the year is MCMXCVI and they are watching Rocky V.</p>

<p>Roman. In Hamlet, Act Ⅰ, Scene Ⅺ the year is ⅯⅭⅯⅩⅭⅥ and they are watching Rocky Ⅴ.</p></blockquote>

<p>Here's how various services coped:</p>

<h3 id="amazon-polly"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#amazon-polly">Amazon Polly</a></h3>

<p>First, the good news. Amazon's Polly read the Roman numerals perfectly. It even pronounced <code>ⅯⅭⅯⅩⅭⅥ</code> as "nineteen ninety six".
</p><figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/polly-roman-test.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/polly-roman-test.mp3">Download this audio file</a>.</p>
	</audio>
</figure>
<p>But it gets rather confused by the ambiguous English text.</p>

<h3 id="microsoft-edge-read-aloud"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#microsoft-edge-read-aloud">Microsoft Edge Read Aloud</a></h3>

<p>I tried with <a href="https://pypi.org/project/edge-tts/">Microsoft Edge's Read Aloud TTS</a>.</p>

<figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/edge.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/edge.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<p>It makes a bit of a hash of the English and simply skips the Roman numerals.</p>

<h3 id="google-text-to-speech"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#google-text-to-speech">Google Text To Speech</a></h3>

<p>The same was true of <a href="https://cloud.google.com/text-to-speech/">Google's TTS products</a>.</p>

<figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/gtts.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/gtts.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<h3 id="espeak-ng"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#espeak-ng">Espeak NG</a></h3>

<p>The <a href="https://github.com/espeak-ng/espeak-ng">venerable Linux utility</a> came out with this. 
</p><figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/espeak.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/espeak.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<p>It gets the "Capital i" incorrect, and reads the Roman numerals as their Unicode code points.</p>

<h3 id="jaws"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#jaws">JAWS</a></h3>

<p>My good friend <a href="https://tink.uk/about-leonie/">Léonie Watson</a> who <a href="https://tink.uk/">writes extensively about accessibility</a> was kind enough to record some other samples for me.</p>

<p>Here is JAWS using Vocalizer Expressive:
</p><figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Jaws_Vocalizer-Expressive-Kate-TTS.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Jaws_Vocalizer-Expressive-Kate-TTS.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<p>And JAWS using Eloquence:
</p><figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Jaws_Eloquence-TTS-Reed.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Jaws_Eloquence-TTS-Reed.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<h3 id="nvda"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#nvda">NVDA</a></h3>

<p>Léonie also provided a recording of NVDA using Microsoft OneCore:
</p><figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/NVDA_Microsoft-One-Core-TTS-Michael.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/NVDA_Microsoft-One-Core-TTS-Michael.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<h3 id="narrator"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#narrator">Narrator</a></h3>

<p>And here's Narrator making a right mess of it.
</p><figure class="audio">
	<figcaption>🔊</figcaption>
	
	<audio controls="" loading="lazy" src="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Narrator_Natural-Voices-TTS-Guy.mp3">
		<p>💾 <a href="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Narrator_Natural-Voices-TTS-Guy.mp3">Download this audio file</a>.</p>
	</audio>
</figure>

<h3 id="others"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#others">Others</a></h3>

<p>If you know of any other screen-readers, or text-to-speech engines which can cope with this, please let me know!</p>

<h2 id="fixing-it"><a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#fixing-it">Fixing it</a></h2>

<p>On Linux, I <a href="https://github.com/espeak-ng/espeak-ng/pull/1672">raised a Pull Request to fix espeak-ng</a>.</p>

<p>The rest of the services don't seem to have a way to easily report bugs to them.  If you know a way to raise issues with these screen readers - please do so!</p>
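<p>Until the engines catch up, one possible workaround is to pre-process text before it reaches a TTS engine. This is a rough sketch, not how any of the products above work. It relies on NFKC normalisation, which maps each Unicode Roman numeral to its plain Latin letters (Ⅳ becomes "IV", Ⅻ becomes "XII"), and then converts the result to digits. Because the input characters are explicitly numerals, there is no word-versus-number guessing involved. The function name <code>speakableRoman</code> is hypothetical.</p>

<pre><code class="language-js">const VALUES = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };

// Replace each run of Unicode Roman numeral characters (U+2160 to U+217F) with digits.
// Assumes each run represents a single number, e.g. "ⅯⅭⅯⅩⅭⅥ".
function speakableRoman(text) {
  return text.replace(/[\u2160-\u217F]+/g, (run) => {
    // NFKC turns "Ⅳ" into "IV" and "ⅻ" into "xii"; upper-case so one table covers both.
    const letters = run.normalize("NFKC").toUpperCase();
    let total = 0;
    for (let i = 0; i < letters.length; i++) {
      const value = VALUES[letters[i]];
      const next = VALUES[letters[i + 1]] ?? 0;
      // Subtract a smaller value that precedes a larger one, otherwise add it.
      total += value < next ? -value : value;
    }
    return String(total);
  });
}
</code></pre>

<p>Run over the sample sentence, this produces "In Hamlet, Act 1, Scene 11 the year is 1996 and they are watching Rocky 5." One trade-off: regnal numbers such as "Henry Ⅷ" would become "Henry 8" rather than "Henry the Eighth".</p>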

<div id="footnotes" role="doc-endnotes">
<hr>
<ol start="0">

<li id="fn:scrabble">
<p>I'm sure there's some obscure Scrabble word, but we're talking everyday use here.&nbsp;<a href="https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/#fnref:scrabble" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

</ol>
</div>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=45103&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2023/03/unicode-roman-numerals-and-screen-readers/feed/</wfw:commentRss>
			<slash:comments>13</slash:comments>
		
		<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/polly-roman-test.mp3" length="89325" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/edge.mp3" length="84672" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/gtts.mp3" length="66912" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/espeak.mp3" length="86536" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Jaws_Vocalizer-Expressive-Kate-TTS.mp3" length="562364" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Jaws_Eloquence-TTS-Reed.mp3" length="468950" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/NVDA_Microsoft-One-Core-TTS-Michael.mp3" length="314723" type="audio/mpeg" />
<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2023/03/Narrator_Natural-Voices-TTS-Guy.mp3" length="273345" type="audio/mpeg" />

			</item>
		<item>
		<title><![CDATA[Blog To Speech]]></title>
		<link>https://shkspr.mobi/blog/2022/10/blog-to-speech/</link>
					<comments>https://shkspr.mobi/blog/2022/10/blog-to-speech/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 02 Oct 2022 11:34:41 +0000</pubDate>
				<category><![CDATA[About A Minute]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[podcast]]></category>
		<category><![CDATA[tts]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=43633</guid>

					<description><![CDATA[Listen to this blog post in your browser:            Download MP3 audio.     Powered by Amazon Polly.   I&#039;ve noticed an interesting trend on some of the blogs I follow. More of them - though by no means the majority - are including audio versions of the content.  They usually look something like this:    or    The ones which have this are mostly using commercial Text-To-Speech (TTS) engines.…]]></description>
										<content:encoded><![CDATA[<details open="">
<summary>Listen to this blog post in your browser:</summary>
<audio controls="">
  <source src="https://shkspr.mobi/blog/wp-content/uploads/2022/10/Blog-To-Speech.mp3" type="audio/mpeg">
  <p>
    Download <a href="https://shkspr.mobi/blog/wp-content/uploads/2022/10/Blog-To-Speech.mp3">MP3</a> audio.
  </p>
</audio>
<small>Powered by <a href="https://aws.amazon.com/polly/">Amazon Polly</a>.</small>
</details>

<p>I've noticed an interesting trend on some of the blogs I follow. More of them - though by no means the majority - are including audio versions of the content.</p>

<p>They usually look something like this:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2022/10/pressplay.jpeg" alt="Screenshot of a blog post. The header says &quot;Press play to listen to this article.&quot;" width="1522" height="604" class="aligncenter size-full wp-image-43635">

<p>or</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2022/10/blog1.png" alt="A blog post with an embedded audio widget." width="1592" height="614" class="aligncenter size-full wp-image-43636">

<p>The ones which have this are mostly using commercial Text-To-Speech (TTS) engines. Although a few of the (perhaps wealthier?) bloggers have hired people to record audio versions of their <del>posts</del> <em>newsletters</em>:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2022/10/Screenshot-2022-10-01-at-21-39-51-ID-No-Mercy-_-No-Malice.png" alt="Screenshot of a blogpost saying that the audio version has been recorded by &quot;George Hahn&quot;." width="771" height="286" class="aligncenter size-full wp-image-43634">

<p>I find this curious. I don't think it is bad or wrong or unbloggerly. Just a bit odd. I'm from the generation who <em>hated</em> phone calls and ruthlessly mocked voicemail. And now I see <em>the youth</em> leaving each other voicenotes and I feel bemused.</p>

<p><a href="https://shkspr.mobi/blog/2022/01/is-it-faster-to-read-or-to-listen/">Reading is faster than listening</a>. For me, at least. But reading requires focus. It's hard to cook dinner while reading text. But it's pretty easy to do most things while a podcast prattles on in the background.</p>

<p>Obviously people with visual impairments use TTS systems. And they often have those tools built into their computer or browser. But most people with adequate sight don't know how to use their machine's accessibility capabilities. So perhaps having an easily-findable MP3 of the article is sensible?</p>

<p>I often edit old blog posts. Sometimes to merely change a typo, other times to cover up evidence of my muddled thinking. In the land of traditional audio, that's a problem. It's tricky to re-record something and edit it together seamlessly. But, with TTS, it is the work of seconds.</p>

<p>Anyway, that's a long, rambling way of saying that I'm experimenting with adding an audio version of my posts. If people like it, I'll start adding it to all of them - and backfilling the old posts.</p>

<p>If you think this is a good idea, or a terrible waste of time, be a sweetheart and drop a comment in the box, yeah?</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=43633&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2022/10/blog-to-speech/feed/</wfw:commentRss>
			<slash:comments>16</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Is it faster to read or to listen?]]></title>
		<link>https://shkspr.mobi/blog/2022/01/is-it-faster-to-read-or-to-listen/</link>
					<comments>https://shkspr.mobi/blog/2022/01/is-it-faster-to-read-or-to-listen/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 29 Jan 2022 12:34:39 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[MSc]]></category>
		<category><![CDATA[reading]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[tts]]></category>
		<category><![CDATA[voice]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=41715</guid>

					<description><![CDATA[Fourteen years ago, I blogged about the future of voice. In the post, I asked these two questions - which I&#039;d nicked from someone else:   Are you faster at speaking or typing? Are you faster at reading or listening?   Lots of us now use Siri, Alexa, Bixby, and the like because it is quicker to speak than type. For long-form wordsmithing - it&#039;s still probably easier to type-and-edit than it is to…]]></description>
										<content:encoded><![CDATA[<p>Fourteen years ago, I blogged about <a href="https://shkspr.mobi/blog/2008/05/the-future-of-voice/">the future of voice</a>. In the post, I asked these two questions - which I'd nicked from someone else:</p>

<ol>
<li>Are you faster at speaking or typing?</li>
<li>Are you faster at reading or listening?</li>
</ol>

<p>Lots of us now use Siri, Alexa, Bixby, and the like because it is quicker to speak than type. For long-form wordsmithing - it's still probably easier to type-and-edit than it is to speak-then-edit. And the way humans speak is markedly different from how they write.</p>

<p>But the bottleneck has always been that <em>listening</em> to speech is slower than <em>reading</em> text.</p>

<p>The <a href="https://www.bps.org.uk/research-digest/most-comprehensive-review-date-finds-average-persons-reading-speed-slower">average reading speed is around 238 words per minute</a>. Obviously there are a lot of caveats around the age of the reader, the difficulty of the material, whether one is reading for leisure or work. But it will do as a comparator.</p>

<p>The <a href="https://www.sciencedirect.com/science/article/pii/S0885230819300518">average speaking speed is around 150 words per minute</a>. Again, that depends on the <a href="https://www.sciencedirect.com/science/article/pii/S0892199706000889">age of the speaker</a>, urgency of their talk, <a href="https://www.sciencedirect.com/science/article/pii/S0095447019300543">familiarity with the language</a>, and so on.</p>
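<p>To put those averages into concrete terms, here is a quick back-of-the-envelope sketch for a hypothetical 2,000 word article, using the 238 and 150 WPM figures above.</p>

<pre><code class="language-js">const READING_WPM = 238;  // average reading speed cited above
const SPEAKING_WPM = 150; // average speaking speed cited above

// Minutes needed to get through a text at a given words-per-minute rate.
const minutesFor = (words, wpm) => words / wpm;

const words = 2000; // hypothetical article length
console.log(`Reading:   ${minutesFor(words, READING_WPM).toFixed(1)} minutes`);
console.log(`Listening: ${minutesFor(words, SPEAKING_WPM).toFixed(1)} minutes`);
</code></pre>

<p>That works out at roughly 8.4 minutes of reading against 13.3 minutes of listening.</p>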

<p>Therefore it is faster to read academic papers than to listen to academic lectures. Case closed!</p>

<p>Except…</p>

<p>There's a fascinating new paper out - <q><cite itemprop="headline"><a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/acp.3899">Learning in double time: The effect of lecture video speed on immediate and delayed comprehension</a></cite></q>.</p>

<p>Here's the quote I found most interesting - with emphasis added:</p>

<blockquote><p>Collectively, the present experiments indicate that increased video speed (up to 2x) does not negatively impact learning outcomes and watching at faster speeds can be a more efficient use of study time. 

</p><p>Thus, as long as to-be-remembered information can be effectively perceived and encoded, <strong>learning outcomes may not be affected by playback speed</strong>. 

</p><p>However, previous work has indicated that speech comprehension begins to decline at around 275 words per minute (Foulke &amp; Sticht,&nbsp;<span><a href="https://onlinelibrary.wiley.com/doi/full/10.1002/acp.3899#acp3899-bib-0019" id="#acp3899-bib-0019R" class="bibLink tab-link" data-tab="pane-pcw-references">1969</a></span>; see also Goldhaber,&nbsp;<span><a href="https://onlinelibrary.wiley.com/doi/full/10.1002/acp.3899#acp3899-bib-0021" id="#acp3899-bib-0021R" class="bibLink tab-link" data-tab="pane-pcw-references">1970</a></span>; Pastore &amp; Ritzhaupt,&nbsp;<span><a href="https://onlinelibrary.wiley.com/doi/full/10.1002/acp.3899#acp3899-bib-0042" id="#acp3899-bib-0042R" class="bibLink tab-link" data-tab="pane-pcw-references">2015</a></span>; Vemuri et al.,&nbsp;<span><a href="https://onlinelibrary.wiley.com/doi/full/10.1002/acp.3899#acp3899-bib-0055" id="#acp3899-bib-0055R" class="bibLink tab-link" data-tab="pane-pcw-references">2004</a></span>) and the videos in the current study exceeded this threshold when played at 2x speed. 

</p><p>Although the elevated speech rates at 2x speed may initially be less comprehensible to students, researchers have been able to train participants to <strong>understand speech at rates up to 475 WPM</strong> (Orr et al.,&nbsp;<span><a href="https://onlinelibrary.wiley.com/doi/full/10.1002/acp.3899#acp3899-bib-0038" id="#acp3899-bib-0038R" class="bibLink tab-link" data-tab="pane-pcw-references">1965</a></span>). 

</p><p>Therefore, with practice, higher rates of speech may not be completely incomprehensible and since <strong>85% of students reported watching lecture videos at quicker than normal speeds</strong> (see Figure&nbsp;<a href="https://onlinelibrary.wiley.com/doi/full/10.1002/acp.3899#acp3899-fig-0003">3a</a>), they may be better able to process the material as a result of experience.</p></blockquote>

<p>I guess this shouldn't come as a surprise to me. I tend to watch my MSc lectures at 1.75x with subtitles - and have been doing the same with podcasts and tutorial videos for years. Looks like I am in the majority.</p>

<p>If the average person speaks at ~150 Words Per Minute, increasing playback speed to 1.5x gives a listening rate of ~225 WPM. That's about the same as reading speed.</p>

<p>Going to 475 WPM means listening at just over 3x normal speed.</p>

<p>My mate Léonie Watson is blind and has written extensively about <a href="https://tink.uk/notes-on-synthetic-speech/">the use of text-to-speech technology</a>.  Because she listens to a synthetic voice, with predictable and consistent pronunciation, she's able to listen at about <strong>520 WPM</strong>! That's 3.5x faster than the speech of a biological human.</p>
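<p>The playback multipliers in the last few paragraphs all come from the same division: the target rate over the normal speaking rate. A quick sketch, again assuming a baseline of 150 WPM:</p>

<pre><code class="language-js">const SPEAKING_WPM = 150; // assumed baseline speaking speed

// Playback multiplier needed to hear speech at a target words-per-minute rate.
const speedFor = (targetWpm, baseWpm = SPEAKING_WPM) => targetWpm / baseWpm;

const targets = [["Average reading", 238], ["Trained listening", 475], ["Léonie's synthetic speech", 520]];
for (const [label, wpm] of targets) {
  console.log(`${label}: ${wpm} WPM needs ${speedFor(wpm).toFixed(2)}x`);
}
</code></pre>

<p>That gives about 1.59x to match average reading speed, 3.17x for 475 WPM, and 3.47x for 520 WPM.</p>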

<p>I'm not suggesting that you can speed-listen your way through any complicated topic and retain perfect understanding of subject and nuance. But it is becoming clear that <em>synchronous</em> teaching has limitations when it comes to efficiently teaching people. There's no substitute for being able to stop an expert mid-lecture and say "sorry Prof, I don't get that - could you please help me understand?"  But the reality is, most people never stick their hand up in class. So listening to lectures on playback - at double speed - is simply a better "user experience" for the student.</p>

<p>Learning, of course, isn't just listening to people drone on in front of a blackboard. The student still needs to do the exercises, <a href="https://shkspr.mobi/blog/2022/01/an-algorithm-to-write-an-assignment/">write their essays</a>, consolidate their knowledge, reflect on what they've learned, and so on.</p>

<p>But the ability to "speed" your way through a (well edited and professionally recorded) lecture is something to be welcomed. It gives students more time to spend on their studies with, apparently, no ill effects.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=41715&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2022/01/is-it-faster-to-read-or-to-listen/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[TTSF (Text To Shipping Forecast)]]></title>
		<link>https://shkspr.mobi/blog/2021/10/ttsf-text-to-shipping-forecast/</link>
					<comments>https://shkspr.mobi/blog/2021/10/ttsf-text-to-shipping-forecast/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 13 Oct 2021 11:34:05 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[BBC]]></category>
		<category><![CDATA[robot]]></category>
		<category><![CDATA[tts]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=40619</guid>

					<description><![CDATA[The BBC Shipping Forecast is one of those strange bits of national tradition which, somehow, bridges the gap between infrastructure and folklore.  You can listen to the latest forecast on the BBC - read by professional newscasters.  But what if we wanted a robot to read it? If our speaker is sick, bored, or too expensive - how would we automate the audio version of the Shipping Forecast? …]]></description>
										<content:encoded><![CDATA[<p>The BBC Shipping Forecast is one of those strange bits of national tradition which, somehow, bridges the gap between infrastructure and folklore.</p>

<p>You can <a href="https://www.bbc.co.uk/programmes/b006qfvv">listen to the latest forecast on the BBC</a> - read by professional newscasters.</p>

<p>But what if we wanted a robot to read it? If our speaker is sick, bored, or too expensive - how would we automate the audio version of the Shipping Forecast?</p>

<p>The <a href="https://www.bbc.co.uk/weather/coast-and-sea/shipping-forecast">BBC publishes the general forecast</a> - but it's important to note that this is <em>not</em> what is read out on air.  Instead, they use <a href="https://www.metoffice.gov.uk/weather/specialist-forecasts/coast-and-sea/print/shipping-forecast">this compressed version published by the Met Office</a>.</p>

<p>The Met Office's version doesn't have an API - or any other way to get structured information out of it - but the HTML is simple enough that the data is easy to extract.</p>

<p>Once extracted, the text can be passed to a TTS (Text To Speech) service like Amazon Polly.</p>

<p>Here are the (quick and dirty) results:</p>

<h2 id="female"><a href="https://shkspr.mobi/blog/2021/10/ttsf-text-to-shipping-forecast/#female">Female</a></h2>

<p></p><div style="width: 320px;" class="wp-video"><video class="wp-video-shortcode" id="video-40619-3" width="320" height="320" preload="metadata" controls="controls"><source type="video/mp4" src="https://shkspr.mobi/blog/wp-content/uploads/2021/10/sf2.mp4?_=3"><a href="https://shkspr.mobi/blog/wp-content/uploads/2021/10/sf2.mp4">https://shkspr.mobi/blog/wp-content/uploads/2021/10/sf2.mp4</a></video></div><p></p>

<h2 id="male"><a href="https://shkspr.mobi/blog/2021/10/ttsf-text-to-shipping-forecast/#male">Male</a></h2>

<p></p><div style="width: 320px;" class="wp-video"><video class="wp-video-shortcode" id="video-40619-4" width="320" height="320" preload="metadata" controls="controls"><source type="video/mp4" src="https://shkspr.mobi/blog/wp-content/uploads/2021/10/sf1.mp4?_=4"><a href="https://shkspr.mobi/blog/wp-content/uploads/2021/10/sf1.mp4">https://shkspr.mobi/blog/wp-content/uploads/2021/10/sf1.mp4</a></video></div><p></p>

<h2 id="thoughts"><a href="https://shkspr.mobi/blog/2021/10/ttsf-text-to-shipping-forecast/#thoughts">Thoughts</a></h2>

<p>I've previously experimented with <a href="https://shkspr.mobi/blog/2021/07/synthetic-poetry/">Synthetic Poetry</a>. Robots aren't <em>great</em> at reading out verse - they lack emphasis and emotion. But something like the Shipping Forecast is perfect for them. It requires a calm, even tone. No particular need for words or phrases to be stressed. Each syllable needs to be clearly enunciated. When dealing with life-and-death matters, there's no room for error.</p>

<p>Text to speech is - for some very specific use-cases - indistinguishable from organic speech. That said, amusingly, Amazon's system couldn't correctly pronounce "Utsire" - so that needed a little manual intervention!</p>
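<p>One way to do that sort of manual fix is SSML's <code>sub</code> element. Polly reads out the <code>alias</code> attribute instead of the written word. Below is a minimal sketch using Python's standard-library ElementTree. The <code>forecast_to_ssml</code> helper and the "oot-seer-uh" respelling are hypothetical illustrations, not what was actually used for the recordings above:</p>

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical respelling; not the one used for the recordings above.
RESPELLINGS = {"Utsire": "oot-seer-uh"}


def forecast_to_ssml(text: str) -> str:
    """Wrap forecast text in an SSML speak element, replacing tricky place names
    with sub elements whose alias attribute holds a spoken respelling."""
    speak = ET.Element("speak")
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, RESPELLINGS)) + r")\b")
    # With one capture group, re.split alternates: plain text, name, plain text, ...
    parts = pattern.split(text)
    speak.text = parts[0]
    for name, following in zip(parts[1::2], parts[2::2]):
        sub = ET.SubElement(speak, "sub", alias=RESPELLINGS[name])
        sub.text = name       # the written form; the alias is what gets spoken
        sub.tail = following  # text after the name lives in the element's tail
    # ElementTree escapes any stray ampersands or angle brackets for us
    return ET.tostring(speak, encoding="unicode")


print(forecast_to_ssml("North Utsire, South Utsire: southwesterly 5 to 7."))
```

<p>The resulting string could then be sent to Polly with <code>TextType="ssml"</code>, for example via boto3's <code>synthesize_speech</code>. Polly's supported-tags page lists which SSML features each voice type honours. A <code>phoneme</code> element with IPA would be the more precise alternative to a respelling.</p>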
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=40619&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/10/ttsf-text-to-shipping-forecast/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Synthetic Poetry]]></title>
		<link>https://shkspr.mobi/blog/2021/07/synthetic-poetry/</link>
					<comments>https://shkspr.mobi/blog/2021/07/synthetic-poetry/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 21 Jul 2021 11:48:51 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[tts]]></category>
		<category><![CDATA[turing]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=39605</guid>

					<description><![CDATA[I&#039;ve been experimenting with Amazon&#039;s Polly service. It&#039;s their fancy text-to-sort-of-human-style-speech system.  Think &#34;Alexa&#34; but with a variety of voices, genders, and accents.  Here&#039;s &#34;Brian&#34; - their English, male, received pronunciation voice - reading John Betjeman&#039;s poem &#34;Slough&#34;:  https://shkspr.mobi/blog/wp-content/uploads/2021/07/slough.mp4  The pronunciation of all the words is…]]></description>
										<content:encoded><![CDATA[<p>I've been experimenting with <a href="https://aws.amazon.com/polly/">Amazon's Polly service</a>. It's their fancy text-to-sort-of-human-style-speech system.  Think "Alexa" but with a variety of voices, genders, and accents.</p>

<p>Here's "Brian" - their English, male, received pronunciation voice - reading John Betjeman's poem "<a href="https://en.wikipedia.org/wiki/Slough_(poem)">Slough</a>":</p>

<p></p><div style="width: 324px;" class="wp-video"><video class="wp-video-shortcode" id="video-39605-6" width="324" height="360" preload="metadata" controls="controls"><source type="video/mp4" src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/slough.mp4?_=6"><a href="https://shkspr.mobi/blog/wp-content/uploads/2021/07/slough.mp4">https://shkspr.mobi/blog/wp-content/uploads/2021/07/slough.mp4</a></video></div><p></p>

<p>The pronunciation of all the words is incredibly lifelike. If you heard it on the radio, it might sound like a half-familiar BBC presenter. It has a calm, even tone which suits the poem splendidly.</p>

<p>The rhythm is also spot on. That's mostly a function of the short lines and helpful punctuation the poem contains. Much like iambic pentameter, or a limerick, the syllables lend themselves to a specific and identifiable cadence.</p>

<p>But the emphasis is all wrong. The poem just... ends. There's no sense of finality in the tone.  You'd expect a competent reader to recognise "tinned <em>minds</em>" as worth stressing.  Polly does have some capability to mark specific words for emphasis, but it's all very manual.</p>
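<p>To show how manual that is, here's a minimal sketch that wraps a single word in SSML's <code>emphasis</code> element, again using Python's standard-library ElementTree. The <code>emphasise</code> helper is hypothetical. SSML defines levels such as "strong", "moderate" and "reduced", and not every Polly voice honours every tag. Someone still has to decide by hand which word gets stressed:</p>

```python
import xml.etree.ElementTree as ET


def emphasise(line: str, word: str, level: str = "strong") -> str:
    """Return SSML speaking `line`, with the first occurrence of the substring
    `word` wrapped in an emphasis element at the given level."""
    speak = ET.Element("speak")
    before, found, after = line.partition(word)
    if not found:
        # Substring not present: speak the line unchanged
        speak.text = line
        return ET.tostring(speak, encoding="unicode")
    speak.text = before
    em = ET.SubElement(speak, "emphasis", level=level)
    em.text = found
    em.tail = after
    return ET.tostring(speak, encoding="unicode")


print(emphasise("tinned minds", "minds"))
```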

<p>There's no synthetic emotion. Do you feel the rage, desperation, sadness, hopelessness of the poem? While <a href="https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html">Polly has some SSML (Speech Synthesis Markup Language) support</a> - the range of emotions it can express is <a href="https://developer.amazon.com/en-US/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html#amazon-emotion">severely limited</a>. And, again, emotion must be applied manually.</p>

<h2 id="i-used-to-be-an-adventurer-like-you-but-then-i-took-an-arrow-in-the-knee"><a href="https://shkspr.mobi/blog/2021/07/synthetic-poetry/#i-used-to-be-an-adventurer-like-you-but-then-i-took-an-arrow-in-the-knee">"I used to be an adventurer like you, but then I took an arrow in the knee!"</a></h2>

<p>One of the reasons <a href="https://knowyourmeme.com/memes/i-took-an-arrow-in-the-knee">stock phrases</a> pop up so often in video games is that it is expensive to write and record thousands of different lines of dialogue.</p>

<p>We're <em>almost</em> at a stage where a computer can procedurally generate lines for background characters to speak, and then "record" an audio version in an array of styles. No more expensive voice actors, no more memetic references for in-group homophily. Each player of a game will have a completely different dialogue experience.</p>

<p>But the bit that we're <em>still</em> missing is the automation of emphasis and emotion and comic timing and understatement and... all the things which trained actors spend years learning how to do successfully.</p>

<p>The film critic Roger Ebert lost his voice after cancer surgery in 2006. In 2011, he proposed the following <a href="https://bits.blogs.nytimes.com/2011/03/07/roger-ebert-tests-his-vocal-cords-and-comedic-delivery/?src=me&amp;_r=0">"Ebert Test"</a> for synthetic voices:</p>

<blockquote><p>If the computer can successfully tell a joke, and do the timing and delivery, as well as <a href="https://www.youtube.com/watch?v=y-LD9Xgqf6w">Henny Youngman</a>, then that’s the voice I want.
</p></blockquote>

<p>We're <em>so</em> close, I can taste it. The Turing Test for realistic voices is whether they can move an audience to tears with poetry.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=39605&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/07/synthetic-poetry/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2021/07/slough.mp4" length="1586486" type="video/mp4" />

			</item>
	</channel>
</rss>
