Better learning through spoken words while reading subtitles

·@nutela·
# Making #BeyondBitcoin more accessible for newbies
And facilitate accelerated learning.

This one is for you @officialfuzzy. I consider this to be my contribution to the Whaletank which aired yesterday, Friday 18th August 2017. 

https://steemitimages.com/DQmTkMokSS17Wo17Yj36ZmwpfzCdqmqZEbcjBgsn4eTY3dj/U5dr9xK4wQ3BMFe4MubJguoJvhs1sYy_1680x8400.jpeg

# Introduction
I told @officialfuzzy that, as someone who is quite new here, I found it hard to keep up. And I consider myself quite a *pro*: I've developed for iOS, and I know my `bash`-fu and those crazy regular expressions. I've done graphics and User eXperience, which I consider myself maybe best at. I've composed and produced music. I've seen weird systems in my job on Air Traffic Control software. I loved BeOS, but I used Windows and Linux for work. I used fantastic window managers while having hefty debates. Focus Follows Mouse! I've seen a lot! But since 2009 I've been on Mac OS, and that's where I've had the most fun and got the most work done.

So, having seen so many systems and still not quite being able to figure out the specifics of the #BeyondBitcoin show, I figured it would be a good idea to transcribe what was said and possibly write a summary of the talks.
 
>Yeah but the Whaletank turned out to be almost the *whole* day f***!

# Cue the automation, YouTube vs. Watson vs. Google Docs
I've been experimenting with some **Speech to Text tools** today. Right now I'm using IBM's Watson. Yes, this text was spoken in. I've also used Google Docs Voice input. Google doesn't produce nice sentences the way Watson does, however. You can try it live in Chrome.

### Watson
![WatsonText.png](https://steemitimages.com/DQmNQC1sAYaL483cePGPX6A8Huasx6Tx1Z8AFWVopKjLJGh/WatsonText.png)


>In the backend, YouTube’s sound captioning system is based on a Deep Neural Network model the team trained on a set of weakly labeled data. Whenever a new video is now uploaded to YouTube, the new system runs and tries to identify these sounds. For those of you who want to know more about how the team achieved this (and how it used a modified Viterbi algorithm), Google’s own blog post provides more details.

--- Techcrunch

>Memo to Google and YouTube — Don’t rest on your laurels just yet
>* only has a single speaker, who speaks clearly and in a slow, consistent, monotonic and deliberate manner
>* has good quality audio and it doesn’t have any background noise or sound effects and / or music
>
>...
>
>Most YouTube videos do not have these elements.
>Rather they tend to have multiple speakers, rapid fire and often unintelligible dialogue, sound effects and background music and the list goes on and on.
>In these circumstances, Google and YouTube’s auto-generated craptions will simply not be able to get anywhere near the magic 95% level that we have demonstrated above.

--- Medium

# YouTube - editing while listening
But the best and easiest by far is maybe YouTube's _automatic captioning system_. It's quite easy to upload the audio from the Whaletank, which is hosted on *SoundCloud*, but you can also make your own recording in *Mumble*. Mumble even lets you record each speaker individually, I guess so you can match up *volume* levels. This is called multitrack or multichannel in *Mumble* speak.

![MumbleMultitrack.png](https://steemitimages.com/DQmQFghVTaTeZoPZNkWDK9m8xqKk6tGfrdTA343BJgzbCbD/MumbleMultitrack.png)

YouTube's captions are also very easy to edit and the interface is really well designed. The subtitles / captions can be edited *while* the video is playing back. But make sure you are listening, not reading! Sometimes the visual system takes precedence over the aural. Then you think you are hearing what you are actually reading. Not good! So yeah, it's nice that you can type and *play* along. And I got a meaningful piece of text out of it.
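Once the captions are edited, they can be turned into a plain transcript for a summary. Here is a minimal sketch of that step, assuming the captions were exported in the common SubRip (`.srt`) format. The function name `srt_to_text` is just something I made up for illustration, not part of any tool mentioned here.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timings from SRT captions, keep the spoken text."""
    spoken = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # The first line of a cue is its counter, the next one its timing.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        spoken.extend(line for line in lines if line)
    return " ".join(spoken)


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\nWelcome to the\nWhaletank\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\nLet's get started\n"
    )
    print(srt_to_text(sample))
```

Only the counter and timing at the top of each cue are removed, so a caption line that happens to be a number (a year, for example) stays in the transcript.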

![DontPauseWhileEditingCaptions.png](https://steemitimages.com/DQmQ4wP2sumuixefPdufu1QaHxh4CWN9yKeTc7t2TSn57A9/DontPauseWhileEditingCaptions.png)

# The weird way to the finish-line
Before I settled on YouTube I tried Google Docs Voice input. I wanted to test the speech-to-text quality. But for that I needed to get fuzzy's *lovely* voice to be heard by Google. This wasn't a real success, because playing audio back over the speakers and into the mic is not handy. Plus, any noise gets transcribed too.

>UPDATE: It seems Google now requires you to be using the Chrome browser (on PC or Mac) in order to enable Google Now. If you're having any issues, make sure you're using Chrome! 

# Routing audio and much cursing
The best thing you can do if you have a *pre-recorded* text is to *re-route* the audio so that Google can hear it. This means not going through your speakers and back into your microphone. Speech recognition will be *much* better that way. 

But getting software to work can be *painful*. I ended up cracking Loopback because SoundFlower was buggy and I couldn't get a new version to compile. Also, Xcode takes up *so* much disk space... So, worried that I might infect my machine with cracks and keygens, I'm trying to find alternatives to Sandboxie, which I highly recommend if you do such experiments on Windows.

##### Notice the *external* microphone from my in-ear headphones
![MacOSSoundPreferences.png](https://steemitimages.com/DQmNjriDJfKKhmi1uELVXR2u6tDiWQkw2AxxNxegY4rTREU/MacOSSoundPreferences.png)

##### Loopback to reroute pre-recorded audio to a *virtual* microphone input
![Loopback.png](https://steemitimages.com/DQmYU7P2Yc8334XNhNYrninVxZVCXZ27HFtsuHFAzggHz8D/Loopback.png)

* [Virtual Audio Cable (windows)](http://software.muzychenko.net/eng/vac.htm)
* [Video](https://www.youtube.com/watch?v=ioZe0xVWTNE)
* [Alternatives for Mac](http://alternativeto.net/software/virtual-audio-cable/?platform=mac)

A post on Quora was also very helpful to [list some alternatives for speech recognition](https://www.quora.com/Are-there-any-software-that-can-generate-subtitles-based-on-speech-from-the-video-file-Then-convert-that-subtitle-in-english)

# Uploading YouTube nightmares - more cursing!
Uploading audio to YouTube can be a nightmare. Either it's not the right format (ogg) or the files are too big (I'm looking at you, tunesTube, with your 25 MB limit!). Or you need expensive software packages.

### FFMPEG to the rescue
Having downloaded quite a few videos from YouTube and other video sites with the excellent `youtube-dl` command-line application, I wondered if `ffmpeg`, which it uses under the hood for conversions, could also convert video and audio for YouTube upload. It could!

`ffmpeg -loop 1 -r 2 -i image.jpg -i input.mp3 -vf scale=-1:380 -c:v libx264 -preset slow -tune stillimage -crf 18 -c:a copy -shortest -pix_fmt yuv420p -threads 0 output.mkv`

As you can see, command-line options can be pretty incomprehensible, so I Google them rather than reading manuals or `grep`ping help output. That's how I finally found the best command, which takes an image (replace `image.jpg` with the name / *path* of your image, and likewise `input.mp3` with your audio file) and produces `output.mkv`, which you can upload *directly* to YouTube.

Credit: http://www.makeuseof.com/tag/3-ways-add-audio-podcast-youtube/
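For more than one recording it may help to wrap the command in a small script. What follows is only a sketch under a few assumptions: `ffmpeg` is on your `PATH`, and the helper names `build_ffmpeg_command` and `make_video` are hypothetical ones I picked. It also deviates from the command above in one place, using `scale=-2` instead of `scale=-1`, so that the computed width is always even, which `libx264` with `yuv420p` needs.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(image: str, audio: str, output: str,
                         height: int = 380) -> List[str]:
    """Build the ffmpeg argument list for a still image plus an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the still image as video frames
        "-r", "2",               # two frames per second is plenty for a still
        "-i", image,
        "-i", audio,
        # Assumption: -2 instead of the original -1 keeps the width even.
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264",
        "-preset", "slow",
        "-tune", "stillimage",
        "-crf", "18",
        "-c:a", "copy",          # copy the audio without re-encoding
        "-shortest",             # stop when the audio ends
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-threads", "0",         # let ffmpeg pick the thread count
        output,
    ]


def make_video(image: str, audio: str, output: str) -> None:
    """Check that both inputs exist, then run ffmpeg."""
    for path in (image, audio):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    subprocess.run(build_ffmpeg_command(image, audio, output), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is overwritten.
    print(" ".join(build_ffmpeg_command("image.jpg", "input.mp3", "output.mkv")))
```

Passing the arguments as a list means file names with spaces need no extra quoting.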

# Different *use cases*
Watson and Google transcribe speech to text in *real time*, so the text could be published in near real time. I'm not sure if *YouTube Live* can transcribe in real time too.

But imagine we were hosting the #BeyondBitcoin Whaletank and you could catch up on or *read* what is being said. Even if there were a slight lag, you could jump into the conversation with Mumble and ask your questions! How *cool* is that?

[The IBM Watson Speech to Text service uses speech recognition capabilities to convert Arabic, English, Spanish, French, Brazilian Portuguese, Japanese, and Mandarin speech into text.](https://speech-to-text-demo.mybluemix.net/)

Even better is that you can fork their work on GitHub!

![WatsonWordAltsmaller.png](https://steemitimages.com/DQmUXiGXXfAKanyvKbZaSNajerQCQAuaMtUdyHy7NtYxJJe/WatsonWordAltsmaller.png)

# Things I didn't try
>A paid service, as you might expect PopUp Archive has some distinct advantages over both the YouTube and the Watson-plus-Amara workflows. PopUp Archive will store audio files, can ingest multiple files at once from a variety of sources, supports multiple users on an account, includes robust transcript editing tools, differentiates speakers, adds punctuation, has very precise timestamps, and more. In testing the options for Tuts+ videos, we've found that PopUp Archives transcriptions are more accurate than competing services. In short, you get what you pay for. 

I haven't bothered to try those out but given sufficient funds I might. This might be great for making *Coursera* type courses where you *watch* a video, and simultaneously read along with the currently spoken text *highlighted* in transcription! Talking *beyond* cool-bits!
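The read-along idea boils down to finding which caption belongs to the current playback position. Here is a rough sketch, assuming the cues are already sorted by start time, do not overlap, and are given as `(start, end, text)` in seconds. The names `current_cue` and `render` are made up for this example.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def current_cue(cues: List[Cue], position: float) -> Optional[int]:
    """Return the index of the cue spoken at `position`, or None in a gap."""
    starts = [start for start, _end, _text in cues]
    # Last cue that starts at or before the playback position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i][1]:
        return i
    return None


def render(cues: List[Cue], position: float) -> str:
    """Return the transcript with the currently spoken cue in markdown bold."""
    active = current_cue(cues, position)
    return " ".join(
        f"**{text}**" if i == active else text
        for i, (_start, _end, text) in enumerate(cues)
    )


if __name__ == "__main__":
    cues = [
        (0.0, 2.0, "Welcome"),
        (2.0, 4.5, "to the Whaletank"),
        (6.0, 8.0, "Any questions?"),
    ]
    print(render(cues, 3.0))
```

A player would call `render` each time the playback position changes. During a pause between speakers nothing is highlighted.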

Thanks to [Photography Tutsplus](https://photography.tutsplus.com/tutorials/3-ways-to-subtitle-and-caption-your-videos-automatically-using-artificial-intelligence--cms-26834)

<hr>

https://steemitimages.com/0x0/https://steemitimages.com/0x0/https://media.giphy.com/media/13CJfenolX5Mk/giphy.gif


I've uploaded the *whole* Whaletank and I'll be posting it later to @Dtube or others maybe after editing the automatic subtitles! Let's create something great!

@Nutela

<hr>

More sources:

* There is also http://www.diycaptions.com.

* [A super great post about YouTube accuracy](https://medium.com/@mlockrey/youtube-s-incredible-95-accuracy-rate-on-auto-generated-captions-b059924765d5)


* [download audio transcriptions from youtube with browser hack hehe](https://www.labnol.org/internet/transcribe-video-to-text/28914/)

* https://techcrunch.com/2017/03/23/youtubes-automatic-captioning-system-can-now-describe-sound-effects/