BLOG 3

You Never Know When Coding Comes in Handy...

by Lyuying Guo

Published Jul 29, 2024

My mom occasionally writes newspaper articles based on recorded oral interviews. Today, she gave me an hour-long audio file and asked me to transcribe it into text, since her transcription software wasn't working properly.


I’ve done manual transcriptions for my mom in the past, before machine transcription tools became popular. Sitting there for hours, playing the audio clips back and forth and typing nonstop, was painful.


I researched some transcription tools online, such as Notta and Maestro. They perform well: they generate text fast and can tell different speakers apart. However, the free versions only show the transcription of the first 3 minutes. The payment plans didn’t seem worth it for a one-time use.


So I turned to ChatGPT, although I’d been avoiding it for the past few years because of my reservations about the public’s hype over the chatbot. Turns out, this smartass personal assistant isn’t that smart yet:




Welp, I guess you pay with either money or effort in this economy.


I was ready to do some hands-on work with open-source technologies. I found out about Whisper – OpenAI’s machine learning model for speech recognition and transcription. Following a YouTube tutorial, I ran the model in Google Colab, which lets you run Jupyter notebooks in the browser, saved to Google Drive, without installing anything on your own computer. Very convenient! After splitting the original audio into two parts in Voice Memos on my MacBook, I managed to transcribe the whole hour of audio into text with just a few lines of code. After two hours of laborious research on alternative tools, I cannot describe my joy when I hit the “run” button and saw the transcribed text popping up line after line.
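For anyone curious what "a few lines" looks like, here is a minimal sketch of the idea. It is not the tutorial's exact code. It assumes the `openai-whisper` package and ffmpeg are available (in Colab, `!pip install -U openai-whisper` in a cell), and the file names, the `base` model size, and the two helper functions are placeholders made up for illustration.

```python
from pathlib import Path


def join_transcripts(parts):
    """Join the text of several transcribed parts into one transcript."""
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return "\n\n".join(cleaned)


def transcribe_files(paths, model_size="base"):
    """Transcribe each audio file with Whisper and return the combined text."""
    # Imported here so the rest of the script loads even without Whisper installed.
    import whisper

    model = whisper.load_model(model_size)  # larger models are slower but more accurate
    texts = []
    for path in paths:
        result = model.transcribe(str(path))  # returns a dict with a "text" entry
        texts.append(result["text"])
    return join_transcripts(texts)


if __name__ == "__main__":
    # Placeholder file names for the two halves of the recording.
    audio_parts = [Path("interview_part1.m4a"), Path("interview_part2.m4a")]
    if all(p.exists() for p in audio_parts):
        transcript = transcribe_files(audio_parts)
        Path("interview_transcript.txt").write_text(transcript, encoding="utf-8")
```

In Colab you would upload the audio files first, then run the cell and open the resulting text file.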





Later, I spent an hour manually editing the machine-generated text file: deleting filler words, removing unnecessary line breaks, and reorganizing the paragraphs to make the transcription more readable. (Although you don’t really need this step, since the machine transcription is already quite accurate. I just did it to make my mom’s work easier.)
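I did this cleanup by hand, but part of it could probably be scripted. The sketch below is only an illustration under a few assumptions: the filler list is hypothetical, blank lines are treated as real paragraph breaks, and single line breaks inside a paragraph are treated as unnecessary. Reorganizing paragraphs and fixing capitalization would still need a human pass.

```python
import re

# Hypothetical filler list; adjust it for the speakers in your recording.
FILLERS = ("um", "uh", "erm", "hmm")


def tidy_transcript(text, fillers=FILLERS):
    """Remove standalone filler words and join stray line breaks."""
    # Match a filler as a whole word, plus an optional comma and trailing spaces.
    pattern = r"\b(?:" + "|".join(map(re.escape, fillers)) + r")\b,?[ \t]*"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Blank lines mark paragraphs; single line breaks inside them are collapsed.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in joined if p)
```

Calling `tidy_transcript` on the raw transcript text returns the cleaned version, which you can then skim and polish manually.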


Without my previous venture into self-taught coding, I probably wouldn’t have been able to use technology to save time on my mom’s editorial job today. She gladly praised my work.


Man, I love paying by time and effort, not money. :P