I’m not a fan of the “everything in the Cloud” dogma. I prefer local apps; otherwise, why would we need anything other than Chromebooks, if everything is hosted on Azure, AWS, or Google Cloud, with JS frontends in a browser?

This doesn’t apply to the big chatbots, which require resources so massive (and I refuse to own a “decent” graphics card) that running them locally is not feasible. But I hate it when everybody hosts something “in the Cloud” just because they can. Because this is how you do things today.

Given that the world decided otherwise over the last ~15 years, I tried to get a grasp of Azure more than once. Each time, I thought I’d have a stroke. The architectural complexity of their shit is abysmal! How can so many people use Azure and still appear mentally sane?!

It reminds me of another aspect of modern software development, typical of the last 20 years: you cannot just build an app and test for bugs. No, you first have to deal with missing dependencies, wrong paths, and a generally fucked-up development environment. Gone are the times when you installed an IDE, created a new project, wrote code, compiled and linked it, and it just worked!

But Azure seems to be the meta-meta-meta shit of this kind. You need advanced training just to figure out where to start, and how to make the necessary 2,000 clicks that would eventually let you do something useful. This is beyond insane.

The last time I tried to explore the Azure shit, I made sure not to subscribe to anything that would make me pay. So just the free tier. But Azure is not something that allows a quick “Hello, World!” proof of concept. It’s modern shit, so it’s fucking complex. So I gave up. It’s a veritable obstacle course.

Yesterday, while looking for an online voice recognition and transcription service that would accept a 60-minute MP3 and crunch it at no cost, I discovered that Azure AI Speech (part of Azure Cognitive Services Speech, which in turn is part of Azure AI Services) offers “5 audio hours free per month” in the free tier. And this can be used in the Azure Speech Playground, “typically accessed via Azure AI Studio or Speech Studio.” Azure this, Azure that.

Speaking of Azure AI Services, I wanted to try the Azure AI Foundry, only to discover that Romanian is not a supported language:

  • gpt-4o-mini: Supported languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, te
  • o1: Supported languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, te
  • gpt-4.5-preview: Supported languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, te

WTF?! All these models support input and output in many more languages when accessed in an online chatbot!

So their offering is utterly useless.

But I reactivated my “subscription” (no payment, remember?) to try Azure AI Speech, since its Real-time transcription feature promises “Live transcription capabilities on your own audio without writing any code.”

Wow, no code.

For fuck’s sake, it didn’t work! It failed to accept my MP3. Then a 16-bit PCM file, allegedly the standard for Azure Speech Services, failed too!
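For reference, the PCM flavour most often cited for Azure Speech is 16 kHz, 16-bit, mono; whether that would have satisfied the Playground is anyone’s guess. Here is a stdlib-only Python sketch (hypothetical file name, a test tone instead of real speech) that writes such a file and verifies its parameters:

```python
import math
import struct
import wave

# Assumed target format: 16 kHz sample rate, 16-bit (2-byte) samples, mono.
RATE, WIDTH, CHANNELS = 16000, 2, 1

def write_test_tone(path, seconds=1, freq=440):
    """Write a sine-wave WAV file in 16 kHz / 16-bit / mono PCM."""
    with wave.open(path, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(WIDTH)
        w.setframerate(RATE)
        frames = bytearray()
        for n in range(seconds * RATE):
            sample = int(32767 * math.sin(2 * math.pi * freq * n / RATE))
            frames += struct.pack("<h", sample)  # little-endian signed 16-bit
        w.writeframes(bytes(frames))

def pcm_params(path):
    """Return (channels, sample_width_bytes, frame_rate) of a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()

write_test_tone("tone.wav")
print(pcm_params("tone.wav"))  # (1, 2, 16000)
```

At least this way you can check locally that a file really is what the error message claims it isn’t.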

Is there anything by Microsoft that works? Oh, Copilot, right. But how are people using Azure AI Services if they fail more often than they work?

So I decided to try Whisper. It just worked, with minor adjustments (a local Python virtual environment was needed):

python3 -m venv whisper_env        # create an isolated Python environment
source whisper_env/bin/activate    # activate it
pip install -U openai-whisper      # the Whisper CLI and library
pip install setuptools-rust        # needed to build the tokenizer on some systems
whisper input.mp3 --model medium   # transcribe; --language ro skips auto-detection

Now, of course, I have to correct the resulting output (generated as txt, srt, vtt, tsv, and json), because it contains errors. But the result is surprisingly decent for a modest model running on a €400 laptop with Intel graphics!
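Since the .srt needs manual correction anyway, a small stdlib-only helper (hypothetical, not part of Whisper) can parse the cues, so errors can be chased cue by cue instead of in one wall of text:

```python
import re

# Matches one SRT cue: index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then the text.
CUE_RE = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})\s*\n(.*?)(?:\n\n|\Z)",
    re.S,
)

def parse_srt(text):
    """Return a list of (index, start, end, text) tuples from SRT content."""
    cues = []
    for m in CUE_RE.finditer(text):
        start = f"{m.group(2)},{m.group(3)}"
        end = f"{m.group(4)},{m.group(5)}"
        cues.append((int(m.group(1)), start, end, m.group(6).strip()))
    return cues

sample = """1
00:00:00,000 --> 00:00:04,500
First cue of the transcript.

2
00:00:04,500 --> 00:00:09,000
Second cue, to be corrected by hand.
"""
cues = parse_srt(sample)
print(len(cues))   # 2
print(cues[0][1])  # 00:00:00,000
```

From the parsed tuples, it’s trivial to grep for suspicious words or spot-check the timestamps before fixing the file by hand.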

Fuck you, Azure AI shit!