Did you know that the time it takes for sound to travel from microphone to earphone on Android devices can be 10 times longer than on iOS? So what, you may think… In fact, it doesn’t make much of a difference to regular users, who might never notice latency during a phone call or while listening to music. But for those who try to sing, especially the 40m users of the US music app company Smule, a latency of 10-20 milliseconds is a huge thing.
For a company that has built a business and a huge user base around the simple idea of letting people from anywhere sing together in a mobile app, often with someone on the other side of the world, this is a crucial issue. “Same user experience for all of the users is probably one of the biggest challenges we need to overcome on a daily basis,” Dobri Dobrev, VP of Engineering at Smule’s office in Bulgaria, tells us.
In the twelve months since the 11-year-old Silicon Valley scaleup launched operations in Sofia, it has been working to ensure that users experience the same sound quality, no matter how strong or weak their internet connection is and no matter which smartphone they use. “Ever since Ocarina (which was Smule’s first application ten years ago) launched, we’ve been specializing in high-quality sound processing. Ocarina was about allowing users to create their own ‘music’ by blowing into their iOS device microphone and then creating different tones by holding fingers. Now as the portfolio of the company and the userbase grew, we are no longer limited to iOS. And trying to offer the same user experience on iPhones and hundreds of different Android devices comes with a whole new level of complexity,” explains Dobrev.
Hello, can you hear me?…
A user only needs to download an app and press a couple of buttons to sing Ed Sheeran’s songs, or pay to sing with Jessie J on a split screen. Making this happen takes a team of over 200 people and smoothly running systems and infrastructure.
Behind all the fun millions of people are having with Smule’s apps, there are more than 60 patents and 4,000 servers around the globe that process some 20 TB of newly performed songs each day and store roughly 24 PB of singing performances. Some of the patents cover technologies that let people sing together with no latency regardless of their internet quality, others cover mixing the vocals of geographically distributed singers, or even the automatic conversion of speech into a rap song (the company also runs a rapping app, AutoRap).
“To make these things possible, the software needs to recognize the rhythm and the tonality of songs, modulate human speech into other rhythms and tonalities and be able to produce a new product – a song video with some effects, in milliseconds. The application takes inputs from the camera, the mic, adds effects and synchronizes all this, delivers it to the user in real-time and also stores it on our servers,” says Dobrev, explaining how their cross-platform technology works.
Making the magic happen
“There are a lot of complex factors that need to fall into place to make it possible,” says Dobrev. To make all these apps run the same way on high-end smartphones and on lower-end ones, his team has to put a lot of work into smart architecture design and optimization, plus some low-level code, for example at the device-driver level. “We have different configurations for different devices. Thus we extract the best of the hardware and the software of a particular device.”
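As a rough illustration of what “different configurations for different devices” could look like, here is a minimal sketch in Python. The profile names, fields, and values are hypothetical and are not taken from Smule’s code; the sketch only shows a lookup keyed on device model that falls back to conservative defaults for unknown hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioProfile:
    """Hypothetical per-device audio settings (illustrative values only)."""
    sample_rate_hz: int
    buffer_frames: int

    @property
    def buffer_latency_ms(self) -> float:
        # Duration of audio held in one buffer, in milliseconds.
        return 1000.0 * self.buffer_frames / self.sample_rate_hz


# Conservative fallback for devices that have no tuned profile.
DEFAULT_PROFILE = AudioProfile(sample_rate_hz=44100, buffer_frames=1024)

# Hypothetical tuned profiles, keyed by a made-up device model string.
DEVICE_PROFILES = {
    "example-high-end": AudioProfile(sample_rate_hz=48000, buffer_frames=192),
    "example-low-end": AudioProfile(sample_rate_hz=44100, buffer_frames=2048),
}


def profile_for(device_model: str) -> AudioProfile:
    """Return the tuned profile for a device, or the default if unknown."""
    return DEVICE_PROFILES.get(device_model, DEFAULT_PROFILE)
```

With these made-up numbers, a 192-frame buffer at 48 kHz holds 4 ms of audio, while the 1024-frame default at 44.1 kHz holds about 23 ms, which is the kind of gap that matters when a singer is listening to their own voice.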
Because Smule focuses mostly on the real-time experience and community elements, it is important that its users feel like they are in the same room. That means constantly optimizing resources so that the speed is the same for all users. “The smaller the footprint of the application, the faster it loads. We make sure we don’t have any unnecessary libraries within the app,” he explains further. Smule’s engineers also do a lot of work at the network level, optimizing for slow or high-latency connections. The system often needs to guess what the user’s next action will be so that it can load and prepare it in time, especially when the internet connection is poor.
One of Dobri Dobrev’s main tasks in Bulgaria is to make sure this all works as promised for the users of AutoRap, the app that turns regular speech into rap. After the singing app proved its potential, Smule’s team decided to develop the portfolio further and grow this particular product.
“The next step for us is to make this participation more accessible, even faster and scalable,” explains the VP of Engineering. To try to replicate the success of Sing, he will also have to expand his team in the coming months. “Now imagine if the next rap app with 40m users comes from Sofia,” he laughs.