This is exciting! LingoStand now analyzes your pronunciation and provides personalized feedback on your sounds, tone, and even the emotion you convey 🤩.
Powered by Google’s cutting-edge Gemini Experimental Multimodal model, this feedback is usually (but not always) spot-on and always helpful!
The pronunciation feedback takes into account the phonemes of the user’s native language, giving tips that quickly build awareness and help the user produce the sound that is giving them trouble.
Example
- Student native language: Spanish (me 😅)
- Learning / Improving: English
- Sentence: “I like to run in the mornings”
- Word to improve: “run”
- Feedback in Spanish: ‘La palabra es “run”. La “r” en inglés es un sonido vibrante. Trata de hacer vibrar la punta de tu lengua en la parte superior de tu boca, por ejemplo “rrrrrun”. Una palabra en Español con un sonido similar es “perro”.’
- Feedback translated to English: ‘The word is “run.” The “r” in English is a vibrating sound. Try vibrating the tip of your tongue at the top of your mouth, for example “rrrrrun.” A Spanish word with a similar sound is “perro.”’
Developer Notes
Ever since I learned about the Gemini multimodal LLM, I have wanted to see how much value we could deliver for people who want to learn a language or improve their language skills. It’s finally coming together.
First, when Gemini came out, it was not available in Europe. Then it did not really have audio support, mostly video. Then I tested the ‘1.5 gemini preview’ model, and it wasn’t really working for this use case. But now, finally, the Experimental model seems to work!
It’s interesting how developing with Generative AI is shaping up for me. It’s becoming a matter of conditionally selecting templates and substituting strings into them to give the LLM personalization and memory. And now I think it makes sense to use different APIs (GPT, Llama3, Groq, Gemini, …) to deliver the value the student needs.
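To make the template idea concrete, here is a minimal sketch in Python. It is not LingoStand’s actual code: the template wording, the `build_prompt` helper, and the `student` dictionary are all made up for illustration, and “memory” is reduced to a few earlier notes appended to the prompt.

```python
from string import Template

# Hypothetical prompt templates keyed by feedback type (illustrative wording only).
TEMPLATES = {
    "pronunciation": Template(
        "The student's native language is $native_language and they are "
        "learning $target_language. They said: \"$sentence\". Give feedback in "
        "$native_language on how to pronounce \"$word\", comparing it with "
        "sounds that exist in $native_language.$memory"
    ),
    "general": Template(
        "The student is learning $target_language. They said: \"$sentence\". "
        "Give brief, encouraging feedback in $native_language.$memory"
    ),
}


def build_prompt(student, sentence, word=None, history=()):
    """Pick a template conditionally, then substitute the student's details."""
    # Conditional selection: the pronunciation template is used only when a
    # specific word has been flagged for improvement.
    key = "pronunciation" if word else "general"

    # "Memory" in this sketch is just earlier feedback notes added as text.
    memory = ""
    if history:
        memory = " Earlier feedback to build on: " + "; ".join(history)

    # Extra keyword arguments that a template does not use are ignored.
    return TEMPLATES[key].substitute(
        native_language=student["native_language"],
        target_language=student["target_language"],
        sentence=sentence,
        word=word or "",
        memory=memory,
    )


if __name__ == "__main__":
    student = {"native_language": "Spanish", "target_language": "English"}
    print(build_prompt(student, "I like to run in the mornings", word="run"))
```

The resulting string would then be sent, along with the audio, to whichever model API fits the task. That call is left out here on purpose, since each provider’s client looks different.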
Going back to the audio analysis and feedback, I wonder how many hidden use cases for language learners are still waiting to be discovered. I feel this is just the tip of the iceberg.
If you can think of a use case, please drop me a line and let’s brainstorm!
Thank you for reading