AI-Enhanced Social Media Images

So much of the content on social media platforms is visual: GIFs, photographs, memes, infographics, bitmojis and emojis, to name a few. Sound and music are vital components of the enjoyment of media, but they have been largely overlooked.

I joined Microsoft Research in the summer of 2016 for an internship with researchers in computer vision and accessibility. We explored novel ways of transforming social media—especially images—into rich audio experiences for the blind.

Social media content was analyzed by AI algorithms to retrieve high-level data features for sonification.

Image sonification systems typically use low-level data such as RGB values as the basis for their audio mappings. Using Microsoft's AI-based Cognitive Services, however, I was able to extract high-level features such as people, sentiment, emotions, environments and objects. This made it easy to craft soundscapes that actually sounded like what the images showed.
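To make the contrast concrete, here is a minimal sketch of the two kinds of mapping. It is an illustration only, not the project's code: the function names, the tag list, the confidence threshold and the sound file paths are all hypothetical, and the tags are passed in as plain `(name, confidence)` pairs rather than coming from any real service call.

```python
# Hypothetical sketch: every name, tag and file path here is an
# illustrative assumption, not the project's actual mapping.

def rgb_to_pitch(r, g, b, low_hz=220.0, high_hz=880.0):
    """Low-level mapping: pixel brightness (0-255 channels) -> pitch in Hz."""
    brightness = (r + g + b) / (3 * 255)
    return low_hz + brightness * (high_hz - low_hz)

# Assumed lookup from high-level image tags to recorded sounds.
SOUND_LIBRARY = {
    "beach": "soundscapes/waves.wav",
    "dog": "icons/bark.wav",
    "happiness": "music/upbeat.wav",
}

def features_to_sounds(tags, min_confidence=0.5):
    """High-level mapping: keep confident tags that have a matching sound."""
    return [
        SOUND_LIBRARY[name]
        for name, confidence in tags
        if confidence >= min_confidence and name in SOUND_LIBRARY
    ]
```

With the first function a bright pixel simply becomes a higher tone, whereas the second turns a tag such as "beach" into the sound of waves, which is why the high-level route yields audio that resembles the picture.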

Short sounds play right after the username and longer sounds play in the background of the post text.
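The ordering described in this caption can be sketched as a simple timeline. This is a rough illustration under stated assumptions, not the actual implementation: the `Clip` type, the `arrange_post` and `total_time` helpers, and the durations used are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    name: str
    start: float     # seconds from the start of the post
    duration: float  # seconds
    layer: str       # "foreground" or "background"

def arrange_post(username_s, icon_durations, text_s):
    """Lay out one post: spoken username, then short sounds back to back,
    then the spoken post text with a longer sound playing underneath it."""
    clips = [Clip("username", 0.0, username_s, "foreground")]
    t = username_s
    for i, d in enumerate(icon_durations):
        clips.append(Clip(f"icon_{i}", t, d, "foreground"))
        t += d
    clips.append(Clip("post_text", t, text_s, "foreground"))
    # The background sound overlaps the text, so it adds no listening time.
    clips.append(Clip("soundscape", t, text_s, "background"))
    return clips

def total_time(clips):
    """Listening time for the post: the latest end time of any clip."""
    return max(c.start + c.duration for c in clips)
```

In this layout only the short sounds lengthen a post. For example, `arrange_post(1.0, [0.5, 0.5], 4.0)` takes 6.0 seconds in total, one second more than the username and text alone.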

I ultimately settled on a design that used speech, sonification, auditory icons, soundscapes and music. I had to arrange them carefully so that people could hear everything without adding too much extra time to the browsing experience. I hope these kinds of technologies start finding their way into apps soon, because so much of the visual content on the web is still inaccessible to the blind.

A sequence of five social media posts contrasts text-to-speech with the AI-driven audio additions.

Publications
