{"id":237,"date":"2018-11-02T22:17:38","date_gmt":"2018-11-02T22:17:38","guid":{"rendered":"https:\/\/mikewinters.io\/?p=237"},"modified":"2019-10-04T19:24:22","modified_gmt":"2019-10-04T19:24:22","slug":"ai-driven-audio-in-social-media","status":"publish","type":"post","link":"https:\/\/mikewinters.io\/?p=237","title":{"rendered":"AI-Enhanced Social Media Images"},"content":{"rendered":"\n<p>So much of the content on social media platforms is visual: GIFs, photographs, memes, infographics, Bitmojis, and emojis, to name a few. Sound and music are vital to our enjoyment of media, yet they have been largely overlooked on these platforms.<\/p>\n\n\n\n<p>I joined Microsoft Research in the summer of 2016 for an internship with researchers in computer vision and accessibility. We explored novel ways of transforming social media content, especially images, into rich audio experiences for the blind.<\/p>\n\n\n\n<figure class=\"wp-block-image alignwide\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"586\" src=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-1024x586.jpg\" alt=\"\" class=\"wp-image-369\" srcset=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-1024x586.jpg 1024w, https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-300x172.jpg 300w, https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-768x439.jpg 768w, https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-1870x1070.jpg 1870w, https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-400x229.jpg 400w, https:\/\/mikewinters.io\/wp-content\/uploads\/2018\/11\/FinalMapping-e1570210581941-800x458.jpg 800w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Social media content was analyzed by AI algorithms to retrieve high-level data features for sonification.<\/figcaption><\/figure>\n\n\n\n<p>Image sonification systems typically use low-level data, such as RGB values, as the basis for their audio mappings. By contrast, I used Microsoft Cognitive Services, a suite of AI APIs, to extract high-level features such as people, sentiment, emotions, environments, and objects. These features made it much easier to craft soundscapes that actually sound like the images.<\/p>\n\n\n\n<figure class=\"wp-block-image alignwide\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"337\" src=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-1024x337.jpg\" alt=\"\" class=\"wp-image-367\" srcset=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-1024x337.jpg 1024w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-300x99.jpg 300w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-768x253.jpg 768w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-1870x615.jpg 1870w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-400x132.jpg 400w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/TemporalEvolution_Cropped-800x263.jpg 800w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Short sounds play right after the username, and longer sounds play in the background of the post text.<\/figcaption><\/figure>\n\n\n\n<p>I ultimately settled on a design that combined speech, sonification, auditory icons, soundscapes, and music. Arranging these layers took care, so that people could hear everything without adding much time to the browsing experience. I hope these kinds of technologies start finding their way into apps soon: so much of the visual content on the web is still inaccessible to the blind.
<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube alignwide wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/www.youtube.com\/watch?v=1W2uQYfvb1I&#038;feature=youtu.be\n<\/div><figcaption>A sequence of five social media posts contrasts text-to-speech with the AI-driven audio additions.<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-gallery columns-2 wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"220\" src=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-1024x220.png\" alt=\"\" data-id=\"388\" data-link=\"https:\/\/mikewinters.io\/?attachment_id=388\" class=\"wp-image-388\" srcset=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-1024x220.png 1024w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-300x64.png 300w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-768x165.png 768w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-1870x402.png 1870w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-400x86.png 400w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/58480fd7cef1014c0b5e4943-800x172.png 800w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"286\" src=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1-1024x286.jpg\" alt=\"\" data-id=\"390\" data-link=\"https:\/\/mikewinters.io\/?attachment_id=390\" class=\"wp-image-390\" srcset=\"https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1-1024x286.jpg 1024w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1-300x84.jpg 300w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1-768x214.jpg 768w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1-400x112.jpg 400w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1-800x223.jpg 800w, https:\/\/mikewinters.io\/wp-content\/uploads\/2019\/10\/microsoft_research_logo-1.jpg 1147w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Publications<\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Winters, R. M., Joshi, N., Cutrell, E., &amp; Morris, M. R. (2019). <a href=\"https:\/\/journals.sagepub.com\/doi\/full\/10.1177\/1064804618788098\">Strategies for Auditory Display of Social Media<\/a>. <em>Ergonomics in Design<\/em>, <em>27<\/em>(1), 11\u201315.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>So much of the content on social media platforms is visual: GIFs, photographs, memes, infographics, Bitmojis, and emojis, to name a few. Sound and music are vital to our enjoyment of media, yet they have been largely overlooked on these platforms. 
I joined Microsoft Research in the summer of 2016 for an internship with researchers in computer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":385,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23,35,6,9,5,21,11,2,20,10],"tags":[],"class_list":["post-237","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-accessibility","category-artificial-intelligence","category-coding","category-collaboration","category-design","category-industry","category-publication","category-sonification","category-teams","category-web","has-thumbnail"],"_links":{"self":[{"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/posts\/237","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mikewinters.io\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=237"}],"version-history":[{"count":18,"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/posts\/237\/revisions"}],"predecessor-version":[{"id":391,"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/posts\/237\/revisions\/391"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mikewinters.io\/index.php?rest_route=\/wp\/v2\/media\/385"}],"wp:attachment":[{"href":"https:\/\/mikewinters.io\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=237"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mikewinters.io\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=237"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mikewinters.io\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=237"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}