An app could soon give blind people a new tool to explore the world
Anirudh Koul’s grandfather was slowly losing his ability to see. By 2014, he was having a hard time recognizing Koul’s face in their weekly Skype calls bridging the vast distance between Silicon Valley, where Koul is a data scientist at Microsoft, and the elderly man’s home in New Delhi.
So Koul started reading up on the challenges of vision loss and thinking about how the recent advances in deep learning, a potential-packed area of machine learning, could help give people a new way to recognize what’s around them without actually seeing it.
That was the modest beginning of Seeing AI. Two years later, Microsoft CEO Satya Nadella introduced the budding technology to thundering applause at this year’s Build conference. It grabbed headlines, inspired people around the world, was showcased at a White House event and prompted many in the tech industry to contact Koul and his colleagues with congratulations and offers of assistance.
“We were beyond amazed,” Koul says. “Never in our wildest dreams did we think the response would be so huge. It was a deeply humbling experience.”
Koul has the sort of job that many in the tech world dream of: About 80 percent of his work is on projects or ideas that he’s come up with himself and gotten his managers’ support to pursue. He says Microsoft has the cutting-edge tools, vast data sets, talented people and an “innovation pipeline” that gives data scientists the opportunity to turn their ideas into real products and features.
“I could not have asked for a better career path for myself,” Koul says. “If you’re passionate, driven and have a thirst to innovate, there’s no better place for you than Microsoft.”
Seeing AI is one of a long list of revolutionary technologies being created with machine learning at Microsoft. Still in development, the app can be used on a smartphone or with smart glasses from Pivothead. It uses computer vision, speech recognition and natural language processing to help describe a person’s surroundings, read text, answer questions and even identify emotions on people’s faces.
With a quick snapshot of a scene, the app may tell the user it sees a dog playing with a Frisbee, a crowd waiting at an airport baggage carousel or a young man who is smiling. It can help those who are blind or have low vision read a menu, determine how people are reacting to what they’re saying and tackle other routine tasks without having to rely on others for help.
Irene Chen got the chance to work on Seeing AI as part of the Garage Internship Program, in which interns work toward building a new product in just four months. The University of British Columbia student initially just wanted to learn about computer vision but found something much more transformative in working on Seeing AI. She remembers the first time she saw a blind person use it.
“She struggled a little bit at first, but then it read what she wanted it to read perfectly, and her eyes lit up. Her whole face was glowing,” Chen recalls. “That was a really defining moment for me, where I felt that what I’m making is actually going to make a difference … It wasn’t something that I’ve ever felt before.”
Chen, a software developer who worked on backend development — “where all the algorithms and magic happen,” she says — and deployment, says the project involved collaborating with people in Microsoft offices in Japan, Serbia, Cairo, London, Washington and California, where the team found the different kinds of expertise they needed.
“Whenever we emailed teams all over the world and asked them to assist us, they were all very open to chatting with us and helping us, even though they’re not directly involved with our project,” she says. “I thought it was just incredible.”
Koul, who admits he’s sometimes star-struck by some of the people at Microsoft who have accomplished amazing things, agrees that having that open-ended resource is invaluable.
“You can reach out in an internal group list and say, ‘Hey, I have this problem that I’m coming across.’ In maybe one or two hours, you have responses from, like, five people from around the world who are experts in that particular area,” he says. “It’s like instant problem-solving.”
Now Koul taps regularly into the nearly limitless tools and talent of the company to follow his interests, but there was a time when he wouldn’t have believed he’d find his ideal career at Microsoft. The tech industry, sure — but working at a big company? That wasn’t in his plan at all.
He became interested in technology as a kid growing up in India. His parents got an MS-DOS computer with a black and white screen, which they had little interest in, but Koul began tinkering with it and was writing his own programs by the time he was in high school.
He earned a computer science degree from Dalhousie University in Canada, worked for four years as a research engineer at Yahoo and went back to school at Carnegie Mellon University for his master’s in computational data science, specializing in machine learning and natural language processing.
There, he was the entrepreneurship advisor, guiding fellow students to build prototypes for their ideas quickly and generally speed up their innovation processes. At the same time, he churned out his own projects for various hackathons, racking up hacking experience and awards.
Back then, he was the opposite of the kind of person who worked at Microsoft — or so he thought. “I was completely into open source, Linux and the fast-working mindset of startups,” he says. “I had this image of Microsoft being this mammoth company where things moved sluggishly.”
But when a Microsoft recruiter contacted him, he decided he’d check out the company. He says he went for an interview and could immediately see “the growing startup culture” — and realized his previous impressions were completely wrong.
“The people I met here had this hacker spirit in them,” he says. “And they also had this amazing amount of data that would be hard to get in any other company.”
He remembers how on his first day as a data scientist at Microsoft, he was playing around with some data and ran a program with it that took about two hours to finish, using a cluster spanning thousands of computers. On his home computer, he figures, it would have taken about seven months.
“They have fine-tuned the machinery and made tools available so that it’s very fast for a data scientist to iterate and get answers from data,” he says.
He started Seeing AI as a project for last year’s //oneweek hackathon. He shared his idea with colleagues who had worked with organizations for blind people, those who had worked on accessibility technology, developers, researchers, designers and others. His idea seemed a bit overly ambitious to many, he says, but he built a team of 16 people from Microsoft offices in California, Washington state and London who wanted to help.
“With advancements in computer vision and deep learning, I knew we could build something more useful than what currently exists,” he says.
Saqib Shaikh, who works at Microsoft in London, had created a primitive version of the same idea at the previous year’s hackathon. The software engineer, who’s been blind since age 7, was searching the 2015 hackathon submissions for technology to assist blind people and came across Koul’s project.
Shaikh immediately contacted Koul to find out how he could help, and the two “ended up talking for hours about different ideas and technologies,” Shaikh recalls.
Shaikh participated from London as Koul and others got down to work at the three-day hackathon in Silicon Valley and Redmond, Washington. Margaret Mitchell, a Microsoft researcher who specializes in vision-to-language and assistive technology, provided the image captioning capabilities that were instrumental to bringing the project to life.
Competing against 13,000 employees, they ultimately won the global event’s Tech for Good category with their project, which was called “Deep Vision” at the time.
“That was amazing,” Shaikh says. “We’d all decided that this was something worth doing, and that we’d like to push forward regardless … To win was just like the icing on the cake.”
Koul says he was incredibly lucky to be a part of such an amazing team, calling it an “honor and privilege to work alongside such dedicated people who care deeply about accessibility.”
Shaikh says he’s started using the app, especially when he wants to read something, and expects to find more ways to integrate it into his daily habits as it becomes more polished and refined. He’s eager to help make it so that others will be able to do the same.
“I think the impact could be huge. In the short term, this is some really interesting new tech that we could bring to the world to help people in novel ways,” he says. “Going forward, I think the future is going to be incredible in terms of what these different AI machine learning algorithms can do.”
Since the hackathon, the global team has continued improving the project, often in their spare time. Shaikh’s managers at Bing recognized his passion for the technology and gave him two months away from his regular project to build the foundations for the current Seeing AI app.
Mitchell hired Koul to be a full-time driver on the project. Koul’s first move was to pitch Seeing AI as a project for the Garage interns to work on – and they accepted. Koul, Mitchell, and Shaikh all credit the Garage interns for making the Seeing AI app possible.
Just last month, Microsoft officially created a full-time Seeing AI team, which is working toward bringing the technology to market.
Mitchell says the project is a powerful example of what she wanted to do when she came to work at Microsoft: develop core research and follow it through all the way to the consumers.
“Microsoft provided the opportunity to work directly with visually impaired communities, have access to massive computing resources and the opportunity to showcase cool and important work to the rest of the world,” she says. “There is no other place that makes such amazing connections, freedom and resources possible.”
Koul says the project probably wouldn’t have gotten off the ground anywhere else; Microsoft leadership’s support is what helped “this dream come to fruition.”
“I have seen so many accessibility-related projects here that would not have seen the light of day if it was not at Microsoft,” he says. “Accessibility is built deep into our culture, and it encourages people to think of how to make things that are inclusive of our whole society.”
In his nearly five years at Microsoft, Koul has helped create various hacks and picked up numerous awards for them. He says he enjoys being given so much independence in his daily work and having so many avenues to present his ideas. He also credits Microsoft’s StayFit program, one of the company’s many employee benefits, for helping him lose 84 pounds.
As for Seeing AI, he says, “This is just the beginning. We have a long road ahead of us, with many more features in the works that we’re excited to showcase in the future.”
His grandfather passed away in late April, the same week Microsoft created the fully funded Seeing AI team. The 85-year-old man knew a little bit about it and “was impressed to hear about me working toward scenarios for blind users,” Koul says.
Koul is grateful for the opportunities he’s had to turn big-data ideas into new products and features. After all, he says, “What’s the fun of working on something if you don’t see it come to life?”
– Written by Tracy Ith