Evolving Naama: DAIN’s Explainable Emotion Recognition API

DAIN Studios will have a booth at Slush this year, where you can come and try Naama, a demo of our explainable emotion recognition API that uses computer vision. Naama is the Finnish word for face; since our algorithm interprets facial emotional expressions, we thought the name describes the demo quite well.

The demo has evolved quite organically, starting with several of our data scientists blogging and speaking at events about a diverse range of topics, such as Explainable Artificial Intelligence (xAI), DIY Computer Vision AI, and Machine Learning.

I sat down with our in-house Naama project team of Data Scientists and Data Engineers – Heeren, Juho, Thomas, and Pekka – and asked a few questions everyone has been keen to know in the lead-up to presenting a demo of Naama at Slush.

Thomas: If we think back 2-3 years, what has changed in Computer Vision AI between then and today? What are the main changes and developments?

Over the past few years there have been developments across several areas that have enhanced our ability to use and apply Computer Vision AI. The most crucial change has been the explosive interest in open-source libraries, which makes developing Computer Vision AI simpler and faster. More people can now build computer vision models with basic coding skills, using high-level APIs such as Keras and the free online courses and software library from fast.ai.

Several other changes are driving the progress of Computer Vision AI. Most training and building of models is now done in the cloud, and these models are gaining in complexity while being supported by advances in computational capability. Looking into the future, companies like Google and Amazon are developing very accessible Computer Vision applications that require almost no code, and these developments are going to increase the accessibility and the range of use cases of Computer Vision AI.

Question: What is your personal favorite example or use of Computer Vision AI?

Heeren: My personal favourite use case of computer vision AI is in medical diagnosis. As data becomes more and more accessible, algorithms are getting better not only at diagnosing but also at preventing certain diseases by flagging early signs, for example in skin-related conditions. I also like the use of computer vision AI algorithms in the automotive sector. Whether it's autonomous driving or detecting drivers' emotions, computer vision algorithms are paving the path for an exciting future in this sector.

Pekka: DAIN Studios has a demo planned for Slush that uses Computer Vision AI. What can the demo do, and why should we check it out?

Essentially it combines face detection, face recognition, and emotion classifiers. On top of these, there is an Explainable AI layer in the demo. Running on a graphics processing unit (GPU), all of this works in near real time, so it's fun to try with several people in front of the camera at the same time.
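As a rough sketch of how such a detection → recognition → emotion pipeline chains together, here is a minimal illustration; the stage functions below are hypothetical stand-ins (in a real system each would call a GPU-backed model), not the actual Naama code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FaceResult:
    box: tuple          # (x, y, w, h) bounding box from the detector
    identity: str       # label from the face recognizer
    emotion: str        # label from the emotion classifier

def detect_faces(frame) -> List[tuple]:
    """Dummy detector: pretend every frame contains one centered face.
    A real detector would be e.g. an MTCNN or Haar-cascade model."""
    h, w = frame["height"], frame["width"]
    return [(w // 4, h // 4, w // 2, h // 2)]

def recognize(frame, box) -> str:
    """Dummy recognizer: would normally compare face embeddings."""
    return "unknown"

def classify_emotion(frame, box) -> str:
    """Dummy classifier: would normally run a CNN on the cropped face."""
    return "happy"

def analyze_frame(frame) -> List[FaceResult]:
    """Chain detection -> recognition -> emotion, one result per face."""
    return [
        FaceResult(box, recognize(frame, box), classify_emotion(frame, box))
        for box in detect_faces(frame)
    ]

results = analyze_frame({"height": 480, "width": 640})
print(results[0].emotion)  # -> happy
```

Because each stage only consumes the frame and a bounding box, the stages can be swapped independently, which is also what makes it cheap to run the whole chain per face even with several people in view.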

Heeren: A DAIN team with members from each of our three studios worked together on this demo. Who did what, and what types of skills are needed to make a demo like this?

It was a very diverse mix of skills. Being an emotion recognition algorithm, most of the research and thought process went into data science. Juho and Thomas were primarily responsible for developing the data science and integrating the trained model. Since computer vision AI algorithms run most efficiently on GPUs, Pekka's extensive experience in image analysis came in handy. Being a veteran of computer vision use cases, his main contributions were selecting the underlying GPU hardware and technology and optimising the model. In AI projects, one aspect that is often ignored at the beginning is deploying the application in a robust and scalable way. That's where my skills in data engineering came into action: I helped the team expose this computer vision application through a scalable web API and containerise the overall project code. Moreover, all of this can easily go out of focus (engineers easily get carried away when solving an interesting challenge like this ;-)), so the role of Leena as the product manager was crucial and, if I may say, foundational in keeping the efforts of all team members focused and goal-oriented.
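To make "exposing the model through a web API" concrete, here is a minimal sketch using only the Python standard library; the `/predict`-style endpoint, the `predict` placeholder, and the response fields are illustrative assumptions, not Naama's actual interface:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(image_bytes: bytes) -> dict:
    """Placeholder for the model call; a real service would decode the
    image and run the GPU-backed pipeline here."""
    return {"faces": 1, "emotion": "neutral", "bytes_received": len(image_bytes)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Hypothetical endpoint: POST an image, get a JSON prediction back.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        payload = json.dumps(predict(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port: int = 8080):
    """Run the API server (blocking)."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Containerising a service like this is then largely a matter of a small Dockerfile that installs the dependencies and starts `serve()`, which is what makes the deployment robust and repeatable across environments.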

Juho: We often hear that if you want state-of-the-art AI, you need great data. What data is used in this demo?

It is an interesting question, because aside from each camera frame to be analyzed, the only data we provided in this case was one photo per DAINian taken directly from our website, although there is naturally a lot of data behind the pre-trained models used in the demo. For Naama, we used code from open-source repositories as the basis for face recognition and for predicting other user attributes, namely gender, age, and facial expression. Particularly in the field of deep learning image analytics, pre-trained models are publicly available for various purposes, but their training data and methods are not always documented properly. For a more serious application this could be a problem, but for the purposes of this demo we completely disregarded algorithmic bias, for example.
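One photo per person is enough because pretrained recognition networks map a face to an embedding vector, and identification then reduces to a nearest-neighbour lookup against the reference embeddings. A toy illustration with made-up 4-dimensional vectors (a real system would get, say, 128-dimensional embeddings from a pretrained network, and the names and threshold here are purely illustrative):

```python
import math

# Made-up reference embeddings, one per known person, as if computed
# once from each reference photo by a pretrained network.
known_faces = {
    "Heeren": [0.9, 0.1, 0.0, 0.2],
    "Juho":   [0.1, 0.8, 0.3, 0.0],
    "Pekka":  [0.0, 0.2, 0.9, 0.1],
}

def cosine_similarity(a, b):
    """Similarity in [-1, 1]; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(embedding, threshold=0.8):
    """Return the closest known identity, or 'unknown' below the threshold."""
    name, score = max(
        ((n, cosine_similarity(embedding, e)) for n, e in known_faces.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else "unknown"

print(identify([0.85, 0.15, 0.05, 0.18]))  # -> Heeren
```

The threshold is what keeps strangers labelled "unknown" instead of being forced onto the nearest known face; it is also the knob a shady application could tune, which connects directly to the privacy point below.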

We use the demo to raise awareness of where computer vision AI currently stands, and of the potential ethical challenges that can arise from such technologies. For instance, a single facial photo uploaded to a shady mobile app for virtual make-up may allow any party with access to that photo to recognize the user forever, with very little effort.


In the lead-up to Slush, follow us on LinkedIn to check out our other articles and learn more about the topics mentioned in this post. Next, one of our founders and an AI ethics expert, Saara Hyvönen, will discuss the ethics of AI and what it means in the context of facial recognition.