Applied Machine Learning: How we built our own Face Recognition system to enhance the Arcanys Kitchen experience

A few weeks ago, our passion for process optimization made us call on machine learning to enhance the internal food serving system in our canteen (humbly called the Arcanys Kitchen) with a custom-built face recognition system for employees claiming their meals. Exciting, huh? Let me rewind a tad.

Early 2018, Arcanys launched its own in-house cafeteria. Now, offering free meals every day is not only a nice addition to our company benefits, but it also allows our crew to enjoy legit breaks without the hassle of going out for lunch every day (and, you know, paying for it). Things were dead easy: people ordered their meal using our internal app (called Glutton) the day before. The next day, they gave their name at the counter and the kitchen staff tracked down their order on the app. Easy? Yes. Working? Yes, too. But was it the smoothest, most effective process? Not exactly. For the kitchen staff, checking 200 names one by one was a tedious and error-prone method. And as process optimization freaks, we had to do something to get rid of this little glitch and improve the whole Arcanys Kitchen experience.

Sneak peek at the Arcanys Kitchen


Optimizing Arcanys Kitchen flow with a Face Recognition tool

The solution concept was simple: build a facial recognition (FR) system that would identify the kitchen user and pull up their order in a flash:

  • Step 1. The user (an Arcanys employee) places an order via Glutton, Arcanys’ internal meal ordering app, the day before.
  • Step 2. The user goes to the handout window and faces a camera.
  • Step 3. The camera scans the user's face, automatically sends real-time video frames to the FR system, and the system immediately pulls the corresponding order from Glutton and displays it on the kitchen monitor.

Ideally, the entire process takes a good 30 seconds less than the previous one, since you remove asking for the name, checking it against the list, and then confirming the employee's matching order from the equation.

But how do you exactly set up a program that automatically puts names to faces and matches those with their food orders from Glutton? Yep, what we needed was a facial recognition software integration, plain and simple. And the FR system would have the following components:

  • A webcam - which captures real-time images of users claiming their lunch. We didn't want to rely on complicated hardware that might need special software: a basic webcam should do the trick.
  • An interface - which displays the orders of the users to the Kitchen staff.
  • A Facial Recognition application - which analyzes and identifies who the person is in front of the webcam.
  • An integration with Glutton - so the system automatically matches the identified user with their order and the kitchen staff immediately knows what to serve.

That's it for the theory. Now, how to make the magic happen? Essentially, we had two options: either turn to one of the readily available machine learning tools on platforms like AWS, or develop our own facial recognition system.

Off-the-shelf vs. Custom Face Recognition system

Here at Arcanys, we work with a lot of AWS tools. So one of the obvious choices for us would be to use the Machine Learning as a Service (MLaaS) solutions that Amazon provides. After all, their Machine Learning tools are easily integrated when you have the correct APIs. No need to get your hands dirty building your own machine learning tools, or risk the high investments usually involved in developing them. As a matter of fact, AWS released a facial recognition tool called Amazon Rekognition in late 2016. This service not only enables developers to build apps that easily analyze images, but also lets them create a database in the cloud in which they can store the collections of reference images they want the tool to recognize. And you can keep millions of reference images in Amazon S3.

However, even though Amazon Rekognition is a great facial recognition tool, it has one major downside: analyzing a single image can take up to several seconds. Why? Because the images to be analyzed have quite a long way to go. First, video frames have to travel from the webcam to Amazon's servers, where each image is compared against the reference photos stored in Amazon S3. Amazon Rekognition then determines a match and sends the result back to Glutton, which pulls out what the employee ordered for lunch. The snag here is that Amazon's servers are housed outside our premises, even outside the country. Not only that, but the whole round trip also relies on the internet connection.

And for us, it is crucial that the image analysis is accomplished as quickly as possible: our developers at lunchtime are hungry, and we want to spare them from waiting extra minutes for their food. My best option was clearly to get my hands dirty and build a Face Recognition API that would run on our own servers. This way, we would not only be able to optimize the image processing time but also be spared from having to deal with the requirements that a generic service such as AWS Rekognition has.

Concretely, this system would allow us to:

  1. Register users in the system. It only takes one (recent) photo for each employee to be registered in the system and recognizable by it. This systematic operation has become part of our onboarding process.
  2. Quickly scan users' images from a real-time video stream. Twice per second, the live stream from the Arcanys Kitchen webcam sends a video frame to the face recognition API. The API then immediately sends back its result for that frame. Note: Feedback comes about 0.2 to 0.3 seconds after the camera scans a face.
  3. Automatically pull up users’ orders from Glutton. The integration of the FR feature to our meal ordering system allows us to match the scanned users to their corresponding lunch orders automatically. Results are directly displayed on the interface for the kitchen staff.
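The matching at the core of steps 1 and 2 can be sketched in plain Python. Face recognition libraries typically turn a photo into a numeric encoding (a 128-dimensional vector), and a new face is matched to the registered user whose encoding is closest, as long as the distance stays under a tolerance threshold. The names, vectors, and threshold below are illustrative toy values, not our production data:

```python
import math

# Illustrative registry: one encoding per employee, captured at onboarding.
# Real face encodings are 128-dimensional; 3 dimensions keep the sketch readable.
REGISTERED = {
    "alice": [0.10, 0.20, 0.30],
    "bob":   [0.90, 0.10, 0.50],
}

TOLERANCE = 0.6  # maximum distance still considered a match


def identify(encoding):
    """Return the closest registered user, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, ref in REGISTERED.items():
        dist = math.dist(ref, encoding)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= TOLERANCE else None
```

With this, `identify([0.12, 0.18, 0.31])` returns `"alice"`, while an encoding far from every reference returns `None`, which is what lets the system say "I don't know this face" instead of guessing.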

[Components of the system. The computer in the kitchen shows the live feed from the attached webcam.
Frames from the feed are sent to the face recognition system to detect who is in front of the camera.
The person's identity is then pulled up from our meal ordering system and their order is displayed on the kitchen monitor.]


With the concept mapped out, all that was left were the nitty-gritty development and testing phases.

The Facial Recognition Integration: Benefits & Limitations

After several pots of coffee and much brainstorming, we were finally able to develop a working FR system using Python (probably the most popular programming language for building machine learning tools) and the face_recognition Python library, which employs the state-of-the-art dlib toolkit and its deep-learning-based face recognition model. The model has a 99.38% accuracy on the Labeled Faces in the Wild benchmark. We also used the Python Flask framework to create an API for the face recognition module.
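To give a rough idea of what such a Flask API can look like, here is a minimal sketch of a recognition endpoint. In the real system the client posts a video frame and the server extracts the encoding first; to keep this sketch dependency-light, the client posts the encoding directly, and the route name, payload shape, and toy 3-dimensional encodings are assumptions for illustration, not our actual API:

```python
import math

from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy registry of reference encodings (real ones are 128-dimensional).
REGISTERED = {"alice": [0.1, 0.2, 0.3], "bob": [0.9, 0.1, 0.5]}
TOLERANCE = 0.6


@app.route("/recognize", methods=["POST"])
def recognize():
    encoding = request.get_json()["encoding"]
    # The nearest registered encoding wins, but only within the tolerance.
    name, dist = min(
        ((n, math.dist(ref, encoding)) for n, ref in REGISTERED.items()),
        key=lambda pair: pair[1],
    )
    return jsonify({"user": name if dist <= TOLERANCE else None})


if __name__ == "__main__":
    app.run(port=5000)
```

The interface in the kitchen would then only need to POST each frame's encoding and display the returned user's order.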

The User Interface for the kitchen staff. It shows the live webcam feed and the full list of orders. When a face is detected, the name of that person is displayed together with the ordered meal. The staff can then mark orders as ‘claimed’ to keep track of who claimed their meal and who didn't.


With that done, here are the results.

Better speed and accuracy, lower price: an efficient face recognition tool

Speed. With that custom-made FR application, it takes under 0.1 seconds per face for the system to recognize a person in a video frame. If you include some overhead from the API, like sending the data back and forth, we get a response in under 0.3 seconds. Quite fast for our use, if you ask me, and definitely faster than the several seconds it takes AWS Rekognition to come up with a result.
Edit: Recent experiments showed that our custom system outperforms AWS Rekognition not only on speed, but also on accuracy. The Arcanys system typically delivers a result within 300 ms, while AWS Rekognition can keep us waiting for more than 2 seconds. This is mainly because our system, unlike Amazon's, doesn't need to upload any data to servers around the world: all our servers are in-house.

Cost-efficient. Money could be another consideration. Amazon charges you based on how many images are processed: 1 buck for 1,000 images. Although not very expensive per image, if you're using a continuous stream of images like in our situation, charges can quickly add up. Let's do a quick calculation, shall we? If you process 4 images per second for 3 hours per day, with an average of 21 working days per month, you may very well end up paying almost $1,000 per month for the AWS Rekognition service. And that's certainly not chump change.
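The back-of-the-envelope math behind that figure, as a quick sanity check (using the $1 per 1,000 images rate and the usage pattern above):

```python
# Assumptions from the scenario above.
images_per_second = 4
hours_per_day = 3
working_days_per_month = 21
price_per_image = 1.00 / 1000  # $1 per 1,000 images

images_per_month = images_per_second * hours_per_day * 3600 * working_days_per_month
monthly_cost = images_per_month * price_per_image

print(images_per_month)          # 907200 images per month
print(round(monthly_cost, 2))    # 907.2 dollars per month
```

So roughly 907,200 images and $907 a month, which is where the "almost $1,000" comes from.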

Privacy concerns. If you are already using AWS services, then there is probably less of a concern with regards to privacy. However, you have to keep in mind that using Rekognition means you’ll be sending photos of your employees to the cloud. And if your company's (or your client's) policy forbids you to store sensitive personal data in the cloud, for security or some other reasons, then you'll most likely end up building your own solution in your internal infrastructure anyway. So might as well go for a custom-built tool from the outset, wouldn't you say?

A Work in Progress

A few hiccups. Now, it appears that face detection is not an error-free process. Tests showed that for about 1 out of 10 Filipino users, the system was not sure who it was, hesitating between two people with some facial similarities. When operated on our European expat co-workers, however, the system could identify them with an error rate of 0%.

[Testing the face recognition tool. The dots and lines indicate the facial landmarks, e.g., the location of the eyes and the mouth.]

Note that this has little to do with ethnic origins and more to do with the fact that the underlying algorithm was trained on a dataset of mostly Caucasian faces. Moreover, the application has only recently been launched and, much like any other software, it will need updates and some bug fixing to continually improve its performance.

Conclusion

So is our face recognition solution so much better than Amazon's? No, it is simply good at what it does: quick recognition of a person from among a relatively small selection of people. If you don't mind waiting a few seconds for a result, are working with a huge dataset, and need to support millions of people, then Amazon's tools are probably the better option. In the end, it all boils down to what you want the application to accomplish.

Here at Arcanys, we always look for optimum solutions and we'll definitely keep on bringing new improvements to our processes, systems, and technologies. And as a leading software development company in the Philippines, we build custom software, machine learning and data management solutions, drawing from years of experience, using the best industry practices, and mixing in a little magic of our own. Wanna see how we could help your business? Contact us and we'll be glad to have a chat with you.

About the author

Rik is a Machine Learning Engineer at Arcanys. He's the main man to consult about face recognition, sentiment analysis, and other ML tools, and he also leads one of Arcanys' development teams. Friends often call him by his last name, but he also has "other" nicknames that may not be entirely safe to mention in the workplace. When he's not busy coding or tinkering with algorithms, he likes to travel, hunt for vinyl, or brainstorm about what things his next robot should be able to do.

Be part of our growing community.

Join us and outsource smarter.