How to Create a Dystopian Future at Home with Python, OpenCV, and Microsoft Azure


Facial recognition is both amazing and horrifying. Some of the amazing things it can do include finding missing children or seniors, unlocking your phone with your face, and letting you board an airplane faster.

In this blog post, I want to highlight some powerful tools and platforms that allow you to create distributed facial recognition systems with OpenCV and Azure’s Cognitive Services. By the end of this post, you will have a working face detector using OpenCV that can communicate with Azure’s Cognitive Services.

I used Python 3.7.4 and pip 19.2.3 for this project. You can view the code from this blog at https://github.com/dcandre/Dystopian-Future-At-Home.

To the Cloud!

The facial recognition and speech synthesis will be handled by Microsoft Azure’s Cognitive Services. You will need an Azure subscription.

Log in to the Azure portal and create a new resource group. Then search for “Cognitive Services” in the Azure portal and select “Cognitive Services”, under the “Services” header, from the results. Click the add button and it will take you to the marketplace, where you can select the Face module.

Create a new Face service under the Resource Group you created earlier. Once it is deployed, go to the new cognitive service’s page and grab the subscription key. Then search for “Cognitive Services” again and select “Cognitive Services”, under the “Services” header.

After you click the add new button, choose the Speech module. Create a new Speech service in the Resource Group you created earlier and after it is deployed, go to the service page and grab the subscription key. Magical!

pip install -r

When you pull this project from GitHub, you can install the dependencies with the requirements.txt file. I would do that in a virtual environment. The project is made up of two Python files, train.py and detect.py, a folder called training_pics, and a crazy XML file called haarcascade_frontalface_alt.xml, which holds the Haar cascade that OpenCV uses for face detection.
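For reference, the dependencies the code imports boil down to something like the listing below; the requirements.txt in the repo is authoritative and pins the exact versions.

# Illustrative requirements.txt contents; defer to the file in the repo
opencv-python
azure-cognitiveservices-vision-face
azure-cognitiveservices-speech
msrest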

The train.py file, when executed, trains the machine learning model to recognize faces from the jpg images in the training_pics folder. The detect.py file, when executed, uses OpenCV to detect whether any faces are present in the video stream produced by your computer’s webcam or another connected camera.

If there are faces in the video stream, the captured frame is sent to the Face Cognitive Service on Azure to determine whether it recognizes them. If any faces are recognized, strings that include the people’s names are created and sent to the Speech Cognitive Service, which synthesizes audio that can be played.

Selfie Time!

I took 14 different pics of my face from different angles and with different lighting.

There are a few rules for the images you attach to the PersonGroupPerson object, which you will be creating with the train.py script. My script looks for jpg files in the training_pics folder, but you can alter it to look for any kind of image file that the Face service accepts for a PersonGroupPerson.

After you have the hot pics, throw them in the training_pics folder.

Ride that Train

Let’s either create the train.py file or look at it in the GitHub repository.

import glob, uuid, sys, time
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.face.models import TrainingStatusType, APIErrorException

face_client = FaceClient('<Azure Face Service Endpoint URL>', CognitiveServicesCredentials('<Azure Face Service Subscription Id>'))

person_group_id = 'dystopian-future-group'
target_person_group_id = str(uuid.uuid4())

To use the face service you need to create a FaceClient object. The facial data that we want to train our model with will be held in a PersonGroup object.


I have created an id for that Person Group, dystopian-future-group.

try:
    face_client.person_group.create(person_group_id=person_group_id, name=target_person_group_id)
except APIErrorException as api_error:
    print(api_error.message)

try:
    person_group_person = face_client.person_group_person.create(person_group_id, "Derek")
except APIErrorException as api_error:
    print(api_error.message)

The snippet above, in the first try-except block, creates the PersonGroup. The second try-except block creates a PersonGroupPerson object. The PersonGroupPerson object represents the facial features of a particular person in a PersonGroup.

I have set the name property on the PersonGroupPerson object to Derek. Change it to the name of the person in the pictures that you placed in the training_pics folder. We will use this name later when we use the text to speech service.
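The script as written enrolls a single person, but the same call can create one PersonGroupPerson per person you want to recognize. This is an assumed extension, not something in the original train.py:

# Hypothetical extension: enroll several people, one PersonGroupPerson each
names = ['Derek', 'Ada']
people = {name: face_client.person_group_person.create(person_group_id, name) for name in names}

You would then also need a separate folder of training images for each person, and you would add each person's images to their own PersonGroupPerson object.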

training_images = [file for file in glob.glob('./training_pics/*.jpg')]

for training_image in training_images:
    print(f'Opening image {training_image}')
    training_image_stream = open(training_image, 'r+b')
    try:
        face_client.person_group_person.add_face_from_stream(person_group_id, person_group_person.person_id, training_image_stream)
    except APIErrorException as api_error:
        print(api_error.message)

face_client.person_group.train(person_group_id)

while (True):
    training_status = face_client.person_group.get_training_status(person_group_id)
    if (training_status.status is TrainingStatusType.succeeded):
        break
    elif (training_status.status is TrainingStatusType.failed):
        sys.exit('Training the person group has failed.')
    time.sleep(5)

Now we load the paths of the jpg images from the training_pics folder into the training_images list. We loop through that list and use the FaceClient object to add each image to the PersonGroupPerson object. After every image has been added to the PersonGroupPerson object, we can call the train method on the PersonGroup through the FaceClient object to train our model. The while loop keeps polling the training status until it either succeeds or fails.

Run your train.py script. If nothing explodes, then you will have a trained model that can identify a certain person. Awesome!…Possibly shady.
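If you want a quick sanity check that the group and its faces actually landed in Azure (this isn’t in the original script), the Face SDK can list the people registered in the group:

# Optional sanity check: list the people enrolled in the PersonGroup
people = face_client.person_group_person.list(person_group_id)
for person in people:
    print(f'{person.name}: {len(person.persisted_face_ids)} face(s) enrolled')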

Detect…ive Gadget

Now we are going to look at the detect.py file. Go ahead and create the file if you didn’t pull it from Github. Create a main function. Next, we can go over what is in the main function below.
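The snippets below don’t show the top of detect.py, but the code references cv2, BytesIO, random, the Face SDK, and the Speech SDK, so the imports should look roughly like this (check the repo for the exact list):

import random

import cv2
import azure.cognitiveservices.speech as speechsdk
from io import BytesIO
from azure.cognitiveservices.vision.face import FaceClient
from azure.cognitiveservices.vision.face.models import APIErrorException
from msrest.authentication import CognitiveServicesCredentials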

def main():
    total_number_of_faces = 0
    face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
    video_capture = cv2.VideoCapture(0)

    while (video_capture.isOpened()):
        video_frame_captured, video_frame = video_capture.read()

        if video_frame_captured == True:
            gray_video_frame = cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY)
            gray_video_frame = cv2.equalizeHist(gray_video_frame)
            faces = face_cascade.detectMultiScale(gray_video_frame)
            faces_in_frame = len(faces)

            print(f'number of faces in the frame: {faces_in_frame}')

            if faces_in_frame != total_number_of_faces:
                total_number_of_faces = faces_in_frame
                
                if total_number_of_faces > 0:
                    retval, video_frame_buffer = cv2.imencode(".jpg", video_frame)

                    if retval == True:
                        recognized_people = get_recognized_people(video_frame_buffer)

                        for person in recognized_people:
                            if len(person.candidates) > 0:
                                person_information = get_persons_information(person) 
                                text_to_speak = get_text_to_speak(person_information.name)
                                speak_text(text_to_speak)
                        
        else:
            break

    video_capture.release()

We are going to use OpenCV to capture video from our webcam, or another connected camera, and determine whether the video frame has faces in it. If there are faces in the video frame, the frame is sent to Azure’s Face Cognitive Service to determine if any of the faces belong to the PersonGroupPerson object that we trained our model on.

The video_capture variable is a VideoCapture object, which captures video from video files, image sequences, or cameras. I pass in 0 to use the default camera, which is the webcam on my computer. The while loop will continue until the camera stops capturing footage or there is a problem reading frames from the video stream.

The first thing to do is to take the video frame and convert it to grayscale with the cvtColor function. Then you can use the equalizeHist function to balance the brightness and the contrast in the image.

At the beginning of the main function we created a variable called face_cascade. This is a CascadeClassifier object. We will use Haar cascade detection to find faces in the video frame with the CascadeClassifier’s detectMultiScale function.

This will return bounding boxes for the found faces. If the number of faces changes and there is at least one face in the frame, then we will create a buffer for the frame with the imencode function. That buffer is sent to our get_recognized_people function, which I will talk about in a little bit.
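As an aside, the bare detectMultiScale call uses default parameters; if you get false positives or missed faces, you can tune it. The values below are just a starting point, not what the original code uses:

# Tighter detection: scan fewer scales, require more neighboring detections,
# and ignore anything smaller than 60x60 pixels
faces = face_cascade.detectMultiScale(
    gray_video_frame,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(60, 60)
)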


The get_recognized_people function will return a list of people in the video frame. Then we will loop over that list and get the PersonGroupPerson object from Azure in the get_persons_information function.

You Betta’ Recognize

Let’s define the get_recognized_people function.

def get_recognized_people(video_frame_buffer):
    face_client = FaceClient('<Azure Face Service Endpoint URL>', CognitiveServicesCredentials('<Azure Face Service Subscription Id>'))
    
    video_frame_stream = BytesIO(video_frame_buffer.tobytes())

    # Start with an empty list so a failed detect call doesn't leave faces undefined
    faces = []

    try:
        faces = face_client.face.detect_with_stream(video_frame_stream)
    except APIErrorException as api_error:
        print(api_error.message)

    face_ids = []

    for face in faces:
        face_ids.append(face.face_id)
    
    recognized_people = []

    if len(face_ids) > 0:        
        try:
            recognized_people = face_client.face.identify(face_ids, 'dystopian-future-group')
        except APIErrorException as api_error:
            print(api_error.message)

    if not recognized_people:
        recognized_people = []

    print(f'number of people recognized: {len(recognized_people)}')

    return recognized_people

You are going to use the FaceClient object again to call Azure’s Face Cognitive Service. The detect_with_stream function returns the face rectangles, landmarks, and, most importantly, the face IDs of the faces detected in the video frame. You gather those face IDs into the face_ids list and pass them to the identify function, which matches them against the people in the PersonGroup. A list of identification results is then returned from the function.
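Each candidate that identify returns also carries a confidence score, so if you want to be stricter than the simple len(person.candidates) > 0 check in main, you could filter on it. The 0.6 threshold below is an arbitrary example, not something from the original code:

# A stricter version of the loop in main: only treat a match as a
# recognition if Azure reports reasonable confidence
for person in recognized_people:
    if person.candidates and person.candidates[0].confidence >= 0.6:
        person_information = get_persons_information(person)
        text_to_speak = get_text_to_speak(person_information.name)
        speak_text(text_to_speak)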

I Need The Deets

Let’s look at the get_persons_information function.

def get_persons_information(person):
    face_client = FaceClient('<Azure Face Service Endpoint URL>', CognitiveServicesCredentials('<Azure Face Service Subscription Id>'))
    
    try:
        person_information = face_client.person_group_person.get('dystopian-future-group', person.candidates[0].person_id)
        return person_information
    except APIErrorException as api_error:
        print(api_error.message)

We can get a PersonGroupPerson by its id with the above get method. The returned object will have a name property which we will pass to the get_text_to_speak function.

def get_text_to_speak(name):
    sayings = [f'Hello {name}', f'Do you want to play a game {name}', f'We are watching you {name}', f'Look {name}, I can see you are really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over.']
    return random.choice(sayings)

I Spack

The speak_text function uses the Azure Speech Cognitive Service to turn the phrase from get_text_to_speak into audio. The call returns a SpeechSynthesisResult, and with the default configuration the audio plays through your system’s speakers.

def speak_text(text):
    print(f'saying, {text}')

    speech_config = speechsdk.SpeechConfig(subscription='<Azure Speech Service Subscription Id>', region='<Azure Region>')
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    result = speech_synthesizer.speak_text_async(text).get()
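As written, speak_text plays the synthesized audio through the default speaker. If you would rather end up with an mp3 file on disk, as mentioned at the top of the post, a variation along these lines should work (the file name here is arbitrary):

def speak_text_to_file(text, file_name='greeting.mp3'):
    speech_config = speechsdk.SpeechConfig(subscription='<Azure Speech Service Subscription Id>', region='<Azure Region>')
    # Ask the service for mp3 output instead of the default format
    speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)
    # Write the synthesized audio to a file rather than the default speaker
    audio_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    speech_synthesizer.speak_text_async(text).get()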

End of Line

The last thing to do in the detect.py file is to call the main function.

if __name__ == '__main__':
    main()

Now you have a working face detector, using OpenCV, that can communicate with Azure’s Cognitive Services. I hope that you found this post helpful in getting to know some of the cool tools and platforms available for distributed facial recognition systems.

Next Post

One idea that I had was to run this application on a Raspberry Pi. I used picamera instead of the VideoCapture class from OpenCV. I was able to capture a low-resolution frame from a Raspberry Pi camera, but the CascadeClassifier super-barfed and crashed Raspbian when I ran the detectMultiScale function on the video frame. I have 4 GB of memory with a maxed-out 2 GB swap file. Is the processor not powerful enough? Plus, you have to compile OpenCV from source to use the latest version on a Raspberry Pi; pip only has version 3 builds for the ARM processor.

I will have to explore this more in a later blog post. If anyone has had success running OpenCV on a Raspberry Pi, let me know.
