Quick Introduction to the Computer Vision API

Brad Kirtley | .NET, Conversational Apps, Technology Snapshot, Tutorial

What Is Machine Learning?
Machine learning is a hot topic these days because some of the biggest tech companies are focused on taking this technology to a new level. For instance, it is being used to help develop self-driving cars and to improve the interaction between you and your house through products like the Amazon Echo.

Machine learning is a core sub-area of artificial intelligence. It enables computers to learn without being explicitly programmed. As new data becomes available, the computer has the ability to learn, grow, change, and develop itself to make better decisions in the future.

The rapid growth of this technology, pushed by companies like Google, Microsoft, IBM, Uber, and Facebook, could drastically change how we live. It could reduce radiologists' workload and the number of incorrect diagnoses when they read films, reduce the number of accidents on our highways caused by human error, and reduce the inappropriate messages, images, and videos used for bullying on social networking sites.

This article will touch on one of the many artificial intelligence APIs that Microsoft has built for public consumption. (Of course, for a fee. :)) We will specifically focus on uploading a picture, passing that picture on to the Microsoft Cognitive Services – Computer Vision API, and retrieving different attributes about that image. This is an aspect of AI technology that companies like Facebook and Google are using to try to stop bullying and other issues within social networking.

Let’s Get Started

Prerequisites

  • Visual Studio 2015
  • Basic knowledge of C#
  • Microsoft Developer Account – https://www.microsoft.com/cognitive-services/en-us/sign-up

Create New Project

Let’s start by creating a basic ASP.NET MVC project called cvexample. Start up Visual Studio 2015. Click File => New => Project. Name the project cvexample.

In this project, we will implement one of the new Microsoft Cognitive Services, Computer Vision. As you build this web application, you will learn how to interact with the Computer Vision API and see the different attributes that are available to you. These attributes will be populated with the best guesses of what the network believes the uploaded picture is about.

Once you click “Ok” on the above screen, you will be prompted to select a template, as shown on the screen below. Select MVC and click “Ok”.

Set Up the Controller

After Visual Studio has finished creating the project structure, right-click on the “Controllers” folder, select Add => Controller. You should see the popup below:

Select “MVC 5 Controller – Empty” and click “Add”. You will then be prompted to enter in the name of the controller, as shown in the input box below. Enter UserSubmittedFileController and click “Add”.


You should see the newly created UserSubmittedFileController.cs in your navigation screen as shown below:

Copy Code

Copy the code below into the UserSubmittedFileController class that you just created. The code takes in an uploaded file, saves it to a folder, reads the file back in from that folder, and then posts it to the API for examination. The key part of this code is:

var options = "visualFeatures=categories,faces,tags,description,imagetype,color,adult&language=en";

The options variable passes in the visual features that you would like the API to examine and return data for. For instance, the values set in the “options” variable above are categories, faces, tags, description, imagetype, color, and adult. Each value tells the API to perform a different task on the image and to populate the corresponding attributes in the response.
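If you expect to switch features on and off, it may be easier to build the options string from a list than to edit the literal by hand. The following is only a sketch: the AnalyzeOptions class and its Build method are hypothetical names that are not part of this project or of the API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the project or the API): builds the same
// kind of query string that is assigned to the "options" variable.
public static class AnalyzeOptions
{
    public static string Build(IEnumerable<string> visualFeatures, string language = "en")
    {
        // Drop blank entries, then join the remaining names with commas.
        var features = string.Join(",",
            visualFeatures.Select(f => f.Trim()).Where(f => f.Length > 0));

        return "visualFeatures=" + features + "&language=" + language;
    }
}
```

For example, AnalyzeOptions.Build(new[] { "categories", "faces", "tags" }) returns "visualFeatures=categories,faces,tags&language=en", which can be appended to the analyze URL in the same way as the hard-coded string.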

An explanation of visualFeatures and the other request parameters is listed below:

visualFeatures (optional)

string – A string indicating what visual feature types to return. Multiple values should be comma-separated.
Valid visual feature types include:

  • Categories – categorizes image content according to a taxonomy defined in documentation.
  • Tags – tags the image with a detailed list of words related to the image content.
  • Description – describes the image content with a complete English sentence.
  • Faces – detects if faces are present. If present, generates coordinates, gender and age.
  • ImageType – detects if image is clipart or a line drawing.
  • Color – determines the accent color, dominant color, and whether an image is black and white.
  • Adult – detects if the image is pornographic in nature (depicts nudity or a sex act). Sexually suggestive content is also detected.

details (optional)

string – A string indicating which domain-specific details to return. Multiple values should be comma-separated.
Valid detail types include:

  • Celebrities – identifies celebrities if detected in the image.
  • Landmarks – identifies landmarks if detected in the image.

language (optional)

string – A string indicating which language to return. The service will return recognition results in specified language. If this parameter is not specified, the default value is “en”.
Supported languages:

  • en – English, Default.
  • zh – Simplified Chinese.
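
        // These members go inside the UserSubmittedFileController class.
        // The file needs these usings: System, System.IO, System.Net.Http,
        // System.Net.Http.Headers, System.Threading.Tasks, System.Web,
        // System.Web.Mvc, Newtonsoft.Json, and cvexample.Models.

        // Displays the empty upload form.
        [HttpGet]
        public ActionResult UserSubmittedFile()
        {
            return View();
        }

        // Saves the uploaded file, sends it to the API, and displays the results.
        [HttpPost]
        public async Task<ViewResult> UserSubmittedFile(HttpPostedFileBase file)
        {
            try
            {
                // file is null when the form is submitted without choosing a file.
                if (file == null || file.ContentLength == 0)
                {
                    ViewBag.Message = "File was empty";
                    return View();
                }

                var fileName = Path.GetFileName(file.FileName);
                var path = Path.Combine(Server.MapPath("~/UserSubmittedFiles"), fileName);
                file.SaveAs(path);

                var computerVisionRootViewModel = await ExamineImage(path);

                // ExamineImage returns null when the API call does not succeed.
                ViewBag.Message = computerVisionRootViewModel == null
                    ? "The file was uploaded, but the Computer Vision API call failed."
                    : "File Uploaded Successfully!!";

                return View(computerVisionRootViewModel);
            }
            catch (Exception)
            {
                ViewBag.Message = "Error uploading the file.  Please try again!";
                return View();
            }
        }

        // Reads the saved image back from disk as raw bytes.
        protected byte[] GetImageAsByteArray(string imageFilePath)
        {
            // Dispose the stream and reader so the saved file is not left open.
            using (var fileStream = new FileStream(imageFilePath, FileMode.Open, FileAccess.Read))
            using (var binaryReader = new BinaryReader(fileStream))
            {
                return binaryReader.ReadBytes((int)fileStream.Length);
            }
        }

        // Posts the saved image to the Computer Vision API and maps the JSON response
        // onto the ViewModel. Returns null when the API reports a failure.
        protected async Task<ComputerVisionRootViewModel> ExamineImage(string imageFilePath)
        {
            var client = new HttpClient();

            // The subscription key authenticates the call.
            client.DefaultRequestHeaders.Add("ocp-apim-subscription-key", "{INSERT YOUR KEY HERE}");

            // Visual features to analyze and the language of the results.
            var options = "visualFeatures=categories,faces,tags,description,imagetype,color,adult&language=en";
            var uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?" + options;

            byte[] byteData = GetImageAsByteArray(imageFilePath);

            using (var content = new ByteArrayContent(byteData))
            {
                // The image is sent as raw bytes in the request body.
                content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

                var response = await client.PostAsync(uri, content);

                if (!response.IsSuccessStatusCode) return null;

                var jsonResults = await response.Content.ReadAsStringAsync();

                var computerVisionRootViewModel = new ComputerVisionRootViewModel();

                JsonConvert.PopulateObject(jsonResults, computerVisionRootViewModel);

                return computerVisionRootViewModel;
            }
        }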

Log in to your Microsoft Cognitive Services account and go to the “Computer Vision – Free” section as shown below. Copy your key from there and paste it in place of “{INSERT YOUR KEY HERE}” in the code above. This key is used to authenticate you when calling Microsoft’s API.
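Pasting the key into the source file works for a demo, but you may prefer to keep it out of the code. One possible approach, sketched here under assumptions, is to store the key in the appSettings section of Web.config and read it at runtime. The setting name ComputerVisionKey and the Settings helper class are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical helper: fetches a required setting and fails with a clear
// message when it is missing, instead of sending an empty key to the API.
public static class Settings
{
    public static string GetRequired(NameValueCollection settings, string name)
    {
        var value = settings[name]; // the indexer returns null for an unknown key
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException("Missing app setting: " + name);
        }
        return value;
    }
}

// Assumed usage inside the controller (needs "using System.Configuration;" and an
// <add key="ComputerVisionKey" value="..." /> entry under <appSettings> in Web.config):
//   var key = Settings.GetRequired(ConfigurationManager.AppSettings, "ComputerVisionKey");
//   client.DefaultRequestHeaders.Add("ocp-apim-subscription-key", key);
```

The helper takes the settings collection as a parameter so that it can be exercised without a Web.config file.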

File Storage Location

Go to the root of your project, right-click and select Add => New Folder. Name that folder UserSubmittedFiles. This is the folder that uploaded files are saved to. Your navigation should look like the screenshot below:
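The controller saves each upload under its original file name, so two uploads with the same name overwrite each other, and the save depends on the folder being present. If that matters for your project, a sketch along the following lines could be used instead. UploadPaths and BuildUniquePath are hypothetical names, not part of the project.

```csharp
using System;
using System.IO;

// Hypothetical helper: returns a path inside uploadFolder with a unique file
// name that keeps the original extension. Creates the folder when needed.
public static class UploadPaths
{
    public static string BuildUniquePath(string uploadFolder, string originalFileName)
    {
        // Does nothing when the folder already exists.
        Directory.CreateDirectory(uploadFolder);

        var extension = Path.GetExtension(originalFileName);
        var uniqueName = Guid.NewGuid().ToString("N") + extension;

        return Path.Combine(uploadFolder, uniqueName);
    }
}
```

In the controller this would replace the Path.Combine call, for example: var path = UploadPaths.BuildUniquePath(Server.MapPath("~/UserSubmittedFiles"), file.FileName);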

Create View

Go into the UserSubmittedFileController file that was created earlier. Click on the UserSubmittedFile() method, then right-click and select “Add View…”. Take all the defaults and click “Add”.

You should now see the view UserSubmittedFile.cshtml within the Views/UserSubmittedFile folder as shown below:

Insert View Code

Replace ALL the code in UserSubmittedFile.cshtml with the code below. If you named your project something other than cvexample, change the @model statement on the first line to use your project name.

@model cvexample.Models.ComputerVisionRootViewModel
@{
    ViewBag.Title = "UserSubmittedFile";
}
<h2>Computer Vision Example - User Submitted File</h2>
<br/>
 
@using (Html.BeginForm("UserSubmittedFile", "UserSubmittedFile", FormMethod.Post, new {enctype = "multipart/form-data"}))
{
    <div class="col-md-12">
        @Html.TextBox("file", "", new {type = "file"}) <br/>
 
        <input type="submit" value="Upload"/>
 
        @ViewBag.Message
    </div>
    <br/>
}
<div class="col-md-12"><hr/></div>
@if (Model != null) {
    <div class="col-md-12">
        <div class="row">
            <div class="col-sm-9 text-success"><h3><b>Results</b></h3></div>
        </div>
        <div class="row">
            <div class="col-sm-3"><strong>Is Adult Content:</strong>
            </div>
            <div class="col-sm-6">
                @{
                    if (Model != null && Model.adult != null)
                    {
                        @Model.adult.isAdultContent.ToString()
                    }
                }
            </div>
        </div>
        <div class="row">&nbsp;</div>
        <div class="row">
            <div class="col-sm-3"><strong>Black & White Image:</strong>
            </div>
            <div class="col-sm-6">
                @{
                    if (Model != null && Model.color != null)
                    {
                        @Model.color.isBWImg.ToString()
                    }
                }
            </div>
        </div>
        <div class="row">&nbsp;</div>
        <div class="row">
            <div class="col-sm-3"><strong>Faces:</strong></div>
            <div class="col-sm-6">
                @{
                    if (Model != null && Model.faces != null)
                    {
                        var faces = Html.Raw(Json.Encode(@Model.faces));
                        @faces.ToHtmlString()
                    }
                }
            </div>
        </div>
        <div class="row">&nbsp;</div>
        <div class="row">
            <div class="col-sm-3"><strong>Categories:</strong></div>
            <div class="col-sm-6">
                @{
                    if (Model != null && Model.categories != null)
                    {
                        var categories = Html.Raw(Json.Encode(@Model.categories));
                        @categories.ToHtmlString()
                    }
                }
            </div>
        </div>
        <div class="row">&nbsp;</div>
        <div class="row">
            <div class="col-sm-3"><strong>Tags:</strong></div>
            <div class="col-sm-6">
                @{
                    if (Model != null && Model.tags != null)
                    {
                        var tags = Html.Raw(Json.Encode(@Model.tags));
                        @tags.ToHtmlString()
                    }
                }
            </div>
        </div>
        <div class="row">&nbsp;</div>
        <div class="row">
            <div class="col-sm-3"><strong>Description:</strong></div>
            <div class="col-sm-6">
                @{
                    if (Model != null && Model.description != null && Model.description.captions != null)
                    {
                        var descr = Html.Raw(Json.Encode(@Model.description.captions));
                        @descr.ToHtmlString()
                    }
                }
            </div>
        </div>
    </div>
} 

We are almost done; a few more steps to cover.

We need to create the ViewModel that the raw JSON from the API is converted into before it is passed to our view. To create the ViewModel, right-click on the “Models” folder, select Add => Class. The popup below will appear. Select “Class” and name it ComputerVisionRootViewModel.cs. Click “Add”.

You should now see your new ComputerVisionRootViewModel.cs under the Models folder.

Let’s Populate The ViewModel

Copy the following code inside the namespace braces. I generated these classes with an easy-to-use tool, http://json2csharp.com/, by pasting in the example JSON from the Microsoft API documentation. The resulting object is then populated from the raw JSON returned by the API.

public class FaceRectangle
{
    public int left { get; set; }
    public int top { get; set; }
    public int width { get; set; }
    public int height { get; set; }
}
 
public class Celebrity
{
    public string name { get; set; }
    public FaceRectangle faceRectangle { get; set; }
    public double confidence { get; set; }
}
 
public class Landmark
{
    public string name { get; set; }
    public double confidence { get; set; }
}
 
public class Detail
{
    public List<Celebrity> celebrities { get; set; }
    public List<Landmark> landmarks { get; set; }
}
 
public class Category
{
    public string name { get; set; }
    public double score { get; set; }
    public Detail detail { get; set; }
}
 
public class Adult
{
    public bool isAdultContent { get; set; }
    public bool isRacyContent { get; set; }
    public double adultScore { get; set; }
    public double racyScore { get; set; }
}
 
public class Tag
{
    public string name { get; set; }
    public double confidence { get; set; }
}
 
public class Caption
{
    public string text { get; set; }
    public double confidence { get; set; }
}
 
public class Description
{
    public List<string> tags { get; set; }
    public List<Caption> captions { get; set; }
}
 
public class Metadata
{
    public int width { get; set; }
    public int height { get; set; }
    public string format { get; set; }
}
 
public class FaceRectangle2
{
    public int left { get; set; }
    public int top { get; set; }
    public int width { get; set; }
    public int height { get; set; }
}
 
public class Face
{
    public int age { get; set; }
    public string gender { get; set; }
    public FaceRectangle2 faceRectangle { get; set; }
}
 
public class Color
{
    public string dominantColorForeground { get; set; }
    public string dominantColorBackground { get; set; }
    public List<string> dominantColors { get; set; }
    public string accentColor { get; set; }
    public bool isBWImg { get; set; }
}
 
public class ImageType
{
    public int clipArtType { get; set; }
    public int lineDrawingType { get; set; }
}
 
public class ComputerVisionRootViewModel
{
    public List<Category> categories { get; set; }
    public Adult adult { get; set; }
    public List<Tag> tags { get; set; }
    public Description description { get; set; }
    public string requestId { get; set; }
    public Metadata metadata { get; set; }
    public List<Face> faces { get; set; }
    public Color color { get; set; }
    public ImageType imageType { get; set; }
}
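To see how the raw JSON maps onto these classes, here is a minimal sketch that uses the same JsonConvert.PopulateObject call as the controller. The sample JSON is hand-written for illustration (it is not captured API output) and only echoes the kind of values described in the test run later in this article. The classes are trimmed copies of the ones above so that the sketch compiles on its own, and MappingSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Trimmed copies of the ViewModel classes so this sketch compiles alone.
// Leave them out if you try this inside the project.
public class Caption { public string text { get; set; } }
public class Description { public List<Caption> captions { get; set; } }
public class Face { public int age { get; set; } public string gender { get; set; } }
public class Adult { public bool isAdultContent { get; set; } }

public class ComputerVisionRootViewModel
{
    public Adult adult { get; set; }
    public Description description { get; set; }
    public List<Face> faces { get; set; }
}

public static class MappingSketch
{
    // Hand-written sample, not real API output.
    public const string SampleJson = @"{
      ""description"": { ""captions"": [ { ""text"": ""a man and woman sitting on a bench"" } ] },
      ""faces"": [ { ""age"": 48, ""gender"": ""Male"" }, { ""age"": 46, ""gender"": ""Female"" } ]
    }";

    public static ComputerVisionRootViewModel Parse(string json)
    {
        var model = new ComputerVisionRootViewModel();

        // Fills in the properties whose names match the JSON keys.
        JsonConvert.PopulateObject(json, model);

        return model;
    }
}
```

Calling MappingSketch.Parse(MappingSketch.SampleJson) yields a model with two faces and one caption. The adult property stays null because the sample JSON has no "adult" key, which is why the view checks each section for null before displaying it.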

One last change: open the file RouteConfig.cs within the App_Start folder and replace the routes.MapRoute(..... code with the code below. This will make our new controller/view the startup controller/view when we run the project.

routes.MapRoute(
    name: "Default",
    url: "{controller}/{action}/{id}",
    defaults: new { controller = "UserSubmittedFile", action = "UserSubmittedFile", id = UrlParameter.Optional }
);

Test Run

Run the project. You should see something close to the screenshot below.

Get A Picture

Browse out to Google and search for an image with a person or people in it. Try to find a picture with some background like water, a beach, or a forest. Save that file somewhere so you can then upload it to our test site.

For my example, I downloaded the following picture:

Upload A Picture

Select the “Choose File” button on the screen, browse out to the file that you previously saved, highlight that file, then click “Upload”. If everything worked as expected, you should see something like the results below.

Notice that it found two people in the picture: a male who is 48 years old and a female who is 46 years old. It is not adult content or a black and white image. It also took a guess at what the picture was about: “a man and woman sitting on a bench.”

HOW COOL IS THAT!!!!!

Other Attributes

There are other attributes of the picture that it will try to figure out as well, based on the options parameters that you used when calling the API. To learn more about the other attributes that are available, go to: https://westus.dev.cognitive.microsoft.com/

Continue The Fun

You can remove or add attributes by changing the visualFeatures flags passed into the Microsoft Cognitive Services – Computer Vision API. You will find these flags being set on the line below:

var options = "visualFeatures=categories,faces,tags,..........

The ViewModel already has properties for all the values the API can return. However, not all of those values are displayed in the View.

If you are interested in seeing other attributes, like the location of the face within the picture, go into the UserSubmittedFile.cshtml file and add any other attributes that are available.

        // Posts the saved image to the Computer Vision API and maps the JSON response
        // onto the ViewModel. Returns null when the API reports a failure.
        protected async Task<ComputerVisionRootViewModel> ExamineImage(string imageFilePath)
        {
            var client = new HttpClient();

            // The subscription key authenticates the call.
            client.DefaultRequestHeaders.Add("ocp-apim-subscription-key", "{INSERT YOUR KEY HERE}");

            // Visual features to analyze and the language of the results.
            var options = "visualFeatures=categories,faces,tags,description,imagetype,color,adult&language=en";
            var uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?" + options;

            byte[] byteData = GetImageAsByteArray(imageFilePath);

            using (var content = new ByteArrayContent(byteData))
            {
                // The image is sent as raw bytes in the request body.
                content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

                var response = await client.PostAsync(uri, content);

                if (!response.IsSuccessStatusCode) return null;

                var jsonResults = await response.Content.ReadAsStringAsync();

                var computerVisionRootViewModel = new ComputerVisionRootViewModel();

                JsonConvert.PopulateObject(jsonResults, computerVisionRootViewModel);

                return computerVisionRootViewModel;
            }
        }
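As a starting point for showing the location of each face, the sketch below turns one entry of the faces list into a readable line of text. FaceSummary and Describe are hypothetical names. The Face and FaceRectangle2 classes are copied from the ViewModel so that the sketch compiles alone, so leave them out if you add the helper to the project.

```csharp
// Copies of the ViewModel classes so this sketch compiles on its own.
public class FaceRectangle2
{
    public int left { get; set; }
    public int top { get; set; }
    public int width { get; set; }
    public int height { get; set; }
}

public class Face
{
    public int age { get; set; }
    public string gender { get; set; }
    public FaceRectangle2 faceRectangle { get; set; }
}

// Hypothetical helper: formats one detected face, including where it sits
// in the picture when the API returned a rectangle for it.
public static class FaceSummary
{
    public static string Describe(Face face)
    {
        var r = face.faceRectangle;
        var location = r == null
            ? "location unknown"
            : string.Format("at ({0}, {1}), {2}x{3}", r.left, r.top, r.width, r.height);

        return string.Format("{0}, age {1}, {2}", face.gender, face.age, location);
    }
}
```

In UserSubmittedFile.cshtml you could then loop over Model.faces and output the result of FaceSummary.Describe for each face, instead of printing the raw JSON.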

Summary

In this blog, you learned how to upload an image, pass the image onto the Microsoft Cognitive Services – Computer Vision API, and then retrieve different attributes about that image. Pretty cool stuff!

Companies such as Facebook, Google, Amazon, and Uber are all spending heavily in this space to help reduce bullying on social sites, to stop the uploading of inappropriate images and videos, and to win the race to build the first autonomous car. If you think this technology is as cool as I do, go check out the other APIs available from Microsoft Cognitive Services. Some of these APIs read text or signatures off an image, scan a person’s face for security reasons, verify content within a video, or monitor movement in real time within a video as it is captured (think security systems capturing an intruder’s movements).

I hope you enjoyed this blog and found it a good basic introduction to the artificial intelligence APIs that are currently available from Microsoft. This article has shown how easy Microsoft has made it to hook some AI into your own applications.

To dig deeper into the Computer Vision API check out:
https://www.microsoft.com/cognitive-services/en-us/computer-vision-api

To learn about other Cognitive Services APIs offered by Microsoft check out:
https://www.microsoft.com/cognitive-services/en-us/apis


About the Author
Brad Kirtley

I have been a software developer for 17+ years. I have used the Microsoft .NET technologies across several different industries (Healthcare, Real Estate, Document Management, Learning, eCommerce). My focus has been primarily web development, while dabbling in iOS. Recently I have been digging into JavaScript.

