>> Learn how to infuse AI into your Java application.
In this episode of The AI Show,
we'll learn all about the Computer Vision and
Custom Vision Cognitive Services APIs.
Welcome to this episode of
The AI Show where my colleague Ruth Yakubu is
going to show us how to infuse
AI into a Java application.
My name is Seth Juarez.
All right. So, Ruth, how do we get started?
>> So, to get started,
let me just give a little background
of what we developers face.
>> Okay.
>> There are two types of developers,
and one is sitting in their cube,
and the business has
requirements where we could leverage AI.
But they're like, last I checked,
I'm not a data scientist.
How can I do this?
Then, that's when you realize that Microsoft has a lot of
Cognitive Services that span across different industries,
that our developers can
leverage and hit the market real quick.
>> Awesome. So, if I have an application and I'm like,
hey, I want to put some AI, but I don't know what to do.
You're saying Cognitive Services is an answer to that.
>> Yeah. Because our researchers have
done years and years of research,
and tons and tons of data curation,
analytics, machine learning, and whatnot,
and they provide all of that.
>> So, how do I bake it into my application?
>> Okay. So, for this application, I'm using Java,
and one thing that is super
exciting for us in the Java community is,
we now have Maven
SDKs for Cognitive Services.
So, you could do all of that with REST calls,
but you have SDKs now.
So, for the application I want to show you guys today,
I'm going to create a lost and found type of
situation where we're using Computer Vision,
and also tying into Text Analytics.
I'm tying into Text Analytics so that, okay,
you upload something, and we can query
and find a result based upon
the tags that were generated from an image.
>> Awesome. Let's dive in.
>> Okay. So, to start off with, I'm in Eclipse.
So, this is a Spring Boot application.
We don't need to go into details unless you
want us to go into specific details later.
So, what I'm going to do is, launch the application.
So, what I did was right click.
It's a little bit too late now.
But basically, what it's doing is
building the application, and at the end
it's going to launch the Tomcat server.
>> Awesome. So, this is like
a regular Spring application.
It's a website, MVC style type thing.
>> Exactly.
>> Cool.
>> Yeah. So, now that we see that
the application is up on port 8080,
let's launch the application.
So, now, I'm going to open
a picture that we're going to analyze.
So, what I'm doing here is,
I'm calling the Computer Vision API,
and the method I called was analyze.
So, first, let's take a look at the image. There's a car.
There's a parking lot type of situation.
There's a background going on.
So, the good thing about the Computer Vision is,
when we call it,
it's going to return attributes
about the image that it finds.
So, it's going to generate different tags,
so you see a whole slew of these.
Car, road, grass, driving,
parking, and all of that.
So, just eyeball it.
All of those are accurate compared
to the image that we just uploaded.
>> Awesome. And to be clear,
this isn't data that you upload;
this is something the service returns to you based on the picture.
>> Yeah.
>> Okay, cool.
>> Yeah. You just upload an image,
call the API with the image,
and tell it, please analyze this image.
Then, all I'm doing is returning the JSON,
not even doing anything with it,
just to show the users what the Computer Vision does.
>> Awesome.
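The analyze call described here can be sketched roughly as follows. This is a minimal sketch, assuming the JDK 11 HttpClient and a v2.0 REST endpoint; the endpoint and key constants are placeholders you would copy from your Azure portal, not real values.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of calling the Computer Vision "analyze" operation over REST.
// ENDPOINT and KEY are placeholders; real values come from the Azure portal.
public class VisionAnalyzeSketch {
    static final String ENDPOINT = "https://<your-region>.api.cognitive.microsoft.com";
    static final String KEY = "<your-subscription-key>";

    // Build the analyze URI, asking the service to return tags and a description.
    static URI analyzeUri(String endpoint) {
        return URI.create(endpoint + "/vision/v2.0/analyze?visualFeatures=Tags,Description");
    }

    // POST an image URL to the service; the response body is the raw JSON
    // shown on screen in the demo.
    static String analyze(String imageUrl) throws Exception {
        String body = "{\"url\":\"" + imageUrl + "\"}";
        HttpRequest request = HttpRequest.newBuilder(analyzeUri(ENDPOINT))
                .header("Ocp-Apim-Subscription-Key", KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

The returned JSON contains the `tags` array (car, road, grass, and so on) that the demo renders directly.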
>> So, let's get back to
the main point that I wanted to show,
the power of this and different ways you can use this.
So, think of a lost and found type of scenario, right?
You can never anticipate what users are going to lose.
So, I'm guilty of that.
I lose a lot of stuff.
So, let's do a very generic one.
Kids, I think they specialize in losing things.
So, let's say, I'll upload a teddy bear.
It shows the attributes like before.
Let's upload this Mercedes-Benz.
It's also a key;
let's see what it returns back.
So, for the Mercedes-Benz,
this was kind of interesting.
The tags that it returned back are:
table, sitting, black, piece, top,
luggage, desktop, players, suitcase,
collar, phone, [inaudible] actually mouse.
So, for the human eye, personally,
we're used to how keys look,
especially going back to medieval times.
There's a certain way a key looks.
But nowadays, what do we do with outliers, where
keys no longer look
the traditional way you expect a key to look?
Let's say, I'm fresh off a boat from somewhere.
I have never seen a Mercedes-Benz key.
>> I know I haven't.
>> Yeah.
I haven't seen one either.
But yeah, this one,
I can easily think it's a garage door opener.
>> Right.
>> Because it can pass for a lot of stuff.
So, let's go back to the analytics part.
This is another piece that I
wanted to highlight to developers.
Let's say, you have a whole bunch of text, right?
You're dealing with the lost and found,
what are users going to do when
they come in to tell you what they lost?
They tend to ramble on and on,
but you need to get to the gist of it.
The whole point, all they need to tell us is:
I lost a teddy bear,
I lost a key.
That's it, but people like me,
I'm going into a form like,
"I was at the food court-",
if I can spell,
"-and I think I
left my toy at the table."
and click "Submit".
So, the very first thing you look at
is what we call Text Analytics,
and that one also has
multiple functions, like sentiment analysis
and key phrase extraction.
So, in real life,
this person would have written like
a three paragraph type of description of what happened,
different possible areas where
they think they've lost something.
But the key thing is, look at
the things that it extracted:
it got that there was a food court,
there was a toy, there was a table.
So, what it's going to do is look in the database,
because on the last page, what I was
doing was storing all the images for the lost
and found. Let's say each time
somebody loses something, you take a photo of it
and store it in the database.
Later on, when somebody comes,
we no longer need to
manually go aisle after aisle searching
for stuff; you can also use
AI to find some of this stuff.
When somebody submits that form,
let it do a query and see what matches it found.
So, the keyword was toy,
and it brought back the teddy bear.
I uploaded the traditional key,
and it didn't find that,
but for some reason I
think it's probably thinking of the car keys.
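The key-phrase extraction step can be sketched the same way as the vision call. This is a minimal sketch assuming the v2.0 Text Analytics REST endpoint, with the endpoint and key as placeholders; the batched `documents` body is the shape the service expects.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of the Text Analytics key-phrase call described above.
// ENDPOINT and KEY are placeholders from the Azure portal.
public class KeyPhraseSketch {
    static final String ENDPOINT = "https://<your-region>.api.cognitive.microsoft.com";
    static final String KEY = "<your-subscription-key>";

    // Text Analytics expects a batch of documents; wrap one user description
    // in that shape (id, language, text).
    static String documentsBody(String text) {
        return "{\"documents\":[{\"id\":\"1\",\"language\":\"en\",\"text\":\"" + text + "\"}]}";
    }

    // POST the description; the response JSON contains the extracted key
    // phrases (food court, toy, table) used to query the database.
    static String keyPhrases(String description) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(ENDPOINT + "/text/analytics/v2.0/keyPhrases"))
                .header("Ocp-Apim-Subscription-Key", KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(documentsBody(description)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```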
>> If it's a toy or it could be somebody's toy as well.
Right? The thing I like about this is that you
actually allowed users to
upload pictures and ask about
those things without having any user intervention at all.
>> Yes.
>> Using this kind of service, I have two questions.
The first question is, what does this look like in code?
>> Yeah.
>> The second one is,
I saw there were some mistakes
when the pictures were uploaded,
is there a way to fix that?
So let's start with the first, let's take a look at it.
>> Yes, awesome.
So, to start with,
this is the Spring Application, right?
So, when you're talking to a web-type of interface,
in Spring you have something called a controller.
So, those are REST controllers.
They take requests from the web and
translate them into your business application.
So, I'm depending on some helper
classes or services that will actually do the API requests,
so I'm calling them in.
>> Is this a package you import from Computer Vision?
>> Say that again?
>> Is this a package that you import
from the Computer Vision services?
>> In the service class.
So, we'll open that real quick.
So, let me do a quick drive-by.
This is a typical REST application,
right? So you specify an annotation that says,
okay, I need the request mapping, and what's the method?
In REST, you're going to do a GET,
and you need to specify the path.
So, for the scenario
where we're dealing with the text analysis,
that's what I was doing.
Here, I'm returning a form but you can
return whatever you want to.
I'm using Thymeleaf.
>> I see.
>> Because I'm not too Web-centric, but I'm just specifying
the HTML file name it should call.
So that's just the display part of it.
So, when somebody pulls something up and enters information,
now, as you can see,
the user entered a description.
So, you're taking all of that.
And now, you took the description and fed it into
a service that did some analysis;
that's the API that we called.
It returns a JSON body and you parse
the data the way you want to present it.
All I'm doing is parsing it and presenting it to the user.
We're also grabbing the URL,
so we're good on that.
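The parse-and-present step can be sketched without any framework. A real Spring app would typically use a JSON library like Jackson; this hypothetical string-based version just illustrates pulling the tag names out of the analyze response before showing them to the user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of extracting tag names from the Computer Vision analyze
// JSON. A real app would use Jackson or Gson; this regex version only
// illustrates the parse-and-present step.
public class TagParserSketch {

    // Matches "name":"car" style pairs inside the "tags" array of the response.
    private static final Pattern TAG_NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

    static List<String> extractTagNames(String analyzeJson) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG_NAME.matcher(analyzeJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```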
>> So the service,
the text service, is
it like a package you just download
in order to make the API calls?
>> Yes.
>> Okay.
>> So, let me show you this service
because even when we go to the controller,
it looks exactly the same,
the only thing is before I return the JSON,
I'm saving it to a database.
>> Got it.
>> One thing under Spring is the Domain.
Well, the Domain is like your database model.
Then I get into the Repository.
Which one do you want to look at?
>> Let's look at the Vision Service.
>> Okay. Vision Service.
In order to use the Vision Service,
the very first thing is you need to
provision the service on the Azure portal, and
the key thing is, in order to call this API, or if you're
using the SDK from Maven, you need a key.
So, once you establish the key,
you need to specify the key in
your header, then decide what's going to be your body;
you need to pass the parameter in
the body, which is going to be the image file.
Then the rest is, okay,
go to the API
definitions that we provide online for each of
our AI Cognitive Services,
which say, okay, if you want to call this,
this is how you call it.
Basically, for Java, I'm just setting
up all of the parameters that I'm going to be calling.
Finally, once you set all the parameters,
the thing that you do is a build,
and where I'm actually
executing this, Spring provides
something called a rest template,
where you're passing in
that builder, and you need to specify
whether the API that you're using is a POST or a GET.
Then the final thing, well,
the entity is basically the header.
Then the last thing is, okay,
what's your output going to look like?
Is it JSON output?
So that's basically what that is.
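The demo builds this request with Spring's RestTemplate; as a rough equivalent, the same request, subscription key in the header and the image file bytes as the body, can be sketched with the JDK's HttpClient. Endpoint, key, and API version are placeholders here, not values from the demo.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the vision call when the body is the raw image file rather than
// a URL. The demo uses Spring's RestTemplate; this is the same request
// expressed with the JDK HttpClient. Endpoint and key are placeholders.
public class VisionFileUploadSketch {

    // Build the POST: subscription key in the header, image bytes in the body.
    static HttpRequest analyzeRequest(String endpoint, String key, byte[] imageBytes) {
        return HttpRequest.newBuilder(
                URI.create(endpoint + "/vision/v2.0/analyze?visualFeatures=Tags"))
                .header("Ocp-Apim-Subscription-Key", key)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(imageBytes))
                .build();
    }

    // Read the file and execute the request; the body comes back as JSON.
    static String analyzeFile(String endpoint, String key, Path image) throws Exception {
        HttpRequest request = analyzeRequest(endpoint, key, Files.readAllBytes(image));
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```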
>> So this is what a REST
endpoint call looks like in Java?
>> Yes.
>> Cool.
>> So let me do a quick drive-by.
In this situation, I wanted to use
the SDK to give you guys an idea.
For the SDK, if you're a Java developer,
you'll notice I'm using
blob storage when I upload these images.
So, a misconception
that a lot of people in the Java community have
with Microsoft tools
or Azure is that it's probably .NET only,
not knowing we have so many
SDKs that you can go find on the Maven repository.
So, this is an example of
a library you can just call, and that's the blob storage.
>> Cool, so when you upload the picture
you're just pushing those up to blob storage?
>> Yes.
>> Smart.
>> Before you do anything,
and I know, yeah, we're going to delete and
blur out my credentials,
but in order to do that,
the very first thing you do is provide
your keys and the unique blob and the container.
But if you notice, I'm not making
raw REST calls, because
now you're calling the objects.
This is how much simpler and cleaner
your application is going to look when you're using the SDK.
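As a small illustration of what the blob SDK abstracts away, here are the two strings involved: the connection string you assemble from the portal's account name and key (which the SDK parses into a client), and the predictable public URI a block blob ends up at. All values below are placeholders, not the demo's credentials.

```java
// Sketch of the two strings the blob storage SDK works with: the connection
// string you parse to get a client, and the public URI a block blob is served
// from. Account name, key, and container are placeholder values.
public class BlobNamingSketch {

    static String connectionString(String accountName, String accountKey) {
        return "DefaultEndpointsProtocol=https"
                + ";AccountName=" + accountName
                + ";AccountKey=" + accountKey;
    }

    // Uploaded images end up at a predictable URI per container and blob name,
    // which is the URL the application stores alongside the generated tags.
    static String blobUri(String accountName, String container, String blobName) {
        return "https://" + accountName + ".blob.core.windows.net/"
                + container + "/" + blobName;
    }
}
```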
>> Got it.
>> Okay.
>> Cool. You were telling me that there are some
that just don't work, tell me about those.
>> Yes. So, AI is new and it's up and coming and that's
why there's so much buzz about it and
all these companies are improving
on their algorithms and whatnot.
So, if you're in
a situation where you
use something like the Computer Vision API,
and you found some outliers that did
not meet your business case,
what you can do is use something
called the Custom Vision API.
>> I see. So in the case where I
upload a picture of something it
may have never seen before
because I have crazy stuff in my house,
there is a way to actually train it
to recognize new things.
>> Yes. So for that one,
the awesome thing is you can go to custom, then-
>> Custom Vision.
>> I'm glad somebody is paying attention.
Then dot ai, so customvision.ai.
Then the next thing we need to do is
"Log In" to that account.
Log-in never took so long.
The very first time that you log into the account,
you need to agree to the terms of services.
I already started a project,
but if you didn't,
you click on "Create Project".
Give it a "Name", "Description",
and it's very crucial that if you
know the "Domain" it falls under,
you may want to put it in there, because
if you're dealing with a category that's in that domain,
it increases your chances of finding things better.
>> Got it.
>> So, due to time, I already created the keys.
So let's take a step back:
the Custom Vision service is a classifier.
It's based on machine learning classification
algorithms, and what classifiers usually
do is answer yes or no.
You're trying to find whether this is this or that.
So, I think the amazing thing that
the Microsoft AI Research Team has provided for us is
the ability for a developer to just come in in
a situation where car keys
are getting more and more evolved.
It is a key, but it's not the traditional key,
and depending on what industry you're in,
it could be other things, like a designer handbag, okay?
How can it recognize
a Louis Vuitton bag versus a Prada bag?
Not that I know anything about that,
but I just wanted to show you how simple this is.
The recommendation is
for you to upload at least 50 images,
but for my case,
all I wanted to do was differentiate,
to tell the difference between
a BMW key and a Mercedes key.
So, I just gave it those two categories.
So I uploaded different images for the BMW key
and uploaded a whole bunch of
different images for Mercedes keys.
They're not quite up to 50,
but you'll be surprised: with very little,
it's still very accurate in predicting with your model.
Once you have your images uploaded,
and everything is very intuitive,
click on "Train".
I already did one in the past,
so the last iteration I did was okay.
I uploaded the images;
there are two categories, right?
When you're training your model,
you need to see how precise it is,
and the recall, which is like
the error factor of it:
what are the chances that in this group,
there are some that did not
match what you're trying to train?
So pretty straightforward. But looking at this,
I think I'm pretty confident that 90 percent of the time,
it's going to recognize what I'm trying to do.
So, the very important thing for
a developer to do is this part right here.
I need to figure out,
when you take it back to your application,
how are you going to call it?
Because everything that you did visually,
you can also do via code.
Then in our case,
I'm just going to call the API. I trained it.
Everything is good to go.
It shows you the API to use,
so I went ahead and created a custom service.
So, I have the project key
and the REST API call.
>> It's just like the same thing that we
had before, but now with a different service.
>> Yes. The only thing is they call it a prediction key.
For other APIs, I think it's
the Ocp subscription key, something like that.
So, be aware of that.
You're uploading an image in the URL type of format.
You're calling the REST API.
Then nothing else really changes.
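That difference can be sketched as follows: a minimal, hypothetical version of the prediction call with the JDK HttpClient, where the prediction URL and key are whatever the Custom Vision portal shows for your project. The main change from the earlier calls is the header name, Prediction-Key.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the Custom Vision prediction call. The shape mirrors the other
// REST calls; the difference noted above is the header: "Prediction-Key"
// instead of the usual subscription key. The prediction URL and key are
// placeholders copied from the Custom Vision portal for your project.
public class CustomVisionSketch {

    // Build the POST: prediction key in the header, image URL in the JSON body.
    static HttpRequest predictionRequest(String predictionUrl, String predictionKey,
                                         String imageUrl) {
        return HttpRequest.newBuilder(URI.create(predictionUrl))
                .header("Prediction-Key", predictionKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"Url\":\"" + imageUrl + "\"}"))
                .build();
    }

    // Execute the request; the response lists each tag with a probability.
    static String predict(String predictionUrl, String predictionKey, String imageUrl)
            throws Exception {
        return HttpClient.newHttpClient()
                .send(predictionRequest(predictionUrl, predictionKey, imageUrl),
                      HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```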
Let's get back to the controller and call
this custom service instead of the other one.
So, we're at the end.
Let's see how good this prediction was.
I'm going to restart this Java application.
Okay, perfect.
So, it's done.
The next thing is,
I'm going to go to my trusted Computer Vision,
and now let's find the same key
and upload it and see what it says.
Okay.
>> It looks like it did
indeed find it to be a Mercedes key.
>> Yes.
>> Awesome. So, what I'm understanding then,
and I want to see if I can summarize what you did:
you made a lost and found application,
where you can upload pictures,
like someone's like, "Oh, they lost this.
Take a picture," and that's all they do. They upload it.
The Computer Vision service puts all the tags on there.
Someone goes in and types a description,
Text Analytics extracts
all the key phrases, and it
maps the things that it found.
>> Yes.
>> Then there's cases where there are
certain things that it won't
understand that you have
to go back and train. Did I get that right?
>> Yes. One thing we have
to take into consideration:
when you're using Custom Vision, there's a limitation.
What if you have petabytes of data?
In those types of situations,
you have to be cognizant:
okay, I'm going to train this model,
there's a way we need to export the model,
and if it's too big,
we need somewhere to run it that
has the system capacity to run that.
Another alternative to
bypass all of this is going to deep learning.
There are several algorithms out there that you can use,
and if you know what you're doing at this point,
you can code all of that and come up with
maybe a better solution.
I won't say a better solution.
>> More customized.
>> But a deeper. Yes, more customized. Yes.
>> Awesome.
>> Depending on where your industry is.
>> This has been super helpful.
I like that you showed it all in
Java, to show that it really doesn't
matter what language you use.
Thanks so much for joining us today.
We were learning all about
Computer Vision and Custom Vision,
how you can enrich your apps with AI today.
Thanks so much for watching. My name
is Seth Juarez. See you next time.