Thursday, November 9, 2017

YouTube daily videos Nov 9 2017

Watch: BTS's Jungkook And Jimin Share Fun Self-Made Videos From Their Trip To Japan

BTS members Jungkook and Jimin have shared fun and impressive travel videos from their recent trip to Japan together!

The pair recently got to take some well-deserved time off in Tokyo, where they traveled the city, enjoyed the sights, took a trip to Disneyland, and more.

When they got back, they gifted fans with some videos they'd filmed and edited themselves that show what a great time they had on the trip.

Check out the videos below!

나랑 꾹이 첫 여행기 1?? — 방탄소년단 (@BTS_twt).

나랑 꾹이 첫 여행기2 ??? — 방탄소년단 (@BTS_twt).

After Jungkook posted his video on November 8, Jimin replied via Twitter, "Jungkook, if I'd known you'd make something so cool, I would have dressed up a bit and I wouldn't have done weird things.

I'll dress up next time.

Thank you for making the video, you must have worked hard."

정국아 저렇게 멋지게 만들 줄 알았으면 형이 좀 꾸미고 이상한 짓좀 인했을텐데 다음에는 형이 꾸밀게 영상 만들어줘서 고마워 고생했겠다 — 방탄소년단 (@BTS_twt).

BTS recently wrapped up promotions for their hit mini album Love Yourself: Her.

They will soon be headed to the United States for events including their , and an appearance on  They are also.

For more information >> Watch: BTS's Jungkook And Jimin Share Fun Self-Made Videos From Their Trip To Japan - Duration: 2:36.

-------------------------------------------

Superheros Vs Wild Animals Fights Frozen Vs Spiderman Dance Videos Dinosaurs Surprise Eggs Videos - Duration: 10:34.

Superheros Vs Wild Animals Fights Frozen Vs Spiderman Dance Videos Dinosaurs Surprise Eggs Videos

For more information >> Superheros Vs Wild Animals Fights Frozen Vs Spiderman Dance Videos Dinosaurs Surprise Eggs Videos - Duration: 10:34.

-------------------------------------------

Whatsapp status tu ghar aaja pardesi | whatsapp status video | sad love status video - Duration: 0:31.

Subscribe my channel~~NR MUSICAL

For more information >> Whatsapp status tu ghar aaja pardesi | whatsapp status video | sad love status video - Duration: 0:31.

-------------------------------------------

LOS MEJORES VIDEOS DE RISA #2 HD - Duration: 20:19.

For more information >> LOS MEJORES VIDEOS DE RISA #2 HD - Duration: 20:19.

-------------------------------------------

iPhone X Destruction Videos Are The Worst (RANT) - Duration: 9:52.

What's going on, it's your boy Connor here, welcome back to another video, and today I'm gonna be answering this simple question:

What's inside an iPhone X? A bunch of electronic stuff? That's a good enough answer for me, but apparently for millions of people that's not enough.

So let's take a look for ourselves at my favorite unboxing channel on YouTube, known as What's Inside. But before we get started, as always with channels like these, I don't really think the people themselves are bad, I just don't really like the content, so please don't go send hate these people's way. As a commentator, I'm far more concerned with the popularity of this type of content than I am with the actual creators themselves.

It's hard to fault people when all they do is find a niche, have success in it, and decide to just keep going with that, because why not. And I think it's worth mentioning, the father and son in these videos that you're about to see,

They're actually pretty cool like the father has some really good information on some of his videos the son doesn't really seem like he has

An inflated ego despite the fact that their channels had a lot of success and they leverage their YouTube platform into doing really cool

opportunities like a bunch of work with

nonprofits and other international

Ventures like that which I have a huge passion for so it's hard for me to really say that these people are bad news

They're not they're doing their job, and that's fine so again not hating on these people

I just don't like the idea that smashing stuff on YouTube can get you

popularity, money, and whatever else comes from that. It just seems kind of ridiculous. With all that being said, let's see what's really inside my iPhone X,

a

$1,200 poop emoji maker Oh

No

Wow, what a masterpiece. Here's the thing with intros on YouTube: they can be really cool and successful when they're done right, but if they're not done right, they might as well not be in the video.

I learned that the hard way with my first couple of videos. I made an intro just because I thought everyone usually has an intro, and it turns out it wasn't funny, it wasn't that good, and I knew that when I made it, but I just thought I might as well just leave it in there because

it's what people on YouTube do, right? The problem is people shouldn't just follow what everyone else does. Do what's good, do what makes your content better; in this case, just throw it out of your video. But as a side note, I'm definitely not opposed to having an intro, so if any of you are good at making intros and want to make me an intro, or want to do channel art, whatever it is, hit me up in the DMs. Let's talk, okay. Sorry, I keep getting sidetracked,

focus Connor come on focus

Hold on it's my door locked

Okay, thank you. God. Thank you

It's the dog poop emoji

inflatable giant poop emoji costume 60 bucks

iPhone X

1200 bucks

Embarrassing your kid in front of three million plus people on the Internet

Priceless still better than I do five

Okay, that's it

That was that was unfair that I would way too far this guy his son are nothing and I'm saying this very

Sincerely it's gonna. Send our nothing like daddy. Oh child abuse

I think it's ice-cream. It's nice. It's ice cream. It's chocolate ice cream with eyes on it

Dad's am I right so the joke here is the dad is saying that poop emoji looks like a pile of chocolate ice cream

I'm making it. Let's get some eyeballs on there and then

How does the ice cream do with the iPhone 10

Now you know those are the jokes you're gonna get when you look at a channel. That's primarily catering towards kids

It's hard for me to be like ah these people are so terrible

It's just a bad joke is what it is kind of comes with the territory of these technology channels

This feels really good at my hands. You sadistic pieces you're gonna have so much fun

Just destroying that phone all of us watch aren't you hey everyone look at how amazing this $1,200 phone is

Be a real shame if I blast it through it with a saw

How do you explain that occupation to your friends and family? What do you do for work? Oh nothing, me and my son just destroy the latest iPhones and make, you know, six figures a year doing it. It's kind of like a passion project, except it makes us a lot of money.

I want to say these videos are low effort, but that's not really the right term. I think it's more just that this type of content is low effort.

It's like everyone on the internet nowadays is making a video on the iPhone X. I mean, I know it's

But like guys we get it Apple made a new phone. Oh my gosh everyone

There's a new Apple product

Y'all have to smash it and do it and do whatever because it's the new iPhone the new iPhone everyone the new iPhone

It's understandable why these videos succeed on one hand it's like ASMR for the eyes and on the other hand

Everyone has curiosity everyone wants to know different things

And so if you take an object like a basketball and say, what's inside a basketball, people might think, oh, what is inside of it? I never really thought about that.

But the problem with all this stuff is that none of these creators try to be unique. They all hop on the same trends, they all make the same corny jokes, and none of them really try to find ways to stand out, and

Despite that videos like this one rack up millions of views for creators even though

They really don't put any sort of effort into this besides just the editing like you're telling me this requires effort

That's insane, I mean come on guys this cutting an iPhone in half

There's the two batteries

Insane reaction, dude. You know what, at least this kid isn't overhyping everything like Jake and Logan Paul. Holy SH, dude, I mean it's actually getting scary. On second thought, keep doing you, kid. We don't need any more Logan Pauls in this world, just be you, I'm down for that. "We open it so you don't have to." That's so thoughtful.

Thank you guys so much because whenever I buy a phone for $1,200 my first thought is what's inside this phone

Let me slice this open with a machine

It's like you could just unscrew the whole thing

But of course you don't want to, because your viewers aren't gonna watch that. I just don't get the popularity of breaking stuff on the internet. Spending $1,200 on a phone to destroy it, it's like you're creating potentially toxic waste for the environment, you're doing all this stuff, but breaking stuff just isn't that cool. Breaking plates, wow dude Logan, cool, you like to break plates. Breaking an iPhone, that's insane guys, cool. Just stop breaking stuff, guys.

You don't need to break stuff to make a funny video or an interesting video. So that's gonna be it, guys. I know it's not some long exposé on some crazy terrible channel that's ruining the internet, but you're not gonna be able to get that every week; we don't have enough channels like that on YouTube. With all that being said, I hope you enjoyed this video.

It was fun to make, and I wanted to put my two cents in on the whole new iPhone stuff. Life advice to young people out there: don't buy a new phone until your old one breaks.

Apple, Samsung, all these companies are advertising these new phones, and none of these phones really change the world. You might have, like, animated emojis. Cool, you know, I don't need animated emojis in my life.

If you do go ahead and buy it but spending $1200 on a phone

You can probably find better ways to spend your money as someone who's owned a flip phone an iPhone and pretty much everything in between

No new phone is gonna change your life, dude. Unless your phone's broken, don't get a new one. Like the saying goes, if it ain't broke, don't fix it. Anyways, I do hope you all enjoyed the video; if you did, be sure to subscribe and leave a like, it really helps me out.

I just have to say, guys, because I don't know if we'll hit it before this video comes out or right after, but we're pretty much at 100 subscribers, and that's crazy.

I know it's still nothing and I'm not gonna do this every time

I hit another 50 subscribers, but it is really cool to see that I have triple figures on subscribers

and I got a video that had a thousand views so it's kind of cool man like it's it's cool to see this progress and

I so much appreciate all of you guys saying you know you can see this channel going far

That's the goal like I'm trying to make this a big thing so join the bandwagon

Hop on while you can you're one of the very first hundred people watching this channel, so that's pretty cool

Be sure to subscribe if you're not already

I keep using this in my videos

But go follow me on social media, at your boy Connor, on pretty much every social media platform you can find. DM me,

Send me a message whatever you want to do guys. I want to talk to you, so hit me up. It should be fun

Thank you so much for watching as always if this video gets a million likes

I will wear a hat in the next video, so be sure to smash that like button. Your challenge for today: read a chapter of any book. I'm taking this challenge myself, because I have like three books that I'm trying to read and I keep forgetting to read them. So read a chapter of a book. Until next time, it's been your boy Connor. I hope you have a great day.


For more information >> iPhone X Destruction Videos Are The Worst (RANT) - Duration: 9:52.

-------------------------------------------

Best Video Editing Software For YouTube 2017 - Duration: 12:08.

Are you Looking For The Best Video Editing Software for Youtube

Lets Talk About All Free And Paid Video Editing Software in This Video

Which Are Available for both Windows and Mac Users

Hi My Name Is Piyush And you Are Watching Tech Guru Academy

Lets Start

So First Software Is ShotCut. This is a Free Software Which is Available For Both Windows And Mac Users Lets Talk About The Pros And Cons Of This Software

Pros Of This Softwares Are

Basic Joining And Trimming

You can add 2 Videos Or Add Pictures As Well

Second Is in this Software you Get Basic Level Of Effects

3rd is you can Add Titles and Text in Your Video From This Software

And the Biggest Pro Of this Software Is Its Absolutely Free

Lets Talk About Cons Of This Software

This Software Does Not Support Advanced Features Like Motion Tracking Or 360 Video Support

Second Software is HitFilm Express

Its a Free Software And Available for Both Windows And Mac User

Lets Talk About The Pros And Cons Of This Software

Pros: Supports 2D & 3D Compositing, Comes With Transitions Within The Editor, You Can Add Unlimited Tracks, And It's Free Software

For more information >> Best Video Editing Software For YouTube 2017 - Duration: 12:08.

-------------------------------------------

NEW WHATSAPP STATUS | WHATSAPP STATUS VIDEO | #LOVE VIDEO 30SECOND | #LOVE STATUS VIDEO FOR WHATSAPP - Duration: 0:27.

Subscribe my channel-----{{NR MUSICAL}}

For more information >> NEW WHATSAPP STATUS | WHATSAPP STATUS VIDEO | #LOVE VIDEO 30SECOND | #LOVE STATUS VIDEO FOR WHATSAPP - Duration: 0:27.

-------------------------------------------

Best Funny Videos 2017. Good music :) - Duration: 3:10.

For more information >> Best Funny Videos 2017. Good music :) - Duration: 3:10.

-------------------------------------------

Try Not to Laugh or Grin: Funny Animal Videos - Funny Pet Animals Compilation [part 3] - Duration: 10:21.

Thanks for watching.

funny dog dance

funny animal videos

try not to laugh or grin while watching this

this cat doesn't need music for dancing

funny animal fighting.

Cat says - Beer is mine, Human.

funny cat

cat fails

kid at aquarium

this is a laugh, when you have to laugh on bad joke

funny puppy

talking cat

I love you

funny horse

funny pets playing together

lazy cat

kids and pets are best friends

cat can find place anywhere

funny animal vines

funny dog

funny parrot dancing on music

try not to laugh

funny farm animals

Please don't forget to like, comment, share and subscribe!!!

For more information >> Try Not to Laugh or Grin: Funny Animal Videos - Funny Pet Animals Compilation [part 3] - Duration: 10:21.

-------------------------------------------

Funny Cats Playing Ball (Like Soccer Player) with Each Other Video Compilation - Duration: 8:28.

Welcome

Cat Lovers

Kitty Cat Like Soccer Players.

Cats playing in my house.

Long time.

Full of energy.

Original uncut video.

Playing Nonstop

My best video ever.

Please Like, Share & Subscribe.

Thanks

For more information >> Funny Cats Playing Ball (Like Soccer Player) with Each Other Video Compilation - Duration: 8:28.

-------------------------------------------

What About Next? WSV CHANNEL | Funny Moments | Laugh Zone | Funny Videos - Duration: 0:37.

For more information >> What About Next? WSV CHANNEL | Funny Moments | Laugh Zone | Funny Videos - Duration: 0:37.

-------------------------------------------

Best Of Dubsmash Videos - Duration: 1:12.

Plz Subscribe My Channel

Plz Subscribe My Channel

For more information >> Best Of Dubsmash Videos - Duration: 1:12.

-------------------------------------------

How To Make Professional Type Animation Video - Make Real 3D Cartoon [Urdu - Hindi] - Duration: 22:24.

How To Make Professional Type Animation Video - Make Real 3D Cartoon [Urdu - Hindi]

For more information >> How To Make Professional Type Animation Video - Make Real 3D Cartoon [Urdu - Hindi] - Duration: 22:24.

-------------------------------------------

French bulldog funny | Best Cute French BullDog Puppies Videos Compilation 2017 | Part 110 - Duration: 4:58.

Thanks for watching

For more information >> French bulldog funny | Best Cute French BullDog Puppies Videos Compilation 2017 | Part 110 - Duration: 4:58.

-------------------------------------------

500 Jahre Reformation: Christen gesucht | Deutsch lernen mit Videos - Duration: 5:00.

For more information >> 500 Jahre Reformation: Christen gesucht | Deutsch lernen mit Videos - Duration: 5:00.

-------------------------------------------

Thanksgiving Videos for Kids | Songs and More for Children on Tea Time with Tayla - Duration: 14:19.

hey everyone in the month of November we celebrate Thanksgiving during this

special holiday time I like to remember all the many things that I'm thankful

for and since there are so many I decided to write a song about it called

I'm thankful and my two legs do walk I'm thankful for my mouth so I can smile

it I can talk I'm thankful for my tummy it loves to eat their food and be happy

and in such a peppy boo I'm thankful for my heart and how it keeps us beat

I'm thankful from my head down to my feet I'm thinking I can hear and I can

smell it I can see I think Oh like you taste inside check people just leave me

i i'm truly big bow yep a thanks for listening to my Thanksgiving song oh man

I'm just so thankful

Hey everyone, when I think about Thanksgiving, there's so much to be thankful for. I'm thankful for the people in my life, like my mommy and daddy and my sisters, and all of you, my friends. I'm also

really thankful for animals especially my pet dog mr. George he's thankful too

I am also thankful for other things, like my eyes to see. Can you blink your eyes? I'm thankful for my ears to hear. Whoo, that's how I hear noises. I'm thankful

for my nose to smell mmm smells are good I'm also really thankful for my mouth so

I can talk I love to talk and sing and dance and play and tell jokes with my

mouth. Would you like to hear some Thanksgiving jokes? You can share

them with your family on Thanksgiving I'll tell you some okay here's my first

joke what kind of key can't open any door do you have a guess

It's a turkey! A key can open a door, but not a turkey. Oh, that's so funny.

okay here's our next joke what is the most musical part of a turkey do you

know mr. George what about you it's a drumstick

turkeys at drumsticks Oh they'd probably like to play in a band

okay here's joke number three what is a scarecrows favorite fruit mmm

I like so many different fruits what do you think a scarecrow likes to eat I bet

he likes strawberries scarecrows are stuffed with straw so they must like

eating strawberries oh that was so silly here's joke number four what do you get

when a turkey lays an egg on top of a barn you get an egg roll that egg would

just roll right off the barn oh that's fun here's joke number five what side of

a turkey has the most feathers hmm that's a tough one it's the outside

there's no feathers on the inside of a turkey you're so silly

Now let's do joke number six: who isn't hungry at Thanksgiving? I'm always hungry at Thanksgiving, so who wouldn't be hungry? Well, I'll tell you who: the turkey, he's already stuffed. With dressing, of course. I love to eat turkey and dressing. Here's joke number seven: where does Christmas come before

Thanksgiving hmm well Thanksgivings in November and Christmas is in December so

how could Christmas ever possibly come before it I know in the dictionary

because C comes before T. Christmas starts with the letter C and Thanksgiving starts with the letter T. Oh, that's a clever one. Okay, my last joke is a knock-knock

joke that means I need your help I'll say knock-knock and you say who's there

Let's try it: knock-knock. Gladys. Gladys Thanksgiving! Are you glad it's Thanksgiving? Me too. Well, that's all the

jokes I have for you now I hope your family gets a lot of laughs at

Thanksgiving

join with me five little turkeys jumping on a bed one jumped off and bumped his

head Tayla called the doctor and the doctor

said no more turkeys jumping on the bed four little turkeys jumping on the bed

one jumped off and bumped his head so Taylor called the doctor and the doctor

said no more turkeys jumping on the bed three little turkeys jumping on the bed

one jumped off and bumped his head so Tayla called the doctor and the

doctor said no more turkeys jumping on the bed two little turkeys jumping on

the bed one jumped off and bumped his head

Taylor called the doctor and the doctor said no more turkeys jumping on the bed

one little turkey jumping on the bed one jumped off and bumped his head

Tayla called the doctor and the doctor said no more turkeys jumping on the bed

that's such a silly song let's sing it again five little turkeys jumping on a

bed one jumped off and bumped his head Tayla called the doctor and the doctor

said no more turkeys jumping on the bed four little turkeys jumping on the

one jumped off and bumped his head so Taylor called the doctor and the doctor

said no more turkeys jumping on the bed three little turkeys jumping on the bed

one jumped off and bumped his head so Taylor called the doctor and the doctor

said no more turkeys jumping on the bed two little turkeys jumping on the bed

one jumped off and bumped his head Tayla called the doctor and the doctor

said no more turkeys jumping on the bed one little turkey jumping on the bed one

jumped off and bumped his head Tayla called the doctor and the doctor

said no more turkeys jumping on the bed turkeys are silly

we sure have a lot to be thankful for this Thanksgiving I'll see you next time

Hi, I'm Tayla. I'm thankful for teachers. Teachers help you learn new things and

they really care about you a whole lot today I have a special teacher with me

Her name's Mrs. Montgomery. Hi bananas, let's learn how to spell the word thanks. That sounds great, will you teach us? Yeah, here it goes: T-H-A-N-K-S, that spells thanks. Bananas, let's try it together: T-H-A-N-K-S. Did we do it? Hey, good job, bananas. Well, Mrs. Montgomery, thanks for being

on tea time today we all really appreciate you make sure you tell your

teachers how thankful you are for them I wanted to talk to you about something

that I'm really thankful for water water is really important then we have to have

it to live here's some fun facts for you did you know that 75% of the earth is

covered by water there are five oceans we have the Pacific Ocean the Atlantic

Ocean the Southern Ocean the Indian Ocean and the Arctic Ocean

that's all five and that's a lot of water when you put water in the freezer

it turns into a solid that's what ice is and I love ice cubes

almost 22 billion plastic water bottles are thrown in the trash

every year that's a lot of plastic going to waste so I decided to do my part in

helping save the planet and I use a refillable water bottle it's pink and I

love it all this talk about water makes me thirsty I'm gonna go fill up my water

bottle each of our body parts has a different job to do and they're all

equally important why don't we start at the top of our bodies and go all the way

down to the bottom and learn about all the amazing things that our bodies can

do first we have our head our head holds really important things like our brains

that's what we use to remember things. On the outside of our head we have two eyes to see, a nose to smell, two ears to hear, and a mouth to talk and giggle. Next

we have our arms I have shoulders that I can shrug like

this I have elbows that I can bend I have

hands that I can shake and ten fingers that I can twinkle the middle part of my

body is called my torso there's lots of important muscles in there that help me

twist like this that's so much fun the last part of our body is our legs I have

two thighs that help me run really fast and I have knees that I can bend and

bounce on I also have two feet that help me jump real high and then I have ten

toes that I can twinkle there are a lot of important parts in our bodies did you

know that we have 206 bones some are really big and some are really small but

they're all really important remember not everyone has the same type of body

we're all different and that's what makes us so special I'm thankful for the

body that I do have so I'm going to do my best to take really good care of it I

hope you do the same. If you want to see more, get a grown-up's help and visit Tea Time with Tayla dot com to watch videos, visit Tayla's store, or even send her an email.

For more information >> Thanksgiving Videos for Kids | Songs and More for Children on Tea Time with Tayla - Duration: 14:19.

-------------------------------------------

Salman Khan's house in Mumbai Inside Video MUST MUST WATCH - LATEST PICTURES - Duration: 2:49.

salman khan house inside pictures

For more information >> Salman Khan's house in Mumbai Inside Video MUST MUST WATCH - LATEST PICTURES - Duration: 2:49.

-------------------------------------------

Social Media Video Compilation - Duration: 4:43.

[Music]

What's up, everybody?

My name is Kennedy, and today is Thursday.

On Thursdays I do whatever I want, and I had recorded a video.

But I decided that I didn't like it after watching a couple seconds of it.

So instead of uploading that video, I make social media videos.

So if you don't follow me, I upload them every Tuesday and Friday.

So if you would like to see those when they come out, go follow me on my Twitter, Facebook,

or Instagram, or all three of them.

Everything is @MissKennedyErin, just like my YouTube channel.

Um, so you can watch my social media videos when they come out.

Please enjoy this compilation of the social media videos that I put out on my

social media.

For more information >> Social Media Video Compilation - Duration: 4:43.

-------------------------------------------

Video Understanding: From Tags to Language - Duration: 58:29.

Okay so I think we should start.

It's my great pleasure to have Gan Chuang here. He's interviewing for a position in my group, and he comes with a rich background; his adviser is Andrew Yao. I don't know how you pick up a research topic which is totally outside your adviser's domain of expertise, but Chuang has really rich experience. He was a visiting research student at Stanford while enrolled at Tsinghua, and he also did a bunch of internships in both Microsoft Research Asia and Google Research.

Microsoft Research Asia and Google Research.

So, Chuang, the floor is yours now.

Look for it right.

>> Okay, thank you Gan for the introduction. Today it is a great honor to be here to present my PhD work, video understanding from tags to language.

Currently we are in an era of big multimedia data. In a single minute, about 300 hours of video are uploaded to YouTube. The natural question that arises is: how can we organize this large amount of consumer video?

Okay, let's first take a look at how YouTube retrieval works. YouTube conducts video search based on keyword matching between user queries and the video metadata, such as the title, description, or comments. Yes, this is a very effective and efficient way to conduct video search, but there are also some problems with video metadata. The first one is that the video metadata can be noisy or irrelevant.

Since users upload videos to the website themselves, we cannot expect them to give a perfect title that exactly matches other users' search behavior. Secondly, we also observe that a lot of video content doesn't have metadata at all. In both of these cases, keyword search will fail, so there is a pressing need to conduct content-based video analysis. My research focuses on understanding human actions and high-level events in web and consumer videos.

In particular, in this talk I will first introduce my efforts on designing effective network architectures to learn robust video representations. My paper published in CVPR 2015 is among the pioneering works that applied deep learning with real success to the video recognition task. We also provided the best model in the TRECVID multimedia event detection competition and, this year, in the ActivityNet challenges, and we provided the single best model in the YouTube-8M challenge.

Even though supervised learning gives reasonably good performance for video recognition, we cannot expect this kind of supervised learning to scale up. So in the second part I will introduce our efforts on learning video recognition from weak supervision. First, I will show how web-crawled images and videos can replace human-annotated examples for video recognition. Secondly, I will show how we propose a new method for zero-shot video recognition by connecting it with a knowledge base. At the end of my talk, I will introduce our most recent efforts that connect video understanding with language, through attractive visual captioning and video question answering.

Okay, let's first start with the video recognition part. Currently, there are three challenges for video recognition. The first is that the data has large intra-class variations and is also inherently very complex, which makes video recognition a super challenging topic. Secondly, it is very labor-intensive and time-consuming to annotate a video. This is the reason why current video recognition datasets are all restricted to hundreds of classes, much smaller than ImageNet. Third, we also realize there is a huge number of video concepts, because video concepts always consist of actions, scenes, and objects, so the number of combinations grows very quickly.

To address the first challenge, we propose learning video representations using deep neural networks. Okay, so the problem we have to solve is that, given a test video, we not only have to give it an event label, such as 'attempting a bike trick' for this video, but we also want to localize the spatial-temporal key evidence: for example, the person and the bike in the foreground, rather than the background, because that is what tells us it is 'attempting a bike trick'.

To achieve this goal, we propose a trainable deep event network (DevNet). We first pre-train the network using image data and then fine-tune it on the video data. However, we differ from the existing approaches that by nature treat video classification as image classification, meaning they take each key frame, feed it forward through the network, and finally aggregate over all the key frames to obtain a final video-level prediction. We see that the video has a natural temporal dimension, so we should exploit the temporal information contained in a video to achieve video classification.

So the first technical contribution we propose is to move from a single key frame to multiple key frames, which we call a video segment, and we use the video segment as the input of the network. As the second contribution, we propose a learnable feature aggregation layer, which we name cross-frame max pooling, to aggregate the representations of multiple key frames into a single vector for classification.
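To make the idea concrete, here is a minimal sketch of how a segment-level classifier with cross-frame max pooling could look; this is my own illustration in PyTorch, not the authors' released code, and the backbone, dimensions, and event count are placeholders.

```python
# Hypothetical sketch: per-frame CNN features from a video segment are max-pooled
# across the frame dimension before the event classifier.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SegmentClassifier(nn.Module):
    def __init__(self, num_events):
        super().__init__()
        backbone = resnet18(pretrained=True)                        # image-pretrained CNN
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # drop the final fc layer
        self.classifier = nn.Linear(512, num_events)                # fine-tuned on video labels

    def forward(self, segment):                  # segment: (B, T, 3, H, W) key frames
        b, t = segment.shape[:2]
        feats = self.cnn(segment.flatten(0, 1))  # (B*T, 512, 1, 1)
        feats = feats.view(b, t, -1)             # (B, T, 512)
        pooled, _ = feats.max(dim=1)             # cross-frame max pooling over T
        return self.classifier(pooled)           # event scores

scores = SegmentClassifier(num_events=20)(torch.randn(2, 5, 3, 224, 224))
```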

Equipped with these two technical contributions, we achieved state-of-the-art results on the TRECVID multimedia event detection dataset. I want to mention that we boosted the previous state of the art by a relative 40%. Hand-crafted features had dominated this task for several years, and our paper is among the first to show that a deep neural network can beat them. Since this was somewhat surprising, we also wanted to understand why the deep neural network can achieve such good results.

So we visualize the spatial-temporal saliency to understand what the deep neural network actually learned to localize during training. We find some very interesting patterns. For example, for 'attempting a bike trick', the region with the person riding the bike has a high response, and for the 'dog show' event, the dog region also has a high saliency score. Another very interesting observation is about the fourth event, 'playing fetch'. Actually, from my own intuition, I did not know whether the person or the dog is more important for distinguishing 'playing fetch'. A generic saliency detection approach, shown on the rightmost, says the person is more salient, but our network, looking at the key frames, says the dog is the key evidence for discriminating this event. So this somehow indicates that what the machine learns does not always match our human intuition.

Even though this paper was published only two years ago, it has already received quite nice attention. For example, I want to mention two follow-up works built on our video network. The first work is the winner of last year's ActivityNet challenge, named Temporal Segment Networks.

>> [INAUDIBLE] >> Sure.

>> The last one, do you use the whole video, or do you somehow sample some shots from the video before you feed it in?

Yeah, answering that question: we first decompose the video into different segments, and we only extract the middle key frame of each segment as the input to our network, to save computing time.

>> How do you decompose the video?

>> Yeah, well, actually it is based on the shot changes. That means we detect the shot boundaries, and then we pick the middle frame of each shot.

>> So if the decomposition is very bad, does that affect the final result from the key frames?

>> Yeah. There are many papers about how to segment a video more cleanly. From our experience, it definitely has some effect.

>> Are you using motion information or static cues?

>> In this paper we do not use motion information, and in the next slide I will introduce something about how we deal with motion information.

So let me continue my talk first; my favorite part is that our work has been extended by two follow-up works. The first is the Temporal Segment Networks work from the beginning of last year. In their technical report, the authors verified my first technical contribution, that using a video segment instead of a single key frame improves the traditional two-stream network, and they went one step further by applying the same strategy to the optical flow input, which relates to your question about modeling motion information. The second one I want to mention is the winner of the 2015 ActivityNet challenge. They have essentially the same idea as my second technical contribution: instead of directly averaging the frame predictions, they use feature encoding, but rather than our max pooling they use a more advanced encoding method to do the feature aggregation. Our work has also received attention in video localization and in network interpretability, and I think you will hear more interesting recent progress there. This year we also took part in the ActivityNet challenge, which has become something like the ImageNet challenge of the video field; we won the competition this year, and here I want to share with you some of the technical reasons why we could win these challenges.

Some technical details. First of all, I want to show you the super-deep LSTM we proposed. Our original observation was that a deep LSTM for video recognition did not seem useful, but we drew this conclusion from small datasets such as UCF101 and HMDB, which only have around 10,000 videos.

We speculated that the failure of deeper LSTMs may come from two reasons. The first is that the datasets were not big enough, so we did not need that big a model to fit the data. The second is that we had not carefully tuned the network architecture to make it deep. So to figure out whether a deeper LSTM can work for video recognition, we first tackled the YouTube-8M dataset, because the emergence of YouTube-8M and the large-scale ActivityNet provides a statistically much bigger set of videos, which gives us the opportunity to revisit our conclusion. We found that when we naively increase the depth of the LSTM by simply stacking more recurrent units, it still gives bad performance, so simply making it deeper in that way is not worthwhile.

So here, we propose to add shortcut, fast-forward connections between the recurrent units. With this change, we found that the model converges faster and achieves better performance than the shallow network.
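For illustration, here is a minimal sketch of what shortcut connections between stacked recurrent units could look like; the dimensions, the number of layers, and the class count are placeholders, and this is not the team's actual challenge model.

```python
# Minimal sketch (my own illustration) of stacked LSTM layers with identity
# shortcut connections between the recurrent units.
import torch
import torch.nn as nn

class ResidualLSTMStack(nn.Module):
    def __init__(self, feat_dim, hidden_dim, num_layers, num_classes):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)   # match dims so shortcuts can be added
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden_dim, hidden_dim, batch_first=True) for _ in range(num_layers)]
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats):                   # frame_feats: (B, T, feat_dim)
        h = self.proj(frame_feats)
        for lstm in self.layers:
            out, _ = lstm(h)
            h = h + out                               # shortcut between recurrent units
        return self.classifier(h.mean(dim=1))         # average over time -> video logits

logits = ResidualLSTMStack(1024, 512, num_layers=6, num_classes=4716)(
    torch.randn(2, 300, 1024))
```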

The second model I want to show is a purely attention-based network, inspired by the recent success from Google Brain showing that a purely attention-based model performs well on translation. We tried a very similar idea along the temporal dimension, and we observed that using only temporal attention, without the LSTM, also gives really good performance. Going a step further than a single attention over the temporal axis, we introduce a number of learnable attention parameters, around sixty-four here, learn them jointly with the network, and then combine them for a better result.
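A rough sketch, under my own assumptions, of purely attention-based temporal pooling with a small set of learned query parameters (matching the "around sixty-four" mentioned above); this is an illustration of the idea, not the submitted model.

```python
# Hypothetical sketch: learned query vectors attend over frame features,
# replacing the recurrent layers with temporal attention pooling.
import torch
import torch.nn as nn

class TemporalAttentionPooling(nn.Module):
    def __init__(self, feat_dim, num_queries, num_classes):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim) * 0.01)
        self.classifier = nn.Linear(num_queries * feat_dim, num_classes)

    def forward(self, frame_feats):                     # (B, T, D)
        attn = torch.einsum('qd,btd->bqt', self.queries, frame_feats)
        attn = attn.softmax(dim=-1)                     # one distribution over time per query
        pooled = torch.einsum('bqt,btd->bqd', attn, frame_feats)
        return self.classifier(pooled.flatten(1))       # (B, num_classes)

model = TemporalAttentionPooling(feat_dim=1024, num_queries=64, num_classes=400)
out = model(torch.randn(2, 120, 1024))
```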

The third model we propose uses temporal CNNs to replace the recurrent units. The idea is also inspired by recent work proposing convolutional sequence models as a replacement for recurrence, and we have a similar finding: simply using convolutions to go through the frames works really well compared to the recurrent framework.
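Again purely as a hedged illustration, a temporal-CNN classifier that replaces the recurrent units with 1-D convolutions over the frame-feature sequence (the layer sizes are assumptions, not the actual configuration):

```python
# Assumed illustration: stacked 1-D convolutions over the frame features,
# followed by max pooling over the temporal axis.
import torch
import torch.nn as nn

class TemporalConvClassifier(nn.Module):
    def __init__(self, feat_dim, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),                 # pool over the temporal axis
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats):                  # (B, T, D)
        x = self.net(frame_feats.transpose(1, 2))    # Conv1d expects (B, D, T)
        return self.classifier(x.squeeze(-1))

logits = TemporalConvClassifier(1024, 512, 400)(torch.randn(2, 120, 1024))
```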

The last thing I want to show also revisits how we sample frames when we train the network for video recognition. We find that for video recognition we do not need to use all the video segments for classification; we only need a small subset. So we propose that during training we randomly drop key frames, for example sampling them uniformly, together with a diversity constraint on the selected segments. We found that this leads to faster training and also better performance.
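A minimal sketch of the sampling idea as I read it: at each training step take only a small, temporally diverse subset of frames rather than the whole video. The chunked sampling below is one simple way to impose a diversity constraint; the actual constraint used in the challenge entry may differ.

```python
# Pick one random frame from each of num_samples equal chunks of the video,
# so the sampled frames are few but spread across the whole duration.
import torch

def sample_diverse_frames(num_frames, num_samples):
    bounds = torch.linspace(0, num_frames, num_samples + 1).long()
    idx = [torch.randint(int(lo), max(int(hi), int(lo) + 1), (1,)).item()
           for lo, hi in zip(bounds[:-1], bounds[1:])]
    return torch.tensor(idx)

frame_feats = torch.randn(300, 1024)                    # all frame features of one video
subset = frame_feats[sample_diverse_frames(300, 8)]     # 8 diverse frames for this step
```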

So far I have introduced how we design networks for video recognition, but all of this relies on human-labeled training data. So in the second part, I introduce how we can learn video recognition from web data. The idea is that if I put a query into the Google image search engine, or into YouTube, the returned images and videos are usually quite relevant to the search query. So if we could use these web-crawled images and videos to replace the human-annotated examples for video recognition, it would be very easy to scale up to many video concepts. However, even though this sounds really promising, the web videos have a problem: they are untrimmed. You can see the first example,

mopping the floor: most of the video frames are not about mopping the floor; only a small part in the middle actually shows mopping the floor. So web videos are untrimmed, and only a small portion of the frames is relevant to the action of interest.

Web images, on the other hand, have the benefit that they usually capture the highlight of the action, but they may suffer from a domain gap with YouTube videos. You can observe the second example, juggling balls: the web images of juggling balls are clean, posed shots, so we cannot directly transfer a model trained on such images to the video domain. And for the third row, baby crawling, the web images do show a baby crawling, but against quite different backgrounds.

So we cannot directly apply such clean images to the video domain. It seems very difficult to remove the noise from either source separately, because with webly-supervised learning we do not have any labeled guidance. But we have a key observation: even though we cannot separately remove the noise from the web images and from the web videos, jointly we have a chance. The interesting thing we found is that the relevant web images and the relevant video frames are typically visually similar to each other, while the noisy parts have their own distinct characteristics. For example, the noisy web images may contain logos or cartoon drawings, which almost never occur in the video frames, while the noisy video frames are mostly background frames that do not occur among the web images. So it seems we can do a mutual filtering: keep the parts that are similar across the two domains and filter out the rest together.

Based on this observation, we propose a filtering network. First, we take the video key frames as input and train an initial network. This network has only a limited ability to distinguish relevant from irrelevant video frames, but it can already be used to distinguish relevant web images, because the noise in the videos and the noise in the web images are very different: web images that show the highlight and are relevant to the query will get high scores. So we apply this network to the web images, set a threshold on the score, and keep only the highly relevant web images. Since these images always contain the action highlights, they can be fed back to fine-tune the network. The fine-tuned network now has the ability to localize the action, so it can in turn be applied back to trim the video frames. After this iteration, the noise in both the web images and the video frames is removed, and we can feed the trimmed videos into a temporal model to capture the temporal information.
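To make the loop concrete, here is a hedged sketch of the filtering procedure described above; `finetune_step` is an illustrative stand-in for whatever fine-tuning the real system performs, and the threshold and number of rounds are placeholders, not the published settings.

```python
# Sketch of the mutual filtering loop: score web images with the frame-trained
# model, keep the confident ones, refine the model on them, then trim the video.
import torch
import torch.nn.functional as F

def finetune_step(model, images, action_id, lr=1e-4):
    """One illustrative gradient step pushing the kept images toward the action class."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    target = torch.full((len(images),), action_id, dtype=torch.long)
    loss = F.cross_entropy(model(images), target)
    opt.zero_grad(); loss.backward(); opt.step()

def mutual_filter(model, video_frames, web_images, action_id, thresh=0.5, rounds=2):
    kept_images = web_images
    for _ in range(rounds):
        with torch.no_grad():                                  # score the web images
            img_scores = model(web_images).softmax(dim=1)[:, action_id]
        kept_images = web_images[img_scores > thresh]          # keep highlight images
        if len(kept_images) > 0:
            finetune_step(model, kept_images, action_id)       # refine the model on them
        with torch.no_grad():                                  # trim the video frames
            frame_scores = model(video_frames).softmax(dim=1)[:, action_id]
        video_frames = video_frames[frame_scores > thresh]
    return model, kept_images, video_frames
```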

Here we show some comparison results. When we train only on the web images, we get around 50% accuracy; adding the web video frames improves it by over 10%, which means the web videos are very useful for recognition. We also compare with other ways of combining the two sources, for example training two separate models, or directly mixing the web images and video frames and training a single network, and the performance is still lower than our approach of first filtering with the video frames and then fine-tuning with the filtered web images.

>> Do you see [CROSSTALK] >> Yes.

>> But you know videos typically will have [INAUDIBLE]

Q hich [INAUDIBLE] don't have- >> Mm-hm.

>> So, did you try changing let's say the early filters,

which are most specifically [INAUDIBLE] Rather than mixing

them up?

>> Yeah, this is why I separated the Of the noise field,

noise mixing, and also the mixing.

The noise mixing, that will be just for.

And I mean,

the main thing that we also apply our filter emitter.

And then we mix all the image With the neural network.

And you can see that Also, is how batteries are.

And then, the problem that you mentioned that create beautiful

images and videos have a lot of labelled noise, right?

>> Yeah. >> So,

it's not clear to me how you're addressing that problem,

the packages?

>> Okay, this is what I described with the ladies

network, they were general understanding here.

So, the general Said noise of the video and video app and

mixed the irrelevant part of a different delay, but

the relevant part is still similar.

So this thing is that your Network that you see some

similar example, they will have higher detections to go, right?

>> Okay.

>> And same, the relevance about images may be So our images is

very similar appearance of the video frame, we will be wrong

here, and it's not a very similar part, we will run lower.

So then, we can separate the size of the hold,

to remove out the nodes of images.

But the,n the relevant, that relevant image we selected.

But, the images have to benefit it,

when we said they are always highlighted in red.

So, you can put them into the network to further

fine tuning it.

So, it is seems to more highlight our images, so

the next program,

you have the benefit to distinguish your highlight.

Things can be back to test on the video,

to like the top runs the video highlight.

>> Kwan? >> Mm-hm.

>> So when you the test the work chrome data to find related

data, do you some attention, or you test the diary bunch?

>> So this And you know Went public

We are not using We are just using For attention.

And while we definitely find some thorough artwork,

I think probably I am going to take a measure.

Yeah I think [INAUDIBLE] >> Also you can teach some

positive data mining.

Did you try to put a repeat this root back into

hash [INAUDIBLE] Collect data and retrain?

Did you find any benefit to repeat this process more than

one times, of mining posting data?

>> Yeah, it's a question Because in my paper, I only do one

iteration With multiple iteration [INAUDIBLE]

Yes, actually that question is

also asked by a reviewer is also asking me this question.

We definitely found that during multiple iterations,

it held maybe 0.2, some improvement, but big thing,

I think, also have some Curve [INAUDIBLE].

>> Sure, also there could be a perimeter data set bias, right?

Between the images and videos, which you are using for

doing the big Version?

So, have you tried training on one kind of data set, and try it

on a completely different data set to see the generalizations?

>> The [INAUDIBLE] Yeah,

this is actually what the results [INAUDIBLE] Here.

I want to [INAUDIBLE] Here so it's we also tax on [INAUDIBLE]

That we call all the video from [INAUDIBLE] Parts of data,

because it's more like [INAUDIBLE] Inquiry.

That means that, there have some We call it the testing video

in a secondary way.

And we also compare it with performance with other method,

we also find that it definitely much improved the results using

the legacy network.

I think for the time [INAUDIBLE] To another part,

we can talk about it, but let's talk offline, yeah?

Okay, even though this approach can achieve reasonably good performance on video recognition, we still need to crawl web data and train a classifier for every video concept beforehand. So now I want to address the problem of zero-shot video retrieval: how can we recognize new video concepts without any positive training data?

and also you came with the video.

I'm gonna them with the and

maybe they're 20 [INAUDIBLE] okay maybe return each of that.

Each return to indicate [INAUDIBLE] video relevant to

the query.

So it's more like a [INAUDIBLE] approach and the traditional

approach so it probably that he [INAUDIBLE] the video and

the [INAUDIBLE] query into [INAUDIBLE].

[INAUDIBLE] to retrieval.

I agree this is very [INAUDIBLE] the retrieval based approach.

[INAUDIBLE] because you cannot

leverage in the inter [INAUDIBLE] category.

And the means video content category.

To motivate our approach, suppose we want to recognize a new video category, 'soccer penalty', for which we do not have any training data. If a video is very similar to both the known concepts 'field hockey penalty' and 'soccer', it will very likely be a 'soccer penalty'. In other words, if a video scores high on the related known concepts, it is very likely to belong to the new concept.

So the problem then becomes: given a new video concept name, how can we identify the related concepts from a concept pool? To address this, we use a data-driven measure of the similarity between two concept names. Let's give a concrete example. For a new concept name, we compute its similarity to all the pre-trained concept names and obtain a ranking score over all of the video concepts, so the highly relevant concepts are ranked at the top. We then combine the classifier scores of these related concepts, weighted by the similarity, so that videos containing the related concepts are ranked higher, and by summing these scores we recognize the new concept.
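A small illustration, under the assumption that concept names are represented by generic word embeddings, of how related known concepts can be selected and their classifier scores combined to score a new, unseen concept; the similarity measure and top-k choice are placeholders.

```python
# Rank known concepts by embedding similarity to the new concept name, then
# combine their classifier scores with those similarities as weights.
import numpy as np

def zero_shot_scores(query_vec, concept_vecs, concept_scores, top_k=5):
    """query_vec: (D,) embedding of the new concept name.
    concept_vecs: (C, D) embeddings of the known concept names.
    concept_scores: (N, C) classifier scores of N videos for the C known concepts."""
    sims = concept_vecs @ query_vec
    sims /= (np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8)
    top = np.argsort(sims)[-top_k:]                   # most related known concepts
    weights = np.maximum(sims[top], 0)
    weights /= weights.sum() + 1e-8
    return concept_scores[:, top] @ weights           # (N,) relevance to the new concept

ranking = np.argsort(-zero_shot_scores(np.random.randn(300),
                                       np.random.randn(50, 300),
                                       np.random.rand(1000, 50)))
```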

With this technique we obtained the best result on the zero-shot multimedia event detection task, and the only secret is that we exploit the relationships between concepts, while the other teams worked hard on improving the individual concept detectors, so our approach is quite different.

Okay, let me now go to something more interesting, connecting video understanding with language. First of all, I want to introduce our work on stylized visual captioning; it is about generating attractive captions. Many people are doing image captioning, and many people will argue: what is the real application of image and video captioning? We also find that existing captioning systems always generate a plain, global description of the visual content, which is not very attractive. A romantic or humorous caption is more attractive to me, and we also find another benefit:

You can see that images become popular not only because of the image content, but also because they have a perfect title. Another motivation for this work is that when I upload an image to social media such as Facebook or Twitter, it always takes me a long time to figure out a title that will attract more people to like my post. So if a machine could do this automatically, it would be very, very useful. So we worked on generating stylized, attractive captions.

[INAUDIBLE] captioning.

And the methodology of our approach is that we propose

a [INAUDIBLE] and actually the [INAUDIBLE] is very easy.

[INAUDIBLE] We did compose the [INAUDIBLE] of

ITM in [INAUDIBLE] USV among the first

mentioned USVs don't equal to the [INAUDIBLE].

And so what we want to do with that during the caption

generation we want to sigh read the content part and

also the readers their part.

So we hope UNV Show the content of the using the to control

the content for images.

To do the steps.

And to achieve this goal images and the factual caption.

I mean if the and then we also have the monolingual romantic

sentence, a humor sentence that I am going to cover.

Since that is very hard to gather the [INAUDIBLE]

of the romantic sentence or humor sentence, so

we want to achieve a goal [INAUDIBLE].

So we train with multi-task learning. The first task is image captioning: given the image, generate the factual caption. The second task is language modeling on the romantic sentences: given the first words, predict the next word, and so on. The third task is language modeling on the humorous sentences. During the multi-task training, we share the weights of U and V among the different tasks, while S is task-specific. This forces U and V to control the content, learned from the image caption generation, while S controls the style.

During caption generation, we can simply switch the style factor S to generate whatever style we want: if we want a factual caption we use the factual S, if we want a romantic caption we use S_R, and if we want a humorous caption we use S_H.
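A rough sketch of the factored-weight idea as described: U and V are shared, and a per-style factor S is swapped in at generation time. This is my simplified illustration using a single factored linear layer, not the actual factored-LSTM implementation; the rank and dimensions are placeholders.

```python
# Factored layer: W = U @ S @ V, with U and V shared across styles and S swappable.
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank, styles=('factual', 'romantic', 'humorous')):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, rank) * 0.01)   # shared across styles
        self.V = nn.Parameter(torch.randn(rank, in_dim) * 0.01)    # shared across styles
        self.S = nn.ParameterDict({s: nn.Parameter(torch.eye(rank)) for s in styles})

    def forward(self, x, style='factual'):
        W = self.U @ self.S[style] @ self.V        # recompose the style-specific weight
        return x @ W.t()

layer = FactoredLinear(in_dim=300, out_dim=512, rank=64)
h_factual = layer(torch.randn(4, 300), style='factual')
h_humor = layer(torch.randn(4, 300), style='humorous')   # swap only S at generation time
```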

Let's see some concrete examples. This caption was generated by CaptionBot: 'a dog laying in the grass'. Our romantic caption is 'a dog running through the grass to meet his lover', and the humorous one is 'a dog running through the grass in search of the missing bones'. For the next example, a factual caption would be 'a man on a rocky hill next to a stone wall', which seems really boring. The romantic one says 'a man is rock climbing to conquer the high', and the humorous one says 'you're climbing the rock like a lizard'. We also show that our model transfers to the video domain: for this video, the factual caption is 'a man is playing a guitar', the romantic one is 'a man practices the guitar, dreaming of being a rock star', and the humorous one is 'a man playing a guitar, but he runs away'. So the stylized captions are more attractive, and they can also make the image post more popular.

>> But the only way you can generate this is from the training data. You have some corpus of this, right?

>> Yeah, that's a really good question.

>> So you're not being creative, the model is not being creative, it's copying things, right?

>> Yeah, so, of course, I want to say that since it is trained on data, it definitely captures some patterns from the training corpus. But I don't agree that it simply copies the sentences, because we definitely find that the model is learning language patterns from them. It is more like modeling the style, how people make a sentence romantic or humorous.

>> Wouldn't it be interesting to see how this is generated?

>> [INAUDIBLE] you should go back to your training data and compute some semantic distance between your generated captions and the training sentences. What were the closest ones in the training data? That is the only way to understand what you are doing here.

>> Yeah, yeah, that's a good question. First I want to tell you that because we do not have paired images and romantic sentences, you cannot use a standard supervised approach to achieve this goal, because you don't have the paired images. And we also definitely find that some language patterns do get repeated in the generated captions. What we want to distill is how people make sentences romantic or humorous, so we also want the machine to learn to be humorous.

>> Yeah, are these two existing datasets, or are they separate datasets that you have labelled? Some book or- >> All right, [INAUDIBLE].

Yeah, actually it is [INAUDIBLE] both, that would be the first one. Yeah, it's a bit of both. And also there are people who have written some humorous sentences [INAUDIBLE] for this.

And the interesting thing is that we labeled the test data because we want to do both quantitative and qualitative evaluation. So we also do some human evaluation.

>> So just a practical comment here, it's very difficult to be funny when you want to be. It's much easier to be funny unintentionally; like, your romantic set is almost funnier.

>> Yeah, I agree, it's more like that.

>> So you shouldn't really be doing a humorous data set, you shouldn't even do the romantic data set, you should do something like Donald Trump. You should do things that are much more a personal style that stands out.

>> Yeah, actually we do not want to differentiate between different [INAUDIBLE], because as you can see our title is about [INAUDIBLE]. We do not [INAUDIBLE], but what we want to see is whether we can generate a more interesting caption.

So during the evaluation we also did a very interesting [INAUDIBLE]. That is, we showed human raters the captions [INAUDIBLE] from the baseline system and from our system, and we also asked one question: if you wanted to upload the image to a website, which caption would you prefer?

So this time,

how to reflag that- >> You have a few

options for [INAUDIBLE].

>> Yeah, yeah.

And one thing we've done which is related to this is to,

just for question and answering systems,

to make the answer sound like it comes from a certain style.

>> Yeah.

>> And one thing you can do is movie scripts, like we did with the Star Wars one. >> Yeah.

>> And it's pretty hilarious what comes out,

you can give all these possibilities.

It'll be kind of a cool tool for social media.

>> Exactly, yeah, I agree.

>> Yeah, I want to readdress this question about sentence copying. I think the generation of the English caption is more or less based on the [INAUDIBLE], exactly as you are suggesting, but with the LSTM you have the chance to combine fragments from one sentence and fragments from another sentence.

>> Right, yeah.

>> But my question here is that this kind of style, I feel, is much better [INAUDIBLE] if you position this problem in conversation and in dialogue. Because, really, you need to take other factors into consideration, like what the question is, what the other person's personality is.

>> Yeah, maybe some conditions or something [INAUDIBLE] >> Yeah, there could be some other factors; even a social network is actually loosely connected like a dialogue, right? Like if you are making comments, some people respond maybe two days later, not immediately.

>> [LAUGH] >> But in a chatbot scenario, you need to take those contexts into account immediately.

So is this from working in collaboration with the Shell company?

>> Yeah, this was [INAUDIBLE], my last industry project.

>> It's all right, [INAUDIBLE].

>> Okay.

>> My other question is, you know, your U, S, V decomposition. >> Yes. >> It's- >> Yeah. >> It's mathematically ambiguous, right? I can multiply by an affine transformation [INAUDIBLE], so how do you regularize it?

>> Yeah, you have to [INAUDIBLE], and also [INAUDIBLE] multiple times to make it work.
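
For reference, the non-uniqueness the questioner is pointing at can be written out directly: if the weight is factored as $W = U S V$, then for any invertible matrices $A$ and $B$ of compatible size,

$$U S V = (U A)\,(A^{-1} S B)\,(B^{-1} V),$$

so the three factors are only determined up to such transformations, and some constraint (sharing U and V across styles, restricting the form of S, or explicit regularization) is needed to make the decomposition meaningful. Which of these the speaker actually relies on is not clear from the recording.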

And other people- okay, let me come back to the slides. I think we're gonna have enough time, yeah, maybe we can come back to the StyleNet part here. Many people may ask where the [INAUDIBLE] came from, why we came up with these two matrices.

Yeah, [INAUDIBLE] U and V control the content and S the style. And we definitely tried a two-factor version, and we find the performance of the three-factor version is better than the performance of the [INAUDIBLE] two-factor one. This is because [INAUDIBLE] and it gives more robust results.

And in the [INAUDIBLE] we're controlling the style using [INAUDIBLE]; the first idea is very, very simple. And this kind of style and content control is also widely used, [INAUDIBLE] and well, there are some methods using matrix factorization that can separate the content and the motion part. We borrow that idea here, and we also make it work [INAUDIBLE] the whole pipeline.

Okay, the next thing I have to- >> I have one question.

>> Sure. >> So for S, you conjecture that S is encoding style in this factorization, and you can test that, right? So you can put in an input and then actually change the values of S by hand and see whether you get a different- >> Yeah, that's really, yeah, it's really quite true.

Actually this is [INAUDIBLE], for example 0.5 [INAUDIBLE] makes things more humorous. I want to say, in a framework where there is no control like this, that kind of manipulation cannot work. But because we have this separate factor, we can [INAUDIBLE] and do some mixing of styles, and also some really cool stuff, and that's why [INAUDIBLE].

>> For that part, it seems you could perhaps let a person make some adjustments; that would be more interesting.

>> Yeah, I agree, yeah. It would be useful to do so. It also could reflect how much a person wants to control it, I mean, how much [INAUDIBLE] control we want to allow for that, yeah.

Okay so it's fine.

I also want to show another example, and this is actually also work with Xiahou Dun, because last summer we also talked about the caption generation system. You want a deeper understanding of the image content, for example [INAUDIBLE] with an entity recognition approach: he had a really good entity recognition system that could recognize, for example, that this is Obama. So we could say that the person standing there is Obama.

But we can go one step further: we can combine knowledge bases to do some reasoning. Obama is a member of the Democratic Party, and the competitor of the Democratic Party is the Republican Party. And the mascot of the Republican Party is the elephant, so we can come up with a very interesting sentence relating Obama and the Republican mascot.
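
As a toy illustration of the kind of knowledge-base hops being described, one could chain a few relations explicitly. The facts, relation names, and the final sentence template below are made up for the example; they are not the speaker's actual system.

# Hand-written toy facts for the Obama example above (illustrative only).
knowledge_base = {
    ("Barack Obama", "member_of"): "Democratic Party",
    ("Democratic Party", "competitor_of"): "Republican Party",
    ("Republican Party", "mascot"): "elephant",
}

def hop(entity, relation):
    # Follow one relation edge, or return None if the fact is missing.
    return knowledge_base.get((entity, relation))

party = hop("Barack Obama", "member_of")
rival = hop(party, "competitor_of")
mascot = hop(rival, "mascot")
print(f"The person is standing next to the mascot of the {rival}: an {mascot}.")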

So this is reasoning on top of recognition; [INAUDIBLE], but if we could do this in general, we could generate much more meaningful content.

Okay, in the last part I will introduce my most recent ICCV paper, which is about how we can best jointly model vision and language through a new type of visual question segmentation task.

The problem we want to solve is this: we have an image and we have a question. I think most of you are very familiar with the traditional visual question answering setting: you get a question about the image and you generate an answer about the image. And people will argue that giving a credible answer is a very important step, because you have to both understand the image content and also handle the natural language. So it's a very significant step for the joint modelling of vision and language.

However, we argue that only providing the text answer is not enough to show understanding of the image content. So in addition to the text answer, we should also provide the visual evidence, the visual segment, that supports the answer to the question.

So for example, for the second image, the question is [INAUDIBLE]; if you just answer yes or no you actually have a 50% chance to get the right answer, right? But no one knows whether you fully understood the image, [INAUDIBLE]. And just from the answer, you also do not know whether the machine understood why the answer is yes or no. But if the machine can also localize the answer in the visual, it definitely understands the image content. And so, we propose that, yeah.

>> So, I've always had a little bit of trouble with the visual question answering problem, with the particular setting where you provide the data and you provide the question, and then you ask the system to generate the answer. Isn't it even more important to be able to generate the- >> Yeah, yeah. >> The question. This question setting seems a bit artificial, for example.

if you're just understanding this print, 90% will be correct.

And for some criteria,

I think the time recording is also another argument for why we

want to do the visual is that if we only trust on language and

answer parody, we get a logarithm of good performance,

and it's not the implement when you look at the images.

So that's how I decide this is the reason that should be

different from the quantization,

which would put them all property of the images.

Concurrently with that work, there is another work [INAUDIBLE] where they try very hard to balance the questions so that the language prior alone cannot generate the right answer. But we thought about it in a different way: you should also look at the visual evidence. I think it's another very natural way to evaluate, to highlight the importance of image content analysis for this task.

And we show that this can benefit two fundamental tasks. The first is that we can enable a type of question-focused semantic segmentation.

That is, given a question, you segment the region of the image that answers that question. And because the task is driven by the question, the question can include many clues and can unify many image tasks: a "what is the object" question is more like the object detection problem in computer vision, and a "how many" question is more like semantic segmentation or instance segmentation.

And "what is the man doing" is action recognition in images, so there is also some action [INAUDIBLE], and also something about reasoning: for "is he married," you can localize the ring, which means he's married; for a question about where he is shopping, you can localize maybe the McDonald's sign. So this is the first task.

Second, we also show that, with this segmentation mask,

we can do supervised attention for VQA.

It's that, during the trip,

because we the people during the attention is more like

black balls, that they will pressure, and

the machine will automatically obtain different region.

And also, there are many paper that it show,

the image is very reliable.

And here we show that if we can truly, I mean,

with the question, region, and notation and

the outside parallels, we teach the machine where to see when we

turn the molecule and

also significantly improve the performance.
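
One plausible way to read the supervision step described here is as an extra loss that pulls the model's attention map toward the annotated answer mask. The sketch below is a guess at that general recipe; the names, normalization, and weighting are assumptions, not the paper's exact formulation.

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def attention_supervision_loss(attn_logits, answer_mask, eps=1e-8):
    # attn_logits: model scores over the H*W grid cells, flattened.
    # answer_mask: binary segmentation mask over the same cells (1 = answer region).
    attn = softmax(attn_logits)                       # predicted attention distribution
    target = answer_mask / (answer_mask.sum() + eps)  # mask normalized to a distribution
    return -np.sum(target * np.log(attn + eps))       # cross-entropy to the mask

# Hypothetical usage: total_loss = answer_loss + lam * attention_supervision_loss(...)
logits = np.random.randn(14 * 14)
mask = np.zeros(14 * 14)
mask[30:40] = 1.0
loss = attention_supervision_loss(logits, mask)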

Here I want to show you some results. First, the results on the segmentation task. From the first table I want to tell you that the best method we propose is still far away from a satisfactory result. This means that pixel-level image understanding still has a long way to go and is still very, very hard; we need more effort in this domain. In the second part we show that we only annotated about 10% of the original VQA dataset with the segmentation masks, and even with that you can improve the performance [INAUDIBLE].

This means that if we can teach a machine to see clearly, it will do better at reasoning and at answering the question. So I think it's a good start: teach the machine where to look, so it can first understand the image content correctly. Yeah?

>> When you analyze the query,

the textual query,

do you take in the entire question as a single entity?

Or do you break it up into sub-queries and do different image processing tasks based on what the sub-query is?

>> Just reading that section.

[CROSSTALK] >> Verify for the answer.

>> So if I ask, like, where are the sheep in the picture, versus where are the sheep and the cows in a single picture. Would you split that up into two queries, or would the machine try to handle that as a single query with two options?

>> I think it's the second one. We try encoding the whole question, using a word embedding for each word and averaging them. And we also try encoding the sentence into one single vector with an encoder; we do not decompose the query explicitly. But I agree with you that there is an open problem here, how we can understand and parse the language; I think that is worth doing. For now, it's probably all just encoded in one single query vector.
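
To make the two encodings just mentioned concrete, here is a tiny sketch of the averaging option; the vocabulary and random vectors are stand-ins for real pretrained word embeddings, so this only shows the mechanics, not the system's actual encoder.

import numpy as np

# Stand-in word vectors (in practice these would be pretrained embeddings).
rng = np.random.default_rng(0)
word_vectors = {w: rng.standard_normal(50)
                for w in ["where", "are", "the", "sheep", "and", "cows"]}

def encode_by_averaging(question):
    # One vector for the whole question: the mean of its word vectors.
    vecs = [word_vectors[w] for w in question.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

q = encode_by_averaging("Where are the sheep and the cows")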

>> The way that you would attack this as a database problem, you sort of generate operators from the text, and you create almost like a query plan for the question, so you could think of an image processing plan that's formulated the same way.

>> Yeah, I think it definitely needs more research. There are things we can do on the language part, and also many things we can do on the image part, so that's why I say it's a very interesting direction, and anyone can try this. [LAUGH] Yeah.

Okay I think, okay.

>> So you mentioned questions like how many [INAUDIBLE] there are; is it also possible to ask about measurements or dimensions in the image? So for example, what is the angle of the left part, things like that? Is that something that's part of the dataset?

>> So I think your question is about answer ambiguity. Because, for example, you ask this question for this image: what's the natural answer? [INAUDIBLE] the question is about the tower, but the answer is a person. So yeah, when we labeled this dataset we definitely found that there is a lot of this ambiguity.

So we defined the rule that we only let the annotators annotate the answer part instead of the question part; you only need to annotate the answer region, to reduce this confusion. And also during the labeling task, to answer the question, the annotator can mark some region as the particular answer region. We also gave some options, so they may say you need to see the whole image to answer, or that they are unsure, [INAUDIBLE].

So we [INAUDIBLE], and we want the dataset to be cleaner, and we also focus more on questions where the answer region and the relationships are clear. So for cases below certain criteria, we deleted them during the data collection. Yeah.

Okay, I think I should conclude my talk. The first thing I want to recognize is that my work is one of the first [INAUDIBLE]. We also explored a new research direction of how we can utilize web images to do video recognition. We also proposed a new method connecting knowledge bases to open-vocabulary video retrieval, and we took it a step further to video captioning. And finally, we collected a new visual question segmentation dataset to facilitate better joint modelling of vision and language.

In the future I'm very interested in the following directions. The first one is about using meta-learning to design the neural network architecture for video analysis, because, as you know, currently the neural network architectures are mostly designed by human experts. And we want to work on this: how can we use one network to improve another network, learning neural networks with neural networks.

And I think currently there is a misunderstanding [INAUDIBLE] that to do deep learning you only need to feed in the data, it will generally work well, and it will be very cool.

Secondly, I also want to do more work on [INAUDIBLE] understanding, because current video analysis is still limited to [INAUDIBLE], and we should go deeper into understanding what's contained in the video content, what the relationships are. We would also want to understand the [INAUDIBLE] in the video and the connections, a deeper understanding [INAUDIBLE].

For example, when we watch a video, how can we imitate the action performed in the video, like learning to imitate the action. It is also very interesting to use generative models for videos, for example for future prediction. And then lastly, I also want to treat video as a knowledge base, and see how we can leverage the temporal continuity in video for learning, [INAUDIBLE]. So for [INAUDIBLE] learning, I think there is a lot of work we can explore here.

Okay, I think that's all for my talk. I'm very happy you're here to listen to my talk.

>> [APPLAUSE] >> We're almost out of time, but

in case you have any questions it seems we can still help.

>> So when you choose which images to let through for retraining, do you have a different parameter setting?

>> Parameter setting, which part?

Do you mean- >> In the link to exceed part.

>> What?

>> Link to exceed?

>> Maybe you can repeat?

Yeah so- >> This supervised approach? >> No, when we find too many of them, >> Yeah when you do that- >> We do not retrain the network from scratch. We just put the data in to further fine-tune for that one.

>> Yeah, yeah, when you're retraining the- >> They're- >> When you do that, are you choosing the images where you have the most confidence? >> Yes. >> If you had to reject some, for every batch do you have the same confidence threshold? >> Yeah, actually we don't consider that in our case. We just try to remove the noise here.

>> Okay. >> So of all the work

that you have presented, which one do you think is closest to

taking out of the lab- >> What?

>> To the real world?

>> Sorry, can you say it again?

>> Which particular work will be ready to ship, be-

>> To ship the product?

Yea, I think that for the redirect data for the learning,

it's already in the Google search engine,

inside to how many hours that it's already translated for

the publisher write the book,

you'll say that to a lot of video typing using my network.

>> I see.

>> And also for the focal learning and

also they worked on the video per my insistence.

>> [INAUDIBLE] >> Okay. >> [INAUDIBLE]

For more infomation >> Video Understanding: From Tags to Language - Duration: 58:29.

-------------------------------------------

Cars 3 Lightning McQueen and Sally Carrera Coloring Book Video for Kids - Duration: 3:04.

Cars 3 Lightning McQueen and Sally Carrera Coloring Book Video for Kids
