>> Okay.
Hi, everyone.
Welcome to today's session.
New roads to follow, supporting
and advocating for data
visualization in the library.
I'm the academic coordinator at
NNLM and I'm pleased to
introduce Ben Hoover. He's
currently the access services and
instruction librarian at the
health sciences library research
learning commons at Penn State
University.
He obtained a bachelor's in
Humanities from Harrisburg in
2009 and graduated from the
University of Pittsburgh School
of Information Sciences with a
master's in library and
information science in 2011.
Ben attended the data
visualization institute in the
spring of 2016.
A couple of announcements.
All attendees are eligible for
one MLA CE for the webinar.
The certificate will be
accessible after you take the
survey, which I'll send out at the
conclusion of the presentation.
A reminder that your feedback is
critical as we develop NNLM
programming in the future.
I'm happy to announce that if
you're a librarian in the mid
Atlantic region, Delaware, New
Jersey, New York and
Pennsylvania and accepted to the
data science visualization
institute for librarians, which
will be held April 24th
through 28th at NC state, we are
offering two special
professional development awards
to support the $2,500
registration fee.
If this is you, contact your --
by January 27th at
RBARGER@edu.
This applies whether you have
submitted an application or
intend to.
Awards are given on a
first-come, first-served basis after
notification of acceptance.
With that, Ben, take it away.
>> Thank you for attending my
talk.
My name is Ben Hoover, I'm the
access services and instruction
librarian at the Harrell Health
Sciences Library.
Today I will share with you some
of my experiences in learning
and supporting data
visualization in the library.
I will go through the
presentation and leave time at
the end for taking questions.
So this is a simple data
visualization in Tableau.
This is me playing with data
that we collect here in the
library.
Basic charts and graphs are
visualizations too, used for the
creation of newfound insights and
the search for already found
knowledge in data sets small and
large.
First I will go over one
definition of data visualization.
According to the Interaction
Design Foundation, data
visualization is the graphical
display of abstract information
for two purposes:
sense-making, that is, finding
insights or research
visualization, and communication.
I like this definition as it
pulls out the two reasons to
create a data visualization: to
communicate findings or to find new
insights such as patterns or
outliers in data sets.
I won't go into data analysis
but data visualization is
another method for exploring and
analyzing data without using
statistical methods or along
with statistical methods.
The infographic here shows the
companies or programs that
involve data, including
applications for data
visualization. This is not all of
them, just a smattering as an example.
Data visualization is one facet
of the life cycle that all these
companies are part of.
And many programs in the list
are able to support more than
one step in the data life cycle,
can be used for analysis and
data visualization.
And also there's the data life
cycle, and I have a link there.
I will send out slides to
everybody at the end, but I can pull
it up here real quick.
This is just one guide for the
data life cycle, and as you can
see, analyzing data and data
visualization fall under here,
and it also falls into
communication as well.
So today I will cover four
themes in the talk.
I will dig into library role in
data visualization, then I will
give an overview of the NC state
data visualization -- data
science visualization institute
for librarians, I will also
share my big take aways from the
institute.
Finally I will cover what I have
done and plan to do to
implement a data visualization
service at the Penn State
College of Medicine.
So let's start with the
library's role in data
visualization. For me this
started during a webinar
probably a little more than a
year ago about a data
visualization support service
through the NIH library.
The librarians there chose to
support their researchers with R
as it was the most widely used
at their institution and I
became interested in the idea of,
one, the library supporting this
aspect of the research process,
and also I found R to be
interesting as a completely free
program
that can do a lot of
visualizations.
And our library already supports
data management and
bioinformatics.
But we found no
formal on-campus support for
data visualization best
practices or practical support
for visualization tools. We
perceived this as a need, and the
library would be the best fit to
support data visualization for
research and scholarly
communication.
Data visualization is after all
another aspect of the research
process.
Plus, and this is my personal
thinking, good data
visualizations are easier to
understand and have less chance of
being misinterpreted, and may make
complex research more
accessible not just to other
researchers but also to science
reporters and maybe the
public.
Big data and lots of data are a
reality of our day.
Even though big data is a bit of
a cliche term at this point like
the web 2.0 thing.
Large data sets continue to get
larger and they're more complex
and there's also more data in
general.
So typically more complex data
sets can be more approachable
using an exploratory data
visualization to look at
different facets of a data set
that has lots of different
variables.
And that leads to dashboards,
which can be almost a third form
of data visualization.
Dashboards, if not
already very important, will be
very important for a lot of
different institutions.
Libraries are already starting
to adopt dashboards, going by the
assessment conference I went to,
but hospitals and academia
are also building dashboards to
collect all their data in one place
and do assessment
of their institution.
The public is interacting more
with research presented through
infographics online.
Visualizations have the potential
of bridging the gap
between expert and layman.
So I was then asked by library
administration here if I would
be interested in attending the
NC State library data science
and visualization institute for
librarians. I agreed, and I went.
The institute is an intensive
five-day, nine-to-five program, and
it is well worth the time
and the investment.
I recommend attending if you are
planning to support data
visualization at your institution.
One of the things I found
interesting, and this is an aside,
was seeing
librarians from different higher
education and health sciences
libraries,
coming from different sectors:
social sciences, medical
sciences, mathematics,
statistics, all of those. It was
interesting to stick all these
librarians in a room and have us
learn so many different software
tools.
Again, there were so many things
we went over, no one was an
expert on everything so we
bonded in this intense training
experience.
In the training we explored
first the tools, the many tools
for data visualization, the
principles of data
visualization, best practices of
data visualization, and what I'm
calling the connective pieces of
what needs to happen before and
after data visualization.
Data visualization as a field is
an intersection of many
disciplines.
It is interdisciplinary.
This slide has a few examples, and I
happen to be an expert in none
of these fields.
However, I believe it can be an
advantage. I'm approaching data
visualization not as an expert in
one of the disciplines that
directly affect it, so I can
actually just approach data
visualization as its own area of
expertise and not as an
extension of another discipline.
I can focus on best practices
and learning tools for
better data viz.
On the other hand it's been a
struggle, as I have to continue
to learn more about how the
disciplines at my institution
use visualization.
And I feel like I'm going
to be learning a lot, and it's
fine. Data visualization is an
amazing subject. I'm trying
to learn R, and have been for
months. I have the basics down,
but some of these tools can be a
long-term investment of time.
So next I will talk about tools.
We went over an exhaustive
number of tools, premium and
free. This included
quantitative tools, which most of
them were, and some qualitative
tools.
Some examples: we started with
Excel, then R, Gephi, Stata, SPSS,
NVivo. There were many more.
While we got to try these tools,
and it was important that we did,
it was also nice to see common themes
among them. When you approach a
program there's usually a
standard interface, whether it's
command line
or GUI.
Many of these tools do more than
visualize data, and are involved
in many aspects of the data life
cycle.
We spent half a day covering
best practices in data
management and data
manipulation in Excel.
More researchers use Excel than
any other visualization tool
because it's available and has a
low barrier to entry.
It's also a great way to ruin a
visualization if you don't know
what you're doing.
Excel is a great tool for
cleaning data, but this tool is
not good for doing complex
visualizations, versioning, or data
archiving. It's not a safe place to
store data, as an Excel file can be
easily changed, ruining your data
set.
So that was one thing we learned
about that tool.
Most of these software packages,
free or paid, are going to be
able to do some type of charts,
graphs, maps, and/or network
visualizations, but almost none
of them will be the best tool
for all your visualizations.
Each will have a set
they're better at. So typically,
if you have a visualization
you want, you need to know what
data it requires; if you have a data
set, you need to know what
visualization will work with it;
or if you have a specific
program, what kind of data can
be used with it.
The two tools that came closest to
being universal are R and
Python.
They are both free
command
line programs that have
packages available for creating
different visualizations.
These programs have high barriers
to entry. I would say a
close third is Tableau. It has a
GUI, a point and
click interface, and they are
really trying hard to be a one
stop shop for your data
visualization. However, it's
not as customizable yet and it
is a paid subscription.
I don't know the price but I
know it's not free.
Each of these tools is great
for visualizations, but they
also play roles in analysis or
versioning or data management as
well.
It's important to understand
where the software fits into the
data life cycle. I want to mention
that again.
So next, we're on to principles.
Again we went over different
definitions of data
visualization.
Each discipline has kind of a
different approach to it.
The one I showed you is kind of
the one I like but you can look
it up; everybody has a different
spin on data viz.
So we learned principles of
data visualization, including
types of data visualization.
Infographics are
great for communicating known
insights or findings.
These are many times publicly
available, not always.
Research or exploratory
visualizations are used by
researchers, or created for
anybody to look at if they're
open, to explore a data set.
These visualizations let a
data set be reused for different
questions.
Then we learned about dashboards
as well, which are exploratory
visualizations for assessment or
ongoing research.
They have many facets, usually with
customizable
choices for limiting values such
as time. Dashboards are great
tools for assessment, usually in
place for long periods of time,
so year after year after year
you can access data if it's
pulling it in.
Best practices.
We learned quite a few of the
basic rules, or kind of the
philosophy, of data visualization:
simple design, low cognitive
load, the idea of the right
chart for the right data, and
avoiding distortion.
So we discussed using simple,
clear visualizations and reducing
cognitive load.
The goal is to be easy to
understand while not having to
process too much or have
background knowledge on the
topic.
It is important to reduce
noise in graphs and remove any
unneeded marks.
A fancy visualization is not
always the best visualization.
You want a visualization that
is just as simple as possible.
I have a 3-D pie chart here,
a bad example of how not to
visualize data.
Humans struggle to process area,
especially across
perspective. Pie charts
are great for approximate
information on up to four
variables.
This has five.
A pie chart is good for
understanding that, say, variable A is
bigger than variable B.
But it's not good for granular
or incremental differences;
the difference between purple
and red might be hard to figure
out on this graph.
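
The rule of thumb above (a pie chart for up to four categories, something else beyond that) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the color labels, and the percentages are hypothetical and are not taken from the slide. Only the four-slice threshold comes from the talk.

```python
MAX_PIE_SLICES = 4  # rule of thumb stated in the talk, not a hard standard


def suggest_chart(values):
    """Return 'pie' for up to four categories, otherwise 'bar'."""
    return "pie" if len(values) <= MAX_PIE_SLICES else "bar"


def text_bars(values, width=40):
    """Render a sorted horizontal bar chart as a list of plain-text lines."""
    largest = max(values.values())
    lines = []
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    for label, value in ordered:
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<8}{bar} {value}")
    return lines


# Hypothetical shares (percent) for five categories, as on a five-slice pie.
shares = {"blue": 30, "green": 25, "purple": 16, "red": 15, "orange": 14}

print(suggest_chart(shares))
for line in text_bars(shares):
    print(line)
```

On aligned bars the one-point gap between the hypothetical purple and red values shows up as a visible difference in length, which is the kind of small difference a pie slice tends to hide.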
We spent time identifying
accidental or malicious chart
distortion, including overstating
a change in data by messing
with variables such as the
minimum or maximum on the X
or Y axis.
So you can make it look like, oh, we
had a huge drop in sales, but
look at it over the long term and
it's a slow steady decrease.
Same thing with, say, money
earned: you can have it look
like you're not making
any money because you just made your
Y axis way too high.
It's usually accidental but it
could be malicious if you wanted
it to be.
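
The axis effect described above can be put into numbers with a small Python sketch. It assumes a simple linear axis, and the sales figures and axis limits are made up for illustration. The function name `visual_change` is hypothetical as well. It reports what fraction of the plot height a change occupies for a given axis range.

```python
def visual_change(start, end, axis_min, axis_max):
    """Fraction of the plot height spanned by the change from start to end."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(end - start) / (axis_max - axis_min)


# Hypothetical figures: sales fall from 100 to 95, a 5 percent decline.
sales_start, sales_end = 100.0, 95.0

# Truncated axis: the drop fills most of the chart and looks dramatic.
truncated = visual_change(sales_start, sales_end, axis_min=94.0, axis_max=101.0)

# Zero-based axis: the same drop is a small step.
zero_based = visual_change(sales_start, sales_end, axis_min=0.0, axis_max=101.0)

# Maximum set far too high: the change all but disappears.
too_tall = visual_change(sales_start, sales_end, axis_min=0.0, axis_max=1000.0)

print(f"truncated axis:  {truncated:.0%} of plot height")
print(f"zero-based axis: {zero_based:.0%} of plot height")
print(f"oversized axis:  {too_tall:.1%} of plot height")
```

The data are identical in all three cases. Only the axis limits change, which is why checking the axis minimum and maximum is a quick first test for distortion.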
Finally data visualization
stands on clean reproducible
data.
That leads into the next slide: so
what is clean data?
As I said before,
visualization is one facet
of the data life cycle. Data
collection includes keeping
records of data collection and
creating a data book for
understanding the data sets and
field labels for later reuse or
reproducibility.
Clean data is data that has been
manipulated in a good way or
formatted for use in analysis, be
it statistics or visualization.
When formatting data, pay attention
to what program you use to
visualize, and format to meet the
requirements of the program.
You're going to have a data set
you need to make fit into the
parameters of the visualization, so
you want to make sure you
know what the parameters are
before you manipulate your data.
Versioning tools such as GitHub
are essential for
reproducibility, or researcher
sanity. First developed for
programmers, GitHub is a free
tool that allows versioning of
any file type for one or more users,
making it possible to retrace your
steps, and to work in parallel
through forks,
which is where you
each literally get a version
you can make your own changes to and
then combine them later, or
you can just see what the other
person did at the same time.
Clean data requires you to
search for and fix input
mistakes in your data set,
as well as formatting problems and
outliers that create
distortion in your
visualization.
One example can be values
entered for a survey answer of
not applicable. With choices like
yes, no, not applicable, whether
you're collecting
electronically or inputting paper
into a program or whatever,
everything is marked NA, so now
you have letters in a numeric
data set, and that throws off the
data visualization.
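
A minimal Python sketch of the NA problem just described, using only the standard library. The survey responses, the list of missing-value markers, and the function name `to_number` are all hypothetical. The idea is to turn the text markers into an explicit missing value before any numbers are charted or averaged.

```python
# Markers treated as "no answer"; this list is an assumption, adjust per survey.
MISSING_MARKERS = {"", "na", "n/a", "not applicable"}


def to_number(raw):
    """Convert a raw survey cell to float, or None when it marks a missing value."""
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return float(text)


# Hypothetical column of 1-to-5 ratings as typed in from paper forms.
responses = ["4", "5", "NA", "3", "N/A", " 2 "]

cleaned = [to_number(cell) for cell in responses]

# Only answered questions go into the statistic or the chart.
answered = [value for value in cleaned if value is not None]
mean_score = sum(answered) / len(answered)

print(cleaned)
print(mean_score)
```

Keeping the missing entries as `None` instead of deleting them preserves the row count, so the number of not-applicable answers can still be reported alongside the chart.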
Then finally, data must be stored
and accessible in some form
through a data repository. These
days publishers, universities,
institutes and the government
are all getting into the data
repository business.
It's important to know where is
your data going to be in the
end.
These are all the connective
pieces that -- you should be
aware of at your institution.
I'm going to now talk about my
big takeaways
from the institute. One is that
there are more tools out
there than you could possibly ever
want to use, paid or free, doesn't
matter, and there are new
companies and new programs
starting seemingly every day.
So it's really important: you
don't need to know how to use
them all, but you may want to
stay kind of up on whether there's a
new program that's popular
across an entire discipline
that maybe you should look at
learning, if it's a discipline
you support through a liaison
program or the library.
Data visualization is already
everywhere,
and it will only grow in
importance as the mass of data
in all sciences increases.
Whether it's small data sets that
there are
a lot of, or one gigantic data
set, data visualization is more
and more important
if we're to get through all this
data and make it easily
digestible by humans.
Also, just do it. That was one of
the things I found before I went
to the institute, dabbling with
-- I downloaded it, it was free, and I
started going and playing with
it.
I was amazed what you could do with
a few packages and a free data
set that you got from the
government.
So don't be afraid to engage in
visualization.
I'm going to talk later about
what I have done to start a
service here at the College of
Medicine, but even before that I
had multiple faculty engage me
about what do we
do now and what do they do, which
gave me
some ideas of parts of the service
that I could do already.
It's really -- it's
kind of scary at first, but once you
get your head around it, it's
not that bad.
It's a service you should
support through a library.
Finally, good data viz is
not easy to create, and it is a
fascinating field of study.
All right.
So next I will talk
about starting a program at the
Penn State College of Medicine
medical center for
visualization.
This is a panoramic picture of
the technology innovation sandbox
at the health sciences library.
As for my experience,
I did the week-long data
viz institute, but also I
have been learning a
couple of these programs. While
that was going on,
we had a renovation going on, so
we were able
at the library to bake into the
renovation high-end computing and
software licenses for software
being used by researchers. We
got a video wall and we use that
for doing training for data
visualization.
I was able to offer workshops in
that space, and I am keeping drop-in
hours in the sandbox, so if
somebody has a data viz
question they can come in, sit down
and talk to me. I make myself
available, like data viz
office hours.
Going on to what I have done,
what I formally did is we
evaluated what tools researchers
were using, and we purchased software
and made it freely available in the
library
for them. The ones we are
supporting right now are
Excel, SPSS, R and SAS, and we
have an educational copy of
Tableau.
One of the first things I did
after the data viz
institute was participate in
the National Day of Making event,
which is a national event on
June 17th. It will happen
again this year. It was to show
off the potential. What I did is
I threw up data visualizations and
showed off free software that
anybody could use.
We were in the cafeteria of our
institution, so I used it to
engage with some researchers and
start a conversation.
I am presenting a webinar on
data viz basics on
January 26. That's only in our
institution, and it's a half
hour webinar to engage with
researchers and raise awareness of
data visualization and good data
visualization.
Just talking to researchers and
students: we have an array of
students here, and they have
different research needs. Just
from talking to them, that was
where we made our choices on our
software purchases.
I talked about starting drop-in
hours. I am still learning R; I
have the basics of R. I am trying
to start a hangout group or R
support group.
There is a minority of
researchers here using R but
they're very enthusiastic.
We can grow a user base and kind
of a network of R users on
campus, so I'm trying to do that.
I will and have started pitching
my evolving skills to my liaison
departments; I'm liaison to five
departments in the College of
Medicine.
And so I always am willing to
consult with them as well.
That is the conclusion of my
presentation.
And I will leave you with this.
Another simple data viz, me
messing around in Tableau. We got
it at the end of the year.
If you have any questions, now
is the time, thank you for
listening to my presentation.
>> Thank you, Ben.
That was a ton of food for
thought.
Anyone who has a question, feel
free to throw questions into
the chat box and we can address
them to Ben.
He's happy to answer questions,
so now is a great time.
>> I'll start with the first
one.
What are good ways to train if
our directors won't support
sending us to the institute?
Well, I would say one thing I do
a lot of: we have access to
Lynda.com.
That's how I have
been learning R, through Lynda.com.
Because one thing I will say
about the institute is you're
going to go over so many tools,
you pretty much just kind of
learn the parameters of the
tools. You will do one or two
examples for each.
So you're not actually going to
learn everything; you're
going to get an idea about each
one of the tools.
So when it comes to training for
a particular program, I would
say there's Lynda.com, and I know
there are free trainings, maybe on
Coursera.
I have seen some data viz
courses come through these kinds
of MOOCs.
Uh-oh.
Wow.
So I don't know if that answered
your question but hope it
helped.
You can always train yourself.
From Hannah.
The institute is at NC State. I can
go back and show you the link.
>> Okay.
I can go back and -- all right.
Do I have a lot of faculty
participation?
I would say at this point we
have just finished our
renovation, I have talked to a
handful of faculty.
I'm going to try to use them
like my champions if you will or
-- like my very excited people
who were proactively searching
me out.
So I'm going to try to use them
to spread the word to the
departments.
So I don't have a ton of faculty
participation yet but I'm hoping
that it continues to grow.
National day mentioned? The day of
making -- the day of making is
every June -- that might be what
you're talking about. The day of
making is June 17th.
We grabbed on to it as a library
because
we also support 3-D printing,
3-D modeling, data
visualization, and I think we have
some other stuff
we were doing on the day of
making.
We just kind of set up shop
in the cafeteria for about three
hours and we had people coming
in and stopping by.
I think
radiology had a 3-D MRI
-- a machine called an
EchoPixel, so they were up with
us as well.
We didn't have 3-D
printers yet, so somebody
from the simulation center here
brought in their 3-D
printers, so it was kind of an
interdepartmental event.
How many people have come to your
data drop-in sessions so far?
I have had exactly two.
One literally saw me doing
stuff and came in. We're not
working on something yet but
they were like, wow, I'd love to do
that, and I was like, I would love to
help you do that.
The service started
this month.
We ended renovation at
the very end of the year, so I
have been ramping stuff up this
month.
Recent projects?
I have been playing with library
data so far.
I showed you the Tableau.
I have been doing stuff in
Gephi. I have it in the sandbox,
but I didn't save it to my M
drive. So I used Gephi, but
mostly dabbling. I don't have
visualizations on me; those are
at home on my other PC.
Sorry.
Are you considering virtual
reality?
We actually have -- this is from
NLM SIS data 15.
Are you considering virtual
reality as tools for data
visualization?
Possibly.
We do have an HTC Vive that I think
our IT people, Ryan Klinger in
particular, are trying to get up
and running in that tech sandbox.
We talked about use cases; that
is something we talked about
especially for VR, but
I think right now the first thing
they are looking at is maybe
using it in simulation,
medical simulation.
But it has come up.
Tests follow-up.
What is the best way you found
to promote drop-in sessions?
Pretty much what I have been
doing so far is word of mouth
and just being there.
We are kind of definitely at the
center of the College of
Medicine, considering the
crossover of the College of Medicine
and the hospital as well.
So I have been doing word of
mouth, and I'm probably going to do
advertisements. We have all these
digital bulletin boards. I'll also
put the drop-in hours on our
website to make them
available.
So far it's word of mouth and
having a few people come in.
From Hannah.
Pretty much the two questions I
have dealt with from the
drop-ins: one was, I want to be
able to do that.
There are programs, and we have
some of them here, that can do
the visualization that you're
looking at, which is an example
data viz request.
The other researcher was the
same thing. I think at this
point in my service people are
interested in upping the game on
their data visualization.
So I haven't had a specific
question I worked on yet. It's
mostly been raising awareness
and showing off what we have for
people to use.
From Robin.
Educational license for Tableau:
was the library not advocating
for it?
What we have is a teaching copy
of Tableau, which is much less
expensive than a working copy of
it.
So it's only
in the library, and we
only have it in the sandbox
right now. We think it's a great
tool. As I mentioned earlier, I was
actually at an assessment
conference this fall, and it was
like Tableau, Tableau, Tableau.
We're experimenting with it and
going to try to advocate for it
because it's a powerful tool.
From Melissa.
Was there a needs assessment
done to determine the service
was needed before the
renovation?
Yeah, again, by word of mouth,
through liaison departments:
there was interest in data
visualization, and there was no
service.
We have an approach
of asking, what do researchers have
for support?
Just first looking
around, they have high-end
computing support, they have
cores for that, they have
statistical support, but there's no
visualization support, and when
we brought it up to liaison
departments they were
interested.
Yes.
So let's say it wasn't the most
formal thing, but we did check
around and ask about it, yeah.
So any more questions?
>> We'll see if anyone else has
any further questions, but in the
meantime I'll remind folks that
applications for the data science
and visualization institute for
librarians are due on the
27th, I believe, of this
month.
If you missed the announcement
from the beginning, NNLM MAR is able
to offer two special
professional development awards
to support the $2,500
registration fee.
If folks are interested, feel
free to send me an email at UJV
at PIT.EDU or RBARGER@PIT.EDU.
They're given on a first-come,
first-served
basis after acceptance.
And we're thrilled to support
librarians going to this
institute and learning all these
overwhelming and incredible
resources.
>> Yes.
I will answer this question,
somebody else as well.
From Sara: what is the
difference between infographics and
data visualization? An
infographic is a type of
visualization. Infographics
typically are about just
communicating your findings.
They're typically interactive,
but you can do a search. Usually
there are sites like Tableau's;
they have daily or weekly
infographics. They did smoking
rates across the United States,
or like different habits. They
do show up. I
read a news aggregator
called Digg and there's usually
an infographic every day, an
interactive data visualization.
So it is a type of data
visualization.
One big insight, which I
probably mentioned a couple
of times, is that for me, I
thought data visualization was
all about infographics. When
we watched the NIH webinar, it
was amazing: this director is
supporting researchers. I wanted
to be able to do that, and it's
more about the exploratory
aspect of data visualization
that I had not seen before. It
is both transmitting your
information and also exploring,
say, your research, trying to find
insights: communicating your
insights or finding insights.
>> And last month MAR hosted one
of the librarians here at the
university health sciences
library system for a webinar
about infographics, to walk
through what infographics are
and demonstrate Piktochart, a
freely available resource. I
will put the link for that
course in the chat box. If folks
want to dig into that, it is
available to view at any time.
I don't see any further
questions then.
>> Okay.
>> We will wrap up a bit early,
make sure you fill out the
survey if you are interested in
MLA CE, and also give us feedback
to give us some ideas for
future programming.
Thank you again, Ben.
>> No problem.
Thank you, everybody, for
attending.
>> Have a great afternoon,
everyone.