Visualization for People + Systems

>>Welcome, everybody, and those online as well. It's my pleasure to be introducing Dominik Moritz here today. Dominik Moritz is one of the stars in this area who's graduating right now. Dominik spent two summers interning with us, so he's fairly well known [inaudible], but he's done a lot of interesting work since then, so he'll be presenting some of that. He's currently finishing up his PhD at UW with Jeff Heer, and I think before that you did your undergraduate at Hasso Plattner [inaudible], awesome. For those attending remotely, send me questions online; I'm not going to be running Teams, but I'll be monitoring my e-mail, so you can ask questions that way. So without further ado, welcome.>>All right, thanks
Steve. So this talk is about visualization and how we can improve the way we create and think about visualizations. To illustrate what I've done, let's look at how we usually make visualizations. This is how a prototypical data scientist would make a chart, say here for the precipitation in Seattle throughout the year. First, they would start by importing some library such as matplotlib, then read in the dataset and filter to only Seattle, because that's what we want to look at. Then extract the month and the year from the data, aggregate it, and then flatten it again. Finally, initialize the chart, draw the line for the chart, and once we've done that, we can start setting up the title, axes, and so on, and finally show the chart.
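The data-processing half of that workflow can be sketched with the standard library alone (the records and field names here are hypothetical; a real script would go on to plot with matplotlib):

```python
from collections import defaultdict

# Hypothetical weather records: (location, date "YYYY-MM-DD", precipitation)
rows = [
    ("Seattle", "2015-01-03", 12.7),
    ("Seattle", "2015-01-19", 8.1),
    ("Seattle", "2015-02-02", 6.3),
    ("New York", "2015-01-05", 4.0),
]

# Filter to only Seattle, since that's what we want to look at
seattle = [r for r in rows if r[0] == "Seattle"]

# Extract the month, then aggregate (average precipitation per month)
by_month = defaultdict(list)
for _, date, precip in seattle:
    month = int(date.split("-")[1])
    by_month[month].append(precip)

# Flatten into (month, mean) pairs, ready to be drawn
series = sorted((m, sum(v) / len(v)) for m, v in by_month.items())
print(series)
```

Every step, from filtering to grouping to flattening, is interleaved with the eventual rendering code, which is exactly what makes this style hard to reuse or optimize.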
There's a lot to be desired from making visualizations this way, and it becomes particularly obvious when we want to compare the data for Seattle with something like New York City. Now we have to add another groupby
here by the location, and a color map that tells us what the different cities should be represented with. Then we have to draw a line for each city individually, and let's not forget the legend, because otherwise our readers won't know what the colors actually mean. So a small change in what we want to visualize entails changes to the data processing as well as the rendering. Even though we went
through all this effort, the plot that we made in
the end is a static plot. If we wanted to make this chart interactive, we would probably have to rewrite it in something like JavaScript. This was also Python code; in R it would look very different, and the issue is that you would essentially have to rewrite it, because the specification is bound to the low-level details of how the language works. It's also hard to optimize this code, because we're interleaving the specification and the execution. So if you wanted to run the same visualization for all cities in the world, this probably wouldn't scale. The issue here is that
the system doesn’t really have any way to optimize it, because it doesn’t
have any flexibility about how it’s executing things. At the end of the day, we
don’t even know whether the chart that we’ve made
is actually a good design. For instance, we could have forgotten the legend, and the tool doesn't provide any feedback to tell us that we might have forgotten something. The challenge for the designer is that for any good design, like this chart here, there are a number of poor designs: like the one where we're forgetting the legend, or where we're using colors that imply an order in the data that's not really there, or the one where we're using dots instead of lines. Here you have to actively read the chart; your fast object recognition system can't kick in and help you see how these two cities differ. These are just a few examples of many, many other possible designs you might create. So we're faced with this challenge in the visualization
tools that we have. For my PhD I set out to
answer this question. How do we create the next generation
of visualization systems, where users can rapidly
create good designs? They don’t also day rely on visualizations to reveal
patterns in the data, and so it’s important
for the systems to not be in the way of
rapid exploration. At the same time,
good visualization tools should prevent us from making bad design decisions and instead
encourage effective design. Harvard tools that we
have lacked consideration of perceptual principles and
so can’t help us doing that. The scale of data analysts, that analysts is in the work with has far outpaced the
tools that they use. But as the what is drowning
in data we need to have new tools that work at
the required skill. Harvard tools often fail to provide
either of these two things. So the guidance or as
the scalability and the fundamental problem is that
for our visualization tools are designed for manual authoring
like languages like matplotlib or GG plot knows. Good design however, is then
the responsibility of the designer; we teach visualization design through classes, books, or experience, but our tools do not provide any computational guidance and do not really help us with that. So what I've done is to
expand this triangle and figure out ways to design domain-specific languages
where people and systems can meaningfully participate
in the visualization process. So here, systems should help us make better charts, make better design decisions. Similarly, optimizations
to make visualizations scale to large data often rely
on abstracting away the people. They rely on abstracting away what
the user exactly wants to do, and this has been a quite
successful endeavor to abstract away some of the details and just figure out the
optimizations at a high level. But because these tools have very little understanding
of the user’s goals, we’re missing a lot of
opportunities for optimizations. I’m going to show you some examples where I leveraged an understanding of people’s tasks and their capabilities
to inform system design. Here understanding the user
and how they interact with the data has enabled me to
discover new optimizations. So these two research thrusts
have led me to my mission. Which is to develop tools
for data analysis and communication that richly integrate the strengths of
both people and machines. People have human intuition
and they’re the ones who ultimately decide on the value that it derived
from an analysis. Machines scale and they reliably
produce the same results. So we want tools that help us
benefit from both of these together. Following this mission, I have developed a number of systems
that I’ll talk about today. Particularly I’m going to talk about four systems that are
roughly in two groups. The first ones are formal models of visualization, where the goal is to provide guidance in tools to help us create better charts. The first thing I'm going to talk about is Vega-Lite, which is a high-level grammar for interactive multi-view graphics. While Vega-Lite is useful in its own right, the motivation was to serve as a representation for tools that generate visualizations, and this won a Best Paper Award in 2016 [inaudible]. Draco then fulfills this goal of building guidance into tools that help us create effective visualizations. At the same time, by figuring out a way to actually build knowledge into tools, we're able to inform our understanding of visualization design. This won the Best Paper Award in 2018 at [inaudible].>>What happened in 2017?>>Hamlin won the Best Paper
Award at [inaudible]. Then on the scalable visualization side, I'm going to show you user-centered optimizations for making visualizations scale to large data. The first system I'm going to talk about is Falcon, which enables interactions at an unprecedented scale; we're going to present that at CHI in a month. When we want to visualize even larger data, we need to resort to approximations, but approximations are approximations, and so there is potential for error. Actually, here at MSR, together with the people from the VIBE group, we approached this problem from a user-experience perspective and looked at how we can make visualizations that are using approximations
more trustworthy. These four systems
are part of my work on visualization tools
for data scientists. I’m happy to talk about any of these other papers and
tools after my talk. For now, I’m going to focus
on these four though, and let’s start talking about these formal
models of visualization. If you have any questions, just feel free to interrupt me and we can talk about them. Okay. So on this side here, our goal is to provide guidance in tools and help people create better charts. What that means is that we want people to be able to create visualizations that are accurate and easily understandable. To do that we have to
apply the best practices and rules of effective visual design. You’ve already seen some
of these examples of choices that you might want to
avoid when creating visualizations. To computationally reason
about this design knowledge, we need to formalize it, and so that’s why there needs to be a formal model of this
design knowledge. This formal model needs to be
expressed in some language. So what we need is some language, some representations to
do that in and this needs to be a convenient yet
powerful representation. This representation
should be declarative, because we want to reason about
design and not the execution; with a declarative specification, we're separating the specification from the execution. The specification should
also be high level with few constructs so that
we can reason about it. The way we’re going to do
that is through Vega-Lite. Once we have this representation, we need to bridge the gap to actual guidance, and we can do that through automated reasoning over this
formal model of design knowledge. So all of this I’ve implemented
in a system called Draco. So this gives you an overview of
where Vega-Lite and Draco fit into this goal of providing guidance. So all we need to start with
is this representation. To talk about this representation, let's talk about how we
actually make visualizations. Usually, we start with some dataset, like this one here about the weather in Seattle and New York. We'll pick some fields from this, like the city, date, and temperature, apply transformations to them, like aggregation and filtering, and finally create the chart. So what we want is
a representation of this process, but we want a representation
with few constructs. We don’t want to enumerate
all possible charts, all possible combinations
that you might do here. So to capture this process using just a small number of
building blocks that compose, we’re building on something
called the Grammar of Graphics. Here, specify the data and transformations that
are operating on this data, and then the visual encoding itself. The way we express visualizations
in the Grammar of Graphics is as a mapping from data to visual properties of a mark
for instance alignment. So exactly for the chart that
you’ve seen before, what we do is, take the city field and map it to the color which then creates
one line for each city, and each of those also
have a different color. We then map the month to the X position and
the average temperature to the Y. So to express this in
the Grammar of Graphics, we need scales which
are functions that map from the data domain
to the visual domain, guides, which are
visualizations of these scales. "Guides" is just an umbrella term for axes and legends, and you can see the axes and legends here. Then finally, the mark, which is the data-representative graphic. You could use points,
lines, or areas, but for this one, we’re using
a line to make this line shape. So these are the
declarative concepts, but they need
an actual implementation, and that is what Vega-Lite is. It's a computational format for the building blocks of visualization. Here's what it looks like. We first specify the data, concretely the weather.csv file, then specify the mark type, and Vega-Lite makes these encodings that describe visualizations very explicit by grouping related properties together. In particular, to make this chart, we have one x encoding that takes the month of the date and maps it to the x position, the average temperature to y, and the city to color.
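Written out, such a specification might look roughly like the following (sketched here as a Python dict so the JSON shape is visible; the field names `date`, `temperature`, and `city` are placeholders for the actual dataset columns):

```python
import json

# A Vega-Lite-style specification, sketched as a plain dict.
# Field names (date, temperature, city) are illustrative placeholders.
spec = {
    "data": {"url": "weather.csv"},
    "mark": "line",
    "encoding": {
        "x": {"timeUnit": "month", "field": "date", "type": "ordinal"},
        "y": {"aggregate": "mean", "field": "temperature",
              "type": "quantitative"},
        "color": {"field": "city", "type": "nominal"},
    },
}
print(json.dumps(spec, indent=2))
```

Notice there is no "line chart" type anywhere in it; the `line` mark composed with three encodings is what yields this particular chart.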
These specifications are concise because all we're doing is composing components rather than enumerating a particular chart type; there's no chart type "line" here. It's a mark type, and together with the encodings,
you get a particular chart. But while it’s concise
it’s also expressive. By composing and encoding marks Vega-Lite supports
an expressive range of graphics including
statistical graphics such as this one here. Yeah.>>So I'm not from this field, so this may be a bit harsh. Is Wilkinson's work considered to be both sufficient and necessary for expressing everything you need to express about these visualizations?>>For statistical graphics, it's pretty [inaudible]. Do I have to repeat questions, by the way? Do I have to repeat the question?>>No. [inaudible].>>Okay. So the question was whether Wilkinson's Grammar of Graphics is necessary and sufficient. It's a pretty general concept that has been applied in both D3 and ggplot2 and has become pretty popular. So I would say we have good indicators that it's necessary and sufficient; there are no formal proofs for it. But we're not the first ones to use it and certainly not the last ones to use it either.>>Actually, they celebrated the 25th anniversary of the publication of the original [inaudible].>>Okay. So you just said we're celebrating the 25th anniversary. It certainly had a lasting impact. So with this, with building
on the Grammar of Graphics, we’re able to express these statistical graphics such
as this one here that shows the temperature for different days
in Seattle for multiple years, as well as the
precipitation as the size and the weather type as the color. This computational format for
the Grammar of Graphics, but we extended it to
also support things like composition to create
these multi-view graphics. We do this through operations
such as concatenation, faceting, layering, and repeating here to show a summary and the raw data
at the same time. So this static plot already
provides a lot of value, but interaction adds
another dimension to interrogate the data even further. Vega-Lite is the first
language to introduce high-level declarative
abstractions to make interactive charts
and so you can make this chart in a specification
that is fairly short. You can specify
interactive charts with the same ease as you can
specify static plots. Well this specification here includes the visualizations and
all of the interactions. If you want to learn
more about Vega-Lite, you can go to our website
where we have documentation, examples, an online
editor and a lot more. The reason why we
created Vega-Lite was for tools that generate
visualizations. The concrete format is
a convenient JSON syntax. JSON is native to the web and easy to generate from
any programming language. Because we designed
Vega-Lite in this way, it started an ecosystem of tools: for instance, the Voyager systems as UI tools, and many other things. There are also a number of bindings
these Voyager systems. There are also a number of bindings
for programming languages. One that I’m particularly excited about is a language called Altair, which provides an alternative syntax for these Vega-Lite
specifications but in Python. What’s great about it is that
it uses the same concepts. People are building
these kinds of bindings for Vega-Lite in other
languages such as R, Julia, Elm, Scala,
Haskell, and many others. Building a community around this
language has been very rewarding. But it also helped us define our research direction and
drive the tool forward. Speaking of Python as
one of these languages, who here is using
Project Jupyter? JupyterLab? It turns out you already have Vega-Lite installed on your computer: Vega-Lite ships as the default plotting library in JupyterLab now. JupyterLab has become the most popular data science
platform for Julia, R, and Python.>>Hey, Dom. I'm curious: a couple of slides back, when you were talking about the language bindings, I was trying to figure out how that works, because Vega-Lite
is so declarative. You don’t typically compute
things that are in there. So are the language bindings
mostly just to bind data to an otherwise fairly static
Vega-Lite description, or are people actually using it to compute various visuals
or marks or whatever?>>The bindings, what they do is generate the specification. So here in this specification, we import Altair and load the dataset. This part here then generates the JSON specification, but you have a Pythonic API to generate that JSON specification. Then the Vega-Lite compiler, runtime engine, and so on actually render the chart. So all this does is generate the JSON, and the JSON then gets processed by the JavaScript libraries that we have.
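To make that concrete, here is a toy binding in the same spirit (this is deliberately not Altair's actual API; it's a minimal sketch of how a Pythonic wrapper can emit a Vega-Lite-style JSON specification without executing anything itself):

```python
import json

class Chart:
    """Toy fluent API that only builds a JSON spec; rendering happens elsewhere."""
    def __init__(self, data_url):
        self.spec = {"data": {"url": data_url}, "encoding": {}}

    def mark_line(self):
        self.spec["mark"] = "line"
        return self

    def encode(self, **channels):
        # e.g. x={"field": "date", "type": "temporal"}
        self.spec["encoding"].update(channels)
        return self

    def to_json(self):
        return json.dumps(self.spec)

chart = Chart("weather.csv").mark_line().encode(
    x={"timeUnit": "month", "field": "date", "type": "ordinal"},
    y={"aggregate": "mean", "field": "temperature", "type": "quantitative"},
)
print(chart.to_json())
```

The binding never executes anything; the emitted JSON is handed off to the JavaScript runtime for execution.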
>>But those computations that you can do in the language, is it mostly just to generate data?>>In this language?>>So if I were to write an Altair program as opposed to just a JSON file, one thing I could do is bind it to freshly computed data.>>Yeah.>>Is there anything else I would
tend to do in those languages?>>So the question
was: if you bind it to data that might be updating, how would that affect
also the visualization?>>Sure. Let’s run with that.>>Yeah. So right now it doesn’t, but we are certainly working
on it automatically: when you bind a dataset to an Altair chart and then change that dataset, it could automatically just rerun. So you can make interactive applications through this.>>The syntax here is just literals; it's a way to programmatically write a literal in Python. I think Rob's asking
instead of a literal, is there any scope for doing computation and changing that instead of just having
it be able to do?>>So the question is if we’re
specifying here as a literal, but can you actually
do computation in it?>>This is a programming language.>>It's in a programming language.>>Yeah, right. What would one do?>>You can do computations in Python before you pass
the data to Vega-Lite. But then all this is doing
is generating the literal. So that’s what it does
here. So it doesn't do any computation.>>Purposely trying to keep some of that separation to reason about?>>Yeah, because then in the end, this JSON that you
generate from here, Altair just pass it
on to Jupyter Lab, which then takes care of
the actual execution of it. So Altair actually doesn’t
execute anything here.>>If the Altair construct
is built with a variable, and the variable changes
after the fact of declaration, will the declaration itself change as the variable changes, or is that a static operation?>>Once this is output,
you can change the variables. It doesn’t affect it, but that’s
just an inherent property of the Python runtime.>>I was thinking whether that adds something on top of the declarative nature of Vega and suddenly makes it dynamic.>>If you use Observable, which has a reactive runtime engine, you can also use Vega there, and then you can actually hook into the reactive components of Vega. So then you can make it fully reactive.>>Will it change the whole
declaration as opposed to just, unless it’s really differential.>>It’s differential. So we can talk about that more of that
later, because there are a lot more things to talk about. But I think here is the fundamental difference: this is not reactive, because it's in Python and the Python runtime engine is not reactive. There's Observable, which has a reactive runtime, and then you can also benefit from the reactive runtime that Vega-Lite or Vega has, which I haven't talked about at all. But there are a lot more things to talk about, so let's jump to the impact of Vega-Lite, now in JupyterLab. We have about 200,000 downloads on NPM and a million on a CDN; it's used in research projects, used at various universities to teach visualization design because it has these clean abstractions, featured by Nature, and used by various companies. We're pretty excited about this uptake by
the data science community of this tool that we’ve
built in a research group. In particular, it's not just the tool that I think is exciting: Vega-Lite was also praised by Brian Granger, who's the lead developer of Project Jupyter, as perhaps the best existing candidate for a principled lingua franca of data visualization. That's the part I'm pretty excited about: to really think about it from a language perspective and not so much from a tool perspective. Okay. So to summarize, Vega-Lite is easy to use
for people because it has these concise specifications,
reusable designs; we can reuse a design with different datasets easily. It facilitates rapid authoring for fast iteration, which is important for exploration. I think the most important thing
is that, as a designer, you don’t really have to think about the mechanics of making a chart, but you can focus on the data and the relationships
that you want to show. Primarily, we designed Vega-Lite
for programmatic generation. So we have this
declarative specifications where we are decoupling
the execution and high level domain
specific abstractions. Also these composable building
blocks that allow us to express a large range of graphics
with just small changes. Here this whole space that you are seeing is a combinatorial space. That’s defined by atomic updates
to a Vega-Lite specification. So this describes the space
but just because we can render the space doesn’t
necessarily mean that all the charts here are
useful or are not misleading. So how do we reason about the space
and actually provide guidance? That’s where Draco comes in, where my goal was to
provide a formal model of design knowledge for
automated reasoning in tools that generate
visualizations. Concretely what that means is that, I wanted to enable
automated design and critique which would help people
to author visualizations faster, but also make it safer to create these visualizations by automatically guiding designers towards
effective visual encodings. I didn’t just want to build
tools on top of this, but also make this knowledge that we are building
a shared resource, a platform for
a systematic discussion about design among researchers
and practitioners. So to actually implement this, Draco consists of
three components: The first one is a formal model of visual encodings
as sets of facts. Second is design knowledge
as constraints over these facts and then
the third mystery component that we’re going to talk about later. So let’s dive into
this representation first. What we want to do here is set up Draco to be able to reason
over visualizations, and the specifications that we use
in Draco are based on Vega-Lite. But if you remember, the goal is to reason about them, and we also want to reason within an encoding or about incomplete specifications. So what I did was to
flatten this representation, and that's the specification format that we use in Draco. You can see that it fairly closely matches the Vega-Lite specification. As the representation, we use Answer Set Programming, which is similar to Prolog and other logic programming languages.
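The flattened form might look roughly like this (a schematic sketch in ASP-style fact syntax; Draco's actual predicate names differ):

```
% one fact per attribute of the specification (names illustrative)
mark(line).
encoding(e0). channel(e0,x).     field(e0,date).
encoding(e1). channel(e1,y).     field(e1,temperature). aggregate(e1,mean).
encoding(e2). channel(e2,color). field(e2,city).
```

Because every property is a flat fact, a solver can reason about a partial specification simply by leaving some facts out.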
What we can do with this is express the flow that you've seen before, from data to visualization. But there's usually something
more that we want to capture when we want
to do recommendation. Particularly, we know
something about the data, some properties of
these fields and we want to use that to give
better recommendations. Also, visualizations aren't created for themselves: there's a user who looks at them, and they have a certain level of expertise and visual literacy. They also have a task in mind, what they want to do with the visualization. We should take that into consideration when we're making recommendations. This context is usually implicit and not formalized. So in Draco, I have extended Vega-Lite to capture the context of the user's task and the data properties through
additional attributes. Okay. So this is our representation. How do we make the computer
reason for us about this and how do we bridge this gap between the representation
and this reasoning? This is where this design knowledge
as constraints come in. These constraints express
preferences that are validated in perceptual experiments as well as general visualization design
best practices. There are three sets of constraints. The first ones are attribute domain constraints. They tell us what values we can assign to attributes. For instance, the mark type: it should be one of bar, line, area, or point, and we can express this as constraints that look like this. We say there's a mark type, the mark has to be of that type, and it should be exactly one of those. We have a formalization of all the constraints, but I'm just going to show you the natural-language version of these for now.
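Draco expresses these constraints in Answer Set Programming; purely as an illustration, the same kind of check can be sketched in Python (constraint and attribute names here are made up):

```python
# Rough Python analogy of Draco's constraint checks (illustrative names;
# Draco itself expresses these as Answer Set Programming rules).

MARK_TYPES = {"bar", "line", "area", "point"}  # attribute domain

def check_spec(spec):
    """Return a list of hard-constraint violations for a flattened spec."""
    violations = []
    # Attribute domain: the mark must be exactly one of the allowed values
    if spec.get("mark") not in MARK_TYPES:
        violations.append("mark must be one of bar/line/area/point")
    # Integrity constraint: only continuous fields can be aggregated
    for enc in spec.get("encodings", []):
        if enc.get("aggregate") and enc.get("type") != "quantitative":
            violations.append("only continuous fields can be aggregated")
    return violations

ok = {"mark": "line",
      "encodings": [{"field": "temp", "type": "quantitative",
                     "aggregate": "mean"}]}
bad = {"mark": "pie",
       "encodings": [{"field": "city", "type": "nominal",
                      "aggregate": "mean"}]}
assert check_spec(ok) == []
assert len(check_spec(bad)) == 2
```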
Of course, we don't have constraints just for the mark type, but also for the encoding type, aggregation, channels, and so on. Now that we know what values
we can assign to attributes, we need to know
what combinations are actually valid and this is what
the Integrity constraints are for. They constrain the combinations to valid visualizations that satisfy
the rules of visual design. These are hard constraints, meaning, they can never be violated. I implement about 70
hard constraints in the industry Draco system of things like only continuous fields
can be aggregated. So these are things
you can’t violate. Lastly, we have
Preference constraints which describe preferences within
the space of visual encodings, of these valid encodings
as soft constraints. Here are a few examples: prefer specifications with fewer encodings, because simpler is better; don't use aggregation; or prevent overlapping marks. We have about 150 of these, and because they are soft constraints, they can be violated. But if you violate them, you incur a cost, and what the system can do is optimize the sum of these costs. You can see that here, the cost for the overlapping marks is the highest.
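The cost model just described can be sketched in a few lines (the constraint names and weights below are made up for illustration):

```python
# Illustrative soft-constraint costs (made-up names and weights).
COSTS = {"overlapping_marks": 5, "uses_aggregation": 2, "extra_encoding": 1}

def total_cost(violation_counts):
    """Sum of weight * number-of-violations over all soft constraints."""
    return sum(COSTS[c] * n for c, n in violation_counts.items())

# A spec with overlapping marks scores worse than one that aggregates instead.
scatter = total_cost({"overlapping_marks": 1})
histogram = total_cost({"uses_aggregation": 1, "extra_encoding": 1})
assert histogram < scatter
```

Because occlusion is weighted more heavily than aggregation here, an optimizer over these costs would prefer the aggregated design, which is the kind of trade-off the solver resolves.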
So how do we use this? You can use it, for instance, for visualization recommendation.>>Sir, a question.>>Yeah.>>Did you assign the costs
to each of those [inaudible].>>Hold that.>>Okay.>>Hold that thought. Yeah.>>Did you do anything to validate the consistency of the hard
constraints as a [inaudible]?>>The question was
did we do anything to validate the consistency
of the hard constraints? In terms of what?
logically inconsistent.>>If they are
logically inconsistent, we wouldn't get any answers, and we would know that pretty quickly.>>We might get any answer. We might validate everything
if they are inconsistent.>>So we might accept everything. The logic that we use is based on stable model semantics and in stable model semantics,
we don’t have that issue. I’ll talk more about
that later. Yeah.>>I’m also a little
surprised that the cost of the soft constraints
is not task-specific. So the occlusion of marks, let’s say if my task is
to look for outliers then marks occluding
each other doesn’t matter because the outliers
will still stick out. But if my task is to, let’s say do a comparison, then occlusion of the marks
is really bad, right? So it seems like the cost varies depending on what the end task is.>>Yeah. So the question was that the costs are
not task-dependent. Well, they actually can be. So you can write a constraint
that says if this task is this, don’t do occlusion and give
it a very high weight, and you could even give it
a negative weight for another task. It’s always a trade-off between different constraints and
the weights gives you a way to define that trade-off
very explicitly, but you can express
all of these things. Usually the way you do this, you have very general
high-level constraints, and then more specific
low-level constraints that override the
high-level constraints. Okay. So how do we actually use this? We can use this for
recommendation where you can formulate this problem as
finding optimal completions. So for instance, we could say we want a visualization
of the temperature. Translate that to
the set of constraints, combine it with our knowledge
base and use a solver to find the completion such as this chart here that shows us
the spread of the temperature. But this is only the optimal
completion according to those constraints and that
depends on the exact model. You and I might disagree about
the specifics of that model, but what’s great about this is
that we now have a formalization, so we can actually have
a conversation about it. Also, Draco is primarily designed
to support an interactive loop where the user can
refine what they want to do because no model is complete. So here for instance, we might say this is not
a great chart because we can’t see how much data
is at one of these points. So the user could come in
and revise this and say, “I want a Binned temperature.” Then we add that fact and the constraint solver might come back and say, “Oh well, he is in chart.” But if you remember
the soft constraints from earlier, we had
this one that said, “prevent overlapping
marks," which actually has a higher weight than adding another encoding that uses an aggregation. So in fact, the optimal completion of this one is a histogram, which really shows you the spread
of the temperature, but also the counts in each value. So with this iterative approach, it's really an assistant, not trying to give you the optimal visualization, because that's not possible. You get the benefit of the scale of the computer that assists here, but you also don't lose the benefits of human expertise and intuition that really drive what the recommendation is doing. At the core of this
are these weights; they are essential. So where do they come from? Well, the state-of-the-art method for this is something called "graduate student descent," where we essentially fiddled with the weights until we got results that were consistent with the research. But I came up with a more principled way that uses machine learning, which is the third mystery component. So here we want to learn the trade-offs between
different constraints from data. What kind of data do we have though? Well, ideally we would
have data of the form of some kind of partial input to a complete specification
because that’s the dataset or that’s the thing
that we are recommending. But there’s very little data
that we could learn from that has this kind of form. There exists though
an untapped resource of perceptual experiments that measure people’s performance
on a particular task. What's cool about this is that from these, we can infer pairs of ranked visualizations where you know that one has a higher score than the other, and you do that only if the scores are significantly different. These pairs are great
because they are ordinal. Meaning, we can combine
the results from different experiments as long as they somewhat
measure the same thing, but the measurement doesn’t
have to be on the same scale. We actually did that and
learned a Draco model from two datasets: One collected at the University of
Washington and one at Georgia Tech. So how do we actually
learn from this? First, we have as training data, these pairs of ranked visualizations where one is better than the other. We featurized this by
counting the number of violations of each soft constraint
for the visualizations. So we get are these two vectors: Positive and negative vector where each element is the number of
violations of the soft constraint. So the length is always the
number of soft constraints. That’s a way to engineer features. Then what we want to do is
separate those vectors and find a way to get the maximum
distance between them. We do that through a method
called learning to rank where we’re essentially
maximizing the weighted margin. What’s nice about this
is that this vector W which are the weights, these weights can be directly put back into the constraint program, and then we can use
the constraint solver to find the optimal completions again. It turns out that if you write this in something like scikit-learn, it's like three lines of code, maybe a few more for all the setup, but it's a fairly standard method
that works really well. Okay, so we came up with
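As an aside, a toy sketch of this kind of learning-to-rank step, with made-up violation counts and a hand-rolled hinge-loss update rather than Draco's actual code, might look like this:

```python
import numpy as np

# Toy learning-to-rank over soft-constraint violations.
# Row i of pos/neg counts how often each soft constraint is violated
# by the better (pos) / worse (neg) chart of pair i -- made-up numbers.
pos = np.array([[0., 1., 0.], [1., 0., 2.], [0., 0., 1.]])
neg = np.array([[2., 1., 1.], [1., 2., 2.], [0., 3., 1.]])

# Find weights w so that cost(pos) = pos @ w is below cost(neg) = neg @ w,
# using a simple hinge-loss update on the margin of each pair.
w = np.zeros(3)
for _ in range(100):
    margins = (neg - pos) @ w
    for i in np.where(margins < 1.0)[0]:  # update on pairs violating the margin
        w += 0.1 * (neg[i] - pos[i])

# The learned weights go straight back into the constraint program.
costs_better, costs_worse = pos @ w, neg @ w
```

In practice one would use a standard RankSVM-style implementation instead of this hand-rolled loop; that library call is what the "few lines of code" refers to.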
this way to learn weights from data and then build the Draco model based on that. But what I think is exciting
about Draco, in general, is that it’s not
just this one tool or this one model but it really sits at the core of visualization
research where there’s disparate areas
of the theory side where people are thinking
of models of visual design. What we can do with Draco
has describe those or learn them from data and
systematically improve them. There’s the empirical side
where we could use Draco not just to model knowledge but also to
figure out what we do not know. Where are the visualizations that our current model cannot distinguish? We can find those by just looking at the constraint program and then potentially automatically design experiments to fill those holes in our knowledge. On the system side of
the visualization community, we can help build
automated design tools and even translate research
into practical tools faster. So what Draco and Vega-Lite
together have helped us to do is create visualization tools where users can rapidly
create good designs. But I started automotive in this talk that we need
tool-set not to just provide guidance but also work at the size
that people need to work with. This has been a challenge for
a long time because Schell overwhelms existing tools every stage of the visualization process. So concretely there are two
distinct challenges when we want to visualize interact with billion
record datasets in real-time. Big data is overwhelming
and it is slow to process. So let’s look at these
two challenges separately. First, how do we visualize
a billion records? Well, the first thing we could do is just plot it, and we get Daniel's favorite chart, the big-data scatterplot, where it's really hard to see anything because we just have this overlap. A standard solution is
to sample but it’s really easy to miss outliers here and probably still hard to see
the pattern in the data. A visualization whose scalability depends only on the size of the chart and not the size of the data is binned aggregation, where we're essentially discretizing space and counting the number of records that fall into each bin. Now you can see that we're in Building 99 here. A tool like Vega-Lite could help us here
to do this, for instance. What this binned aggregation does is decouple the visual complexity from the raw data through aggregation. We can use this to
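A minimal sketch of that idea (illustrative, not the code behind these demos): the output grid depends only on the number of bins, never on the number of records.

```python
import numpy as np

# A hypothetical large dataset: 100,000 correlated (x, y) points.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)
y = 0.5 * x + rng.normal(0.0, 1.0, 100_000)

# Binned aggregation: discretize space into a 40x40 grid and count the
# records that fall into each bin; the result is 1,600 counts no matter
# how many records we started with.
counts, xedges, yedges = np.histogram2d(x, y, bins=40)
```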
visualize other datasets such as this one here that shows the delays for different
flights in the United States. This is a dataset of about 180 million records, covering all commercial flights in the US since 1987. So by binning the variables and counting the records in each bin, we get a sense of the shape of the distributions: how much flights are delayed, what times of day they leave, or how far they go. So visualizing a large dataset
this way isn’t difficult, but if you want to interrogate the data further and not just look at this one-dimensional distributions,
we need interactions. The challenge for analysis
tools is to manage the amount of data and computation
while remaining responsive, and so I addressed
these challenges with my work on scalable visualization. Concretely, the challenge here is, how do we interact with billion
record datasets in real time? What I want is to operate at the speed of human thought. So why is this real-time part important? Well, if we have delays in interactive exploration systems, we break the perceived correspondence between actions and responses. We reduce engagement, and delays can also lead to
fewer observations being made. Poor support for
interactive exploration may even skew the analyst’s attention towards convenient data with all the implied selection biases
that come from this. So to address this, I've built a system called Falcon, where data exploration looks like this. Let's start with this dataset and the charts that you've seen before. I want to know: what makes flights arrive before their scheduled time? The way we can do this is by filtering in this chart, selecting some subset of the data, and engaging in something called brushing, where you can look at, let's say,
you see that it’s the long-distance flights
that are arriving early. It makes sense because these
are the flights that can make up for delays that might
have occurred when they took off. If we instead look at the opposite and see what makes flights very delayed, you see more movement in this departure-time diagram
indicating that it’s the long-distance
flights that arrive, I’m sorry, it’s the late flights
that arrive very late. That also makes sense because if there have been delays
throughout the day then the flights that are
coming later can't take off early or on time. If we want to see whether this effect also generalizes, so to flights that are both very delayed and very long, we can create two filters. So first, let's look at the flights that are just 60 minutes delayed and then very far. What you'll notice
is some movement in the departure-time diagram, where the effect goes away if we're looking at
the long distance flights, but if you go to the short flights
again you see this bump. So this effect that we've seen does not generalize, and interaction helped us find this pattern that exists across multiple dimensions, even though we only had one-dimensional charts. But in order to make this insight, we actually used only a fraction of the dataset; in particular, it was 0.008 percent of the data. So it is important to be able
to interact with data at scale. So now you’ve seen it, it works. But how can Falcon
actually be so real-time? How can it be so fast? Allow me to answer the question I just asked by replaying the interactions and showing you a log of them. So you see a visualization of the interactions, the brushes: it first started in the arrival-time diagram and then went to the distance one. You can make a few observations here. We started with one chart; call this the active chart, which is the one that you're currently interacting with, and we only switched the active view once. So there's only one switch, but at the same time, even though there was only one switch, we did a lot of these brushing interactions, these single movements, often just by a single pixel. So just looking at this, what we ideally would want is to prioritize the brushing latency
over the view switching latency. That's actually a principled trade-off that we're making here. We're making it because we know that these brushing interactions are much more common, but it also turns out that people are much more latency-sensitive to brushing interactions than they are to view switching; there, it's okay if there's a delay. So building on this principled trade-off, the idea is to support
brushing interactions with one view and then recompute when the user switches which
view they are interacting with. Let’s look at exactly
how I implemented that. When the user starts brushing in one of the views, Falcon serves the requests from a data cube, which precomputes all possible aggregates. When the user then switches which view they're interacting with, Falcon computes a new data cube, and there is some delay while that happens. What's great about this is that
the data cube is constant size: it only depends on the number of bins and the number of pixels, but not on the size of the data. This also means we can do these brushing interactions in constant time, there's no bias towards certain parts of the data, and we can do it entirely in the client. What we have done here
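To make the constant-time claim concrete, here is a toy sketch of such a cube using prefix sums along the active view (my own illustration, not Falcon's implementation):

```python
import numpy as np

# Hypothetical setup: each record has a bin in the active view (200
# pixels/bins) and a bin in a passive view (25 bins) -- synthetic data.
rng = np.random.default_rng(1)
active = rng.integers(0, 200, 100_000)
passive = rng.integers(0, 25, 100_000)

# joint[i, j] = number of records in active bin i and passive bin j.
joint = np.zeros((200, 25))
np.add.at(joint, (active, passive), 1)

# Prefix sums along the active axis: cube[i] aggregates active bins < i.
cube = np.concatenate([np.zeros((1, 25)), joint.cumsum(axis=0)])

def brush(lo, hi):
    """Passive-view histogram for a brush [lo, hi): O(bins), not O(records)."""
    return cube[hi] - cube[lo]
```

Answering a brush is just a difference of two cube rows, so its cost depends only on the bin resolution, which is why moving a brush by one pixel stays cheap no matter how large the dataset is.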
is use aggregation to decouple interactions from
queries over the raw data. This might ring a bell: we did the same thing in the binned aggregate visualizations, but there it was on the visual side, and here it is in the interaction space. When the user switches the view, we have to recompute a data cube, which requires a pass over the data, but as I said earlier
these view switches are rare and users are not as
latency-sensitive to them. The idea of using data cubes is quite attractive because they decouple queries from the raw data, but the big problem is that the size of the data cube grows exponentially with the number of dimensions; in particular, it's the product of the number of bins in each dimension. One way around this is what people have done in Nanocubes, a specialized data structure that leverages sparsity in these cubes. But these cubes are still too large for the browser and take hours to build. Another idea is imMens, which still assumes a dense cube representation
but it decomposes the full cube into overlapping cubes of low dimensionality. What that means is we could have a cube for each pairwise interaction, but that also limits the interactions to only one brush at a time, limits the bin resolution, and it still takes hours to build. Falcon makes a fundamentally different trade-off and limits interactions to only one view, so we only need a linear number of these small cubes. They're in fact so small that we can just build them on the fly, which means when the user switches, we can just recompute them. There's actually no large
recomputation necessary. Computing these cubes just requires a SQL query over
some database system. So because of these trade-offs, I was able to build this demo
here where you're looking at a dataset from the Gaia space telescope that flies around the Earth and records stars. This is about 1.7 billion stars, about a terabyte of data, and all these visualizations are running in a browser on this computer, and it's buttery smooth because we decoupled the interactions from the raw data. So with this, we're able to interact with datasets that are three orders of magnitude larger than what was possible before in this real-time regime. When I put this code online, a platform engineer at Stitch Fix found it and actually
integrated it into their production environment
and he called me up and was really excited about it. He said, “With Falcon, it feels like I’m really interacting with my data.” Something that they weren’t
used to before, because none of the visualization systems work at that scale in real time. So this is Falcon, and Falcon is built on the assumption that we can make a scan over the dataset when the user switches which view they're interacting with. But what if the data is too large to even query in reasonable time? Here, we can't even really scan the full dataset, and so making static plots can
already take too long. As bad as it is to wait for
a single chart to render, this problem gets
exuberated if we’re in exploration where we’re making
one chart after the other, where the insides from one depend on something
that we’ve done before. Here, latencies reduce engagement
and lead to fewer observations. There’s a well-known
trade-off that we could make here which is very attractive for data analysis, and that is to use approximation, where we're trading off some accuracy for huge gains in speed. This is a well-known trade-off that's used in a number of areas, and there has been a tremendous amount of interesting and impactful work to reduce the errors. But at the end of the day, these approximations are still approximations with some possibility of an error, and in this data exploration scenario where there are dependencies, because there's a small chance of an error for every chart, the chance of having at least one error is actually quite large, and it gets larger as we look at more charts. So at the end of the day, users
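An illustrative back-of-the-envelope calculation (the per-chart error rate here is made up) shows how quickly independent small error chances compound over a session:

```python
# Probability of at least one misleading chart in a session of n charts,
# assuming each chart independently errs with probability p (illustrative).
p = 0.05
for n in (1, 10, 50):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>2} charts: {at_least_one:.1%}")
```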
still have to make a choice: do they trust the approximations or wait for everything to complete? Many are not willing to accept the possibility of errors, and so they have to wait for everything to complete; we found that in actual studies here at Microsoft. Users had to make this trade-off until three years ago when, during my internship here at MSR, we came up with this idea of optimistic visualization. The idea was to think of these issues with approximations from a user-experience side: not so much trying to reduce the errors, but asking what the fundamental issue is. The fundamental issue is trust; you can't trust the results. So to address this, we started from the assumption that these approximations are mostly right, and then offered a way to detect and recover from mistakes. In particular, an analyst would start using the initial estimates while the system computes the true results
in the background, so the analyst can continue the exploration using
the approximations. Then when the system has finished the computation, it can tell you, "Oh, there was something that changed fundamentally," and so if there was something wrong, you will know about it. What this gives analysts is a way
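A toy sketch of that optimistic pattern (illustrative, not Pangloss's code): show a fast sample-based estimate immediately, compute the precise answer in the background, and flag the chart only if the two disagree.

```python
import random

random.seed(0)
data = [random.gauss(100, 15) for _ in range(100_000)]  # synthetic measure

# Fast approximate answer from a small sample, shown immediately.
sample = random.sample(data, 1_000)
estimate = sum(sample) / len(sample)

# Precise answer, computed "in the background" after the fact.
precise = sum(data) / len(data)

# Passive check: only interrupt the analyst if the estimate was off.
TOLERANCE = 1.0
needs_attention = abs(estimate - precise) > TOLERANCE
```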
which are quite attractive because they are
so fast but still trust them. So we implement this idea in system
called Pangloss, where you have a visualization interface that also has visualizations of the uncertainty, which give you a sense of the potential errors, as well as a sidebar that shows you a history of previous charts. Once these have finished and changed their color
from orange to blue, you can click on them and look
at the difference between the approximation and
the precise answer, as well as looking
at the true answer. We evaluated this system in three case studies at Microsoft, where we had people bring in their own data; we wanted them to explore data that they are intimately familiar with. And we found that approximation works really well for these analysts: seeing something right away at first glance was really great. But there was also
this need for guarantees. “With a competitor, I was willing
to wait 70 to 80 seconds, it wasn’t ideally interactive but it meant I was looking at all the data.” and this was important for them. They had meetings where it was
important to not say, “Oh, yeah. I think this is the right answer.” They had to have the certainty. The optimistic approach
worked really well for them. One of the participants said, “I was thinking of what to do next, and I saw it had loaded so
I went back and checked it. The passive update is very nice
for not interrupting my workflow.” So this is optimistic visualization, and I'm happy to talk more about this and its implications for the user experience after my talk. This concludes the discussion of the four systems that I've
developed for my PhD. I followed the mission of, “Develop tools for data
analysis and communication that richly integrate the strengths
of both people and machines.” In Vega-Lite, what that means is, there’s a language that helps
you author visualizations. The concepts that we have cleanly [inaudible] code and then eventually to the chart that you make. But by designing for programmatic generation instead of just human authoring, we enabled new tools such as Altair. With Draco, we formalized designs so that tools can apply
this knowledge automatically. But with Draco, we can also inform our understanding of visualization, because we now have a formal model of it. With Falcon and Pangloss, I implemented visualization systems that scale to large data, and they leverage a deep understanding of how users interact with visualizations, which has enabled me to think differently about the way we design and optimize systems. For the future, I'm excited
to see how we can take these ideas and actually make them fully available
in end-user tools. The challenge here is that
what we need to do to get there is end-to-end integration into the tools that
people are already using, and so this is a research frontier
for Vega-Lite and Draco. For Vega-Lite, I am
currently working on reducing redundant computation
to make it scale to larger data. What that means concretely is
that we want to take these data flows that describe a visualization and we have an optimizer
that takes them and reduces redundant computation, as well as reorders computations
to make it more efficient, and so we already have that working. We’re also excited to take
some of these transformations and automatically push them
to scalable back-end systems. We have some prototypes of this; ask me after my talk for the links. On the Draco side, we showed the value of
formalizing visualization design and it helped us start a conversation in the
visualization community about it. Now is the time to tap into
the potential of Draco. Together with people at UW and Apple, I’m working on tools to browse, update and compare
these Draco knowledge bases, this model that we built. In particular, we're building UI tools where we want to be able to evaluate the impact of new perceptual models and to improve on the hand tuning much more systematically. The goal then is to also integrate
Draco into tools such as Altair which could also help us
to collect feedback from people who are actually
using the recommendations, and then use it to systematically improve the model and
continuously improve it using active learning methods and I’m looking for
collaborators there. I'm also excited to see how we could build domain-specific models. Draco currently targets
these single view specifications, but we could build the same for multi-view graphics or
recommending interactions. We talked about big data
already a little bit. We could automatically recommend
uncertainty visualizations, if we have a model of
what good ones are. Or use it for education: if we have a model of how difficult a visualization is to understand, we could automatically recommend visualizations that target a particular level of education. Finally, I'm working with researchers at Northwestern
to see how we can expand the task model that's currently in Draco. We have a very simple one that supports value and summary tasks, but there exist more sophisticated task taxonomies, which some people in this room have worked on. We would like to actually put
them into this formal model. Ultimately, what this work on
Vega-Lite and Draco should culminate in is some kind
of runtime engine for visualizations where analysts can look at the data
regardless of the scale, and the system should automatically
help them optimize the execution as well as
the visual representation. What that means is to automatically
apply Falcons optimizations combined potentially
with approximation even. I’ve written about this at
a workshop a couple of years ago. The core idea here are these user and system aware
optimizations where we could transcend see that we automatically recommend
aggregation for large data and approximation for
very large data or take into account that if the
user’s on a mobile device, they not only have
a smaller screen but also have a slower network with
more latencies and less memory. So, can we take all of
these things into consideration when we’re providing
recommendations and charts? This might sound like a crazy idea, because in the tools that we've built we usually don't think about taking specifics of the user or the system into account, let alone both together. But with Draco we can actually reason about how the analyst
interacts with the data. In all of these things
that I’ve shown you, I’ve demonstrated the benefit of declarative specifications and automated reasoning over
these specifications, in particular for visualization. But visualization is only part
of broader data analysis. So, I'm excited to see how we could use these ideas, of for instance having a declarative specification as in Vega-Lite and then reasoning about it as in Draco, to do the same thing for analysis more broadly. If you describe the analysis process declaratively, we can reason about it and automatically propagate errors, suggest certain data to look at, or have richer visualization recommendations. We could warn people if they
accidentally used their test data for training
in a machine learning model. These are things that
could be expressed as a query over
a declarative specification. So, as exciting as it is to
build tools on top of this, Draco allowed us to have a different conversation about visualization design, because we have a formal model of it. Similarly, a formal model of the analysis process might allow us to have a conversation about different analysis styles and what it really means
that, I'm happy to talk more. Thanks for your attention.>>Maybe a minor point: you mentioned throughout your presentation that you aim to make systems and ideas that are used by people, but it sounds like people equals data scientists, more than anybody you grabbed from a supermarket, let's say. So, how would you characterize the people you talk about in your presentation? I guess, as a follow-up, you
mentioned that you want to leverage human capabilities, what people do really well, and make them part of the process. Do you see a couple of those human properties that you think are essential, that [inaudible] have, that you want to leverage?>>So, the first question was: when I said people, I mostly
focused on analysts here. Yes, I'm mostly focused on analysts because I only had time for one PhD. A lot of the techniques can also apply more broadly. So, in particular, Vega-Lite is not just being used by data analysts but also by journalists
and scientists and others. Visualizations are made
for the general public. The recommendations from Draco could also be used by anyone, not just for data analysis, but the tools I've built are mostly targeted at data analysts. That's been my target audience, especially for the scalable visualization stuff; those are the people who mostly face these challenges. The second question
was about combining: I said that I am combining the strengths of both people and machines, and why is that. I think of visualization as one way to look at data. Well, data exists in computers, and we somehow want to get a sense for the data. We can look at a table of the dataset, but tables only give us so many values, and you have to actively read them. Whereas visualizations provide a really high-bandwidth interface from computers to people. We can leverage our powerful vision system to quickly see patterns and see problems [inaudible] in data. Whenever we're doing analysis, people are ultimately the ones deciding what we actually
want to do with it. People are the ones who can have domain knowledge and
have an understanding of the goals and potential errors and
other factors that might influence the data
set that you’re seeing. They’re the ones who understand
if something is completely wrong. So, I think we always need this. You always need
the human intervention. So, I see data tools as assistants to the data analysts. They should help us make
the tedious processes, the tedious steps easier. So, I think when we started
having computers they were able to compute numbers
and quickly add numbers. We could have added
those numbers by hand, but I think computers
are much more scalable and do it much more reliably, so we should leverage that. Today, with tools like [inaudible] for instance, we're able to automate
exploratory analysis. Exploratory analysis being
this process that you go through when you’re first getting a data and you want to
get a sense for it. There are really two challenges. The first one is that in order to do this exploratory analysis, you want to look at a lot of charts, so it can be quite tedious to make all of these charts. Second, you want to follow the best practices of doing this analysis, which is to start broadly, looking at all the univariate summaries before diving into specific questions. But it turns out that's not how we usually operate when we look at data. If we look at something
and if we see something interesting we dive
into those questions. We’ve seen that in classes
and in interview studies. So, Voyager helped to automate the tedious parts by giving you a gallery of visualizations, and this gallery also shows the univariate summaries first. So, here we have this automation of the tedious parts and guiding people to follow best practices. Draco did something similar for visualization, where it automates the tedious parts of specifying all the details of a chart but also helps you follow best practices. Similarly for Vega-Lite, and similarly for all the others. So that's kind of how I see these two working together. That was a lot of words for one question, but I think you get a sense for it.>>Thank you.>>I kind of want to put that
exploratory analysis on its head.>>Okay.>>How do we design for presentation? So, you can imagine Power BI, Tableau, [inaudible] billions of dollars, and all these tools kind of leave the design aspect of [inaudible]; we say, it's on you, the author, to make good visualizations. It's on you, the consumers of those reports, to actually understand what's going on. So, how do you think your systems can help to maybe solve the [inaudible]?>>So the question was how do
we design for presentation? Just a little bit of context: in visualization, there are these two things that we design for, exploration and analysis, or presentation. These are considered two separate things. I personally think those are closer than they are often perceived to be. I think a lot of the tools that I have built can help with the presentation aspects as well. Vega-Lite visualizations
are highly customizable. You can create themes, and you can customize exactly what you want the axes to be, and the legends, and marks, and so on. Another problem that you pointed
out was the blank Canvas problem, where if you’re making a chart, you sometimes don’t really
know where to start. I think, Draco can be
really helpful there.>>Especially with Tasks.>>With what?>>With Tasks. So the author knows, “Hey, I want to be able to
see outliers in this data.” That can be a really
powerful primer to say, “Oh, I’m going to look at
this task,” and it'll say, “There's a skew, so maybe bar charts because they are
aggregated bar charts, scatter plots, probably [inaudible].”>>So you brought up
this point of there being a task, and if the user knows the task, then the system can provide recommendations for visualizations that support it. What's neat about Draco is that
because of the constraints, information can propagate
in any direction.>>So you can lint.>>You can also use
it for linting which actually people at MIT are doing now. But this multi-directional thing is pretty great
because you could say, for this dataset, what is
the optimal visualization? For a particular task, what’s
the optimal visualizations? For a particular task and data, what is the optimal visualization? Or for a particular visualization, what tasks does it solve? You could go in any direction
here as long as the weights and soft constraints and hard constraints
are set up correctly. Of course, that’s an assumption here. So I think it can be
very powerful there. The biggest challenge for tasks, I think, is to figure out how we actually get the task, because we're not going to have a dropdown in the UI that says, “Which task do you want to do today? I would like to do the summary task.” I think there's a huge potential for inferring these
partial specifications from natural language queries, for instance because language is ambiguous and there’s usually some intent that you
can derive from it. We could use that to then provide
recommendations for later. So I think Draco could
be very powerful in here because we wouldn’t have to
build an end-to-end model, where we say natural language
to visualization, which has a problem that people have tried to solve, but it’s tricky. But instead, it could do
natural language to partial input. Just think of it as
a much more solvable problem and then use Draco for the rest, which has encoded
all this information and knowledge about visual design. So we go back to the beginning of your question about presentation. There’s actually something
interesting about not thinking of them as
something too different. I often think of exploration as presentation to myself: sometimes my future self, sometimes my past self. When we're presenting
a chart to someone, they are going to look at it
and do some inference on it, so they're also doing analysis on it. So I don't think the visualization itself would be vastly different. The process of making the chart is different. In an analysis tool, you make more charts, and you have to actually make sure that you're not drawing wrong insights; you probably also have your data processing stuff there. Whereas if you have a visualization that somebody made for you, they probably tried to tell you a story, and they hopefully made sure that that story is consistent with the data and is not misleading the reader. So the two should be closer together than separate.>>So I just want to return to
part of what you said about Draco. So to train the rule weights, it sounds like you systematically reviewed the literature and
coded it as these pairs. So I want to hear a
little more about that. So like how many papers
are we talking about, and how straightforward was it
to decide to encode it as pairs, like did it always work out
or sometimes did you say, “I don't know if this paper really fits.” You know what I mean? Was it a good fit all the time, or was it sometimes awkward?>>So the question was about how we actually built the Draco model, how we reviewed the literature, and, as you said, built the system. Let me distinguish
a little bit between what we could do and what
we have actually done. So I talked about
the machine learning part as one way to get the weights. The model that we
ship actually doesn’t use those trained weights
because there are no comprehensive datasets
that are large enough to actually
train a full model. So the way we tune the
weights is by tuning by hand. But I think the more important aspects of Draco are actually the hard constraints and soft constraints. So where did those come from? The hard constraints are informed a lot by just the experience in our group of making charts and building chart tools: D3 came out of our group, and Prefuse and Protovis from Jeff, then the Voyager system. A lot of the knowledge that we've gained there made its way into Draco. A lot of the rules in Draco are taken pretty much directly from CompassQL, which is the underlying engine under Voyager. So the rules are coming from
a lot of these systems.>>I see.>>Most of the rules are actually fairly obvious rules, like when you can make a bar chart: you need one continuous and one discrete field, so with continuous-continuous or discrete-discrete data you can't make a bar chart, and so on. Not many people have actually written papers about these super obvious things. But those you need to actually know about if you want to build a system that works on it. So the majority are
these fairly obvious things. A lot of other rules then encode the best practices of visual design, coming from our experience with building tools, as well as looking at things like Tableau, and ggplot, and others, and taking inspiration from those. So, following the best practices. Then there are the rankings, where we use effectiveness
rankings, building on work, starting with Cleveland and McGill, and APT, and lots of other things
that came afterwards. I can’t give you an exact number of how many papers
we have looked, but suddenly we’ve
read the literature and made sure that it’s
consistent with those things. I haven’t really found many
conflicts between different papers. Which I think is partly because we really looked at the rules
that are generally applicable. There’s also an
interesting disconnect in visualization research
between the really low-level perceptual
rules from perception research, where we know and care exactly how different two marks look
if a pixel is here and a pixel is there, and the very high-level rules like
"don't make 3D pie charts." There's very little that actually
connects those two right now. I think with Draco, we can start to have a conversation
and figure out where they actually meet. How might we be able to use
these low-level rules to infer high-level rules like
"don't make 3D pie charts"? But a lot of the knowledge that goes into systems right now is
following best practices.>>Correct me if I'm
wrong, the UDub and NYU studies that you referred
to were specifically in a limited domain; you actually had these [inaudible]
like studies where they would do comparisons between
X and Y and ranking. So for those two papers, you were able to code those, and those I think were
the ones you learned on.>>Yes. The one from UDub was on scatter plots with
three encodings, where two of them had to be X and Y. The other one was size, or color, or I think row. It only looked at
six or so task types, which you couldn’t
have to ball them to these two tasks types
of somewhere in value. So you can see that’s a tiny subset of all the possible
charts that Draco can make. Let alone what I
think I like to make.>>But then you were able
to use those distances. So it wasn't that you were going to scan 100 different papers
and derive those.>>So this is for
the machine learning part. For the actual model that we have
that has the hand-tuned weights, those are mostly based on
the Voyager weights plus a little bit of
additional weights for like the task stuff that
Voyager doesn’t even have. There, what we did
is generate visuals, generate recommendations, and
look at them in the UI tool, and see whether they make sense, and keep fiddling with the
weights until they do. To make that process
look more principled, we're working on this Draco tuner
that I alluded to a little bit, where we’re starting
to build a test set of examples where we want the order
to be a particular order. It will make it very easy to create new pairs,
to change the weights quickly,
and to quickly see what effect that has
on the recommendations. When I built the model that
we had in the Draco paper, it was really me
changing weights, then rerunning some process, and looking at a few examples
in like a file browser. The process wasn't
very comprehensive, but we really need better tooling
to be able to do that. I really see Draco as the
infrastructure that we need to do that. I think it’s a very clean way to model visualization knowledge
and build knowledge on top of. But I don't think
the model that I’ve built, the particular one,
is the final answer. It certainly isn’t. I know
a lot of problems with it.>>Just a thought. See this helps for a lot of things
even better for Draco. I always get antsy when people try to automate design and
thinking that this knowledge, this thing that you can commit to paper and their rules
when this is, really it is ambiguous very thing. It feels like what you’re
formalizing there is perceptual properties of people. It's almost as if the constraints
refer directly to how people really have difficulty perceiving differences between things that
are overlapping, or whose colors are too close. It feels that you are
closer to encoding perceptual lessons than
design itself. You might be.>>I agree.>>I'm trying to think how to give it more steam underneath to
help formalize because as we discover more about
the way we process information, perhaps relative to the task, Draco will welcome those. Whereas design is
such a fickle thing, and I can see a system that really puts them on
rails, as in these are the proper [inaudible] you need to produce because they are perceptually
good versus I don’t care, I just want pie charts.>>A good designer knows
when to break the rules.>>Yes. So it's not even a question. It's perhaps a conversation to have about how refining the language you use may help you focus on
goals in a different way.>>So your point, I’m just going to repeat it because
it wasn’t a question, was about that Draco doesn’t
really encode design, but more what it encodes are these perceptual rules
and best practices.>>It's not so much
a question, but a statement. Do we think that it's encoding design, or is it truly perceptual, what we
know about perception?>>I've grappled with this. As I've prepared for this talk
and when I talk about it, I talk about design and
best practices as encoding those. But really, I think
it's encoding perceptual rules for making charts, because we're mainly targeting exploratory analysis or data analysis scenarios.
Data analysis scenarios. The process of designing
a chart is usually a data analyst in a notebook
making a chart, designing a chart. I think of those two
interchangeably. So I think of the person who makes a visualization as
a visualization designer. If we're really thinking
about design as this creative process where you’re telling a story
and really trying to find colors that are invoking certain associations or maybe you try to intentionally make certain parts
harder to read off the chart, those are all things that are far outside of what Draco is targeting. Draco could help you by saying, "Hey, you're breaking a rule here." But then you as the designer
can say, "Yep. Thanks for telling me. I know. I'm glad you told
me, but now I know." It could help designers
who are not as experienced to know that
they’re breaking a rule. You can even make a model that
breaks certain rules intentionally. Design is and remains a process, a creative process, that I
certainly do not want to automate. Again, the goal for all
of these tools is to be an assistant that helps you with the tedious parts and
warns you if you’re doing something that doesn’t
follow the best practices. You can do whatever you
want with those warnings. But at least we can
provide those warnings.>>Well, we got to the end unless
there are any other questions. Let me check one last time. No one. No more questions.>>Okay.>>Let's thank our speaker.>>Thank you.
