Transcript of a discussion on how data analysis services
startup BlueLabs in Washington helps presidential campaigns better know
and engage with potential voters.
Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.
Dana Gardner: Welcome to the next edition of the
Hewlett Packard Enterprise (HPE) Voice of the Customer podcast series. I’m
Dana Gardner, Principal Analyst at
Interarbor Solutions,
your host and moderator for this ongoing discussion on business digital
transformation. Stay with us now to learn how agile companies are
fending off disruption in favor of innovation.
Our next case study explores how data-analysis services startup
BlueLabs in Washington, D.C. helps presidential campaigns better know and engage with potential voters.
We'll
learn how BlueLabs relies on analytics platforms that allow a
democratization of querying, opening the value of vast big-data
resources to more of those with a need to know.
In this
example of helping organizations work smarter by leveraging innovative
statistical methods and technology, we'll discover how specific types of
voters can be identified and reached.
Here to describe
how big data is being used creatively by contemporary political
organizations for two-way voter engagement, we're joined by
Erek Dyskant, Co-Founder and Vice President of Impact at BlueLabs Analytics in Washington. Welcome, Erek.
Erek Dyskant: I'm so happy to be here, thanks for having me.
Gardner:
Obviously, this is a
busy season for the analytics people who are
focused on politics and
campaigns. What are some of the trends that are
different in 2016 from just four years ago? It’s a fast-changing
technology set, it's also a fast-changing methodology. And of course,
the trends about how voters think, react, use social, and engage are
also dynamic. So what's different this cycle?
Dyskant: From a voter-engagement perspective, in
2012, we could reach most of our voters online through a relatively small set of
social media channels --
Facebook,
Twitter, and a little bit on the
Instagram
side. Moving into
2016, we see a fragmentation of the online and
offline media consumption landscape and many more folks moving toward
purpose-built social media platforms.
If I'm at the
HPE Conference and I want my colleagues back in D.C. to see what I'm seeing, then maybe I'll use
Periscope,
maybe Facebook Live, but probably Periscope. If I see something that I
think one of my friends will think is really funny, I'll send that to
them on
Snapchat.
Where
political campaigns have traditionally broadcast messages out through
the news-feed style social-media strategies, now we need to consider how
it is that one-to-one social media is acting as a force multiplier for
our events and for the ideas of our candidates, filtered through our
campaign’s champions.
Gardner: So, perhaps a way
to look at that is that you're no longer focused on precincts
physically, and you're no longer relying on broadcast through social
media. It’s much more about influence within communities, and about identifying
those communities in a new way through these apps, perhaps, more than
through platforms.
Social media
Dyskant:
That's exactly right. Campaigns have always organized voters at the
door and on the phone. Now, we think of one more way. If you want to be a
champion for a candidate, you can be a champion by knocking on doors
for us, by making phone calls, or by making phone calls through online
platforms.
You can also use one-to-one social media
channels to let your friends know why the election matters so much to
you and why they should turn out and vote, or vote for the issues that
really matter to you.
Gardner: So, we're talking
about retail campaigning, but it's a bit more virtual. What’s
interesting though is that you can get a lot more data through the
interaction than you might if you were physically knocking on someone's
door.
Dyskant: The data is different. We're
starting to see a shift from demographic targeting. In
2000, we were
targeting on precincts. A little bit later, we were targeting on
combinations of demographics, on soccer moms, on single women, on single
men, on rural, urban, or suburban communities separately.
Moving to 2012, we looked at everything that we
knew about a person and built individual-level predictive models, so
that we knew how each person's individual set of characteristics made that
person more or less likely to be someone with whom our candidate would have
an engaging conversation through a volunteer.
Now, what
we're starting to see is behavioral characteristics trumping
demographic or even consumer data. You can put whiskey drinkers in your
model, you can put cat owners in your model, but isn't it a lot more
interesting to put in your model the fact that this person has an
online profile on our website and this is their clickstream? Isn't it
much more interesting to put into a model that this person is likely to
consume media via TV, is likely to be a
cord-cutter, is likely to be a social media trendsetter, is likely to view multiple channels, or to use both Facebook and media on TV?
That
lets us have a really broad reach or really broad set of interested
voters, rather than just creating an echo chamber where we're talking to
the same voters across different platforms.
Gardner:
So, over time, the analytics tools have gone from semi-blunt
instruments to much more precise ones, and you're also able to better target
what you think would be the right voter for you to get the right message
out to.
One of the things you mentioned that struck me
is the word "predictive." I suppose I think of campaigning as looking
to influence people, and that polling then tries to predict what will
happen as a result. Is there somewhat less daylight between these two
than I am thinking, that being predictive and campaigning are much more
closely associated, and how would that work?
Predictive modeling
Dyskant: When I think of
predictive modeling,
what I think of is predicting something that the campaign doesn't know.
That may be something that will happen in the future or it may be
something that already exists today, but that we don't have an
observation for it.
As for the role of polling,
what I really see there is understanding what issues matter the
most to voters and how it is that we can craft messages that resonate
with those issues. When I think of predictive analytics, I think of how
is it that we allocate our resources to persuade and activate voters.
Over
the course of elections, what we've seen is an exponential trajectory
of the amount of data that is considered by predictive models. Even more
important than that is exponential growth in the use cases for those models.
Today, every time a predictive model is used, it’s used in a
million and one ways, whereas in 2012 it might have informed 50, 20,
or 100 decisions about each voter contact.
Gardner:
It’s a fascinating use case to see how analytics and data can be
brought to bear on the democratic process and to help you get messages
out, probably in a way that's better received by the voter or the
prospective voter, like in a retail or commercial environment. You don’t
want to hear things that aren’t relevant to you, and when people do
make an effort to provide you with information that's useful or that
helps you make a decision, you benefit and you respect and even admire
and enjoy it.
Dyskant: What I really want is for
the voter experience to be as transparent and easy as possible, so that
campaigns reach out to me around the same time that I'm seeking
information about who I'm going to vote for in November. I know who I'm
voting for in 2016, but in some local races, I may not have made that
decision yet. So, I want a steady stream of information to be reaching
voters as they're at those key decision points, with messaging that
really is relevant to their lives.
I also want to listen to what
voters tell me. If a voter has a conversation with a volunteer at the
door, that should inform future communications. If somebody has told me
that they're definitely voting for the candidate, then the next
conversation should be different from someone who says, "I work in
energy. I really want to know more about the
Secretary’s energy policies."
Gardner: Just as when a salesperson is engaged in a sales process, they use
customer relationship management (CRM),
and that data is captured, analyzed, and shared. That becomes a much
better process for both the buyer and the seller. It's the same thing in
a campaign, right? The better information you have, the more likely
you're going to be able to serve that user, that voter.
Dyskant:
There definitely are parallels to marketing, and that’s how we at
BlueLabs decided to found the company and work across industries. We
work with Fortune 100 retail organizations that are interested in how,
once someone buys one item, we can bring them back into the store to buy
the follow-on item or maybe to buy the follow-on item through that same
store’s online portal, and in how we can provide relevant messaging
as users engage in complex processes online. All those things are
driven from our lessons in politics.
Politics is
fundamentally different from retail, though. It's a civic decision,
rather than an individual-level decision. I always want to be mindful
that I have a duty to voters to provide extremely relevant information
to them, so that they can be engaged in the civic decision that they
need to make.
Gardner: Suffice it to say that good quality comparison shopping is still good quality comparison decision-making.
Dyskant: Yes, I would agree with you.
Relevant and speedy
Gardner:
Now that we've established how really relevant, important, and powerful
this type of analysis can be in the context of the 2016 campaign, I'd
like to learn more about how you go about getting that analysis and
making it relevant and speedy across a large variety of data sets and
content sets. But first, let’s hear more about BlueLabs.
Tell me about your company, how it started, why you started it, maybe a bit about yourself as well.
Dyskant: Of
the four of us who started BlueLabs, some of us met in the
2008 elections and some of us met during the
2010 midterms working at the
Democratic National Committee (DNC).
Throughout that pre-2012 experience, we had the opportunity as
practitioners to try a lot of things, sometimes just once or twice,
sometimes things that we
operationalized within those cycles.
Jumping
forward to 2012, we had the opportunity to scale all that research and
development, to say that we did this one thing that was a different way
of building models, and it worked in this congressional race. We
decided to make this three people’s full-time jobs and scale that up.
Moving
past 2012, we got to build potentially one of the fastest-growing
startups, one of the most data-driven organizations, and we knew that we
built a special team. We wanted to continue working together, both among
ourselves and with the folks we had worked with and who made all this
possible. We also wanted to apply the same types of techniques to other
areas of social impact and other areas of commerce. This
individual-level approach to identifying conversations is something that
we found unique in the marketplace. We wanted to expand on that.
Increasingly,
what we're working on is this segmentation-of-media problem. It's this
idea that some people watch only TV, and you can't ignore a TV. It has
lots of eyeballs. Some people watch only digital and some people consume
a mix of media. How is it that you can build media plans that are aware
of people's cross-channel media preferences and reach the right
audience with their preferred means of communications?
Gardner:
That’s fascinating. You start with the rigors of the demands of a
political campaign, but then you can apply it in so many ways, answering,
and anticipating, the types of questions that more
verticals, more sectors, and charitable organizations would want
answered. That’s very cool.
Let’s go back to the
data science.
You have this vast pool of data. You have a
snappy analytics platform
to work with. But one of the things that I am interested in is how you
get more people, whether in your organization, in a campaign like
the
Hillary Clinton campaign, or at the DNC, to be able to utilize that
data to get to these inferences, to these insights that you want.
What
is it that you look for and what is it that you've been able to do in
that form of getting more people able to query and utilize the data?
Dyskant:
Data science happens when individuals have direct access to ask complex
questions of a large, gnarly, but well-integrated data set. If I have
30
terabytes
of data across online contacts, off-line contacts, and maybe a sample
of clickstream data, then I want to ask things like: of all the people who
went to my online platform and clicked password reset because they
couldn't remember their password, and then never followed up with the e-mail,
how many of them showed up at a retail location within the next five
days? They tried to engage online, and it didn't work out for them. I
want to know whether we're losing them or are they showing up in person.
That type of question maybe would make it into a
business-intelligence (BI)
report a few months from now, but people who are thinking about what
we do every day would say, "I wonder about this," turn it into a query,
and then say, "I think I found something. If we give these customers phone
calls, maybe we can reset their passwords over the phone and re-engage
them."
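To make that concrete, here is a minimal sketch of the kind of ad hoc question being described, written as a single SQL query. The table and column names (web_events, email_events, retail_visits, and so on) are illustrative assumptions, not BlueLabs' actual schema.

    -- People who clicked password reset, never completed it via e-mail,
    -- and then showed up at a retail location within five days.
    SELECT COUNT(DISTINCT w.person_id) AS recovered_in_store
    FROM web_events w
    JOIN retail_visits r
      ON r.person_id = w.person_id
     AND r.visit_date BETWEEN w.event_date AND w.event_date + 5
    WHERE w.event_type = 'password_reset_click'
      AND NOT EXISTS (
            SELECT 1
            FROM email_events e
            WHERE e.person_id = w.person_id
              AND e.event_type = 'password_reset_completed'
              AND e.event_date >= w.event_date);

The point is less the specific query than that an analyst can go from hypothesis to answer in minutes, without waiting for a report.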
Human intensive
That's
just one tiny, micro example, which is why data science is truly a
human-intensive exercise. You get 50-100 people working at an enterprise
solving problems like that and what you ultimately get is a positive
feedback loop of self-correcting systems. Every time there's a problem,
somebody is thinking about how that problem is represented in the data.
How do I quantify that? If it’s significant enough, then how is it that
the organization can improve in this one specific area?
That
all of that can be done with business logic is the interesting piece. You need
very granular data that's accessible via query, and you need reasonably
fast query times, because you can’t ask questions like that if you have to
go get coffee every time you run a query.
Layering
in predictive modeling allows you to understand the opportunity for impact
if you fix that problem. One hypothesis about those users who
cannot reset their passwords is that maybe they aren't that engaged in
the first place; you fix their password, but it doesn’t move the needle.
The
other hypothesis is that these are people who are actively trying to engage
with your service and are unsuccessful because of this one very specific
barrier. If you have a model of user engagement at an individual level,
you can say that these are really high-value users who are having this
problem, or maybe they aren’t. So you take data science, align it with
really smart individual-level business analysis, and what you get is an
organization that continues to improve without having to make an
executive-level decision for each one of those things.
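A hedged sketch of what layering in predictive modeling might look like here: join the affected cohort to an individual-level engagement score to see whether the password-reset dead end is hitting high-value users. The tables, the engagement_score column, and the score thresholds are all assumptions for illustration.

    -- Size the opportunity: are the people stuck at password reset
    -- high-value, engaged users, or marginal ones?
    SELECT CASE
             WHEN s.engagement_score >= 0.7 THEN 'high value'
             WHEN s.engagement_score >= 0.3 THEN 'mid value'
             ELSE 'low value'
           END AS segment,
           COUNT(*) AS affected_people
    FROM password_reset_dropoffs d
    JOIN engagement_scores s ON s.person_id = d.person_id
    GROUP BY 1
    ORDER BY 1;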
Gardner:
So a great deal of inquiry, experimentation, iterative improvement, and
feedback loops can all come together very powerfully. I'm all for the
data-scientist full-employment movement, but we need to do more than
have people go through a data scientist to use, access, and
develop these feedback insights. What is it about
SQL,
natural language, or APIs? What is it that you like to see that allows
for more people to be able to directly relate and engage with these
powerful data sets?
Dyskant:
One of the things is the product management of data schemas. So
whenever we build an analytics database for a large-scale organization, I
think a lot about an analyst who is 22, knows
VLOOKUP,
took some statistics classes in college, and has some personal stories
about the industry that they're working in. They know, "My grandmother
isn't a native English speaker, and this is how she would use this
website."
So it's taking that hypothesis that’s driven
from personal stories, and being able to, through a relatively simple
query, translate that into a database query, and find out if that
hypothesis proves true at scale.
Then, potentially, take
the results of that query and dump them into a statistical-analysis
language, or use database analytics to answer that in a more robust way.
What that means is that we favor very wide schemas, because I
want someone to be able to write a three-line SQL statement, no joins,
that answers a business question that I wouldn't have thought to put in a
report. So that’s the first line -- analyst-friendly schemas that
are accessed via SQL.
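As a sketch of what such an analyst-friendly query might look like, assume a wide, denormalized person-level table (call it person_wide) in which demographic flags and web-usage columns have already been joined onto each row; the hypothetical 22-year-old analyst's grandmother hypothesis then becomes a few lines of SQL with no joins. Every name here is illustrative.

    -- No joins: the wide schema already carries both columns on every row.
    SELECT preferred_language,
           AVG(web_sessions_90d) AS avg_sessions,
           COUNT(*)              AS people
    FROM person_wide
    GROUP BY preferred_language;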
The next line is deep
key performance indicators (KPIs).
Once we step out of the analytics database, we drop into the
wider organization, which consumes data at a different level. I always
want reporting to report on opportunity for impact, to report on whether
we're reaching our most valuable customers, not on how many customers
we're reaching.
"Are we reaching our most valuable
customers" is much more easily addressable; you just talk to different
people. Whereas, when you ask, "Are we reaching enough customers," I
don’t know how to find out. I can go over to the sales team and yell at
them to work harder, but ultimately, I want our reporting to facilitate
smarter working, which means incorporating model scores and predictive
analytics into our KPIs.
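A minimal sketch of a KPI built that way, assuming a contacts table for the current week and a value_score column produced by a predictive model; the names and the 0.8 threshold are illustrative, not a prescribed metric.

    -- Report on opportunity for impact, not raw volume:
    -- what share of this week's outreach reached high-value customers?
    SELECT COUNT(*)                                            AS customers_contacted,
           AVG(value_score)                                    AS avg_value_score,
           SUM(CASE WHEN value_score >= 0.8 THEN 1 ELSE 0 END)
               / COUNT(*)::FLOAT                               AS share_high_value
    FROM contacts_this_week;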
Getting to the core
Gardner:
Let’s step back from the edge, where we engage the analysts, to the
core, where we need to provide the ability for them to do what they want
and get those great results.
It seems to
me that when you're dealing in a campaign cycle that is very spiky, you
have a short period of time where there's a need for a tremendous amount
of data, but that could quickly go down between cycles of an election,
or in a retail environment, be very intensive leading up to a holiday
season.
Do you therefore take advantage of cloud
models that give you a fit-for-purpose, pay-as-you-go approach to data
and analytics? Tell us a little bit about your strategy
for the data and the analytics engine.
Dyskant:
All of our customers have a cyclical nature to them. I think that almost
every business is cyclical, just some more than others. Horizontal
scaling is incredibly important to us. It would be very difficult for us
to do what we do without using a cloud model such as
Amazon Web Services (AWS).
Also, one of the things that works well for us with
HPE Vertica
is the licensing model where we can add additional performance with
only the cost of hardware, or of hardware provisioned through the cloud. That
allows us to scale up our capacity during the busy season. We'll
sometimes even scale them back down during slower periods so that we can
have those 150 analysts asking their own questions about the areas of
the program that they're responsible for during busy cycles, and then
during less busy cycles, scale down the footprint of the operation.
Gardner: Is there anything else about the
HPE Vertica OnDemand
platform that benefits your particular need for analysis? I'm thinking
about the scale and the rows. You must have so many variables when it
comes to a retail situation, a commercial situation, where you're trying
to really understand that consumer?
Dyskant: I
do everything I can to avoid aggregation. I want my analysts to be
looking at the data at the interaction-by-interaction level. If it’s a
website, I want them to be looking at clickstream data. If it's a retail
organization, I want them to be looking at point-of-sale data. In order
to do that, we build data sets that are very frequently in the billions
of rows. They're also very frequently incredibly wide, because we don't
just want to know that a transaction happened for this dollar amount. We want to
know things like what the variables were, and where that store was
located.
Getting back to the idea that we want our
queries to be dead-simple, that means that we very frequently append
additional columns on to our transaction tables. We’re okay that the
table is big, because in a
columnar model, we can pick out just the
columns that we want for that particular query.
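For example, a query like the one below touches only three columns of a hypothetical wide point-of-sale table, even if that table carries hundreds of appended columns and billions of rows; a columnar engine such as Vertica reads just the columns named, which is what keeps wide tables cheap to query. Table and column names are assumptions for illustration.

    -- Store attributes (region and so on) are appended to each transaction
    -- row, so no join is needed and only the referenced columns are read.
    SELECT store_region,
           DATE_TRUNC('week', sale_ts) AS sale_week,
           SUM(sale_amount)            AS revenue
    FROM pos_transactions
    GROUP BY store_region, DATE_TRUNC('week', sale_ts);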
Then,
moving into some of the in-database machine-learning algorithms allows
us to perform higher-order computation within the database and have
less data shipping.
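As a hedged sketch of that idea: Vertica added in-database machine-learning functions around this period, so a model can be trained and scored where the data lives instead of shipping billions of rows to an external tool. The function names below follow Vertica's documented ML interface, but the model, table, and column names are illustrative assumptions, and the exact syntax should be checked against the installed version.

    -- Train a logistic regression of engagement inside the database ...
    SELECT LOGISTIC_REG('engagement_model', 'person_wide',
                        'engaged_flag', 'web_sessions_90d, email_opens_90d, tv_heavy_flag');

    -- ... then score every person without moving raw data off the cluster.
    SELECT person_id,
           PREDICT_LOGISTIC_REG(web_sessions_90d, email_opens_90d, tv_heavy_flag
                                USING PARAMETERS model_name='engagement_model') AS p_engaged
    FROM person_wide;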
Gardner: We're almost out
of time, but I wanted to do some predictive analysis ourselves. Thinking
about the next election cycle, midterms, only two years away, what
might change between now and then? We hear so much about machine
learning, bots, and advanced algorithms. How do you predict, Erek, the
way that big data will come to bear on the next election cycle?
Behavioral targeting
Dyskant:
I think that a big piece of the next election will be around moving
even more away from demographic targeting, toward even more behavioral
targeting. How is it that we reach every voter based on what they're
telling us about themselves, what matters to them, and how it matters to
them? That will increasingly drive our models.
To do
that involves probably another 10X scale in the data, because that type
of data is generally at the clickstream level, generally at the
interaction-by-interaction level, incorporating things like Twitter
feeds, which adds an additional level of complexity and computational
demand to the data.
Gardner: It
almost sounds like you're shooting for
sentiment analysis on an
issue-by-issue basis, a very complex undertaking, but it could be very
powerful.
Dyskant: I think that it's heading in that direction, yes.
Gardner:
I am afraid we'll have to leave it there. We've been exploring how data
analysis services startup BlueLabs in Washington, DC helps presidential
campaigns better know and engage with potential voters. And we've
learned how organizations are working smarter by leveraging innovative
statistical methods and technologies, and in this case, looking at
two-way voter engagement in entirely new ways -- in this and in future
election cycles.
So,
please join me in thanking our guest, Erek
Dyskant, Co-Founder and Vice President of Impact at BlueLabs in
Washington. Thank you, Erek.
Dyskant: Thank you.
Gardner:
And a big thank you as well to our audience for joining us for this
Hewlett Packard Enterprise Voice of the Customer digital transformation
discussion.
I'm Dana Gardner, Principal Analyst at
Interarbor Solutions, your host for this ongoing series of HPE-sponsored
interviews. Thanks again for listening, and please come back next time.
Transcript
of a discussion on how data analysis services startup BlueLabs in
Washington helps presidential campaigns better know and engage with
potential voters. Copyright Interarbor Solutions, LLC, 2005-2016. All
rights reserved.