
Monday, December 01, 2014

Hortonworks Accelerates the Big Data Mashup between Hadoop and HP Haven

Transcript of a BriefingsDirect podcast on how companies are beginning to capture large volumes of data for past, present and future analysis capabilities.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how companies are adapting to the New Style of IT to improve IT performance, gain new insights, and deliver better user experiences — as well as better overall business results.

This time, we're coming to you directly from the recent HP Big Data 2014 Conference in Boston to learn directly from IT and business leaders alike how big data changes everything … for IT, for businesses and governments, as well as for you and me.

Our next innovation interview highlights how Hortonworks is now working with HP on the management of very large datasets. We'll hear how these two will integrate into more of the HP Haven family, but also perhaps into the cloud, and to make it easier for developers to access business intelligence (BI) as a service.
To learn more about these ongoing big data trends, we are joined by Mitch Ferguson, Vice President of Business Development at Hortonworks. Welcome, Mitch.

Mitch Ferguson: Thank you, Dana. Pleasure to be here.

Gardner: We heard the news earlier this year about HP taking a $50-million stake in Hortonworks, and about Hortonworks' IPO plans. Please fill us in a little bit about why Hortonworks and HP are coming together.

Ferguson: There are two core parts to that answer. One is that the majority of Hadoop came out of Yahoo. Hortonworks was formed when the major Hadoop engineers at Yahoo moved over, all in complete cooperation with Yahoo, to help evolve the technology faster. We believe the ecosystem around Hadoop is critical to the success of Hadoop and critical to the success of how enterprises will take advantage of big data.

If you look at HP, a major provider of technology to enterprises, not only at the compute and storage level but also at the data management, analytics, and systems management levels, the complementary nature of Hadoop as part of the modern data architecture, combined with the HP hardware and software assets, provides a very strong foundation for enterprises to create the next-generation modern data architecture.

Gardner: I'm hearing a lot about the challenges of getting big data into a single set or managing the large datasets.

Users are also trying to figure out how to migrate from SQL or other data stores into Hadoop and into HP Vertica. It’s a challenge for them to understand a roadmap. How do you see these datasets as they grow larger, and we know they will, in terms of movement and integration? How is that path likely to unfold?

Machine data

Ferguson: Look at the enterprises that have been adopting Hadoop. Very early adopters like eBay, LinkedIn, Facebook, and Twitter were generating significant amounts of machine data. Then we started seeing large enterprises, aggressive users of technology, adopt it.

One of the core things is that the majority of data being created every day in an enterprise is not coming from traditional enterprise resource planning (ERP), customer relationship management (CRM), or financial management systems. It's coming from sources like website clickstream data, log data, or sensor data. The reason there is so much interest in Hadoop is that it allows companies to cost-effectively capture very large amounts of data.

Then, you begin to understand patterns across semi-structured, structured, and unstructured data and to glean value from that data. You can then leverage that data in other technologies like Vertica and other analytics technologies, or in applications, or move the data back into the enterprise data warehouse.

As a major player in this Hadoop market, one of the core tenets of the company was that the ecosystem is critical to the success of Hadoop. So, from day one, we’ve worked very closely with vendors like Microsoft, HP, and others to optimize how their technologies work with Hadoop.

SQL has been around for a long time. Many people and enterprises understand SQL. That's a critical access mechanism to get data out of Hadoop. We’ve worked with both HP and Microsoft. Who knows SQL better than anyone? Microsoft. We're trying to optimize how SQL access to Hadoop can be leveraged by existing tools that enterprises know about, analytics tools, data management tools, whatever.

That's just one way that we're looking at leveraging existing integration points or access mechanisms that enterprises are used to, to help them more quickly adopt Hadoop.
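As an aside for readers, here is a minimal sketch of what that SQL access point looks like in practice, using the open-source PyHive client to run ordinary SQL against data stored in Hadoop. The host, credentials, and the clickstream table are illustrative assumptions, not details from the interview.

```python
# Minimal sketch: SQL access to Hadoop via HiveServer2.
# Assumes: pip install "pyhive[hive]"
from pyhive import hive

# Hypothetical connection details; substitute your own HiveServer2 endpoint.
conn = hive.Connection(host="hive.example.com", port=10000,
                       username="analyst", database="default")
cursor = conn.cursor()

# Plain SQL, executed over data that lives in Hadoop.
cursor.execute("""
    SELECT page, COUNT(*) AS visits
    FROM clickstream              -- hypothetical table of web log events
    WHERE event_date = '2014-08-01'
    GROUP BY page
    ORDER BY visits DESC
    LIMIT 10
""")
for page, visits in cursor.fetchall():
    print(page, visits)
```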

Gardner: But isn’t it clear that what happens in many cases is that they run out of gas with a certain type of database and that they seek alternatives? Is that not what's driving the market for Hadoop?

Ferguson: It's not that they're running out of gas with an enterprise data warehouse (EDW) or relational database. As I said earlier, it's the sheer amount of data. By far, the majority of data is not coming from those traditional ERP, CRM, or transactional systems. As a result, a technology like Hadoop is optimized to allow an enterprise to capture very, very large amounts of that data.

Some of that data may be relevant today. Some of that data may be relevant three months or six months from now, but if I don't start capturing it, I won't know. That's why companies are looking at leveraging Hadoop.

Many of the earlier adopters are looking at leveraging Hadoop to drive a competitive advantage, whether they're providing a high level of customer service, doing things more cost-effectively than their competitors, or selling more to their existing customers.

The reason they're able to do that is because they're now being able to leverage more data that their businesses are creating on a daily basis, understanding that data, and then using it for their business value.

More than size

Gardner: So this is an alternative for an entirely new class of data problem for them in many cases, but there's more to it than size. We also heard that there's interest in moving from a batch approach to a streaming approach, an area where HP Vertica is very popular.

What's the path that you see for Hortonworks and for Hadoop in terms of allowing it to be used in more than a batch sense, perhaps more toward this streaming and real-time analytics approach?

Ferguson: That movement is under way. Hadoop 1.0 was very batch-oriented. We're now in 2.0, and it's not only batch, but interactive and also real-time. There's a common layer within Hadoop called YARN, which Hortonworks has been very influential in evolving. Think of it as a data operating system that is part of Hadoop, and it sits on top of the file system.

Via YARN, applications get their access mechanisms, whether they're batch-oriented applications, interactive integrations, or real-time ones like streaming or Spark. Those payloads or applications, when they leverage Hadoop, go through these various batch, interactive, and real-time integration points.

They don't need to worry about where the data resides within Hadoop. They'll get the data via their batch, interactive, or real-time access point, based on what they need. YARN takes care of moving that data in and out of those applications. Streaming is just one way of moving data into Hadoop. That's very common for sensor data. It's also a way to move it out. SQL is a way, among others, to move data.
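To make that access-point idea concrete, here is a minimal PySpark sketch (using the modern SparkSession API, which postdates this interview) in which the application simply names the data it wants and lets YARN and Spark decide where the work runs. The HDFS path, field names, and cluster setup are assumptions.

```python
# Minimal sketch: an application using Spark on a YARN-managed Hadoop cluster.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sensor-analysis-sketch")
         .master("yarn")          # YARN schedules and places the executors
         .getOrCreate())

# Read semi-structured sensor logs from HDFS; the application never deals
# with block locations or data placement itself.
events = spark.read.json("hdfs:///data/sensors/2014/08/*.json")  # hypothetical path
(events.filter(events.temperature > 90)   # hypothetical field
       .groupBy("device_id")
       .count()
       .show())
```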
Gardner: So this is giving us choice about how to manage larger scales of data. We're seeing choice about the way in which we access that data. There's also choice around the type of underlying infrastructure to reduce costs and increase performance. I'm thinking about in-memory or columnar approaches.

What is there about the Hadoop community and Hortonworks, in particular, that allows you to throw the right horsepower at the problem?

Ferguson: It was very important, from Hortonworks' perspective, to evolve the Hadoop technology as fast as possible from day one. We decided to do everything in open source to move the technology very quickly and leverage the community effect of open source, meaning lots of different individuals helping to evolve this technology fast.

The ability for the ecosystem to easily and optimally integrate with Hadoop is important. So there are very common integration points. For example, for systems management, there is Ambari, the Hadoop services integration point.

Whether it's HP OpenView or System Center in the Microsoft world, that integration point allows such tools to manage and monitor Hadoop along with the other IT assets those management technologies integrate with.
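As a small illustration of that integration point, Ambari exposes cluster state through a REST API that external management tools can poll. The host, credentials, and cluster name below are assumptions for the sketch.

```python
# Minimal sketch: polling Hadoop service state via the Ambari REST API.
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"   # hypothetical Ambari server
AUTH = ("admin", "admin")                          # placeholder credentials

resp = requests.get(f"{AMBARI}/clusters/prod/services",
                    auth=AUTH, headers={"X-Requested-By": "monitoring"})
resp.raise_for_status()

# List each Hadoop service the cluster reports (HDFS, YARN, Hive, ...).
for item in resp.json()["items"]:
    print(item["ServiceInfo"]["service_name"])
```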

Access points

Then there's SQL access via Hive, an access point that allows any technology that integrates with or understands SQL to access Hadoop.

Storm and Spark are other access points. So, common, open integration points, well understood by the ecosystem, are designed to help optimize how various technologies, at the virtualization layer, the operating system layer, data movement, data management, and the access layer, can optimally leverage Hadoop.

Gardner: One of the things I hear a lot from folks who don't yet understand how things will unfold is where data and analytics applications align with the creation of other applications or services, perhaps in a cloud setting like platform as a service (PaaS).

It seems to me that, at some point, more and more application development will be done through PaaS with an associated or integrated cloud. We're also seeing a parallel trajectory here with the data, along the same lines of moving from traditional systems of record into relational, and now into big data and analytics in a cloud setting. It makes a lot of sense.

I've talked to a lot of people about that. So the question, Mitch, is how do we see a commingling, or even an intersection, between the paths of PaaS in general application development and PaaS for BI services, or BI as a service?

Ferguson: I'll answer that question in two ways. One is about the companies that are using Hadoop today, and using it very aggressively. Their goal is to provide Hadoop as a service, irrespective of whether it's on premises or in the cloud.

Then we'll talk about what we see with HP, for example, with their whole cloud strategy, and how that will evolve into a very interesting hybrid opportunity and maybe pure cloud play.

When you think about PaaS in the cloud, the majority of enterprise data today is on premises. So there's a physics issue in trying to run all of my big data in the cloud. As a result, a number of people are pursuing a concept called the data lake. They're provisioning large Hadoop clusters on premises and moving large amounts of data into this data lake.

That's providing data as a service to those business units that need data in Hadoop -- structured, semi-structured, unstructured for new applications, for existing analytics processes, for new analytics processes -- but they're providing effectively data as a service, capturing it all in this data lake that continues to evolve.

Think, then, about how companies may want to leverage a PaaS. It's the same thing on premises. If my data is on premises, because that's where the physics requires it, I can leverage various development tools or application frameworks on top of that data to create new business apps. About 60 percent of our initial sales at Hortonworks are new business applications by an enterprise. It's business and IT being involved.

Leveraging datasets

Within the first five months, 20 percent of those customers begin to migrate to the data-lake concept, where now they are capturing more data and allowing other business entities within the company to leverage these datasets for additional applications or additional analytics processes. We're seeing Hadoop as a service on premises already. When we move to the cloud, we'll begin to see more of a hybrid model.

We're already starting to see this with one of Hortonworks' large partners, where archive data moves from on premises into low-cost storage in the cloud. I think HP will have that same opportunity with Hadoop and its cloud strategy.

Already, through an initiative at HP, they're providing Hadoop as a service in the cloud for those entities that would like to run Hadoop in a managed service environment.

That's the first step of HP beginning to provide Hadoop in a managed-service environment off premises. I believe you'll see that migrate to on-prem/off-prem integration in a hybrid model. Some companies, as their data moves off premises, will just want to run all of their big-data services, or have Hadoop as a service running completely in the HP cloud, for example.

Gardner: So, we're entering an era now where we're going to be rationalizing how we take our applications as workloads and continue to use them either on premises, in the cloud, or hybrid. At the same time, over on the side, we're thinking along the same lines architecturally with our data, but they're interdependent.

You can’t necessarily do a lot with the data without applications, and the applications aren’t as valuable without access to the analytics and the data. So how do these start to come together? Do you have a vision on that yet? Does HP have a vision? How do you see it?

Ferguson: The Hadoop market is very young. The vision today is that companies are implementing Hadoop to capture data that they were just letting fall on the floor. Now, they're capturing it. The majority of that data is on premises. They're capturing that data and beginning to use it in new business applications or existing analytics processes.
As they begin to capture that data and develop new applications, and as vendors like HP, working in combination with Hortonworks, provide the ability to effectively move data from on premises to off premises and to govern where that data resides in a secure and organized fashion, you'll begin to see much tighter integration of new business or big-data applications being developed on prem, off prem, or across the two. It won't matter.

Gardner: Great. We've been learning quite a bit about how Hortonworks and Hadoop are changing the game for organizations as they seek to use all of their data, including very massive datasets. We've heard how that aligns with the HP Vertica and HP Haven strategy of enabling more business applications for more types of data.

With that, I'd like to thank our guest, Mitch Ferguson, Vice President of Business Development at Hortonworks. Thank you, Mitch.

Ferguson: Thank you very much, Dana.

Gardner: This is Dana Gardner. I'd like to thank our audience for joining us for a new style of IT discussion coming to you from the recent HP Big Data 2014 Conference in Boston. Thanks to HP for sponsoring our discussion, and don't forget to come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how companies are beginning to capture large volumes of data for past, present and future analysis capabilities. Copyright Interarbor Solutions, LLC, 2005-2014. All rights reserved.


Friday, November 14, 2014

HP Analytics Blazes New Trails in Examining Business Trends From Internal Data

Transcript of a BriefingsDirect podcast on new trends in gathering and analyzing large amounts of data, both traditional structured data and user-generated data.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance and deliver better user experiences, as well as better business results.

This time, we're coming to you directly from the recent HP Big Data 2014 Conference in Boston. We're here to learn directly from IT and business leaders alike how big data, cloud, and converged infrastructure implementations are supporting their goals.
Our next discussion is a thought leadership interview with an executive in the Analytics Group at HP, to learn how HP analyzes its own data and to provide examples from which other organizations can benefit as well.

Please join me in welcoming our special guest, Pramod Singh, Director of Digital and Big Data Analytics at HP Analytics, based in Bangalore, India. Welcome, Pramod.

Pramod Singh: Thank you, Dana. Thank you very much.

Gardner: First, tell us a little bit about the Analytics Group at HP, what you do, and what’s the charter of your organization.

Singh: We have a big analytics organization in HP called Global Analytics, and it serves the analytics needs of most of HP. About 80 to 90 percent of the analytics happening inside HP comes out of this ecosystem. We do analytics across the entire food chain at HP, which includes supply chain, marketing, and sales.

What I personally lead is an organization called Digital Analytics, and we're responsible for analytics across all digital properties for HP. That includes eCommerce, social media, search, and campaign analytics. Additionally, we have a Center of Excellence for Big Data Analytics, where we use HP's big-data framework, HAVEn, to help develop big-data solutions for HP customers as well as for internal HP.

Gardner: Obviously, HP is a very large global company. What sort of datasets are we talking about here? What’s the volume that you're working with?

Data explosion

Singh: As you know, a data explosion is happening. On one end, HP has done a very good job over the last six to seven years of getting most of their enterprise data into something called an enterprise data warehouse. We're talking about close to two petabytes of data, which is structured data.

The great part of this journey is that we have taken data from 700-800 different data marts into one enterprise data warehouse over the last three to four years. A lot of data that is not part of the enterprise is also becoming an important part of making the business decisions.

A lot of the data I personally deal with in the digital space is what we call human-generated data, the social media data that no enterprise owns. It's open for anybody to use. What I've started to see is that, on one hand, we've done a really good job of getting data into the enterprise and getting value out of it.

We've also started to analyze and harvest the data that is out in the open space. It could be blogs, Twitter feeds, or Facebook data. Combining that is what’s bringing real business value.

Gardner: How large is your organization and how long have you been there?

Singh: The Global Analytics organization is more than 1,000 people spread across different parts of the world. A big chunk of that is in Bangalore, India, but we have folks in the US and the UK. We have a center in Guadalajara, Mexico, and a couple of other locations in India. My particular organization is close to 100 people.

Gardner: How long have you been there, and what's your background? How did you end up at HP Analytics?

Singh: I have a PhD in pure mathematics, and before that I did an MBA in marketing. It's a little bit of an awkward mix, and I got into the analytics space in the mid-'90s working for Walmart.

I built out Walmart's Assortment Planning System in the late '90s and then came to HP in 2000, leading an advanced data-mining center in Austin, Texas. From there, I evolved into doing e-business analytics for a few years and then moved to customer knowledge management. I spent five years in IT developing an analytics platform.

About a year and a half ago, I got an opportunity to lead the big-data practice for this organization called Global Analytics. In five years, it had gone from five people to more than 1,000, and that intrigued me a lot. I took the opportunity and moved to India to lead that team.

More insights

Gardner: Pramod, when we look back into this data, do you gain more insights knowing what you're looking for, or not knowing what you're looking for? What kind of insights were the unexpected consequences of your putting together this type of data infrastructure and then applying big-data analytics to it?

Singh: We deal with that day in and day out. I'll give you a couple of examples. This is something that happened about three or four years ago at HP. We were looking at a classic problem in marketing to US small and medium-sized businesses (SMBs). We had a fixed budget for marketing, and across the US, there are more than 20 million SMBs. The classic definition of an SMB is any business with 100-500 employees.

HP had an install base comprising a small part of that. We realized that this segment of SMBs is squeezed between the classic consumer, where you can do mass marketing such as TV advertising, and the enterprise, where you can actually put bodies in place, people who have relationships. SMBs sit between those two extremes.

On one hand, you can't reach out to every single one of them. It’s just way too expensive to do that. On the other hand, if you try to go do the marketing, you don’t get the best out of it.

We were starting to work on something like that. I was approached by a vice president in marketing who said revenues are declining and they had a limited marketing budget. They didn’t know what to do.

This is where one of those unexpected things came in. I said, "Let's see whether there are different segments of customers in that install base that are behaving differently." That led us on a kind of journey where we said, "How do we start to do that right? Let's figure out the different attributes of data we can capture."

On one hand, if you look at SMBs, you can capture who they are, what industry segment they're in, how many employees they have, where they're based, and who the CEO is. It's what we call firmographics.

On the other hand, you have classes of data involving their interaction with HP. It could be things like how many PCs or servers they bought, how long ago did they buy it, how much money they spent, the whole transactional aspect of it.

Then, there are some things that are derived attributes. You may be able to derive that in the last year they came to us four times. What interaction did we have on the website? For example, did they come to us through a web channel? If they did, how many email offers were sent to them? How many of those were clicked? How many of those converted? Those are the classes of data that we could capture.

The question then became what do we do with that? Again, when you do data mining and analytics, you may not know where this will lead you.

Mathematical modeling

We thought that maybe there are different classes of customers. We pulled our data together and started to do mathematical modeling, using clustering techniques such as K-Means. We started to get some results and to analyze them. In this type of situation, you have to be careful, because some things may look mathematically correct but may not have real business value behind them.

Once we started to look at those things, we went through multiple iterations. We realized that we were not getting segments or clusters that were very distinct. One day, I was driving home in Austin, and I said, "You know what? Who they are I don’t control, but as far as what they're doing with HP we have a reasonably good understanding."

So we started to do clustering based only on those attributes, and that's where the "aha" moment came. We started to find these clusters, which we call segments, and eventually found a cluster of 7 to 8 percent of the population that brought in 45 percent of the revenue.

The marketers started to say that this was a gold mine. That's something we never expected to happen. We put together a structure. Once we figured out these four or five clusters, we tried to figure out why they were clustered together. What's common?
We built out a primary research thing, where we took a random sample out of each one of those clusters, interviewed those guys, and were able to build a very good profile of what these segments were.

There are 20 million SMBs in the US, and we were able to build a model to predict which of those prospects were similar to the clusters we had. That's how we found prospects that looked like our most profitable customers, which we ended up calling Vanguards. That resulted in a tremendous dollar increment for HP. It's a good example of what you talked about, finding unexpected things.

We just wanted to analyze data. It led us on a journey, and we ended up finding a customer group we weren't even aware of. Then, we could build a marketing strategy to actually target those prospects and get value out of it.
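For readers who want a concrete picture, here is a toy reconstruction of the approach Singh describes: cluster accounts on behavioral and transactional attributes only, then check how revenue concentrates per cluster. The file, column names, and parameters are hypothetical.

```python
# Toy sketch of behavior-only K-Means segmentation (not HP's actual model).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("smb_accounts.csv")   # hypothetical account extract
behavior = df[["purchases_12m", "spend_12m", "web_visits_12m",
               "emails_clicked_12m", "months_since_last_order"]]

X = StandardScaler().fit_transform(behavior)       # K-Means is scale-sensitive
df["segment"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# Which segments concentrate revenue? A pattern like "7-8 percent of
# accounts, 45 percent of revenue" would surface in this summary.
summary = df.groupby("segment").agg(accounts=("segment", "size"),
                                    revenue=("spend_12m", "sum"))
summary["revenue_share"] = summary["revenue"] / summary["revenue"].sum()
print(summary.sort_values("revenue_share", ascending=False))
```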

Gardner: At the Big Data Conference, I've spoken to other organizations who are creating an analytics capability and then exposing that to as many of their employees as possible, hoping for this very sort of unexpected positive benefit. Is there a way that you're taking your analytics either through visualization or tools and then allowing a larger population within HP to experiment with it?

Singh: We're trying to democratize the analytics as much as we can. One thing we're realizing is that to get the full value, you don't want data to stay in silos. So there are a couple of things you have to do. In terms of building out an ecosystem where you have a good set of motivated people and where you can give them a career path, we have created this organization called Global Analytics. You get a critical mass of people who challenge each other, learn from each other, and do a lot of analytics.

But it's also very important that, on the consumption side, you have people who are analysts, understand analytics, and get the best value out of it. So we try to create that ecosystem. We have seen both ends of it.

Good career path

If you assign just one data miner or analytics person to one team, sometimes that person does not find an ecosystem to challenge himself or herself. We're trying to do it on both sides of the fence, so that we can provide people with a good career path.

Hiring these folks is not easy. Once you've hired them, retaining them is not easy. You want to make sure to create an ecosystem where it’s challenging enough for these people to work. It also has to be an ecosystem where you continually challenge them and keep training them.

The analytical techniques are evolving. When I started doing it, things were stable for years. Now, the newer class of data is coming in, newer techniques are coming in, and newer classes of business problems are coming in. It’s very important that we keep the ecosystem going. So we try to do it on both sides.

Gardner: Very interesting. HP, of course, has its own line of products for big-data analysis. You're such a large global enterprise that you're doing lots of analysis, as any good business should, but you're also being asked to show how this works. Are there some specific use cases that demonstrate for other enterprises what you've learned yourselves?

Singh: There are several that we can talk about. One is in the social media space; I briefly talked about that. My career evolved around doing analytics on what I call "data inside the enterprise." But, over the last couple of years, we started to look at data outside the enterprise.

Recently we went and looked at a bank. We were able to harvest data from the Internet, publicly available data like Glassdoor, for example. Glassdoor is a website where employees of a company can put their feedback, talk about the company, and rate things.

We were presenting to the executives of this particular bank, and we were able to take all the data and tell them about overall employee morale. We figured out that the employees' work-life balance wasn't very good.

The main things the employees weren't happy about were the leave policy and the vacation policy. We drilled down and figured out that the bankers seemed to be fairly happy, but the IT guys and analysts weren't very happy. Again, this is one example where we didn't ask for a single line of data from the customer. This data is publicly available. You and I, or anybody else, can go get it. I could do that same analysis for HP or any other company.
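A minimal sketch of that kind of drill-down, assuming the harvested reviews have already been landed in a flat file; the file name and columns are hypothetical, and real harvesting would need to respect the source site's terms of use.

```python
# Toy sketch: aggregate public review scores by job family.
import pandas as pd

reviews = pd.read_csv("public_reviews.csv")   # hypothetical harvested extract
# assumed columns: job_family, work_life_balance, leave_policy_score

by_role = reviews.groupby("job_family")[["work_life_balance",
                                         "leave_policy_score"]].mean()

# A pattern like "bankers fairly happy, IT and analysts not" shows up here.
print(by_role.sort_values("work_life_balance"))
```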

That's where I believe the classes of analytics we're doing are changing. A lot of times, your competitive differentiator is the ability to do things with that data. Data is a corporate asset and will remain one, but this class of what we call user-generated data is changing analytics as a whole. The ability to harvest it and, more importantly, get value out of it will be the competitive differentiator.

Gardner: Any other use cases that demonstrate the power of a particular type of platform, let’s say Vertica in HAVEn, where you've got the power of a columnar architecture and you've got the ability to bring in unstructured data from Autonomy? Maybe there are a couple of use cases that demonstrate the unique attributes of HAVEn when it comes to inclusivity and the comprehensive nature of information today?

Game changer

Singh: Let me talk about a couple of things that happened in the HAVEn ecosystem. One of the main workhorses in HAVEn is our massively parallel database, Vertica. In addition to being a database that can ingest large volumes of data very quickly and run queries fast, the game-changer for me as an analytics practitioner has been the ability to do analytics in-database.

If I look at my career over the last 20-22 years, most of the time what happens in the analytics space is that you have data residing in a database or an enterprise data warehouse. When you want to build a model, you take the data out and use an analytics platform like SAS, R, or SPSS. You do something there, and you either bring the data back into the environment or you run the models and publish them out.

What Vertica has done that's unique is give us a framework: through the user-defined function (UDF) framework, we could build a data-mining model, run it directly on the database engine, and take the output out.
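Here is a minimal sketch of that in-database pattern using the open-source vertica-python client: the model is invoked inside the query, so the data never leaves the database. The connection details, table, and the PREDICT_FAILURE function are hypothetical; registering such a function is a separate step done through Vertica's user-defined extension (UDx) tooling.

```python
# Minimal sketch: scoring rows in-database by calling a (hypothetical) UDF.
import vertica_python

conn_info = {"host": "vertica.example.com", "port": 5433,
             "user": "dbadmin", "password": "...", "database": "analytics"}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute("""
        SELECT machine_id,
               PREDICT_FAILURE(valve_pressure, temperature, pages_printed)
                   AS failure_risk   -- hypothetical registered UDF
        FROM sensor_readings         -- hypothetical table
        WHERE reading_time > NOW() - INTERVAL '1 day'
    """)
    for machine_id, risk in cur.fetchall():
        print(machine_id, risk)
```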

An example we took to HP Discover a couple of months ago was trying to predict a failure of a machine before the actual failure happens. HP has these big machines and big printers, which are very expensive.

Like a lot of high-end devices these days, they send out a lot of data about when you're using a machine. The sensors send out a lot of information: the pressure of the valves, the temperature they're operating in, the throughput they're giving you, or the number of pages you've printed.

They also give you data on the events when the machine was not performing optimally or actually failed. We were able to ingest all that data, put it into the Vertica platform, and build predictive models using the open-source R language. We built a model that can predict the failure of a machine.

Looking at each component of failure, we could predict with a certain probability when the machine would fail, so our service reps can be proactive rather than wait for the machine to fail. That's one example of doing in-database data mining using Vertica.
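Singh's team built the model in R running in-database; purely to illustrate the modeling step, here is a rough scikit-learn analogue on hypothetical sensor features, producing failure probabilities rather than hard labels.

```python
# Illustrative stand-in for the failure model (the team used R, not sklearn).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("sensor_history.csv")   # hypothetical training extract
features = data[["valve_pressure", "temperature", "throughput", "pages_printed"]]
label = data["failed_within_7d"]           # 1 if a failure followed within 7 days

X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probabilities, not hard labels: "predict with a certain probability when
# the machine would fail," so reps can act before the failure happens.
print(model.predict_proba(X_test)[:5, 1])
print("holdout accuracy:", model.score(X_test, y_test))
```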

Another example used more components around the social-media space. One of the problems in the social-media space, and I think you guys are probably familiar with this, is finding influencers.

I gave a talk yesterday about how you do that. There are classical, uni-dimensional approaches based on the number of followers or retweets you have. By those measures, Barack Obama or Lady Gaga would be big influencers, but Barack Obama may not be a very big influencer for cloud computing for HP.

So you build those classes of algorithms. My team has built out three patented algorithms to identify influencers in this space. We've built a framework where we can source data from the social media space and drop it into a Hadoop kind of environment.

We use Autonomy to enrich it and attach sentiment, and then drop the data into the Vertica environment. In that Vertica environment, you run the algorithms and get an output. Then, you can score and predict who the influencer is for the topic you're looking at.

Influencers

I gave the example of Barack Obama, in general a big influencer, but he is not an influencer for all topics. In politics or the US government he's a big influencer, but not for cloud computing. Influence is also a function of time. Somebody like Diego Maradona was probably a big influencer in soccer in the '90s, but in 2014, not so much.

You have to make sure that you can incorporate those as part of the logic of your algorithm. We've been able to use the multiple components of HAVEn and build out a complete framework where we can tell numerically who the main influencers are and how influential they are. For example, if you get a score of 93 and I get a score of 22, you are almost four times as influential as I am.
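The patented algorithms themselves are not public, but the two properties Singh names, topic relevance and decay over time, can be sketched in a toy scorer. Everything below is illustrative, not the HAVEn pipeline.

```python
# Toy topical influence score: engagement on on-topic posts, decayed by age.
from datetime import datetime, timezone

def influence_score(posts, topic, now=None, half_life_days=90.0):
    """Naive score: sum engagement on on-topic posts, halving every 90 days."""
    now = now or datetime.now(timezone.utc)
    score = 0.0
    for post in posts:
        if topic.lower() not in post["text"].lower():
            continue                              # off-topic posts don't count
        age_days = (now - post["time"]).total_seconds() / 86400.0
        decay = 0.5 ** (age_days / half_life_days)   # older = less influence
        score += (post["retweets"] + post["replies"]) * decay
    return score

posts = [{"text": "Thoughts on cloud computing and Hadoop", "retweets": 40,
          "replies": 12, "time": datetime(2014, 7, 1, tzinfo=timezone.utc)}]
print(influence_score(posts, "cloud computing",
                      now=datetime(2014, 8, 1, tzinfo=timezone.utc)))
```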

Gardner: For other organizations that are interested in learning more about how HP Analytics is operating and maybe learning from your example, are there any resources or websites we can go to, where you are providing more information about HP Analytics?

Singh: Definitely. We have our own website, and there are multiple ways you can approach us. You can talk to the Vertica sales team, and they can connect you to us. As I said, we do analytics for all of HP and for select customers. We don't have a direct sales arm; we work through our partners in Enterprise Services, as well as with the software team.
Gardner: We've been talking about how HP Analytics, based in Bangalore, India, with organizational branches around the world, is examining internal data at HP, but also blazing trails in how to better provide analytics to large enterprises and SMBs, whether HP or any other company.

So a big thank you to our guest, Pramod Singh, Director of Digital and Big Data Analytics at HP Analytics. Thank you so much.

Singh: Thank you, Dana, thank you very much.

Gardner: And a big thank you also to our audience for joining us for this special new style of IT discussion coming to you directly from the recent HP Big Data 2014 Conference in Boston. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on new trends in gathering and analyzing large amounts of data, both traditional structured data and user-generated data. Copyright Interarbor Solutions, LLC, 2005-2014. All rights reserved.


Friday, August 22, 2014

Hybrid Cloud Models Demand More Infrastructure Standardization, Says Global Service Provider Steria

Transcript of a sponsored BriefingsDirect podcast on planning and preparing for a journey to cloud.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance and deliver better user experiences and business results. This time, we're coming to you directly from the recent HP Discover 2013 Conference in Barcelona.

We're here to learn directly from IT and business leaders alike how big data, mobile, and cloud, along with converged infrastructure, are all supporting their goals.

Our next innovation case study interview highlights how European IT services provider Steria is exploring cloud standards and the use of cloud across hybrid models. We welcome on this subject Eric Fradet, Industrialization Director at Steria in Paris. Welcome, Eric.

Eric Fradet: Thank you, I’m glad to be here.

Gardner: For those of our audience who may not be overly aware of Steria, tell us a little bit about what you do, where you do it, and how your business is going?

Fradet: Steria is a 40-year-old services company, mainly based in Europe, with large locations in India and Singapore as well. We provide all types of services related to IT, from infrastructure management to application management. We help develop and deploy new IT services for all our customers.

Gardner: There’s a lot of interest these days in trying to decide to what degree you should have a cloud infrastructure implementation on-premises, with some sort of a hosting provider, or perhaps going fully to a service-delivery model vis-à-vis a software-as-a-service (SaaS) or cloud providers. How are your activities at Steria helping you better deliver this choice to your customers?

Fradet: That change may be quicker than expected. So, we must be in a position to manage the services wherever they’re from. The old model of saying that we’re an outsourcer or on-premises service provider is dead. Today, we’re in a hybrid world and we must manage that type of world. That must be done in collaboration with partners, and we share the same target, the same ambition, and the same vision.

Gardner: We’re also seeing quite a bit of discussion about which platforms, which standards, and which type of cloud infrastructure model to follow. For your customers or prospects, how do you go to them now, when we’re still in a period of indecision? What are your recommendations? What do you think should happen in terms of the standardization of a cloud model?

Benefit, not a pain

Fradet: Broadly, I'd say first that the cloud must not be seen as disruptive by our customers. Cloud is here to accompany their transformation. It must be a benefit for them, and not a pain.

A private solution may be the best starting point for some customers; a full public solution may be the target. We're here to manage that journey and to define with the customer the best solution for each need.

Gardner: And in order for that transition from private to public or multiple public or sourced-infrastructure support, a degree of standardization is required. Otherwise, it's not possible. Do you have a preferred approach to standardization? Are you working closely with HP? How do you think you will allow for a smooth transition across a hybrid spectrum?

Fradet: The choice of HP as a partner was based on two main criteria. First of all, the quality of the solution, obviously, but there are multiple good solutions on the market. The second is HP's capacity to enable a smooth transition, which means getting the industrialization and economic benefits while remaining open and interconnected with existing IT systems.

That's why the future model is quite simple. We know we will have remaining on-premises, physical infrastructure. We will have some private-cloud solutions and multiple public clouds, as you mentioned. The challenge is to have the right level of governance, and to be in a position to move and adjust the workloads according to the needs.

Gardner: Of course, once you've been able to implement across a spectrum of hosting possibilities, there's the task of managing that over time, not just putting it there, but being able to govern and have control. Is there anything about the HP portfolio, or what you're doing in particular, that you think is important as we try to move beyond strictly implementation and into ongoing operations?

Fradet: With HP, we have a layered approach, which is quite simple. First of all, if you want to manage, you must control, as you mentioned. We continue to invest deeply in IT Service Management (ITSM) because ITSM is service governance. In addition, we have some more innovative solutions based on the latest version of Cloud Service Automation (CSA). Control, automate, and report remain key, whatever the cloud or non-cloud infrastructure.

Gardner: Of course, another big topic these days is big data. I would think that part of the management capability would be the ability to track all the data from all the systems, regardless of where they're physically hosted. Do you have a preference, or have you embarked on a big-data platform that would allow you to manage and monitor IT systems regardless of volume and location?

Fradet: Yes, we have some very interesting initiatives with HP around HAVEn, which is obviously one of the most mature big-data platforms. The challenge for us is to transform a technologically wonderful solution into a business solution. We’re working with our business units to define use-cases that are totally tailored and adjusted for the business, but big data is one of our big challenges.

Traditional approach

Gardner: Have you been using a more traditional data-warehouse approach, or are you not yet architecting the capability? Are you still in a proof-of-concept stage?

Fradet: Unfortunately, we have hundreds of customer-dedicated data-warehouse solutions, ranging from the very old-fashioned, to operational key performance indicators (KPIs), to advanced business intelligence (BI).

The challenge now is really to design for what will be the top requirements for the data warehouse, and there is a mix of needs. Some are pure operational KPIs, some are analytics, and some are really big-data needs. Designing the right solution for the customer remains a challenge. But we're very confident that with HAVEn, sometime in 2014, we will have the right solution for those issues.

Gardner: Lastly, Eric, the movement toward cloud models is still in the planning stages for a lot of organizations. They're mindful of the vision, but they also have IT housecleaning to do internally. Do you have any suggestions on how to properly modernize, or move toward an architecture that would give them a better approach to cloud and set them up for less risk and less disruption? What observations do you have on how to prepare for moving toward a cloud model?

Fradet: As with any transformation program, cloud eligibility remains key. That means we have to define the policy with the customer. What is their expectation: time to market, cost savings, being more efficient in terms of management?

Cloud can offer many combinations or many benefits, but you have to define as a first step your preferred benefits. Then, when the methodology is clearly defined, the journey to the cloud is not very different than from any other program. It must not be seen as disruptive, keeping in mind that you do it for benefits and not only for technical reasons or whatever.

So don't jump to the cloud without having strong resources below the cloud.

Gardner: Please join me in thanking our guest. We've been discussing transition to cloud with Eric Fradet, Industrialization Director at Steria in Paris. Steria is a large and leading European IT services provider. Thank you.

Fradet: Thank you.

Gardner: And also thank you to our audience as well for joining us for this special new style of IT discussion coming to you directly from the HP Discover 2013 Conference in Barcelona.

I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for joining, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a sponsored BriefingsDirect podcast on planning and preparing for a journey to the cloud. Copyright Interarbor Solutions, LLC, 2005-2014. All rights reserved.


Wednesday, July 23, 2014

UK Solutions Developer Systems Mechanics Uses HP HAVEn for BI, Streaming and Data Analysis

Transcript of a sponsored BriefingsDirect podcast on making telcos more responsive to customers and operators by using big-data tools and analysis.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance and deliver better user experiences and business results. This time, we're coming to you directly from the recent HP Discover 2013 Conference in Barcelona.

We’re here to learn directly from IT and business leaders alike how big data, mobile, and cloud -- along with converged infrastructure -- are all supporting their goals.

Our next innovation case study focuses on how Systems Mechanics Limited is using elements of the HP HAVEn portfolio to improve how its products perform in areas such as business intelligence (BI), streaming analytics, and data analysis.

To learn more, we’re here with Andy Stubley, Vice President of Sales and Marketing at Systems Mechanics, based in London. Welcome, Andy.

Andy Stubley: Hello.

Gardner: So tell us a bit about what you do at Systems Mechanics. It sounds like a very interesting organization. You've been doing a lot with data, and monetizing that in some very compelling ways.

Stubley: Yes, indeed. Systems Mechanics is a UK-based organization. We're principally a consultancy and a software developer. We've been working in the telco space for the last 10-15 years. We also have a history in retail and financial services.

The focus we've had recently, and the products we've developed into our Zen family, are based on big data, particularly in telcos as they evolve from principally carrying analog conversations to serving devices where people have smartphone applications and data becomes ever more important.

All that data and all those people connected to the network cause a lot more events that need to be managed, and that data is both a cost to the business and an opportunity to optimize the business. So we have a cost-reduction play and a revenue-upside play as well.

Quick example

Gardner: Can you give us a quick example, just so our listeners who might not be familiar with all of these technical terms and instances of use would understand? What’s a typical way a telco will use Zen, and what would they do with it?

Stubley: For a typical case, let's take a scenario where you're looking at a network and you can't make a phone call. Two major systems are catching that information. One is the fault-management system, which tells you there is a fault on the network and reports it back to the telco itself.

The second is the performance-management system. That doesn't report faults as such, but it tells you if things like thresholds are being breached, which may have an impact on performance. Either of those can have an impact on your customer, and from a customer's perspective, you might also be having a problem with the network that isn't reported by either system.

We're finding that social media is getting a bigger play in this space. Why is that? Younger populations in particular, with consumer telcos, mobile telcos especially, if they can't get a signal or can't make a phone call, they get onto social media and trash the brand.

They're making noise. A trend is combining fault management and performance management, which are logical partners, with social media. All of a sudden, rather than having a couple of systems, you have three.

In our world, we can put 25 or 30 different data sources onto a single Zen platform. In fact, there is no theoretical limit to the number we could take, but 20 to 30 is quite typical now. That enables us to manage all the different network elements and different types of mobile technologies: LTE, 3G, and 2G. It could be Ericsson, Nokia, Huawei, ZTE, or Alcatel-Lucent. There is an amazing range of equipment, all currently managed through separate entities. We're offering a platform to pull it all together in one unit.

The other way I tend to look at it is that we're trying to have telcos work the way you might view a human. Humans are the best decision-making platforms in the world, and we can probably still claim that. As humans, we have conscious and unconscious processes running. We don't think about breathing or pumping blood around our systems, but it's happening all the time.

We have senses pulling in massive amounts of information from the outside world. You're listening to me now. You're probably doing a bunch of other things while you're tapping away on a table as well. You're taking in information there: seeing, hearing, feeling, touching, and tasting.

All of those carry information coming into the body, but most of the activity is subconscious. In the world of big data, this is the Zen goal: what we're delivering in a number of places is to make as many actions as possible in a telco network environment happen in that automatic, subconscious state.

Suppose I have a problem on the network. It gets related back to the people who need to know, but it doesn't require human intervention. We're looking at a position where human intervention means looking at patterns in that information to decide what people can do intellectually to make the business better.

That probably speaks to another point here. Our solution leans on visualization, because in the world of big data, you can't understand the data as raw numbers. The human brain isn't capable of processing enough, but it is capable of identifying patterns in pictures, and that's where we go with our visualization technology.

Gather and use data

Gardner: So your clients are able to take massive amounts of data and new types of data from a variety of different sources. Rather than be overwhelmed by that, through this analogy to being subconscious, you’re able to gather and use it.

But when something does present a point of information that’s important, you can visualize that and bring that to their attention. It’s a nice way of being the right mix of intelligence, but not overwhelming levels of data.

Stubley: Let me give you an example of that. We've got one customer that is one of the largest telcos in EMEA. They're taking in 90,000 alarms from the network a day, across their subsidiary companies, all into one environment. And 90,000 alarms needing manual intervention is a very big number.

Using the Zen technology, we’ve been able to reduce that to 10,000 alarms. We’ve effectively taken 90 percent of the manual processing out of that environment. Now, 10,000 is still a lot of alarms to deal with, but it’s a lot less frightening than 90,000, and that’s a real impact in human terms.

Gardner: Very good. Now that we understand a bit about what you do, let’s get into how you do it. What’s beneath the covers in your Zen system that allows you to confidently say we can take any volume of data we want?

Stubley: Fundamentally, that comes down to the architecture we built for Zen. The first element is our data-integration layer. We have a technology that we developed over the last 10 years specifically to capture data in telco networks. It’s real-time and rugged and it can deal with any volume. That enables us to take anything from the network and push it into our real-time database, which is HP’s Vertica solution, part of the HP HAVEn family.

Vertica lets us record basically any amount of data in real time and scale automatically on the HP hardware platform we also use. If we need more processing power, we can add more servers to scale transparently. That enables us to take in any amount of data, which we can then process.

We have two processing layers. Referring to our earlier discussion about conscious and subconscious activity, our conscious activity is visualizing that data, and that’s done with Tableau.

We have a number of Tableau reports and dashboards with each of our product solutions. That enables us to envision what’s happening and allows the organization, the guys running the network, and the guys looking at different elements in the data to make their own decisions and identify what they might do.

We also have a streaming analytics engine that listens to the data as it comes into the system before it goes to Vertica. If we spot the patterns we’ve identified earlier “subconsciously,” we’ll then act on that data, which may be reducing an alarm count. It may be "actioning" something.

It may be sending someone an email. It may be creating a trouble ticket on a different system. Those all happen transparently and automatically. It's four layers forming the solution: data capture and integration, the real-time database, visualization, and automatic analytics.
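As a sketch of that "subconscious" layer, here is a toy rule engine in the spirit Stubley describes: each incoming event is checked against known patterns, duplicates are suppressed, and known-recoverable faults trigger an automatic action instead of reaching a human. The event fields and actions are assumptions, not Zen internals.

```python
# Toy event-pattern engine: suppress duplicates, auto-handle known faults.
def open_trouble_ticket(event):
    print("ticket opened for", event["element"])   # stub for a ticketing API call

def handle_event(event, seen_counts):
    key = (event["element"], event["alarm_type"])
    seen_counts[key] = seen_counts.get(key, 0) + 1

    # Pattern 1: repeat alarms from the same element get suppressed, which is
    # how a 90,000-alarm day shrinks toward 10,000.
    if seen_counts[key] > 1:
        return "suppressed"

    # Pattern 2: known-recoverable faults get an automatic action, no human.
    if event["alarm_type"] == "link_flap":
        open_trouble_ticket(event)
        return "auto-ticketed"

    return "escalate-to-human"

counts = {}
for ev in [{"element": "cell-042", "alarm_type": "link_flap"},
           {"element": "cell-042", "alarm_type": "link_flap"}]:
    print(handle_event(ev, counts))
```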

Developing high value

Gardner: And when you have the confidence to scale your underlying architecture and infrastructure, and you're able to visualize and deliver high value to a vertical industry like telco, this allows you to expand into more lines of business, in terms of products and services, and also into more verticals. Where have you taken this in terms of the Zen family, and where do you take it now in terms of your market opportunity?

Stubley: We focus on mobile telcos. That’s our heritage. We can take any data source from a telco, but we can actually take any data source from anywhere, on any platform, in any company. That ranges from binary to HTML. You name it; if you’ve got data, we can load it.

That means we can build our processing accordingly. What we do is position what we call solution packs. A solution pack is a connector to the outside world, to the network, and it grabs the data. We’ve got an element of data modeling there, so we can load the data into Vertica. Then we have prebuilt reports in Tableau that allow us to interrogate the data automatically. That’s at a component level.

Once you have a number of components, we can then look horizontally across those different items and at how their behaviors interact with each other. In pure telco terms, we would be looking at different network devices and the end-to-end performance of the network, but the same would apply to a fraud scenario, or to someone who is running cable TV.

So multi-play players are interesting, because they want to monitor what’s happening with TV as well, and that fits into exactly the same category. Realistically, anybody with high-volume, real-time data can benefit from Vertica.

Another interesting play in this scenario is social gaming and online advertising. They all have similar data characteristics: very high-volume, fixed-format data that needs to be analyzed and processed automatically.

Gardner: Quite a few other organizations are exploring how to use the available technologies to gather and exploit data. Are there any lessons learned, any hindsight perspectives, that you can offer to other organizations, whether they’re using their own technology, third parties, or some combination? What should they keep in mind as they begin this journey?

Stubley: A lot of the lessons have been learned time and time again; frequently, people fall into the same traps over and over. Insanity is not learning from previous mistakes, isn’t it? What we see most often, particularly in the big-data world, and I see this in a number of different forms, is people looking for the data to be the solution, rather than solving the business problem.

At the very highest level, it’s finding what problem you’re going to solve and then using the data to solve it. You won’t identify the problems just from the big data itself; it’s too big a problem, and it’s irrational, if you think about it.

One of the great things we have is a number of solutions that sit on big data and give you a starting point, so you can then move in small steps toward a more encompassing big-data solution that is proven at every stage along the way. It’s very classic project behavior: proof at every point, delivery at every point, value at every point. But please, please don’t think big data itself is the answer.

Why Vertica?

Gardner: One last question, delving into the process by which you’ve crafted your architecture and capabilities. How long have you been using Vertica, and what drove you to it vis-à-vis the alternatives?

Stubley: As far as the Zen family goes, we have used other technologies in the past, other relational databases, but we’ve used Vertica now for more than two and a half years. We were looking for a platform that could scale and give us real-time data. At the volumes we were looking at, nothing could compete with Vertica at a sensible price. You can build yourself a solid solution with enough money, but we haven’t got too many customers who are prepared to make that investment.

So Vertica fits the technology of the 21st century. A lot of the relational database appliances are built on 1980s thinking. What’s happened with processing in the last few years is that nobody shares memory anymore, and our environment requires a shared-nothing solution. Vertica has been built on that basis; it scales without limit.

Gardner: And as you mentioned, Andy, Vertica is part of the HAVEn family, and Hadoop, Autonomy, and the security and compliance aspects are in there as well. How do you view things going forward, as more types and volumes of data are involved?

You’re still trying to increase your value and reduce the time to delivery for the analysis. Any thoughts about which other aspects of HAVEn might fit well into your organization?

Stubley: One of the areas we’re looking at, which I mentioned earlier, is social media. Social media is a very natural play for Hadoop, and Hadoop is clearly a very cost-effective platform for vast volumes of data, loading in real time, but it’s very slow to analyze.

So the combination of a high-volume, low-cost platform for the bulk of the data with a very high-performing, real-time analytics engine is very compelling. The challenge is going to be moving the data between the two environments. That isn’t going away; it’s not simple, and there are a number of approaches. HP Vertica is taking some of them.

There is Flex Zone, and there are any number of other players in that space. The reality is that you probably reach an environment where people are parallel-loading both Hadoop and Vertica. That’s what we plan to do, and it gives you much more resilience. For a lot of the data going into our system, we’re actually planning to put the raw data files into Hadoop, so we can reload them as necessary and improve the resilience of the overall system too.
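
As a rough illustration of that parallel-load approach, the sketch below writes the same raw file into HDFS, for resilience and replay, and loads it into Vertica for real-time analytics. It assumes the Python hdfs (WebHDFS) and vertica_python packages; the hosts, paths, and table name are placeholders.

```python
# Sketch of parallel loading: archive the raw file in Hadoop for replay,
# and load the same file into Vertica for querying. Endpoints are placeholders.
import vertica_python
from hdfs import InsecureClient

RAW = "alarms_2013-12-10.csv"

# 1. Keep the raw file in HDFS via WebHDFS, so it can be reloaded later.
hdfs_client = InsecureClient("http://namenode.example.com:50070", user="zen")
hdfs_client.upload("/raw/alarms/" + RAW, RAW, overwrite=True)

# 2. Load the same file into Vertica for real-time analysis.
with vertica_python.connect(host="vertica.example.com", port=5433,
                            user="dbadmin", password="secret",
                            database="zen") as conn:
    cur = conn.cursor()
    with open(RAW, "rb") as f:
        cur.copy("COPY network_alarms FROM STDIN DELIMITER ','", f)
    conn.commit()

# If a Vertica load fails or a schema changes, the copy held in HDFS
# can simply be replayed, which is the resilience described above.
```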

Gardner: Well, good luck with that. I hope we can hear more about it at the next HP Discover event, but I’m afraid we’ll have to leave it there for this particular venue. I’d like to thank our guest, Andy Stubley, Vice President of Sales and Marketing at Systems Mechanics Limited, based in London. Thank you so much, Andy.

Stubley: Thank you very much.

Gardner: And a big thank you to our audience as well, for joining us in this special New Style of IT discussion, coming to you directly from the HP Discover 2013 Conference in Barcelona. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a sponsored BriefingsDirect podcast on making telcos more responsive to customers and operators by using big-data tools and analysis. Copyright Interarbor Solutions, LLC, 2005-2014. All rights reserved.

You may also be interested in: