
Thursday, July 30, 2015

Full 360 Takes Big Data Analysis Cloud Services to New Business Levels

Transcript of a BriefingsDirect discussion on the benefits of joining data analysis and the cloud.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation.

Our next cloud case study interview highlights how Full 360 uses big data and analytics to improve their financial operations. To learn how, we're joined by Eric Valenzuela, Director of Business Development at Full 360, based in New York. Welcome, Eric.

Eric Valenzuela: Good morning. Thank you for having me.

Gardner: Tell us about Full 360 and the role it plays in the financial sector. [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]

Valenzuela: Full 360 is a consulting and services firm, and we focus purely on data warehousing, business intelligence (BI), and hosted solutions. We build and consult, and then we do managed services for hosting those complex, sophisticated solutions in the cloud -- in the Amazon cloud specifically.
Gardner: And why is cloud a big differentiator for this type of service in the financial sector?

Valenzuela: It’s not necessarily just for finance. It seems to be beneficial for any company that has a large initiative around data warehouse and BI. For us, specifically, the cloud is a platform that we can develop our scripts and processes around. That way, we can guarantee 100 percent that we're providing the same exact service to all of our customers.

We have quite a bit of intellectual property (IP) that’s wrapped up inside our scripts and processes. The cloud platform itself is a good starting point for a lot of people, but it also has elasticity for those companies that continue to grow and add to their data warehousing and BI solutions.

Gardner: Eric, it sounds as if you've built your own platform as a service (PaaS) for your specific activities and development and analytics on top of a public cloud infrastructure. Is that fair to say?

Valenzuela: That’s a fair assumption.

Primary requirements

Gardner: So as you are doing this cloud-based analytic service, what is it that your customers are demanding of you? What are the primary requirements you fulfill for them with this technology and approach?

Valenzuela: With data warehousing -- Vertica specifically -- being rather new, there is a lack of knowledge out there in terms of how to manage it, keep it up and running, tune it, analyze queries, and make sure that they're returning information efficiently, that kind of thing. What we try to do is supplement that lack of expertise.

Gardner: Leave the driving to us, more or less. You're the plumbers and you let them deal with the proper running water and other application-level intelligence?

Valenzuela: We're like an insurance policy. We do all the heavy lifting, the maintenance, and the management. We ensure that your solution is going to run the way that you expect it to run. We take the mundane out, and then give the companies the time to focus on building intelligent applications, as opposed to worrying about how to keep the thing up and running, tuned, and efficient.

Gardner: Given that Wall Street has been crunching numbers for an awfully long time, and I know that they have, in many ways, almost unlimited resources to go at things like BI -- what’s different now than say 5 or 10 years ago? Is there more of a benefit to speed and agility versus just raw power? How has the economics or dynamics of Wall Street analytics changed over the past few years?

Valenzuela: First, it’s definitely the level of data. Just 5 or 10 years ago, either you had disparate pieces of data or you didn’t have a whole lot of data. Now it seems like we are just managing massive amounts of data from different feeds, different sources. As that grows, there has to be a vehicle to carry all of that, where it’s limitless in a sense.

Early on, it was really just a lack of the volume that we have today. In addition to that, 8 or 10 years ago BI was still rather new in what it could actually do for a company in terms of making agile decisions and informed decisions, decisions with intent.

So fast forward, and it’s widely accepted and adopted now. It’s like the cloud. When cloud first came out, everybody was concerned about security. How are we going to get the data in there? How are we going to stand this thing up? How are we going to manage it? Those questions come up a lot less now than they did even two years ago.

Gardner: While you may have cut your teeth on Wall Street, you seem to be branching out into other verticals -- gaming, travel, logistics. What are some of the other areas now to which you're taking your services, your data warehouse, and your BI tools?

Following the trends

Valenzuela: It seems like we're following the trends. Recently it's been gaming. We have quite a few gaming customers that are just producing massive amounts of data.

There's also the airline industry. The customers that we have in airlines, now that they have a way to -- I hate this term -- slice and dice their data, are building really informed, intelligent applications to serve their customers: customer appreciation, that kind of thing. Airlines are now starting to see what their competition is doing. So they're getting on board and starting to build similar applications so they're not left behind.

Banking was pretty much the first to go full force and adopt BI as a basis for their practice. Finance has always been there. They've been doing it for quite a long time.

Gardner: So as the director of business development, I imagine you're out there saying, "We can do things that couldn’t have been done before at prices that weren’t available before." That must give you almost an unlimited addressable market. How do you know where to go next to sell this?

Valenzuela: It’s kind of an open field. From my perspective, I look at the different companies out there that come to me. At first, we were doing a lot of education. Now, it’s just, "Yes, we can do this," because these things are proven. We're not proving any concepts anymore. Everything has already been done, and we know that we can do it.

It is an open field, but we focus purely on the cloud. We expect all of our customers will be in the Amazon cloud. It seems that now I am teaching people a little bit more -- just because it’s cloud, it’s not magic. You still have to do a lot of work. It’s still an infrastructure.
But we come from that approach and we make sure that the customer is properly aligned with the vision that this is not just a one- or two-month type commitment. We're not just going to build a solution, put it in our pocket, and walk away. We want to know that they're fully committed for 6-12 months.

Otherwise, you're not going to get the benefits of it. You're just going to spend the money and the effort, and you're not really going to get any benefits out of it if you're not going to be committed for the longer period of time. There still are some challenges with the sales and business development.

Gardner: Given this emphasis on selling the cloud model as much as the BI value, you needed to choose an analytics platform that was cloud-friendly and that was also Amazon AWS cloud-friendly. Tell me how Vertica and Amazon -- and your requirements -- came together.

Good timing

Valenzuela: I think it was purely a timing thing. Our CTO, Rohit Amarnath, attended a session at MIT, where Vertica was first announced. So he developed a relationship there.

This was right around the time when Amazon announced that it was offering its public cloud platform, EC2. So it made a lot of sense to look at the cloud as a vision, looking at the cloud as a platform, looking at column databases as a future way of managing BI and analytics, and then putting the two together.

It was more or less a timing thing. Amazon was there. It was new technology, and we saw the future in that. Analytics was newly adopted. So now you have the column database that we can leverage as well. So we blended the two together and started building a platform that hadn't been done yet.

Gardner: What about lessons learned along the way? Are there some areas to avoid or places that you think are more valuable that people might appreciate? If someone were to begin a journey toward a combination of BI, cloud, and vertical industry tool function, what might you tell them to be wary of, or to double-down on?

Valenzuela: We forged our own way. We couldn't learn from our competitors' mistakes, because we were the ones creating the mistakes. We had to clear those up and learn from our own mistakes as we moved forward.

Gardner: So perhaps a lesson is to be bold and not to be confined by the old models of IT?

Valenzuela: Definitely that. Definitely thinking outside the box and seeing what the cloud can do: focus on forgetting about old IT and then look at cloud as a new form of IT. Understand what it cannot do as a baseline, but really open up your mind and think about what it can actually do, from an elasticity perspective.

There are a lot of Vertica customers out there that are going to reach a limitation. That may require procuring more hardware, more IT staff. The cloud aspect removes all of that.

Gardner: I suppose it allows you as a director of business development to go downstream. You can find smaller companies, medium-sized enterprises, and say, "Listen, you don’t have to build a data warehouse at your own expense. You can start doing BI based on a warehouse-as-a-service model, pay as you go, grow as you learn, and so forth."

Money concept

Valenzuela: Exactly. Small or large, those IT departments are spending that money anyway. They're spending it on servers. If they are on-premises, the cost of that server in the cloud should be equal or less. That’s the concept.

If you're already spending the money, why not just migrate it and then partner with a firm like us that knows how to operate it? Then we become your augmented experts, or that insurance policy, to make sure that those things are going to run the way you want them to, as if it were your own IT department.

Gardner: What are the types of applications that people have been building and that you've been helping them with at Full 360? We're talking about not just financial, but enterprise performance management. What are the other kinds of BI apps? What are some of the killer apps that people have been using your services to do?

Valenzuela: Specifically, with one of our large airlines, it's customer appreciation. The level of detail on their customers that they're able to bring to the plane, to the flight attendants, in a handheld device is powerful. It’s powerful to the point where you remember that treatment that you got on the plane. So that’s one thing.

That’s something that you don’t get if you fly a lot, if you fly other airlines. That’s just kind of some detail and some treatment that you just don’t get. I don’t know how that could be driven if it weren’t for analytics and if it weren’t for technology like Vertica to be able to provide that information.

Gardner: I'm afraid we'll have to leave it there. You've been learning about how Full 360 uses HP Vertica in the Amazon cloud to provide data warehouse and BI applications and services to its customers from Wall Street to the local airport.

So join me in thanking Eric Valenzuela, Director of Business Development at Full 360 in New York. Thanks so much, Eric.
Valenzuela: Thank you.

Gardner: And I'd like to thank our audience as well for joining us for this IT innovation discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.


Transcript of a BriefingsDirect discussion on the benefits of joining data analysis and the cloud. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, July 28, 2015

How Big Data Technologies Hadoop and Vertica Drive Business Results at Snagajob

Transcript of a BriefingsDirect discussion on how an employment search company uses data analysis to bring better matches for job seekers and employers.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Gardner
Our next innovation case study interview highlights how Snagajob in Richmond, Virginia -- one of the largest hourly employment networks for job seekers and employers – uses big data to improve their performance and to better understand how their systems provide rapid services to their users.

Snagajob recently delivered nearly 500,000 new jobs in a single month through their systems. To learn how they're managing such impressive scale, we welcome Robert Fehrmann, Data Architect at Snagajob in Richmond, Virginia.

Robert Fehrmann: Thank you for the introduction.

Gardner: First, tell us about your organization. You’ve been doing this successfully since 2000. How are hourly workers different from regular employment? What type of employment are we talking about? Let's understand the role you play in the employment market.

Fehrmann: Snagajob, as you mentioned, is America's largest hourly network for employees and employers. The hourly market means we have, relatively speaking, high turnover.
Another aspect, in comparison to some of our competitors, is that we provide an inexpensive service. So our subscriptions are on the low end, compared to our competitors.

Gardner: Tell us how you use big data to improve your operations. I believe that among the first ways that you’ve done that is to try to better analyze your performance metrics. What were you facing as a problem when it came to performance?

Signs of stress

Fehrmann: A couple of years ago, we started looking at our environment, and it became obvious that our traditional technology was showing some signs of stress. As you mentioned, we really have data at scale here. We have 20,000 to 25,000 postings per day, and we have about 700,000 unique visitors on a daily basis. So data is coming in very, very quickly.

We also realized that we were sitting on a gold mine, and we were able to ingest data pretty well. But we had problems getting information and innovation out of our big data lake.

Gardner: And of course, near real time is important. You want to catch degradation in any fashion from your systems right away. How do you then go about getting this in real time? How do you do the analysis?

Fehrmann: We started using Hadoop. I'll use a lot of technical terms here. From our website, we're getting events. Events are routed via Flume directly into Hadoop. We're collecting about 600 million key-value pairs on a daily basis. It's a massive amount of data, 25 gigabytes on a daily basis.
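As a rough sanity check on those ingestion numbers -- using the figures Fehrmann cites, and assuming decimal gigabytes -- the average key-value pair is only a few dozen bytes, but the sustained rate is thousands of events per second:

```python
# Back-of-the-envelope check of the cited ingestion volume:
# ~600 million key-value pairs and ~25 GB arriving per day.
pairs_per_day = 600_000_000
bytes_per_day = 25 * 10**9  # 25 GB, decimal gigabytes assumed

avg_bytes_per_pair = bytes_per_day / pairs_per_day
pairs_per_second = pairs_per_day / 86_400  # seconds per day

print(round(avg_bytes_per_pair, 1))  # ~41.7 bytes per pair
print(round(pairs_per_second))       # ~6944 pairs per second, sustained
```

Small individual events arriving at that rate are exactly the workload Flume-style streaming collection into Hadoop is built for.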

The second piece in this journey to big data was analyzing these events, and that’s where we're using HP Vertica. Our original use case was to analyze a funnel. A funnel is where people come to our site. They're searching for jobs, maybe by keyword, maybe by zip code. A subset of that is interest in a job, and they click on a posting. A subset of that is applying for the job via an application. A subset is interest in an employer, and so on. We had never been able to analyze this funnel.

The dataset is about 300 to 400 million rows, and 30 to 40 gigabytes. We wanted to make this data available not just to our internal users, but to external users as well. Therefore, we set ourselves a goal of a five-second response time. No query on this dataset should run for more than five seconds -- and Vertica and Hadoop gave us a solution for this.
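The funnel Fehrmann describes -- searches narrowing to clicks, then to applications -- can be sketched in a few lines. This is an illustrative model only: the stage names and the pure-Python approach are assumptions, since the production system runs this as queries against Vertica.

```python
# Hypothetical event stream: each record is (visitor_id, stage).
# Stage names are invented for the sketch, not Snagajob's schema.
events = [
    ("v1", "search"), ("v1", "posting_click"), ("v1", "application"),
    ("v2", "search"), ("v2", "posting_click"),
    ("v3", "search"),
    ("v4", "search"), ("v4", "posting_click"), ("v4", "application"),
]

FUNNEL = ["search", "posting_click", "application"]

def funnel_counts(events, stages):
    """Count distinct visitors reaching each successive funnel stage."""
    reached = {s: set() for s in stages}
    for visitor, stage in events:
        if stage in reached:
            reached[stage].add(visitor)
    # A visitor counts at stage N only if they also hit stages 0..N-1.
    counts, qualified = [], None
    for s in stages:
        qualified = reached[s] if qualified is None else qualified & reached[s]
        counts.append(len(qualified))
    return counts

print(funnel_counts(events, FUNNEL))  # [4, 3, 2]
```

Each count is a subset of the one before it, which is what makes the drop-off between stages directly comparable.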

Gardner: How have you been able to increase your performance to reach your key performance indicators (KPIs) and service-level agreements (SLAs)? How has this benefited you?

Fehrmann: Another application that we were able to implement is a recommendation engine. A recommendation engine addresses the case where job seekers who apply for a specific job may not know about all the other jobs that are very similar, or that other people have applied to.
We started analyzing the search results that we were getting and implemented a recommendation engine. Sometimes it’s very difficult to make a real comparison between before and after. Here, we were able to see that we got an 11 percent increase in application flow. Application flow is how many applications a customer is getting from us. By implementing this recommendation engine, we saw an immediate 11 percent increase in application flow, one of our key metrics.
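The transcript doesn't specify Snagajob's actual algorithm, but one common way to build this kind of recommendation engine is co-occurrence counting over application histories: "people who applied to X also applied to Y." A minimal sketch, with invented job IDs:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical application histories: jobs each seeker applied to.
applications = [
    ["barista", "cashier"],
    ["barista", "cashier", "server"],
    ["cashier", "server"],
    ["barista", "server"],
]

# Count how often each pair of jobs is applied to by the same person.
co_counts = defaultdict(lambda: defaultdict(int))
for jobs in applications:
    for a, b in combinations(sorted(set(jobs)), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def recommend(job, k=2):
    """Jobs most often co-applied with `job`, best first; ties by name."""
    ranked = sorted(co_counts[job].items(), key=lambda kv: (-kv[1], kv[0]))
    return [j for j, _ in ranked[:k]]

print(recommend("barista"))  # ['cashier', 'server']
```

At production scale the pair counting would be done in the warehouse rather than in memory, but the ranking idea is the same.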

Gardner: So you took the success from your big-data implementation and analysis capabilities from this performance task to some other areas. Are there other business areas, search yield, for example, where you can apply this to get other benefits?

Brand-new applications

Fehrmann: When we started, we had the idea that we were looking for a solution for migrating our existing environment, to a better-performing new environment. But what we've seen is that most of the applications we've developed so far are brand-new applications that we hadn't been able to do before.

You mentioned search yield. Search yield is a very interesting aspect. It’s a massive dataset -- about 2.5 billion rows and about 100 gigabytes of data as of right now, and it's continuously increasing. For all of the applications, as well as all of the search requests that we have collected since we started this environment, we're able to analyze the search yield.

For example, that's how many applications we get for a specific search keyword in real time. By real time, I mean that somebody can run a query against this massive dataset and get results in a couple of seconds. We can analyze specific jobs in specific areas, specific keywords that are searched in a specific time period or in a specific location of the country.
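Search yield, as described, boils down to applications per search, broken out by keyword. A toy roll-up -- keywords and locations are invented for illustration -- might look like:

```python
from collections import Counter

# Illustrative event logs: (search_keyword, location) tuples.
searches = [
    ("warehouse", "richmond"), ("warehouse", "richmond"),
    ("barista", "austin"), ("warehouse", "austin"),
]
applications = [
    ("warehouse", "richmond"), ("warehouse", "austin"),
    ("warehouse", "richmond"),
]

search_counts = Counter(kw for kw, _ in searches)
app_counts = Counter(kw for kw, _ in applications)

# Yield = applications per search, by keyword.
yield_by_kw = {kw: app_counts[kw] / n for kw, n in search_counts.items()}
print(yield_by_kw["warehouse"])  # 3 applications / 3 searches = 1.0
```

The same grouping could be extended to location or time period, which is the slicing Fehrmann describes salespeople using during prospecting.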

Gardner: And once again, now that you've been able to do something you couldn't do before, what have been the results? How has that changed your business?

Fehrmann: It really allows our salespeople to provide great information during the prospecting phase. If we're prospecting with a new client, we can tell them very specifically that if they're in this industry, in this area, they can expect an application flow, depending on how big the company is, of, let’s say, a hundred applications per day.

Gardner: How has this been a benefit to your end users, those people seeking jobs and those people seeking to fill jobs?

Fehrmann: There are certainly some jobs that people are more interested in than others. On the flip side, if a particular job gets 100 or 500 applications, it's just a fact that only a small number are going to get that particular job. Now if you apply for a job that isn't as interesting, you have a much, much higher probability of getting the job.
Gardner: I'm afraid we will have to leave it there. We've been talking with Snagajob about how they use big data on multiple levels to improve their business performance, their system’s performance, and ultimately how they go about understanding their new challenges and opportunities.

With that, I'd like to thank our guest, Robert Fehrmann, Data Architect at Snagajob in Richmond, Virginia. Thank you.

Fehrmann: Thank you, Dana.

Gardner: And I’d like to thank our audience as well for joining us for this special new style of IT discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect discussion on how an employment search company uses data analysis to bring better matches for job seekers and employers. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, April 14, 2015

GoodData Analytics Developers Share their Big Data Platform Wish List

Transcript of a BriefingsDirect podcast on how and why cloud data analytics provider GoodData makes HP Vertica an integral part of its infrastructure.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance and deliver better user experiences, as well as better business results.
Our next innovation case study interview highlights how GoodData has created a business intelligence (BI)-as-a-service capability across multiple industries to enable users to take advantage of both big-data performance as well as cloud delivery efficiencies.

To learn more, we're here with a panel, starting with Tomas Jirotka, Product Manager at GoodData. Welcome, Tomas.

Tomas Jirotka: Hello. It's great to be here.

Gardner: We are also here with Eamon O'Neill, the Director of Product Management at HP Vertica. Welcome, Eamon.

Eamon O'Neill: Thanks, Dana.

Gardner: And Karel Jakubec, Software Engineer at GoodData. Welcome.

Karel Jakubec: Thanks. It's great to be here.

Gardner: Let’s start with you, Tomas. Tell us a bit about GoodData, and why you've decided that the cloud model, data warehouses, and BI as a service are the right fit for this marketplace.

Jirotka: GoodData was founded eight years ago, and from the beginning, it's been developed as a cloud company. We provide software as a service (SaaS). We allow our customers to leverage their data and not worry about hardware/software installations and other stuff. We just provide them a great service. Their experience is seamless, and our customers can simply enjoy the product.

Gardner: So can you attach your data warehouse to any type of data or are you focused on a certain kind? How flexible and agile are your services?

Jirotka: We provide a platform -- and the platform is very flexible. So it's possible to bring in any type of data and create insights from it. You can analyze data coming from marketing, sales, or manufacturing divisions, no matter which industry you're in.

Gardner: If I'm an enterprise and I want to do BI, why should I use your services rather than build my own data center? What's the advantage for which your customers make this choice?

Cheaper solution

Jirotka: First of all, our solution is cheaper. We have a multi-tenant environment. So the customers effectively share the resources we provide them. And, of course, we have experience and knowledge of the industry. This is very helpful when you're a beginner in BI.

Gardner: So, in order to make sure that your cloud-based services are as competitive and even much better in terms of speed, agility and cost, you need to have the right platform and the right architecture.

Karel, what have been some of the top requirements you’ve had as you've gone about creating your services in the cloud?

Jakubec: The priority was to be able to scale, as our customers are coming in with bigger and bigger datasets. That's the reason we need technologies like Vertica, which scales very well by just adding nodes to the cluster. Without this ability, you realize you cannot implement a solution for the biggest customers, because you're already running the biggest machines on the market, yet they're still not able to finish the computation in a reasonable time.

Gardner: I've seen that you have something on the order of 40,000 customers. Is that correct?

Jirotka: Something like that.

Gardner: Does the size and volume of the data for each of these vary incredibly, or are most of them using much larger datasets? How diverse and how varied is the amount of data that you're dealing with, customer by customer?

Jirotka: It really depends. A lot of customers, for example, use Salesforce.com or other cloud services like that. We can say that these data are somewhat standardized. We know the APIs of these services very well, and we can deliver the solution in just a couple of days or weeks.

Some of the customers are more complex. They use a lot of services from the Internet or internally, and we need to analyze all of the sources and combine them. That's really hard work.

Gardner: In addition to scale and efficiency in terms of cost, you need to also be very adept at a variety of different connection capabilities, APIs, different data sets, native data, and that sort of thing.

Jirotka: Exactly. Agility, in this sense, is really crucial.

Gardner: How long have you been using Vertica, and how long have you been using BI through Vertica for a variety of these platform services?

Working with Vertica

Jirotka: We started working with Vertica at the beginning of last year -- so, one and a half years. We began moving some of our customers with the largest data marts to Vertica in 2013.

Gardner: What were some of the driving requirements for changing from where you were before?

Jirotka: The most important factor was performance. It's no secret that we also have Postgres in our platform. Postgres simply doesn’t support big data. So we chose Vertica to have a solution that is scalable up to terabytes of data.

Gardner: We're learning quite a bit more about Vertica and the roadmap. I'd like to check in with Eamon and hear more about what some of the newer features are. What’s creating excitement?

O’Neill: Far and away, the most exciting is real-time personalized analytics. This is going to allow GoodData to show a new kind of BI in the cloud. A feature we released last year, in our latest 7.1 release, is called Live Aggregate Projections. It's for telling you about what’s going on in your electric smart meter, the Fitbit that you're wearing on your wrist, or even your cell-phone plan or personal finances.

A few years ago, Vertica was blazing fast, telling you what a million people are doing right now and looking for patterns in the data, but it wasn’t as fast in telling you about my data. So we've changed that.

With this new feature, Live Aggregate Projections, you can actually get blazing fast analytics on discrete data. That discrete data is data about one individual or one device. It could be that a cell phone company wants to do analytics on one particular cell phone tower or one meter.

That’s very new, and it's going to open up a whole new kind of dashboarding for GoodData in the cloud. People are now going to get the sub-second response to see changes in their power consumption, what was the longest phone call they made this week, the shortest phone call they made today, or how often they go over their data roaming charges. They'll get real-time alerts about these kinds of things.
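The idea behind a live aggregate projection -- sketched here as a conceptual model in Python, not Vertica's actual implementation -- is that per-device aggregates are maintained incrementally at load time, so a "my data" query becomes a lookup instead of a scan over raw readings:

```python
# Toy model: per-device running aggregates updated on every load.
agg = {}  # device_id -> running stats

def load(device_id, value):
    """Fold one new reading into the device's precomputed aggregates."""
    s = agg.setdefault(device_id,
                       {"count": 0, "total": 0.0, "max": float("-inf")})
    s["count"] += 1
    s["total"] += value
    s["max"] = max(s["max"], value)

# Simulated smart-meter readings (device IDs and values are invented).
for device, kwh in [("meter-7", 1.2), ("meter-7", 0.8), ("meter-9", 2.0)]:
    load(device, kwh)

# "What was my peak usage?" is now a dictionary lookup -- no scan.
print(agg["meter-7"]["max"])    # 1.2
print(agg["meter-7"]["count"])  # 2
```

The trade-off is extra work and storage at load time in exchange for sub-second reads on discrete, per-device questions -- the pattern O'Neill describes.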
When that was introduced last year, it was standing room only. They were showing some great stats from power meters and from houses in Europe. The readings were fed into Vertica, and they showed queries that last year took Vertica one and a half seconds and now take 0.2 seconds. They were looking at 25 million meters in the space of a few minutes. This is going to open up a whole new kind of dashboard for GoodData, and new kinds of customers.

Gardner: Tomas, does this sound like something your customers are interested in, maybe retail? The Internet of Things is also becoming prominent, machine to machine, data interactions. How do you view what we've just heard Eamon describe, how interesting is it?

More important

Jirotka: It sounds really good. Real-time, or near real-time, analytics is becoming a more and more important topic. We hear it from our customers as well. So we should definitely think about this feature and how to integrate it into the platform.

Gardner: Any thoughts, Karel?

Jakubec: Once we introduce Vertica 7.1 to our platform, it will definitely be one of the features we focus on. We have developed quite a complex caching mechanism for intermediate results, and it works like a charm for PostgreSQL, but unfortunately it doesn't perform as well for Vertica. We believe that features like Live Aggregate Projections will improve this performance.

Gardner: So it's interesting. As HP Vertica comes out with new features, that’s something that you can productize, take out to the market, and then find new needs that you could then take back to Vertica. Is there a feedback loop? Do you feel like this is a partnership where you're displaying your knowledge from the market that helps them technically create new requirements?

Jakubec: Definitely, it's a partnership and I would say a complex circle. A new feature is released, we provide feedback, and you have a direction to do another feature or improve the current one. It works very similarly with some of our customers.

O’Neill: It happens at a deeper level too. Karel’s coworkers flew over from Brno last year to our office in Cambridge, Massachusetts, and hung out for a couple of days, exchanging design ideas. So we learned from them as well.

They had done some things around multi-tenancy where they were ahead of us, and they were able to tell us how Vertica performed when they put extra schemas on a catalog. We learned from that, and we could give them advice about it. Engineer-to-engineer exchanges happen pretty often in the conference rooms.

Gardner: Eamon, were there any other specific features that are popping out in terms of interest?

O’Neill: Definitely our SQL on Hadoop enhancements. For a couple of years now we've been enabling people to do BI on top of Hadoop. We had various connectors, but we have made it even faster and cheaper now. In this most recent 7.1 release, you can install Vertica directly on your Hadoop cluster, so you no longer have to maintain dedicated hardware for Vertica and you don't have to make copies of the data.

The message is that you can now analyze your data, where it is and as it is, without converting it from the Hadoop format or duplicating it. That's going to save companies a lot of money. What we've done is bring the most sophisticated SQL on Hadoop to people without duplication of data.
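The in-place idea can be sketched in a few lines of Python: aggregate in a single pass over the source, wherever it lives, instead of first copying it into a dedicated store. The data here is invented, and this illustrates only the principle, not Vertica's actual Hadoop integration:

```python
import csv
import io

# A stand-in for data "where it is, as it is": CSV text that might live in HDFS.
# Vertica's engine reads the Hadoop-resident files directly; here we simply
# stream over the source without materializing a second copy.
hdfs_file = io.StringIO("meter,kwh\nmeter-1,2.0\nmeter-1,4.0\nmeter-2,5.0\n")

def total_kwh(source):
    # One aggregation pass over the source, with no duplicate data store.
    return sum(float(row["kwh"]) for row in csv.DictReader(source))

print(total_kwh(hdfs_file))  # 11.0
```

The saving Eamon describes comes from skipping the copy step entirely: no export, no reload, and no second cluster holding the same bytes.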

Gardner: Tomas, how does Hadoop factor into your future plans?

Using Hadoop

Jirotka: We employ Hadoop in our platform, too. There are some ETL scripts, but we've used it in the traditional form of MapReduce jobs for a long time. This is a really costly and inefficient approach, because it takes a lot of time to develop and debug. So we may think about using Vertica directly with Hadoop. This would dramatically decrease the time to deliver to the customer, and also the running time of the scripts.

Gardner: Eamon, any other issues that come to mind in terms of prominence among developers?

O’Neill: Last year, we had our Customer Advisory Board, where I got to ask them about those things. Security came to the forefront again and again. Our new release has new features around data-access control.

We now make it easy for them to say, for example, that Karel can access all the columns in a table, but I can only access a subset of them. Previously, developers could do this with Vertica, but they had to maintain SQL views, and they didn't like that. Now it's done centrally.
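The difference between per-role SQL views and a central policy can be sketched in Python. The roles, columns, and data here are hypothetical, and this illustrates only the idea of centralized column-level filtering, not Vertica's implementation:

```python
# Central access policy: role -> columns that role may read.
# Previously this would be one hand-maintained SQL view per role;
# here it is a single table-driven rule.
POLICY = {
    "admin": {"account", "balance", "ssn"},
    "analyst": {"account", "balance"},  # no access to the sensitive column
}

def select(rows, role, policy=POLICY):
    """Return rows containing only the columns the role is allowed to see."""
    allowed = policy[role]
    return [{col: val for col, val in row.items() if col in allowed}
            for row in rows]

rows = [{"account": "A-1", "balance": 100.0, "ssn": "123-45-6789"}]
print(select(rows, "analyst"))  # [{'account': 'A-1', 'balance': 100.0}]
```

The appeal of the central version is maintenance: adding a role or restricting a column means editing one policy, not auditing a set of views scattered across schemas.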

They like the data-access control improvements, and they're saying to just keep it up. They want more encryption at rest, and they want more integration. They particularly stress that they want integration with the security policies in their other applications outside the database. They don't want to have to maintain security in 15 places. They'd like Vertica to help them pull that together.

Gardner: Any thoughts about security, governance and granularity of access control?

Jirotka: As we're a SaaS company, security is number one for us. So far we have solutions that work for us, but those solutions are quite complex. Maybe we can adopt some of these new Vertica features instead.

Jakubec: Any simplification of security and access controls is great news. Restricting some users' access to just a subset of values or columns is a very common use case for many customers. We already have a mechanism to do it, but as Eamon said, it involves maintaining views or complex filtering. If it's supported by Vertica directly, that's great. I didn't know that before, and I hope we can use it.

Gardner: Very good. I'm afraid we’ll have to leave it there. We've been hearing how GoodData, based in San Francisco, a BI service provider, acts as a litmus test for how a platform should behave in the market, both in terms of performance as well as economics. They've been telling us their story as well as their interest in the latest version of HP Vertica.

So a big thank you to our guests: Tomas Jirotka, Product Manager at GoodData; Eamon O’Neill, Director of Product Management at HP Vertica; and Karel Jakubec, Software Engineer at GoodData.
And also a big thank you to our audience for joining this special new style of IT discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks for joining, and don’t forget to come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how and why cloud data analytics provider GoodData makes HP Vertica an integral part of its infrastructure. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in:

Thursday, March 12, 2015

How a Hackathon Approach Juices Innovation on Big Data Applications for Thomson Reuters

Transcript of a BriefingsDirect discussion on how information giant Thomson Reuters leveraged a hackathon to spur new ideas among its big data-focused developers.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on technology innovation and how it's making an impact on people’s lives.

Gardner
Once again, we're focusing on a major trend, how big data changes everything, and how companies are adapting to the new style of IT to gain new insights, deliver better user experiences, and better overall business results.

Our next innovation interview explores the use of a hackathon approach to unlock creativity in the search for better use of big data for analytics. We will hear how Thomson Reuters in London sought to foster innovation and derive more value from its vast trove of business and market information.

The result: A worldwide virtual hackathon that brought together developers and data scientists to uncover new applications, visualizations, and services to make all data actionable and impactful.
Bringing human understanding to the cloud
Helping developers build a new class of apps
To learn more about getting developers on board the big-data analysis train, please join me in welcoming our guest, Chris Blatchford, Director of Platform Technology in the IT organization at Thomson Reuters in London. Welcome, Chris.

Chris Blatchford: Hi. Good to be here.

Blatchford
Gardner: Tell us about Thomson Reuters, the organization, for those who don't know.

Blatchford: Thomson Reuters is the world's leading source of intelligent information. We provide data across the finance, legal, news, IP, and science, tax, and accounting industries through product and service offerings, combining industry expertise with innovative technology.

Gardner: It’s hard to think of an organization where data and analysis is more important. It’s so core to your very mission.

Blatchford: Absolutely. We take data from a variety of sources. We have our own original data, third-party sources, open-data sources, and augmented information, as well as all of the original content we generate on a daily basis. For example, our journalists in the field provide original news content to us directly from all over the globe. We also have third-party licensed data that we further enrich and distribute to our clients through a variety of tools and services.

Gardner: And therein lies the next trick: what to do with the data once you have it. About this hackathon, how did you come upon that as an idea to foster innovation?

Big, Open, Linked Data

Blatchford: One of our big projects or programs of work currently is, as everyone else is doing, big data. We have an initiative called BOLD, which is Big, Open, Linked Data, headed up by Dan Bennett. The idea behind the project is to take all of the data that we ingest and host within Thomson Reuters, all of those various sources that I just explained, stream all of that into a central repository, cleanse the data, centralize it, extract meaningful information, and subsequently expose it to the rest of the businesses for use in their specific industry applications.

As well as creating a central data lake of content, we also needed to provide the tools and services that allow businesses to access the content; here we have both developed our own software and licensed existing tools.

So, we could demonstrate that we could build big-data tools using our internal expertise, and we could demonstrate that we could plug in third-party specific applications that could perform analysis on that data. What we hadn’t proved was that we could plug in third-party technology enterprise platforms in order to leverage our data and to innovate across that data, and that’s where HP came in.

HP was already engaged with us in a number of areas, and I got to speaking with their Big Data Group around their big data solutions. IDOL OnDemand came up. This is now part of the Haven OnDemand platform. We saw some synergies there between what we were doing with the big-data platform and what they could offer us in terms of their IDOL OnDemand API’s. That’s where the good stuff started.

Gardner: Software developers, from the very beginning, have had a challenge of knowing their craft, but not knowing necessarily what their end users want them to do with that craft. So the challenge -- whether it’s in a data environment, a transactional environment or interface, or gaming -- has often been how to get the requirements of what you're up to into the minds of the developers in a way that they can work with. How did the hackathon contribute to solving that?

Blatchford: That's a really good question. That's actually one of the biggest challenges big data has in general. We approach big data in one of two ways. First, you have very specific use cases. For example, consider a lawyer working on a particular case for a client: it would be useful for them to analyze prior cases with similar elements. If they're able to extract entities and relevant attributes, they may be able to understand a case's final decision, or glean information that is relevant to their current case.

Then you have the other approach, which is much more about exploration: discovering new insights, trends, and patterns. That's similar to the approach we wanted to take with the hackathon: provide the data and the tools to our developers and let them just go and play with the data.

We didn't necessarily want to give them any requirements around specific products or services. It was just, "Look, here is a cool platform with some really cool APIs and capabilities. Here is some nice juicy data. Tell us what we should be doing. What can you come up with from your perspective on the world?"

A lot of the time, these engineers are overlooked. They're not necessarily the most extroverted of people, by the nature of what they do, and so they miss chances and opportunities. That's something we really wanted to change.

Gardner: It’s fascinating the way to get developers to do what you want them to do is to give them no requirements.

Interesting end products

Blatchford: Indeed. That can result in some interesting end products. But, by and large, our engineers are more commercially savvy than most, so we can generally rely on them to produce something that will be compelling to the business. Many of our developers have side projects and personal development projects they work on outside the realm of their job requirements. We should be encouraging this sort of behavior.

Gardner: So what did you get when you gave them no requirements? What happened?

Blatchford: We had 25 teams submit their ideas. We boiled those down to seven finalists based on a set of preliminary criteria, and out of those seven we decided on our first-, second-, and third-place winners. Those three end results are now going through a product review, to potentially be implemented into our product lines.

The overall winner was an innovative UI design for mobile devices, allowing users to better navigate our content on tablets and phones. There was also a sentiment analysis tool that allowed users to paste in news stories, or any news content source on the web, and extract sentiment from them.
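As a conceptual illustration of what such a tool does, here is a toy lexicon-based scorer in Python. This is emphatically not the IDOL OnDemand sentiment API, which uses far more sophisticated text analytics; the word lists are invented for the example:

```python
# Toy sentiment lexicons: real systems use trained models, negation
# handling, and much larger vocabularies.
POSITIVE = {"gain", "growth", "strong", "record", "beat"}
NEGATIVE = {"loss", "drop", "weak", "miss", "cut"}

def sentiment(text):
    """Classify a snippet as positive, negative, or neutral by counting
    matches against the two lexicons."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Record growth and a strong quarter"))  # positive
```

A production service exposes roughly this shape of interface, text in, label and score out, which is why it was so easy to wire news content straight into it.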

And the third was a more internally focused, administrative exploration tool that allowed us to navigate our own data more intuitively. That perhaps doesn't initially seem as exciting as the other two, but it is actually a hugely useful application for us.
Gardner: Now, how does IDOL OnDemand come to play in this? IDOL is the ability to take any kind of information, for the most part, apply a variety of different services to it, and then create analysis as a service. How did that play into the hackathon? How did the developers use that?

Blatchford: Initially, the developers looked at the original 50-plus APIs that IDOL OnDemand provides, and you have everything in there, from facial recognition, to OCR, to text analytics, to indexing: all sorts of cool stuff. Those, in themselves, provided sufficient capabilities to produce some compelling applications, but our developers also utilized Thomson Reuters APIs and resources to further augment the IDOL platform.

This was very important, as it demonstrated not only that we could plug an enterprise analytics tool into our data, but also that it would fit well with our own capabilities.

Gardner: And HP Big Data also had a role in this. How did that provide value?

Five-day effort

Blatchford: The expertise. We should remember that we stood this hackathon up, from inception to completion, in a little over a month, and that is pretty impressive by any measure.

The actual hackathon lasted five days. We gave the participants a week to get familiar with the APIs, but they really didn't need that long, because the documentation behind the IDOL OnDemand APIs, and the "try it now" functionality, was amazing. That's what the engineers and developers were telling me; those aren't my words.

The Big Data Group was able to stand this whole thing up within a month, a huge amount of effort on HP's side that we never really saw, and that ultimately resulted in a hugely successful virtual global hackathon. This wasn't a physical hackathon; it was purely virtual, the world over.

Gardner: HP has been very close to developers for many years, with many tools, leading tools in the market for developers. They're familiar with the hackathon approach. It sounds like HP might have a business in hackathons as a service. You're proving the point here.

For the benefit of our listeners, if someone else out there was interested in applying the same approach, a hackathon as a way of creating innovation, of sparking new thoughts, light bulbs going off in people's heads, or bringing together cultures that perhaps hadn't meshed well in the past, what would you advise them?

Blatchford: That's a big one. First and foremost, the reason we were successful is that we had a motivated, willing partner in HP. They were able to put the full might of their resources and technology capabilities behind this event, and that, alongside our own efforts, ultimately resulted in the event's success.

That aside, you absolutely need the buy-in of the senior executives within the organization; get them to invest in the idea of something as open as a hackathon. A lot of hackathons are quite focused on a specific requirement. We took the opposite approach. We said, "Look, developers, engineers, go out there and do whatever you want. Be as innovative in your approach as possible."

Typically, that approach is not seen as cost-effective; businesses like to have defined use cases. But sometimes that can strangle innovation, and sometimes we need to loosen the reins a little.

There are also a lot of logistical checks that can help. Ensure you have clear criteria around hackathon team size and membership, event objectives, rules, time frames, and so on. Having these defined up front makes the whole event run much more smoothly.

We ran the organization of the event a little like an Agile project, with regular stand-ups and check-ins. We also stood up a dedicated internal intranet site with all the information above. Finally, we set up user accounts on the IDOL platform early on, so the participants could familiarize themselves with the technology.

Winning combination

Gardner: Yeah, it really sounds like a winning combination: the hackathon model, big data as the resource to innovate on, and then IDOL OnDemand with 50 tools to apply to that. It’s a very rich combination.

Blatchford: That's exactly right. The richness of the data was definitely a big part of this. You don't need millions of rows of data. We provided 60,000 records of legal documents, and we had about the same in patents and news content. You don't need vast amounts of data, but you need quality data.

Then you also need a quality platform, in this case IDOL OnDemand. The third piece is what's in their heads. That really was the successful formula.

Gardner: I have to ask. Of course, the pride in doing a good job goes a long way, but were there any other incentives; a new car, for example, for the winning hackathon application of the day?

Blatchford: Yeah, we offered a 1960s Mini Cooper to the winners. No, we didn't. We did offer incentives, though; there were three main ones. The first, and the most important in my view, and I think in everyone's, was exposure to senior executives within the organization: not just face time, but promotion of the individual within the organization. We wanted this to be about personal growth as much as about producing new applications.

Going back to trying to leverage your resources and give them opportunities to shine, that's really important. That's one of the things the hackathon really fostered: exposing our talented engineers and product managers, and ensuring they're appreciated for the work they do.

We also provided an Amazon voucher incentive, and HP offered some of their tablets to the winners. So it was quite a strong winning set.

Gardner: Sounds like a very successful endeavor others out there might be interested in emulating, so I am glad we could find out more about it.

I'm afraid we're going to have to leave it there. We've been learning about how Thomson Reuters in London sought to foster innovation and derive more value from its vast trove of business and market data, and how they used a virtual worldwide hackathon approach to bring together developers and data scientists to uncover new applications and services.

So a big thank you to our guest, Chris Blatchford, the Director of Platform Technology in the IT organization at Thomson Reuters in London. Thank you, Chris.

Blatchford: Thanks.
Gardner: And I'd like to thank our audience as well for joining us for this special How Big Data Changes Everything discussion. We've explored solid evidence from early adopters of how big data changes everything, and how companies are adapting to the new style of IT to gain new insights, deliver better user experiences, and better overall business results.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect discussion on how information giant Thomson Reuters leveraged a hackathon to spur new ideas among its big data-focused developers. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in: