
Wednesday, February 03, 2010

CERN’s Evolution to Cloud Computing Portends Revolution in Extreme IT Productivity?

Transcript of a BriefingsDirect podcast on the move to cloud computing for data-intensive operations, focusing on the work being done by the European Organization for Nuclear Research.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: Platform Computing.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Today, we present a sponsored podcast discussion on some likely directions for cloud computing based on the exploration of expected cloud benefits at a cutting edge global IT organization.

We are going to explore the thinking on how cloud computing, in both its private and public varieties, might be useful at CERN, the European Organization for Nuclear Research in Geneva.

CERN has long been an influential bellwether on how extreme IT problems can be solved. Indeed, the World Wide Web owes a lot of its usefulness to early work done at CERN. Now the focus is on cloud computing. How real is it, and how might an organization like CERN approach cloud?

In many ways CERN is quite possibly the New York of cloud computing. If cloud can make it there, it can probably make it anywhere. That's because CERN deals with fantastically large data sets, massive throughput requirements, a global workforce, finite budgets, and an emphasis on standards and openness.

So please join us, as we track the evolution of high-performance computing (HPC) from clusters to grid to cloud models through the eyes of CERN, and with analysis and perspective from IDC, as well as technical thought leadership from Platform Computing.

Join me in welcoming our panel today, Tony Cass, Group Leader for Fabric Infrastructure and Operations at CERN. Welcome, Tony.

Tony Cass: Pleased to meet you.

Gardner: We’re also here with Steve Conway, Vice President in the High Performance Computing Group at IDC. Welcome, Steve.

Steve Conway: Thanks. Welcome to everyone.

Gardner: And, we're also here with Randy Clark, Chief Marketing Officer at Platform Computing. Welcome Randy.

Randy Clark: Thank you. Glad to be here.

Gardner: Over the last several years, we've seen cloud computing become quite popular as a concept. It remains largely confined to experimentation, but this notion of private cloud computing is being scoped out by many large and influential enterprises as well as large early adopters like CERN.

Let me go to you Steve Conway. What's the difference between private and public cloud and how far away are any tangible benefits of cloud computing from your perspective?

Already here

Conway: Private cloud computing is already here, and quite a few companies are exploring it. We already have some early adopters. CERN is one of them. Public clouds are coming. We see a lot of activity there, but it's a little bit further out on the horizon than private or enterprise cloud computing.

Just to give you an example, we just did a piece of research for one of the major oil and gas companies, and they're actively looking at moving part of their workload out to cloud computing in the next 6-12 months. So, this is really coming up quickly.

Gardner: So, this notion of having a cohesive approach to computing and blending what you do on premises with these other providers isn't just pie in the sky. This is really something people are serious about.

Conway: Well, CERN is clearly serious about it in their environment. As I said, we're also starting to see activity pick up with cloud computing in the private sector with adoption starting somewhere between six months from now and, for some, more like 12-24 months out.

Gardner: Randy Clark, from your perspective, how many customers of Platform Computing would you consider to be seriously evaluating what we now refer to as public or private cloud?

Clark: We have formally interviewed over 200 customers out of our installed base of 2,000. A significant portion -- I wouldn’t put an exact number on that, but it's higher than we initially anticipated -- are looking at private-cloud computing and considering how they can leverage external resources such as Amazon, Rackspace and others. So, it's easily a third and possibly more.

Gardner: Tony Cass, let's go to you at CERN. Tell us first a little bit about CERN for those of our readers who don’t know that much or aren't that familiar. Tell us about the organization and what it does, and then we can start to discuss your perceptions about cloud.

Cass: We're a laboratory that exists to enable, initially Europe’s and now the world’s, physicists to study fundamental questions. Where does mass come from? Why don’t we see anti-matter in large quantities? What's the missing mass in the universe? They're really fundamental questions about where we are and what the universe is.

We do that by operating an accelerator, the Large Hadron Collider, which collides protons thousands of times a second. These collisions take place in certain areas around the accelerator, where huge detectors analyze the collisions and take something like a digital photograph of the collision to understand what's happening. These detectors generate huge amounts of data, which have to be stored and processed at CERN and the collaborating institutes around the world.

We have something like 100,000 processors around the world, 50 petabytes of disk, and over 60 petabytes of tape. The tape is in just a small number of the centers, not all of the hundred centers that we have. We call it "computing at the terra-scale," that's terra with two R's. We’ve developed a worldwide computing grid to coordinate all the resources that we have with the jobs of the many physicists that are working on these detectors.

Gardner: So, to look at the IT problem and unpack it a little bit. You're dealing with such enormous amounts of data. You’ve been in the distribution of these workloads for quite some time. Maybe you could explain a little bit the evolution of how you've distributed and managed such extreme workload?

No central management

Cass: If you look at the past, in the 1990’s, we had people collaborating, but there was no central management. Everybody was based at different institutes and people had to submit the workloads, the analysis, or the Monte Carlo simulations of the experiments they needed.

We realized in 2000-2001 that this wasn’t going to work and also that the scale of resources that we needed was so vast that it couldn’t all be installed at CERN. It had to be shared between CERN, a small number of very reliable centers we call the Tier One centers and then 100 or so Tier Two centers at the universities. We were developing this thinking around the same time as the grid model was becoming popular. So, this is what we’ve done.

A lot of the grid academics have focused on understanding or exploring what could be done with the grid as an idea. What we've been focusing on is making it work: not pushing the envelope in terms of the technology, but pushing the envelope in terms of the scale, to make sure that it works for the users. We connect the sites, and we’ve gradually run through a number of exercises to distribute the data at gigabytes a second and to run tens of thousands of jobs a day across the grid.

We've progressively deployed grid technology, not developed it. We've looked at things that are going on elsewhere and made them work in our environment.

Gardner: As I understand it, the interest you have in cloud isn’t strictly a matter of ripping and replacing, but augmenting what you're already doing vis-a-vis these grid models.

Cass: Exactly. The grid solves the problem in which we have data distributed around the world and it will send jobs to the data. But, there are two issues around that. One is that if the grid sends my job to site A, it does so because it thinks that a batch slot will become available at site A first. But, maybe a batch slot becomes available at site B first while my job is queued at site A. Somebody else who comes along later actually gets to run their job first.

Today, the experiment team submits a skeleton job to all of the sites in order to detect which site becomes available first. Then, they pull down my job to this site. You have lots of schedulers involved in this -- in the experiment, the grid, and the site -- and we're looking at simplifying that.
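The scheduling issue Cass describes can be illustrated with a small sketch. It is not CERN's grid software: the site names, the predicted and actual wait times, and both function names are hypothetical, chosen only to contrast committing a job to one site up front with sending a skeleton (placeholder) job everywhere and binding the real job late.

```python
# Hypothetical illustration of early vs. late binding of a job to a site.
# All numbers are made up; times are minutes until a batch slot frees up.

def early_binding(predicted_free, actual_free):
    """Commit the job to the site *predicted* to free a slot first.

    Returns the chosen site and the time the job really starts there.
    """
    site = min(predicted_free, key=predicted_free.get)
    return site, actual_free[site]


def late_binding(actual_free):
    """Send a skeleton job to every site and run the real job wherever
    a skeleton job starts first, i.e. at the site that really frees first.
    """
    site = min(actual_free, key=actual_free.get)
    return site, actual_free[site]


predicted = {"site_A": 5, "site_B": 8}   # what the grid scheduler expects
actual = {"site_A": 12, "site_B": 6}     # what really happens

early = early_binding(predicted, actual)  # job waits at site_A
late = late_binding(actual)               # job runs at site_B
```

With these invented numbers, the early-bound job waits 12 minutes at site A while the late-bound job starts after 6 minutes at site B. The price of late binding is the extra layer of scheduling that Cass mentions wanting to simplify.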

These skeleton jobs also install software, because they don’t really trust the sites to have installed the software correctly. So, there's a lot of inefficiency there. This is symptomatic of a more general problem. Batch workers are good at sharing resources that are relatively static, but not when the demand for resource types changes dynamically.

So, we’re looking at virtualizing the batch workers and dynamically reconfiguring them to meet the changing workload. This is essentially what Amazon does with EC2. When they don’t need the resources, they reconfigure them and sell the cycles to other people. This is how we want to work in virtualization and cloud with the grid, which knows where the data is.
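As a rough sketch of what dynamically reconfiguring a fixed pool of virtualized batch workers could look like, the function below divides a pool of worker slots among workload types in proportion to their queued jobs. The function name, the workload labels, and the proportional policy are assumptions for illustration; the transcript does not say which policy CERN or Platform uses.

```python
# Hypothetical policy: split a fixed pool of worker slots in proportion
# to how many jobs each workload type currently has queued.

def rebalance(total_slots, queued):
    """Return a dict mapping workload type -> number of worker slots."""
    total_queued = sum(queued.values())
    if total_queued == 0:
        return {name: 0 for name in queued}

    # Ideal (fractional) share of the pool for each workload type.
    shares = {name: total_slots * n / total_queued for name, n in queued.items()}

    # Start from the whole-number part of each share.
    alloc = {name: int(share) for name, share in shares.items()}

    # Give any leftover slots to the largest fractional remainders.
    leftover = total_slots - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


morning = rebalance(100, {"simulation": 600, "analysis": 300, "reconstruction": 100})
evening = rebalance(100, {"simulation": 100, "analysis": 800, "reconstruction": 100})
```

Calling the function again whenever the queues change is the "dynamic" part: the same 100 slots move from mostly simulation in the first call to mostly analysis in the second. A real system would also have to account for the time it takes to tear down and start VMs, which this sketch ignores.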

Gardner: Steve Conway, you’ve been tracking HPC for some time at IDC. Maybe you have some perceptions on how CERN is a leading adopter of IT over the years, the types of problems they're solving now, or the types of problems other organizations will be facing in the future. Could you tell us about this management issue and do you think that this is going to become a major requirement for cloud computing?

World technology leader

Conway: Starting with CERN, their scientists have earned multiple Nobel prizes over the years for their work in particle physics. As you said before, CERN is where Tim Berners-Lee and his colleagues invented the World Wide Web in the 1980s.

More generally, CERN is a recognized world leader in technology innovation. What’s been driving this, as Tony said, are the massive volumes of data that CERN generates along with the need to make the data available to scientists, not only across Europe, but across the world.

For example, CERN has two major particle detectors. They're called CMS and ATLAS. ATLAS alone generates a petabyte of data per second, when it’s running. Not all that data needs to be distributed, but it gives you an idea of the scale or the challenge that CERN is working with.

In the case of CERN’s and Platform’s collaboration, as Tony said, the idea is not just to distribute the data but also the applications and the capability to run the scientific problem.

CERN is definitely a leader there, and cloud computing is really confined today to early adopters like CERN. Right now, cloud computing services constitute about $16 billion as a market.

That’s just about four percent of mainstream IT spending. By 2012, which is not so far away, we project that spending for cloud computing is going to grow nearly threefold to about $42 billion. That would make it about 9 percent of IT spending. So, we predict it’s going to move along pretty quickly.
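The market figures Conway quotes can be cross-checked with a few lines of arithmetic. The sketch uses only the numbers in the transcript ($16 billion at about 4 percent, $42 billion at about 9 percent); the implied totals are derived here and are not IDC figures.

```python
# Back-of-the-envelope check on the quoted market figures (billions of USD).

def implied_total(segment_spend, share_of_total):
    """Total IT spending implied by a segment's size and its share."""
    return segment_spend / share_of_total


cloud_now, share_now = 16.0, 0.04      # "about $16 billion", "about four percent"
cloud_2012, share_2012 = 42.0, 0.09    # "about $42 billion", "about 9 percent"

total_now = implied_total(cloud_now, share_now)      # roughly 400
total_2012 = implied_total(cloud_2012, share_2012)   # roughly 467
growth_factor = cloud_2012 / cloud_now               # 2.625, "nearly threefold"
```

Because the quoted percentages are rounded, the implied totals of roughly $400 billion and $467 billion are only approximate.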

Gardner: How important is this issue that Tony brought up about being able to manage in a dynamic environment and not just more predictable static batch loads?

Conway: It’s the single biggest challenge we see, and not only for cloud computing. It has affected the whole idea of managing these increasingly complex environments -- first clusters, then grids, and now clouds. Software has been at the center of that.

That’s one of the reasons we're here today with Platform and CERN, because that’s been Platform’s business from the beginning, creating software to manage clusters, then grids, and now clouds, first for very demanding, HPC sites like CERN and, more recently, also for enterprise clients.

Gardner: Randy Clark, as you look at the marketplace and see organizations like CERN changing their requirements, what, in your thinking, is the most important missing part from what you would do in management with HPC and now cloud? What makes cloud different, from a management perspective?

Dynamic resources

Clark: It’s what Tony said, which is having the resources be dynamic not static. Historically, clusters and grids have been relatively static, and the workloads have been managed across those. Now, with cloud, we have the ability to have a dynamic set of resources.

The trick is to marry and manage the workloads and the resources in conjunction with each other. Last year, we announced our cloud products -- Platform ISF and Platform ISF Adaptive Cluster -- to address that challenge and to help this evolution.

Gardner: Let’s go back to Tony Cass. Tell me what you’re doing with cloud in terms of exploration. I know you’re not in a position to validate, or you haven’t put in place, any large-scale implementation or solutions that would lead the market. But, I’m very curious about what the requirements are. What are the problems that you're trying to solve that you think cloud computing specifically can be useful in?

Cass: The specific problem that we have is to deliver the most physics we can within the fixed budget and the fixed amount of resources. These are limited either by money or by data-center cooling and generally are much less than the experiment wants. The key aim is to deliver the most cycles we can and the most efficient computing we can to the physicists.

I said earlier that we're looking at virtualization to do this. We’ve been exploring how to make sure that the jobs can work in a virtual environment, that we can instantiate virtual machines (VMs) as necessary, according to the different experiments that are submitting workloads at any one time, and how to integrate the instantiation of VMs with the batch system.

Once we got that working, we figured that the real problem was managing the number of VMs. We have something like 4,000 boxes, but if you have a VM per core, plus a few spare, then it can easily get to 60,000, 70,000, or 80,000 VMs. Managing these is the problem that we are trying to explore now, moving away from “can we do it” to “can we do it on a huge scale?”
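To put the scale Cass mentions in concrete terms, the short calculation below works backward from the quoted totals to the number of VMs each physical box would be hosting. Only the 4,000 boxes and the 60,000 to 80,000 VM totals come from the transcript; the function name is hypothetical, and the per-box figures are derived, not stated by CERN.

```python
# How many VMs per box do the quoted totals imply?

def vms_per_box(total_vms, boxes):
    """Average number of VMs each physical box would host."""
    return total_vms / boxes


boxes = 4_000
implied = {total: vms_per_box(total, boxes) for total in (60_000, 70_000, 80_000)}
# implied maps each quoted VM total to the per-box average it requires
```

The quoted totals work out to between 15 and 20 VMs per box on average, so the management layer would be tracking well over ten times as many virtual machines as physical ones.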

Gardner: Are you yet at the point where you want to be able to manage the VMs that you have under your own control, and perhaps start to deploy virtualized environments and workloads in someone else’s cloud and make them managed and complementary?

Cass: There are two aspects to that. The resources in our community are at other sites, and all of the sites are very independent. They are also academic environments. So, they are exploring things in their own way as well. At the moment, we're looking at how you can reliably send a virtual image that's generated at one place to another site.

Amazon does this, but there are tight constraints in the way they manage that cluster, because they built it thinking about this. Universities maybe didn’t build their own cluster in a way that separates that out from some of the other computing they're doing. So, there are security and trust implications there that we are looking at. That will be a thing to collaborate on long-term.

More cost effective

Certainly, if we configure things in our own way, when we look in a cloud environment, perhaps it will be more cost effective for us to only purchase the equipment we need for the average workload and then buy resources from Amazon or other providers. But, there are interesting things you have to explore about the fact that the data is not at Amazon, even if they have the cycles.
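The trade-off described here, owning enough capacity for the average workload and renting the overflow, can be sketched with a toy cost model. Every number below (the demand profile and both prices) is invented for illustration, and the model deliberately ignores the data-location issue Cass raises, which could dominate in practice.

```python
# Toy cost comparison: own capacity for the peak vs. own capacity for the
# average and rent the overflow from a cloud provider.
# Units are arbitrary: demand is cores needed per period, prices are per
# core per period.

def cost_own_only(demand, own_price):
    """Buy enough capacity to cover the busiest period."""
    capacity = max(demand)
    return capacity * len(demand) * own_price


def cost_own_plus_burst(demand, capacity, own_price, cloud_price):
    """Own `capacity` cores and rent whatever demand exceeds it."""
    owned = capacity * len(demand) * own_price
    overflow = sum(max(0, d - capacity) for d in demand)
    return owned + overflow * cloud_price


demand = [100, 100, 100, 500]            # one spike in four periods
average = sum(demand) // len(demand)     # 200 cores

peak_cost = cost_own_only(demand, own_price=1)
burst_cost = cost_own_plus_burst(demand, average, own_price=1, cloud_price=2)
pricey_burst = cost_own_plus_burst(demand, average, own_price=1, cloud_price=5)
```

With these made-up numbers, bursting costs 1,400 units against 2,000 for owning peak capacity, even though rented cycles cost twice as much as owned ones. At five times the owned price the result flips to 2,300, so the answer depends on how spiky the demand is and how large the price gap is.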

There are so many things that we’re thinking about. The one we’re focusing on at the moment is effectively managing the resources that we have here at CERN.

Gardner: Steve Conway, it sounds as if CERN has, with its partnered network, a series of what we might call private-cloud implementations and they're trying to get them to behave in concert at what we might call a public cloud level. That exercise could, as with the World Wide Web, create some de-facto standards and approaches that might, in fact, help what we call hybrid cloud computing moving forward. Does that fairly summarize where we are?

Conway: That’s right. There are going to have to be more rigorous open standards for the clouds. What Tony was talking about at CERN is something that we see elsewhere. People are turning to public clouds today -- "turning to" just meaning exploring at this point -- as a way to handle overload work and surge workloads.

The Internet itself is a pretty high latency network, if you think of it that way. People are looking to send portions of the workload that don't have a lot of communication dependencies, particularly inter-processor communication dependencies, because the latency doesn't support that.

But, we're seeing some smaller and medium-size businesses looking to public clouds as a way to avoid having to purchase their own internal resources, clusters for example, and also as a way of avoiding having to hire experts who know how to operate them. For example, engineering services firms don't have those experts in house today.

Gardner: Back to you, Tony Cass. I know this is still a bit hypothetical, but if there were the standards in place, and you were able to go to a third-party cloud provider for some of these spikes or occasionally dynamically generated workloads that perhaps exceed your current on-premises capabilities, would this be a financial boon to you, where you could protect your pricing and you could decide the right supply and demand fit when it comes to these extreme computing problems?

Cass: It would certainly be a boon. The possibility is being demonstrated by experiments that are actually based at Brookhaven to do simulations that are CPU-intensive, where they don't need much data transfer or data access. They have been able to run simulations cost-effectively with EC2.

Although their cycles, compared to some of the things we're doing, are more expensive, if we don't have to buy all of the resources, we could certainly save money. Another aspect is that it is beyond money in some sense. If you need to get something fixed for a conference, and you are desperately trying to decide whether or not you’ve discovered the Higgs then it's not a case of “money's no object,” but you can get the resources from a cloud much more quickly than you can install capacity at CERN. So both aspects are definitely of interest.

Gardner: Randy Clark, this makes a great deal of sense from the perspective of a large research organization. But, we're not just talking about specific workloads. We're talking about workloads that will be common across many other vertical industries or computing environments. Can you name a few, or mention some from your experience, where we should expect the same sorts of economic benefits to play out?

Different use cases

Clark: What we're seeing is across industries. Financial services is certainly taking a leadership role. There's a lot going on in the semiconductor or electronic industry. Business intelligence (BI) is across industries and government. So, across industries, we see different use cases.

To your point, these use cases are enterprise applications to run the business, and we're seeing that in Java applications, test and development environments, and traditional HPC environments.

That's something driven by the top of the organization. Tony and Steve laid it out well. They look at the public/private cloud economically, and say, "Architecturally, what does this mean for our business?" Without any particular application in mind they're asking how to evolve to this new model. So, we're seeing it very horizontally and, to your point, in enterprise and HPC applications.

Gardner: Steve Conway, thinking about these large datasets, Randy brought up BI, and that, of course, means warehousing, data analytics, and advanced analytics. A lot of organizations are creating datasets at a scale never anticipated, never mind seen before, things from sensors, mobile devices, network computing, or social networking.

How do we bring together these compute resources, the raw power, with these large data sets? I think this is an issue that CERN might also be a bellwether on, in somehow managing these large data sets and the compute power, bringing them architecturally into alignment.

Conway: BI is one of those markets that, in its attributes, straddles the world of HPC and enterprise computing just as financial services does, in the sense that they have workloads that don't have a whole lot of communications dependencies. They don't need networks with very low latency for the most part.

You see organizations like the University of Phoenix, which has 280,000 online students, that have already made this evolution -- in this case, with Platform helping them out -- from clusters to grid computing today. Now, they're looking toward cloud computing as a way to take them further.

You also see that not just on the private sector side. One of the other active customers that's really looking in that same direction is the Centers for Disease Control (CDC), which has moved from clusters to grid computing.

What you're seeing here is people who have already stepped through the earlier stages of this evolution. They've gone from clusters to grid computing for the most part and now are contemplating the next move to cloud computing. It's an evolutionary move. It could have some revolutionary implications, but, from a technological standpoint, sometimes evolutionary is much safer and better than revolutionary.

Gardner: Tell us about some of the solutions that you now need to bring to market or are bringing to market around management and other issues. Where have you found that the rubber hits the road, in terms of where people can take this in real time? What's the current state of the art? Rather than talking about hypotheticals, what's now possible, when it comes to moving from cluster and grid to the revolution of cloud?

Interaction of technologies

Clark: What Platform sees is the interaction of distributed computing and new technologies like virtualization requiring management. What I mean by that is the ability, in a large farm or shared environment, to share resources and then make those resources dynamic. It's the ability to add virtualization into those on the resource side, and then, on the server side, to make it Internet accessible, have a service catalog, and move from providing IT support to truly IT as a competitive service.

The state of the art is that you can get the best of Amazon -- ease of use, cost, accessibility -- with the enterprise configuration, scale, and dependability of the enterprise grid environment.

There isn't one particular technology or implementation that I would point to, to say "That is state of the art," but if you look across the installations we see in our installed base, you can see best practices in different dimensions with each of those customers.

Gardner: Randy, what are some typical ways that you're seeing people getting started, when they want to make these leaps from evolutionary progression to revolutionary paybacks? Where do they start making that sort of catalytic difference?

Clark: The evolution is the technology, as Steve said. The revolution is in the approach architecturally to how to get to that new spot.

Taking a step back, we see customers thinking architecturally about how they want to have that management layer. What is that management layer going to mean to them going forward? And, can they quickly identify a set of applications and resources and get started?

So, there is an architecture piece to it, thinking about what the future will hold, but then there is a very pragmatic piece -- let's get going, let's engage, let's build something and be able to scale that out over time. We saw that approach in grid computing. We're encouraging folks to think, but then also to get started.

Gardner: Tony Cass at CERN, what are your next steps? Where would you expect to be heading next as you explore the benefits and possible real-world opportunities?

Cass: We’re definitely concentrating for the moment on how we exploit resources effectively here. The wider benefits we'll have to discuss with our community.

Gardner: What would you like to see happen next?

Focusing on delivery

Cass: What I would like to see happen next is a definite cloud environment at CERN, where we move from something that we're thinking about to something that is in operation, where we have the ability to use resources that aren’t primarily dedicated to physics computing to deliver cycles to experiments. I'd like to see a cloud, a dynamically evolving environment in our computer center. We’re convinced it's possible, but delivering that is what we’re focusing on.

Gardner: Steve Conway, where do you see things headed next? What are the next steps that we should look for, as we move from that evolutionary progression to more of a revolutionary productivity?

Conway: It's along a couple of dimensions. One is the dimension of people actually working in these environments. In that sense, the CERN-Platform collaboration is going to help drive the whole state of the art forward over the next period of time.

The other one, as Randy mentioned before, is that the evolution of standards is going to be important. For example, right now, one of the barriers to public-cloud computing is vendor lock-in, where the clouds, the Amazons, the Yahoos, and so forth are not necessarily interoperable. People are a little bit concerned about testing their data there. The evolution of standards is going to accelerate this trend.

Gardner: Why don’t I give the last word today to Randy? Tell us about some information that's available out there for folks who are looking to explore and take some first steps toward this more revolutionary benefit.

Clark: I'd encourage everybody to visit our website. There are a number of white papers, webinars, and webcasts that we've done with other customers to highlight some other use cases within development, test, and production environments. I'd point people to the resource page on our website www.platform.com.

Gardner: I want to thank our guests. This has been a very interesting discussion, and I certainly look forward to following what CERN does, because I do think that they’re going to be a leader in terms of what many others will end up doing in B2B cloud computing.

Thank you to Tony Cass, Group Leader for Fabric Infrastructure and Operations at CERN. Thank you, sir.

Cass: Thank you.

Gardner: And also a good, big thank you to Steve Conway, Vice President in the High Performance Computing Group at IDC. Thank you, Steve.

Conway: Thanks.

Gardner: And also, of course, thank you to Randy Clark, Chief Marketing Officer at Platform Computing.

Clark: Thank you for the opportunity.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast on what likely outcomes we can expect from cloud computing and architecture, on the progression from grid to cloud computing, and moving into a more revolutionary set of benefits. Thanks for listening and come back next time.

Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.

Thursday, August 21, 2008

Pulse Provides Novel Training and Tools Configuration Resource to Aid in Developer Education, Preparedness

Transcript of BriefingsDirect podcast on Java training and education with Genuitec Pulse for Java and Eclipse.

Listen to the podcast. Sponsor: Genuitec.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, a sponsored podcast discussion about the complexity around Java training and education. The development toolset, the plug-ins, the community are all very fast-moving targets. It's difficult for trainers, educators -- not to mention the students and budding developers themselves -- to get a full grasp of what's expected of them, and then to find resources that are up-to-date and timely.

We're going to be discussing with some experts how better to organize Java training and education. We're going to look at the Eclipse profiles that can be organized and coordinated using Pulse, a Genuitec-organized tools configuration network function. We're going to learn about how organizations can organize their training, so that students can better anticipate what's expected of them in the real world.

To help us understand some of these issues and work toward some solutions, we're joined by Michael Cote, an analyst with RedMonk. Welcome to the show, Michael.

Michael Cote: Hello, everybody.

Gardner: We're also joined by Todd Williams, vice president of technology at Genuitec. Welcome to the show, Todd.

Todd Williams: Thanks very much, Dana.

Gardner: Lastly, we're joined by Ken Kousen, an independent technical trainer and president of Kousen IT, Inc. He's also an adjunct professor at Rensselaer Polytechnic Institute. Welcome, Ken.

Ken Kousen: Hi, Dana, glad to be here.

Gardner: As I mentioned, complexity and moving targets are part of the problem, but it also seems that there is a disconnect between book knowledge or training knowledge that one gathers about development and what happens in the real world, what goes on with code being checked in and checked out, and how teams are organized. It seems difficult for someone to anticipate what's really going to happen.

Let's go to Ken first. Ken, what is the gap, from your perspective, between what students and budding developers get through training and in university settings, and then what's often expected of them in the real world?

Kousen: It's interesting. The gap between what's taught in academia and what's taught in the real world is very large, actually. The classes I teach tend to be in a master's level program, and I teach a couple of classes in developing enterprise applications that are specifically constructed to address this gap.

Academia will talk about abstractions of data structures, algorithms, and different techniques for doing things. Then, when people get into the real world, they have no idea what Spring, Hibernate, or any of the other issues really are.

It's also interesting that a lot of developments in this field tend to flow from the working professionals toward academia, rather than the other way around, which is what you would find in engineering, the area I used to be in.

Gardner: Todd, when you're doing hiring or are doing development for business, and you're talking to your customers -- folks that use MyEclipse and your services for training and consulting -- are you seeing a worsening situation in terms of how to acquire qualified labor, or do people have a pretty good sense of where to go to find good Java developers?

Williams: Finding quality employees is always a challenge, and probably always will be. Part of what I see as being difficult, especially in the Java and Enterprise Java market, is the huge number of technologies that are being employed at different levels. Each company picks its own type of stack.

Ken mentioned Spring and Hibernate. There are also the Java Transaction API (JTA), JavaServer Faces (JSF), and Struts, plus the Web framework and persistence technologies and application servers that are in use. Finding employees that fit with what you are trying to do today, with an eye toward being able to mature them into where you are going tomorrow, is probably going to always be the concern.

Gardner: Now, what's been going on with development, not just the function, but teams, the collaboration, agile types of activities, Scrum? It used to be that people could specialize, stay in one little niche, but now the "master of all trades" seems to be more in demand.

Let's go to Michael. Michael, is development fundamentally changing? When we think of developers, do we need to recast how we imagine them or conceive of them?

Cote: Yes. I think it's fair even to go to the extreme and say absolutely. You look at the employment patterns that most developers find themselves in, and they are not really working at some place three, five, ten, even twenty years. It's not realistic. So, specializing in some technology that essentially binds you to a job isn't really an effective way to make sure you can pay your bills for the rest of your life.

You have to be able to pick up quickly any given technology or any stack, whether it’s new or old. Every company has their own stack that they are developing. You also have to remember that there is plenty of old existing software out there that no one really talks about anymore. People need to maintain and take care of it.

So, whether you are learning a new technology or an old technology, the role of the developer now, much more so than in the past, is to be more of a generalist who can quickly learn anything without support from their employer.

You're not going to get a lot of slack to learn things in a given time, paid training, and things like that. You're pretty much left on your own, or there are always cheaper alternatives to go to.

So the heat is really on developers to be Type A people who are always seeking out the best option.

Gardner: Alright. Well, now that we have scared anyone from ever wanting to be a developer, Ken, help us get a little bit closer to earth. What can students do, what can professors or instructors do, to help get more of this real-world perspective into what they do in these courses and in this preparation?

Kousen: It's interesting that while the various tools and technologies evolve, some of the basic principles always hold pretty fast. I've taught this class several times and I have to say that every time I've taught it, it's been very, very different, but the overall architectural issues are pretty constant.

Plus, what seems to follow in the industry are various trends, like an increased emphasis on testing, for example, the recent rise in dynamic languages, and things like that. The idea of continually trying to follow what's going on in the marketplace and seeing what's interesting seems to be very helpful.

I also emphasize to the students that a good source of information is to find some of the better open-source projects, and not necessarily join them, use them, or do anything with them, but follow what they do and see those projects as the communal efforts of some of the best developers in the world.

So, if they all say, "Oh yeah, we obviously have to have this source-control mechanism," then maybe that's an interesting thing that should be looked at, or this particular bug reporting tool, or whatever. I often emphasize that particular direction as well.

Gardner: How about that, Todd? Are these open-source communities, these chat rooms, these forums, the real, practical lab that the students and developers should be looking towards?

Williams: I think to a degree that it's certainly a practical lab that students have easy access to. Obviously, in open source, whether it’s something like the Eclipse Foundation, Apache, or what have you, they make a very explicit effort to communicate what they are doing through bug reports, mailing lists, and discussion groups. So, it's an easy way to get involved as just a monitor of what's going on. I think you could learn quite a bit from just seeing how the interactions play out.

That's not exactly the same type of environment they would see inside closed-wall corporate development, simply because the goals are different. Less emphasis is put on external communications and more emphasis is put on getting quality software out the door extremely quickly. But, there are a lot of very good techniques and communication patterns to be learned in the open-source communities.

Gardner: Now, when we go to community, that also means choice, which is a good thing. But, there is also a downside to choice. There are a lot of variables, many different things to look at. Tell us a little bit about the importance of profiling, and when you have got many new plug-ins to choose from, and you've got lots of commentary and social media being generated about what to use and what not to use.

Give us, Todd, if you could, some idea of the problem set that you saw in the marketplace a couple of years ago when you were thinking about Pulse.

Williams: Let me take a step back and quickly explain what Pulse is for those who aren't familiar with it. We built a general-purpose software provisioning system that right now we are targeting at the Eclipse market, specifically Eclipse developers.

For our initial release last November, we focused on providing a simple, intuitive way that you could install, update, and share custom configurations with Eclipse-based tools.

In Pulse 2, which is our current release, we have extended those capabilities to address what we like to call team-synchronization problems. That includes not only customized tool stacks, but also things like workspace project configurations and common preference settings.

Now you can have a team that stays effectively in lock step with both their tools and their workspaces and preferences.

What drove us to build something like this were a number of things. If you look at the Eclipse market, where we have been for a number of years, there are literally thousands of products and plug-ins for Eclipse. If you just want to go out and take a survey of them, or try some of them, it's a very daunting process for most people.

It starts out when you download Eclipse, go find some plug-ins, possibly looking into Eclipse Plug-in Central, find those update sites, type them in, download the plug-ins, and try them. This pattern repeats for quite some time, while the developer goes out and tries to figure out which of the plug-ins are good and which ones aren't.

With Pulse, we put these very popular, well-researched plug-ins into a catalog, so that you can configure these types of tool stacks with drag-and-drop. So, it's very easy to try new things. We also bring in some of the social aspects; pulling in the rankings and descriptions from other sources like Eclipse Plug-in Central and those types of things.

So, within Pulse, you have a very easy way to start out with some base technology stacks for certain kinds of development and you can easily augment them over time and then share them with others.

Gardner: Ken, help us understand how this can be used in the training and/or academic setting? What is it about Pulse that brings in more of the real world, and anticipates what choices developers are going to have once they get into the nitty-gritty of coding?

Kousen: Looking at academic and training settings, they are a little bit different. In a training setting, one of the real challenges the training classes face every time is getting the initial classroom setup correct. That is often very involved and complicated, because a lot of the tools involved are somewhat dependent on each other and dependent on environment variables and things like that.

So, trying to set up standard Pulse configurations and then being able to set up a classroom using those shared deployments is a very interesting opportunity. I haven't had the chance to do it yet, but I have definitely been talking to some training providers about giving that a shot.

I did try it in a classroom, and it's rather interesting, because one of the students that I had recently this year was coming from the Microsoft environment. I get a very common experience with Microsoft people, in that they are always overwhelmed by the fact, as Todd said, there are so many choices for everything. For Microsoft, there is always exactly one choice, and that choice costs $400.

I tried to tell them that here we have many, many choices, and the correct choice, or the most popular choice changes all the time. It can be very time consuming and overwhelming for them to try to decide which ones to use in which circumstances.

So, I set up a couple of configurations that I was able to share with the students. Once they were able to register and download them, they were able to get everything in a self-contained environment.

We found that pretty helpful, although I've got to say that this year the class size was sufficiently small, so that I don't know that we really got the same benefit we would get in a large classroom, where there would be many, many setup issues to deal with.

Gardner: So, almost mimicking a collaboration activity in a development setting, but in the classroom.

Kousen: Exactly.

Gardner: Are there any particular things that you learned from this exercise that those who might be evaluating and thinking about using Pulse could benefit from?

Kousen: It was pretty straightforward for everybody to use. We had to make sure that people using it had fast download speeds, but that had nothing to do with Pulse. That had to do with the size of Eclipse.

Of course, whenever you get students downloading configurations, they have this inevitable urge to start experimenting, trying to add in plug-ins, and replacing things. I did have one case where the configuration got pretty corrupted, not due to anything that they did in Pulse, but because of plug-ins they added externally. We just basically scrapped that one and started over and it came out very nicely. So, that was very helpful in that case.

Gardner: Michael, as you are listening to this, is there anything that jumps out at you in terms of understanding of Eclipse and its popularity, and then dealing with complexity that you could share?

Cote: I like the comparison of the Eclipse development world, versus Visual Studio, versus getting the one thing, because it is very accurate. That's sort of the ethos of Java -- maximum "choosability," if you will. It's one of these things in development that takes a long time to accept, but having lots of options is often more expensive and burdensome than having fewer options. Now that said, you want to make sure that you have good fewer options.

In every development team I have been involved with in my previous lives, as it were, anytime someone new comes onto the team, it’s always an extremely difficult issue just to get their tool chain set up correctly.

Having something wrong in the tool chain, the shared tools that the whole team uses, can really be quite disruptive. That's because the way that you assume your team members are going about solving problems is slightly wrong, and so they may not have the fully optimized way that your project is based around.

I guess you could call that the commercial application of that tediousness of setting up the configuration in more of an educational or a training environment. It's difficult to just sort of give someone a printout and say, go set up your stuff like this, because you are always missing little bits, and there is a lot of nuance in how things are exactly set up in the tool chains.

Gardner: Back to you, Todd at Genuitec. Have there been any surprises since you brought Pulse to market in how it’s being used? Are there unanticipated consequences that you would like to share -- the good ones anyway?

Williams: It's been interesting. We have seen a good number of people using Pulse, the way we anticipated it, sharing their tool stacks, and publishing them for their teams.

There seems to be a lot of people that use it privately. They don't share it with anyone, but they use it to manage multiple development profiles. So they might do C++ development one day and Java development the next, or what have you, and they like to keep custom tool stacks just for those things.

Even though they are kind of an island, and we made Pulse to share amongst teams, they find a lot of value in it, just to keep everything tidy.

Cote: If I can add to that, I personally haven't seen people using Pulse like this, because I haven't stuck my head in a developer shop when Pulse has been around. We would typically have a problem where -- across different versions of the project you are working on -- you would have your IDE or your tools set up differently.

So, if you wanted to very quickly switch between those different versions, for example, to support or do some debugging in an old version, if there was some support issue, that switching cost between the two setups is a big part of going to fix an older version of something.

Nowadays, you have a lot of virtualization, so you can make this step a little easier, but you end up doing absurd things, like just having machines dedicated to specific versions of the software that you are working on.

If you can more easily switch between the profiles and the configurations that you have, then you can hopefully make it easier and less tedious to support these older products that you tend to have a lot of requests to support.

Gardner: Ken, did you see some advice that you might offer to those, either in academia or in the training field, things that they might want to consider as they are evaluating such things as Pulse?

Kousen: I agree with what the others were saying about the idea of setting up a series of alternative profiles that match the environment you are going to be working in.

I realized, as Michael and Todd were saying that, that I actually do that myself. I have a J2EE profile or Java EE profile, and I also have a regular Java profile, when I am working on different things, because there are certain shortcuts that won't conflict with anything in Java EE, if I use it in Java.

Eventually, I hope when you wind up adding Grails support or Groovy and Grails support to Pulse, it will probably have a configuration environment for that as well. The idea of having a variety of profiles that could each be used in its given time is very helpful.

I know that in a training environment we will definitely try to do that. We will be setting up alternative profiles that can be shared in a particular training class.

Academically, I like to leave things a bit more free form, although I agree that the initial setup is very helpful, because if the students don't have any feel for the environment at all, getting them over that initial hurdle is very, very helpful. After that, letting them experiment is always very, very useful. So that's good.

Gardner: Todd, Ken mentioned support for Ruby, dynamic languages, Groovy. Can you tip your hand a little bit and let us know what you've got in mind in that regard?

Williams: Actually, all of those things are in the Pulse catalog right now. Sometimes they are hard to find, because it's kind of big, but we added search to it to help you run them down. But, there are actually multiple Ruby solutions; I know Groovy is in there.

If a particular solution that you like isn't in there though, it's relatively straightforward to add it, not to the catalog, but you can still add it very, very easily to any of your profiles, either locally or shared.

So, the catalog is like a really good starting point that we try to keep up to date with what our users ask us to put into it. On the other hand, if it contains everything in the world, it gets a bit unwieldy as well.

Kousen: Dana, can I comment on that? I did speak very quickly on that issue. There is a Groovy plug-in in there. I was actually very pleased to see that, because I was concerned.

I've been using the Groovy plug-in for a while, and I wasn't sure whether that was going to be in the catalog at all. It did take me a while to find it, because it was filed under an area that I wasn't expecting, but once I put it in the search box, then it showed up immediately.

The only thing about Grails is that there isn't really a dedicated Grails plug-in yet, and the Groovy plug-in is really moving towards something like that. So, when that becomes available, I'm sure it will be incorporated into Pulse.

By the way, another issue that is very useful is that when I am browsing inside Pulse, just looking around to see what sort of components have been added, it's interesting to see what turns out to be popular; things that I hadn't really anticipated.

For example, I have been using Subclipse for the Subversion plug-in for a couple of years now. In browsing into the Version Control category I see that there are various other Subversion plug-ins as well and also others coming down the line. So that was another capability that I didn't anticipate and found rather interesting.

Gardner: Todd, looking forward a little bit, it seems that this profile information, while useful in a tactical sense, might actually have some strategic value too.

I'm thinking about the metadata that might be available through a profile and a definition of what a developer wants to do from an activity or behavioral or a pattern base. Then, applying that to when they do a search; perhaps refining the search based on their profile at that time, or perhaps using the profile in regard to when they do testing, builds, other aspects of lifecycle management for development.

Have you taken this a step further, where we could take these profiles and use them in a more strategic fashion, or is that something you are looking at?

Williams: That's a great question, Dana. Of course, we have a very large product plan for Pulse. We've only had it out since November, but you're right. We do have a lot of profile information, so if we chose to mine that data, we could find some correlations between the tools that people use, like some of the buying websites do.

People who buy this product also like this one, and we could make ad hoc recommendations, for example. It seems like most people that use Subversion also use Ruby or something, and you just point them to new things in the catalog. It's kind of a low-level way to add some value. So there are certainly some things under consideration.
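The "people who use this also use that" idea Williams describes can be sketched as a simple co-occurrence count over shared profiles. This is only an illustration under stated assumptions: the `ToolRecommender` class, the `alsoUsedWith` method, and the sample tool names are hypothetical, and nothing here reflects how Pulse actually stores or mines profile data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not Pulse's data model or API.
class ToolRecommender {

    // For every profile that contains `tool`, count the other tools in that
    // profile, then return those tools ordered by how often they co-occur
    // (ties are broken alphabetically so the result is deterministic).
    static List<String> alsoUsedWith(String tool, List<Set<String>> profiles) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> profile : profiles) {
            if (!profile.contains(tool)) {
                continue; // this profile tells us nothing about `tool`
            }
            for (String other : profile) {
                if (!other.equals(tool)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up profiles, each modeled as a set of tool names.
        List<Set<String>> profiles = List.of(
            Set.of("Subclipse", "Ruby tools", "Java tools"),
            Set.of("Subclipse", "Ruby tools"),
            Set.of("Subclipse", "UML modeler"),
            Set.of("Java tools", "UML modeler"));
        // Prints [Ruby tools, Java tools, UML modeler]
        System.out.println(alsoUsedWith("Subclipse", profiles));
    }
}
```

A raw count like this favors tools that are popular everywhere, so a real recommender would probably normalize by each tool's overall frequency; that refinement is left out to keep the sketch short.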

Gardner: Michael, what do you think of that, taking more profile information; metadata about behaviors, uses, pattern of work, and then applying that to some of the other larger lifecycle activities in development?

Cote: Things like that work out really well, when you have the proliferation of choice that we were talking about earlier, where the systems can always be gamed and everything.

This thing is a small enough subset that it doesn't happen, but just seeing sheer quantity-wise and rating-wise what people are using, helps you evaluate. I am probably making this figure up, but if there are 10 different unified modeling language (UML) plug-ins for Eclipse, then you need to somehow narrow down to the ones that are going to work out well for you.

The mixture of the fast and best way to get to that is really just to see which one is being used the most, because chances are, if people are still using it actively, it's going to be a popular one. People are pretty fast to dump plug-ins that don't work well for them.

There is a place to capture the metadata or the usage data that's going around with things. That's the kind of thing that usually developers only get a chance to figure out when they are face to face with someone at a conference or other sort of events that don't happen as frequently as you might want to, to simply figure out which plug-in you might want to use.

Gardner: Any time you can take personalization information and automate that or refine searches and activities is certainly a productivity improvement, and Pulse really strikes me as setting up the opportunity to do that.

Cote: Absolutely.

Gardner: Alright. Let's start wrapping up a little bit. Ken, any last thoughts as a technical trainer about where you would like to see this sort of capability go?

Kousen: I'm not exactly sure where I will be able to take advantage of it. Let me rephrase that. I think the current Pulse configuration is already very useful, and I'm not sure what else I need in order to start trying to incorporate it into an environment.

The only other issue that I wind up having in a training environment is setting things like environment variables onto the operating system. If there is some way we can get that into Eclipse for example, or rather into Pulse, rather than having to do it on the operating system itself -- maybe through the tools or whatever -- then that would be helpful. But I don't know. Right now, I think the situation is pretty good. I can't think of anything else concrete that I would want to add right there.

Gardner: Okay. Todd, thoughts about what educators and trainers should be considering as they look at something like Pulse, and how to exploit it and leverage it.

Williams: One thing that came to my mind, from a student's perspective, is that the integrated development environments (IDEs) that are available right now, even the various configurations of Eclipse, are really made for professionals. When you take something like MyEclipse, there is just so much in it.

We need the ability to actually strip down the IDE to only what is needed for a particular exercise. For example, you could set up a profile for the first exercise of the class with just a limited set of tools that a new student would need to get their hands on. It limits the confusion factor. When you do the next exercise, you could easily update the profile; add a few additional tools to it.

So, you have kind of a selected discovery of additional tools and capabilities that coincide with the level of expertise the students are developing, as they are going up the learning curve in a particular course. I was just wondering: is that the kind of thing that we have now enabled through having a technology like Pulse, which makes delivery of that straightforward, versus what had to be done before?
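The per-exercise profiles Williams describes, each one adding a few tools to the previous one, can be modeled as a growing set of tools. The sketch below is purely illustrative: `ExerciseProfile`, `extend`, and the tool names are hypothetical and are not part of Pulse.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of a course tool profile; not Pulse's API.
class ExerciseProfile {
    final String name;
    final Set<String> tools;

    ExerciseProfile(String name, Set<String> tools) {
        this.name = name;
        // Copy so later changes to the caller's set do not leak in.
        this.tools = new LinkedHashSet<>(tools);
    }

    // Builds the profile for the next exercise: everything in this profile
    // plus the newly introduced tools. The current profile is left unchanged.
    ExerciseProfile extend(String nextName, String... added) {
        Set<String> next = new LinkedHashSet<>(tools);
        for (String tool : added) {
            next.add(tool);
        }
        return new ExerciseProfile(nextName, next);
    }

    public static void main(String[] args) {
        ExerciseProfile first = new ExerciseProfile("exercise-1", Set.of("Java editor"));
        ExerciseProfile second = first.extend("exercise-2", "Debugger", "JUnit runner");
        // Prints exercise-2: [Java editor, Debugger, JUnit runner]
        System.out.println(second.name + ": " + second.tools);
    }
}
```

Keeping each profile immutable means a student who falls behind can still install the smaller, earlier profile without picking up tools from later exercises.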

Gardner: Just for those interested in perhaps getting started, Pulse uses its network. How do people access that, how do they find it, how do you get started?

Williams: Sure. The Pulse website is www.poweredbypulse.com. There is a little 5 MB installer that you download and start running. If anyone is out in academia, and they want to use Pulse in a setting for a course, please fill out the contact page on the Website. Let us know, and we will be glad to help you with that. We really want to see usage in academia grow. We think it’s very useful. It's a free service, so please let us know, and we will be glad to help.

Gardner: Terrific. I want to thank our panelists for helping us dig a little bit into training issues, and some of the solutions that are welling up in the market to address them. We have been talking with Michael Cote, an analyst at RedMonk. Thank you, Michael.

Cote: Absolutely.

Gardner: Todd Williams, vice president of technology, Genuitec. Appreciate your input, Todd.

Williams: Thanks, Dana, I have enjoyed it.

Gardner: Ken Kousen, independent technical trainer, president of Kousen IT, Inc., and adjunct professor at Rensselaer. Appreciate your experience and input, Ken.

Kousen: Oh, you are welcome, no problem.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You have been listening to a sponsored BriefingsDirect podcast. Thanks for listening, and come back next time.

Listen to the podcast. Sponsor: Genuitec.

Transcript of BriefingsDirect podcast on Java training and education with Genuitec Pulse for Java and Eclipse. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.

Thursday, January 17, 2008

Enterprises Seek Ways to Exploit Web Application Mashups and Lightweight Data Presentation Techniques

Transcript of BriefingsDirect podcast on data mashups with IBM and Kapow.

Listen to the podcast here. Sponsor: Kapow Technologies.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Today, a sponsored podcast discussion about the state of choice in the modern enterprise around development and deployment technologies.

These days, developers, architects and even line-of-business managers have many choices. This includes things like Web applications, software-as-a-service (SaaS), Service-Oriented Architecture (SOA), RESTful applications, mashups, pure services off the Web, and pure services from within an Intranet or even the extended enterprise. We’re talking about RSS and Atom feeds, and, of course, there is traditional .NET and Java development going on.

We also see people experimenting with Ruby and a lot of use around PHP and scripting. The good news is that there are a lot of choices. The bad news is also that there are a lot of choices.

Some of these activities are taking place outside the purview of IT managers. People are being innovative and creative, which is good, but perhaps not always in the way that IT would like in terms of security and access control. These newer activities may not align with some of the larger activities that IT needs to manage -- which many times these days includes consolidation, unification, and modernization of legacy applications.

To help us weed through some of the agony and ecstasy of the choices facing application development and deployment in the enterprise, we have on the call, Rod Smith. Rod is Vice President of Internet Emerging Technologies at IBM. Welcome to the show, Rod.

Rod Smith: Thank you very much. It’s nice to be here.

Gardner: We also have Stefan Andreasen, the Founder and CTO of Kapow Technologies. Welcome to the show, Stefan.

Stefan Andreasen: Thank you.

Gardner: Let’s go first to Rod. We spoke last spring about these choices and how there are, in effect, myriad cultures that are now involved with development. In past years, development was more in a closed environment, where people were under control … white coats, raised floors, and glass rooms come to mind. But now it’s more like the Wild West. What have you been finding in the field, and do you see this as chaos or opportunity?

Smith: A little of both. In times of innovation you get some definite chaos coming through, but IT, in particular, and line of businesses see this as a big opportunity. Because of SOA and the other technologies you mentioned, information is available, and line of business is very interested in capturing new business opportunities.

Time to market is getting shorter, and getting squeezed all the time. So you’re seeing line of business and IT coming together around what they have to do to drive more innovation and move it up a couple of notches, from a business perspective.

Open standards now are very important to IT. Line of business, with mashups in particular, can use those types of services to get the information and create solutions they couldn’t do in the labs, when the propeller heads and others had to be involved five or 10 years ago.

Gardner: So we have dual or maybe multiple tracks going on. I suppose what’s missing is methodological and technical management. That’s an area where IBM has been involved for some time. Does IBM look at this as an opportunity?

Smith: A big opportunity. And you hit it on the head. The methodology here is very different from the development methodology we’ve been brought up to do. It’s much more collaborative, if you’re line of business, and it’s much more than a set of specifications.

Here is where we’re seeing people talk about building mashups. Usually they have a really good idea that comes to mind or something that they think will help with a new business opportunity.

Often the second question -- and we’ve seen a pattern with this -- is “Where is the data? How do we get to the data? Can IT open it up for us? Do line-of-business people have it in spreadsheets?” Typically, when it’s valuable to the business, they want to catalog it and put it together, so other people can share it. Finally, they do a mashup.

So methodology is one of the things we call a self-service business pattern. It starts with the idea, from a developer standpoint. "I really need to understand the business. I need to understand the time to market and the partnerships, and how information can be exposed." Then, they get down into some of the details. "I've got to do it quickly."

What we are seeing from an opportunity standpoint is that many businesses, when they see an opportunity, want a vendor to respond in 30 days or less, [and do more] within six months down the road. So that’s a challenge, and it is an opportunity. We think about tooling and middleware and services. How can we help the customer?

Gardner: Let’s go to Stefan. When you see these activities in the enterprise around mashups, SOAP, REST, HTML and XML, there’s an opportunity for bridging the chaos, but I suppose there’s also a whole new type of development around situational applications.

That is to say that, an opportunity exists to access content that hadn’t really been brought into an application development activity in the past. Can you tell us a little bit about what you’re seeing in the enterprise and how these new types of development are manifesting themselves?

Andreasen: Let me comment on the chaos thing a little bit. It’s important to understand the history here. At first, central IT worked with all their big systems. Line of business really didn’t have any access to IT or tools themselves, until recently when they got desktop tools like Excel.

This current wave is really driven by line of business getting IT in their own hands. They’ve started using it, and that’s created the chaos, but chaos is created because there is a need.

Now, with the Web 2.0 and the mashup wave, there is an acknowledgement of a big need here, as Rod also said. So it’s necessary to understand why this is happening and why it is something that’s very important.

Gardner: These end-users, power users, these line of business folks, they’ve been using whatever tools have been available to them, even if it’s an Excel spreadsheet. I suppose that gives them some productivity, but it also leaves these assets, data and content, lying around on hard drives in a fairly unmanaged state.

Can we knock two birds down with one stone in terms of managing this chaos in terms of the data, but also bring together some interface and application development benefits?

Andreasen: The worst thing would be to shut it down, of course. The best thing that’s happening now is acknowledging that line-of-business people need to do their own thing. We need to give them the tools, environments and infrastructure so they can do it in a controlled way -- in an acceptable, secured way -- so that your laptop with all of your customer data doesn't get stolen at the airport.

When we talk about customer data, we leap back to your earlier question about data. What are line-of-business people working with? Well, they’re working with data, analyzing data, and finding intelligence in that data, drawing conclusions out of the data, or inventing new products with the data. So the center of the universe here for this IT work is really dealing with data.

Gardner: SOA is one of the things that sits in the middle between the traditional IT approaches and IT development and then these newer activities around data, access, and UIs and using Web protocols.

I wonder if you think that that’s where these things meet. Is there a way to use an enterprise service bus (ESB) for checking in and out of provisioned or governed services? Is there a way that mashups and the ERP applications meet up?

Smith: The answer is yes. Without SOA we probably wouldn't have gotten to a place where we can think about mashable content or remixable content.

What you are seeing from customers is the need to take internal information and transform it into XML or RESTful services. There’s a good match between ESB things … [and] thinking about security and other pieces of it, and then building the Rich Internet Application (RIA) type of applications.

The part you touched on before is interesting, too. And I think Stefan would agree with me. One thing we learned as we opened up this content is that it isn't just about IT managing or controlling it. It’s really a partnership now.

One thing Stefan has with Kapow that really got us talking early was the fact that for Stefan’s content they have a freshness style. We found that same thing is very important. The line of business wants to be involved when information is available and published. That’s a very different blending of responsibility than we've seen before on this.

So thinking forward you can imagine that while you are publishing this, you might be putting it into a catalog repository or into services. But it also has to be available for line of business now to be able to look at those assets and work with IT on when they should be available to business partners, customers and others.

Gardner: It’s interesting you mentioned the word "publish," and it’s almost as if we are interchanging the words "publishing" and "application development" in the sense that they are munging or overlapping.

Does that fit with what Kapow has been seeing, Stefan, that publishing and syndication are now a function of application development?

Andreasen: There are several sides to this question of which data you need, how to access it, how it is published, etc. One thing you are talking about is line of business publishing their data so other people can use it.

I split data into several groups. One is what I call core data, the data that is generally available to everybody in the company and probably sits in your big systems. It’s something everybody has. It’s probably something that's service-oriented or is going to be very soon.

Then there is the more specialized data that’s sitting out in line of business. There's a tendency now to publish those in standard formats like RSS, RESTful services, etc.

There is a third group, which I call intelligence data. That's hard to find, but gives you that extra insight, extra intelligence, to let you draw a conclusion which is different from -- and hopefully better than -- your competitors’.

That’s data that’s probably not accessible in any standard way, but will be accessible on the Web in a browser. This is exactly what our product does. It allows you to turn any Web-based data into standard format, so you can access what I call intelligence data in a standard fashion.

Gardner: This is the type of data that had not been brought into use with applications in the past?

Andreasen: That is correct. There is a lot of information that’s out there, both on the public Web and on the private Web, which is really meant to be human-readable information. You can just think about something as simple as going to the U.S. Geological Survey and looking at earthquake fault lines, and there isn't any programmatic API to access this data.

This kind of data might be very important. If I am building a factory in an earthquake area, I don’t want to buy a lot that is right on the top of a fault line. So I can turn this data into a standard API, and then use that as part of my intelligence to find the best property for my new factory.

Smith: When we talk of line of business, it’s not just internal information they want. It's external information, and we really are empowering these content developers now. The types of applications that people are putting together are much more like dashboards of information, both internally and externally over the Internet, that businesses use to really drive their business. Before, the access costs were high.

Now the access costs are continuing to drop very low, and people do say, "Let’s go ahead and publish this information, so it can be consumed and remixed by business partners and others," rather than thinking about just a set of APIs at a low level, like we did in the past with Java.

Gardner: How do we bring these differing orbits into alignment? We've got people who are focused on content and the human knowledge dimension -- recognizing that more and more great information is being made available openly through the Web.

At the same time, we have this group that is API-minded. I guess we need to find a way of bringing an API to those folks who need that sort of interface to work with this data, but we also need for these people to take this data and make it available in such a way that a developer might agree with it or use it.

How does Kapow work between these constituencies and make one amenable to the other? We're looking for a way to bind together traditional IT development with some of these “mashupable” services, be it internal content or data or external services off of the Web.

I wonder what Kapow brings to the table in terms of helping these two different types of data and content to come together -- APIs versus content?

Andreasen: If you want to have automatic access to data or content, you need to be able to access it in a standard way. What is happening now with Web Oriented Architecture (WOA) is that we're focusing on a few standard formats like RESTful services and on feeds like RSS and Atom.

So first you need to be able to access your data that way. This is exactly what we do. Our customers turn data they work with in an application into these standard APIs and feeds, so they can work with them in an automated way.

It hadn’t been so much of a problem earlier, maybe because there wasn’t so much data, and people could basically cut and paste the data. But with the explosion of information out there, there's a realization that having the right data at the right time is getting more and more important. There is a huge need for getting access in an automated way.

How do line-of-business people work with the data? Well, they work with the data in the application interface. What if the application interface today is your browser?

Kapow allows the line-of-business people to automatically access data the way they worked with it in their Web browser.

That’s a very powerful way of accessing data, because you don't have to have an extra level of IT personnel. You don't have to first explain, "Well, this is the data I need. Go find it for me." And then, maybe you get the wrong data. Now, you are actually getting the data that you see the way you want.

Gardner: Another aspect to this is the popularity of social networking and what's known as the "wisdom of crowds" and wikis. A lot of contributions can be brought into play with this sort of gray area between content and publishing, different content feeds, and exposure and access and the traditional IT function.

Wikis have come into play, and they have quite a bit of exposure. Maybe you have a sense of how these worlds can be bridged, using some of what's been known as social networking?

Smith: Software development now is much more of a social networking activity than an engineering activity. At IBM, we have Blog and Wiki Central, where people use wikis to get their thoughts down and collectively bring an idea about.

Also at IBM, we have Innovation Jam, which we hold every year, and which brings in hundreds of thousands of people now. It used to be just IBM, but we’ve opened it up this last year to everyone, friends and family alike, to come up with ideas.

That part is great on the front end. You then can have a much better idea of what the expectations are, and what a user group wants. They're usually very motivated to stay in the loop to give you feedback as you do development.

The big part here is when it comes to doing mashups. It's the idea that you can produce something relatively quickly. With IBM’s QEDWiki, we like the idea that someone could assemble an application, wire it together in the browser, and it has the wiki characteristics. That is, it's stored on the server, it’s versioned as to enterprise characteristics, and it’s sharable.

It’s a key aspect that it has to be immediately deployable and immediately accessible by the folks that you are networking with.

That relates to what Stefan was saying and what you were asking about on how to bridge the two worlds of APIs and content. We're seeing now that as you think about the social networking side, people want the apps built into dashboards.

The more forward-thinking people in IT departments realize that the faster they can put together publishable data content, the sooner they can get a deeper understanding of what their customers want.

They can then go back and decide the best way to open up that data. Is it through syndication feeds, XML, or programmatic API? Before, IT had to guess usage and how many folks might be touching it, and then build it once and make it scalable.

We’re doing things much more Agile-wise and building it that way, and then, as a flip, building the app that’s probably 80 percent there. Then IT can figure out how they could open up the right interfaces and content to make it available broadly.

Gardner: Stefan, could you give us some examples of user scenarios, where Kapow has been brought in and has helped mitigate some of the issues of access to content and then made it available to traditional development? Is there a way for those folks who are perhaps SOA-minded, to become a bit more open to what some people refer to as Web-Oriented Architecture?

Andreasen: One example was mentioned in The Wall Street Journal recently in an article on mashups. It was about Audi in Germany. They are using our product to allow line of business to repurpose existing intranets.

Let’s say that a group of people want to take what’s already there, but tweak it, combine it, and maybe expose it as a mobile application. With our tool, they can now do that in a self-service way, and then, of course, they can share that. What’s important is that they published this mini-mashup into their WebSphere portal and shared it with other people.

Some of them might just be for individual use. One important thing about a mashup is that an individual often creates it. Then it either stops there, because only that individual needs it -- or it can also grow into company-wide use and eventually be taken over by central IT, as a great new way to improve performance in the entire company. So that shows one of the benefits.

Other examples have a lot to do with external data -- for example, in pricing comparisons. Let’s say I'm an online retailer and suddenly Amazon enters the market and starts taking a lot of market share, and I really don’t understand why. You can use our product to go out and harvest, let’s say, all the data from digital cameras from Amazon and from your own website.

You can quickly find out that whenever I have the lowest price, my product is out of stock -- and whenever I have a price that's too high, I don’t sell anything. Being able to constantly monitor that and optimize my prices is another example.

Another very interesting piece of information you can get is vendor pricing. You can know your own profit margin. Maybe it’s very low on Nikon cameras. You see that eBay is offering the Nikon cameras below even your cost as the vendor. You know for sure that buyers are getting a better deal with Nikon than you can offer. I call this using data to create intelligence and improve your business.
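The price-monitoring scenario described above can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not Kapow's product or API: the `Listing` type, the `compare_prices` function, the 10 percent threshold, and the sample data are all hypothetical, and the harvesting step itself is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """One harvested product listing (hypothetical structure)."""
    product: str
    price: float
    in_stock: bool


def compare_prices(ours, theirs, high_margin=0.10):
    """Return (product, finding) pairs for products that both sellers list.

    high_margin is an assumed threshold: we flag our price as too high
    when it exceeds the competitor's price by more than this fraction.
    """
    competitor = {listing.product: listing for listing in theirs}
    findings = []
    for mine in ours:
        other = competitor.get(mine.product)
        if other is None:
            continue  # the competitor does not list this product
        if mine.price < other.price and not mine.in_stock:
            findings.append((mine.product, "cheapest but out of stock"))
        elif mine.price > other.price * (1 + high_margin):
            findings.append((mine.product, "priced too high"))
    return findings


# Made-up sample data standing in for harvested listings.
ours = [
    Listing("Camera A", 299.0, False),
    Listing("Camera B", 520.0, True),
    Listing("Camera C", 150.0, True),
]
theirs = [
    Listing("Camera A", 319.0, True),
    Listing("Camera B", 449.0, True),
    Listing("Camera C", 149.0, True),
]

for product, finding in compare_prices(ours, theirs):
    print(f"{product}: {finding}")
```

Running a comparison like this on a schedule is one way the "constantly monitor" part could work; where the listings come from and how often they refresh are left open here.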

Gardner: All this real-time, updated content and data on the Web can be brought into many aspects of what enterprises do -- business processes, purchasing, evaluation, and research.

I suppose a small amount of effort from a mashup could end up saving a significant amount of money, because you’re bringing real-time information to those people making decisions.

How about you on your side, Rod? Any examples of how these two worlds -- the peanut butter and chocolate, if you will -- come together for a little better snack?

Smith: I’ll give you a good one. It’s an interesting one we did as a technology preview with Reuters and AccuWeather. Think about this again from the business perspective, where two business folks met at a conference and chit-chatted a bit.

AccuWeather was talking about how they offer different types of services, and the Reuters CTO said, "You know, we have this commodity-shipping dashboard, and folks can watch the cargo go from one place to another. It’s odd that we don’t have any weather information in there." And the question came up very quickly: "I wonder how hard it would be to mash in some weather information."

We took one of their folks, one of mine, and the person from AccuWeather. They sat down over about three or four hours, figured out the scenario that Reuters was interested in and where the data came from, and they put it together. It took them about two weeks, but altogether 17 hours -- and that’s over a beer.

So it was chocolate and nuts and beer. I was in pretty good shape at that point. The interesting thing came after that. When we showed it to Reuters, they were very thrilled with the idea that you have that re-mixability of content. They said that weather probably would be interesting, but piracy is a lot more interesting. "And, by the way" -- and this is from the line of business person -- "I know where to get that information."

Gardner: Now when you say "piracy," you mean the high seas and the Jolly Roger flying up on the mast -- that kind of thing?

Smith: That’s it. I didn’t even know it existed anymore. In 2006, there were 6,000 piracy events.

Gardner: Hijackings at sea?

Smith: Yes.

Gardner: Wow!

Smith: I had no idea. It turned out that the information was a syndication feed. So we pulled it in and could put it on a map, so you could look at the different events.

It took about two hours, but that’s that kind of dynamic now. The line-of-business person says, "Boy, if that only took you that much time, then I have a lot of ideas, which I’ve really not talked about before. I always knew that if I mentioned one more feature or function, IT would tell me, it takes six more months to do."

We've seen a huge flip now. Work is commensurate with some results that come quickly. Now we will see more collaboration coming from IT on information and partnerships.
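The step of pulling a syndication feed in and putting its events on a map can be sketched roughly as follows. This is not the code from the Reuters preview. It assumes, purely for illustration, that the feed is RSS 2.0 and carries GeoRSS-Simple `georss:point` elements; the sample feed, its titles and coordinates, and the `feed_to_markers` function are made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS-Simple elements such as <georss:point>.
GEORSS = "{http://www.georss.org/georss}"

# A made-up feed standing in for a real incident feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Sample incident feed (made up)</title>
    <item>
      <title>Incident one</title>
      <georss:point>12.5 45.0</georss:point>
    </item>
    <item>
      <title>Incident two</title>
      <georss:point>1.25 103.75</georss:point>
    </item>
    <item>
      <title>Incident without a location</title>
    </item>
  </channel>
</rss>"""


def feed_to_markers(feed_xml):
    """Turn feed items that carry a location into simple map markers."""
    root = ET.fromstring(feed_xml)
    markers = []
    for item in root.iter("item"):
        point = item.findtext(GEORSS + "point")
        if not point:
            continue  # skip items that have no location
        lat, lon = (float(value) for value in point.split())
        markers.append({
            "title": item.findtext("title", default=""),
            "lat": lat,
            "lon": lon,
        })
    return markers


for marker in feed_to_markers(SAMPLE_FEED):
    print(marker)
```

A list of title, latitude, and longitude records like this is the kind of input most mapping widgets accept, which is part of why a feed-to-map mashup can come together in hours rather than months.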

Gardner: This networking-collaboration or social interaction is really what’s crafting the new level of requirements. Instead of getting in line behind 18 six-month projects, 12 to 20 hours can be devoted by people who are perhaps on the periphery of IT.

They're still under the auspices of what’s condoned by IT, and they make these mashups happen, so that it’s users close to the issues, close to where the creativity can begin, who create a requirement and then bind these two worlds together.

Smith: That’s correct, and what is interesting about it is, if you think about what I just described -- where we mashed in some data with AccuWeather -- if that had been an old SOA project of nine or 18 months, that would have been a significant investment for us, and would have been hard to justify.

Now, if that takes a couple of weeks and hours to do -- even if it fails or doesn’t hit the right spot -- it was a great tool for learning what the other requirements were, and other things that we try as a business.

That’s what a lot of this Web 2.0 and mashups are about -- new avenues for communication, where you can be engaged and you can look at information and how you can put things together. And it has the right costs associated with it -- inexpensive.

If I were going to sum up a lot of Web 2.0 and mashups, the magnitude of drop in “customization cost” is phenomenal.

Gardner: And that spells high return on value, right?

Smith: That’s right.

Gardner: How do you see this panning out in the future? Let’s look in our crystal ball. How do you see this ability of taking intelligence, as you call it, around the content, and then the line-of-business people coming in and making decisions about requirements, and how they want to tune or see the freshness of the content?

What’s going to happen in two or three years, now that we are bringing these things together?

Andreasen: There will be a lot more of what Rod just described. What Rod just mentioned is an early move, and a lot of companies aren't even thinking along these lines yet. Over the next one or two years, more people will realize the opportunity and the possibility here, and start doing it more. Eventually, it’s going to explode.

People will realize that getting the right data and the right content at the right time, and using that to create more intelligence is one thing. The other thing they’ll realize is that by networking with peers and colleagues, they'll get ideas and references to new data. All of these aspects -- the social aspects, the data aspect and the mashup aspect -- will be much more realized. I think it’s going to explode in usage.

Gardner: Any last thoughts, Rod, from where you see these things going?

Smith: Well, as we see in other technologies moving through from an SOA perspective, this is a great deal about cultural change within companies, and the technology barriers are coming down dramatically.

You don’t have to be a Java expert or a C# expert. I'm scary enough to be able to put together or find solutions for my own needs. It’s creating a way that line-of-business people are empowered and they can see business results quickly.

That also helps IT, because if the line of business is happy, then IT can justify the necessary middleware. That’s a fundamental shift. It's no longer an IT world, where they can promise a solution to the line of business 12 to 18 months down the road.

It’s much more of, "Show me something quickly. When I’ve got the results in my hand -- the dashboard -- then you can explain what I need to do for IT investments and other things."

It’s more collaboration at that point, and makes a lot of sense on governance, security, and other things. I can see the value of my app, and I can actually start using that to bring value to my company.

Gardner: I suppose another important aspect culturally is that part of SOA’s value is around reuse. These mashups and using this content in association with other different activities, in a sense promotes the notion of reuse.

You're thinking about, "How can I reuse this mashup? How can I extend this content, either off the Web or internally, into new activities?" That, in a sense, greases the skids toward more SOA, which I think is probably where IT is going to be heading anyway.

Smith: Well, what’s fun about this, and I think Stefan will agree, is that when I go to a customer, I don’t take PowerPoint charts anymore. I look on their website and I see if they have some syndication feeds or some REST interfaces or something.

Then I look around and I see if I can create a mashup of their material with other material that it hadn’t been combined with before. That’s compelling.

People look and they start to get excited because, as you just said, they see business patterns in that. "If you could do that, could you grab this other information from so-and-so?"

It’s almost like a jam session at that point, where people come up with ideas. That’s where we will see more of these examples. Actually, a lot of our stuff is on YouTube, where we had a retail store that wanted to see their stores on Google Maps and then see the weather, because weather is an important factor in terms of their businesses.

In fact, it’s one of the most important factors. What we didn’t realize is that very simple pattern -- from a technology standpoint it didn’t take much -- held up over and over again. If it wasn’t a store, it was a banking location. If it wasn’t banking locations, it was ships. There were combinations in here that you could talk to your businessperson about.

Then you could say to the technologist or a developer, "What do I have to do to help them achieve that?" They don’t have to learn XML, Web objects, or anything else, because you have these SOA interfaces. It helps IT expand that whole nature of SOA into their enterprise.

Andreasen: One thing that's going to happen is that line-of-business people are getting a lot of great ideas. If I am working with business problems, I constantly get ideas about how to solve things. Usually, I just brush it away and say, "Well, it would be cool to have this, but it’s impossible."

They just don’t understand that the time from idea to implementation is dramatically going to go down. When they start realizing this, there is hidden potential out on the edge of the business that will now be cut loose and create a lot of value. It’s going to be extremely interesting to see.

Smith: One of the insights we have from customers is that mashups and this type of technology help them to visualize their SOA investments. You can’t see middleware. Your IT shop tells you what’s good and tells you that you get flexibility, but customers want to be shown results -- and mashups help do that.

The second part is people say it completes the "last mile" for SOA. It starts to make a lot of sense for your IT shop to be able to show how the middleware can be used in ways it wasn’t necessarily planned for.

The big comment we hear is, "I want my content to be mashable or re-mixable." We figured out that it’s very much a SOA value. They want things to be used in ways they weren't planned for originally. Show me that aggressive new business opportunity, and you make me a very happy person.

Andreasen: Probably one thing we will see in companies is some resistance from the technologists, from central IT, because they are afraid they will lose control. They are afraid of the security issues etc., but it will probably be like what we've seen with company wikis.

They're coming in the back door in line of business and eventually the companies buy the company-wide wiki. I think we'll see the same thing with mashups. It will be starting out in line of business, and eventually the whole company understands, "Well, we have to have infrastructure that solves this problem in a controlled way."

Some companies have very strict policy today. They don’t even allow their line-of-business pros to write macros in Excel. Those companies are probably the ones that will be the last ones discovering the huge potential in mashups.

I really hope they also start opening their eyes that there are other roles for IT, rather than just the big, central systems that run your business.

Gardner: Well, great -- thanks very much for your insights. This has really helped me understand better how these things relate and really what the payoff is. It sounds compelling from the examples that you provided.

To help us understand how enterprises are using Web applications, mashups, and lightweight data presentation, we’ve been chatting today with Rod Smith, Vice President of Internet Emerging Technologies at IBM. I really appreciate your time, Rod.

Smith: Thank you.

Gardner: And Stefan Andreasen, the Founder and CTO of Kapow Technologies. Thanks for joining, Stefan.

Andreasen: It’s been a pleasure, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions, and you've been listening to BriefingsDirect. Thanks for listening and come back next time.

Listen to the podcast here. Sponsor: Kapow Technologies.

Transcript of BriefingsDirect podcast on data mashups with IBM and Kapow. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.