Sponsor: Kapow Technologies.
Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.
Today we present a sponsored podcast discussion on how text-based content and information from across web properties and activities are growing in importance to businesses. The need to analyze web-based text in real-time is rising to where structured data was in importance just several years ago.
Indeed, for businesses looking to do even more commerce and community building across the Web, text access and analytics form a new mother lode of valuable insights to mine.
In Part 1 of our series on web data services with Kapow Technologies, we discussed how external data has grown in both volume and importance across the Internet, social networks, portals, and applications.
As the recession forces the need to identify and evaluate new revenue sources, businesses need to capture such web data services for their business intelligence (BI) to work better, deeper, and faster.
In Part 2, we dug even deeper into how to make the most of web data services for BI, along with the need to share those web data services inferences quickly and easily.
Now, in this podcast, Part 3 of the series, we discuss how an ecology of providers and a variety of content and data types come together in several use-case scenarios. We look specifically at how near real-time text analytics fills out a framework of web data services that can form a whole greater than the sum of the parts, and this brings about a whole new generation of BI benefits and payoffs.
Here to help explain the benefits of text analytics and their context in web data services, is Seth Grimes, principal consultant at Alta Plana Corp. Thanks for joining, Seth.
Seth Grimes: Thank you, Dana.
Gardner: We're also joined by Stefan Andreasen, co-founder and chief technology officer at Kapow Technologies. Welcome, Stefan.
Stefan Andreasen: Thank you, Dana.
Gardner: We have heard about text analytics for some time, but for many people it's been a bit complex, unwieldy, and difficult to manage in terms of volume and getting to this level of a "noise-free" text-based analytic form. Something is emerging that you can actually work with, and has now become quite important.
Let's go to you first, Seth. Tell us about this concept of noise free. What do we need to do to make text that's coming across the Web in sort of a fire hose something we can actually work with?
Difficult concept
Grimes: Dana, noise free is an interesting concept and a difficult concept, when you're dealing with text, because text is just a form of human communication. Whether it's written materials or spoken materials that have been transcribed into text, human communications are incredibly chaotic.
We have all kinds of irregularities in the way that we speak -- grammar, spelling, syntax. Putting aside any kind of irregularities, we have slang, sarcasm, abbreviations, and misspellings. Human communications are chaotic and they are full of "noise." So really getting to something that's noise-free is very ambitious.
I'm going to tell you straightforwardly, it's not possible with text analytics, if you are dealing with anything resembling the normal kinds of communications that you have with people. That's not to say that you can't aspire to a very high level of accuracy in getting the most out of the textual information that's available to you in your enterprise.
It's become an imperative to try to deal with the great volume of text -- the fire hose, as you said -- of information that's coming out. And, it's coming out in many, many different languages, not just in English, but in other languages. It's coming out 24 hours a day, 7 days a week -- not only when your business analysts are working during your business day. People are posting stuff on the web at all hours. They are sending email at all hours.
Then, the volume of information that's coming out is huge. There are hundreds of millions of people worldwide who are on the Internet, using email, and so on. There are probably even more people who are using cell phones, text messaging, and other forms of communication.
If you want to keep up, if you want to do what business analysts have been referring to as a 360-degree analysis of information, you've got to have automated technologies to do it. You simply can't cope with the flood of information without them.
That's an experience that we went through in the last decades with transactional information from businesses. In order to apply BI or to get BI out of them, you have to apply automated methods with specialized software.
Fortunately, the software is now up to the job in the text analytics world. It's up to the job of making sense of the huge flood of information from all kinds of diverse sources, high volume, 24 hours a day. We're in a good place nowadays to try to make something of it with these technologies.
Gardner: Of course, we're seeing the mainstream media start behaving more like bloggers and social media producers. We're starting to see that when events happen around the world, the first real credible information about them isn't necessarily from news organizations, but from witnesses. They might be texting. They might be using Twitter. It seems that if you want to get real-time information about what's going on, you need to be able to access those sorts of channels.
Text analytics
Grimes: That's a great point, Dana, and it helps introduce the idea of the many different use-cases for text analytics. This is not only on the Web, but within the enterprise as well, and crossing the boundary between the Web and the inside of the enterprise.
Those use-cases can be the early warning of a Swine flu epidemic or other medical issues. You can be sure that there is text analytics going on with Twitter and other instant messaging streams and forums to try to detect what's going on.
You even have Google applying this kind of technology to look at the pattern of the searches that people are putting in. If people are searching on a particular medical issue centered in a particular geographic location, that's a good indicator that there's something unusual going on there.
It's not just medical cases. You also have brand and reputation management. If someone has started posting something very negative about your company or your products, then you want to detect that really quickly. You want early warning, so that you can react to it really quickly.
We have a great use case in the intelligence world. That's one of the earliest adopters of text analytics technology. The idea is that if you are going to do something to prevent a terrorist attack, you need to detect and respond to the signals that are out there, that something is pending really quickly, and you have to have a high degree of certainty that you're looking at the right thing and that you're going to react appropriately.
We have some great challenges out there, but, as I said, we have some great technologies to respond to those challenges in a whole variety of business, government, and other applications.
Gardner: Stefan, I think there are very few people who argue with the fact that there is great information out there on the Web, across these different new channels that have become so prominent, but making that something that you can use is a far different proposition. Seth has been telling us about automated tools. Tell us what you see in terms of web data services and how we can make this information available to automated systems.
Deep data
Andreasen: Thank you, Dana. Let's just look at something like Google. You go there and do a search, and you think that you're searching the entire Internet. But, you're not, because you're probably not going to access data that's hidden behind logins, behind search forms, and so on.
There is a huge amount of what I call "deep web," very valuable information that you have to get to in some other way. That's where we come in and allow you to build robots that can go to the deep web and extract information.
I'd also like to talk a little bit more about the noise-free thing and go to the Google example. Let's say you go to Google and you search for "IBM software." You think that you will be getting an article that has something to do with IBM software.
You often actually find an article that has nothing to do with IBM software, but, because there are some advertisements from IBM, IBM was a hit. There is some other place that links to software, and you will find software. Basically, you end up with something completely irrelevant.
Eliminating noise is getting rid of all this stuff around the article that is really irrelevant, so you get better results.
The other thing around noise-free is the structure. It would be great if you could say, "I want to search an article about IBM software which was dated after Oct. 7," or whatever, but that means you also need to have that additional structured information in it.
The key here is to get noise-free data and to get full data. It's not only to go to the deep web, but also get access to the data in a noise-free way, and in at least a semi-structured way, so that you can do better text analysis, because text analysis is extremely dependent on the quality of data.
Grimes: I have to agree with you there, Stefan. It's very important to have tools that can strip away not only the ads, but understand where the content is within a page and what's the navigation on that page.
We might not be interested in navigation elements, the fluff that's on a page. We want to focus on the content. In addition, nowadays on the Web, there's a big problem of duplication of material that's been hosted on multiple sites. If you're dealing with email or forums, then people typically quote previous items in their replies, and you want to detect and strip that kind of stuff away and focus on the real relevant content. That is definitely part of the noise-free equation, getting to the authentic content.
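As a rough illustration of the two clean-up steps Grimes mentions here, stripping quoted material from replies and detecting duplicated postings, the following is a minimal sketch only. It assumes quoted lines begin with ">" (a common but not universal convention) and treats two texts as duplicates only when they match after case and whitespace normalization. The function names are hypothetical and do not come from any product discussed in this podcast.

```python
import hashlib


def strip_quoted_lines(message):
    """Drop lines starting with '>' (assumed quoting convention) and trim the result."""
    kept = [line for line in message.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()


def fingerprint(text):
    """Hash of the case- and whitespace-normalized text, used to spot exact duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(texts):
    """Keep the first occurrence of each distinct fingerprint, preserving order."""
    seen = set()
    unique = []
    for text in texts:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


reply = "> The update broke import.\nSame problem here after the update."
clean = strip_quoted_lines(reply)

posts = ["Tread came off my tire.", "tread came  off my tire.", "Great product."]
unique_posts = deduplicate(posts)
```

Near-duplicates with small wording changes would slip past this exact-match fingerprint; catching those would need a fuzzier similarity measure.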
Gardner: Stefan, you refer to the deep web. I imagine this also has a role, when it comes to organizations trying to uncover information inside of their firewalls, perhaps among their many employees and all the different tools that they're using. We used to call it the intranet, but is there an intranet effect here for this ability to gather noise-free text information that we can then start processing?
Extended intranet
Andreasen: Absolutely. I'd even say the extended intranet. If we're looking at a web browser, which is the way that most business analysts or other persons today are accessing business applications, we're accessing three different kinds of applications.
One involves applications inside the firewall. It could be the corporate intranet, etc. Then there are applications where you have to use a login, and this can be your partners. You're logging in to your supplier to see if some item is in stock. Or, it can be some federal reporting site or something.
The sites behind the login are like the extended enterprise. Then, of course, there is everything out on the World Wide Web -- more than 150 million web pages out there -- which have all kinds of data, and a lot of that is behind search forms, and so on.
Gardner: Seth, as a consultant and analyst, you've been focused on text analytics for some time, but perhaps a number of our listeners aren't that familiar with it. Could you maybe give us a brief primer on what it is that happens when you identify some information -- be it Internet, extended web, deep web? How do you go through some basic steps to analyze, cleanse, and then put data into a form that you can then start working with?
Grimes: Dana, I'm going to first give you an extremely short history lesson, a little factoid for you. Text analytics actually predates BI. The basic approaches to analyzing textual sources were defined in the late '50s. Actually, there is a 1958 paper from an IBM researcher that defines BI as the analysis of textual sources.
What happened is that enterprises computerized their operations, their accounting, their sales, all of that in the 1960s. That numerical data from transactional systems is readily analyzable, where text is much more difficult to analyze. But, now we have come to the point, as I said earlier, where there is software and great methods for analyzing text.
What do they do? The front-end of any text analysis system is going to be information retrieval. Information retrieval is a fancy, academic type of term, meaning essentially the same thing as search. We want to take a subset of all of the information that's out there in the so-called digital universe and bring in only what's relevant to our business problems at hand. Having the infrastructure in place to do that is a very important aspect here.
Once we have that information in hand, we want to analyze it. We want to do what's called information extraction, entity extraction. We want to identify the names of people, geographical location, companies, products, and so on. We want to look for pattern-based entities like dates, telephone numbers, addresses. And, we want to be able to extract that information from the textual sources.
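To make the idea of pattern-based entities concrete, here is a minimal sketch using regular expressions. The two patterns are deliberately simple assumptions (ISO-style dates and one North American phone format), not the rules any real text analytics product uses, and the names `PATTERNS` and `extract_entities` are hypothetical.

```python
import re

# Illustrative patterns only; production systems use far more robust rules.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),    # e.g. 2009-10-07
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),   # e.g. 555-867-5309
}


def extract_entities(text):
    """Return a dict mapping each entity type to the list of matches found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Call 555-867-5309 before 2009-10-07 to report the problem."
entities = extract_entities(sample)
```

Names of people, companies, and products generally cannot be captured by fixed patterns like these, which is where the statistical and linguistic methods come in.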
In order to do that, people usually apply a combination of statistical and linguistic methods. They look for language patterns in the text. They look for statistics like the co-occurrence of words in multiple texts. When two words appear next to each other or close to each other in many different documents -- that can be web pages or other documents -- that indicates the degree of relationship. People apply so-called machine-learning technologies in order to improve the accuracy of what they are doing.
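The co-occurrence statistic can be sketched in a few lines. This simplified version counts how many documents contain both words of a pair, rather than measuring how close the words sit within each document, and it splits on whitespace only. The function name and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for doc in documents:
        # Sorting the distinct words gives each pair one canonical ordering.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


docs = [
    "tire tread failure reported",
    "tread failure on rear tire",
    "tax software update",
]
pairs = cooccurrence_counts(docs)
```

Pairs that keep turning up together across many documents, such as "tire" and "tread" here, are the ones a fuller system would treat as related.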
Suitable technologies
All of this sounds very scientific and perhaps abstruse -- and it is. But, the good message here is one that I have said already. There are now very good technologies that are suitable for use by business analysts, by people who aren't wearing those white lab coats and all of that kind of stuff. The technologies that are available now focus on usability by people who have business problems to solve and who are not going to spend the time learning the complexities of the algorithms that underlie them.
So, we're at the point now where you can even treat some of these technologies as black boxes. They just work. They produce the results that you need in the form that you need them. That can be in a form that extracts the information into databases, where you can do the same kind of BI that you have been used to for the last 20 years or so with BI tools.
It can be visualizations that allow you to see the interrelationships among the people, the companies, and the products that are identified in the text. If you're working in law enforcement or intelligence, that could be interrelationships among individuals, organizations, and incidents of various types. We have visualization technologies and BI technologies that work on top of this.
Then, we have one other really nice thing that's coming on the horizon, which is semantic web technology -- the ability to use text analytics to support building a web of data that can be queried and navigated by automated software tools. That makes it even easier for individuals to carry out everyday business and personal problems for that matter.
Gardner: I'd like to dig into some use-cases and understand a little bit better how this is being used productively in the field. Before we do that, Stefan, maybe you could explain from Kapow Technologies' perspective, how you relate to this text analytics field that Seth so nicely just described. Where does Kapow begin and end, and how do you play perhaps within an ecosystem of providers that help with text analytics?
Andreasen: Text analytics, exactly as Seth was saying, is really a form of BI. In BI, you are examining some data and drawing some conclusions, maybe even making some automated actions on it.
Obviously, any BI or any text analysis is no better than the data source behind it. There are four extremely important parameters for the data sources. One is that you have the right data sources.
There are so many examples of people making these kinds of BI applications, text analytics applications, while settling for second-tier data sources, because they are the only ones they have. This is one area where Kapow Technologies comes in. We help you get exactly the right data sources you want.
The other thing that's very important is that you have a full picture of the data. So, if you have data sources that are relevant from all kinds of verticals, all kinds of media, and so on, you really have to be sure you have a full coverage of data sources. Getting a full coverage of data sources is another thing that we help with.
Noise-free data
We already talked about the importance of noise-free data to ensure that when you extract data from your data source, you get rid of the advertisements and you try to get the major information in there, because it's very valuable in your text analysis.
Of course, the last thing is the timeliness of the data. We all know that people who do stock research get real-time quotes. They get them for a reason, because the newer the quotes are, the more surely they can look into the crystal ball and make predictions about the future in a few seconds.
The world is really changing around us. Companies need to look into the crystal ball in the nearer and nearer future. If you are predicting what happens in two years, that doesn't really matter. You need to know what's happening tomorrow. So, the timeliness of the data is important.
Let me get to the approach that we're taking. Business analysts work with business applications through their web browser. They actually often cut and paste data out of business applications into some spreadsheet.
You can see our product as a web browser, where you can teach it how to interact with the website, how to only extract the data that's relevant, and how you can structure that data, and then repeat it. Our product can give you automated, real-time, and noise-free access to any data you see in a web browser.
How does that apply to text analytics? Well, it gives you the 100-percent covered, real-time data source, with all of those values that I just explained.
Gardner: I really was intrigued by this notion of the crystal ball, and not two years from now, but tomorrow. It seems to me that so many people are putting up so much information about their lives, their preferences. People in business are doing the same around their occupation. We have this virtual focus group going on around us all the time. If we could just suck out the right information based on our products, we could get that crystal ball polished up.
Let me go back to you, Stefan. Can you give us an example of where a market research, customer satisfaction, or virtual focus group benefit is being derived from these text analytics capabilities?
Knowing the customer
Andreasen: Absolutely. For any company selling services or products, the most important thing for them to know is what the customers think about their product. Are we giving our customers the right customer service? Are we packaging our products the right way? How do we understand the customer's buying behavior, the customer communications, and so on?
Intuit is a customer we have together with a text analysis company called Clarabridge. They use a text analysis solution to understand the TurboTax customers.
Before they had a text analysis system, they had some people that did one percent coverage sampling of forums on the web, their own customer support system, and emails into their contact center to get some rudimentary overview of what the customer thought.
We went in, and with Kapow Technologies they can now get to all these data sources -- forums online, their own customer support center, and wherever there are networks of TurboTax users -- and extract all the information in near real-time. Then, they use the text-analysis engine to make much, much better predictions of what the customers think, and they actually have their finger on the pulse.
If a set of customers suddenly talk about a feature that doesn't work, or that is much better in the competitor's product -- and thereby looking into the near future of the crystal ball -- they can react early and try to deal with this in the best possible way.
Gardner: Seth Grimes, is this an area where you have seen a lot of the text analytics work focused on these sort of virtual focus groups?
Grimes: Definitely. That's an interesting concept. The idea behind a focus group is that it's a traditional qualitative research tool for market research firms. They get a bunch of people into a room and they have the facilitator lead those people through a conversation to talk about brand names, marketing, positioning, and then get their reactions to it.
With the web, you don't have to get those people together, because they come together on their own and participate in social media forums of various types. There are a whole slew of them. Together they constitute a virtual focus group, as you say.
The important point here is to get at the so-called voice of the customer. In other words, what is the customer saying in his own voice, not in some form where you're forcing that person to tick off number one, two, three, four, or five, in order to rate your product. They can bring up the issues that are of interest to them, whether they are good or bad issues, and they can speak about those issues however they naturally do. That's very important.
I've actually been privileged to share a stage with the analytics manager from Intuit, Chris Jones, a number of times to talk about what he is doing, the technologies, and so on. It's really interesting stuff that amplifies what Stefan had to say.
Broad picture
The idea is that you can use these technologies, both to get a broad picture of the issues, and no longer have to bend those issues into categories that your business analysts have predefined. Now, you can generate the topics of most interest, using automated, statistical methods from what the people are actually saying. In other words, you let them have their own voice.
You also get the effect of not only looking at the aggregate picture, at the mass of the market, but also at the individual cases. If someone posts about a problem with one of the products to an online forum, you can detect that there's an issue there.
You can make sure that the issue gets to the right person, and the company can personally address each issue in order to really keep it from escalating and getting a lot of attention that you really don't want it to get. You get the reputation of being a very responsive company. That's a very important thing.
The goal here is not necessarily to make more money. The goal is to boost your customer satisfaction rating, Net Promoter score, or however you choose to measure it. These technologies, the text technologies, are a very important package and part of the overall package of responding to customer issues and boosting customer satisfaction.
While you're doing it, those people are going to buy more. They're going to reduce your support costs, all of that kind of stuff, and you are going to make more money. So, by doing the right thing, you're also doing something good for your own company.
Gardner: In business, you want to reduce the guesswork to do better by your customers. Stefan, as I understand it, Kapow Technologies has been quite successful in working with a variety of military, government, and intelligence agencies around the world on getting this real-time information as to what's going on, but perhaps with the stakes being a bit higher, things like terrorism, and even insurrections and uprising.
Tell us a little bit about a second use case scenario, where text analytics are being used by government agencies and intelligence agencies.
Andreasen: As Seth said, the voice of the customer is a very interesting and very valuable use case with text analysis. I'll add one thing to what Seth said. He was talking about product input, and of course, we all know that developing products -- maybe not so much a product like TurboTax, but developing a car -- is extremely expensive. So, understanding what kind of product your customers want in the future is an important part of the voice of the customer.
With a lot of the customers in the military intelligence, it's similar. Of course, they would like to know what people are writing from a sentiment point of view, an opinion point of view, but another thing that's actually even more important in the intelligence community is what I will call relationships.
Seth mentioned relationships earlier, and also understanding the real influencers and who are the ones that have the most connections in these relationships. Let's say somebody writes an article about how you mix some chemicals together to make an efficient bomb. What you really want to know is who this person knows in all kinds of social networks on the 'Net, and to try to make a network of who are the real influencers and who are the network centers.
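One very simple way to approximate "who has the most connections" is to count each person's distinct links in a graph of observed relationships. This is a sketch under stated assumptions: the edge list is invented sample data, the function name is hypothetical, and counting direct links is only the most basic measure. It is not a description of how Kapow or any partner product performs relationship analysis.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to the number of distinct nodes it is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(linked) for node, linked in neighbors.items()}


# Hypothetical links, e.g. "replied to" or "linked to" relationships between accounts.
edges = [("ana", "ben"), ("ana", "cho"), ("ana", "dev"), ("ben", "cho")]
degrees = degree_centrality(edges)
most_connected = max(degrees, key=degrees.get)
```

A fuller analysis would also weigh how often two parties interact and how central a person's contacts are in turn.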
Finding relationships
We see a lot of uses of our product, going out to blogs, forums, etc., in all kinds of languages, translating it often into English, and doing this relationship analysis. A very popular product for that, which is a partner of ours, is Palantir Technologies. It has a very cool interactive way of finding relationships. I think this is also very relevant for normal enterprises.
Yesterday I met with one of the big record companies, which is also a customer of ours. As soon as I explained this relationship stuff, they said, "We can really use this for anti-piracy, because it is really just very few people who do the major work when it gets to getting copies of new films out in the 'Net." So, understanding this relationship can be very relevant for this kind of scenario as well.
Grimes: Dana, when you introduced our podcast today, you used the term ecology or ecosystem, and that's a real great concept that we can apply here in a number of dimensions. We do have an ecosystem in at least two dimensions.
Stefan mentioned one of the Kapow partners, Palantir. We earlier mentioned the text analytics partner, Clarabridge. We have the ability now through integration technologies like Kapow to bring together different information sources, very disparate, different information sources with different characteristics, to provide an ecosystem of information that can be analyzed and brought to bear to solve particular business or government problems.
We have a set of software technologies that can similarly be integrated into an ecosystem to help you solve those problems. That might be text analysis technologies. It might be traditional BI or data warehousing technologies. It might be visualization technologies, whatever it takes to handle your particular business problem.
As we've been discussing, we do see applications in a whole variety of business and government issues, whether it's customer or intelligence or many other things that we haven’t even discussed today. So, I find that ecosystem concept to be very useful here in framing the discussions about how the text technologies fit into something that's a much larger picture.
Gardner: So, we are looking at the ecologies. We are looking at some of these use-cases. It seems to me that we also want to be able to gather information from a variety of different players, perhaps in some sort of a supply chain, ecosystem, business process, channel partners, or value added partners. The ecology and ecosystem concept works not only in terms of what we do with this information, but how we can apply that information back out to activities that are multi-player, beyond the borders or boundaries of any one organization.
I'm thinking about product recall, health, and public-health types of issues. Seth, have you worked with any clients or do you have any insights into how text analytics is benefiting an extended supply chain of some sort, and how the ecosystem of insight into the text analytics solves some unique problems there?
Product recall
Grimes: Product recall is an interesting one. Let me give you an example there. This is, like most examples that we are going to discuss, a multifaceted one.
People are all familiar with the problems with Firestone tires back a number of years ago, early in this decade, where the tread was coming off tires. Well, there are a number of parties that are going to be interested in this problem.
I am sorry, but put aside the consumers who are obviously affected by it, very badly affected by it. But, we have the manufacturers, not only of the tires, but also of the vehicles, the Ford Explorer in this case.
We have the regulatory bodies in the government, parts of the U.S. Department of Transportation. We have the insurance industry. All of these are stakeholders who have an interest in early detection, early addressing, and early correction of the problem.
You don't want to wait until there are just so many cases here that it's just obvious to everyone, the issues really spill out into the press, and there are questions of negligence, and so on. So, how can you address something like a problem with tires where the tread is coming off?
Well, one way is warranty claims. For example, someone might file a claim through the vehicle manufacturer, Ford in this case, or through the tire manufacturer, claiming a defective product. Sometimes, just an individual tire is defective, but sometimes that's an indication of manufacturing or design issues. So you have warranty claims.
You also have accident reports that are filed by police departments or other government agencies and find their way into databases in the Department of Transportation and other places. Then, you have news reports about particular incidents.
There are multiple sources of information. There are multiple stakeholders here. And, there are multiple ways of getting at this. But, like so many problems, you're going to get at the issue much faster, if you combine information from all of these different sources, rather than relying on a single source.
Again, that's why building up an ecosystem of different data sources that come to bear on your problem is really important, and that's just a typical use case. I know of other organizations, manufacturing organizations, that are using this technology in conjunction with data-mining technologies for warranty claims, for example. Consumer appliances is another area that I have heard a lot about, but really there is no limitation on where you can apply this.
Gardner: Stefan, from your perspective, for these extended supply chains, public health issues, etc., again we get down to this critical time element -- for example, the swine flu outbreak last spring. If folks could identify through text analytics where this was starting to crop up, they wouldn't have to wait for the hospital reports necessarily. Is that an instance where some of these technologies can really play an important role?
Big pitfall
Andreasen: Absolutely. Before I get into some more real examples, I want to emphasize some of the things that Seth was saying. He's talking about getting to multiple data sources. I cannot stress enough that what I have seen out there as one of the biggest pitfalls when people are making a text analysis solution or actually any BI solution is that they look at what data sources they have and they settle for that.
They should have said, "What are the optimal data sources to get the best prediction and get the best outcome out of this text analysis?" They should settle for no less than that.
The example here will actually explain that. I also have a tire example. We actually have two different kinds of customers using our products looking at tires, tire explosions, and tire recalls.
One is a tire company itself. They go to automotive forums and monitor whether people are doing exactly what Seth is saying, filing claims or writing on an automotive blog: "I got this tire, and it exploded." "It's just really bad." "Don't buy it." All those kinds of information from different sources.
If you get enough data sources and you get that data in real time, you can actually go in and contain the situation of a potential tire recall before it happens, which of course could be very valuable for your company.
The other use case is stock research. We have a lot of customers doing financial and market research with our technology. One of them is using our product, for example, to go out and check the same forums, but their objective is to predict whether there will be a tire recall. Then, they can predict that the stock is going to crash when that happens, and project that beforehand.
Many different players here can use the same kind of information for different purposes, and that makes this really interesting as well.
Gardner: Well, it really seems the age-old part of this is that getting information first has many, many advantages, but the new element is that more and more information is in the form of analytics out in the web.
I wonder if we could cap this discussion -- we are about out of time -- by looking at the future. Seth, you mentioned earlier the semantic web. How automated can this get, and what needs to take place in order for that vision of a semantic web to take place?
Grimes: Well, the semantic web right now is a dream. It's a dream that was first articulated over a decade ago by Tim Berners-Lee, the person who created the World Wide Web, but it is one that is on the fast track to being realized. Being realized in this case means creating meaning.
That's what Stefan was referring to earlier when he talked about the date of a published article, the title, and perhaps other metadata fields such as the author: creating information that describes what's out there on the web and in databases.
Machine processable
Rendering that information into a form that's machine processable, not only in the sense of analysis, but also in the sense of making interconnections among different pieces of information, is what the semantic web is really about. It's about structuring information that's out there on the Web. That can include what Stefan referred to as the deep web, and creating tools that allow people to search and issue other types of queries against that web data.
It's something that people are working hard on now, but I don't think it will really be realized in terms of any broad, business-usable applications for a fair number of years. Not next year or the year after, but maybe three to five years out, we will really start to see a very broadly useful business application. There are going to be niche applications in the near term, but later something much broader.
It's a direction that really hits on the themes that we have been talking about today, integrating applications and data from multiple sources and of multiple types in order to create a whole that is much greater than each of the parts.
We need software technologies that can do that nowadays, and fortunately we have them, as we have been discussing. We need a path that will evolve us towards something that really creates much greater value for much larger massive applications in the future, and fortunately the technologies that we have now are evolving in that direction.
Gardner: Very good. I think we have to leave it there. I want to thank both of our guests. We have been discussing the role of text analytics and how companies can take advantage of that and bring that into play with their BI and marketing and other activities, and how the mining of this information is now being done by tools and is increasingly being automated.
I want to thank Seth Grimes, principal consultant at Alta Plana Corp., for joining us. Thanks so much, Seth.
Grimes: Again, thank you Dana, and thanks to Kapow for making this possible.
Gardner: Also, Stefan Andreasen, co-founder and CTO at Kapow Technologies. Thanks again for sponsoring and joining us, Stefan.
Andreasen: Well, thank you. That was a great discussion. Thank you.
Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. This is Part Three of a series from Kapow Technologies on using BI and web data services in unique forms to increase business benefits.
You have been listening to a sponsored BriefingsDirect podcast. Thanks and come back next time.
Transcript of a sponsored BriefingsDirect podcast on information management for business intelligence, one of a series on web data services with Kapow Technologies. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.
 
