
Tuesday, November 03, 2015

Big Data Generates New Insights into What’s Happening in the World's Tropical Ecosystems

Transcript of a discussion on how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval and analysis.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big-data case study discussion explores how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis.

We'll learn how quantitative analysis and modeling are generating new insights into what’s happening in tropical ecosystems worldwide, and we'll hear how such insights are leading to better ways to attain and verify sustainable development and preservation methods and techniques.

To learn more about how data science, and hosting that data science in the cloud, helps the study of biodiversity, we're pleased to welcome our guests. First, we have Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International in Arlington, Virginia. Welcome, Eric.

Eric Fegraus: Hi, Dana. It’s great to be here. Thank you.
Gardner: We're glad to have you. We're also here with Jorge Ahumada, Executive Director of the TEAM Network, also at Conservation International. Welcome, Jorge.

Jorge Ahumada: Great to be here.

Gardner: Let’s start with the trends. Clearly, knowing what’s going on in environments in the tropics helps us understand what to do and what not to do. How has that changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have made this data gathering more important than ever?

Fegraus: Over this last year, we’ve been able to roll out our analytic systems across the TEAM Network. We're seeing more and more uptake among protected-area managers using the system, and we have some good examples of the results being put to use.

For example, in Uganda, we noticed that a particular cat species was trending downward. The folks there were really curious why this was happening. At first, they had been excited to find this cat species, which was not previously known to be there.

This particular forest is a gorilla reserve, and one of the main economic drivers around the reserve is ecotourism: people paying to go see the gorillas. Once they saw that the cats were declining, they started asking what could be causing it. Our system told them that the way they were bringing ecotourists in to see the gorillas had shifted, and that this was potentially affecting where the cats were. That allowed them to rethink and adjust how they bring tourists in to see the gorillas.

Information at work

Gardner: Information at work.

Fegraus: Information at work at the protected-area level.

Gardner: Just to be clear for our audience, TEAM stands for Tropical Ecology Assessment and Monitoring. Jorge, tell us a little bit about how the TEAM Network came about and what it encompasses worldwide.

Ahumada: The TEAM Network started about 12 years ago as a program to fill a void in the information we have from tropical forests. Tropical forests cover a little less than 10 percent of the world's land area, but they contain more than 50 percent of the biodiversity.

So, from that point of view, they're critical places to conserve, yet we didn’t have any information about what was happening in them. That’s how the TEAM Network was born. The model was to use standardized data-collection methods, replicated across a number of sites, and to have systems that would store and analyze that data and make it useful. That was the main motivation.

Gardner: Of course, it’s super-important to be able to collect and retrieve that data and put it into a place where it can be analyzed. It’s also important to be able to share that analysis. Eric, tell us what's been happening lately that has allowed all of those parts of the data lifecycle to come to fruition.

Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from the field, from the camera traps, from the climate stations, and bring it into our central repository. We then push the data into Vertica, which is used for the analytics. Then, we developed a really nice front-end dashboard that shows the results of species populations in all the protected areas where we work.

The analytical process also starts to identify what could be impacting the trends we're seeing for each species. The dashboard also lets users look at the data in many ways: they can aggregate it, and slice and dice it, to examine different trends.
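The aggregate-and-slice step can be pictured as a group-by over detection records. The sketch below is a hypothetical Python illustration, not the TEAM system's code: in the setup Eric describes, this work runs in Vertica, where it would be a SQL GROUP BY. The record fields (`site`, `species`, `year`, `detections`) and all values are invented placeholders.

```python
from collections import defaultdict

# Hypothetical detection records, as they might look once camera-trap data
# reaches a central repository. Field names and values are placeholders,
# not TEAM data.
RECORDS = [
    {"site": "site_1", "species": "species_a", "year": 2014, "detections": 12},
    {"site": "site_1", "species": "species_a", "year": 2015, "detections": 9},
    {"site": "site_2", "species": "species_a", "year": 2015, "detections": 4},
    {"site": "site_2", "species": "species_b", "year": 2015, "detections": 7},
]


def aggregate(records, keys, value="detections"):
    """Sum `value` for every distinct combination of the fields named in `keys`.

    Choosing different `keys` is the "slice and dice": the same records can be
    rolled up by species, by site and year, and so on.
    """
    totals = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec[value]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(RECORDS, ["species"]))
    print(aggregate(RECORDS, ["site", "year"]))
```

Grouping by `["species"]` gives `{('species_a',): 25, ('species_b',): 7}`; grouping by `["site", "year"]` gives one total per site-year pair from the same records.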

Gardner: Jorge, what sort of technologies are they using for that slicing and dicing? Are you seeing certain tools, like Distributed R, visualization software, or business-intelligence (BI) packages? What's the common thread, or does it vary greatly?

Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics infrastructure has concentrated on the storage of big data, but not so much on the analytics. We break that mold because we're doing very, very sophisticated Bayesian analytics with this data.

One of the problems of working with camera-trap data is that you have to separate the detection process from the actual trend you're seeing, because the detection process itself has error: an animal can be present at a site without ever being photographed.

Hierarchical models

We do that with hierarchical models, and they're fairly complicated. Fitting that kind of model on a normal computer can take days or even months. With the power of Vertica and its processing, we’ve been able to shrink that to a few hours. We can run 500 or 600 species from 13 sites all over the world in five hours. So it’s a really good use of that processing power.
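Jorge doesn't spell out the model, so as an assumption, here is the simplest member of that family: a single-season site-occupancy model, which separates whether a site is occupied (psi) from the chance of photographing the species on a given visit (p). It is fit here by maximum likelihood in plain Python, not with the Bayesian hierarchical approach or the Vertica setup he describes; all function names and parameter values are hypothetical.

```python
import math
import random


def simulate_histories(n_sites, n_occasions, psi, p, seed=1):
    """Simulate detection histories: 1 if the species was photographed on a visit.

    Each site is occupied with probability psi; an occupied site yields a
    detection on each occasion with probability p. Empty sites never do.
    """
    rng = random.Random(seed)
    histories = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        histories.append(
            [1 if occupied and rng.random() < p else 0 for _ in range(n_occasions)]
        )
    return histories


def log_likelihood(histories, psi, p):
    """Single-season occupancy log-likelihood with constant psi and p."""
    ll = 0.0
    for h in histories:
        k, y = len(h), sum(h)
        if y > 0:
            # Detected at least once, so the site must be occupied.
            ll += math.log(psi) + y * math.log(p) + (k - y) * math.log(1 - p)
        else:
            # Never detected: occupied but missed every time, or truly empty.
            ll += math.log(psi * (1 - p) ** k + (1 - psi))
    return ll


def fit_occupancy(histories, grid_size=99):
    """Maximum likelihood by profiling: grid over p, closed-form psi given p.

    Returns (psi_hat, p_hat, naive), where naive is the share of sites with any
    detection, i.e. the estimate you get by ignoring detection error.
    """
    n = len(histories)
    k = len(histories[0])
    naive = sum(1 for h in histories if sum(h) > 0) / n
    best = None
    for i in range(1, grid_size + 1):
        p = i / (grid_size + 1)
        # For fixed p, the likelihood peaks at naive / P(detected at least once).
        psi = min(1.0, naive / (1 - (1 - p) ** k))
        ll = log_likelihood(histories, psi, p)
        if best is None or ll > best[0]:
            best = (ll, psi, p)
    return best[1], best[2], naive


if __name__ == "__main__":
    data = simulate_histories(n_sites=200, n_occasions=5, psi=0.6, p=0.3)
    psi_hat, p_hat, naive = fit_occupancy(data)
    print(f"naive {naive:.2f}  psi_hat {psi_hat:.2f}  p_hat {p_hat:.2f}")
```

Because the fitted psi divides the naive share by the probability of detecting an occupied site at least once, ignoring detection error can only understate occupancy. That bias is what the hierarchical approach is meant to remove.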

More recently, we've also been working with Distributed R, a new package written by HP folks at Vertica, to analyze satellite images, because we're also interested in what’s happening at these sites in terms of forest loss. Satellite images are really complicated, because you have millions of pixels and you don’t know in advance what each pixel is. Is it forest, agricultural land, or a house? So running that analysis in standard R is a problem.
Distributed R is a package that takes some of those functions, like random forests and regression trees, and takes full advantage of Vertica's processing power. We’ve seen a 10-fold increase in performance with that, and it allows us to get much more information out of those images.
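Per-pixel classification parallelizes naturally, because each pixel is labeled independently. The sketch below shows only that partition, classify, merge shape, and it rests on loud assumptions. It is not Distributed R's API. A simple NDVI threshold stands in for a trained random forest, and the 0.6 cutoff is a hypothetical value. Threads are used for portability; in CPython, CPU-bound work like this would need processes or a cluster to actually speed up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a trained model. The workflow described above uses
# random forests; here a pixel counts as forest when its NDVI exceeds a threshold.
NDVI_FOREST_THRESHOLD = 0.6


def classify_pixel(red, nir):
    """Label one pixel from its red and near-infrared reflectance."""
    total = nir + red
    ndvi = (nir - red) / total if total else 0.0
    return "forest" if ndvi > NDVI_FOREST_THRESHOLD else "other"


def classify_partition(pixels):
    """Classify one chunk of (red, nir) pixels and return per-class counts."""
    counts = {"forest": 0, "other": 0}
    for red, nir in pixels:
        counts[classify_pixel(red, nir)] += 1
    return counts


def classify_image(pixels, n_partitions=4):
    """Split pixels into partitions, classify them concurrently, merge the counts."""
    size = max(1, -(-len(pixels) // n_partitions))  # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    totals = {"forest": 0, "other": 0}
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for partial in pool.map(classify_partition, chunks):
            for label, n in partial.items():
                totals[label] += n
    return totals


if __name__ == "__main__":
    pixels = [(0.05, 0.45), (0.10, 0.12), (0.04, 0.50), (0.20, 0.25)]
    print(classify_image(pixels, n_partitions=2))  # {'forest': 2, 'other': 2}
```

The merged counts do not depend on how many partitions are used. That independence is what lets this kind of job spread across many workers.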

Gardner: Not only are you on the cutting edge for the analytics, you've also moved to the bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your use of cloud and hybrid cloud.

Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It’s an on-premise solution, and that's what we're using in the TEAM Network. We've since realized that the solution we built for the TEAM Network can readily scale to other organizations, government agencies, and anyone else who wants to manage camera-trap data and run the analytics.

So now we’ve essentially been doing software development and producing scalable software. If an organization wants to replicate what we’re doing, we have a solution that we can spin up in the cloud, with the data collection, data management, data transformations and processing, data-quality controls, and analytics all built into a single software instance.

Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a specific country or all the above, some of the above?

Fegraus: All of the above. We're using Vertica OnDemand, and we're actually going to transition our existing on-premise solution into Vertica OnDemand. The solution we’re developing uses mostly open-source software, and it can be replicated in the Amazon cloud or in other clouds that have the right environments for us to get things up and running.

Gardner: Jorge, how important is it to have that global choice for cloud deployment, both to attract users and to keep your costs down?

Ahumada: It’s really key, because in many of these countries it's very difficult for governments to expand their existing solutions on the ground. Cloud solutions offer a very good, effective way to manage data. As Eric was saying, the big limitation here is which cloud solutions are available in each country. Right now, we have cloud OnDemand here, but some countries might not have the same infrastructure, so we'll have to contract with different vendors.

But it's a way to keep costs down, deliver the information really quickly, and store the data safely and securely.

What's next?

Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute, what comes next in terms of having these organizations work together? Do we have any indicators of what the results might be in the field? How can we measure the effectiveness at the endpoint -- that is to say, in these environments based on what you have been able to accomplish technically?

Fegraus: One of the nice things about the software we built to run in the various cloud environments is that instances can also be connected. For example, if we start deploying these solutions on a particular continent, and neighboring countries are using them, they won't be silos; they'll be able to share aggregated data with each other, so that we can get a holistic picture of what's happening.

That was very important when we started down this path, because one of the big inhibitors to growth within the environmental sciences is the traditional silos of data that people and organizations keep, sit on, and essentially don't share. That was a very important driver for us as we built the software.

Gardner: Jorge, what comes next in terms of technology? Are there scale hurdles you need to get over? Are there analytics issues? What's the next set of technical requirements you'd like to work through to make this even more impactful?

Ahumada: As we scale up in size and start having more granularity in the countries where we work, the challenge is going to be keeping these systems responsive and the information coming. Right now, one of the big limitations is the analytics. We do have analytics running at top speed, but once we start talking about whole countries, we're going to have on the order of many more species and many more protected areas to monitor.

This is something the industry is starting to move forward on: incorporating more of the power of the hardware into the analytics, rather than just into the storage and management of data. We look forward to continuing to work with our technology partners, HP in particular, to help guide this process. As a case study, we're very well-positioned for that, because we already face that challenge.

Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things (IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the form of images and raw data. Any thoughts about what others who are thinking about the impact of the IoT should consider, now that you have been there?

Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and human devices. Humans are delivering the data. But here we have a different problem. We're talking about nature delivering the data and we don't have that infrastructure in places like Uganda, Zimbabwe, or Brazil.

So we have to start by building that infrastructure, and the camera traps are one example of that. We need to deploy much larger-scale infrastructure to collect data, and to diversify the sensors we currently have, so that we can gather sound, image, temperature, and other environmental data on a much larger scale.

Satellites can only take us some part of the way, because we're always going to have problems with resolution. So it's really deployment on the ground which is going to be a big limitation, and it's a big field that is developing now.

Gardner: Drones?

Using drones

Fegraus: Drones, for example, have that capacity, especially small drones, which are proving intelligent enough to collect a lot of information autonomously. This is at the cutting edge of technological development right now, and we're excited about it.

Gardner: Well, great. I'm afraid we will have to leave it there. We have been learning and exploring how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis. And we've seen how quantitative analysis and modeling are generating new insights into what's happening in tropical ecosystems worldwide.

So a big thanks to our guests, Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International, and Jorge Ahumada, the Executive Director of the TEAM Network, also at Conservation International.
And a big thank you to our audience as well for joining us for this big data innovation case study discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.


Transcript of a discussion on how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval and analysis. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, May 26, 2015

Big Data Helps Conservation International Proactively Respond to Species Threats in Tropical Forests

Transcript of a BriefingsDirect discussion on how a conservation group, partnering with HP, brings real-time environmental data into the hands of environmental policy makers.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people's lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance, gain new insights and deliver better user experiences, as well as better overall business results.

Our next innovation case study interview highlights how Conservation International (CI) in Arlington, Virginia, uses new technology to gather more data about what's going on in tropical forests and other ecosystems around the world.

As a non-profit, CI has the goal of a sustainable planet, and we're going to hear how they've learned to measure what was once unmeasurable -- and then to share that data to promote change and improvement.
To learn how, we're joined by Eric Fegraus, Director of Information Systems at Conservation International. Welcome, Eric.

Eric Fegraus: Thank you, Dana. It’s a pleasure to be here.

Gardner: First, tell us about your relationship with technology. Conservation International recently announced HP Earth Insights. What is that all about?

Fegraus: HP Earth Insights is a partnership between Conservation International and HP and it's really about using technology to accelerate the work and impact of some of the programs within Conservation International. What we've been able to do is bring the analytics and a data-driven approach to build indices of wildlife communities in tropical forests and to be able to monitor them in near-real-time.

Gardner: I'm intrigued by this concept of being able to measure what was once unmeasurable. What do you mean by that?

Fegraus: This is really a telling line. We really don’t know what’s happening in tropical forests. We know some general things. We can use satellite imagery and see how forests are increasing or decreasing from year to year and from time period to time period. But we really don't know the finer scale measurements. We don't know what's happening within the forest or what animal species are increasing or are decreasing.

We have technology out in the field called camera traps, which take photos of animals as they pass by. They also contain temperature sensors. Through that technology and some data analytics, we're able to evaluate and monitor those species over time.

Inference points

Gardner: One of the interesting concepts we've seen is that from a certain quantity of data, say 10,000 data points, you can get an order of magnitude more inference points. How does that work for you, Eric? Even though you're getting a lot of data, how does that translate into even larger insights?

Fegraus: We have some of the largest datasets in our field in terms of camera trapping data and wildlife communities. But within that, you also have to have a modeling approach to be able to utilize that data, use some of the best statistics, transform that into meaningful data products, and then have the IT infrastructure to be able to handle it and store it. Then, you need the data visualization tools to have those insights pop out at you.

Gardner: So not only are you involved with HP in the Earth Insights project, but you're also a consumer of HP technology. Tell us a little about Vertica, and about HP Haven, if you're involved with that too.

Fegraus: Yes. All of our servers are HP ProLiant servers. We've created an analytical space within our environment using the HP ProLiant servers, as well as HP Vertica. That's really the backbone of our analytical environment. We're also using R, and we're now exploring Distributed R within the Vertica context.

We’re using the HP Cloud for data storage and backup, and we’re working on making the cloud a centerpiece for data exchange and analysis for wildlife monitoring. In terms of Haven, we're exploring other parts of Haven, in particular HP Autonomy, and a few other concepts, to help with unstructured data types.

Gardner: Eric, let’s talk a little bit about what you get when you do good data analytics and how it changes the game in a lot of industries, not just conservation. I'm thinking about being able to project into people’s understanding of change.

Getting someone to absorb the understanding that things need to happen in order for things to improve takes some convincing. What is big data bringing to the table for you when you go to governments or companies and try to promote change in these environments?

Fegraus: From our perspective, what we want to do is get the best available data at the right spatial and temporal scales, the best science, and the right technology. Then, when we package all this together, we can present unbiased information to decision makers, which can lead to hopefully good sustainable development and conservation decisions.

These decision makers can be public officials setting conservation policies or making land use decisions. They can be private companies seeking to value natural capital or assess the impacts of sourcing operations in sensitive ecosystems.

Of course, you never have control over which way legislation and regulations can go, but our goal is to bring that kind of factual information to the people that need it.

Astounding results

Gardner: One of the interesting things for me is how people are using datasets from areas you wouldn't think had any relationship to one another, and then, when you join and analyze those datasets, you can come up with astounding results. Is this the case with you? Are you not only gathering your own datasets but also finding ways to combine them with other data, and therefore reach other levels of empirical analysis?

Fegraus: We are. A lot of the analysis today has been focused on the data that we've collected within our network. Obviously, there are a lot of other kinds of big data sets out there, for example, provided by governments and weather services, that are very relevant to what we're doing. We're looking at trying to utilize those data sets as best we can.
Of course, you also have to be careful. One of the key things we want to do is look for patterns, but we want to make sure that the patterns we see and the correlations we detect make sense within our scientific domain. You don’t want to create false or improbable correlations.

Gardner: Among the findings so far, about 12 percent of species are declining in the tropical forests, and many of them are not yet perceived as endangered. This information comes thanks to your Tropical Ecology Assessment and Monitoring (TEAM) Network and HP Earth Insights. Maybe you could share some of the findings and outcomes from all this activity.

Fegraus: We've actually written up a paper, and that’s one of the insights. It’s telling, because species are ranked by whether or not they are considered endangered. For species ranked “Least Concern” by the International Union for Conservation of Nature (IUCN), we assume that they are doing okay.

So you wouldn’t expect to find that those species are actually declining. That can really serve as an early warning, a wake-up call, to protected-area managers and government officials in charge of those areas. There are actually some unexpected things happening here. The things that we thought were safe are not that safe.

Gardner: For me, another telling indicator was that on an aggregate basis some species are measured and there isn’t any sense of danger or problem, but when you go local, looking at specific regions and ecosystems, a different story emerges. Has your data gathering given you more tactical, location-specific insights?

Fegraus: That’s one of the really nice things about the TEAM Network, a partnership between Conservation International, the Wildlife Conservation Society and the Smithsonian Institution. In a lot of the work that TEAM does, we really work across the globe. Even though we're using the same methodologies, the same standards, whether we are in the Amazon or whether we're in a forest in Asia or Indonesia, we can have results that are important locally.

Then, as we aggregate those results at sub-national, national, or even continental levels, we try to have the data flow up and down those spatial scales as needed.

For example, even though a particular species may be endangered worldwide we may find that locally, in a particular protected area, that species is stable. This provides important information to the protected area manager that the measures that are in place seem to be working for that species. It can really help in evaluating practices, measuring conservation goals and establishing smart policy.
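The idea of results flowing up and down spatial scales can be illustrated by rolling site-level indices up to larger regions. This is a minimal hypothetical sketch. The trend measure (last year over first) and the choice to sum indices across sites are simplifying assumptions, not TEAM's method, and every site, country, and value below is an invented placeholder.

```python
# Hypothetical yearly population indices for one species at three protected
# areas; site and country names and all values are illustrative placeholders.
SITE_SERIES = {
    "park_a": [10, 10, 10],
    "park_b": [20, 12, 6],
    "park_c": [8, 6, 4],
}
SITE_TO_COUNTRY = {"park_a": "country_x", "park_b": "country_x", "park_c": "country_y"}


def trend(series):
    """Crude trend: last value over first (1.0 = stable, below 1.0 = decline)."""
    return series[-1] / series[0]


def roll_up(site_series, site_to_region):
    """Sum site indices year by year within each region, then compute its trend."""
    totals = {}
    for site, series in site_series.items():
        region_totals = totals.setdefault(site_to_region[site], [0] * len(series))
        for year, value in enumerate(series):
            region_totals[year] += value
    return {region: trend(series) for region, series in totals.items()}


if __name__ == "__main__":
    print({site: trend(s) for site, s in SITE_SERIES.items()})
    print(roll_up(SITE_SERIES, SITE_TO_COUNTRY))
    print(roll_up(SITE_SERIES, {site: "global" for site in SITE_SERIES}))
```

With these invented numbers, `park_a` is stable (trend 1.0), while `country_x` comes out at 16/30 and the global roll-up at 20/38, both around 0.53. This mirrors the case of a species that holds steady in one protected area while declining at larger scales.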

Sense of confidence

Gardner: I've also spoken to some folks who express a sense of relief that they can go at whatever data they want and have a sense of confidence that they have systems and platforms that can handle the scale and the velocity of that data. It is sort of a freeing attitude that they don’t have to be concerned at the data level. They can go after the results and then determine the means to get the analysis that they need.

Do you share that view: that with your partnership with HP and others, this is about the analysis and the science, and you're not limited by some sort of speeds-and-feeds barrier?

Fegraus: This gets to a larger issue within the conservation community, the non-profits, and the environmental consulting firms. Traditionally, IT and technology have been all about keeping the lights on and making sure everyone has a laptop. There's a saying that people can share data, but the problem has really been bringing the technology, analytics, and tools to the mission-critical, business-driven programs that are really doing the work.

One of the great outcomes of this is that we've pushed that technology to a program like TEAM and we're getting the cutting-edge technology that a program like TEAM needs into their hands, which has really changed the dynamic, compared to the status quo.

Gardner: So scale really isn't the issue any longer. It's now about your priorities and your requirements for the scientific activity?

Fegraus: Yes. It's about making sure that technology meets the scientific and program objectives. That's going to vary quite a bit depending on the program and the group we're talking about, but ultimately it’s about enabling and accelerating the mission-critical work of organizations like Conservation International.
Gardner: We've been discussing new data gathering and analysis programs to better determine tropical forest impacts for species and other conservation goals, and we've been learning this from our guest, Eric Fegraus, Director of Information Systems at Conservation International based in Arlington, Virginia. Thanks so much, Eric.

Fegraus: Thank you so much, Dana.

Gardner: And I'd like to thank our audience as well for joining us for this special new style of IT discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.


Transcript of a BriefingsDirect discussion on how a conservation group, partnering with HP, brings real-time environmental data into the hands of environmental policy makers. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.
