Re:Source — Building a tool to predict water availability in the Serengeti

Tennison Yu
Jan 24, 2021

Author: Tennison Yu // Partners: Pierce Coggins, Conor Healey, Jason Baker

Water is arguably Earth’s most sought-after resource. People are not the only ones who depend on it; animals need it to flourish as well, and rarely is this more true than in Serengeti National Park, a 40,000 km² wildlife preserve rich in biodiversity. You can find tiny, colorful lovebirds as well as towering grey elephants. It is also the home of the Maasai people and other tribal communities. Each year, approximately 290,000 tourists visit the Serengeti to see these awe-inspiring sights. In 2018, tourism accounted for about $2.43 billion of Tanzania’s GDP.

Unfortunately, this great ecosystem is now under threat. Climate change and human activity, among other causes, have made water scarce or undrinkable. For those who live in the Serengeti, animals and humans alike, this means higher rates of disease and smaller herds. Large-animal populations have declined by 75% in the last 40 years.

There is completely no water, so the animals are depending on humans … If we don’t help them, they will die.

— Patrick Kilonzo Mwalua, Founder & Director @ Mwalua Wildlife Trust

Current solutions include delivering fresh water to wells or boring wells to give wildlife access to underground water sources. Both approaches can be costly, laborious, and time-consuming.

My partners and I wanted to address this problem with modern technology by building a tool that provides water availability information, as part of our capstone in the UC Berkeley MIDS program.

Building The App

Our solution would provide near-daily water predictions over key water sources within the Serengeti Ecosystem, enabling park rangers, land managers, environmentalists, and researchers to re-allocate resources necessary to identify and respond to potential drought conditions.

We identified eleven critical water sources throughout the 40,000 km² Serengeti Ecosystem and built the model, prediction pipeline, and visualization tool to provide near-daily water identification and predictions.

Data

We started with two publicly available datasets: one of the Amazon rainforest with 40.5k labeled images from Kaggle, and one of locations around Australia with 4.9k labeled images from the Australian government. We then supplemented these by building a scraper around Planet’s API to pull satellite images of the selected Serengeti areas. Afterward, we labeled those images ourselves for use in the modeling step described below (data no longer available).

The images above give an example of what we looked for when self-labeling. On the far left, the stream blends in with the darker forest, indicating an abundance of water. On the right, by contrast, the dry stream stands out, indicating a lack of water. In uncertain cases such as the middle image, we also considered the lushness of the surrounding vegetation.

In total, our dataset comprised:

Training and Validation:

  • Manually Annotated Serengeti Images from Planet (5.8k images)
  • Kaggle Amazon (40.5k images)
  • Australia (4.9k images)

Label Distribution:

  • No Water: 80.6%
  • Water: 19.3%
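With roughly four "no water" images for every "water" image, a classifier can score well by mostly predicting the majority class. The article does not say whether class weighting was used during training, so the following is only a minimal sketch of one common remedy, inverse-frequency class weights, using the label shares listed above. The function name and the class keys are illustrative assumptions.

```python
def inverse_frequency_weights(class_shares):
    """Weight each class by 1/share, rescaled so the weights average to 1.

    class_shares maps a class name to its fraction of the dataset.
    """
    raw = {name: 1.0 / share for name, share in class_shares.items()}
    mean = sum(raw.values()) / len(raw)
    return {name: weight / mean for name, weight in raw.items()}


# Label shares as reported in the article (they sum to 99.9% due to rounding).
shares = {"no_water": 0.806, "water": 0.193}
weights = inverse_frequency_weights(shares)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Under these assumptions, a "water" example counts about 4.2 times as much as a "no water" example, which is simply the ratio of the two shares. Weights like these could be passed to a weighted loss function or used to oversample the minority class.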

Model

For the modeling, we were intrigued by the new MixNet model, published in Dec. 2019 by Google Brain (paper). Its mixed kernel sizes, multi-scale design, and use of neural architecture search seemed well suited to distinguishing water sources, whether large streams or small watering holes. It also reported state-of-the-art performance while using fewer parameters, which translated into less compute and development time for us.
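The core idea behind MixNet's mixed depthwise convolution is to split a layer's channels into groups and give each group a different kernel size, so that small and large spatial patterns are captured in the same layer. The sketch below only illustrates that channel bookkeeping. It assumes an equal partition with any remainder assigned to the first group, and the function names are illustrative rather than taken from the project repository.

```python
def split_channels(total_channels, num_groups):
    """Split channels as evenly as possible; the first group takes the remainder."""
    split = [total_channels // num_groups] * num_groups
    split[0] += total_channels - sum(split)
    return split


def mixconv_plan(total_channels, kernel_sizes):
    """Pair each kernel size with the number of channels it would convolve."""
    groups = split_channels(total_channels, len(kernel_sizes))
    return list(zip(kernel_sizes, groups))


# Example: 32 channels shared among 3x3, 5x5 and 7x7 depthwise kernels.
plan = mixconv_plan(32, [3, 5, 7])
print(plan)  # [(3, 12), (5, 10), (7, 10)]
```

In a full layer, each group would be passed through its own depthwise convolution and the outputs concatenated back along the channel axis.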

As an initial comparison, we also trained a ResNet-18 model and found the results below. MixNet’s higher F1 scores encouraged us to proceed with it.

We also wanted to compare our model’s performance on three-band versus four-band images. All the images came in both a standard red, green, blue (RGB) version and a four-band TIFF version containing RGB plus a near-infrared band (which we refer to as RGBA).

We found that the F1 scores for RGBA were higher and more stable, so we proceeded with the four-band images.
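F1 is a natural metric for the comparisons above because the labels are imbalanced: accuracy can look good even when the model rarely finds water. As a reminder of how the score is computed, here is a minimal sketch that treats "water" as the positive class. The counts in the example are hypothetical and are not results from the project.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for the positive class."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation set of 1,000 images, 193 of which contain water.
# A model that always predicts "no water" is about 81% accurate but has F1 = 0.
always_negative = f1_score(true_positives=0, false_positives=0, false_negatives=193)

# A model that finds 150 of the 193 water images with 50 false alarms.
reasonable = f1_score(true_positives=150, false_positives=50, false_negatives=43)

print(always_negative, round(reasonable, 3))
```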

Infrastructure

To encapsulate the model and flesh out our tool, we used the following setup on AWS.

The first part of this diagram illustrates our data capture methodology as described previously. We then used PyTorch on an EC2 instance for training and Weights & Biases to store our results. A Docker container running Flask and Gunicorn hosted our model and served as our web server for inference. On top of that, we set up a ready-to-deploy R-Shiny app as a proof of concept. Below is a screenshot of the result, which worked on both web and mobile.
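To make the inference step more concrete, here is a rough sketch of what a prediction endpoint might look like. It is not the project's actual server: to stay dependency-free it is written as a plain WSGI callable from the standard library rather than as a Flask app (Gunicorn can serve either, since a Flask app exposes the same WSGI interface). The `/predict` route, the JSON field names, and the `predict_water` placeholder are all assumptions made for illustration.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def predict_water(image_bytes):
    """Hypothetical placeholder for the trained model: returns P(water)."""
    return 0.5  # fixed value; a real server would run the network here


def app(environ, start_response):
    """WSGI entry point: POST /predict with raw image bytes in the body."""
    is_predict = (environ.get("REQUEST_METHOD") == "POST"
                  and environ.get("PATH_INFO") == "/predict")
    if not is_predict:
        status, payload = "404 Not Found", {"error": "use POST /predict"}
    else:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        if not body:
            status, payload = "400 Bad Request", {"error": "empty request body"}
        else:
            probability = predict_water(body)
            label = "water" if probability >= 0.5 else "no_water"
            status = "200 OK"
            payload = {"water_probability": probability, "label": label}
    data = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call(wsgi_app, method, path, body=b""):
    """Invoke the WSGI app in-process, without opening a network socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(REQUEST_METHOD=method, PATH_INFO=path,
                   CONTENT_LENGTH=str(len(body)))
    environ["wsgi.input"] = io.BytesIO(body)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = wsgi_app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


status, result = call(app, "POST", "/predict", b"fake-image-bytes")
print(status, result)
```

If this code were saved in a file named `serve.py` (a hypothetical name), Gunicorn could serve it with `gunicorn serve:app`, and a front end such as the R-Shiny app would then POST image bytes to the endpoint and display the returned label.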

Conclusion

Challenges and Next Steps

We were able to train a model and deploy it as an app successfully. However, the journey continues, as we encountered several challenges along the way. Our model primarily had trouble classifying lakes and ravines, especially when they were near dense forest canopies.

As seen above, lakes have less characteristic edges and can transition from wet to dry more gradually. They are also harder to capture fully in an image.

Dense foliage breaks up the otherwise consistent pattern of river edges, leading the model to confuse rivers with canopy-covered valleys.

In all, this points to two immediate potential solutions: increasing the number of images or applying image enhancement and augmentation. We recognize that our dataset is unbalanced, so getting more images with water should help the algorithm distinguish different patterns better. We can do this by leveraging our API wrapper to pull more images or by transforming the images we already have. Enhancing image quality to sharpen and brighten the edges should also help.
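The article does not specify which transformations would be used, but flips and 90-degree rotations are a common starting point for overhead imagery, since a water source seen from above keeps its label regardless of orientation. The sketch below works on a single-band image stored as a list of rows so that it needs no external libraries. The function names are illustrative assumptions.

```python
def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def rot90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original plus five label-preserving variants."""
    r90 = rot90(image)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return [image, hflip(image), vflip(image), r90, r180, r270]


# A tiny 2x2 "image" with distinct pixel values makes each variant visible.
tile = [[1, 2],
        [3, 4]]
variants = augment(tile)
for variant in variants:
    print(variant)
```

Applying such transforms only to the "water" images would grow the minority class sixfold without pulling any new imagery, although the variants are less informative than genuinely new scenes.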

Future

In addition to addressing the above, we will continue to push the boundaries. There are other techniques to try, such as object segmentation, which would likely improve our app. It was rewarding to develop a modern tool, powered by the latest technologies, that identifies water availability at critical sources. As a team, we recognize that our app may have other applications, such as monitoring deforestation and oil spills. Suffice it to say, we are excited about the future! Thanks for reading.

Project Repo: https://github.com/tyu0912/serengeti_water_predictor2

References

  1. Source: University of York, https://phys.org/news/2019-03-serengeti-mara-squeezeone-world-iconic-ecosystems.html
  2. Source: Just Fun Facts, http://justfunfacts.com/interesting-facts-about-maasai-mara-national-reserve/
  3. Source: Serengeti Watch, https://serengetiwatch.org/threats/
  4. Frank, Douglas A.; Mcnaughton, Samuel J.; Tracy, Benjamin F.; The Ecology of Earth’s Grazing Ecosystems,
  5. Source: Business Week, https://www.busiweek.com/tanzania-earns-us2-43-billion-from-tourism-in-2018/
  6. Source: The Dodo, https://www.thedodo.com/water-man-kenya-animals-2263728686.html
  7. MixNet Paper: Mingxing Tan, Quoc V. Le, https://arxiv.org/abs/1907.09595
  8. ResNet Paper: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, https://arxiv.org/abs/1512.03385
  9. Michigan State University, Remote Sensing Paper: Spagnuolo et al. https://www.mdpi.com/2072-4292/12/7/1086
  10. Joshua Blumenstock Science Article: http://www.jblumenstock.com/files/papers/jblumenstock_2016_science.pdf
  11. Global Forest Watch: https://www.globalforestwatch.org/map
  12. Planet Team (2017). Planet Application Program Interface: In Space for Life on Earth. San Francisco, CA. https://api.planet.com.
