Amazon EMR (short for “Elastic MapReduce”) is a big data platform for distributed computing. In situations where you have to do many calculations and process massive amounts of data, Amazon EMR is the tool you’ll want to use. In this blog post, I’ll talk about my experience implementing Amazon EMR in our stack and the best practices I learned along the way.

Why EMR?

I first became interested in distributed computing while enrolled in the Master of Information and Data Science program at UC Berkeley.

Author: Tennison Yu // Partners: Pierce Coggins, Conor Healey, Jason Baker

Water is arguably Earth's most sought-after commodity. Unlike other resources, animals also depend on it to flourish, and rarely is this more true than in Serengeti National Park, a 40,000 km² wildlife preserve rich in biodiversity. You can find tiny, colorful lovebirds as well as towering grey elephants. It is also the home of the Maasai people and other tribal communities. Each year, approximately 290,000 tourists visit the Serengeti to see these awe-inspiring sights. In 2018, tourism accounted for about $2.43

Recently, I took the opportunity to learn Terraform and Ansible. For those unfamiliar, both are automation tools used for large-scale deployments, mostly in the cloud. Terraform is strong at deploying infrastructure, such as setting up instances, networks, and buckets, whereas Ansible is great for deploying configurations and software updates to any instances that have been set up. Together, they make a powerful team.

Terraform is developed by HashiCorp and uses the company's own configuration language, HCL (HashiCorp Configuration Language), whereas Ansible is owned by Red Hat and is primarily driven by YAML files.
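
To make the division of labor concrete, here is a minimal Python sketch of the order in which the two tools would typically be invoked: Terraform first to create the infrastructure, then Ansible to configure it. The function name `build_deploy_commands`, the directory `infra`, and the files `inventory.ini` and `site.yml` are hypothetical placeholders, not taken from any real project. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def build_deploy_commands(tf_dir, inventory, playbook):
    """Return the CLI commands in the order they would run.

    All arguments are hypothetical paths chosen for illustration.
    """
    return [
        # Terraform first: provision instances, networks, buckets, etc.
        ["terraform", f"-chdir={tf_dir}", "init"],
        ["terraform", f"-chdir={tf_dir}", "apply", "-auto-approve"],
        # Ansible second: configure the instances Terraform created.
        ["ansible-playbook", "-i", inventory, playbook],
    ]


commands = build_deploy_commands("infra", "inventory.ini", "site.yml")
for cmd in commands:
    # Print each command as it would be typed in a shell.
    print(shlex.join(cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)`, assuming both tools are installed and the placeholder paths point at real files.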

Tennison Yu

Machine Learning Engineer @ Jumio Corp. I like building infrastructure to get ML done.

