MCSA Data Engineering with Azure: Exam 70-775 study guide

One of Cloudreach’s core beliefs is that as a company, and as individual Cloudreachers, we should strive to be "one step ahead". For me, this means building a strong competency in one of Azure’s most exciting offerings: data engineering, analytics and, if I’m feeling especially ambitious, machine learning (further down the road). My plan is to achieve an MCSA Data Engineering with Azure and share my experiences to help others who are interested in pursuing this path.

The 70-775 Exam

One of the key exams required to achieve an MCSA Data Engineering is Exam 70-775: Perform Data Engineering on Microsoft Azure HDInsight.

As the name implies, 70-775 focuses on HDInsight, Microsoft’s cloud-based, fully managed implementation of Apache Hadoop (the original big data analysis technology utilizing computational clusters). HDInsight makes Hadoop and related tooling (including Apache Hive, HBase, Spark and Kafka) available to organizations that want the power of Hadoop without the infrastructural investment and overhead.

Available Resources

Hadoop and related technologies

Although I have some familiarity with Hadoop on-premises, I think it will be useful to take a good refresher course before focusing on HDInsight. I’m a fan of Udemy and there’s a solid course which seems to fit the need: The Ultimate Hands-On Hadoop - Tame your Big Data! taught by Frank Kane. There’s also a well-reviewed Microsoft Virtual Academy video overview of HDInsight named Big Data Analytics with HDInsight. It requires a serious time commitment (the video is five hours long!) but is considered well worth it.

70-775 specific exam resources

Although 70-775 is an important exam, the major online learning platforms such as Udemy, A Cloud Guru and Pluralsight don’t offer a course covering its topics. There is a Udemy HDInsight course, "Hadoop on Azure. An Introduction to Big Data Using HDInsight", but it hasn’t been updated in three years, an aeon in cloud technology terms.

With this in mind, I turned to the exam’s information page for study and preparation guidance. Microsoft suggests the following edX courses:

  • Processing Big Data with Hadoop in Azure HDInsight
  • Implementing Real-Time Analytics in Azure HDInsight
  • Implementing Predictive Analytics with Spark in Azure HDInsight

edX courses are free (you can add a verified edX completion certificate for $99 US) and typically well reviewed.

To this list, I’ll add the quickstart for HDInsight which you can find here. Microsoft’s Azure quickstarts are an excellent way to practice a technology using a near real-world scenario (or a reasonable copy).

Microsoft Press publishes an exam guide, Exam Ref 70-775 Perform Data Engineering on Microsoft Azure HDInsight, which I’ll be reviewing in a few weeks when it becomes available (release date as of this writing is July 20, 2018). Oh, and of course, you’ll also need an Azure subscription (you can start with a trial account, which comes with a $200 credit).

Study Plan

With all Microsoft exams, I find it useful to use the information provided on the exam’s description page as the basis for creating a detailed, week by week, day by day study plan. Everyone’s different and while some people can cram their preparation into a compressed amount of time, I gain more real and deep learning by dividing the exam’s topics into easily digestible units.

Here’s a sample:

Week 1: Administer and Provision HDInsight Clusters

Day One: Deploy HDInsight Clusters

  • Create a cluster in a private virtual network
  • Create a cluster that has a custom metastore
  • Create a domain-joined cluster
  • Select an appropriate cluster type based on workload considerations
  • Customize a cluster by using script actions
  • Provision a cluster by using Portal
  • Provision a cluster by using Azure CLI tools
  • Provision a cluster by using Azure Resource Manager (ARM) templates and PowerShell
  • Manage managed disks
  • Configure vNet peering

Day Two: Deploy and secure multi-user HDInsight clusters

  • Provision users who have different roles
  • Manage users, groups, and permissions through Apache Ambari, PowerShell, and Apache Ranger
  • Configure Kerberos
  • Configure service accounts
  • Implement SSH tunneling
  • Restrict access to data

Day Three: Ingest data from cloud or on-premises data

  • Store data in Azure Data Lake
  • Store data in Azure Blob Storage
  • Perform routine small writes on a continuous basis using Azure CLI tools
  • Ingest data in Apache Hive and Apache Spark by using Apache Sqoop, Application Development Framework (ADF), AzCopy, and AdlCopy
  • Ingest data from an on-premises Hadoop cluster

In the above, the exam skill area (in this case, "Administer and Provision HDInsight Clusters") is used as the focus for a week of study, with the sub-areas providing the day-to-day topics covered. For each area, I gather additional resources (notes, Quickstarts, summaries) to ensure I understand the material before moving forward.
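If you prefer to keep the plan in a script rather than a spreadsheet, the week and day breakdown described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool I actually use: the names `EXAM_OBJECTIVES`, `Week` and `build_plan` are made up for this example, and the topic lists are shortened from the sample above.

```python
from dataclasses import dataclass, field

# Hypothetical, abbreviated copy of the exam objectives:
# skill area -> sub-area -> topics.
EXAM_OBJECTIVES = {
    "Administer and Provision HDInsight Clusters": {
        "Deploy HDInsight Clusters": [
            "Create a cluster in a private virtual network",
            "Create a cluster that has a custom metastore",
        ],
        "Deploy and secure multi-user HDInsight clusters": [
            "Provision users who have different roles",
            "Configure Kerberos",
        ],
        "Ingest data from cloud or on-premises data": [
            "Store data in Azure Data Lake",
            "Store data in Azure Blob Storage",
        ],
    },
}


@dataclass
class Week:
    number: int
    skill_area: str
    # Maps a day label such as "Day 1" to (sub-area, topics).
    days: dict = field(default_factory=dict)


def build_plan(objectives):
    """Assign one skill area per week and one sub-area per study day."""
    plan = []
    for week_number, (area, sub_areas) in enumerate(objectives.items(), start=1):
        week = Week(number=week_number, skill_area=area)
        for day_number, (sub_area, topics) in enumerate(sub_areas.items(), start=1):
            week.days[f"Day {day_number}"] = (sub_area, topics)
        plan.append(week)
    return plan


plan = build_plan(EXAM_OBJECTIVES)

# Print the plan in roughly the same shape as the sample above.
for week in plan:
    print(f"Week {week.number}: {week.skill_area}")
    for day, (sub_area, topics) in week.days.items():
        print(f"  {day}: {sub_area}")
        for topic in topics:
            print(f"    - {topic}")
```

Adding another skill area to the dictionary adds another week, so the same structure can grow to cover the rest of the exam objectives.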

And that’s it! As I progress towards my goal and sit for the exams, I’ll provide updated information about the quality (or lack thereof) of the learning materials and my impressions of the test (with no spoilers, of course).

Happy Learning!
