
  Intro & Materials

Welcome to Week 4!

In this section, we're going to break a long list of values into a few categories. You might need to categorize values if:

  • You have a list of zip codes but you really just care about the states.
  • You have a list of states but you really just care about the regions where those states are located.
  • You have a list of countries but you really just care about regions of the world.
  • You have a list of ages (0, 1, 2, 3, 4, 5, etc.) but you really just care about age ranges (0-9, 10-19, 20-29, etc.).
  • You have a list of schools but you really just care about which district the school is located within.
  • You have a list of test scores (40%, 55%, 70%) but you really just want to focus on students who passed or didn’t pass the exam.
  • You have a list of body mass indices (19, 24, 29, 32, etc.) but you want to categorize the raw numbers into underweight, normal weight, overweight, and obese.
  • You have a list of languages spoken but you really want to divide people into those who speak Mandarin and those who don’t.
  • You have a list of countries where people were born but you really just want to divide people into those born in the U.S. and those not born in the U.S.
  • … and so on.
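Two of the re-codings in that list, ages into ranges and BMI values into weight categories, come down to simple arithmetic and comparisons. Here's a rough Python sketch of that logic. Python is only used for illustration; the lecture itself works in a spreadsheet. The function names are hypothetical. The BMI cut-offs (18.5, 25, 30) are the commonly used adult thresholds, but the lecture doesn't specify them, so treat them as an assumption.

```python
# Sketch: turning raw numbers into categories.
# Assumption: BMI cut-offs of 18.5, 25 and 30 (common adult thresholds).


def age_range(age: int) -> str:
    """Map an age like 27 to a ten-year bucket label like '20-29'."""
    low = (age // 10) * 10  # integer division drops the ones digit
    return f"{low}-{low + 9}"


def bmi_category(bmi: float) -> str:
    """Map a raw BMI value to one of four weight labels."""
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25:
        return "normal weight"
    elif bmi < 30:
        return "overweight"
    else:
        return "obese"


if __name__ == "__main__":
    print([age_range(a) for a in [0, 4, 15, 27, 29, 33]])
    print([bmi_category(b) for b in [19, 24, 29, 32]])
```

Running it maps the ages to `0-9`, `0-9`, `10-19`, `20-29`, `20-29`, `30-39` and the example BMIs 19, 24, 29, 32 to normal weight, normal weight, overweight, obese.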

The raw dataset you’re given is rarely the one you actually want for your analyses. Clean, re-code, and re-categorize until you’ve got the groupings you need.

We'll practice two techniques: the IF and VLOOKUP functions.
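If it helps to see what IF and VLOOKUP are doing under the hood, here's a minimal Python sketch of the same ideas. It is an analogy, not the spreadsheet functions themselves. The 60% pass mark, the four-state region table, and the child/adult/senior age groups are made-up assumptions for illustration only.

```python
# Sketch of IF- and VLOOKUP-style categorizing in Python.
# Assumptions (hypothetical): 60% pass mark, a tiny state-to-region
# table, and age groups starting at 0, 18 and 65.
from bisect import bisect_right

PASS_MARK = 60  # hypothetical threshold, in percent


def pass_fail(score: float) -> str:
    # Roughly like =IF(A2>=60, "pass", "fail")
    return "pass" if score >= PASS_MARK else "fail"


# Exact-match lookup, roughly like VLOOKUP(A2, table, 2, FALSE)
STATE_TO_REGION = {
    "CA": "West",
    "NY": "Northeast",
    "TX": "South",
    "IL": "Midwest",
}


def region(state: str) -> str:
    # VLOOKUP would show #N/A for a missing key; here we return "unknown"
    return STATE_TO_REGION.get(state, "unknown")


# Approximate-match lookup, roughly like VLOOKUP(A2, table, 2, TRUE):
# the first column is sorted ascending, and the match is the largest
# break that is <= the input value.
AGE_BREAKS = [0, 18, 65]
AGE_LABELS = ["child", "adult", "senior"]


def age_group(age: int) -> str:
    i = bisect_right(AGE_BREAKS, age) - 1  # index of largest break <= age
    if i < 0:
        return "unknown"  # smaller than the first break, like #N/A
    return AGE_LABELS[i]


if __name__ == "__main__":
    print([pass_fail(s) for s in [40, 55, 70]])
    print(region("TX"), region("PR"))
    print(age_group(17), age_group(18), age_group(70))
```

The exact-match version suits lists like states or schools, where every value needs its own row in a lookup table. The approximate-match version suits numeric ranges, where you only list where each category starts.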
