Assignment: Working with Data

Using the data below, answer the following questions:

  1. Which entities (top 5) had the largest population density in 2020?

  2. Which entities have more water area than land area?

  3. Which entities increased in population the most in the last 10 years?

  4. Which state bird accounts for the largest combined population as of 2020? Which accounts for the largest combined land area?

  5. How many entities’ largest city is their capital city?

  6. Which entity has the largest percent drop in population from its largest city to its 5th largest? Use 100*(1st largest - 5th largest)/(1st largest).

import pandas as pd
facts = pd.read_csv('../data/state_facts.tsv',delimiter="\t")
facts.head(5)
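Several of the questions above come down to three pandas patterns: building a derived column, taking the top n rows, and aggregating by group. The sketch below shows them on toy data. Every column name in it (Name, Bird, Population, Land_area) is hypothetical, so check facts.columns for the real names before adapting it.

```python
import pandas as pd

# Toy data; every column name here is hypothetical, not taken from state_facts.tsv.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Bird": ["Cardinal", "Robin", "Cardinal", "Robin"],
    "Population": [1000, 500, 300, 2000],
    "Land_area": [10.0, 50.0, 2.0, 400.0],
})

# Derived column: people per unit of land area.
toy["Density"] = toy["Population"] / toy["Land_area"]

# The n rows with the largest values in a column, sorted in descending order.
top2 = toy.nlargest(2, "Density")

# Total population per bird, largest first.
by_bird = toy.groupby("Bird")["Population"].sum().sort_values(ascending=False)
```

The same groupby pattern works for land area by swapping the column that is summed.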

Using the “state_dates.tsv” data, answer the remaining questions. You will need to merge the two data sets together:

  1. Of the states that joined the United States before 1790, what is the most common state flower?

  2. Which has the larger population density, the most dense US Territory or the least dense state?

  3. Make a graph that plots the population of the largest city in each entity, in the order in which the entities joined the US. Make the bars black.

  4. Make two additional graphs like the one above, one for land area (green bars) and one for water area (blue bars).

Hint: pd.read_csv('../data/state_dates.tsv',delimiter="\t")

Hint: You likely want to convert the Date column to datetime. You might have to correct errors in the data as well.

Hint: states['Date']<pd.Timestamp(1790,1,1)
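One possible way to approach the date hints is sketched below on toy data. The malformed entry is invented for illustration and only stands in for whatever errors the real file contains. The cutoff is built with pd.Timestamp.

```python
import pandas as pd

# Toy data; the malformed second entry is a made-up stand-in for real data errors.
states = pd.DataFrame({
    "Abbreviation": ["DE", "XX", "OH"],
    "Date": ["1787-12-07", "not a date", "1803-03-01"],
})

# errors="coerce" turns unparseable values into NaT instead of raising.
states["Date"] = pd.to_datetime(states["Date"], errors="coerce")

# Rows that failed to parse and need a manual fix.
bad = states[states["Date"].isna()]

# Boolean mask for entries before 1790; NaT compares as False and is excluded.
early = states[states["Date"] < pd.Timestamp(1790, 1, 1)]
```

Inspecting bad first shows which rows need correcting before any date-based filter can be trusted.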

Hint: pd.merge(****,****,left_on='USPS_code',right_on='Abbreviation',how='outer')
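A minimal sketch of the outer merge on toy frames follows. Only the key names USPS_code and Abbreviation come from the hint. Which file carries which key, and all other columns and values, are assumptions. Passing indicator=True adds a _merge column that shows which rows matched, which may help when checking for entities that appear in only one file.

```python
import pandas as pd

# Toy frames; which frame holds which key column is an assumption.
facts_toy = pd.DataFrame({"USPS_code": ["DE", "OH", "PR"], "Population": [1, 2, 3]})
dates_toy = pd.DataFrame({"Abbreviation": ["DE", "OH", "ZZ"],
                          "Date": ["1787-12-07", "1803-03-01", "1900-01-01"]})

# how="outer" keeps rows from both frames even when the key has no match.
merged = pd.merge(facts_toy, dates_toy,
                  left_on="USPS_code", right_on="Abbreviation",
                  how="outer", indicator=True)

# Rows present in only one of the two frames.
unmatched = merged[merged["_merge"] != "both"]
```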

# Sample code to help with the plots

#import matplotlib.pyplot as plt
#%config InlineBackend.figure_format ='retina' #This makes your plot clearer


#plot = *your df by date*[[*column*,'Abbreviation']].plot(kind='bar',figsize=(10,4))
#plot.set_xticklabels(*your df by date*['Abbreviation']);
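For reference, here is a small self-contained version of the same idea on toy data. The column name City_pop and all values are hypothetical. The frame is sorted by date first, so the bars appear in joining order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy merged data; "City_pop" is a hypothetical column name.
df = pd.DataFrame({
    "Abbreviation": ["OH", "DE", "PA"],
    "Date": pd.to_datetime(["1803-03-01", "1787-12-07", "1787-12-12"]),
    "City_pop": [900, 70, 1600],
})

# Sort by joining date so the bars are drawn in that order.
by_date = df.sort_values("Date")

# Passing x= labels the bars directly, so set_xticklabels is not needed here.
ax = by_date.plot(kind="bar", x="Abbreviation", y="City_pop",
                  color="black", legend=False, figsize=(10, 4))
ax.set_ylabel("Population of largest city")
```

Changing the color argument to "green" or "blue" and swapping the y column gives the land-area and water-area variants.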