Assignment Instructions

The data for this assignment come from the Hospital Compare web site (http://hospitalcompare.hhs.gov), run by the U.S. Department of Health and Human Services. The purpose of the web site is to provide data and information about the quality of care at over 4,000 Medicare-certified hospitals in the U.S. The dataset essentially covers all major U.S. hospitals and is used for a variety of purposes, including determining whether hospitals should be fined for not providing high-quality care to patients (see http://goo.gl/jAXFX for some background on this particular topic).

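The Hospital Compare web site contains a lot of data, and we will only look at a small subset for this assignment. The zip file for this assignment contains three files:

outcome-of-care-measures.csv: the outcome-of-care measures, including the 30-day mortality rates used below
hospital-data.csv: information about each hospital
Hospital_Revised_Flatfiles.pdf: descriptions of the variables in each file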

A description of the variables in each of the files is in the included PDF file named Hospital_Revised_Flatfiles.pdf. This document contains information about many other files that are not included with this programming assignment. You will want to focus on the variables for Table 19 (“Outcome of Care Measures.csv”) and Table 11 (“Hospital Data.csv”). You may find it useful to print out this document (at least the pages for Tables 19 and 11) to have next to you while you work on this assignment. In particular, the numbers of the variables for each table indicate column indices in each table (e.g. “Hospital Name” is column 2 in the outcome-of-care-measures.csv file).

Loading packages

library(data.table)
library(dplyr)
library(ggplot2)
library(janitor)

1. Plot the 30-day mortality rates for heart attack

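# reading data; every column is read as character, so the rates need converting
outcome <- data.table::fread("data/outcome-of-care-measures.csv", colClasses = "character")

# preprocessing data for histogram: column 11 is the 30-day heart attack death rate
histogram_data <- outcome %>% 
    rename(death_rate_30_HA = 11) %>%
    # non-numeric entries become NA; suppressWarnings() hides the coercion warning
    mutate(death_rate_30_HA = suppressWarnings(as.numeric(death_rate_30_HA))) %>%
    pull(death_rate_30_HA)

# plot histogram (hist() drops NA values before binning)
hist(histogram_data, 
    main = "Hospital 30-day Death (Mortality) Rates from Heart Attacks",
    xlab = "30-day death rate (%)", 
    col = "red")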
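Reading every column as character means the rate column has to be converted, and any text placeholder for a missing rate (assumed here to look like "Not Available") turns into NA. A small sketch with a made-up vector shows the coercion and confirms that hist() leaves the NA values out of its counts:

```r
# Made-up character vector standing in for a rate column read with colClasses = "character"
raw_rates <- c("14.3", "Not Available", "16.1", "12.8")

# Non-numeric strings become NA; suppressWarnings() silences the coercion warning
rates <- suppressWarnings(as.numeric(raw_rates))
print(rates)

# Count the missing values as a sanity check
n_missing <- sum(is.na(rates))
cat("missing:", n_missing, "\n")

# plot = FALSE returns the bins without drawing; only the 3 finite values are counted
h <- hist(rates, plot = FALSE)
sum(h$counts)
```

Using plot = FALSE makes this check easy to script before drawing the real histogram.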

2. Finding the best hospital in a state

Write a function called best that takes two arguments: the 2-character abbreviated name of a state and an outcome name. The function reads the outcome-of-care-measures.csv file and returns a character vector with the name of the hospital that has the best (i.e. lowest) 30-day mortality for the specified outcome in that state. The hospital name is the name provided in the Hospital.Name variable. The outcomes can be one of “heart attack”, “heart failure”, or “pneumonia”. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
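The selection rule itself (drop hospitals with a missing rate, take the lowest rate, and break ties alphabetically, as the arrange(rate, hospital_name) step in best() does) can be sketched in base R on a toy data frame. The data and the helper best_toy() are hypothetical, for illustration only:

```r
# Hypothetical toy data; column names mimic the cleaned names used in best()
toy <- data.frame(
  hospital_name = c("B HOSPITAL", "A HOSPITAL", "C HOSPITAL", "D HOSPITAL"),
  state         = c("TX", "TX", "TX", "MD"),
  rate          = c(12.1, 12.1, NA, 9.5),
  stringsAsFactors = FALSE
)

# best_toy(): hypothetical helper. Keeps rows for the state with a known rate,
# sorts by rate and then by name, and returns the first name (NA if none).
best_toy <- function(df, chosen_state) {
  sub <- df[df$state == chosen_state & !is.na(df$rate), ]
  sub <- sub[order(sub$rate, sub$hospital_name), ]
  sub$hospital_name[1]
}

best_toy(toy, "TX")  # "A HOSPITAL": tied with B on rate, first alphabetically
```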

best <- function(state, outcome) {
    # Read outcome data
    dt <- data.table::fread("data/outcome-of-care-measures.csv")
    
    # change outcome to lowercase
    outcome <- tolower(outcome)
    
    # change variable name to prevent confusion
    chosen_state <- state

    # Check that state and outcome are valid; stop with an error if not
    if (!chosen_state %in% unique(dt[["State"]])) {
        stop("Invalid state")
    }
    
    if (!outcome %in% c("heart attack", "heart failure", "pneumonia")) {
        stop("Invalid outcome")
    }

    dt <- dt %>% 
        rename_with(~ tolower(gsub("^Hospital 30-Day Death \\(Mortality\\) Rates from ", "", .x))) %>%
        filter(state == chosen_state) %>%
        mutate(rate = suppressWarnings(as.numeric(.data[[outcome]]))) %>%
        clean_names() %>%
        select(hospital_name, state, rate) %>%
        filter(complete.cases(.)) %>%
        arrange(rate, hospital_name) %>%
        mutate(rank = row_number())  
    
    unlist(dt[1,1])
}
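An alternative to renaming columns with a regular expression and then selecting the rate column by its shortened name is an explicit lookup table from outcome name to column header. This is a sketch, not the author's approach: outcome_cols and outcome_column() are hypothetical names, and the full headers are assumed to be the ones that the gsub() pattern in best() shortens.

```r
# Hypothetical lookup table: outcome name -> full column header (headers assumed)
outcome_cols <- c(
  "heart attack"  = "Hospital 30-Day Death (Mortality) Rates from Heart Attack",
  "heart failure" = "Hospital 30-Day Death (Mortality) Rates from Heart Failure",
  "pneumonia"     = "Hospital 30-Day Death (Mortality) Rates from Pneumonia"
)

# outcome_column(): hypothetical helper that validates the outcome (case-insensitively)
# and returns the matching header, stopping with the same message best() uses
outcome_column <- function(outcome) {
  outcome <- tolower(outcome)
  if (!outcome %in% names(outcome_cols)) {
    stop("Invalid outcome")
  }
  outcome_cols[[outcome]]
}

outcome_column("Pneumonia")
```

Validation and column lookup then live in one place, and the returned header can be used with dt[[...]] or the .data pronoun without renaming any columns.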

Sample outputs

best("TX", "heart attack")
##                      hospital_name 
## "CYPRESS FAIRBANKS MEDICAL CENTER"
best("MD", "pneumonia")
##                      hospital_name 
## "GREATER BALTIMORE MEDICAL CENTER"

3. Ranking hospitals by outcome in a state

Write a function called rankhospital that takes three arguments: the 2-character abbreviated name of a state (state), an outcome (outcome), and the ranking of a hospital in that state for that outcome (num). The function reads the outcome-of-care-measures.csv file and returns a character vector with the name of the hospital that has the ranking specified by the num argument. For example, the call rankhospital(“MD”, “heart failure”, 5) would return a character vector containing the name of the hospital with the 5th lowest 30-day death rate for heart failure. The num argument can take values “best”, “worst”, or an integer indicating the ranking (smaller numbers are better). If the number given by num is larger than the number of hospitals in that state, then the function should return NA. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
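The three kinds of num value can be normalised into a single row position before any indexing. resolve_rank() is a hypothetical helper written for this sketch; it returns NA when the requested rank does not exist, which is what the assignment asks rankhospital to return in that case:

```r
# resolve_rank(): hypothetical helper. Maps num ("best", "worst", or an integer)
# to a row position among n ranked hospitals; NA means no hospital at that rank.
resolve_rank <- function(num, n) {
  if (n < 1) return(NA_integer_)                 # no ranked hospitals at all
  if (identical(num, "best")) return(1L)
  if (identical(num, "worst")) return(as.integer(n))
  pos <- suppressWarnings(as.integer(num))       # non-numeric input becomes NA
  if (is.na(pos) || pos < 1 || pos > n) NA_integer_ else pos
}

ranked <- c("A HOSPITAL", "B HOSPITAL", "C HOSPITAL")  # toy names, already in rank order
ranked[resolve_rank("worst", length(ranked))]  # "C HOSPITAL"
ranked[resolve_rank(5, length(ranked))]        # NA
```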

rankHospital <- function(state, outcome, num="best") {
    # Read outcome data
    dt <- data.table::fread("data/outcome-of-care-measures.csv")
    
    # change outcome to lowercase
    outcome <- tolower(outcome)
    
    # change variable name to prevent confusion
    chosen_state <- state

    # Check that state and outcome are valid; stop with an error if not
    if (!chosen_state %in% unique(dt[["State"]])) {
        stop("Invalid state")
    }
    if (!outcome %in% c("heart attack", "heart failure", "pneumonia")) {
        stop("Invalid outcome")
    }
    
    dt <- dt %>% 
        rename_with(~ tolower(gsub("^Hospital 30-Day Death \\(Mortality\\) Rates from ", "", .x))) %>%
        filter(state == chosen_state) %>%
        mutate(rate = suppressWarnings(as.numeric(.data[[outcome]]))) %>%
        clean_names() %>%
        select(hospital_name, state, rate) %>%
        filter(complete.cases(.)) %>%
        arrange(rate, hospital_name) %>%
        mutate(rank = row_number())  

    # hospital names in rank order; indexing past the end returns NA, as the assignment requires
    hospitals <- dt[["hospital_name"]]

    if (num == "best") {
        hospitals[1]
    } else if (num == "worst") {
        hospitals[length(hospitals)]
    } else {
        hospitals[as.integer(num)]
    }
}

Sample outputs

rankHospital("TX", "heart failure", "best")
## [1] "FORT DUNCAN MEDICAL CENTER"
rankHospital("MD", "heart attack", "worst")
## [1] "HARFORD MEMORIAL HOSPITAL"
rankHospital("MN", "heart attack", 5000) 
## [1] NA

4. Ranking hospitals in all states

Write a function called rankall that takes two arguments: an outcome name (outcome) and a hospital ranking (num). The function reads the outcome-of-care-measures.csv file and returns a 2-column data frame containing the hospital in each state that has the ranking specified in num. For example, the function call rankall(“heart attack”, “best”) would return a data frame containing the names of the hospitals that are the best in their respective states for 30-day heart attack death rates. The function should return a value for every state (some may be NA). The first column in the data frame is named hospital, which contains the hospital name, and the second column is named state, which contains the 2-character abbreviation for the state name. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
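One way to guarantee a row for every state, with NA where a state has too few hospitals, is to split the ranked names by state and index each group, since indexing past the end of a vector in R yields NA. The toy data and rank_all_toy() are hypothetical, and this sketch handles only numeric positions ("best" corresponds to position 1):

```r
# Made-up data, already sorted by state, rate, and hospital name
ranked_df <- data.frame(
  hospital_name = c("A1", "A2", "B1", "C1", "C2", "C3"),
  state         = c("AK", "AK", "AL", "AR", "AR", "AR"),
  stringsAsFactors = FALSE
)

# rank_all_toy(): hypothetical helper. Picks position `pos` within every state;
# states with fewer than `pos` hospitals get NA instead of being dropped.
rank_all_toy <- function(df, pos) {
  by_state <- split(df$hospital_name, df$state)   # one character vector per state
  data.frame(
    hospital = vapply(by_state, function(h) h[pos], character(1)),
    state    = names(by_state),
    stringsAsFactors = FALSE
  )
}

rank_all_toy(ranked_df, 2)  # AK -> "A2", AL -> NA, AR -> "C2"
```

The column order (hospital, then state) follows the layout the assignment describes.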

rankAll <- function(outcome, num = "best") {
    # Read outcome data
    dt <- data.table::fread("data/outcome-of-care-measures.csv")
    
    # change outcome to lowercase
    outcome <- tolower(outcome)
    
    # check if outcome is valid
    if (!outcome %in% c("heart attack", "heart failure", "pneumonia")) {
        stop("Invalid outcome")
    }
    
    dt <- dt %>% 
        rename_with(~ tolower(gsub("^Hospital 30-Day Death \\(Mortality\\) Rates from ", "", .x))) %>%
        mutate(rate = suppressWarnings(as.numeric(.data[[outcome]]))) %>%
        clean_names() %>%
        select(hospital_name, state, rate) %>%
        filter(complete.cases(.)) %>%
        # sort by state, then rate, breaking ties alphabetically by hospital name
        arrange(state, rate, hospital_name) %>%
        group_by(state) %>%
        mutate(rank = row_number()) 
    
    if (num == "best") {
        dt %>% 
            filter(rank == 1) %>%
            select(hospital_name, state)
    }
    
    else if (num == "worst") {
        dt %>%
            group_by(state) %>%
            filter(rank == max(rank)) %>%
            select(hospital_name, state)
    }
    
    else {
        dt %>%
            group_by(state) %>%
            filter(rank == num) %>%
            select(hospital_name, state)
    }
}
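Sorting the hospitals and numbering them within each state, as rankAll() does with arrange() and row_number(), has a direct base R counterpart using order() and ave(). A sketch on made-up data:

```r
# Made-up data with two states and a tie on rate in TX
df <- data.frame(
  hospital_name = c("Z", "A", "M", "B"),
  state         = c("TX", "TX", "TX", "MD"),
  rate          = c(11.0, 11.0, 9.0, 10.0),
  stringsAsFactors = FALSE
)

# Sort once on all three keys: state, then rate, then name (alphabetical tie-break)
df <- df[order(df$state, df$rate, df$hospital_name), ]

# Number rows 1, 2, ... within each state, like group_by(state) %>% mutate(rank = row_number())
df$rank <- ave(seq_len(nrow(df)), df$state, FUN = seq_along)
df
```

Sorting once on all three keys avoids depending on whether a later sort preserves the order produced by an earlier one.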

Sample outputs

head(rankAll("heart attack", 20), 5)
## # A tibble: 5 x 2
## # Groups:   state [5]
##   hospital_name                       state
##   <chr>                               <chr>
## 1 D W MCMILLAN MEMORIAL HOSPITAL      AL   
## 2 ARKANSAS METHODIST MEDICAL CENTER   AR   
## 3 JOHN C LINCOLN DEER VALLEY HOSPITAL AZ   
## 4 SHERMAN OAKS HOSPITAL               CA   
## 5 SKY RIDGE MEDICAL CENTER            CO
tail(rankAll("pneumonia", "worst"), 3)
## # A tibble: 3 x 2
## # Groups:   state [3]
##   hospital_name                              state
##   <chr>                                      <chr>
## 1 MAYO CLINIC HEALTH SYSTEM - NORTHLAND, INC WI   
## 2 PLATEAU MEDICAL CENTER                     WV   
## 3 NORTH BIG HORN HOSPITAL DISTRICT           WY
tail(rankAll("heart failure"), 10)
## # A tibble: 10 x 2
## # Groups:   state [10]
##    hospital_name                                                     state
##    <chr>                                                             <chr>
##  1 WELLMONT HAWKINS COUNTY MEMORIAL HOSPITAL                         TN   
##  2 FORT DUNCAN MEDICAL CENTER                                        TX   
##  3 VA SALT LAKE CITY HEALTHCARE - GEORGE E. WAHLEN VA MEDICAL CENTER UT   
##  4 SENTARA POTOMAC HOSPITAL                                          VA   
##  5 GOV JUAN F LUIS HOSPITAL & MEDICAL CTR                            VI   
##  6 SPRINGFIELD HOSPITAL                                              VT   
##  7 HARBORVIEW MEDICAL CENTER                                         WA   
##  8 AURORA ST LUKES MEDICAL CENTER                                    WI   
##  9 FAIRMONT GENERAL HOSPITAL                                         WV   
## 10 CHEYENNE VA MEDICAL CENTER                                        WY

Session info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] janitor_2.0.1     ggplot2_3.3.2     dplyr_1.0.2       data.table_1.13.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5       pillar_1.4.7     compiler_4.0.2   tools_4.0.2     
##  [5] digest_0.6.27    bit_4.0.4        lubridate_1.7.9  evaluate_0.14   
##  [9] lifecycle_0.2.0  tibble_3.0.4     gtable_0.3.0     pkgconfig_2.0.3 
## [13] rlang_0.4.8      cli_2.2.0        yaml_2.2.1       xfun_0.19       
## [17] withr_2.3.0      stringr_1.4.0    knitr_1.30       generics_0.1.0  
## [21] vctrs_0.3.5      bit64_4.0.5      grid_4.0.2       tidyselect_1.1.0
## [25] glue_1.4.2       snakecase_0.11.0 R6_2.5.0         fansi_0.4.1     
## [29] rmarkdown_2.5    purrr_0.3.4      magrittr_2.0.1   scales_1.1.1    
## [33] ellipsis_0.3.1   htmltools_0.5.0  assertthat_0.2.1 colorspace_2.0-0
## [37] utf8_1.1.4       stringi_1.5.3    munsell_0.5.0    crayon_1.3.4