Data Adventures II

Data contents
Tidy data activity
Data exploration
Clean-up

Data contents

The inpatient dataset contains the following variables:

Variable	Description
`DRG Definition`	The code and description identifying the MS-DRG. MS-DRGs are a classification system that groups similar clinical conditions (diagnoses) and the procedures furnished by the hospital during the stay.
`Provider Id`	The CMS Certification Number (CCN) assigned to the Medicare certified hospital facility.
`Provider Name`	The name of the provider.
`Provider Street Address`	The provider’s street address.
`Provider City`	The city where the provider is located.
`Provider State`	The state where the provider is located.
`Provider Zip Code`	The provider’s zip code.
`Provider HRR`	The Hospital Referral Region (HRR) where the provider is located.
`Total Discharges`	The number of discharges billed by the provider for inpatient hospital services.
`Average Covered Charges`	The provider’s average charge for services covered by Medicare for all discharges in the MS-DRG. These will vary from hospital to hospital because of differences in hospital charge structures.
`Average Total Payments`	The average total payments to all providers for the MS-DRG including the MS-DRG amount, teaching, disproportionate share, capital, and outlier payments for all cases. Also included in average total payments are co-payment and deductible amounts that the patient is responsible for and any additional payments by third parties for coordination of benefits.
`Average Medicare Payments`	The average amount that Medicare pays to the provider for Medicare’s share of the MS-DRG. Average Medicare payment amounts include the MS-DRG amount, teaching, disproportionate share, capital, and outlier payments for all cases. Medicare payments DO NOT include beneficiary co-payments and deductible amounts nor any additional payments from third parties for coordination of benefits.

Tidy data activity

For this short activity, you will first think about a question on your own and then share your thoughts with your group. Your group members are the other people sitting at your table (there should be at least three students per group). After each group member has shared, the group will come to an agreement on an answer.

GitHub Issue Threads

To catalog everyone’s responses to this activity, you will post in a GitHub Issue created for your group. The links to the six threads are:

Exercise: on your own (2–3 minutes)

Spend a couple of minutes reading about the variables in the inpatient dataset in the Data contents section. In addition, run inpatient in an R code block by itself so that you can look at some of the rows in the dataset. Identify any columns that do not appear to satisfy the principles of tidy data. When the instructor tells you to, post them in your group’s GitHub Issue thread.

Exercise: in your groups (10 minutes)

Have each group member share what they posted on the group discussion thread. After sharing, begin discussing with your group members which columns do not satisfy the tidy data rules. Come to an agreement on which columns need to be made tidy. Then, brainstorm which of these tidyr functions are needed for this task: gather(), spread(), separate(), and unite().

Select a group member to post on the group discussion thread which columns you all agree should be made tidy, what type of dataset reshaping you need to do to make it tidy (describe this using plain English), and which tidyr functions you need to accomplish this job.
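As a reminder of what the four tidyr functions named above do, here is a minimal sketch on toy data. The data frames and column names (wide, ids, student, quiz_1, label, and so on) are invented for illustration and have nothing to do with the inpatient dataset, so deciding which functions apply to it is still up to your group.

```r
library(tidyr)

# Hypothetical toy data, unrelated to the inpatient dataset
wide <- data.frame(
  student = c("A", "B"),
  quiz_1 = c(8, 6),
  quiz_2 = c(9, 7),
  stringsAsFactors = FALSE
)

# gather(): collapse several columns into key-value pairs (wide to long)
long <- gather(wide, key = "quiz", value = "score", quiz_1, quiz_2)

# spread(): the inverse, turning a key column back into separate columns
wide_again <- spread(long, key = quiz, value = score)

# Second hypothetical data frame with two pieces of information in one column
ids <- data.frame(
  label = c("101 - Alpha", "102 - Beta"),
  stringsAsFactors = FALSE
)

# separate(): split one column into several at a separator
split_ids <- separate(ids, label, into = c("code", "name"), sep = " - ")

# unite(): paste several columns back together into one
joined <- unite(split_ids, "label", code, name, sep = " - ")
```

Note that gather() and spread() undo each other, as do separate() and unite(), which is why wide_again matches wide and joined matches ids.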

Data tidying demo

Follow along with the instructor to see how to complete the tidying phase for this dataset.

Data exploration

It is good practice to begin the data exploration phase by asking some simple questions about the dataset. We will take that approach for this in-class activity. Today, you will be shown examples of the kinds of questions you can ask while exploring a dataset. For each one, you will then think about how you might go about answering it.

Question: What is the relationship between what hospitals bill to Medicare versus what they get reimbursed?

Looking at the table in Data contents, we remind ourselves that Average Covered Charges (now renamed average_covered_charges) is what the hospital bills for a procedure and that Average Medicare Payments (now renamed average_medicare_payments) is what Medicare actually reimburses.

Exercise: In your group

Discuss with your group what kind of visualization you could make with ggplot2 to help answer this question. The visualization should be built using the two variables mentioned above. Come to an agreement, and then post the code snippet you agree on in your group discussion thread (note that you can use backticks to create code blocks on GitHub too).
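One possible starting point, offered as a sketch and not as the expected answer, is a scatter plot with one variable on each axis. The toy_inpatient data frame below is made up so the snippet runs on its own, and the column names assume the renamed variables are spelled average_covered_charges and average_medicare_payments. In your own file you would use the tidied inpatient dataset.

```r
library(ggplot2)

# Hypothetical values standing in for the tidied inpatient dataset
toy_inpatient <- data.frame(
  average_covered_charges = c(20000, 35000, 50000, 80000),
  average_medicare_payments = c(5000, 6500, 9000, 11000)
)

# Scatter plot: one point per row, billed amount on x, reimbursement on y
charges_vs_payments <- ggplot(
  toy_inpatient,
  aes(x = average_covered_charges, y = average_medicare_payments)
) +
  geom_point(alpha = 0.5) +  # partial transparency helps when points overlap
  labs(x = "Average covered charges", y = "Average Medicare payments")

# Printing the object draws the plot
charges_vs_payments
```

Your group may prefer a different geom or a transformed axis. The point is only that both variables are continuous, so they can each be mapped to a position aesthetic.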

Class discussion

After all groups have submitted their agreed-upon code snippets to the discussion thread, the instructor will review the answers.

Be sure to update/annotate your personal RMarkdown file to show the visualization.

Question: For each diagnosis, what fraction of the hospital’s billing price does Medicare actually reimburse?

To calculate the reimbursement fraction, we add a new column to the dataset, which we’ll name reimbursement_fraction, by dividing the average Medicare payments column by the average covered charges column.
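The division described above can be sketched as follows. This assumes dplyr is available and uses a made-up toy_inpatient data frame with the renamed column names. The instructor-led demo may take a different route.

```r
library(dplyr)

# Hypothetical values standing in for the tidied inpatient dataset
toy_inpatient <- data.frame(
  average_covered_charges = c(20000, 50000, 80000),
  average_medicare_payments = c(5000, 10000, 8000)
)

# mutate() adds a column computed row by row from existing columns
toy_inpatient <- toy_inpatient %>%
  mutate(
    reimbursement_fraction = average_medicare_payments / average_covered_charges
  )
```

For these toy rows the new column holds 0.25, 0.2, and 0.1, that is, 5000/20000, 10000/50000, and 8000/80000.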

Instructor-led demo of calculation

Follow along and write the working code in your RMarkdown file.

Exercise: on your own

For the column reimbursement_fraction that we just created, explain in a sentence what a value close to 0 means. Using a second sentence, explain what a value close to 1 means.

Question: Do all hospitals across the country charge Medicare approximately the same amount for joint replacements? Does Medicare reimburse uniformly for the same procedure?

Joint replacements are coded as MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY in the diagnosis column.

Exercise: on your own

Spend a few minutes thinking about how to filter the dataset so that only the entries corresponding to joint replacement remain. Write one line of code that does this and assign the resulting dataset to the variable inpatient_joint_replacement.
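If you get stuck, the general shape of such a filter can be sketched on toy data. Everything here is an assumption made for illustration: the toy_inpatient rows are invented, drg_description is a hypothetical name for the diagnosis column, and dplyr is assumed to be loaded. The sketch matches on a substring, so rows whose description carries extra text after the joint replacement label are kept as well.

```r
library(dplyr)

# Hypothetical rows; drg_description is an assumed name for the diagnosis column
toy_inpatient <- data.frame(
  drg_description = c(
    "MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC",
    "SIMPLE PNEUMONIA & PLEURISY W CC",
    "MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W MCC"
  ),
  average_covered_charges = c(50000, 30000, 80000),
  stringsAsFactors = FALSE
)

joint_label <- "MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY"

# Keep only rows whose description contains the label (literal match, no regex)
inpatient_joint_replacement <- toy_inpatient %>%
  filter(grepl(joint_label, drg_description, fixed = TRUE))
```

With the real dataset you would replace toy_inpatient and drg_description with the names produced during the tidying demo.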

Clean-up

Don’t forget to save, stage, commit, and push your code to GitHub before leaving class! You will be using your RMarkdown file again on Tuesday, October 10. Your participation in the group discussion threads during class as well as your personal RMarkdown document will be graded as an assignment.

Data Adventures II

October 4, 2017
