The `inpatient` dataset contains the following variables:
Variable | Description |
---|---|
DRG Definition | The code and description identifying the MS-DRG. MS-DRGs are a classification system that groups similar clinical conditions (diagnoses) and the procedures furnished by the hospital during the stay. |
Provider Id | The CMS Certification Number (CCN) assigned to the Medicare certified hospital facility. |
Provider Name | The name of the provider. |
Provider Street Address | The provider’s street address. |
Provider City | The city where the provider is located. |
Provider State | The state where the provider is located. |
Provider Zip Code | The provider’s zip code. |
Provider HRR | The Hospital Referral Region (HRR) where the provider is located. |
Total Discharges | The number of discharges billed by the provider for inpatient hospital services. |
Average Covered Charges | The provider’s average charge for services covered by Medicare for all discharges in the MS-DRG. These will vary from hospital to hospital because of differences in hospital charge structures. |
Average Total Payments | The average total payments to all providers for the MS-DRG including the MS-DRG amount, teaching, disproportionate share, capital, and outlier payments for all cases. Also included in average total payments are co-payment and deductible amounts that the patient is responsible for and any additional payments by third parties for coordination of benefits. |
Average Medicare Payments | The average amount that Medicare pays to the provider for Medicare’s share of the MS-DRG. Average Medicare payment amounts include the MS-DRG amount, teaching, disproportionate share, capital, and outlier payments for all cases. Medicare payments DO NOT include beneficiary co-payments and deductible amounts nor any additional payments from third parties for coordination of benefits. |
For this short activity, you will first think about a question on your own and then share your thoughts with a group. Your group members are the other people sitting at your table (there should be at least three students per group). After every group member has shared their thoughts, your group will come to an agreement on an answer.
To catalog everyone’s responses to this activity, you will be posting in a GitHub Issue created for your group. The links to the six threads are:
Spend a couple of minutes reading about the variables in the `inpatient` dataset in the Data contents section. In addition, run `inpatient` in an R code block by itself so that you can look at some of the rows in the dataset. Identify any columns that do not appear to satisfy the principles of tidy data. When told by the instructor, post the columns you identified in your group’s GitHub Issue thread.
Have each group member share what he or she posted on the group discussion thread. After sharing, begin discussing with your group members which columns do not satisfy the tidy data rules. Come to an agreement on which columns need to be made tidy. Then, brainstorm which of these tidyr functions are needed for this task: `gather()`, `spread()`, `separate()`, and `unite()`.
Select a group member to post on the group discussion thread which columns you all agree should be made tidy, what type of dataset reshaping is needed to make them tidy (describe this in plain English), and which tidyr functions you need to accomplish this job.
Follow along with the instructor to see how to complete the tidying phase for this dataset.
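The instructor's tidying steps may differ, but as one hedged illustration: if the DRG Definition column stores a code and a description joined by `" - "` (an assumption; check it against the rows you printed), `separate()` can split it into two columns. The snake_case column names `drg_definition`, `drg_code`, and `diagnosis`, and the toy rows below, are hypothetical stand-ins rather than the real data.

```r
library(tibble)
library(tidyr)

# Hypothetical toy rows mimicking a "code - description" column;
# these codes and descriptions are made up for illustration only.
drg_example <- tibble(
  drg_definition = c("001 - EXAMPLE DIAGNOSIS A", "002 - EXAMPLE DIAGNOSIS B"),
  total_discharges = c(10, 25)
)

# Split the combined column at the " - " separator. The new columns
# replace drg_definition in place; drg_code stays character because
# convert = FALSE by default.
drg_tidy <- separate(
  drg_example,
  col = drg_definition,
  into = c("drg_code", "diagnosis"),
  sep = " - "
)

drg_tidy
```

If a description itself contained another `" - "`, `separate()` would warn and drop the extra piece by default; passing `extra = "merge"` keeps everything after the first separator in `diagnosis`. `unite()` performs the reverse operation, pasting columns back together.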
It is good practice to begin the data exploration phase by asking some simple questions about the dataset, and we will take that approach in this in-class activity. Today you will be shown examples of the kinds of questions you can ask while exploring a dataset, and you will then think about how you might go about addressing each one.
Looking at the table in Data contents, we remind ourselves that Average Covered Charges (now renamed `average_covered_charges`) is what the hospital bills for a procedure and that Average Medicare Payments (now renamed `average_medicare_payments`) is what Medicare actually reimburses.
Discuss with your group what kind of visualization you could make with ggplot2 that could help you answer this question. The visualization should be built using the two variables mentioned above. Come to an agreement, and then post the code snippet you agree on in your group discussion thread (note: you can use backticks to create code blocks on GitHub too).
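Your group may settle on a different plot, but as a hedged starting point, a scatterplot places one variable on each axis so you can see how payments track charges. The toy values below are hypothetical; on the real data you would pass `inpatient` instead, assuming the renamed columns `average_covered_charges` and `average_medicare_payments`.

```r
library(tibble)
library(ggplot2)

# Hypothetical toy values standing in for the real inpatient columns.
plot_example <- tibble(
  average_covered_charges = c(30000, 45000, 60000),
  average_medicare_payments = c(8000, 9500, 12000)
)

# One point per row: billed charges on the x-axis,
# Medicare payments on the y-axis.
p <- ggplot(plot_example) +
  geom_point(mapping = aes(x = average_covered_charges, y = average_medicare_payments)) +
  labs(x = "Average covered charges (USD)", y = "Average Medicare payments (USD)")

p
```

With many rows, points may overlap heavily; adding `alpha = 0.3` to `geom_point()` (outside `aes()`) is one common way to make dense regions easier to read.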
After all groups have submitted their agreed-upon code snippets to the discussion thread, the instructor will review the answers.
Be sure to update/annotate your personal RMarkdown file to show the visualization.
To calculate the reimbursement fraction, we add a new column to the dataset, which we’ll name `reimbursement_fraction`, by dividing the average Medicare payments column by the average covered charges column.
Follow along and write the working code in your RMarkdown file.
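A minimal sketch of this step with dplyr's `mutate()`, assuming the renamed columns `average_medicare_payments` and `average_covered_charges`; the two toy rows are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical toy values; the real data would be the inpatient tibble.
payments_example <- tibble(
  average_covered_charges = c(40000, 50000),
  average_medicare_payments = c(10000, 25000)
)

# New column: Medicare payment divided by the amount billed.
payments_example <- payments_example %>%
  mutate(reimbursement_fraction = average_medicare_payments / average_covered_charges)

payments_example
```

For these toy rows the new column holds 0.25 and 0.5. On the real data the same call would read `inpatient <- inpatient %>% mutate(...)`, overwriting `inpatient` with a version that carries the extra column.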
For the column `reimbursement_fraction` that we just created, explain in one sentence what a value close to 0 means. In a second sentence, explain what a value close to 1 means.
Joint replacements are coded as `MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY` under the `diagnosis` column.
Spend a few minutes thinking about how we can filter the dataset so that only entries corresponding to joint replacements remain. Write one line of code that does this and assign the resulting dataset to the variable `inpatient_joint_replacement`.
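One hedged way to do this is dplyr's `filter()` with an exact string comparison on `diagnosis`. The `inpatient_example` tibble below is a hypothetical two-row stand-in; on the real data you would filter `inpatient` itself.

```r
library(tibble)
library(dplyr)

# Hypothetical toy rows; only the first label comes from the activity text.
inpatient_example <- tibble(
  diagnosis = c(
    "MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY",
    "EXAMPLE OTHER DIAGNOSIS"
  ),
  total_discharges = c(120, 45)
)

# Keep only rows whose diagnosis exactly matches the joint replacement label.
inpatient_joint_replacement <- inpatient_example %>%
  filter(diagnosis == "MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY")

inpatient_joint_replacement
```

An exact `==` match only works if the label is stored exactly as written. If the column also held longer variants of it, a pattern match such as `grepl("MAJOR JOINT REPLACEMENT", diagnosis, fixed = TRUE)` inside `filter()` would be a more forgiving alternative.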
Don’t forget to save, stage, commit, and push your code to GitHub before leaving class! You will be using your RMarkdown file again on Tuesday, October 10. Your participation in the group discussion threads during class, together with your personal RMarkdown document, will be graded as an assignment.