Where should one live in the US to minimize uninsured medical costs?

Students analyze aggregate Medicare claim information from US hospitals to answer the question: where should someone live to minimize their medical costs?


The purpose of this assignment is to have students do both the data mining (using a tool such as pandas or R) and the analysis (in report form). The health care data it relies on is fairly clean, but different years report slightly different service codes, so students must decide how to compare costs across years.
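As a rough illustration of the service-code issue, the sketch below keeps only the codes reported in every year and then ranks states by mean payment. It is one possible approach, not the assignment's required method. The column names (`year`, `state`, `service_code`, `avg_payment`) and all values are made up for illustration and do not come from the actual Medicare files.

```python
import pandas as pd

# Toy rows standing in for the real claims files. Column names and
# values are hypothetical, chosen only to make the example runnable.
claims = pd.DataFrame(
    {
        "year": [2015, 2015, 2015, 2016, 2016, 2016],
        "state": ["OH", "NY", "OH", "OH", "NY", "NY"],
        "service_code": ["A", "A", "B", "A", "A", "C"],
        "avg_payment": [100.0, 150.0, 80.0, 110.0, 160.0, 90.0],
    }
)


def codes_in_every_year(df):
    """Return the set of service codes that appear in all years of df."""
    codes_per_year = [set(codes) for _, codes in df.groupby("year")["service_code"]]
    return set.intersection(*codes_per_year)


def mean_payment_by_state(df):
    """Mean payment per state, restricted to codes common to all years."""
    common = codes_in_every_year(df)
    comparable = df[df["service_code"].isin(common)]
    # Sort ascending so the cheapest state comes first.
    return comparable.groupby("state")["avg_payment"].mean().sort_values()


ranking = mean_payment_by_state(claims)
print(ranking)
```

Dropping codes that are missing from some years is the simplest choice. Mapping old codes to new ones would keep more data, but it needs a crosswalk table that this sketch does not assume.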


CSC440 Data Mining & Visualization, Spring 2019



This is the first run of the assignment. Most of the instruction on the software (Python) took place in class, supplemented by weekly readings from a book on Python for data analysis and readings on visualization from a data mining textbook.

Outcome summary

I was hoping students would use a number of visualizations, including geospatial maps, but the Python library we tried to use wasn't working in class. Several students included plots that could have been vastly improved, but we didn't spend much time on plotting in class. About half the students did a very nice deep dive into the data, as I was expecting. Almost all of them analyzed the suggested data in isolation, i.e., without looking at other sources of information. I wasn't expecting that, but I should have, given how the assignment is worded. The formatting in some of the reports was also lacking.

In the future, I would offer students additional data sources (e.g., a standard of living multiplier or something similar). I would also spend more time going over visualizations and appropriate ways of using bar graphs, pie charts, etc. Finally, I would provide models of good and poor analyses for similar research questions.
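To make the idea of a standard of living multiplier concrete, here is a minimal sketch of how such a multiplier might be combined with per-state mean payments. The function name, the state figures, and the multipliers are all hypothetical. Dividing by the multiplier is only one plausible way to adjust, and students could reasonably argue for another.

```python
def adjust_for_cost_of_living(mean_payment, multiplier):
    """Divide each state's mean payment by its multiplier.

    A multiplier of 1.0 is taken to mean the national average; both
    arguments are dicts keyed by state abbreviation.
    """
    missing = set(mean_payment) - set(multiplier)
    if missing:
        raise KeyError(f"no multiplier for: {sorted(missing)}")
    return {state: mean_payment[state] / multiplier[state] for state in mean_payment}


# Hypothetical inputs, not real Medicare or cost-of-living figures.
mean_payment = {"OH": 105.0, "NY": 155.0}
multiplier = {"OH": 0.84, "NY": 1.25}

adjusted = adjust_for_cost_of_living(mean_payment, multiplier)
cheapest_raw = min(mean_payment, key=mean_payment.get)
cheapest_adjusted = min(adjusted, key=adjusted.get)
print(cheapest_raw, cheapest_adjusted)
```

With these made-up numbers the cheapest state changes once the multiplier is applied, which is the kind of effect an analysis that looks at the claims data in isolation would miss.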