Like all powerful technologies, data science isn’t inherently good or bad. The more nefarious and mercenary applications often dominate media coverage because they are sensational and incendiary. Subjects such as intelligence gathering, financial gain and even privacy invasion are the focus of so many articles that projects with the potential to benefit us all are overlooked. Here, we look at projects from the University of Chicago’s Data Science for Social Good (DSSG) program that help make the world a better place.
Mining Electronic Medical Records to Combat Obesity
Heart disease has been the top killer worldwide for the past decade, and the growing epidemic of obesity is projected to affect 42% of Americans by 2030. DSSG, in partnership with North Shore University Health System, aimed to change this by analysing and modelling thousands of Electronic Medical Records (EMRs) collected by the health system.
Childhood obesity is normally identified via growth charts: percentile curves illustrating how height and weight vary across the population. For example, in the chart above, the red dot denotes an 8-year-old boy at 70 pounds, who is heavier than 90% of American boys his age. Currently, there is only one version of these charts for each gender, meaning personal growth spurts and other individual factors are not accounted for. Body mass index (BMI), the ratio of weight in kilograms to height in metres squared, is another common measure used to identify whether an individual is overweight or obese. Although both measures are good at identifying individuals who are already overweight, their major shortcoming is that they are inherently retrospective: they describe a child's current weight status rather than predicting it.
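As a rough illustration of the BMI calculation described above, here is a minimal Python sketch. The function name `bmi` and the constant `LB_TO_KG` are chosen for this example only, and the height of 1.28 m is an assumed value, since the article gives only the boy's weight.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (international definition)


def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index: weight in kg divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# The 70-pound weight comes from the growth-chart example in the text.
# The 1.28 m height is an ASSUMED value for illustration; it is not in the article.
weight_kg = 70 * LB_TO_KG
print(round(bmi(weight_kg, 1.28), 1))
```

For children, a BMI value like this is normally read against age- and sex-specific percentile curves rather than fixed cut-offs, so the number alone does not say whether a child is overweight.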
EMRs for 23,000 children (a sample of which can be seen above), followed over roughly 6 years, formed the input to the obesity analysis. The first step was to explore the data and see how the North Shore records compared to CDC data for the entire US. The comparison is shown in the graph below, where the dotted lines are the CDC figures and the solid lines are the North Shore aggregates, which on average are noticeably higher than those of the general population.
Once the baseline analysis was complete, they examined individual growth curves to determine whether a child being obese by age 5 could indicate whether that individual would be obese in later life.