By Rob Mitchum
Illustration courtesy of Cory Mollet and Juan-Pablo Velez/Open City

“They went from thinking of a project as a data set to the actual people and problems behind them.”
—Rayid Ghani
Director, Data Science for Social Good Fellowship

The offices in downtown Chicago have all the trappings of a fashionable web startup: brightly colored walls covered in Post-it notes, bundles of cords on the floor and ceiling, a well-used ping-pong table.

But a closer look at the graphs taped to the walls and the code scrolling across the monitors reveals a purpose deeper than the usual tech company concerns. Instead of tracking online commerce, these programmers, computer scientists, statisticians, and social scientists are tackling real-world problems in education, health care, disaster relief, and city services.

The vibrant atmosphere was home to the first summer of the Eric and Wendy Schmidt Data Science for Social Good Fellowship, a University of Chicago program that brought 40 undergraduate and graduate students to Chicago. In partnership with nonprofit organizations and government agencies, the fellows are building data-driven tools to improve city services, predict medical emergencies or crime, measure the impact of early-childhood interventions and energy-efficient buildings, and much more.

“The goal for the program was to find people who are interested in data and analytics and want to use those skills to help society,” says Rayid Ghani, fellowship director and researcher with the Computation Institute and the Harris School of Public Policy. “The easy path for these students would have been to go get an internship at an Internet company or a finance startup. They instead wanted to do more; they felt it was important for them to do something that actually had an impact, rather than optimizing ads and selling more clicks.”

Ghani, who joined the University of Chicago earlier this year, was inspired to create the fellowship following his work as chief data scientist for the 2012 Obama for America campaign. Ghani and the campaign's data analytics team created new tools that used social media and email to raise funds, recruit volunteers, register voters and get supporters to the polls. After the election, Ghani sought to transfer that model of data-driven solutions and creativity to broader social issues; Google Chairman Eric Schmidt provided funding and agreed to be an advisor for the fellowship program.

The University of Chicago offered a logical home for the fellowship, with multiple centers focusing on data-driven approaches to policy, urban design, and city operations. Co-organizers of the fellowship included the Computation Institute, the Urban Center for Computation and Data, and Chicago Harris.

“This is one of the few places that I’ve seen around the world where we have the right expertise in the systems sciences and the social sciences to bring together, and answer these questions in a way that's much more holistic than just looking at one or the other,” says Charlie Catlett, UrbanCCD director.

Tapping into an eager tech community

The organizers of the Data Science for Social Good fellowship knew they had struck a nerve when they received more than 550 applications in only two weeks. From that pool, Ghani, Chicago Harris graduate student Matt Gee, alumnus Juan-Pablo Velez, and several other volunteer data scientists from the Chicago community chose the inaugural class of fellows: undergraduate and graduate students with backgrounds in computer science, statistics, public policy, and social science, including some from as far away as Israel and Mexico.

When they arrived in Chicago, the fellows were organized into 12 teams, each working on a project with a nonprofit or government partner. Partners were selected for having real problems with social impact, a willingness to share data and interact with the fellows, and a commitment to using the new tools beyond one summer.

Even with those stringent qualifications, it wasn't difficult to find willing organizations, ranging from the NorthShore University HealthSystem and the Mesa Public School District to the Chicago Department of Streets & Sanitation and the Chicago Transit Authority to the Qatar Computing Research Institute and The Case Foundation.

A broad range of impacts

At a special “data slam” event Aug. 20 at the Gleacher Center, the teams presented colorful Chicago maps, information-rich prototypes, and giant “hairball” networks to explain the fruits of their summer's work.

One team tackled the educational issue of “under-matching,” the missed potential when qualified, underprivileged high school students fail to enroll in high-caliber colleges. Working with Mesa Public Schools and other school districts, fellows Edward Su, Nihar Shah, and Min Xu used data on grades, attendance, discipline, test scores, and more to identify students at high risk of under-matching.

“This way we hope that we can find those students and intervene, preventing them from selling themselves short, and redirecting them toward brighter futures,” says Su, a graduate student at MIT.

Another team (fellows Kayla Jacobs, Nathan Leiby, and Kwang-Sung Jun, and mentor Elena Eneva) used computational techniques to help sort text messages and tweets during crises and emergencies. Their partner, the nonprofit organization Ushahidi, which launched in response to the violence following the 2007 elections in Kenya, offers an open-source tool that maps user-submitted reports during natural disasters, environmental emergencies, or other situations in which information is scarce. But since most of these submissions must still be read and sorted manually by human volunteers, the fellows are developing a tool that automatically categorizes messages by language, location, and other criteria.

“We think our impact is to take a system that is important and useful, but has trouble scaling, and use a computer to provide that scale,” says Leiby, a software developer who has worked on projects in India and Haiti. “So instead of spending their time processing the reports, the volunteers can respond to them.”

Closer to home, fellows worked with data from the City of Chicago to evaluate and improve city services such as trash collection and streetlight repair, while another team created a tool to help Divvy, Chicago's recently launched bike-share program, anticipate when bike stations will become overcrowded or empty. Another project built a model for the Cook County Land Bank that suggests which abandoned properties to purchase and redevelop for the greatest positive impact on surrounding neighborhoods.

A new language of data and social good

As the projects took shape, the fellows not only learned their way around new software, analytic methods, and programming languages, but also learned how to work with partners and understand the specific problems they face.

“It’s great to be able to hang out at the office and chat all day about machine learning or what binary classifier I want to use,” says Jacobs, a graduate student in computer science at Technion – Israel Institute of Technology. “But in order to make those solutions actually have an impact, it's important that the stakeholders, that the people using those solutions understand what they're getting and why it matters to them.”

Beyond the predictive models and analytic tools that will result from this summer's work, Ghani hopes that the fellowship will help seed a new international community of data scientists ready to attack social issues and work with social good organizations.

"A lot of the fellows, when they came in, really cared a lot about technical problems,” Ghani says. "But they went from thinking of a project as a data set to the actual people and problems behind them and the impact. I think that's what’s needed to create this set of people who care about the problem not just as a technical problem, but as something that has a much broader social impact.”

Originally published on September 9, 2013.