Wherever you look, there is talk of the revolution being brought about by "Big Data." But is Big Data just a fad, as its critics contend, or is there something at its core that is here to stay? Well before the term was coined, particle physicists faced Big Data challenges driven by their large-scale detectors and the datasets those detectors produced. Now, however, we live in a world where not just the amount but also the diversity and complexity of digital information continue to grow exponentially. Materials simulations and astronomy images are pushing the boundaries of exploration. Social networks enable the exchange of information between people; medical devices and e-commerce record the exchange of information between people and machines; GPS devices and bar code scanners allow the exchange of information between machines.
Importantly, “bigger is not just better, it is different.” This is what researchers and entrepreneurs across the academic, government, and private sectors are realizing as the “Big Data” revolution continues to unfold and affect our process of scientific inquiry. Figuring out how to efficiently, effectively, and reliably extract new knowledge from big and/or complex datasets presents us with new algorithmic and practical challenges, but also enables us to ask new questions.
At Northwestern, we address Big Data from a different perspective. That is why we prefer the term "Data Science" to Big Data. We contend that Data Science is not about the absolute size of the data but about a change in scale. Indeed, we believe that all knowledge-creation fields are being transformed by the relative increase in scale of the data available to scholars: electronic health records now store millions of charts on millions of patients; astronomy will be revolutionized by a 3.2 billion-pixel camera collecting the equivalent of the whole Netflix movie database every three months for ten years; the Google Books project has made millions of books available in digital form to researchers. All these data, typically tens to hundreds of times larger than what was recently available within disciplines, bring about many opportunities for discovery, but also pose enormous challenges to the uninitiated and even to experts.
Data Science, we believe, is about the opportunities for creating new knowledge brought about by the massive increase in scale of available digital information. Importantly, Data Science offers opportunities across a large number of disciplines. Consider three examples: journalism, sociology, and materials science. Journalists’ access to digital repositories of political campaign contributions, of federal, state, and local contract assignments, or of voting patterns could enable them to identify trends that could restore openness to a society where the number of political, social, and economic transactions has become overwhelming. Sociologists’ ability to conduct Web-based experimental studies involving tens of thousands to millions of individuals will revolutionize our understanding of social processes. Materials scientists’ creation of repositories of material properties for tens of thousands of alloys and meta-materials will enable them to uncover patterns that would be hidden to an individual researcher with knowledge of the properties of a small number of materials.
What really sets Northwestern apart is Chicago’s unmatched conditions for Data Science activities. Chicagoland is home to two world-renowned universities and two national labs. Northwestern has close relations with the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications (NCSA), with the newly funded partnership with UI LABS for digital manufacturing, and with the regional partnership in Advanced Computing. NCSA also acts as the Big Data repository for a large number of projects in the physical sciences in direct collaboration with NU’s StarLight.
The City of Chicago has been at the forefront of making the data it collects available to the public, prompting numerous initiatives by scholars and entrepreneurs. Several initiatives are aimed at using data for social good, something that aligns well with the goals of Northwestern’s Strategic Plan.
Chicagoland has gained a reputation for analytics startups that is rivaled only by that of the Bay Area.
Northwestern's Data Science Initiative was led initially by five faculty members (Amaral, Kalogera, Starren, Uzzi, and Zettelmeyer) representing the four largest schools (Arts and Sciences, Engineering, Management, and Medicine). Currently, the Initiative is led by a Steering Committee, chaired by Amaral, that represents the stakeholders in Data Science at the University.