Big Data vs. Data Science

Wherever you look, there is talk of the revolution being brought about by "Big Data." But is Big Data just a fad, as its critics contend, or is there something at its core that is here to stay? Well before the term was coined, particle physicists faced Big Data challenges posed by their large-scale detectors and the datasets those detectors produced. Now, however, we live in a world where not just the amount but also the diversity and complexity of digital information continue to grow exponentially. Materials simulations and astronomy images are pushing the boundaries of exploration. Social networks enable the exchange of information between people; medical devices and e-commerce record the exchange of information between people and machines; GPS devices and bar code scanners allow the exchange of information between machines.

Importantly, “bigger is not just better, it is different.” This is what researchers and entrepreneurs across the academic, government, and private sectors realize as the “Big Data” revolution continues to unfold and to affect the process of scientific inquiry. Figuring out how to efficiently, effectively, and reliably extract new knowledge from big and/or complex datasets presents us with new algorithmic and practical challenges, but also enables us to ask new questions.

At Northwestern, we address Big Data from a different perspective. That is why we prefer the term "Data Science" to Big Data. We contend that Data Science is not about the absolute size of the data but about a change in scale. Indeed, we believe that all knowledge-creating fields are being transformed by the relative increase in scale of the data available to scholars: electronic health records now store millions of charts on millions of patients; astronomy will be revolutionized by a 3.2 billion-pixel camera collecting the equivalent of the whole Netflix movie database every three months for ten years; the Google Books project has made millions of books available in digital form to researchers. All these data, typically tens to hundreds of times larger than what was recently available within disciplines, bring about many opportunities for discovery, but also pose enormous challenges to the uninitiated and even to experts.

Data Science, we believe, is about the opportunities for creating new knowledge brought about by the massive increase in scale of available digital information. Importantly, Data Science offers opportunities across a large number of disciplines. Consider three examples: journalism, sociology, and materials science. Journalists’ access to digital repositories of political campaign contributions, of federal, state, and local contract assignment, or of voting patterns could enable them to identify trends that could restore openness to a society where the number of political, social, and economic transactions has become overwhelming. Sociologists’ ability to conduct Web-based experimental studies involving tens of thousands to millions of individuals will revolutionize our understanding of social processes. Materials scientists’ creation of repositories of material properties for tens of thousands of alloys and meta-materials will enable them to uncover patterns that would be hidden from an individual researcher with knowledge of the properties of a small number of materials.

The Chicago Advantage

What really sets Northwestern apart are Chicago’s unmatched conditions for Data Science activities. Chicagoland is home to two world-renowned universities and two national labs. Northwestern has close relations with the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications (NCSA), with the newly funded partnership with UI LABS for digital manufacturing, and with the regional partnership in Advanced Computing. NCSA also acts as the Big Data repository for a large number of projects in the physical sciences in direct collaboration with NU’s StarLight.

The City of Chicago has been at the forefront of making the data it collects available to the public, prompting numerous initiatives by scholars and entrepreneurs. Several initiatives are aimed at using data for social good, something that aligns well with the goals of Northwestern’s Strategic Plan.

Chicagoland has gained a reputation for analytics startups that is rivaled only by the Bay Area.

Leadership Group

Northwestern's Data Science Initiative was led initially by five faculty members (Amaral, Kalogera, Starren, Uzzi, and Zettelmeyer) representing the four largest schools (Arts and Sciences, Engineering, Management, and Medicine). Currently, the Initiative is led by a Steering Committee, chaired by Amaral, that represents stakeholders in Data Science at the University.


Luis Amaral
Chemical and Biological Engineering
Northwestern Engineering

Vicky Kalogera
Physics & Astronomy
Weinberg College of Arts and Sciences

Justin Starren
Preventive Medicine-Health
Feinberg School of Medicine

Brian Uzzi
Leadership and Organizational Change
Kellogg School of Management

Florian Zettelmeyer
Marketing
Kellogg School of Management

Larry Birnbaum
Electrical Engineering and Computer Science
Northwestern Engineering

Cate Brinson
Mechanical Engineering
Northwestern Engineering

David Figlio
Education and Social Policy
School of Education and Social Policy

Kim Gray
Civil & Environmental Engineering
Northwestern Engineering

Larry Hedges
Statistics
Weinberg College of Arts and Sciences

Richard Morimoto
Molecular Biosciences
Weinberg College of Arts and Sciences

Beth McNally
Genetic Medicine
Feinberg School of Medicine

Bonnie Spring
Preventive Medicine-Behavioral Medicine
Feinberg School of Medicine