Wherever you look, there is talk of the revolution being brought about by Data Science. But is Data Science just a fad, as its critics contend, or is there something at its core that is here to stay? Well before the term was coined, particle physicists were confronting Big-Data challenges because of the sheer volume of data produced by their large-scale detectors. Now, however, we live in a world where not just the amount, but also the diversity and complexity of digital information continue to grow exponentially. Materials simulations and astronomical images are pushing the boundaries of exploration. Social networks enable the exchange of information between people; medical devices, GPS devices, and bar code scanners enable the exchange of information between machines.
Importantly, “bigger is not just better, it is different.” Researchers and entrepreneurs across the academic, government, and private sectors are coming to this realization as the Data Science revolution continues to unfold and reshape the process of scientific inquiry. Figuring out how to extract new knowledge efficiently, effectively, and reliably from big and/or complex datasets presents new algorithmic and practical challenges, but it also enables us to ask new questions.
At Northwestern, we approach Data Science from a different perspective. We contend that Data Science is not about the absolute size of the data but about a change in scale. Indeed, we believe that every knowledge-creating field is being transformed by the relative increase in the scale of the data available to scholars: electronic health records now store millions of charts on millions of patients; astronomy will be revolutionized by a 3.2-billion-pixel camera collecting the equivalent of the whole Netflix movie database every three months for ten years; and the Google Books project has made millions of books available in digital form to researchers. These datasets, typically tens to hundreds of times larger than what was available within their disciplines until recently, open up many opportunities for discovery, but they also pose enormous challenges to newcomers and even to experts.