Book Review on ‘The FOURTH PARADIGM’
Mazhar K. Laliwala
Assistant Professor in Physics
Gujarat Arts & Science College, Ahmedabad.
Title of the Book: The Fourth Paradigm: Data-Intensive Scientific Discovery
Edited by: Tony Hey, Stewart Tansley, and Kristin Tolle
Publisher: Microsoft Research
The Fourth Paradigm is a continued conversation around data-intensive science, building upon an initial set of essays published in 2009 that is freely available for download at http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf. The book builds on Jim Gray’s vision of data-driven scientific discovery and explores how that vision can be realized. Gray was a Turing Award-winning computer scientist and a researcher and manager of Microsoft Research’s eScience Group, and he was lost at sea in 2007. The original compilation featured over 70 contributors, with 26 invited essays grouped under four topics: Earth and Environment, Health and Wellbeing, Scientific Infrastructure, and Scholarly Communication. ‘The Fourth Paradigm’, a remarkable publication from Microsoft Research, is about discovery based on data-intensive science: a new kind of scientific exploration.
“Hundreds of projects in fields ranging from genomics to computational linguistics to astronomy demonstrate a major shift in the scale at which scientific data are taken, and in how they are processed, shared and communicated to the world. Most significantly, there is a shift in how researchers find meaning in data, with sophisticated algorithms and statistical techniques becoming part of the standard scientific toolkit. The Fourth Paradigm is about this shift, how scientists are dealing with it, and some of the consequences. Its 30 chapters, written by some 70 authors, cover a wide range of aspects of data-intensive science.”
“The book is in four parts. The first two parts are a panorama of the new ways in which data are obtained, through new instruments and large-scale sensor networks. The fields covered range from cosmology to the environment and from healthcare to biology. Most of the chapters in these sections follow a common pattern. Each introduces a complex system of scientific interest (the human brain, the world’s oceans, the global health system and so on) before supplying an explanation of how we are building an instrument or a network of sensors to map out that system comprehensively and, in some cases, to track its real-time behavior.”
In one chapter, for example, we learn about workflows and the impact of workflow tools on data-centric research. We also come across general-purpose, open-source workflow systems for scientific applications, including Taverna, Kepler, Pegasus, and Triana. In another chapter, I learn about the new term ‘eScience’, where “IT meets scientists.”
The book is also about the next step in libraries: digital data libraries. It covers the whole range of issues involved in curating and preserving scientific data and in making it accessible, now and in the future.
I believe that we will soon see a time when data will live forever as archival media, just like paper-based storage, and be publicly accessible in the “cloud” to humans and machines. Only recently have we dared to consider such permanence for data, in the same way we think of “stuff” held in our national libraries and museums! Such permanence still seems far-fetched until you realize that capturing data provenance, including individual researchers’ records and sometimes everything about the researchers themselves, is what libraries insist on and have always tried to do. The “cloud” of magnetic polarizations encoding data and documents in the digital library will become the modern equivalent of the miles of library shelves holding paper and embedded ink particles.
What is envisaged here is an interlinked network of the world’s scientific knowledge in one big database.