Data now stream from daily life: from phones and credit cards and televisions and computers; from the infrastructure of cities; from sensor-equipped buildings, trains, buses, planes, bridges, and factories. The data flow so fast that the total accumulation of the past two years, a zettabyte, dwarfs the prior record of human civilization. “There is a big data revolution,” says Weatherhead University Professor Gary King. But it is not the quantity of data that is revolutionary. “The big data revolution is that now we can do something with the data.”
The revolution lies in improved statistical and computational methods, not in the exponential growth of storage or even computational capacity, King explains. The doubling of computing power every 18 months (Moore’s Law) “is nothing compared to a big algorithm”—a set of rules that can be used to solve a problem a thousand times faster than conventional computational methods could. One colleague, faced with a mountain of data, figured out that he would need a $2-million computer to analyze it. Instead, King and his graduate students came up with an algorithm within two hours that would do the same thing in 20 minutes—on a laptop: a simple example, but illustrative.
New ways of linking datasets have played a large role in generating new insights. And creative approaches to visualizing data (humans are far better than computers at seeing patterns) frequently prove integral to the process of creating knowledge. Many of the tools now being developed can be used across disciplines as seemingly disparate as astronomy and medicine. Among students, there is a huge appetite for the new field. A Harvard course in data science last fall attracted 400 students, from the schools of law, business, government, design, and medicine, as well as from the College, the School of Engineering and Applied Sciences (SEAS), and even MIT. Faculty members have taken note: the Harvard School of Public Health (HSPH) will introduce a new master’s program in computational biology and quantitative genetics next year, likely a precursor to a Ph.D. program. In SEAS, there is talk of organizing a master’s in data science.
“There is a movement of quantification rumbling across fields in academia and science, industry and government and nonprofits,” says King, who directs Harvard’s Institute for Quantitative Social Science (IQSS), a hub of expertise for interdisciplinary projects aimed at solving problems in human society. Among faculty colleagues, he reports, “Half the members of the government department are doing some type of data analysis, along with much of the sociology department and a good fraction of economics, more than half of the School of Public Health, and a lot in the Medical School.” Even law has been seized by the movement to empirical research—“which is social science,” he says. “It is hard to find an area that hasn’t been affected.”
The story follows a similar pattern in every field, King asserts. The leaders are qualitative experts in their field. Then a statistical researcher who doesn’t know the details of the field comes in and, using modern data analysis, adds tremendous insight and value. As an example, he describes how Kevin Quinn, formerly an assistant professor of government at Harvard, ran a contest comparing his statistical model to the qualitative judgments of 87 law professors to see which could best predict the outcome of all the Supreme Court cases in a year. “The law professors knew the jurisprudence and what each of the justices had decided in previous cases, they knew the case law and all the arguments,” King recalls. “Quinn and his collaborator, Andrew Martin [then an associate professor of political science at Washington University], collected six crude variables on a whole lot of previous cases and did an analysis.” King pauses a moment. “I think you know how this is going to end. It was no contest.” Whenever sufficient information can be quantified, modern statistical methods will outperform an individual or small group of people every time.
In marketing, familiar uses of big data include “recommendation engines” like those used by companies such as Netflix and Amazon to make purchase suggestions based on the prior interests of one customer as compared to millions of others. Target famously (or infamously) used an algorithm to detect when women were pregnant by tracking purchases of items such as unscented lotions—and offered special discounts and coupons to those valuable patrons. Credit-card companies have found unusual associations in the course of mining data to evaluate the risk of default: people who buy anti-scuff pads for their furniture, for example, are highly likely to make their payments.
In the public realm, there are all kinds of applications: allocating police resources by predicting where and when crimes are most likely to occur; finding associations between air quality and health; or using genomic analysis to speed the breeding of crops like rice for drought resistance. In more specialized research, to take one example, creating tools to analyze huge datasets in the biological sciences enabled associate professor of organismic and evolutionary biology Pardis Sabeti, studying the human genome’s billions of base pairs, to identify genes that rose to prominence quickly in the course of human evolution, determining traits such as the ability to digest cow’s milk, or resistance to diseases like malaria.
King himself recently developed a tool for analyzing social-media texts. “There are now a billion social-media posts every two days…which represent the largest increase in the capacity of the human race to express itself at any time in the history of the world,” he says. No single person can make sense of what a billion other people are saying. But statistical methods developed by King and his students, who tested his tool on Chinese-language posts, now make that possible. (To learn what he accidentally uncovered about Chinese government censorship practices, see “Reverse-engineering Chinese Censorship.”)
King also designed and implemented “what has been called the largest single experimental design to evaluate a social program in the world, ever,” reports Julio Frenk, dean of HSPH. “My entire career has been guided by the fundamental belief that scientifically derived evidence is the most powerful instrument we have to design enlightened policy and produce a positive social transformation,” says Frenk, who was at the time minister of health for Mexico. When he took office in 2000, more than half that nation’s health expenditures were being paid out of pocket, and each year, four million families were being ruined by catastrophic healthcare expenses. Frenk led a healthcare reform that created, implemented, and then evaluated a new public insurance scheme, Seguro Popular. A requirement to evaluate the program (which he says was projected to cost 1 percent of the GDP of the twelfth-largest economy in the world) was built into the law. So Frenk, with no inkling he would ever come to Harvard, hired “the top person in the world” to conduct the evaluation: Gary King.
Given the complications of running an experiment while the program was in progress, King had to invent new methods for analyzing it. Frenk calls it “great academic work. Seguro Popular has been studied and emulated in dozens of countries around the world thanks to a large extent to the fact that it had this very rigorous research with big data behind it.” King crafted “an incredibly original design,” Frenk explains. Because King compared communities that received public insurance in the first stage (the rollout lasted seven years) to demographically similar communities that hadn’t, the results were “very strong,” Frenk says: any observed effect would be attributable to the program. After just 10 months, King’s study showed that Seguro Popular successfully protected families from catastrophic expenditures due to serious illness, and his work provided guidance for needed improvements, such as public outreach to promote the use of preventive care.
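For readers who want to see the logic of such a comparison in concrete terms, here is a minimal sketch in Python. It is not King's actual method, which the article does not detail. It assumes, purely for illustration, that each community that received the program has been paired with one similar community that had not, and it averages the difference in an outcome across the pairs. The function name and the numbers are hypothetical.

```python
from statistics import mean


def matched_pair_effect(pairs):
    """Average treated-minus-control difference across matched pairs.

    Each item in `pairs` is (treated_outcome, control_outcome) for two
    communities assumed to be demographically similar.
    """
    diffs = [treated - control for treated, control in pairs]
    return mean(diffs)


# Hypothetical outcome: percent of households facing catastrophic
# health spending in each community (made-up numbers, not study data).
pairs = [(5.0, 9.0), (4.0, 7.0), (6.0, 8.0)]
effect = matched_pair_effect(pairs)
print(effect)  # negative means the insured communities fared better
```

Because the paired communities are assumed to be alike in other respects, a consistent gap between them is easier to attribute to the program than a simple before-and-after comparison would be.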
King himself says big data’s potential benefits to society go far beyond what has been accomplished so far. Google has analyzed clusters of search terms by region in the United States to predict flu outbreaks faster than was possible using hospital admission records. “That was a nice demonstration project,” says King, “but it is a tiny fraction of what could be done” if it were possible for academic researchers to access the information held by companies. (Businesses now possess more social-science data than academics do, he notes—a shift from the recent past, when just the opposite was true.) If social scientists could use that material, he says, “We could solve all kinds of problems.” But even in academia, King reports, data are not being shared in many fields. “There are even studies at this university in which you can’t analyze the data unless you make the original collectors of the data co-authors.”
The potential for doing good is perhaps nowhere greater than in public health and medicine, fields in which, King says, “People are literally dying every day” simply because data are not being shared.