Over the last few years, we’ve heard a lot about how “Big Data” (which as far as I can tell is just data mining in a glossy new wrapper) are going to revolutionize science and help us create a better world.* These claims strike me as all too familiar. They remind me of the hype generated in the 1980s by chaos and in the 1990s by complexity (which was just chaos in a glossy new wrapper). Chaos and complexity enthusiasts promised (and are still promising) that ever-more-powerful computers plus jazzy new software and math were going to crack riddles that resisted more traditional scientific methods.
Advances in data collection, computation and search programs have led to impressive gains in certain realms, notably speech recognition, language translation and other traditional problems of artificial intelligence. So some of the enthusiasm for Big Data may turn out to be warranted. But in keeping with my crabby, glass-half-empty persona, in this post I’ll suggest that Big Data might be harming science, by luring smart young people away from the pursuit of scientific truth and toward the pursuit of profits.
My attention was drawn to this issue by a postdoc in neuroscience, whose research involves lots of data crunching. He prefers to remain anonymous, so I’ll call him Fred. After reading my recent remarks on the shakiness of the scientific literature, he wrote me to suggest that I look into a trend that could be exacerbating science’s woes.
“I think the big science journalism story of 2014 will be the brain drain from science to industry ‘data science,’” Fred writes. “Up until a few years ago, at least in my field, the best grad students got jobs as professors, and the less successful grad students took jobs in industry. It is now the reverse. It’s a real trend, and it’s a big deal. One reason is that science tends not to reward the graduate students who are best at developing good software, which is exactly what science needs right now…
“Another reason, especially important for me, is the quality of research in academia and in industry. In academia, the journals tend to want the most interesting results and are not so concerned about whether the results are true. In industry data science, [your] boss just wants the truth. That’s a much more inspiring environment to work in. I like writing code and analyzing data. In industry, I can do that for most of the day. In academia, it seems like faculty have to spend most of their time writing grants and responding to emails.”
Fred sent me a link to a blog post, “The Big Data Brain Drain: Why Science is in Trouble,” that expands on his concerns. The blogger, Jake VanderPlas, a postdoc in astrophysics at the University of Washington, claims that Big Data is, or should be, the future of science. He writes that “in a wide array of academic fields, the ability to effectively process data is superseding other more classical modes of research… From particle physics to genomics to biochemistry to neuroscience to oceanography to atmospheric physics and everywhere in-between, research is increasingly data-driven, and the pace of data collection shows no sign of abating.”
VanderPlas suggests that the growing unreliability of peer-reviewed scientific results, to which I alluded in my last post, may stem in part from the dependence of many research results on poorly written and poorly documented software. The “crisis of irreproducibility” could be ameliorated, VanderPlas contends, by researchers who are adept at data analysis and can share their methods with others.
The problem, VanderPlas says, is that academia is way behind Big Business in recognizing the value of data-analysis talent. “The skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry. While academia, with typical inertia, gradually shifts to accommodate this, the rest of the world has already begun to embrace and reward these skills to a much greater degree. The unfortunate result is that some of the most promising upcoming researchers are finding no place for themselves in the academic community, while the for-profit world of industry stands by with deep pockets and open arms.”
VanderPlas and Fred, who are apparently software whizzes themselves, perhaps overstate the scientific potential of data crunching just a tad. And Fred’s aforementioned claim that industry “just wants the truth” strikes me as almost comically naïve. [**See Fred’s clarification below.] For businesses, peddling products trumps truth, which makes the brain drain described by Fred and VanderPlas even more disturbing.
Fred is a case in point. Increasingly despondent about his prospects in brain research, he signed up for training from Insight Data Science, which trains science Ph.D.s in data-manipulation skills that are desirable to industry (and claims to have a 100 percent job placement record). The investment paid off for Fred, who just got a job at Facebook.
*Should “Big Data” be treated as plural or singular? I polled my students, and they said plural, so I went with plural.
**Re his comment about industry bosses wanting “truth,” “Fred” just emailed me this clarification: “I think there is a distinction, which I perhaps should have made clearer, between ‘marketing’ and ‘analytics.’ When it comes to marketing a product to consumers, I agree it’s pretty obvious that business incentives are not aligned with truth telling. No one disputes that. But when it comes to the business’s internal ‘analytics’ team, the incentives are very aligned with truth telling. Analytics teams do stuff like: determining how users are interacting with the product, measuring trends in user engagement or sales, analyzing failure points in the product. This is the type of work that most data scientists do.”
***A couple of afterthoughts on this topic: First, Lee Vinsel, my Stevens colleague and former friend, points out in a comment on this post that industry has long lured scientists away from academia with promises of filthy lucre and freedom from the grind of tenure-and-grant-chasing. Yup. Wall Street “quants” are just one manifestation of this age-old phenomenon. So what’s new about the Big Data Brain Drain? Does it differ in degree or kind from previous academia-to-business brain drains? Good questions, Lee. I have no idea, but I bet Big Data can provide the answer! (Unless of course it’s subject to some sort of Gödelian limit on self-analysis.)
Second, a fascinating implication of the rise of Big Data is that science may increasingly deliver power (that is, solutions to problems) without understanding. Big Data can, for example, help artificial intelligence researchers build programs that play chess, recognize faces and converse without knowing how human brains accomplish these tasks. The same could be true of problems in biology, physics and other fields. If science doesn’t yield insight, is it really science? (For a smart rebuttal of the notion that Big Data could bring about “the end of theory,” see the blog post by Sabine Hossenfelder.)