I work for Stanford, just down the road at Hopkins Marine Station, on the Tagging of Pacific Predators project. Basically we stick electronic sensor tags on everything from squid to blue whales to take a peek at what they do, and at environmental parameters while they're doing it. In the end, a fair bit of what we find ends up being used to establish policy for various environmental and conservation efforts. We're arguably a "Big Data"-driven project, integrating large geographic information and satellite remote sensing data sets with our own tag data; but I think we all have at least some personal notion of what's meant by "Big Data." So for just a few minutes I'm going to swap my computerist hat for my environmentalist hat and express some concerns centered on what seems like a fundamental question: are analysis methods that work great in, for example, linguistics really applicable when it comes to, say, setting environmental policy?

One thing you'll need to keep in mind here is the contentious nature of conservation work. We have actually received death threats at our leatherback sea turtle research station in Costa Rica. Not, as you might expect, from poachers, who, with a little prodding, are starting to embrace eco-tourism as a more sustainable means of income, but from big-moneyed beachfront development interests with strong ties to major US real estate companies. We even have our own personal little Rovian on-line swift-boat campaign to deal with. What that all boils down to is that we can't afford even the slightest question about data quality, filtering, or analysis if we're to maintain our credibility for making recommendations to policy makers.

That said, some of you may be familiar with our on-line awareness-raising event, "The Great Turtle Race." The race is really an afterthought that came about while we were examining our incoming satellite data streams.
Those arrive from sensors harnessed on sea turtles to collect data for use in confirming a hypothesized turtle migration corridor from Costa Rica past the Galapagos. Long story short, last June we published a report in the on-line journal Public Library of Science. And just three weeks ago, largely due to that report, the IUCN (International Union for Conservation of Nature) put forth a resolution to protect the critical areas as we defined them. So we get to put up cool articles about it on our website, and it's all good. Well, all good in blog world, anyway. In the real world, resolutions are a little more stark, and the problems of handling data become very much like the problems of maintaining the chain of evidence in a criminal proceeding.

For example, here is an issue that could have been used to call our quality control into question and kill our credibility. This is an overlay of temperature-versus-depth records obtained from diving sea turtles. It doesn't look too impressive (it's not), but it exposes an error in an on-board depth data compression algorithm that the sensor manufacturer missed until I spotted it and we pointed it out to them. The error was actually exposed because of accepted models of ocean thermal structure: individual data records look reasonable in and of themselves, but in aggregate they are just "not quite right" according to the models. Which leads to two "Big Data" questions: when can you drop the notion of needing models, and hence the scientific method, from your analysis? And just how far will you go in trusting others to filter your data for you?

That's amusing, yes, but it's also an example of what happens when you turn the scientific method on its head and start looking for reasons for your data instead of reasoning from your data, in essence ignoring an issue extremely important to policy makers: validation, which is another potential "Big Data" issue. Here's something a bit less Orwellian.
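That kind of model-based sanity check can be sketched in a few lines. This is purely illustrative, not our actual QC code: the thermal model, function names, constants, and tolerance below are all stand-in assumptions, with temperature decaying from a warm mixed layer toward cold deep water.

```python
import math

def expected_temp(depth_m):
    """Crude illustrative model of ocean thermal structure: temperature
    decays exponentially from a warm surface layer to cold deep water."""
    surface_c, deep_c, scale_m = 27.0, 4.0, 200.0
    return deep_c + (surface_c - deep_c) * math.exp(-depth_m / scale_m)

def flag_anomalies(records, tolerance_c=3.0):
    """Return (depth_m, temp_c) records deviating from the model by more
    than tolerance_c; in practice such outliers would prompt a closer
    look at the sensor or its on-board compression."""
    return [(d, t) for d, t in records
            if abs(t - expected_temp(d)) > tolerance_c]

# A reading of 30 C at 300 m is a perfectly plausible number on its own,
# but wildly inconsistent with the assumed thermal structure:
dives = [(10.0, 26.5), (100.0, 18.0), (300.0, 30.0)]
print(flag_anomalies(dives))  # only the 300 m record is flagged
```

The point isn't the particular model; it's that without some model in the loop, none of these records would ever look wrong.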
This page is from a project by a gentleman who studies deer migration, and even though his data set is absolutely tiny, he can still be affected by issues at the cloud computing end of "Big Data." His processing system is literally an iPhone, which he uses to collect his satellite tag data; he drops that into Google Spreadsheets for processing, and then into Google Earth for display and assessment. Not much to go wrong. But part of what is being looked at here is how the physical environment affects behavior. That means he has to trust that Google Earth images are always representative of the current situation in his study region. Not a bad bet in this case, but for more dynamic studies, maybe not so good.

And just how trusting can you be? As more and more reliance is put on the correctness of more and more remote data sets, are we setting ourselves up for perhaps even maliciously introduced data problems? Maybe something akin to "Big Data" man-in-the-middle attacks? Certainly our detractors wouldn't hesitate if they had the skills. And, of course, last but not least, what does this gentleman, or any of us, do when we finally get that call, as so eloquently expressed by Mick and Keith some 40-odd years ago?
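One standard, minimal defense against that kind of tampering is to verify a cryptographic digest of a downloaded data set against a digest published over a separate, trusted channel. A sketch using Python's standard library, where the data and digest are of course made up for illustration:

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Digest of a downloaded data set, for comparison against a digest
    the provider publishes separately from the data itself."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_hex: str) -> bool:
    """True only if the data matches the published digest;
    hmac.compare_digest avoids timing side channels."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)

original = b"tag_id,depth_m,temp_c\n42,300,9.1\n"
published = sha256_hex(original)           # imagine this came from the provider
print(verify(original, published))         # True: data intact
print(verify(original + b"x", published))  # False: altered in transit
```

This catches a man in the middle only if the digest itself travels a path the attacker can't touch, which is exactly the kind of assumption that gets harder to make as the data supply chain grows.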