Coders to the rescue for NASA's Earth science data

This story was originally published by Wired and is reproduced here as part of the Climate Desk collaboration.

On Feb. 11, the white stone buildings on UC Berkeley's campus radiated with unfiltered sunshine. The sky was blue, the campanile was chiming. But instead of enjoying the beautiful day, 200 adults had willingly sardined themselves into a fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.

Like similar groups across the country, in more than 20 cities, they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save it outside government servers.

But now they're going even further. Groups like DataRefuge and the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA's Earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they're keeping track of what's been removed to learn exactly when the pruning began.

The data collection is methodical, mostly. About half the group immediately sets web crawlers on easily copied government pages, sending their text to the Internet Archive, a digital library made up of hundreds of billions of snapshots of webpages. They tag more data-intensive projects (pages with lots of links, databases, and interactive graphics) for the other group. Called baggers, these coders write custom scripts to scrape complicated data sets from the sprawling, patched-together federal websites.
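For the crawler half of that workflow, nominating a page for preservation can be as simple as hitting the Internet Archive's public save endpoint. The Python sketch below shows the general idea only: the seed URLs are placeholders, not pages the volunteers actually tagged, and a real archiving run would add error handling and smarter rate limiting.

```python
# A minimal sketch of seeding pages to the Wayback Machine.
# The URLs in SEED_URLS are hypothetical examples.
import time
import requests

SEED_URLS = [
    "https://www.example.gov/climate/report-2016.html",
    "https://www.example.gov/climate/dataset-index.html",
]

def nominate_for_archiving(url):
    """Ask the Wayback Machine to take a fresh snapshot of one page."""
    resp = requests.get(f"https://web.archive.org/save/{url}", timeout=60)
    return resp.status_code

for url in SEED_URLS:
    print(url, "->", nominate_for_archiving(url))
    time.sleep(5)  # be polite to the archive between requests
```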

It's not easy. "All these systems were written piecemeal over the course of 30 years. There's no coherent philosophy to providing data on these websites," says Daniel Roesler, chief technology officer at UtilityAPI and one of the volunteer guides for the Berkeley bagger group.

One coder who goes by Tek ran into a wall trying to download multi-satellite precipitation data from NASA's Goddard Space Flight Center. Starting in August, access to Goddard Earth Science Data required a login. But with a bit of totally legal digging around the site (DataRefuge prohibits outright hacking), Tek found a buried link to the old FTP server. He clicked and started downloading. By the end of the day he had data for all of 2016 and some of 2015. It would take at least another 24 hours to finish.
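A bulk pull like Tek's can be reproduced with nothing more than the standard library's FTP client. The sketch below is illustrative only: the host name and directory are placeholders, not the actual Goddard addresses, and real archives may structure their listings differently.

```python
# A rough sketch of mirroring a public FTP directory tree.
# Host and paths are hypothetical placeholders.
import os
from ftplib import FTP, error_perm

FTP_HOST = "ftp.example.nasa.gov"       # hypothetical host
REMOTE_ROOT = "/precipitation/2016"     # hypothetical directory
LOCAL_ROOT = "rescue/precipitation_2016"

def mirror_directory(ftp, remote_dir, local_dir):
    """Recursively copy remote_dir into local_dir."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    for name in ftp.nlst():
        if name in (".", ".."):
            continue
        remote_path = f"{remote_dir}/{name}"
        local_path = os.path.join(local_dir, name)
        try:
            ftp.cwd(remote_path)                # only succeeds for directories
            mirror_directory(ftp, remote_path, local_path)
            ftp.cwd(remote_dir)                 # return before continuing the loop
        except error_perm:
            with open(local_path, "wb") as fh:  # it's a file: download it
                ftp.retrbinary(f"RETR {name}", fh.write)
            print("saved", local_path)

with FTP(FTP_HOST) as ftp:
    ftp.login()  # anonymous access
    mirror_directory(ftp, REMOTE_ROOT, LOCAL_ROOT)
```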

The non-coders hit dead-ends too. Throughout the morning they racked up 404 "Page not found" errors across NASA's Earth Observing System website. And they more than once ran across empty databases, like the Global Change Data Center's reports archive and one of NASA's atmospheric CO2 datasets.
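That kind of dead-link survey can also be scripted. A link checker along these lines (the catalogued URLs are placeholders) can sweep a list of previously indexed pages and flag anything that now returns a 404 or fails to resolve at all:

```python
# A small sketch of an automated dead-link sweep over a page list.
# The URL below is a hypothetical example, not a real catalogue entry.
import requests

CATALOGUED_PAGES = [
    "https://www.example.gov/eos/dataset-42",
]

for url in CATALOGUED_PAGES:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=30)
        if resp.status_code == 404:
            print("MISSING:", url)
    except requests.RequestException as exc:
        print("UNREACHABLE:", url, exc)
```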

And this is where the real problem lies. They don't know when or why this data disappeared from the web (or if anyone backed it up first). Scientists who understand it better will have to go back and take a look. But in the meantime, DataRefuge and EDGI understand that they need to be monitoring those changes and deletions. That's more work than a human could do.

So they're building software that can do it automatically.

Later that afternoon, two dozen or so of the most advanced software builders gathered around whiteboards, sketching out tools they'll need. They worked out filters to separate mundane updates from major shake-ups, and explored blockchain-like systems to build auditable ledgers of alterations. Basically, it's an issue of what engineers call version control: How do you know if something has changed? How do you know if you have the latest? How do you keep track of the old stuff?
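None of those designs were settled that afternoon, but the core loop they describe is straightforward. As a sketch of the idea (the watch list, file names, and ledger format below are illustrative, not any actual EDGI or DataRefuge tool), a monitor can hash each page it fetches, compare the digest with the last one it recorded, and append every difference to an auditable log:

```python
# A sketch of "version control for websites": hash each watched page,
# compare with the last recorded hash, and log any change with a timestamp.
# The watch list and file names are hypothetical.
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

import requests

WATCHLIST = ["https://www.example.gov/earth-science/index.html"]  # hypothetical
STATE_FILE = Path("page_hashes.json")
LEDGER_FILE = Path("change_ledger.csv")

def content_hash(url):
    """Fetch a page and return a SHA-256 digest of its body."""
    body = requests.get(url, timeout=30).content
    return hashlib.sha256(body).hexdigest()

def check_for_changes():
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    for url in WATCHLIST:
        digest = content_hash(url)
        if previous.get(url) != digest:
            # Record when the page changed and what it now hashes to.
            with LEDGER_FILE.open("a", newline="") as fh:
                csv.writer(fh).writerow(
                    [datetime.now(timezone.utc).isoformat(), url, digest]
                )
        previous[url] = digest
    STATE_FILE.write_text(json.dumps(previous, indent=2))

if __name__ == "__main__":
    check_for_changes()
```

Run on a schedule, a loop like this answers the first of those questions (has something changed?); the ledger of timestamped hashes is what answers the other two.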

There wasn't enough time for anyone to start actually writing code, but a handful of volunteers signed on to build out tools. That's where DataRefuge and EDGI organizers really envision their movement going: a vast, decentralized network spanning all 50 states and Canada. Some volunteers can code tracking software from home. And others can simply archive a little bit every day.

By the end of the day, the group had collectively loaded 8,404 NASA and DOE webpages onto the Internet Archive, effectively covering the entirety of NASA's Earth science efforts. They'd also built backdoors in to download 25 gigabytes from 101 public datasets, and were expecting even more to come in as scripts on some of the larger datasets (like Tek's) finished running. But even as they celebrated over pints of beer at a pub on Euclid Street, the mood was somber.

There was still so much work to do. "Climate change data is just the tip of the iceberg," says Eric Kansa, an anthropologist who manages archaeological data archiving for the nonprofit group Open Context. "There are a huge number of other datasets being threatened, with cultural, historical, sociological information." A panicked friend at the National Park Service had tipped him off to a huge data portal that contains everything from park visitation stats to GIS boundaries to inventories of species. While he sat at the bar, his computer ran scripts to pull out a list of everything in the portal. When it's done, he'll start working his way through each quirky dataset.
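An inventory pass like Kansa's might look something like the sketch below: page through a portal's catalog API and save the list of dataset identifiers to work through later. The endpoint, parameters, and field names here are hypothetical; real portals expose different APIs (CKAN, Socrata, and others), so a working script would be adapted to whichever one the portal actually runs.

```python
# A rough sketch of enumerating everything in a data portal's catalog.
# The endpoint and response fields are hypothetical placeholders.
import json
import requests

CATALOG_URL = "https://data.example.gov/api/catalog"  # placeholder endpoint
PAGE_SIZE = 100

datasets = []
offset = 0
while True:
    resp = requests.get(
        CATALOG_URL,
        params={"limit": PAGE_SIZE, "offset": offset},
        timeout=60,
    )
    resp.raise_for_status()
    page = resp.json().get("results", [])
    if not page:
        break
    datasets.extend(entry["id"] for entry in page)
    offset += PAGE_SIZE

with open("portal_inventory.json", "w") as fh:
    json.dump(datasets, fh, indent=2)
print(f"catalogued {len(datasets)} datasets")
```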
