This story was originally published by Wired and is reproduced here as part of the Climate Desk collaboration.
On Feb. 11, the white stone buildings on UC Berkeley's campus radiated with unfiltered sunshine. The sky was blue, the campanile was chiming. But instead of enjoying the beautiful day, 200 adults had willingly sardined themselves into a fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.
Like similar groups across the country, in more than 20 cities, they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save on servers outside the government.
But now they're going even further. Groups like DataRefuge and the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA's Earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they're keeping track of what's been removed to learn exactly when the pruning began.
The data collection is methodical, mostly. About half the group immediately sets web crawlers on easily copied government pages, sending their text to the Internet Archive, a digital library made up of hundreds of billions of snapshots of webpages. They tag more data-intensive projects (pages with lots of links, databases, and interactive graphics) for the other group. Called baggers, these coders write custom scripts to scrape complicated data sets from the sprawling, patched-together federal websites.
It's not easy. "All these systems were written piecemeal over the course of 30 years. There's no coherent philosophy to providing data on these websites," says Daniel Roesler, chief technology officer at UtilityAPI and one of the volunteer guides for the Berkeley bagger group.
One coder who goes by Tek ran into a wall trying to download multi-satellite precipitation data from NASA's Goddard Space Flight Center. Starting in August, access to Goddard Earth Science Data required a login. But with a bit of totally legal digging around the site (DataRefuge prohibits outright hacking), Tek found a buried link to the old FTP server. He clicked and started downloading. By the end of the day he had data for all of 2016 and some of 2015. It would take at least another 24 hours to finish.
The non-coders hit dead ends too. Throughout the morning they racked up "404 Page not found" errors across NASA's Earth Observing System website. And more than once they ran across empty databases, like the Global Change Data Center's reports archive and one of NASA's atmospheric CO2 datasets.
And this is where the real problem lies. They don't know when or why this data disappeared from the web (or if anyone backed it up first). Scientists who understand it better will have to go back and take a look. But in the meantime, DataRefuge and EDGI understand that they need to be monitoring those changes and deletions. That's more work than a human could do.
So they're building software that can do it automatically.
Later that afternoon, two dozen or so of the most advanced software builders gathered around whiteboards, sketching out tools they'll need. They worked out filters to separate mundane updates from major shake-ups, and explored blockchain-like systems to build auditable ledgers of alterations. Basically it's an issue of what engineers call version control: How do you know if something has changed? How do you know if you have the latest? How do you keep track of the old stuff?
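To make those whiteboard ideas concrete, here is a minimal sketch in Python of how such a tool might work. It is not the volunteers' actual software: the names `fingerprint`, `classify_change`, and `Ledger`, the 0.9 similarity threshold, and the example URL are all assumptions made for illustration. It hashes page snapshots to detect changes, uses a crude text-similarity ratio to separate small edits from large ones, and chains each log entry to the previous one so that later tampering is detectable.

```python
import difflib
import hashlib
import json


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of a page snapshot."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_change(old: str, new: str, threshold: float = 0.9) -> str:
    """Label a change as 'unchanged', 'minor', or 'major'.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    if fingerprint(old) == fingerprint(new):
        return "unchanged"
    # autojunk=False keeps the ratio meaningful for long, repetitive pages.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return "minor" if matcher.ratio() >= threshold else "major"


class Ledger:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(record: dict) -> str:
        payload = json.dumps(record, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, url: str, old: str, new: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        record = {
            "url": url,
            "old": fingerprint(old),
            "new": fingerprint(new),
            "kind": classify_change(old, new),
            "prev_hash": prev_hash,
        }
        record["entry_hash"] = self._hash(record)
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != self._hash(body):
                return False
            prev_hash = entry["entry_hash"]
        return True


if __name__ == "__main__":
    old_page = "Precipitation data for 2016. " * 5
    ledger = Ledger()
    # Hypothetical URL, used only as a label in the log.
    ledger.append("https://example.gov/data", old_page, old_page + "Updated.")
    ledger.append("https://example.gov/data", old_page, "Page not found")
    print([entry["kind"] for entry in ledger.entries], ledger.verify())
```

A real monitor would also need to fetch pages on a schedule and store the snapshots themselves; the sketch only covers comparing two snapshots and recording the result.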
There wasn't enough time for anyone to start actually writing code, but a handful of volunteers signed on to build out tools. That's where DataRefuge and EDGI organizers really envision their movement going: a vast decentralized network of volunteers from all 50 states and Canada. Some volunteers can code tracking software from home. And others can simply archive a little bit every day.
By the end of the day, the group had collectively loaded 8,404 NASA and DOE webpages onto the Internet Archive, effectively covering the entirety of NASA's Earth science efforts. They'd also built backdoors in to download 25 gigabytes from 101 public datasets, and were expecting even more to come in as scripts on some of the larger datasets (like Tek's) finished running. But even as they celebrated over pints of beer at a pub on Euclid Street, the mood was somber.
There was still so much work to do. "Climate change data is just the tip of the iceberg," says Eric Kansa, an anthropologist who manages archaeological data archiving for the nonprofit group Open Context. "There are a huge number of other datasets being threatened with cultural, historical, sociological information." A panicked friend at the National Park Service had tipped him off to a huge data portal that contains everything from park visitation stats to GIS boundaries to inventories of species. While he sat at the bar, his computer ran scripts to pull out a list of everything in the portal. When it's done, he'll start working his way through each quirky dataset.
Go here to read the rest:
Coders to the rescue for NASA's Earth science data - Grist