DARPA Is Developing a Search Engine for the Dark Web

Posted: February 10, 2015 at 11:47 am

A new search engine being developed by DARPA aims to shine a light on the dark web and uncover patterns and relationships in online data to help law enforcement and others track illegal activity.

The project, dubbed Memex, has been in the works for a year and is being developed by 17 different contractor teams who are working with the military's Defense Advanced Research Projects Agency. Google and Bing, with search results influenced by popularity and ranking, are only able to capture approximately five percent of the internet. The goal of Memex is to build a better map of more internet content.

"The main issue we're trying to address is the one-size-fits-all approach to the internet where [search results are] based on consumer advertising and ranking," says Dr. Chris White, the program manager for Memex, who gave a demo of the engine to the 60 Minutes news program.

To achieve this goal, Memex will not only scrape content from the millions of regular web pages that get ignored by commercial search engines but will also chronicle thousands of sites on the so-called Dark Web, such as the former Silk Road drug emporium, that are part of the TOR network's Hidden Services.

These sites, which have .onion web addresses, are accessible only through the TOR browser and only to those who know a site's specific address. Although sites do exist that index some Hidden Services pages, often around a specific topic, and there is even already a search engine called Grams for uncovering sites selling illicit drugs and other contraband, the majority of Hidden Services remain well under the radar.

White says part of the Memex project is aimed at determining just how much of TOR traffic is related to Hidden Services sites. "The best estimates before were in the single digits, in the one-thousands," he says. "But we think there are, at any given time, between 30,000 and 40,000 Hidden Service Onion sites that have content on them that one could index."

The content on Hidden Services is public, in the sense that it's not password protected, but is not readily accessible through a commercial search engine. "We're trying to move toward an automated mechanism of finding [Hidden Services sites] and making the public content on them accessible," White says. The DARPA team also wants to find a way to better understand the turnover of such sites: the relationships that exist, for example, between two sites when one goes down and a seemingly unrelated site pops up.

But the creators of Memex don't want just to index content on previously undiscovered sites. They also want to use automated methods to analyze that content in order to uncover hidden relationships that would be useful to law enforcement, the military, and even the private sector. The Memex project currently has eight partners involved in testing and deploying prototypes. White won't say who the partners are, but they plan to test the system around various subject areas or domains. The first domain they targeted was sites that appear to be involved in human trafficking. But the same technique could be applied to tracking Ebola outbreaks or "any domain where there is a flood of online content, where you're not going to get it if you do queries one at a time and one link at a time," he says.

In a demo conducted for 60 Minutes, White's team showed how law enforcement could possibly track the movement of people, both trafficked and traffickers, based on data related to online advertisements for sex. The 60 Minutes piece wasn't clear about how this was done and appeared to focus on the IP address of where the ads were hosted, implying that tracking where an ad moves from one IP address to another could reveal to law enforcement where the trafficker is located. But White says the IP address is the least important information they analyze. Instead they focus on other data points.

"Sometimes it's a function of IP address, but sometimes it's a function of a phone number or address in the ad or the geolocation of a device that posted the ad," he says. "There are sometimes other artifacts that contribute to location."
