Leveraging AI and automation to identify sensitive data at scale – Help Net Security

Posted: September 24, 2021 at 10:39 am

In this interview with Help Net Security, Apoorv Agarwal, CEO at Text IQ, talks about the risk of unstructured data for organizations and the opportunity to leverage AI and automation to identify sensitive data at scale.

Ideally, organizations should have a handle on where sensitive information is sitting in their data. In general, companies end up retaining the information they collect for a long time, even when they have no real use for this information. I think the problem boils down to a broader issue of data governance.

It's impossible to have strong data governance without some level of automation. The volume of data generated by enterprises is rising exponentially, and relying on humans to take stock of all the sensitive information that's lying buried in their database (undetected and, more often than not, in an unstructured format) simply does not work at scale.

Data breaches and ransomware attacks will continue to happen, but organizations have a real opportunity to leverage AI, which gives them the ability to proactively identify sensitive and personal data at scale; once the data is identified, they can choose to redact, delete, encrypt or take whatever the necessary steps are to secure it so that it never falls into the wrong hands.

For one, up to 80% of enterprise data is unstructured, and the sheer size of its attack surface makes it very vulnerable to being targeted by bad actors. Secondly, this unstructured data is replete with all types of sensitive information: trade secrets, personal information, health information, intellectual property, etc. For instance, no one builds a structured database containing an organization's trade secrets; it's more likely lying scattered in emails, chats, Excel sheets and other forms of unstructured data.

The challenge presented by unstructured data is that it is voluminous and finding the sensitive information lying within it is like looking for the proverbial needle in the haystack. Finding those risky and sensitive needles requires machine learning techniques that are scalable.

Well, I think it's obvious that data is growing at a faster rate than the human population. There are not enough humans, and not enough time in the day, for the volume and complexity of the task.

I think it's also important to note that machines are not at a point where you can just push a button and complete these tasks autonomously. They do need some help from humans. The job cannot be done by machines or humans alone.

It doesn't safeguard sensitive information; it identifies it. Once it has identified it, organizations can then take action to safeguard it, either by deleting, redacting or encrypting it, or by changing the access controls to it.
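The split between identifying and safeguarding can be shown with a minimal sketch. It assumes the identification step has already produced non-overlapping character spans; the function name `redact` and the mask text are hypothetical, chosen only for illustration, and redaction stands in for any of the follow-up actions mentioned above.

```python
def redact(text, spans, mask="[REDACTED]"):
    """Replace each (start, end) span in text with a mask.

    Assumes spans are non-overlapping and were produced by a separate
    identification step, which is the hard part and is not shown here.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])  # keep text before the span
        pieces.append(mask)                # mask the sensitive span
        cursor = end
    pieces.append(text[cursor:])           # keep the remaining text
    return "".join(pieces)


redacted = redact("SSN 123-45-6789 on file", [(4, 15)])
```

The action taken here is trivial once the spans are known, which is why the effort goes into finding them.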

The challenge is in the identification itself. The status quo, when it comes to identification, is based on antiquated approaches and technologies: RegEx and search terms. Besides being slow and not very scalable, these labor-intensive approaches produce results that can be riddled with inaccuracies.

But not every nine-digit number is an SSN. AI, on the other hand, can look at the larger context of the information to determine more accurately whether a piece of information is sensitive. As an example, consider email. When analyzing emails for sensitive information, AI can consider context such as who wrote the email, who received it, who was copied on it, and the network of relationships between the people in the email chain when determining whether a part of the email is sensitive.
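The nine-digit problem is easy to reproduce. The sketch below contrasts a bare pattern match with a crude context check; it is not how any particular product works. The cue list, the 40-character window and the function names are assumptions made for illustration, and a keyword window is only a stand-in for the much richer context (authors, recipients, relationships) described above.

```python
import re

# Naive pattern: nine digits, optionally grouped 3-2-4 with hyphens.
NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

# Hypothetical context cues; a real system would learn context
# rather than rely on a hard-coded keyword list.
SSN_CUES = ("ssn", "social security")


def regex_only(text):
    """Flag every nine-digit number, whatever it actually is."""
    return NINE_DIGITS.findall(text)


def with_context(text, window=40):
    """Flag a nine-digit number only if a cue appears near it."""
    hits = []
    for match in NINE_DIGITS.finditer(text):
        start = max(0, match.start() - window)
        nearby = text[start:match.end() + window].lower()
        if any(cue in nearby for cue in SSN_CUES):
            hits.append(match.group())
    return hits


samples = [
    "Please update my SSN to 123-45-6789 in the payroll system.",
    "Your order 987654321 shipped yesterday.",
]

for sample in samples:
    print(regex_only(sample), with_context(sample))
```

The pattern alone flags the order number in the second sample as if it were an SSN, while the context check flags only the first sample.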

Now, theoretically, humans could triangulate all of these contexts, but there are not enough humans in the world to pull this off. Besides, humans aren't good at computational tasks; they are better at abstract thinking.

They are very aware of it. No organization believes that it's completely invulnerable to data breaches. It is very much top-of-mind at the board level.

Where they can improve is in the following: For too long, they have been relying on data loss prevention, search terms and manual review. They really need to pivot and tap into new technologies such as AI.
