When CryptoLocker ransomware first appeared late last year, many in the security industry said the only way to fight its strong encryption implementation was to regularly back up files and systems, essentially admitting defeat once CryptoLocker had attacked a system. Customers of San Francisco-based security vendor OpenDNS weren't forced to fly a white flag, however, according to Chief Technology Officer Dan Hubbard, because the company had already used big data analysis techniques to spot and block the communications stemming from ransomware.
Hubbard, who will be speaking at the Interop Las Vegas event on Friday, said his company has been using techniques like machine learning and data mining to stop Internet threats before they have even been classified.
That's possible, he said, because much like security professionals and enterprises, attackers must build out the infrastructure needed to do their jobs. Attackers need to ensure command-and-control infrastructures are resilient, for example, or even test certain attack techniques to determine their efficacy.
Hubbard said this activity leaves "trails of data" that can be invaluable for the security industry.
"So there's a number of these breadcrumbs -- or 'threatcrumbs,' as we're calling them -- that attackers are leaving before the actual attack happens. We've been focusing on collecting all these features and attributes of the attackers early in their attack setup phase," said Hubbard, "and then training and tuning via machine learning to discover the domains, IP addresses, and locations of where these are in order to stop them before they happen."
In the case of CryptoLocker, those breadcrumbs come in the form of the randomly generated domains created every day as part of its command-and-control infrastructure.
Much like other ransomware, CryptoLocker encrypts files on a victim's system and then demands payment for the decryption key. To obtain that key, each CryptoLocker infection must connect to an attacker-controlled domain, of which there are up to a thousand created every day by computers utilizing a domain-generation algorithm (DGA).
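To make the idea of a domain-generation algorithm concrete, here is a minimal sketch in Python. It is not CryptoLocker's actual algorithm, which the article does not describe; the function name `toy_dga`, the SHA-256 seeding scheme and the `.example` suffix are all assumptions chosen for illustration. It shows only the general principle: malware and attacker can independently derive the same list of domains from a shared input such as the date.

```python
import hashlib
from datetime import date


def toy_dga(day: date, count: int = 5, tld: str = ".example") -> list:
    """Derive `count` pseudo-random domain names from a date.

    Illustrative only: a hypothetical scheme, not any real malware's DGA.
    """
    domains = []
    for i in range(count):
        # Both sides can compute the same seed from the date and an index.
        seed = f"{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first 12 hex digits to the letters a..p to form a label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


todays_domains = toy_dga(date(2014, 4, 1))
```

Because the output changes every day and looks random, a static blocklist maintained by hand cannot keep up, which is the problem the classifier described next is meant to address.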
DGA capabilities make it impossible for humans to manually block access to CryptoLocker's domains, according to Hubbard, but OpenDNS trained one of its classifiers to look specifically at how domains around the Internet are constructed. The classifier takes into account characteristics such as how many characters are in a domain name, the likelihood that certain characters would appear one after the other, when a domain was created, and who is visiting it.
"We train the machine with a good corpus of domains and names, and then train it with a bad corpus of DGAs that have been used in the past," Hubbard said. "And then we can compute the likelihood of a domain being created by a computer or a human in real time."
Hubbard said OpenDNS' classifier blocked connections between machines infected by CryptoLocker and its domains before the infamous ransomware was even known. Without such a connection, CryptoLocker can still be installed on a victim's machine, but it is left unable to fetch the public key it needs to encrypt any files, allowing targets to clean the ransomware off a machine with no harm done.
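The good-corpus, bad-corpus training Hubbard describes can be approximated with a small character-bigram model. The Python sketch below is an assumption-laden illustration, not OpenDNS' classifier: the names `BigramModel` and `dga_score`, the add-one smoothing and the tiny hand-made corpora are all hypothetical. It covers only the "which characters follow which" feature and ignores domain age and visitor data.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def bigrams(label):
    """Return adjacent character pairs, with ^ and $ marking start and end."""
    padded = "^" + label.lower() + "$"
    return list(zip(padded, padded[1:]))


class BigramModel:
    """Character-transition model with add-one smoothing (hypothetical)."""

    def __init__(self, corpus):
        self.pair_counts = Counter()
        self.first_counts = Counter()
        for label in corpus:
            for a, b in bigrams(label):
                self.pair_counts[(a, b)] += 1
                self.first_counts[a] += 1
        # Possible "next" symbols: the alphabet plus the end marker.
        self.vocab = len(ALPHABET) + 1

    def log_prob(self, label):
        """Sum of smoothed log-probabilities of each transition in `label`."""
        return sum(
            math.log((self.pair_counts[(a, b)] + 1)
                     / (self.first_counts[a] + self.vocab))
            for a, b in bigrams(label)
        )


def dga_score(label, human_model, dga_model):
    """Per-transition log-likelihood ratio; positive means 'more DGA-like'."""
    n = len(label) + 1  # number of bigrams, including start and end markers
    return (dga_model.log_prob(label) - human_model.log_prob(label)) / n


# Tiny made-up corpora, for illustration only.
HUMAN_CORPUS = ["google", "facebook", "wikipedia", "amazon", "twitter",
                "youtube", "opendns", "microsoft", "techtarget", "weather"]
DGA_CORPUS = ["xkqjvzpwhd", "qzvbnmtrwx", "jwkxqpzlvf",
              "vbxzqkwjhg", "pzqxwvkjrt", "hgfxqzwvkb"]

human_model = BigramModel(HUMAN_CORPUS)
dga_model = BigramModel(DGA_CORPUS)
```

Calling `dga_score("amazon", human_model, dga_model)` yields a negative value, while `dga_score("xkqjvzpwhd", human_model, dga_model)` is positive. A production system would need far larger corpora and additional features before a threshold on such a score could be trusted.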
Most enterprises are incapable of implementing such domain classifiers themselves, simply because they don't have the necessary Internet-wide view of threats, Hubbard admitted, but that doesn't mean big data analytics has nothing to offer them.
In the short term, Hubbard said most companies should focus on creating a centralized repository of data that can be queried to uncover exactly what is happening on a network.
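As a rough illustration of what querying such a repository might look like, the Python sketch below loads DNS log records into an in-memory SQLite table and asks which internal hosts have looked up many distinct nonexistent domains, a pattern that DGA-driven malware tends to produce. The table name `dns_log`, its columns, the sample rows and the threshold are all hypothetical; a real deployment would query whatever store the organization already uses.

```python
import sqlite3


def build_repository(rows):
    """Create an in-memory log store; the schema is a hypothetical example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dns_log (ts TEXT, host TEXT, domain TEXT, rcode TEXT)"
    )
    conn.executemany("INSERT INTO dns_log VALUES (?, ?, ?, ?)", rows)
    return conn


def hosts_with_many_failed_lookups(conn, min_domains=3):
    """Return (host, count) pairs for hosts that queried at least
    `min_domains` distinct domains that did not resolve."""
    cur = conn.execute(
        """
        SELECT host, COUNT(DISTINCT domain) AS n
        FROM dns_log
        WHERE rcode = 'NXDOMAIN'
        GROUP BY host
        HAVING COUNT(DISTINCT domain) >= ?
        ORDER BY n DESC
        """,
        (min_domains,),
    )
    return cur.fetchall()


# Made-up sample records for illustration.
SAMPLE_ROWS = [
    ("2014-04-01T09:00:00", "10.0.0.5", "xkqjvzpwhd.example", "NXDOMAIN"),
    ("2014-04-01T09:00:01", "10.0.0.5", "qzvbnmtrwx.example", "NXDOMAIN"),
    ("2014-04-01T09:00:02", "10.0.0.5", "jwkxqpzlvf.example", "NXDOMAIN"),
    ("2014-04-01T09:05:00", "10.0.0.9", "intranet.example", "NOERROR"),
    ("2014-04-01T09:06:00", "10.0.0.9", "typo-domain.example", "NXDOMAIN"),
]

repo = build_repository(SAMPLE_ROWS)
suspects = hosts_with_many_failed_lookups(repo)
```

With the sample rows above, only `10.0.0.5` crosses the threshold. The same query run against months of retained logs is the kind of historical lookup the advice points toward.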
"Sometimes that's an SIEM, which the majority of people we talk to haven't really got running, or working, or [have] deployed correctly. Sometimes it's a repository of syslog data," said Hubbard, "but being able to get that one big collection and being able to query it, especially with historical data, is super important for finding those needles in a haystack when you need them."