OpenDNS, a company in San Francisco, has released a new threat-detection model called NLPRank. It is a predictive model that uses natural language processing to flag domains that may be involved in malicious activity such as phishing attacks. To do this, the model examines many different aspects of a site, including autonomous system number (ASN) mappings, HTML tags, WHOIS patterns, and domain spoofing analysis. The goal is to protect companies from phishing attacks by flagging fraudulent sites, ideally before they are even used.
The model builds a lexicon for identifying malicious sites by analyzing legitimate sites alongside known spoofs of those sites. This lexicon is the reference for flagging new domains that contain patterns previously seen in malicious sites. For example, spoofing sites registered by a particular attacker or group may share patterns in their WHOIS data.
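The article does not say how NLPRank's lexicon is actually built. The toy sketch below only illustrates the general idea under a simple assumption: tokens that appear in known spoof domains but never in the legitimate ones are treated as suspicious, and new domains are checked against that set. The function names (`tokens`, `build_lexicon`, `suspicious_tokens`) and the sample domains are hypothetical, and the sketch ignores the WHOIS, ASN, and HTML signals the real model also uses.

```python
import re


def tokens(domain):
    """Split a domain into lowercase word tokens, ignoring the top-level domain."""
    name = domain.lower().rsplit(".", 1)[0]  # "paypal-login.com" -> "paypal-login"
    return {t for t in re.split(r"[.\-]+", name) if t}


def build_lexicon(legitimate, spoofs):
    """Hypothetical lexicon: tokens seen in known spoofs but never in legitimate domains."""
    legit_tokens = set().union(*(tokens(d) for d in legitimate))
    spoof_tokens = set().union(*(tokens(d) for d in spoofs))
    return spoof_tokens - legit_tokens


def suspicious_tokens(domain, lexicon):
    """Return the lexicon tokens that a new domain shares with known spoofs."""
    return tokens(domain) & lexicon


# Hypothetical sample data, for illustration only.
legitimate = ["paypal.com", "apple.com"]
spoofs = ["paypal-secure-login.com", "apple-id-verify.net"]
lexicon = build_lexicon(legitimate, spoofs)  # {"secure", "login", "id", "verify"}

print(suspicious_tokens("paypal-verify-account.com", lexicon))  # {'verify'}
print(suspicious_tokens("paypal.com", lexicon))                 # set()
```

A set difference is the simplest possible choice here. A real system would more likely weight tokens by how often they occur in malicious versus benign data rather than treating any single shared token as a flag.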
One of the techniques the model uses to flag illegitimate domains is a minimum edit distance algorithm. Given two domain names, it counts the minimum number of single-character edits (insertions, deletions, and substitutions) needed to transform one into the other. The lower this number, the more likely the domain is a spoof of the legitimate name. For example, the distance between google.com and g00gle.com is 2, because two substitutions (each "o" replaced by a zero) are required. The same technique is used in spell-checking, and it gives a reference point for judging whether a site is legitimate.
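The edit distance described above is the standard Levenshtein distance. Below is a minimal dynamic-programming sketch of it. The `closest_brand` helper and its `max_distance=2` threshold are hypothetical illustrations of how such a distance might be used to flag look-alike domains. They are not NLPRank's actual rule, which the article does not specify.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def closest_brand(domain, brands, max_distance=2):
    """Hypothetical check: return the nearest legitimate domain if the candidate
    is within max_distance edits of it (but not identical), else None."""
    best = min(brands, key=lambda brand: edit_distance(domain, brand))
    d = edit_distance(domain, best)
    return best if 0 < d <= max_distance else None


print(edit_distance("google.com", "g00gle.com"))                    # 2
print(closest_brand("g00gle.com", ["google.com", "paypal.com"]))    # google.com
print(closest_brand("google.com", ["google.com", "paypal.com"]))    # None
```

The `0 < d` condition keeps the legitimate domain itself from being flagged. Keeping only two rows of the table uses memory proportional to the shorter string's length instead of the full matrix.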
This model has already been shown to produce results. Kaspersky released a report about a group that stole $1 billion from banks in many countries. Before publishing the report, Kaspersky asked OpenDNS for information on the domains used in these attacks, and NLPRank had already flagged some of them without any prior knowledge of the attacks.
Blog post by the designer of NLPRank