You may not know it, but for the past several months OpenDNS has been building an incredible community of security enthusiasts.  We recently received an interesting site submission from someone in our security community that presented a challenge in obfuscation research.

Before I explain, here’s a bit of background. The first thing a malware author does before releasing malicious code is to check that popular antivirus engines and content filters will not detect it. Evasion techniques are numerous and constantly evolving, and Web malware is especially hard to detect. Web browsers have become extremely complex, and with that complexity come many attack vectors and many possible obfuscation techniques.

Early on, Web filters simply scanned Web content for malicious links. As expected, malware authors quickly worked around these, by exploiting the fact that different parsers can interpret the same content in a totally different way.

These days, Javascript is by far the most common obfuscation vector for Web-based malware, and with good reason. Static analysis of Javascript code is possible, but it can be intentionally made to require a lot of non-parallelizable CPU cycles. More importantly, Javascript is tightly coupled with other components when in a Web browser. Adding a dependency on the DOM, on shared data, or on external resources is enough to make static analysis insufficient. Moreover, testing the code in a sandbox is an expensive and non-deterministic operation. This is why Web malware code is almost always served obfuscated.

So, should obfuscated code systematically be flagged as suspicious by content filters? Unfortunately, distinguishing legitimate web application code from malware code is far from trivial.

Modern web applications typically use compilers, compressors, DOM attributes, lazy/deferred loading and optimization techniques that make it difficult to distinguish obfuscated from unobfuscated code without raising an excessive number of false positives.

OpenDNS uses a different approach. We analyze billions of DNS queries every day in order to detect suspicious behavior. This allows us to spot domains serving malware very quickly. Our research team then analyzes the code itself, in order to proactively block other domain names the malware is going to spread to.

Let’s take a look at some real-world, obfuscated malware code we stumbled upon.

As expected, this is barely readable. However, it’s easy to see that a lot of the code is just a distraction: extra parentheses and numbers expressed in different ways are everywhere, but may not actually serve any purpose.

The first thing we did was to filter this code through a Javascript compressor, namely UglifyJS.

Modern Javascript compressors do more transformations than just shortening functions and variable names. In particular, they can optimize the control flow, resolve constant expressions and remove dead code. This is a perfect fit for removing the noise from obfuscated code.
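For instance, a compressor will fold constant expressions and drop unreachable branches. A contrived example of our own shows the kind of noise that gets stripped:

```javascript
// Noise typical of obfuscated code:
var n = ((0x2 + 0x4)) * ((1));   // constant expression, extra parentheses
if (false) { neverCalled(); }    // dead code: never evaluated, safely removed

// After constant folding and dead-code elimination, a compressor
// reduces the above to the equivalent of:
var m = 6;
```

UglifyJS applies these transformations automatically, which strips much of the decoy code before we even start reading.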

Compressing and reformatting the code makes it much easier to understand how it works.

There is a setup phase where variables are constructed by combining a lot of previously defined variables, each initially containing just a few characters. At some point, this forms code that is ready to be executed, and this is where we want to start tracing.

In order to do so, we copied the code to a sandbox, and wrapped it in an object:
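A minimal sketch of such a wrapper (the names are ours, and the malware body is replaced by a placeholder probe):

```javascript
// The obfuscated code becomes the body of a method, so `this`
// refers to the sandbox object instead of window.
var sandbox = {
  run: function () {
    // ... obfuscated malware code pasted here ...
    // A call such as this.eval(payload) now resolves against
    // `sandbox`, where we can define or omit properties at will.
    return typeof this.eval;    // placeholder: probe the environment
  }
};

// With no eval property defined on the sandbox, the probe reports
// that the symbol is missing:
sandbox.run();                  // 'undefined'
```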

Javascript has a notion of “scopes”: a block of code is evaluated in a specific scope, and when a symbol has to be resolved, the current scope is searched first. If no match is found, the parent scope is then scanned, and the process keeps bubbling up to the root scope.
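A minimal illustration of this lookup chain (the names are ours):

```javascript
var x = 'root';                 // root (global) scope

function outer() {
  var x = 'outer';              // shadows the root-scope x
  function inner() {
    return x;                   // no local x: resolution bubbles up
  }                             // and stops at outer's scope
  return inner();
}

outer();                        // 'outer' -- the root-scope x is never reached
```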

Running the malicious code in a dedicated scope makes it easy to override properties and functions in order to observe its behavior.

In this context, the code fails because “eval” hasn’t been defined, whereas it would have been if executed in the global (window) scope:

We learn that:

- eval() is being used to run code.
- eval() has been evaluated in the current context (presumably using an explicit reference to “this”), not in the global context.

Thus, we can easily override it in order to check what code is evaluated, and drop to the debugger whenever we find something that looks interesting to trace. This can also be used in order to simulate different web browsers.
There are numerous ways to load code from a remote server and to replace the current page with third-party content: image+canvas, inserting script tags, loading external stylesheets, XHR, frames…

But most of the functions required to do so can be overridden, including DOM manipulation functions.

As a starting point, we are going to override XMLHttpRequest() and eval().
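A sketch of these two overrides (the logging containers and stub shapes are ours):

```javascript
var evalLog = [];   // dynamically generated code the malware evaluates
var xhrLog  = [];   // requests the malware attempts

var sandbox = {
  // Divert eval(): record the code, then evaluate it for real in the
  // global scope via an indirect call.
  eval: function (code) {
    evalLog.push(code);
    return (0, eval)(code);
  },
  // Divert XMLHttpRequest(): hand back a stub that records the method
  // and URL instead of touching the network.
  XMLHttpRequest: function () {
    return {
      open: function (method, url) { xhrLog.push(method + ' ' + url); },
      send: function () {}
    };
  }
};
```

The malware’s own calls to this.eval(...) and new this.XMLHttpRequest() then land in these stubs, letting us inspect every payload and drop to the debugger whenever something interesting shows up.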

This lets us see what dynamically generated code is being evaluated, and what XHR queries are attempted.

It’s interesting to see that XHR requests are actually used to detect and work around debuggers (Firebug) and deobfuscation tools. Also note that “GE” is used instead of “GET”. This works fine in all major browsers, yet it is something that content filters may not be aware of.

Running the code after these two XHR requests makes the browser immediately load a different page hosting different malicious code.

Since we diverted eval() calls, something else must have been used in order to achieve this result.

Let’s find out what. For starters, we can watch the global scope for variables whose content looks related to an external resource, like “http” or “location”. A simple piece of code that we can run manually or as a watched expression can do the trick:
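A sketch of such a watcher (the function name is ours); in a browser it would be run against window:

```javascript
// Scan a scope object for string variables whose content mentions
// an external resource, e.g. "http" or "location".
function findSuspects(scope) {
  var hits = [];
  for (var name in scope) {
    var value = scope[name];
    if (typeof value === 'string' &&
        (value.indexOf('http') !== -1 || value.indexOf('location') !== -1)) {
      hits.push(name);
    }
  }
  return hits;
}

// In the browser: findSuspects(window)
```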

The “MxG” variable matches, and contains an obvious reference to malicious third-party content. We now have to track any function calls involving this variable or something composed with its content.

But there are no more XHR calls. No DOM manipulation. No more eval() calls. How does this code get executed?

Using a debugger, we can keep tracing the code. It can be reconstructed by replacing variables with their content.

Here is the reconstructed, unobfuscated, commented code:
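Its core mechanism can be sketched as follows (the payload here is a harmless placeholder, not the actual malware string):

```javascript
// Trick 1: reach the Function constructor without ever naming it.
// ''.constructor is String, and String's own constructor is Function.
var makeFn = ''.constructor.constructor;      // === Function

// Trick 2: compile the payload string into a function and plant it
// as a toString() method, so that an innocuous-looking string
// coercion executes it.
var payload = 'globalThis.loaded = true;';    // harmless placeholder
var trap = { toString: makeFn(payload) };

// No eval(), no DOM call, no XHR -- just a coercion:
trap + '';                                    // payload runs here
```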

As we can see, it’s using a combination of two interesting tricks in order to achieve the same result as eval() without actually using it.

The constructor of a Javascript function happens to be another function. And that function has the ability to construct a function out of a string.

And Javascript type coercion can leverage conversion functions. In particular, when a non-string object has to become a string, the toString() function is applied to the object itself.

The dynamic nature of Javascript allows overriding this toString() method. In this case, the code abuses this feature in order to evaluate the previously created function when a number has to be converted to a string, effectively achieving code execution.

eval() can thus be rewritten as:

evil=function(c,x){x={toString:''.constructor.constructor(c)};x+''}
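To confirm that this pattern really executes arbitrary code, here is the same one-liner exercised with a harmless payload of our own:

```javascript
var evil = function (c, x) {
  // The function built from the string c becomes the toString()
  // method; the coercion x + '' invokes it.
  x = { toString: ''.constructor.constructor(c) };
  x + '';
};

evil('globalThis.result = 6 * 7');
// globalThis.result is now 42 -- code ran without any call to eval()
```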

If you’ve made it this far into the post, you can see that it’s no longer enough to simply look for specific patterns in the code. Given the advent of such sophisticated obfuscation techniques, detecting malware takes a keen eye and behavioral analysis. This is just one way that OpenDNS is approaching malware research in order to evolve ahead of threats. Stay tuned in the coming weeks as the security research team shares more details on how we look at the 40+ billion DNS queries we see each day.