Malware Analysis, Part 2: Deobfuscating Code in a Word Macro

In part 1, we defined code obfuscation and what it can be used for, and we studied some lightly obfuscated code. Now that we have covered the basics, we are going to move on to something more serious with a concrete case that we encounter every day: a Word file containing a macro!

Office documents (Word, Excel, PowerPoint, etc.) and PDFs are the formats most commonly used in email-borne malware attacks. With these documents, the main technique is to embed a macro, and that is what we are going to look at today.

What is a macro and what can it be used for? To answer this question, there is nothing better than to quote the official definition from Microsoft:

A macro is a series of commands and instructions that you group within a single command to automatically execute a task.

A malicious Office file therefore uses a macro to execute code on the victim's computer. Today we will see how macros can be obfuscated and how to work out what they do. The analysis will be more complicated than in the previous article, but don't panic, we will go through it step by step.

Lots of code for not very much...

Here we go, we’re off! The code that we will study comes from a macro contained in the Word file below.

File type: Word
SHA1: a8a1b0191d35b4530afacab33cffb751d42d3d64
VirusTotal: Link
VirusBay: Link

[Image: the obfuscated macro code]

As you can see in the image above, the macro code is obfuscated. The obfuscation is fairly effective, although some keywords can still be read in the code.

  • Line 7: we can make out "md", "/V", ":ON/C" and, further along the line, "s^e^t" (set). This tells us that the macro will execute code via the Windows command prompt: "cmd /V:ON /C" enables delayed variable expansion and runs the command that follows.
  • Line 176: we can read "for" and "in", which looks like a loop.
  • Lines 178 and 181: there are "do", "set", "if" and "equ" (cmd's "equal to" comparison operator). Everything suggests that the code will run a loop with tests inside it.

Finally, if we look carefully at the very beginning of line 176, we can read a "p"; now look at line 174: read backwards, it gives "o", "wer", "sh", "ell". Put everything together and we get "powershell"! PowerShell code will therefore be executed on the computer. Let's continue to find out what will actually happen.

The first step is to simplify the code as much as possible by removing anything that serves no purpose. We will go function by function, starting with the last one:

[Image: malicious macro code, the "Sub AutoOpen()" function]

With a little experience, we notice that all the lines of code beginning with the keyword "Hour" are useless. This is what is called noise: the technique consists of adding dead code, which the program never uses, to make the code harder to understand. Here the noise is easy to spot, but this is rarely the case in more advanced obfuscation, where it can be difficult to identify. Here is what the "Sub AutoOpen()" function looks like after removing the noise:

[Image: the "Sub AutoOpen()" function with the noise removed]
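
This kind of line filtering is easy to script. Below is a minimal Python sketch, assuming the macro source has already been exported to plain text and that every noise line starts with the keyword "Hour", as in this sample. The excerpt it runs on is hypothetical, not the real macro, and the name strip_noise is ours.

```python
import re

# A line counts as noise if its first token is the keyword "Hour".
# \b prevents matching longer names such as "Hours"; VBA is case-insensitive.
NOISE = re.compile(r"^\s*Hour\b", re.IGNORECASE)

def strip_noise(source: str) -> str:
    """Return the macro source without the dead-code lines."""
    kept = [line for line in source.splitlines() if not NOISE.match(line)]
    return "\n".join(kept)

# Hypothetical excerpt, for illustration only.
sample = """Sub AutoOpen()
Hour (3)
  Hour 7
Shell a + b, 0
End Sub"""

print(strip_noise(sample))
```

On more advanced samples, the noise pattern will differ from one file to the next, so the regular expression has to be adapted each time.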

A search for the keyword "AutoOpen" on the Microsoft website gives:

The AutoOpen macro runs after you open a new document. AutoOpen runs when you open a document in the following ways:

  • Use the Open command on the File menu.
  • Use the FileOpen or FileFind commands.
  • Select a document from the Most Recently Used list on the File menu.

When a document is opened, an AutoOpen macro runs if the AutoOpen macro is saved as part of that document or if the macro is saved as part of the template on which the document is based.

An AutoOpen macro does not run when it is saved as part of a global add-in.

You may prevent an AutoOpen macro from running by holding down the Shift key when you open a document.

So as soon as the document is opened, the AutoOpen macro executes the code it contains. Here, the macro consists of a single line (line 24). "Shell" is a VBA function that runs a program. It takes two arguments, the second of which is optional.

Here is its syntax:

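Shell argument1, argument2
argument1 ("pathname" in Microsoft's documentation): the name of the program to run, along with any options and arguments.
argument2 ("windowstyle"): the style of the window in which the program runs, i.e. whether it is hidden, minimized or maximized, and whether it receives the focus. If it is omitted, the program starts minimized with focus.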

In our code, the first argument is a series of variables joined with "+". Looking at the rest of the code, we can see that these variables hold the return values of functions; the strings are concatenated and the result is executed. Let's simplify the rest of the code to see this more clearly.

[Image: malicious macro code, another function of the macro]

Let's take the function above for example, and look at the different steps:

  1. We delete the lines that begin with the keyword "Hour" as explained earlier.
  2. For the remaining lines (lines 7, 9, 12, 14 and 19), we concatenate the string literals. For example, a line such as variable = "Powe" + "rSh" + "ell" becomes variable = "PowerShell".
  3. Line 2 is only used while the code is running, so we can delete it as well.
  4. Finally, on line 20 we concatenate the results obtained in the previous steps.

Here is what we get in the end:

[Image: the same function after simplification]
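
The folding in step 2 can also be automated. This is a naive Python sketch that assumes the literals contain no embedded double quotes (VBA's "" escape) and that "+" or "&" only joins strings; the name fold_literals is ours, and a real macro may call for a proper VBA parser.

```python
import re

# Two adjacent string literals joined by "+" or "&".
# Naive assumption: literals contain no embedded quotes (VBA's "" escape).
PAIR = re.compile(r'"([^"]*)"\s*[+&]\s*"([^"]*)"')

def fold_literals(line: str) -> str:
    """Merge "a" + "b" into "ab" repeatedly until nothing changes."""
    while True:
        folded = PAIR.sub(r'"\1\2"', line)
        if folded == line:
            return folded
        line = folded

print(fold_literals('variable = "Powe" + "rSh" + "ell"'))  # variable = "PowerShell"
```

The loop is needed because a single pass only merges non-overlapping pairs: the first pass gives "PowerSh" + "ell", the second gives "PowerShell". Concatenations involving variables are left alone.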

Now we can apply the same steps to the rest of the code. The result is below: we have gone from 200 lines of code to 23! Not bad, is it? But we can still do much better.

[Image: the simplified macro, 23 lines]

We are going to do 3 steps at the same time.

  1. As we did previously, substitute each function's return value directly into the "AutoOpen" macro.
  2. Then replace the "KeyString" call with its result.
  3. Then delete the "^" characters.

The "KeyString()" function returns the name of a key combination. In our example, the function is asked for the key that corresponds to the value 67. According to Microsoft's documentation of the WdKey enumeration, 67 is wdKeyC, so the function returns "C".
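
When a macro contains many such calls, they can be resolved mechanically. Below is a minimal Python sketch. It assumes single-argument calls only and relies on the fact that the WdKey values for letters (wdKeyA = 65 to wdKeyZ = 90) and digits (wdKey0 = 48 to wdKey9 = 57) match their ASCII codes. Other key codes are left unchanged, and the name resolve_keystring is ours.

```python
import re

# Assumption: only single-key calls such as KeyString(67) appear.
KEYSTRING = re.compile(r"KeyString\(\s*(\d+)\s*\)", re.IGNORECASE)

def resolve_keystring(source: str) -> str:
    """Replace KeyString(n) with the quoted character, for letters and digits."""
    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        if 48 <= code <= 57 or 65 <= code <= 90:
            return '"' + chr(code) + '"'
        return match.group(0)  # leave other key codes untouched
    return KEYSTRING.sub(replace, source)

print(resolve_keystring('x = "/" + KeyString(67) + " "'))
```

The output still contains a concatenation of literals, which can then be folded as in the previous step.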

As you have probably noticed, the strings contain many "^" characters.

What is it for? In the Windows command prompt (cmd.exe), "^" is the escape character. Microsoft's definition:

An escape character is a single character that suppresses any special meaning of the character that follows it.

Example:

  • ^a       →  a
  • ^^^a     →  ^a
  • ^^^^^a   →  ^^a

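Here the carets are only there to break up keywords such as "s^e^t". Removing each caret while keeping the character it escapes leaves the meaning of the command unchanged, and since we are reading the code rather than running it, we can simply delete them.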
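
Removing the carets can be modeled in a few lines of Python. This sketch applies the basic rule illustrated above (each caret is dropped and the following character is kept literally). It is a simplification that ignores cmd's quoting rules (carets inside double quotes are literal) and other corner cases; the name unescape_carets is ours.

```python
def unescape_carets(text: str) -> str:
    """Drop each escape caret and keep the character it escapes.

    Simplified model: ignores cmd's quoting rules and other corner cases.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character literally
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

for sample in ("^a", "^^^a", "^^^^^a", "s^e^t"):
    print(sample, "->", unescape_carets(sample))
```

This reproduces the three examples above and turns "s^e^t" back into "set".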

Let's look at what the result is once the 3 steps have been done.

[Image: the macro after the three steps]

Perfect! The code is beginning to resemble something, don't you think? Let's continue, we are on the right track!

There are still four variables that have not been replaced. They are never declared or assigned anywhere, so they are empty and add nothing to the concatenated string. This is noise, so we can delete them and concatenate the rest of the code.

[Image: the final "Shell" call]

And here we are! We have recovered the form of the "Shell" call, namely "Shell argument1, argument2". The program that will be executed is "cmd" (cmd.exe, the Windows command prompt) with a whole list of options and arguments. The second argument, 0, sets the window style the program will have when executed. According to Microsoft's documentation of the function, this argument is called "windowstyle" and the value 0 corresponds to vbHide: the window is hidden.
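
When triaging many macros, it can help to name the windowstyle value automatically. This small Python lookup is a sketch based on the constants listed in Microsoft's documentation for the VBA Shell function; the helper name describe_window_style is ours.

```python
# VBA Shell windowstyle values (VbAppWinStyle constants).
WINDOW_STYLES = {
    0: "vbHide",
    1: "vbNormalFocus",
    2: "vbMinimizedFocus",
    3: "vbMaximizedFocus",
    4: "vbNormalNoFocus",
    6: "vbMinimizedNoFocus",
}

def describe_window_style(value: int) -> str:
    """Return the constant name for a windowstyle value, if known."""
    return WINDOW_STYLES.get(value, f"unknown ({value})")

print(describe_window_style(0))  # vbHide
```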

Now we know what the macro does: it runs the Windows command prompt in a hidden window, so the attack is stealthy and the user notices nothing. What does the command prompt actually do? That is what we are going to see now.

Obfuscation, again and again

The following code will be executed on the victim's computer. It has been simplified to make it easier to understand.

[Image: the simplified command executed by cmd]

The code can be divided into two parts: the obfuscated data, then the code that deobfuscates it (the last four lines). That code is a loop that starts at 1021 and counts down to 0, decrementing by 1 on each iteration. If you look carefully at the data, especially reading backwards, you can find the word "powershell" at the end of the obfuscated data. We can already guess what the rest of the code does, but we will check anyway.

Let's look at the next line:

set eDYP=!eDYP!!6k:~%X,1!

What does it do? Initially, the variable "eDYP" is empty. On each iteration, one character is appended to it: the character at (zero-based) offset X in the obfuscated data, which is stored in the variable "6k". X is the loop counter (1021, then 1020, 1019, etc.). The "!" delimiters read the variables with delayed expansion, which is why cmd is launched with /V:ON.

Here is a short example of the first iterations of the loop:

  • 1st turn: X = 1021, eDYP = "p"
  • 2nd turn: X = 1020, eDYP = "po"
  • 3rd turn: X = 1019, eDYP = "pow"

The loop therefore simply reverses the data. When X reaches 0, the contents of "eDYP" are executed. Here is the data once the loop has finished:

[Image: the data once the loop has finished]
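
The loop is easy to reproduce outside cmd to check our reading of it. This Python sketch models one iteration per value of X, from start down to 0, appending the character at zero-based offset X. The toy data and the name simulate_loop are ours, not taken from the real payload.

```python
def simulate_loop(data: str, start: int) -> str:
    """Model of the cmd loop: X goes from start down to 0 and each turn
    appends the character at zero-based offset X (like !6k:~%X,1!)."""
    result = ""
    for x in range(start, -1, -1):
        result += data[x:x + 1]  # slicing yields "" if x is past the end
    return result

# Toy data stored backwards, as in the sample (not the real payload).
toy = "llehsrewop"
print(simulate_loop(toy, len(toy) - 1))  # powershell
```

In effect, the whole loop is equivalent to the Python slice data[start::-1].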

The result is a PowerShell command. PowerShell accepts Base64-encoded command strings: with the right option, it decodes the string and then executes it. Here is the PowerShell documentation for encoded commands:

[Image: PowerShell documentation for encoded commands]

The "-e" option is short for "-EncodedCommand", which expects the Base64 encoding of a UTF-16LE (Unicode) string. We will now concentrate on the decoded data.
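
Decoding such a string takes a single call once the encoding is known. Below is a minimal Python sketch, assuming the argument of "-e" has already been extracted from the command line. The round trip on a harmless command shows the expected format, and the name decode_encoded_command is ours.

```python
import base64

def decode_encoded_command(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE)."""
    return base64.b64decode(b64).decode("utf-16-le")

# Round trip on a harmless toy command, encoded the way PowerShell expects.
encoded = base64.b64encode("Write-Output 'hi'".encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(encoded))  # Write-Output 'hi'
```

Decoding as UTF-8 instead would produce text interleaved with null characters, a telltale sign that the string is UTF-16LE.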

[Image: the decoded PowerShell code]

Another piece of code that looks incomprehensible at first glance… Let's fix that by simply reformatting the code.

[Image: the decoded PowerShell code after formatting]

We will now simplify the code as we have already done several times (string concatenation, substitution of variables, …).

  • The "Split" function splits a string into several parts at each occurrence of a delimiter, in our code the "@" character.
  • The environment variable "$env:public" holds the path of the Windows Public folder (C:\Users\Public by default).

[Image: the simplified PowerShell code]

And there it is! We are finally done with this obfuscation. What does this code do? For each URL in the "$raw" variable, it tries to download a program, save it as "735.exe" in the "C:\Users\Public\" folder, and execute it. The loop stops as soon as one URL works. The downloaded file is Emotet, a banking Trojan designed to steal banking credentials.

File: 735.exe
SHA1: 58dab44d83c6f1ad4407d77b88874e126b732688
VirusTotal: Link
VirusBay: Link

As we have seen, the obfuscation is more complicated than in the first part, but it is not insurmountable. We were able to get to the end with simple variable substitution, string concatenation and the decoding of one encoded string. The obfuscation we have just seen is typical of the level of difficulty currently found in downloaders and droppers. Nonetheless, some code obfuscations are much more complicated than what we have seen so far and can be difficult to undo by hand, as we have done here.

The world of obfuscation is vast, complex and fascinating! We have only seen a small part of it for the moment.