Hooked on Mnemonics Worked for Me

A Primer on Cracking XOR Encoded Executables

A while back Locky JS downloaders were downloading executable payloads encrypted with XOR. The infection chain started with a victim double-clicking a JS (JavaScript), JSE (encoded JavaScript), WSH (Windows Script Host) or other JScript-based script. The script would then connect to a compromised website, download a binary file, decrypt it using XOR and execute the decrypted executable. At the time I was relying on one of two analysis approaches to retrieve the decrypted payload. The first was using automated malware analysis systems to recover the dropped payload. The second was reversing the obfuscated JavaScript, or whatever other language wscript.exe was interpreting, to find the XOR key. Once I found the key I would decrypt the network traffic carved from a PCAP to recover the Locky executable. Both approaches are laborious: I was dependent either on an automated malware analysis system or on successfully deobfuscating the script to recover the key.

Side Note:
For anyone doing deobfuscation of languages interpreted by wscript.exe, I would recommend investigating API hooking. Most of the APIs that need to be hooked can be identified by using an API monitor. Hooking also lets you control what the APIs return, which is useful if you want to recover all the URLs that a sample might try to connect to. I'll try to post some example code in the next week or two.

Since the attackers were using XOR on a Portable Executable (PE) file I decided to crack it. This is not very difficult because XOR with a short repeating key is not a secure cipher, and when it is applied to a PE file the null-byte padding in the file exposes the key. Cracking XOR is a four step process: recover the key size, recover the key, decrypt the data with the found key and finally check that the decrypted data is correct.

To recover the key size Hamming distance can be used. Hamming distance is the number of substitutions needed to change one string into another of the same length; for XOR cracking it is counted in bits. The candidate key size that gives the smallest average Hamming distance between adjacent blocks of the XOR encoded file is likely the key size or a multiple of it. I say a multiple of it because sometimes the smallest Hamming distance belongs to the key size times 2 or another multiple. For example, the output below contains a list of tuples holding the Hamming distance and the key size. The actual key size was 29 but the lowest Hamming distance was found at 58.

[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin

Here is the code for computing the hamming distance. Note, the two strings must have the same size.

def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))
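
As a quick sanity check, here is a minimal Python 3 sketch of the same function (the listings in this post are Python 2). It uses the test vector from Cryptopals Challenge 6, where the two strings are 37 bits apart.

```python
def hamming_distance(bytes_a, bytes_b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(a ^ b).count("1") for a, b in zip(bytes_a, bytes_b))

# Test vector from Cryptopals Challenge 6.
distance = hamming_distance(b"this is a test", b"wokka wokka!!!")
print(distance)  # 37
```

Note that zip() silently stops at the shorter input, which is why the two strings need to be the same size.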

Identifying the key size is very important. Earlier versions of my script assumed standard key sizes of 16, 32, 64, etc., but shortly after I released my code some Locky downloaders started using a 29 byte XOR key. This broke my code because I was not using Hamming distance to check for the key size.
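
To make the "key size or a multiple of it" point concrete, here is a rough Python 3 sketch of key size guessing. The function name guess_key_sizes and the toy input are my own for illustration, not taken from the original script. The input is null padding XORed with a 5 byte key, which is simply the key repeated.

```python
def hamming_distance(bytes_a, bytes_b):
    return sum(bin(a ^ b).count("1") for a, b in zip(bytes_a, bytes_b))

def guess_key_sizes(message, max_size, top=3):
    """Return [(average bit distance per byte, key size)], best guess first."""
    scores = []
    for k in range(2, max_size):
        # split into full k-byte blocks and compare each block with its neighbor
        blocks = [message[i:i + k] for i in range(0, len(message) - k + 1, k)]
        dists = [hamming_distance(a, b) / k for a, b in zip(blocks, blocks[1:])]
        if dists:
            scores.append((sum(dists) / len(dists), k))
    return sorted(scores)[:top]

# Null bytes XORed with the hypothetical key b"KEY12" are the key repeated.
encoded = b"KEY12" * 40
guesses = guess_key_sizes(encoded, 20)
print(guesses)  # [(0.0, 5), (0.0, 10), (0.0, 15)]
```

Every multiple of the real key size scores just as well as the key size itself, so on real data a multiple such as 58 can edge out the true size of 29.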

The second step is recovering the key. When a Portable Executable is compiled, one available flag is /filealign:number. The number specifies the alignment of sections in the compiled PE file. It can be found in the Portable Executable file format in the OptionalHeader under FileAlignment. Every section within the executable needs to start at a file offset that is a multiple of the value defined in FileAlignment. If the FileAlignment is 0x200 and the size of some data is 0x201, then the next section will start at offset 0x400. The gap between the end of the data and the start of the next section is padded with null bytes, represented as "\x00". The file alignment padding introduces a large number of null bytes into the executable. Because a null byte XORed with a key byte is just the key byte, the encoded padding contains the key itself, repeated over and over. Searching for the most common recurring byte patterns in an XOR encoded executable can therefore be used to recover the key. The following code finds the 32 most common byte sequences of length size in the encoded executable:

substr_counter = Counter(message[i: i+size] for i in range(len(message) - size))
sub_count = substr_counter.most_common(32)
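
Here is a small, self-contained Python 3 sketch of why this works. The key, the stand-in section data and the align_up helper are all made up for the example. The top candidates are rotations of the key, since the most frequent sequence does not have to start on a key boundary. That is presumably why the script tries several of the most common sequences rather than only the first.

```python
from collections import Counter

def align_up(size, file_alignment):
    """Round size up to the next multiple of file_alignment."""
    return (size + file_alignment - 1) // file_alignment * file_alignment

key = b"IUN0m"                       # hypothetical 5-byte key
section = bytes(range(1, 0x41))      # 0x40 bytes of stand-in data, no nulls
padded = section + b"\x00" * (align_up(len(section), 0x200) - len(section))
encoded = bytes(b ^ key[i % len(key)] for i, b in enumerate(padded))

size = len(key)
counts = Counter(encoded[i:i + size] for i in range(len(encoded) - size + 1))
candidates = [chunk for chunk, _ in counts.most_common(size)]

print(hex(align_up(0x201, 0x200)))   # 0x400
print(key in candidates)             # True
```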

The third step is to XOR the data. The following code can be used to XOR data with single or multibyte keys. If you don’t understand the code I would recommend walking through each section of it. This is personally one of my favorite pieces of Python code. It covers a number of Python concepts, from generator expressions and bitwise operations to standard functions.

from itertools import cycle

def xor_mb(message, key):
    return ''.join(chr(ord(m_byte) ^ ord(k_byte)) for m_byte, k_byte in zip(message, cycle(key)))
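
The function above is Python 2, where file data is a str. A possible Python 3 port, as a sketch, works on bytes directly. The sample header bytes and key below are only for illustration.

```python
from itertools import cycle

def xor_mb(message, key):
    """XOR message with a repeating single or multibyte key (bytes in, bytes out)."""
    return bytes(m ^ k for m, k in zip(message, cycle(key)))

key = b"IUN0m"                                   # hypothetical key
plain = b"MZ\x90\x00\x03\x00\x00\x00"            # start of a typical PE header
encoded = xor_mb(plain, key)
decoded = xor_mb(encoded, key)                   # XOR is its own inverse

print(decoded == plain)          # True
print(encoded[3] == key[3])      # True, the null byte at offset 3 leaks a key byte
```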

The last step is to verify that the key and decrypted data are correct. Since the decrypted payload is an executable file with a known file structure I used pefile to verify the data has been decrypted correctly. If the PE structure is invalid pefile throws an exception.

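import re
import pefile

def pe_carv(data):
    '''carve out executable using pefile's trim'''
    for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]:
        # slice out executable starting at each "MZ" candidate
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except Exception:
            # not a valid PE at this offset, try the next "MZ"
            continue
        return pe.trim()
    return None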

Complete code with example output - link

        Alexander Hanel
         - POC that searches for n-grams and uses them as the XOR key.
         - Also uses hamming distance to guess key size. Check out cryptopals Challenge 6
         for more details https://cryptopals.com/sets/1/challenges/6
pe_ham_brute.py ba5aa03d724d17312d9b65a420f91285caff711e2f891b3699093cc990fdaae0
Hamming distances & calculated key sizes
[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin

import base64
import string
import sys
import collections
import pefile
import re
import hashlib

from cStringIO import StringIO
from collections import Counter
from itertools import cycle 
from itertools import product

DEBUG = True

def xor_mb(message, key):
    return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key)))

def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))

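def key_len(message, key_size):
    """returns [(dist, key_size),(dist, key_size)]"""
    avg = []
    for k in xrange(2,key_size): 
        hd = []
        for n in xrange(len(message)/k-1):
            # distance between adjacent k-byte blocks, normalized by the block size
            hd.append(hamming_distance(message[n*k:n*k+k], message[n*k+k:n*k+2*k]) / float(k))
        if hd:
            avg.append((sum(hd) / float(len(hd)), k))
    return sorted(avg)[:10]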

def pe_carv(data):
    '''carve out executable using pefile's trim'''
    for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]:
        # slice out executable starting at each "MZ" candidate
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except Exception:
            # not a valid PE at this offset, try the next "MZ"
            continue
        return pe.trim()
    return None

def write_file(data, key):
    # name the output after the MD5 of the carved executable
    m = hashlib.md5()
    m.update(data)
    name = m.hexdigest()
    key_name = "key-" + name + ".bin"
    file_name = name + ".bin"
    print "Length: %s, Key: %s File Name: %s" % (len(key),key, file_name)
    f =  open(file_name, "wb")
    fk = open(key_name , "wb")
    f.write(data)
    fk.write(key)
    f.close()
    fk.close()

def run(message):
    key_sizes = key_len(message, 64)
    if DEBUG:
        print "Hamming distances & calculated key sizes"
        print key_sizes
    for temp_sz in key_sizes:
        size = temp_sz[1]
        substr_counter = Counter(message[i: i+size] for i in range(len(message) - size))
        sub_count = substr_counter.most_common(32)
        for temp in sub_count:
            key, count = temp
            if count == 1:
                # most_common is sorted, so nothing after this repeats either
                break
            temp = xor_mb(message, key)
            pe_c = pe_carv(temp)
            if pe_c:
                write_file(pe_c, key)
                # stop at the first key that yields a valid PE
                return

data = open(sys.argv[1],'rb').read()
run(data)

For anyone else interested in learning about crypto I'd recommend checking out Understanding Cryptography. It is a great beginner book with not a lot of math. Each chapter has corresponding video lectures on YouTube. Another resource is The Cryptopals Crypto Challenges. I cannot recommend the Cryptopals challenges enough. Here are my solutions so far. At one point I contemplated quitting my job so I could focus only on the challenges. Not one of my most practical ideas, but the challenges exposed many of my weaknesses in programming and mathematics. It's pretty rare to find something that points you in the direction of what you need to learn and gives you a definitive answer (cracking the challenge) that tells you when you can move on to the next area of study. Pretty awesome. If you have any questions or comments you can ping me on Twitter, leave a comment or send me an email at alexander dot hanel at gmail dot com.

ObfStrReplacer & ExtractSubfile Snippets

ObfStrReplacer is a script that replaces obfuscated variable names with easier to read strings. Some obfuscation techniques rely on similar looking strings to make the code difficult to read. For example the string Illl1III111I11 is hard to distinguish from lIll1III111I11. ObfStrReplacer takes a regular expression as an argument to match obfuscated strings. It then adds all matches to a set and replaces each match with a unique string. For example, 11ll1III111I11 would become _carat. All renamed strings start with "_". In the image above we can see the obfuscated code on the left and the de-obfuscated code on the right.

Please see the command line example in the source code for details on usage. I have confirmed it works well on obfuscated ActionScript.  The code blindly replaces matches. It does not check for the reuse of variable names within the scope of different functions. I plan on adding this at a later date. Please leave a VT hash in the comments if you have an example.
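
The idea can be sketched in a few lines of Python 3. This is not the ObfStrReplacer source: the function name, the _nameN naming scheme and the example pattern are assumptions made for illustration (the real script uses readable words such as _carat). Like the real script, it replaces matches blindly.

```python
import re

def replace_obfuscated(source, pattern):
    """Replace every distinct match of pattern with a unique, readable name."""
    names = {}

    def rename(match):
        token = match.group(0)
        if token not in names:
            # hypothetical naming scheme; every new name starts with "_"
            names[token] = "_name%d" % len(names)
        return names[token]

    return re.sub(pattern, rename, source), names

code = "var Illl1III111I11 = lIll1III111I11 + Illl1III111I11;"
cleaned, mapping = replace_obfuscated(code, r"\b[Il1]{8,}\b")
print(cleaned)  # var _name0 = _name1 + _name0;
```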

ObfStrReplacer Source Code

ExtractSubfile is a simple modification to hachoir-subfile's search.py. It is used to extract embedded files. The carving functionality was already included in hachoir-subfile but not exposed.

__@___:~/$ hachoir-subfile crsenvironscan.xls 
[+] Start search on 126444 bytes (123.5 KB)

[+] File at 0 size=80384 (78.5 KB): Microsoft Office document
[+] File at 2584 size=52039 (50.8 KB): Macromedia Flash data: version 9

[+] End of search -- offset=126444 (123.5 KB)
Total time: 1 sec 478 ms -- global rate: 83.5 KB/sec
__@___:~/$ python ExtractSubFile.py  crsenvironscan.xls 
[+] Start search on 126444 bytes (123.5 KB)

[+] File at 0 size=80384 (78.5 KB): Microsoft Office document => /home/file-0001.doc
[+] File at 2584 size=52039 (50.8 KB): Macromedia Flash data: version 9 => /home/file-0002.swf

[+] End of search -- offset=126444 (123.5 KB)

In the two "File at" lines of the second run we can see that a document and a SWF were carved.

ExtractSubFile Source Code

Base91 & Angler SWFs

If anyone is curious the encoding that Angler is using in their SWFs is base91. The encoding was hinted at in an excellent article by Palo Alto Networks but was only identified as a function named DecodeToByteArray. Below are my notes to decode and decompress the embedded SWF. 

___*____$ swfextract c34266299460225c0354df5438417924579641095ffd7588a42d8fae07ae8511 
Objects in file c34266299460225c0354df5438417924579641095ffd7588a42d8fae07ae8511:
 [-i] 1 MovieClip: ID(s) 4
 [-F] 1 Font: ID(s) 1
 [-b] 1 Binary: ID(s) 5
 [-f] 1 Frame: ID(s) 0

___*____$ swfextract c34266299460225c0354df5438417924579641095ffd7588a42d8fae07ae8511 -b 5
___*____$ ls
c34266299460225c0354df5438417924579641095ffd7588a42d8fae07ae8511  output.bin  xxxswf.py
___*____$ hexdump -C output.bin | head

00000000  40 5a 7a 55 7b 5a 78 30  46 3b 49 26 52 48 43 5d  |@ZzU{Zx0F;I&RHC]|
00000010  40 62 66 40 40 6d 32 7b  59 25 52 5d 75 75 62 55  |@bf@@m2{Y%R]uubU|
00000020  59 4d 53 61 30 34 2a 76  7b 5e 21 74 39 5a 5b 7d  |YMSa04*v{^!t9Z[}|
00000030  62 3f 38 42 3d 5f 51 6b  24 5b 23 3a 50 2c 2c 5e  |b?8B=_Qk$[#:P,,^|
00000040  22 7b 6e 6b 23 69 21 48  2b 35 54 60 24 22 2e 36  |"{nk#i!H+5T`$".6|
00000050  58 6c 75 6d 6d 4c 54 67  48 28 5a 6a 44 4b 30 63  |XlummLTgH(ZjDK0c|
00000060  37 2a 23 3f 53 78 6c 57  4a 67 68 60 48 45 76 67  |7*#?SxlWJgh`HEvg|
00000070  35 2e 79 4a 35 3c 46 6c  5b 47 46 3f 79 42 30 47  |5.yJ5<Fl[GF?yB0G|
00000080  35 6d 3c 67 2c 54 7b 59  42 2b 6a 4f 50 2b 3b 65  |5m<g,T{YB+jOP+;e|
00000090  79 26 26 3c 30 7c 65 59  7a 59 5e 57 22 4b 72 4b  |y&&<0|eYzY^W"KrK|

While reviewing the data I noticed all of the bytes were printable ASCII. This usually implies base64, but the presence of characters such as '@' and '$' meant it had to be a modified version of it. A mistake I made after deobfuscating the ActionScript was that I only gave the decoder a cursory look. The code and data had the patterns of base64 and I blindly assumed that is what it was. If it was a modified version of base64 I could reconstruct all the characters of the table by reading each character from the data into a set. From there I would need to find the right sequence of characters. Strangely, this hackish approach led me to the encoding.

In [1]: f = open("output.bin", "rb")

In [2]: d = f.read()

In [3]: o = set([])

In [4]: for x in d:
   ...:     o.add(x)

In [5]: "".join(sorted(o))
Out[5]: '!"#$%&()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~'

In [6]: len(o)
Out[6]: 91

91?  As in Base91? Weird, never heard of that. A search led me to some code written by Adrien Beraud which confirmed the ActionScript is indeed Base91.
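
A quick way to double check the observation is to compare the recovered character set against printable ASCII. This Python 3 sketch uses the string from Out[5] above; the variable names are mine. Only three printable characters never show up.

```python
# Character set recovered from output.bin (Out[5] above).
observed = '!"#$%&()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~'

# Printable, non-space ASCII is 0x21 through 0x7E (94 characters).
printable = {chr(c) for c in range(0x21, 0x7F)}
missing = sorted(printable - set(observed))

print(len(set(observed)))  # 91
print(missing)             # ["'", '-', '\\']
```

Those three characters (codes 39, 45 and 92) are also the ones skipped when the ActionScript builds its table in CONSTRUCT_KEY further down. That code also skips 34 in its 33 to 47 loop, even though the double quote does appear in the data.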

After the data is decoded with base91 each byte is XORed. Initially the key is set to a hard coded value; after that the key for each byte is the previous encoded byte. Once the XOR loop is completed the result is decompressed with zlib. The initial XOR key is not the same in every sample. In PAN's write up the key is 91 and in mine it was 75. The key can be found with a decompiler (Trillix or JPEXS) or a disassembler (swfdump). The latter makes it possible to extract the XOR key from the command line: swfdump -a dumps the assembly of the ActionScript, and searching for bitxor and pushint should reveal the XOR key.

___*____$ swfdump -a c34266299460225c0354df5438417924579641095ffd7588a42d8fae07ae8511 > asm.as
___*____$ vi asm.as

        00045) + 2:1 callpropvoid <q>[public]::I1lllIII111I11, 1 params
        00046) + 0:1 pushint 75   <- KEY
        00047) + 1:1 convert_u
        00048) + 1:1 setlocal r4
        00049) + 0:1 pushint 0
        00050) + 1:1 setlocal r5
        00051) + 0:1 label
        00052) + 0:1 getlocal r5
        00053) + 1:1 getlocal r3
        00054) + 2:1 getlocal_0
        00055) + 3:1 getproperty <q>[private]::1Ill1III111I11
        00056) + 3:1 getproperty <q>[public]::+ll1III111I11
        00057) + 3:1 getproperty <l,multi>{[public]""}
        00058) + 2:1 lessthan
        00059) + 1:1 iffalse ->81
        00060) + 0:1 getlocal r3
        00061) + 1:1 getlocal r5
        00062) + 2:1 getproperty <l,multi>{[public]""}
        00063) + 1:1 getlocal r4
        00064) + 2:1 bitxor       <- XOR
        00065) + 1:1 convert_u
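
The XOR chain described above can be sketched on its own in Python 3. The encode function is only there to build test data for a round trip; it is my assumption of what the packer side does, inferred as the inverse of the decoder. The stand-in payload is made up as well.

```python
import zlib

def rolling_xor_decode(data, key):
    """Each byte is XORed with the previous *encoded* byte; key seeds the chain."""
    out = bytearray()
    for byte in data:
        out.append(byte ^ key)
        key = byte
    return bytes(out)

def rolling_xor_encode(data, key):
    """Inverse of the decoder (assumed, used here only to create test input)."""
    out = bytearray()
    for byte in data:
        key = byte ^ key
        out.append(key)
    return bytes(out)

payload = b"stand-in for the embedded SWF " * 4
encoded = rolling_xor_encode(zlib.compress(payload), 75)
recovered = zlib.decompress(rolling_xor_decode(encoded, 75))
print(recovered == payload)  # True
```

Because the key chains off the encoded data, only the initial value (75 in my sample, 91 in PAN's) has to be recovered from the ActionScript.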

Quickly written Python code for decoding and extracting the second SWF. The key will likely need to be modified.

# The Base91 code is written by Adrien Beraud
# https://github.com/aberaud/base91-python/blob/master/base91.py

# Base91 encode/decode
# Copyright (c) 2012 Adrien Beraud
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#   * Redistributions of source code must retain the above copyright notice,
#     this list of conditions and the following disclaimer.
#   * Redistributions in binary form must reproduce the above copyright notice,
#     this list of conditions and the following disclaimer in the documentation
#     and/or other materials provided with the distribution.
#   * Neither the name of Adrien Beraud, Wisdom Vibes Pte. Ltd., nor the names
#     of its contributors may be used to endorse or promote products derived
#     from this software without specific prior written permission.

import struct
import sys
import zlib

base91_alphabet = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
 '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '#', '$',
 '%', '&', '(', ')', '*', '+', ',', '.', '/', ':', ';', '<', '=',
 '>', '?', '@', '[', ']', '^', '_', '`', '{', '|', '}', '~', '"']

decode_table = dict((v,k) for k,v in enumerate(base91_alphabet))

def decode(encoded_str):
    ''' Decode Base91 string to a bytearray '''
    v = -1
    b = 0
    n = 0
    out = bytearray()
    for strletter in encoded_str:
        # skip anything that is not part of the alphabet
        if not strletter in decode_table:
            continue
        c = decode_table[strletter]
        if(v < 0):
            # first character of a pair
            v = c
        else:
            # second character: a pair encodes 13 or 14 bits
            v += c*91
            b |= v << n
            n += 13 if (v & 8191)>88 else 14
            while True:
                out += struct.pack('B', b&255)
                b >>= 8
                n -= 8
                if not n>7:
                    break
            v = -1
    if v+1:
        out += struct.pack('B', (b | v << n) & 255 )
    return out

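def main():
    f = open(sys.argv[1], 'rb')
    x = f.read()
    f.close()
    d = decode(x)
    dd = ""
    # initial XOR key, taken from the pushint before bitxor
    key = 75
    for y in d:
        dd += chr(y ^ key)
        # the next key is the previous encoded byte
        key = y
    o = zlib.decompress(dd)
    kk = open( sys.argv[1] + "-out.bin", "wb")
    kk.write(o)
    kk.close()

if __name__ == "__main__":
    main()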


output.bin is the binary data extracted using swfextract. The above Python code is stored in angler-decoder.py. After running the script the decoded SWF is saved to output.bin-out.bin. Then I use xxxswf.py to verify the SWF is present.

___*____$ python angler-decoder.py output.bin 
___*____$ ls
angler-decoder.py  c34266299460225c0354df5438417924579641095ffd7588a42d8fae07ae8511  output.bin  output.bin-out.bin  xxxswf.py
___*____$ python xxxswf.py output.bin-out.bin 

[SUMMARY] Potentially 1 SWF(s) in MD5 d41d8cd98f00b204e9800998ecf8427e:output.bin-out.bin
 [ADDR] SWF 1 at 0x0 - CWS Header
___*____$ python xxxswf.py -d output.bin-out.bin 

[SUMMARY] Potentially 1 SWF(s) in MD5 d41d8cd98f00b204e9800998ecf8427e:output.bin-out.bin
 [ADDR] SWF 1 at 0x0 - CWS Header
  [FILE] Carved SWF MD5: 5d4c794c3a3011da71cc31d5fd7015ce.swf

The extracted second SWF is also obfuscated.  

My "cleaned up" ActionScript

    import flash.display.*;
    import flash.system.*;

    public class ExtendedMovieClipFunction extends MovieClip
        private var DEUNCOMPRESSED_BUFFER:Object;
        private var _CLASS_BUFFER:Class;
        private var FuncNameToStrInstance:AssignFuncNameToString;
        private var int_0:uint = 0;
        private var _uint_0:uint = 0;
        private var _uint_255:uint = 255;
        private var _object2:Object;
        private var _object3:Object;

        public function ExtendedMovieClipFunction(param1:Object = null)
            this.FuncNameToStrInstance = new AssignFuncNameToString();
            #  a SWF file from other domains than that of the Loader object can call Security.allowDomain() to
            #  permit a specific domain
            var _loc_3:* = ApplicationDomain[this.FuncNameToStrInstance.currentDomain];
            var  ldr:Loader :* = _loc_3[this.FuncNameToStrInstance.getDefinition](this.FuncNameToStrInstance.flash.display.Loader) as Class;
            this.DEUNCOMPRESSED_BUFFER = new  ldr:Loader ;
            this._CLASS_BUFFER = _loc_3[this.FuncNameToStrInstance.getDefinition](this.FuncNameToStrInstance.flash.utils.ByteArray) as Class;
            ## The Stage class represents the main drawing area.
            if (this[this.FuncNameToStrInstance.stage])
                this[this.FuncNameToStrInstance.addEventListener](this.FuncNameToStrInstance.addedToStage, this.FuncEventListener);
        }// end function

        public function 1_object1(param1:Object, param2:int) : void
        }// end function

        private function FuncEventListener(param1:Object = null) : void
            this[this.FuncNameToStrInstance.removeEventListener](this.FuncNameToStrInstance.addedToStage, this.FuncEventListener);
            this[this.FuncNameToStrInstance.addEventListener](this.FuncNameToStrInstance.enterFrame, this.I1111IIIlllIl1);
            var _loc_2:* = new ExtendedByteArrayFunction();
            var DECODE_BUFFER:* = new this._CLASS_BUFFER();
            this.BASE91(_loc_2, _loc_2[this.FuncNameToStrInstance.length], DECODE_BUFFER);
            var _loc_4:* = 75;
            var INDEX:* = 0;
            // XOR loop 
            if (INDEX < DECODE_BUFFER[this.FuncNameToStrInstance.length])
                var _loc_6:* = DECODE_BUFFER[INDEX] ^ _loc_4;
                _loc_4 = DECODE_BUFFER[INDEX];
                DECODE_BUFFER[INDEX] = _loc_6;
            # XORs the data then uncompresses 
            var _loc_8:* = null;
        }// end function

        private function I1111IIIlllIl1(param1) : void
            if (this.currentFrame == 200)
                this.I1ll1III111I11(new Number(2));
        }// end function

        # Create Key  
        private function CONSTRUCT_KEY() : void
            this._object2 = new this._CLASS_BUFFER();
            this._object3 = new this._CLASS_BUFFER();
            var _loc_2:* = 0;
            _loc_2 = 65;
            if (_loc_2 < 91)
            _loc_2 = 97;
            if (_loc_2 < 123)
            _loc_2 = 48;
            if (_loc_2 < 58)
            _loc_2 = 33;
            if (_loc_2 < 48)
                if (_loc_2 == 34 || _loc_2 == 39 || _loc_2 == 45)
            _loc_2 = 58;
            if (_loc_2 < 65)
            _loc_2 = 91;
            if (_loc_2 < 97)
                if (_loc_2 == 92)
            _loc_2 = 123;
            if (_loc_2 < 127)
            var _loc_3:* = 0;
            _loc_3 = 0;
            if (_loc_3 < 255)
                this._object2[_loc_3] = 255;
            _loc_3 = 0;
            if (_loc_3 < this._object3[this.FuncNameToStrInstance.length])
                this._object2[this._object3[_loc_3]] = _loc_3;
        }// end function

        public function func_123(param1) : uint
            var _loc_2:* = 0;
            if (this._uint_255 != 255)
                param1[param1[this.FuncNameToStrInstance.length]] = this.int_0 | this._uint_255 << this._uint_0;
                _loc_2 = _loc_2 + 1;
            return _loc_2;
        }// end function

        public function BASE91(param1, _length:uint, param3) : uint
            var _loc_4:* = 0;
            var _loc_5:* = 0;
            var _int_8191:* = 8191;
            _INDEX = 0;
            # previously IF
            while (_INDEX < _length)
                if (this._object2[param1[_INDEX]] == 255)
                    if (this._uint_255 == 255)
                        this._uint_255 = this._object2[param1[_INDEX]];
                        #   _uint_255 =  _uint_255 + * len(_object3)
                        this._uint_255 = this._uint_255 + this._object2[param1[_INDEX]] * this._object3[this.FuncNameToStrInstance.length];
                        this.int_0 = this.int_0 | this._uint_255 << this._uint_0;
                        this._uint_0 = this._uint_0 + ((this._uint_255 & _int_8191) > 88 ? (13) : (// label, 14));
                        # increament _loc_8
                        var _loc_8:* = _loc_5;
                        _loc_5 = _loc_5 + 1;
                        # move to out buffer
                        param3[_loc_8] = this.int_0 & 255;
                        this.int_0 = this.int_0 >> 8;
                        this._uint_0 = this._uint_0 - 8;
                        if (this._uint_0 > 7) goto 160;
                        this._uint_255 = 255;
            return _loc_5;
        }// end function


Exploring the Top 100 ebooks of The Pirate Bay

I wrapped up an analysis of the Top 100 ebooks of The Pirate Bay. Rather than posting the code I decided to use a notebook viewer. All the data and code can be found in my Bitbucket repo. Cheers.

The Beginner's Guide to IDAPython

In my spare time for the past couple of months I have been working on an ebook called "The Beginner's Guide to IDAPython". I originally wrote it as a reference for myself - I wanted a place to go to where I could find examples of functions that I commonly use (and forget) in IDAPython. Since I started the book I have used it many times as a quick reference to understand syntax or see an example of some code. I hope others will find it equally useful. The book is not a static document. I already have a list of content and topics that I would like to add, like a cover. Please feel free to email me if you would like a topic added, have a correction or would like to say hi. My email is at the bottom of the introduction. The ebook can be found at the link below.


The price is free (move the slider to the left) but has a suggested price of $14.99. In all honesty I don't care if you purchase it. A purchase would be nice but I'd rather you learn something from it.

I wrote the book in Markdown, using StackEdit [1] as an editor. I paid for a sponsor account, which allowed me to download it as a PDF. I did version control and hosting via Bitbucket [2]. Not sure why Dan [3] and I are the only people on it. Bitbucket is awesome. Unlimited free private repos for the win. I'm using Leanpub [4] as the distributor for my ebook. Ange Albertini is also using Leanpub to publish an ebook called Binary is beautiful [5]. Disclaimer: his book will be way better than mine.

I'd like to thank Hexacorn for all his feedback and support.

1. https://stackedit.io/
2. https://bitbucket.org/
3. https://bitbucket.org/daniel_plohmann
4. http://leanpub.com/
5. https://leanpub.com/binaryisbeautiful

Updates:  Usual grammar issues...

Dyre IE Hooks

I recently wrapped up my analysis of Dyre. A PDF document can be found in my papers repo. Most of the document focuses on the different stages in which Dyre interacts with the operating system. There are still some areas that I'd like to dig deeper into. For now it should be a good resource for anyone trying to identify a machine infected with Dyre or wanting to know more about the family of malware.

During the reversing process I found one part of Dyre's functionality worthy of a post. As with most banking trojans, Dyre contains functionality to hook APIs to log browser traffic. Typically, to get the addresses of the APIs, a sample will call GetProcAddress or manually traverse the portable executable file format to resolve symbols. If you are unfamiliar with the latter technique I'd highly recommend reading section 3.3 of "Understanding Windows Shellcode" by Skape [1]. Dyre attempts to hook APIs in firefox.exe, chrome.exe and iexplore.exe. It uses the standard GetProcAddress approach for resolving symbols in firefox.exe, is unsuccessful in chrome.exe and uses the GetProcAddress approach for the APIs LoadLibraryExW and CreateProcessInternalW in iexplore.exe. Dyre also hooks two APIs in WinInet.dll, but it does so in a unique way. Dyre reads the image header timedatestamp [2] from WinInet. This value contains the time and date at which WinInet was created by the linker. It then compares the timedatestamp to a list of timedatestamps stored by Dyre. The list presumably contains every time stamp for WinInet.dll from '2004-08-04 01:53:22' to '2014-07-25 04:04:59'. Below is an example of the values that can be found in the list.

seg000:00A0C05F           db    0
seg000:00A0C060 TimeStampList dd 4110941Bh              ; DATA XREF: TimeStamp:_loopr
seg000:00A0C064 dword_A0C064 dd 0                       ; DATA XREF: TimeStamp+1Cr
seg000:00A0C064                                         ; TimeStamp:loc_A07A0Dr ...
seg000:00A0C068           dd 411095F2h <- Time stamp
seg000:00A0C06C           dd 0         <- WinInet index
seg000:00A0C070           dd 4110963Fh
seg000:00A0C074           dd 0
seg000:00A0C078           dd 4110967Dh
seg000:00A0C07C           dd 0
seg000:00A0C080           dd 411096D4h
seg000:00A0C084           dd 0
seg000:00A0C088           dd 411096DDh
seg000:00A0C08C           dd 0
seg000:00A0C090           dd 41252C1Bh
seg000:00A0C094           dd 0
seg000:00A0C0AC           dd 1
seg000:00A0C0B0           dd 435862A0h
seg000:00A0C0B4           dd 2
seg000:00A0C0B8           dd 43C2A6A9h
seg000:00A0C0BC           dd 3
seg000:00A0D230           dd 4CE7BA3Fh
seg000:00A0D234           dd 78h
seg000:00A0D238           dd 53860FB3h
seg000:00A0D23C           dd 79h
seg000:00A0D240           dd 53D22BCBh
seg000:00A0D244           dd 7Ah

Values converted to time

>>> datetime.datetime.fromtimestamp(0x411095F2).strftime('%Y-%m-%d %H:%M:%S')
'2004-08-04 01:53:22'

>>> datetime.datetime.fromtimestamp(0x53D22BCB).strftime('%Y-%m-%d %H:%M:%S')
'2014-07-25 04:04:59'    
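
For reference, the timedatestamp can be read straight out of the PE headers. The Python 3 sketch below is not Dyre's code. It builds a tiny fake header so that it can run on its own, and the function name is mine. Note that fromtimestamp() without a timezone returns local time, which is what the values above show; in UTC the first stamp is about six hours later.

```python
import struct
from datetime import datetime, timezone

def pe_timedatestamp(data):
    """Return IMAGE_FILE_HEADER.TimeDateStamp from raw PE bytes."""
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[:2] != b"MZ" or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # after the signature: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4)
    return struct.unpack_from("<I", data, e_lfanew + 8)[0]

# Minimal fake header carrying the first timestamp from the list above.
header = bytearray(0x100)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<HHI", header, 0x84, 0x14C, 4, 0x411095F2)

stamp = pe_timedatestamp(bytes(header))
utc = datetime.fromtimestamp(stamp, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
print(hex(stamp), utc)  # 0x411095f2 2004-08-04 07:53:22
```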

If the timedatestamp is not present or an error occurs Dyre will send the hash of WinInet to the attacker's server. If the hash is not found there, it will send WinInet itself back to the attackers. Below are some of the strings used to report errors in this exchange with the command and control server.

"Check wininet.dll on server failed"
"Send wininet.dll failed"

If the timedatestamp is found in the list the next value is used as an index into another list. For example, if the timedatestamp was 4802A13Ah it would be found at index 49 of the list (counting from zero) and the next value would be 0x15 or 21.

seg000:00A0C1E8           dd 4802A13Ah  <- '2008-04-13 18:11:38'
seg000:00A0C1EC           dd 15h  <- 21 index

Assembly to read index value

seg000:00A07A0D           movsx   edx, word ptr ds:TimeStampIndex[eax*8] ; edx = 21
seg000:00A07A15           lea     edx, [edx+edx*2] ; edx  = 63
seg000:00A07A18           mov     edx, ds:offset[edx*4]
seg000:00A07A1F           mov     [ecx], edx            ; save off value

Python: calculate offset
Python>hex(0x0A0D3E0 + (21+21* 2) * 4)
0xa0d4dc

seg000:00A0D4DC           dw 0F3Ch  <- 0x0F3C offset to inline hook in wininet

The value 0xF3C + the base address of WinInet is the function prologue for ICSecureSocket::Send_Fsm. Dyre uses this to know the address at which to place its hooks.
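
The whole lookup can be modeled in a few lines of Python 3. The addresses and values are the ones shown in the listings above, but the table layout (12 byte entries, from the lea and the [edx*4] scaling) and the WinInet base of 0x77200000 (inferred from the hook address 77200F3C) are my reading of them, and only a single entry is filled in.

```python
# Values copied from the listings above; one entry only.
TIMESTAMP_TABLE = {0x4802A13A: 0x15}   # timedatestamp -> index
OFFSET_TABLE_BASE = 0x0A0D3E0          # base used in the offset calculation
ENTRY_SIZE = 3 * 4                     # lea edx,[edx+edx*2] followed by [edx*4]
OFFSETS = {0x0A0D4DC: 0x0F3C}          # table address -> word stored there
WININET_BASE = 0x77200000              # assumption: inferred from 77200F3C

def hook_address(timestamp):
    """Return the address to patch, or None for an unknown WinInet build."""
    index = TIMESTAMP_TABLE.get(timestamp)
    if index is None:
        return None
    entry = OFFSET_TABLE_BASE + index * ENTRY_SIZE
    return WININET_BASE + OFFSETS[entry]

print(hex(hook_address(0x4802A13A)))  # 0x77200f3c
```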

ICSecureSocket::Send_Fsm(CFsm_SecureSend *)
77200F37    90              NOP
77200F38    90              NOP
77200F39    90              NOP
77200F3A    90              NOP
77200F3B    90              NOP
77200F3C  - E9 C7F0398A     JMP 015A0008   <- Inline hook
015A0008    68 4077A000     PUSH 0A07740
015A000D    C3              RETN

00A07740    55              PUSH EBP
00A07741    8BEC            MOV EBP,ESP
00A07743    83EC 08         SUB ESP,8
00A07746    894D FC         MOV DWORD PTR SS:[EBP-4],ECX
00A07749    68 2077A000     PUSH 0A07720
00A0774E    FF75 08         PUSH DWORD PTR SS:[EBP+8]
00A07751    FF75 FC         PUSH DWORD PTR SS:[EBP-4]
00A07754    FF15 94DEA000   CALL DWORD PTR DS:[A0DE94]
00A0775A    8945 F8         MOV DWORD PTR SS:[EBP-8],EAX

It also hooks ICSecureSocket::Receive_Fsm in the same fashion.

Rather than calling GetProcAddress (the hooked APIs are not exported), Dyre stores the timedatestamp and patch offset of every known version of WinInet to avoid triggering heuristic based scanners. Seems like an arduous approach but still kind of cool. Another interesting fact is that Dyre has the ability to patch Trusteer's RapportGP.dll if it is found in the browser's memory. Dyre is actually a family of malware worthy of a deep dive. At first glance I ignored it because everything looked pretty cut & paste. I'd recommend others check it out. If you find anything useful please shoot me an email. Cheers.

Hash Analyzed 099c36d73cad5f13ec1a89d5958486060977930b8e4d541e4a2f7d92e104cd21
  1. http://www.nologin.org/Downloads/Papers/win32-shellcode.pdf
  2. http://msdn.microsoft.com/en-us/library/ms680313.aspx


I have been reversing Dyre in my spare time. I'm hoping to have a full analysis out in the next week or two. Something kind of annoying about Dyre is that it uses what looks like a massive structure to store its data and function pointers. For example, in the image below we can see it passing a handle stored at [eax+0x130] to WaitForSingleObject.
Manually tracing the code or searching through all the cross references to find what populated the value is kind of painful. Since the displacement value of 0x130 (304) is fairly unique, it can be targeted very easily in IDAPython.

import idautils 
import idaapi
import idc
displace = {}

# for each known function 
for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    # skip library & thunk functions 
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        continue
    dism_addr = list(idautils.FuncItems(func))
    for curr_addr in dism_addr:
        op = None
        index = None 
        # decode the instruction so idaapi.cmd describes curr_addr
        idaapi.decode_insn(curr_addr)
        # same as idc.GetOptype, just a different way of accessing the types
        if idaapi.cmd.Op1.type == idaapi.o_displ:
            op = 1
        if idaapi.cmd.Op2.type == idaapi.o_displ:
            op = 2
        if op == None:
            continue
        if "bp" in idaapi.tag_remove(idaapi.ua_outop2(curr_addr, 0)) or \
               "bp" in idaapi.tag_remove(idaapi.ua_outop2(curr_addr, 1)):
            # ebp will return a negative number
            if op == 1:
                index = (~(int(idaapi.cmd.Op1.addr) - 1) & 0xFFFFFFFF)
            else:
                index = (~(int(idaapi.cmd.Op2.addr) - 1) & 0xFFFFFFFF)
        else:
            if op == 1:
                index = int(idaapi.cmd.Op1.addr)
            else:
                index = int(idaapi.cmd.Op2.addr)
        # create key for each unique displacement value 
        if index:
            if displace.has_key(index) == False:
                displace[index] = []
            # record every address that uses this displacement
            displace[index].append(curr_addr)
The above code will create a dictionary of all the displacement values in known functions. A simple for loop can be used to find the address and disassembly of all uses for the defined displacement value.
Python>for x in displace[0x130]: print hex(x), GetDisasm(x)
0x10004f12 mov     [esi+130h], eax
0x10004f68 mov     [esi+130h], eax
0x10004fda push    dword ptr [esi+130h]  ; hObject
0x10005260 push    dword ptr [esi+130h]  ; hObject
0x10005293 push    dword ptr [eax+130h]  ; hHandle
0x100056be push    dword ptr [esi+130h]  ; hEvent
0x10005ac7 push    dword ptr [esi+130h]  ; hEvent
With the addresses it makes it easy to find where the value is populated.

The dictionary created by the script is named displace. It will contain all displaced values.  Not super 1337 but still useful. Cheers.
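
For anyone without IDA handy, the two ideas in the script (undoing the two's complement for ebp based operands and grouping addresses by displacement) can be tried in plain Python 3. In this sketch the first two sample rows come from the output above, the third is a hypothetical [ebp-4] operand, and the helper name is mine.

```python
from collections import defaultdict

def displacement_magnitude(raw, uses_bp):
    """Mirror the script: negate 32-bit two's complement values for ebp operands."""
    if uses_bp:
        return (~(raw - 1)) & 0xFFFFFFFF
    return raw

displace = defaultdict(list)
# (address, raw displacement, operand text mentions bp)
samples = [
    (0x10004f12, 0x130, False),
    (0x10004fda, 0x130, False),
    (0x10001000, 0xFFFFFFFC, True),   # hypothetical [ebp-4]
]
for addr, raw, uses_bp in samples:
    index = displacement_magnitude(raw, uses_bp)
    if index:
        displace[index].append(addr)

print([hex(a) for a in displace[0x130]])  # ['0x10004f12', '0x10004fda']
print([hex(a) for a in displace[4]])      # ['0x10001000']
```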