November 10, 2020

Bitdefender: UPX Unpacking Featuring Ten Memory Corruptions

This post breaks the two-year silence of this blog, showcasing a selection of memory corruption vulnerabilities in Bitdefender’s anti-virus engine.

Introduction

The goal of binary packing is to compress or obfuscate a binary, usually to save space/bandwidth or to evade malware analysis. A packed binary typically contains a compressed/obfuscated data payload. When the binary is executed, a loader decompresses this payload and then jumps to the actual entry point of the (inner) binary. Most anti-virus engines support binary unpacking at least for packers (such as UPX) that are very popular and that are also used by non-malware software.

This blog post is about UPX unpacking of PE binaries in the Bitdefender core engine. The main steps in UPX unpacking of PE binary files are the following:

  1. Detect the loader from the entry point
  2. Find the compressed data payload and extract it
  3. Unfilter the extracted code
  4. Rebuild various structures (such as the import table, the relocation table, the export table, and the resources)

The following vulnerabilities are presented in the control-flow order of the UPX unpacker.

Disclaimer: In the following, decompiled code from Bitdefender’s core engine is presented. The naming of variables, fields, and macros is heavily inspired by the original UPX. For some snippets, a reference to the original function is added for comparison. It is likely that some types are incorrect.

//1//: Stack Buffer Overflow in Pre-Extraction Deobfuscation

After the UPX loader has been detected, the Bitdefender engine tries to detect whether the loader applies a specific kind of deobfuscation to the compressed data payload before extracting it. The (de)obfuscation is very simple, making only use of the three operations ADD, XOR, and ROTATE_LEFT. If this deobfuscation is detected, then the engine iterates through the corresponding instructions of the loader and parses them with their operands in order to be able to deobfuscate the data as well. This looks as follows:

int32_t operation[16]; // on the stack
int32_t operand[16]; // on the stack
int i = 0;
int pos = 0;
do {
  bool op_XOR_or_ADD = false;
  if (loaderdata[pos] == 0x81u
    && (loaderdata[pos + 1] == 0x34  || loaderdata[pos + 1] == 0x4)) {
      operation[i] = (loaderdata[pos + 1] == 0x34) ? OP_XOR : OP_ADD;
      operand[i] = *(int32_t *)&loaderdata[pos + 3];
      ++i;
      pos += 7;
      op_XOR_or_ADD = true;
    }
  }
  if (loaderdata[pos] == 0xC1u && loaderdata[pos + 1] == 4) {
    operation[i] = OP_ROTATE_LEFT;
    operand[i] = loaderdata[pos + 3];
    ++i;
    pos += 4;
    if (i == 16) break;
    continue;
  }
  if (op_XOR_or_ADD) {
    if (i == 16) break;
    continue;
  }
  if (loaderdata[pos] == 0xE2u) { /* omitted: apply collected operations */ }
  pos += 2;
} while (pos + SOME_SLACK < loaderdata_end);

Observe how the bound-check on the index variable i is performed. As the buffer loaderdata is fully attacker-controlled, it is easy to verify that we can increase the index variable i by two before running into one of the checks i == 16. In particular, we can increase i from 15 to 17, after which we can overwrite the stack with completely arbitrary data.

(10ec.12dc): Break instruction exception - code 80000003 (first chance)
00000000`0601fe42 cc              int     3

The debug break is due to the stack canary which we have overwritten. If we continue, we see that the return fails because the stack is corrupted.

0:000> g
(10ec.12dc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
00000000`06006603 c3              ret

0:000> dd rsp
00000000`0014ed98  deadbeef deadbeef deadbeef deadbeef
00000000`0014eda8  deadbeef deadbeef deadbeef deadbeef

//2//: Heap Buffer Overflow in Pre-Extraction Deobfuscation

The collected operations (for the deobfuscation shown in //1//) are applied to the payload buffer at an attacker-controlled offset write_offset. Obviously, this offsets needs to be checked before writing to it. There are two checks on write_offset. The first is

if (write_offset <= extractobj->dword10 + 3) 

and the second one is

if (loaderdata[pos] == 0xE2u) {
  if (write_offset >= extractobj->dword10 - 3)

Both checks test against the field dword10. The field dword10, sitting on the calling functions’s stack frame, is never initialized. This makes the bound check useless and introduces a fully attacker-controlled heap buffer overflow.

//3//: Heap Buffer Overflow in Post-Extraction Deobfuscation

After the extraction, the engine attempts to deobfuscate the extracted data with a static XOR key.

for(int i=0; i<0x300; i++) {
  if (*(int32_t *)&entrypoint_data[i] == 0x4243484B) {
    int32_t j = i + 0x4A;
    uint8_t xor_key = entrypoint_data[j]; // attacker-controlled
    int32_t xor_len = *(int32_t *)&entrypoint_data[j - 7]; // attacker-controlled
    if (xor_len > packer->set_to_size_of_rawdata) return j; // <-- wrong bound check
    for(int32_t k=0; k<xor_len; k++) {
      packer->extracted_data[k] ^= xor_key; // <-- oob write
    }
    *info_string = "encrypted";
  }
}

The bound check is completely wrong. It should check against the size of the extracted data buffer. Instead, it checks against a value that is previously set to the raw data size of the section we extracted the data from. Those two sizes have nothing to do with each other. In particular, one can be much smaller than the other, or vice-versa.

As the function does not return after the first deobfuscation run, the memory corruption can be triggered up to 0x300 times in a row. This allows us to bypass the limitation that in a single deobfuscation run we always XOR with the same byte. We would simply XOR as follows:

First run (i=0):  XOR with B0 B0 B0 B0 B0 B0 B0
Second run (i=1): XOR with B1 B1 B1 B1 B1
Third run (i=2):  XOR with B2 B2

Overall, we then have XORed with C0 C0 C1 C1 C1 C2 C2 for completely arbitrary C0, C1, and C2. We can essentially XOR with such a pattern of almost arbitrary length, and switch the byte at most 0x300 times.

Needless to say, this vulnerability is a useful exploitation primitive as it enables very powerful memory corruptions: XORing allows us to modify selectively only certain parts of data, leaving other parts (for example heap metadata or critical objects) untouched.

//4//: Heap Buffer Overflow in the Filters

A filter is a simple transformation on binary code (say, x86-64 code) that is applied before compression, with the goal to make the code more compressible. After we have decompressed the data, we need to revert this filtering. Bitdefender supports about 15 different filters. Here is one of them (filter 0x11):

int32_t bytes_to_filter = /* omitted. is guaranteed not to be oob. */; 
int i = 0;
while (1) {
  do {
    if (--bytes_to_filter < 0) break;
  } while (extracted_data[i++] != 0xE8u);
  if (bytes_to_filter < 0) break;
  *(int32_t *)&extracted_data[i] -= i; // <-- oob write
  i += 4;
}

The problem is that bytes_to_filter is only updated when i is incremented by one, but not when it is later incremented by four.

Of the 15 filters, about 8 seem to be affected by such a heap buffer overflow. I treated them all together as one bug (after all, it is not unlikely that they share code).

//5//: Heap Buffer Overflow when Rebuilding Imports

The following memory corruption occurs in a loop of the function PeFile::rebuildImports (cf. PeFile::rebuildImports). It looks like this:

this->im->iat = this->iatoffs;
this->newiat = &extract_obj->extracted_data[this->iatoffs - (uint64_t)(uint32_t)pefile->rvamin];
while (*p) {
  if (*p == 1) {
    ilen = strlen(++p) + 1;
    if (this->inamespos) {
      if (ptr_diff(this->importednames,this->importednames_start) & 1) --this->importednames;
      memcpy(this->importednames + 2, p, ilen); // <-- memory corruption
      *this->newiat = ptr_diff(this->importednames,extract_obj->extracted_data - pefile->rvamin);
      this->importednames += ilen + 2;
      p += ilen;
    }
    else {
      //omitted, see below //5//
    }
  }
  else if (*p == 0xFFu) {
    p += 3;
    *this->newiat = ord_mask + *(uint16_t *)(p + 1);
  }
  else {
    // omitted
  }
  ++this->newiat;
}

The length ilen that is passed to memcpy is completely attacker-controlled and thus needs to be checked. Observe that the original UPX does a checked omemcpy at this place.

//6//: Another Heap Buffer Overflow when Rebuilding Imports

In the same loop of the function PeFile::rebuildImports (cf. PeFile::rebuildImports), there is another memory corruption:

this->im->iat = this->iatoffs;
this->newiat = &extract_obj->extracted_data[this->iatoffs - (uint64_t)(uint32_t)pefile->rvamin];
while (*p) {
  if (*p == 1) {
    ilen = strlen(++p) + 1;
    if (this->inamespos) {
      //omitted, see above //5//
    }
    else {
      extracted_data = extract_obj->extracted_data;
      dst_ptr = (extracted_data - pefile->rvamin) + (*this->newiat + 2);
      if (dst_ptr < extracted_data) return 0;
      extracted_data_end = &extracted_data[extract_obj->extractbuffer_bytes_written];
      if (dst_ptr > extracted_data_end || &dst_ptr[ilen + 1] > extracted_data_end) return 0;
      strcpy(dst_ptr,p); // <-- memory corruption
      p += ilen;
    }
  }
  else if (*p == 0xFFu) {
    p += 3;
    *this->newiat = ord_mask + *(uint16_t *)(p + 1);
  }
  else {
    // omitted
  }
  ++this->newiat;
}

The problem is that the strings dst_ptr and p can overlap, so we overwrite the string that we called strlen() on earlier. This can turn a terminating null-byte into a non-null byte and when strcpy() is called, the string is longer than expected, overflowing the buffer.

A possible fix is to replace the strcpy(dst_ptr,p) with memmove(dst_ptr,p,ilen).

It looks like original UPX is affected as well. The two commits 14992260 and 1faaba8f are an attempt to fix the problem in the devel branch of UPX.

//7//: Heap Buffer Overflow when Unoptimizing the Relocation Table

Another memory corruption is in the function Packer::unoptimizeReloc (cf. Packer::unoptimizeReloc):

for (uint8_t * p = *in; *p; p++, relocn++) {
  if (*p >= 0xF0u) {
    if (*p == 0xF0u && !*(uint16_t *)(p + 1)) {
      p += 4;
    }
    p += 2;
  }
}
uint32_t * outp = (uint32_t *)malloc(4*relocn + 4);
if (!outp) return -1;
uint32_t * relocs = outp;
int32_t jc = -4;
for (uint8_t * p = *in; *p; p++) {
  if (*p >= 0xF0u) {
    uint32_t dif = *(uint16_t *)(p + 1) + ((*p & 0xF) * 0x10000);
    p += 2;
    if (dif == 0) {
      dif = *(int32_t *)(p + 1);
      p += 4;
    }
    jc += dif;
  }
  else {
    jc += *p;
  }
  *relocs = jc; // <-- oob write
  ++relocs;

  if (!packer->extracted_data) return -1;
  if (bits == 32) {
    if (jc > packer->extractbuffer_bytes_written - 4) return -1;
    uint32_t tmp = *(uint32_t*)&extracted_data[jc];
    packer->extracted_data[jc + 0] = (uint8_t)(tmp >> 24);
    packer->extracted_data[jc + 1] = (uint8_t)(tmp >> 16);
    packer->extracted_data[jc + 2] = (uint8_t)(tmp >> 8);
    packer->extracted_data[jc + 3] = (uint8_t)tmp;
  }
  else {
    // omitted
  }
}

Ignoring the if-branch if (bits == 32), this looks fine. The very first loop runs through the table until a null byte is encountered and relocn counts how many entries there are. Then, a buffer of size 4 * relocn + 4 is allocated, and we run through the table a second time.

The problem is that in the branch if (bits == 32), the endianness of a 4-byte value is swapped, and since the offset jc can point to the position where the null byte was, this can turn a null byte into a non-null byte. If that happens, it is easy to see that in the second loop, the variable p is increased further than in the first loop, and thus the allocated buffer is too small. The write *reloc = jc is then eventually out of bounds.

It looks like the original UPX is affected as well. Commit e03310fc is an attempt to fix the bug in the devel branch of UPX.

//8//: Heap Buffer Overflow when Finishing Building the Relocation Table

The next memory corruption is to be found in the function PeFile::reloc::finish (cf. PeFile::reloc::finish):

*(uint32_t *)&start[4 * counts[0] + 1024] = this->endmarker;
qsort(start + 1024, ++counts[0], 4i64, le32_compare);
rel = (reloc *)start;
rel1 = (uint16_t *)start;
for (ic = 0 ; ic < counts[0]; ++ic) {
  unsigned pos = *(int32_t *)&start[4 * ic + 1024];
  if ((pos ^ this->prev) >= 0x10000) {
    rel1 = rel1;
    this->prev = pos;
    *rel1 = 0;
    rel->size = ALIGN_UP(ptr_diff(rel1,rel), 4); //rel1 increased by up to 3 bytes
    next_rel = (reloc *)((char *)rel + rel->size);
    rel = next_rel;
    rel1 = (uint16_t *)&next_rel[1];// rel1 increased by sizeof(reloc)==8 bytes
    next_rel->pagestart = (pos >> 4) & ~0xFFF;
  }
  *rel1 = ((int16_t)pos << 12) + ((pos >> 4) & 0xFFF); // <-- oob write
  ++rel1;
}

It seems that without the inner if-branch, the bound check ic < counts[0] would be fine, since it is then guaranteed that rel1 is before the position that the index 4*ic+1024 represents. However, if we go into the if-branch, rel1 is increased faster than one would expect. On top of the 2-byte increase at the end of every loop iteration, the if-branch may increase rel1 by 3 bytes (due to the ALIGN_UP(_,4)) and by another sizeof(reloc) == 8 bytes.

It seems like the original UPX is affected as well. Looking at the original code, we see that rel and rel1 are advanced in a call to the function PeFile::reloc::newRelocPos which was inlined in the above (decompiled) code snippet. Interestingly, it is this inlining that makes the bug easy to spot. The UPX developers have been notified about this on September 20, but no patch is available yet.

//9//: Heap Buffer Overflow when Building the Export Table

It looks like the engine is not really fully reconstructing the export table, but only doing some basic virtual address adjustments (compare to the original PeFile::Export::build):

uint32_t num_entries = export_dir_buffer[6]; // attacker-controlled
uint32_t va_dif = export_dir_virtualaddress - outer_export_dir_virtualaddress;
uint32_t table_base_offset = export_dir_buffer[8] - outer_export_dir_virtualaddress;
export_dir_buffer[3] += va_dif;
export_dir_buffer[7] += va_dif;
export_dir_buffer[8] += va_dif;
export_dir_buffer[9] += va_dif;
for(uint32_t i=0; i<num_entries; i++) {
  if ((table_base_offset + 4*i > export_dir_buffer_size)
   || (table_base_offset + 4*i < export_dir_buffer_size)) // <-- what?
    goto LABEL_ERROR;
  *(uint32_t*)((uint8_t*)export_dir_buffer + table_base_offset + 4*i) += va_dif; // <-- oob write
}

The bound check on table_base_offset + 4*i looks very suspicious. It is probably a typing error. Even ignoring memory safety, it is unlikely that this comes close to implementing the desired functionality: the loop can execute at most one iteration, and in this one iteration we have table_base_offset + 4*i == export_dir_buffer_size, which is guaranteed to cause a memory corruption (an attacker-controlled 4-byte integer is written out of bound).

The fixed check looks as follows.

if ((table_base_offset + 4*i >= export_dir_buffer_size)
 || (table_base_offset + 4*i + 3 >= export_dir_buffer_size))
  goto LABEL_ERROR;

It is likely that this bug was introduced due the original check being added via a bound checking macro of UPX in a wrong way.

//10//: Heap Buffer Overflow when Rebuilding Resources

Finally, there is a memory corruption at the very end of PeFile::rebuildResource (cf. PeFile::rebuildResource), where the constructed resources are written back:

if (!*(int32_t *)(&extracted_data[*((int32_t *)pefile->extracted_ntheader + 34)
                                              - pefile->rvamin + 12])) {
  result = memcpy(&extracted_data[*((uint32_t*)pefile->extracted_ntheader + 34)
                                              - (uint64_t)(uint32_t)pefile->rvamin],
                  p,
                  res.dirsize());
}

There is no bound check on res.dirsize().

Note that the original UPX-code has a checked omemcpy.

Conclusion

Virtually all main steps of Bitdefender’s UPX unpacker were affected by a critical memory corruption vulnerability. Interestingly, the actual decompression is perhaps the only step that seems to be unaffected. This is because the engine is extracting into a dynamically-sized buffer (similar to an std::vector) via an API that does not even allow memory corruptions (the only operations used are append_byte(b) and append_bytes(buf,len)).

In almost half of the presented cases, the critical bound check was simply missing. In most the other cases, the bound check was obviously wrong. Perhaps the only memory corruptions that are not completely obvious are //6// and //7//, as they are caused by unexpected overlaps.

It seems that //6//, //7//, and //8// have been inherited from the original UPX code. The UPX developers have been notified about this and are in the process of fixing the bugs. Since the original UPX is clearly not made to unpack untrusted binaries (such as malware) and does not run at a high level of privileges, these bugs are not that critical there. However, in the context of an anti-virus engine these bugs become dangerous vulnerabilities.

Bitdefender’s engine is running unsandboxed as NT\AUTHORITY SYSTEM and is scanning untrusted files fully automatically, making it an attractive target for remote code execution exploits. An actual exploit would have to bypass ASLR and DEP (and possibly the stack canary). The previous exploit on F-Secure Anti-Virus shows that this is not that difficult, even in a non-scriptable environment.

The outlined vulnerabilities are part of Bitdefender’s core engine that is not only used by many of their own anti-virus products on various operating systems (macOS, Linux, FreeBSD, Windows), but is also licensed to numerous other anti-virus vendors. Vulnerabilities in the core engine are therefore immediately affecting numerous users and devices.

Do you have any comments, feedback, doubts, or complaints? I would love to hear them. You can find my e-mail address on the about page.

Timeline of Disclosure

The following timeline looks somewhat scrambled, because the bugs were not discovered in the natural control-flow order that is used for the presentation in this blog post.

  • 2020-07-03 - Discovery of //5//
  • 2020-07-08 - //5// has been patched (the Bitdefender team found the bug before I could report it)
  • 2020-07-15 - Discovery and report of //7//
  • 2020-07-16 - Discovery and report of //9//
  • 2020-07-20 - Bitdefender rolls out patches for //7// and //9//
  • 2020-07-28 - Bitdefender reserved CVE-2020-15729 for //7//
  • 2020-08-09 - Discovery and report of //4//
  • 2020-08-09 - Discovery and report of //6//
  • 2020-08-09 - Discovery and report of //8//
  • 2020-08-12 - Bitdefender has rolled out patches for //4//, //6//, and //8//
  • 2020-08-14 - Bitdefender communicates that they are “planning to stop allocating CVEs automatically for each and every single vuln[erability]” (emphasis in original).
  • 2020-08-15 - Discovery and report of //3//
  • 2020-08-16 - Discovery and report of //1//
  • 2020-08-17 - Bitdefender rolls out patches for //1// and //3//
  • 2020-08-30 - Discovery and report of //10//
  • 2020-09-02 - Bitdefender rolls out patch for //10//
  • 2020-09-04 - The patch for //8// is incomplete, even the same file can be used to trigger a crash. Asking for a full patch. Response: “that particular issue will be fixed via update probably on Monday, next week”.
  • 2020-09-07 - Bitdefender rolls out second patch for //8//
  • 2020-09-07 - The patch for //3// is incomplete, asking for a complete patch for //3//
  • 2020-09-08 - Bitdefender: “Can you give us more details regarding this bug [//3//]? Can you still reproduce this vulnerability?” I describe the problem again and send a new file triggering the bug (since the original one did not trigger it anymore)
  • 2020-09-08 - The second patch for //8// is still incomplete. Submitting a new file to trigger the bug
  • 2020-09-10 - Bitdefender rolls out second patch for //3// and third patch for //8//
  • 2020-09-17 - Discovery and report of //2// (without a PoC file)
  • 2020-09-20 - Reached out to UPX developers, describing //6//, //7//, and //8//
  • 2020-09-21 - Bitdefender rolls out first patch for //2//
  • 2020-09-30 - The third patch for //3// is still incomplete. Submitting a new file to trigger the bug
  • 2020-09-30 - The first patch for //2// is still incomplete. Submitting a more detailed problem description
  • 2020-09-30 - Bitdefender rolls out second patch for //2//
  • 2020-10-05 - Bitdefender rolls out fourth patch for //3//
  • 2020-10-31 - The second patch for //2// checks against an initialized value, but introduces an integer underflow (causing a fully attacker-controlled heap buffer overflow). Submitting a file to trigger the new bug
  • 2020-11-02 - Bitdefender rolls out third patch for //2//

Bug bounties were paid for all submissions (including follow-up submissions pointing out incomplete patches).

Thanks & Acknowledgements

I want to thank the Bitdefender team for fixing all reported bugs. In addition, I want to thank Alex Balan, Octavian Guzu, and Marius Gherghinoiu for providing me with regular status updates.

© 2023 | about