August 8, 2017

F-Secure Anti-Virus: Arbitrary Free Vulnerability via TNEF

The previous posts of this blog series have been about stack-based buffer overflows. With this post, I want to move on to bugs that involve dynamic memory management.

Since there are not that many publicly documented arbitrary free vulnerabilities in prominent software products, I thought it would be worth sharing this one.

Introduction

The Transport Neutral Encapsulation Format (TNEF) is an email attachment format developed by Microsoft. It can be used to represent complicated messages and attachments, consisting of many different files and file types, as a flattened stream. Microsoft does not provide an open-source reference implementation. Instead, the format is specified in official documentation [1], so most anti-virus vendors write their own TNEF parser.

The bug presented in this post occurs when parsing the properties of an attachment.

Getting Into the Details

Section 2.1.3.4 of the TNEF documentation [1] describes the so-called MsgPropertyList. The following definitions are relevant for this discussion (slightly simplified).

MsgPropertyList = MsgPropertyCount *MsgPropertyValue
MsgPropertyCount = UINT32

MsgPropertyValue = MsgPropertyTag MsgPropertyData
MsgPropertyTag = MsgPropertyType MsgPropertyId [NamedPropSpec]
//NamedPropSpec is optional, it must be present if MsgPropertyId >= 0x8000
//Definition of MsgPropertyData omitted, because irrelevant

MsgPropertyType = UINT16 (simplified)
MsgPropertyId = UINT16
NamedPropSpec = PropNameSpace PropIDType PropMap

PropNameSpace = 16 byte GUID
PropIDType = IDTypeNumber / IDTypeString
IDTypeNumber = %x00.00.00.00 (little-endian 32-bit 0x0)
IDTypeString = %x01.00.00.00 (little-endian 32-bit 0x1)

PropMap = PropMapID / PropMapString
PropMapID = UINT32
PropMapString = UINT32 *UINT16 %x00.00 [PropMapPad]

Let us look at how MsgPropertyList is parsed.

char TNEF::ReadMsgPropertyList() {
  MsgPropertyValue msgpropvalue; //constructor call here
  
  //some init stuff omitted
  if (this->currentoffset + 4 > this->endoffset) { return 0; }
  else {
    uint32_t MsgPropertyCount = *(uint32_t*)(&this->inputdata[this->currentoffset]);
    this->currentoffset += 4;
    this->MsgPropertyCount = MsgPropertyCount;
    for (int32_t processedMsgProperties = 0;
         this->currentoffset < this->endoffset && processedMsgProperties < this->MsgPropertyCount;
         processedMsgProperties++)  {
      int32_t readbytes = msgpropvalue.Read(&this->inputdata[this->currentoffset],
                                                           this->endoffset - this->currentoffset);
      if (readbytes == -1) { return 0; }
      irrelevant_processing(&this->dword14, &msgpropvalue);
      msgpropvalue.Cleanup();
      this->currentoffset += readbytes;
    }
    return 1;
  }
  //destructor call on msgpropvalue here
}

This is pretty much as expected. First, MsgPropertyCount is read from the input. Then, a loop traverses the list of MsgPropertyValues.

Note that the same object msgpropvalue is reused for every new MsgPropertyValue that is read from the list.

Let us dig a little deeper and try to understand how a single MsgPropertyValue is parsed. In fact, it suffices to see how the MsgPropertyTag is parsed, so we will focus only on this part.

int32_t MsgPropertyValue::Read(char *inputdata, int32_t remainingbytes) {
  //omitted some init stuff
  if (remainingbytes < 4) { return -1; }
  this->msgpropertytype = *(int16_t *)(inputdata);
  this->msgpropertyid = *(int16_t *)(inputdata+2);
  int32_t readbytes = 4;
  if (this->namedpropspec_present() == 1) {
    if (remainingbytes < 20) { return -1; }
    memcpy(&this->guid[0], inputdata+4, 16);
    this->currentguid = &this->guid[0];
    if (remainingbytes < 24) { return -1; }
    uint32_t propidtype = *(uint32_t *)(inputdata + 20);
    this->propidtype = propidtype;
    if (propidtype != IDTypeNumber) { // IDTypeNumber==0
      if (propidtype != IDTypeString || remainingbytes < 28) { return -1; } // IDTypeString==1
      size_t propmapstring_size = *(size_t*)(inputdata + 24);
      this->propmapstring_size = propmapstring_size;
      if ((int32_t)(propmapstring_size + 28) > remainingbytes) { return -1; }
      char *propmapstringbuffer = (char*) malloc(propmapstring_size);
      this->PropMap = propmapstringbuffer; //PropMap holds the pointer to the string buffer
      if (!propmapstringbuffer) { return -1; }
      memmove(propmapstringbuffer, inputdata + 28, propmapstring_size);
      readbytes = this->propmapstring_size + 28;
      if (this->propmapstring_size % 4) { readbytes += 4 - this->propmapstring_size % 4; }
    }
    else {
      if (remainingbytes < 28) { return -1; }
      this->PropMap = *(uint32_t*)(inputdata + 24); //PropMap holds the uint32_t PropMapID
      readbytes = 28;
    }
  }
  //rest omitted
  return readbytes;
}

For the most part, this seems to be pretty straightforward, too. Note, however, that the field PropMap does not enjoy real type safety. In particular, the field is used to store a pointer to a dynamically allocated buffer in the first branch (PropIDType is IDTypeString) and a 4-byte integer in the second branch (PropIDType is IDTypeNumber).

Recall that the object msgpropvalue from TNEF::ReadMsgPropertyList (see above) is reused for every new MsgPropertyValue that occurs in the list. As there are dynamic memory allocations in MsgPropertyValue::Read, it is necessary to free the allocated memory after every iteration in order to avoid memory leaks. Therefore, MsgPropertyValue::Cleanup is called at the end of every loop iteration. Additionally, and this turns out to be crucial, the function is called in the destructor of MsgPropertyValue.

void MsgPropertyValue::Cleanup() {
  //other object cleanup omitted
  if (this->currentguid && this->propidtype == IDTypeString && this->PropMap) {
    free(this->PropMap);
    this->PropMap = 0;
  }
}

So far, so good. Now, we exploit the missing type safety of the field PropMap to call free(x) for an attacker-chosen x. The control flow is as follows.

  1. An MsgPropertyList is read via TNEF::ReadMsgPropertyList.
  2. The first call to msgpropvalue.Read() sets propidtype to IDTypeNumber, and PropMap to an arbitrary value x via a NamedPropSpec whose PropIDType is IDTypeNumber. The subsequent call to msgpropvalue.Cleanup() leaves PropMap untouched, because propidtype is not IDTypeString.
  3. The next call to msgpropvalue.Read() sets propidtype to IDTypeString and returns -1 because remainingbytes is too small. PropMap still holds x at this point.
  4. Now TNEF::ReadMsgPropertyList sees the error and returns, too. This calls the destructor on msgpropvalue, which in turn calls msgpropvalue.Cleanup().
  5. The fields currentguid and PropMap both hold a non-zero value. Moreover, propidtype is still IDTypeString.
  6. free(x)
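
The control flow above can be replayed in a small, harmless model. The following is only a sketch under several assumptions: ModelValue, le32, put_le32, put_header and simulate are hypothetical names that do not appear in the engine, MsgPropertyData is ignored entirely, the string branch is reduced to its failure path, and free() is replaced by a recorder so that nothing is actually freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t IDTypeNumber = 0;
constexpr uint32_t IDTypeString = 1;

// Decode a little-endian 32-bit value regardless of host byte order.
static uint32_t le32(const uint8_t *p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Hypothetical model: only the fields that Cleanup() inspects are kept.
struct ModelValue {
    const uint8_t *currentguid = nullptr;
    uint32_t propidtype = 0;
    uintptr_t PropMap = 0;               // integer or pointer, as in the original
    std::vector<uintptr_t> would_free;   // recorded instead of calling free()

    // Models the NamedPropSpec part of Read() only (property ID >= 0x8000 assumed).
    int32_t Read(const uint8_t *in, int32_t remaining) {
        if (remaining < 24) return -1;
        currentguid = in + 4;
        propidtype = le32(in + 20);      // the tag changes first ...
        if (propidtype != IDTypeNumber) {
            return -1;                   // ... and PropMap keeps its old value
        }
        if (remaining < 28) return -1;
        PropMap = le32(in + 24);
        return 28;
    }

    void Cleanup() {
        if (currentguid && propidtype == IDTypeString && PropMap) {
            would_free.push_back(PropMap);
            PropMap = 0;
        }
    }
};

static void put_le32(std::vector<uint8_t> &buf, uint32_t v) {
    for (int i = 0; i < 4; i++) buf.push_back(uint8_t(v >> (8 * i)));
}

// Appends 24 bytes: arbitrary type, id 0x8000, dummy GUID, PropIDType.
static void put_header(std::vector<uint8_t> &buf, uint32_t idtype) {
    const uint8_t tag[4] = {0x03, 0x00, 0x00, 0x80};
    buf.insert(buf.end(), tag, tag + 4);
    buf.insert(buf.end(), size_t(16), uint8_t(0x41));
    put_le32(buf, idtype);
}

// Returns every value the model would have passed to free().
std::vector<uintptr_t> simulate(uint32_t x) {
    std::vector<uint8_t> buf;
    put_header(buf, IDTypeNumber);
    put_le32(buf, x);                    // step 2: PropMapID = x
    put_header(buf, IDTypeString);       // step 3: truncated after PropIDType
    assert(buf.size() == 52u);

    ModelValue v;
    size_t off = 0;
    for (int n = 0; n < 2 && off < buf.size(); n++) {
        int32_t r = v.Read(buf.data() + off, int32_t(buf.size() - off));
        if (r == -1) break;              // step 4: error path skips the loop's Cleanup()
        v.Cleanup();
        off += size_t(r);
    }
    v.Cleanup();                         // stands in for the destructor
    return v.would_free;
}
```

Calling simulate(0xdeadbeef) in this model records exactly one value, 0xdeadbeef, which corresponds to step 6. The real input additionally has to satisfy the parts of the parser that are omitted here.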

Triggering the Bug

The outlined control flow describes a class of MsgPropertyLists that will trigger the bug. We embed such an MsgPropertyList into a TNEF stream and feed it to the engine, setting a breakpoint just before the call to free in the function MsgPropertyValue::Cleanup.

0:000> bu fm4av+0x2F157
0:000> g

Breakpoint 0 hit

eax=deadbeef ebx=0057dbe8 ecx=002ae6e0 edx=00000000 esi=0057dbe8 edi=002ae6e0
eip=6ef9f157 esp=002ae694 ebp=002ae6c4 iopl=0         nv up ei ng nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
0:000> u
fm4av!fmOpenFileW+0x161f7:
6ef9f157 50              push    eax
6ef9f158 e8ec520000      call    free (6efa4449)
6ef9f15d 83c404          add     esp,4
6ef9f160 c7470c00000000  mov     dword ptr [edi+0Ch],0
6ef9f167 5f              pop     edi
6ef9f168 5e              pop     esi
6ef9f169 5b              pop     ebx
6ef9f16a c3              ret

Note that the value of eax is completely controllable by the attacker, since it is read directly from the input file.

Attacker Control and Exploitation

The attacker can free a completely arbitrary pointer without any restriction. Moreover, the engine runs unsandboxed and as NT Authority\SYSTEM.

Hence, this bug is most likely exploitable for remote code execution as NT Authority\SYSTEM, the only obstacle being ASLR.

Conclusion

We have seen an arbitrary free vulnerability that resulted from missing type safety on PropMap. Relying on a variable t to always reflect the correct type of another variable v is obviously dangerous, especially if the type of v changes frequently. In particular, there is the danger of v and t getting out of sync, which is exactly what happened here.
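
One way to rule out this kind of desynchronization is to let the value carry its own tag. The following C++17 sketch uses std::variant for that purpose. It is an illustration only, not the fix F-Secure shipped: SafeNamedProp, PropMapValue and demo are hypothetical names, and storing the string as a std::u16string is an assumption about how an implementation might own the PropMapString data.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

// The variant stores which alternative is active together with the value,
// so "t" and "v" from the discussion above cannot be updated separately.
using PropMapValue = std::variant<std::monostate, uint32_t, std::u16string>;

struct SafeNamedProp {
    PropMapValue propmap;   // std::monostate means: nothing parsed yet

    void SetId(uint32_t id) { propmap = id; }
    void SetString(std::u16string s) { propmap = std::move(s); }
    void Reset() { propmap = std::monostate{}; }

    // Replaces the check "propidtype == IDTypeString && PropMap".
    bool OwnsString() const {
        return std::holds_alternative<std::u16string>(propmap);
    }
};

// Small usage example: an integer ID is never mistaken for an owned buffer.
void demo() {
    SafeNamedProp p;
    p.SetId(0xdeadbeef);
    assert(!p.OwnsString());
    p.SetString(u"name");
    assert(p.OwnsString());
    p.Reset();
    assert(std::holds_alternative<std::monostate>(p.propmap));
}
```

With this layout there is no separate propidtype field that a failed parse could change on its own, and the string's memory is released by its destructor instead of a manual free.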

Do you have any comments, feedback, doubts, or complaints? I would love to hear them. You can find my email address on the about page.

Timeline of Disclosure

  • 11/11/2016 - Discovery
  • 11/11/2016 - Report
  • 11/14/2016 - “raised up to our development team” and “we will get back to you with a progress update”
  • 11/28/2016 - “as for the use-after-free vulnerability, our development team managed to identify the source of the issue and is currently working on a fix to be implemented”
  • 12/09/2016 - “fixes for the issues with [TNEF] you reported have been tested and ready to be deployed. […] However, the development has requested that the database be released when the fixes for the [other] issue has been included as well.”
  • 12/15/2016 - “the development team has decided to release the fixes for [the TNEF] issues you reported to us while still working on a fix for the issues with […]. […] The new database has been released yesterday”
  • 02/28/2017 - Bug bounty paid

Thanks & Acknowledgements

I want to thank the F-Secure team for fixing the bug. In addition, I want to thank Calvin Gan for providing me with regular status updates.

Bonus

The TNEF parser calls the function uint32_t read_input(uint8_t *inputdata) several times to read from the input data buffer. The function looks as follows.

read_input proc near

//inputdata= dword ptr  8

push    ebp
mov     ebp, esp
mov     ecx, [ebp+inputdata]
push    ebx
mov     al, [ecx]
mov     dl, [ecx+1]
mov     byte ptr [ebp+inputdata+3], al
mov     al, [ecx+2]
mov     cl, [ecx+3]
movzx   ebx, al
push    esi
push    edi
movzx   esi, cl
mov     edi, ebx
shl     edi, 8
or      edi, esi
and     edi, 0FF00h
mov     eax, esi
mov     ecx, ebx
shr     ecx, 8
shl     eax, 10h
or      edi, eax
movzx   eax, byte ptr [ebp+inputdata+3]
movzx   edx, dl
or      edi, edx
and     ecx, 0FF00h
and     esi, 0FF0000h
or      ecx, esi
and     edx, 0FF00h
shl     edi, 8
or      ecx, edx
shr     ecx, 8
or      edi, ecx
and     ebx, 0FF00h
or      edi, ebx
or      eax, edi
pop     edi
pop     esi
pop     ebx
pop     ebp
retn

I leave it as an exercise for the interested reader to figure out what this function does. I am also happy to discuss any comments or ideas on the function (my email address can be found on the about page).


  1. [MS-OXTNEF] documents the TNEF format in detail.
