Avast Antivirus: Remote Stack Buffer Overflow with Magic Numbers

If I told you I found a remotely triggerable stack-based buffer overflow in a conventional anti-virus product, in what part of the software would you expect it to be? A reasonable guess may be: “Probably in the parsing code of some complicated and likely obsolete file format”.

In fact, the most recent anti-virus stack buffer overflows¹ ² ³ clearly show that the implementation of a parser for complex file formats is extremely challenging.

However, I would like to start this blog series with a stack-based buffer overflow that is not of this kind.

Introduction

Let us set the scene. Given a new file, the anti-virus software needs to determine its file type so that it can analyze the file in the right context. Therefore, the first part of the scanning process usually involves finding the so-called magic numbers that hint at the file type. For example, PDF files begin with the ASCII string %PDF-. Now, Avast Antivirus tries to be very thorough with this, scanning the file for occurrences of numerous different magic numbers. For some of those types, such as PDF or RAR, it is not satisfied with just one occurrence, but tries to find multiple occurrences.
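To make the scanning step concrete, here is a minimal sketch of collecting every occurrence of a single magic string in a buffer. It is not Avast's code: the function name find_all_magic and its parameters are made up for illustration, and the real engine presumably looks for many magic numbers at once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from Avast's engine): scans `buf` for every
   occurrence of the NUL-terminated string `magic`. Stores at most
   `max_hits` offsets in `hits` and returns the total number of matches,
   which may be larger than `max_hits`. */
size_t find_all_magic(const unsigned char *buf, size_t len,
                      const char *magic,
                      size_t *hits, size_t max_hits) {
  size_t magic_len = strlen(magic);
  size_t found = 0;

  if (magic_len == 0 || magic_len > len) {
    return 0;
  }
  for (size_t off = 0; off + magic_len <= len; off++) {
    if (memcmp(buf + off, magic, magic_len) == 0) {
      /* Bounds-checked store: extra matches are counted but not recorded. */
      if (found < max_hits) {
        hits[found] = off;
      }
      found++;
    }
  }
  return found;
}
```

Note that the store into hits is guarded by max_hits. That is exactly the kind of bound that goes wrong in the function discussed in this post.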

Getting Into the Details

In the algo module of Avast’s engine, there is a function find_magicnums that scans a given file for various magic numbers (e.g. Rar! or %PDF-).

When a magic number is found, a variable of type magicnum_t is created:

typedef struct {
  uint32_t type;
  uint32_t offset;
  uint32_t priority;
} magicnum_t;

The field type is an integer that maps to a filetype (such as PDF or RAR), and offset is the offset at which the magic number appears (measured from the beginning of the file).

Once created, the variable is stored in a stack-allocated structure of type magicnum_collection_t:

typedef struct {
  uint32_t max_magicnum_count;
  uint32_t magicnum_count;
  magicnum_t magicnums[MAXMAGICNUMCOUNT];
} magicnum_collection_t;

The function add_magicnum is responsible for inserting a given magic number into the field magicnums of the collection. It does so while making sure that the entries are ordered with respect to their offset, and with respect to their priority in case the offset is equal.

add_magicnum looks roughly like this⁴:

void add_magicnum(magicnum_collection_t *magicnums, magicnum_t *insertmagicnum) {
  uint32_t magicnum_count = magicnums->magicnum_count;
  uint32_t insertrank = 0;

  // skip entries whose offset is smaller than that of the new magic number
  while (insertrank < magicnum_count
      && magicnums->magicnums[insertrank].offset < insertmagicnum->offset) {
    insertrank++;
  }

  // among entries with an equal offset, skip those with priority <= the new one
  while (insertrank < magicnum_count
      && magicnums->magicnums[insertrank].offset == insertmagicnum->offset
      && magicnums->magicnums[insertrank].priority <= insertmagicnum->priority) {
    insertrank++;
  }

  if (insertrank < magicnum_count && insertrank + 1 < magicnums->max_magicnum_count) {
    memmove(&magicnums->magicnums[insertrank + 1] /*destination*/,
            &magicnums->magicnums[insertrank]  /*source*/,
            sizeof(magicnum_t) * (magicnum_count - insertrank));
  }

  if (insertrank < magicnums->max_magicnum_count) {
    magicnum_t *new_magicnum = &magicnums->magicnums[insertrank];
    new_magicnum->type = insertmagicnum->type;
    new_magicnum->offset = insertmagicnum->offset;
    new_magicnum->priority = insertmagicnum->priority;
    magicnums->magicnum_count++;
  }
}

It starts by computing the insertrank, which is the index into the magicnums array where the given insertmagicnum should be inserted.

If the new magic number needs to be inserted before another magic number in the collection (that is, if insertrank < magicnum_count), all elements in the magicnums array beginning from insertrank are shifted by sizeof(magicnum_t) bytes in order to make space for the new magic number.

When doing this, we need to be careful not to overflow the magicnums buffer. This is what the check insertrank+1 < magicnums->max_magicnum_count tries to ensure. However, depending on the order in which magic numbers are inserted, it is possible that the array is full, but the computed insertrank is nevertheless (much) smaller than max_magicnum_count-1.
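The gap in that check can be expressed as plain index arithmetic. The following sketch does not touch any buffer; it only compares what the guard allows with what the memmove writes. The helper names are made up for illustration, and the capacity of 32 used in the example afterwards is only my estimate of MAXMAGICNUMCOUNT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the guard in front of the memmove in the pseudocode. */
bool original_guard(uint32_t insertrank, uint32_t count, uint32_t max) {
  return insertrank < count && insertrank + 1 < max;
}

/* Index one past the last element written by the memmove: it copies
   (count - insertrank) elements to insertrank + 1, so the write ends at
   count + 1, independent of insertrank. */
uint32_t shift_end(uint32_t insertrank, uint32_t count) {
  return (insertrank + 1) + (count - insertrank);
}

/* True if the guard lets the shift through although it writes past an
   array of `max` elements. */
bool shift_overflows(uint32_t insertrank, uint32_t count, uint32_t max) {
  return original_guard(insertrank, count, max)
      && shift_end(insertrank, count) > max;
}
```

For a full array of 32 entries and insertrank 0, shift_overflows(0, 32, 32) is true: the guard only looks at insertrank, while the end of the write depends on magicnum_count.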

I believe a correct alternative check would ensure that magicnum_count+1 < magicnums->max_magicnum_count (this could be checked even before computing insertrank).
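A minimal sketch of an insertion routine with that check in front could look as follows. This is my own illustration, not Avast's patch: the name add_magicnum_checked, the boolean return value, and the value 32 for MAXMAGICNUMCOUNT are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAXMAGICNUMCOUNT 32 /* assumption: the real value is only estimated */

typedef struct {
  uint32_t type;
  uint32_t offset;
  uint32_t priority;
} magicnum_t;

typedef struct {
  uint32_t max_magicnum_count;
  uint32_t magicnum_count;
  magicnum_t magicnums[MAXMAGICNUMCOUNT];
} magicnum_collection_t;

/* Hypothetical variant of add_magicnum that rejects the insertion up front.
   Returns true if the magic number was stored. */
bool add_magicnum_checked(magicnum_collection_t *c, const magicnum_t *m) {
  uint32_t count = c->magicnum_count;
  uint32_t rank = 0;

  /* Never trust a capacity larger than the backing array. */
  if (c->max_magicnum_count > MAXMAGICNUMCOUNT) {
    return false;
  }
  /* Equivalent to count + 1 < max, written so that it cannot wrap around. */
  if (count >= c->max_magicnum_count || c->max_magicnum_count - count < 2) {
    return false;
  }

  /* Same ordering as before: by offset, then by priority. */
  while (rank < count && c->magicnums[rank].offset < m->offset) {
    rank++;
  }
  while (rank < count && c->magicnums[rank].offset == m->offset
         && c->magicnums[rank].priority <= m->priority) {
    rank++;
  }

  /* The shift ends at index count, which is now known to be in bounds. */
  if (rank < count) {
    memmove(&c->magicnums[rank + 1], &c->magicnums[rank],
            sizeof(magicnum_t) * (count - rank));
  }
  c->magicnums[rank] = *m;
  c->magicnum_count = count + 1;
  return true;
}
```

This check is conservative, as it leaves the last slot of the array unused. Since the shift writes at most up to index magicnum_count, requiring magicnum_count < max_magicnum_count appears to be sufficient as well.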

Triggering the Bug

That sounds nice, but are we actually able to insert magic numbers in such a way that the bug is triggered? It is clear that this will depend on how exactly the function add_magicnum is used.

Looking at the function find_magicnums quickly reveals that PDF magic numbers are inserted before RAR magic numbers. Moreover, I estimate MAXMAGICNUMCOUNT to be roughly 32.

Okay, so let us feed the engine a file that starts with a run of Rar! strings, followed by a few %PDF- strings.

Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!
Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!Rar!%PDF-%PDF-%PDF-%PDF-%PDF-

If the PDF magic numbers are inserted first, the RAR magic numbers should get a low enough insertrank and eventually overflow the buffer.

As desired, we get the following:

STATUS_STACK_BUFFER_OVERRUN encountered
(438.8a8): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=715c38f4 ecx=76d50544 edx=1398db41 esi=00000000 edi=1398ec3c
eip=76d50325 esp=1398dd88 ebp=1398de04 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
kernel32!UnhandledExceptionFilter+0x5f:
76d50325 cc              int     3
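To see why this particular file is enough, the insertions can be replayed in a harmless model that keeps only the offsets and has spare room behind the array. The sketch rests on several assumptions that are not confirmed by the engine: MAXMAGICNUMCOUNT is 32, the file contains no line break, all PDF magic numbers are inserted before all RAR magic numbers, and each group is inserted in ascending offset order. All names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define ASSUMED_MAX 32 /* assumption: estimated MAXMAGICNUMCOUNT */
#define SLACK 16       /* spare room so the model itself never overflows */

typedef struct {
  uint32_t count;
  uint32_t offsets[ASSUMED_MAX + SLACK];
} model_t;

/* Same control flow as the add_magicnum pseudocode, reduced to offsets
   (the priority loop is omitted because all offsets here are distinct).
   Returns how many elements the shift wrote beyond index ASSUMED_MAX - 1. */
uint32_t model_add(model_t *m, uint32_t offset) {
  uint32_t count = m->count;
  uint32_t rank = 0;
  uint32_t past_end = 0;

  while (rank < count && m->offsets[rank] < offset) {
    rank++;
  }
  if (rank < count && rank + 1 < ASSUMED_MAX) {
    memmove(&m->offsets[rank + 1], &m->offsets[rank],
            sizeof(uint32_t) * (count - rank));
    if (count + 1 > ASSUMED_MAX) {
      past_end = count + 1 - ASSUMED_MAX;
    }
  }
  if (rank < ASSUMED_MAX) {
    m->offsets[rank] = offset;
    m->count++;
  }
  return past_end;
}

/* Replays a file of n_rar "Rar!" strings followed by n_pdf "%PDF-" strings.
   Returns the worst overshoot in elements, or UINT32_MAX if the model's
   spare room would not suffice. */
uint32_t replay(uint32_t n_rar, uint32_t n_pdf) {
  model_t m;
  uint32_t worst = 0;

  if (n_rar > ASSUMED_MAX + SLACK || n_pdf > ASSUMED_MAX + SLACK - n_rar) {
    return UINT32_MAX;
  }
  memset(&m, 0, sizeof m);
  for (uint32_t i = 0; i < n_pdf; i++) {
    model_add(&m, 4 * n_rar + 5 * i); /* PDF magic numbers go in first */
  }
  for (uint32_t i = 0; i < n_rar; i++) {
    uint32_t w = model_add(&m, 4 * i);
    if (w > worst) {
      worst = w;
    }
  }
  return worst;
}
```

Under these assumptions, replay(29, 5) returns 2 for the file above: the last two RAR insertions each shift the five PDF entries one slot further, ending two elements (24 bytes for the 12-byte magicnum_t) behind the array. With only 27 RAR strings, the model stays within bounds.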

On Attacker Control and Exploitation

Now, the attacker has numerous possibilities to overwrite the stack with those 12-byte magicnum_t structs. First and most importantly, she has full control over the offset field⁵. Moreover, she can choose between many different values for the type field and the priority field to write on the stack. In fact, the type field is assigned values from 7 to 449. Not every value in that range is used: the total number of magic number types actually in use is approximately 300 (in the meantime, it may be more).

Obviously, this vulnerability can easily be triggered remotely, for example by sending an e-mail with a crafted file as attachment to the victim.

However, to exploit the vulnerability for arbitrary Remote Code Execution, another bug would be required to circumvent the stack canary (or SafeSEH), as Avast Antivirus uses /GS as well as SafeSEH on Windows⁶ and I assume -fstack-protector is used on Linux.

Conclusion

We have seen that highly critical memory corruption bugs can appear even in very simple functions. This is probably as simple as it gets. There is no need for complicated file parsers.

Having said that, you can expect posts about very involved bugs in anti-virus file parsers to appear on this blog.

Do you have any comments, feedback, doubts, or complaints? I’d love to hear them. You can find my e-mail address on the about page.

Alternatively, you are invited to join the discussion on Hacker News or on /r/netsec.

Timeline of Disclosure

2016-09-23 - Discovery
2016-09-24 - Reported
2016-09-29 - Confirmed and patch rolled out
2016-12-16 - Bug bounty paid

Thanks & Acknowledgements

I want to thank Avast Software and especially Igor Glücksmann for their fast response. Fixing a vulnerability and actually rolling out the patch within such a short time frame is remarkable.

¹ https://bugs.chromium.org/p/project-zero/issues/detail?id=823
² https://bugs.chromium.org/p/project-zero/issues/detail?id=814
³ https://bugs.chromium.org/p/project-zero/issues/detail?id=518
⁴ In case you are interested or you do not trust my pseudocode, you might want to have a look at the assembly version in text form or in graph form.
⁵ Well, there probably is a maximum file size that limits the possible choices. But still, the attacker has a lot of control over this value.
⁶ In fact, they even seem to use Control Flow Guard (CFG) on Windows versions that support it.

June 27, 2017