Malware and Memory Forensics

Memory forensics is the process of analyzing physical memory (RAM) in order to find artefacts and evidence of malware on a system. For a complete analysis, this is done alongside disk and network forensics, but that is more a thing in COMP6254: Digital Forensics.

Malware that keeps no files on disk (fileless malware) is not purely memory-only. RAM is cleared when the system power cycles, so memory-only malware could not survive a reboot unless the machine were reinfected.

Typically, fileless malware persists in a registry key or some EEPROM storage within the system, so that it can be run again after a reboot.

Useful Data from Memory Dumps

Memory dumps can reveal lots of useful information to us that the malware hasn't persisted to disk in some way. Take, for example, ransomware that encrypts all data with a symmetric key (dumb, I know, but it's probably some naïve script kiddie). If the malware doesn't clear the memory after using the key, the key might remain in memory once the encryption process has finished. A more sensible design would just use a public key cryptosystem, where the private key is never in memory in the first place.

Memory dumps can also allow us to recover loaded kernel drivers (which may be tainted), malware created artefacts, application information at the session level, currently running processes, and any active network connections, to name a few.

Memory dumps differ from using native system calls, as they capture the raw memory (as it is in the virtual machine), instead of relying on a possibly rootkitted operating system. A comparison table is given below:

Function	Internal System Tooling	Volatility (External)
Running Processes	✓	✓
Terminated Processes		✓
Hidden Processes		✓
Open Handles	✓	✓
Network Connections and Sockets	✓	✓
Kernel Modules		✓
Process Dumps	✓	✓
Strings	✓
Logged On Users	✓	✓
Command Shell History		✓
MFT & MBR Analysis		✓
Cached Files and Registry		✓
Rootkits and API Hooks	Limited	Robust

Memory Analysis

Volatility

In the above table, Volatility was mentioned as a framework for memory analysis that is external to the system. Where possible, it is important to take the memory dump from outside the system (e.g., from a system that is running in a VM), then analyse that memory somewhere the compromised system can't mask the malware.

External tools can recover the entire state of the system, and some historical information, e.g., processes that have finished running, and data that is stored in memory but not written to disk and not easily accessible through an API.

Advanced, stealthy malware will often operate only in memory, as most AV engines focus on the signatures of files. Anti-malware and incident response tools can still be run, but only after the memory dump has been collected, so that the evidence is captured before those tools attempt their own removal.

Duqu 2.0

This was an APT attack against Kaspersky in 2015. There were 3 zero days in Windows that were being exploited. The only way to establish where the malware was and what it was doing was to take forensic memory dumps, then compare the dumps from machines that had been compromised against those from machines that hadn't.

Why is it so Hard?

Memory analysis is harder than disk analysis, partly because memory is harder to take a dump of: its contents are lost when the computer is rebooted or shut down. Memory is also much less structured than filesystems, as it is used by computers rather than humans for the most part.

Memory layout also differs based on the operating system that is running, and even the version or build of the same operating system. Sometimes data is also not available in memory, as on systems with low RAM, less frequently used data is 'swapped out' to disk by the OS to allow it to run more applications without crashing.

Techniques

Memory analysis takes three main forms: unstructured, structured, and file carving.

Unstructured analysis makes use of tools like strings, less and regular expressions to search for things such as passwords, logs, or domains. Structured analysis tries to reconstruct the OS abstractions (including the process list, heap and stack), then interprets the data in the memory dump just as a live tool on the OS would. File carving recovers whole files from the dump by searching for known file signatures, such as headers and footers.
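As a rough illustration of the unstructured approach, the sketch below pulls printable ASCII runs out of a raw dump (similar in spirit to the strings tool) and then greps them for domain-like text. The function names and the deliberately loose domain pattern are assumptions made for this example, not part of any forensic tool.

```python
import re

def extract_strings(data: bytes, min_len: int = 6):
    """Return printable ASCII runs of at least min_len bytes, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical, deliberately loose pattern for domain-like strings.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

def find_domains(data: bytes):
    """Return the sorted set of domain-like strings found in the dump."""
    found = set()
    for s in extract_strings(data):
        found.update(m.group().lower() for m in DOMAIN_RE.finditer(s))
    return sorted(found)

# Tiny stand-in for a memory dump: binary noise around two readable strings.
sample = b"\x00\x01GET http://evil.example.com/payload\x00\xffpassword=hunter2\x00"
print(extract_strings(sample))
print(find_domains(sample))
```

On a real dump we would read the file in chunks rather than holding it all in memory, and a loose pattern like this one will produce plenty of false positives that need manual review.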

OS Internals

The OS makes use of data structures to keep data organised. Data typically has a way to be inserted, a way to be modified, and a way to be deleted. The data structure will often start at one memory address and have related data close to it. For example, an array stores each data element at the next offset. Strings are just special types of arrays.

Other data structures include doubly linked lists, where each entry (e.g., each process) has a forward link to the next entry and a back link to the previous one.

Memory Management

To aid in memory analysis, it is often useful to think in terms of how the memory is physically organised and the layers and abstractions the OS puts on it to make it appear simple to programs, and encapsulate various security features.

Memory management sets up virtual memory pages, which the OS tracks in page tables, with the TLB (translation lookaside buffer) caching recent translations. When a program accesses a page, it is either a hit in the TLB, in which case the offset can be added to the physical page address straight away, or a TLB miss, in which case we have to search the page table. This is something covered in COMP1203, and to an extent COMP2215, and isn't massively relevant here.

Page Files and Swap

Non-paged memory (e.g., the original OS kernel) can't be swapped to disk. Anything that is in paged memory is a candidate and the less frequently accessed pages are normally the ones swapped out. Typically, the registry is in memory, as are the headers of PE files. When files are mapped into processes, the virtual memory is reserved for the file, but the physical pages aren't allocated until there is a read on the file.

Tools to extract swap exist, but these must be run on the same machine.

Memory in x86

We are focusing on the memory within Windows on x86, as that is the most common. Initial versions of x86 had a 32-bit virtual and physical address space with a total of 4GB addressable memory. Later versions of x86, including PAE (physical address extension) refined the standards for the buses on die, so that there was a 32-bit virtual and 36-bit physical address, bringing the total addressable space to 64GB.

With the advent of x86_64, this increased to a 64-bit virtual address, of which 48 bits are actually implemented on die. This means we can now address 256TB of memory, but this is reduced to just 8TB in user mode.
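The sizes quoted above follow directly from the address widths, since n address bits can name 2^n bytes. A quick check of the arithmetic (the helper name is just for illustration):

```python
GIB = 2**30  # bytes in one gibibyte
TIB = 2**40  # bytes in one tebibyte

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits

print(addressable_bytes(32) // GIB)  # 4   (original 32-bit x86, in GB)
print(addressable_bytes(36) // GIB)  # 64  (PAE physical addresses, in GB)
print(addressable_bytes(48) // TIB)  # 256 (x86_64 implemented bits, in TB)
```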

Linux

On Linux, we can simply ask the system for the current memory layout with sudo cat /proc/iomem. We need to be superuser, as otherwise the addresses are just printed as zeroes, for security reasons. We often care about the System RAM portions, as this is where the kernel is running, and any alteration to it could indicate or enable a rootkit.
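A minimal sketch of picking the System RAM ranges out of /proc/iomem output is given below. It assumes the usual "start-end : label" line layout with hexadecimal addresses; the sample text is made up for illustration, and the function name is hypothetical.

```python
import re

# One /proc/iomem line: optional indent, hex start, hex end, then a label.
IOMEM_LINE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")

def system_ram_ranges(iomem_text: str):
    """Return (start, end) pairs, inclusive, for every 'System RAM' entry."""
    ranges = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(3).strip() == "System RAM":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges

# Made-up sample in the assumed format (real output needs superuser to show addresses).
SAMPLE = """\
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
000a0000-000bffff : PCI Bus 0000:00
00100000-7ffdffff : System RAM
  01000000-01e00000 : Kernel code
"""

for start, end in system_ram_ranges(SAMPLE):
    print(hex(start), hex(end), end - start + 1, "bytes")
```

On a live system the text would come from reading /proc/iomem as root, e.g. open("/proc/iomem").read().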

Address Translation

Memory forensics tools must implement the same address translation that the CPU does, as the memory dump is of the physical memory layout.
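To make this concrete, here is a simplified sketch of the translation for classic 32-bit x86 paging without PAE: a virtual address is split into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset. The function names are made up for this example, and the sketch ignores PAE, 64-bit paging, and pages that have been swapped out, all of which a real tool has to handle.

```python
import struct

PRESENT = 0x1      # bit 0 of a PDE/PTE: the entry is valid
LARGE_PAGE = 0x80  # bit 7 of a PDE: maps a 4 MiB page directly

def read_u32(phys: bytes, addr: int) -> int:
    """Read a little-endian 32-bit value from the physical memory image."""
    return struct.unpack_from("<I", phys, addr)[0]

def translate(phys: bytes, cr3: int, vaddr: int):
    """Translate a virtual address to a physical one, or None if not present."""
    pd_index = (vaddr >> 22) & 0x3FF
    pt_index = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF

    pde = read_u32(phys, (cr3 & 0xFFFFF000) + pd_index * 4)
    if not pde & PRESENT:
        return None
    if pde & LARGE_PAGE:
        # 4 MiB page: top 10 bits from the PDE, low 22 bits from the address.
        return (pde & 0xFFC00000) | (vaddr & 0x3FFFFF)

    pte = read_u32(phys, (pde & 0xFFFFF000) + pt_index * 4)
    if not pte & PRESENT:
        return None
    return (pte & 0xFFFFF000) | offset

# Build a toy 16 KiB "dump": page directory at 0x1000, page table at 0x2000,
# and a data page at 0x3000, mapped at virtual address 0x00401000.
mem = bytearray(0x4000)
struct.pack_into("<I", mem, 0x1000 + 1 * 4, 0x2000 | PRESENT)  # PDE index 1
struct.pack_into("<I", mem, 0x2000 + 1 * 4, 0x3000 | PRESENT)  # PTE index 1

print(hex(translate(bytes(mem), 0x1000, 0x00401234)))  # 0x3234
print(translate(bytes(mem), 0x1000, 0x00801234))       # None (PDE not present)
```

The cr3 value here stands in for the page directory base that a real tool recovers per process from the dump.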

Memory Acquisition

When doing memory analysis, the first and most important step is to take a dump of the RAM into something persistent. When doing this, we need to think about why we need to collect the RAM, the initial indicators to look for, actions that have already been taken by previous analysts, and details of any suspect systems.

We take a dump when the machine is quiet and not doing anything that might change the memory significantly, such as an antivirus scan or a system backup. Normal user activity may also affect the memory.

On Virtual Machines

When using a VM, we can pause the VM, take a snapshot of the memory, then continue running. Within qemu, we can move to shell mode to perform this acquisition.

This of course works well if we have the right access to the VM, but might be slow and bring the machine offline.

Not Running

If the machine is not running when we want to take a snapshot, that may be because it is hibernating or suspended. If this is the case, we can collect dump files, hibernation files, or the pagefile.

Locked/No Admin Access

Sometimes the machine is locked down so that we can't get admin access on it. In this instance, we have to resort to hardware-based approaches. Some of these approaches include Firewire, which is no longer in use, and more recently, PCIe/USB/Thunderbolt based approaches.

These options can be quite costly, because the bus speeds are so high: recent versions of PCIe (generation 5) operate at roughly 32Gbps for a x1 slot, and 256Gbps for a x8 slot. The PCIe-based devices are connected to the computer and can then use DMA to read specific parts of the memory out from the computer. The most popular open source implementation is PCILeech.

The general issue with these approaches is that we need to have hardware installed in the target machine. Firewire's direct memory access is now also disabled on recent OS versions as a security measure.

We may also need to mess with the virtualization settings of the UEFI firmware.

Issues

If we are dumping from a live system, then the state of some of the registers might change mid-dump. In particular, the caches, TLB and registers which are important for reconstructing the overall state frequently change.

Furthermore, some ranges in PCIe can cause the kernel to panic. As mentioned earlier, we may also have security measures in place, preventing us from accessing the memory directly.

Dumps of the memory will also take longer than normal perceived memory speeds would lead us to believe: because we read all of memory in a dump, the caches quickly become useless.

We also have to be careful to ensure that the malware doesn't spread to our capture device.

Programs

There are a couple of programs that can enable us to do live memory dumps from the system that we are on, typically loaded into the kernel. For Linux, there is LiME (the Linux Memory Extractor), and for Windows there is Winpmem.

Volatility

Volatility is an open source forensics framework for the analysis of memory dumps from Windows, Linux and MacOS. The framework is used by forensics investigators and other researchers to analyse memory samples.

It also forms the basis for virtual machine introspection (live realtime debugging of a VM), in libraries such as libvmi.

Volatility incorporates many cutting edge malware research techniques and offers prizes for the best new plugins each year. It is written in Python and can be used as a library or through the CLI frontend.

It works on x86-based and x64-based machines across Windows, Linux, MacOS and Android, and is licensed under the GNU GPLv2 licence. The framework makes use of the distorm3, pycrypto, and yara libraries, and has support for 10+ different types of memory and process dumps, e.g., raw, crash, hiber, VMWare, VirtualBox, QEMU, hpak, lime, ewf, firewire, hyper-v, xen and KVM dumps.

It is very fast and can be used for offline forensics research, but is not a debugger.

Profiles

Volatility has what it calls profiles, which are essentially the memory layouts of each system. We specify a profile when invoking Volatility so that it can analyse the memory properly.

Fortunately, there are plugins that scan a memory dump from an unknown system and suggest the profile that Volatility thinks matches it. One such plugin is kdbgscan.

Plugins

Volatility has an extensible and extensive plugins interface, and we can either use pre-made plugins or create our own. The pslist plugin, for example, will list processes on the system, with their PIDs, run status, and executable.
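As a hedged sketch of how the profile and plugin fit together on the command line, the helper below builds a Volatility 2 style invocation (vol.py with the -f and --profile options). The helper name, the image file name and the chosen profile are placeholders for illustration; the snippet only prints the commands and does not run Volatility.

```python
import shlex
from typing import List, Optional

def volatility2_argv(image: str, plugin: str,
                     profile: Optional[str] = None) -> List[str]:
    """Build the argument list for a Volatility 2 style invocation."""
    argv = ["vol.py", "-f", image]
    if profile is not None:
        argv.append("--profile=" + profile)
    argv.append(plugin)
    return argv

# First guess the profile, then list processes using the chosen profile.
# "memory.raw" and "Win7SP1x86" are placeholder values for this example.
print(shlex.join(volatility2_argv("memory.raw", "kdbgscan")))
# vol.py -f memory.raw kdbgscan
print(shlex.join(volatility2_argv("memory.raw", "pslist", profile="Win7SP1x86")))
# vol.py -f memory.raw --profile=Win7SP1x86 pslist
```

To actually run a command, the list could be passed to subprocess.run on a machine where Volatility is installed.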

The Windows Object Manager

Structures become objects when Windows appends headers to the start of the structure to manage services, such as naming, access control and reference counts.

Examples of objects include files, drivers, mutexes, processes, threads and events. On Windows 7, there are approximately 45 different types of object.

The manager is also responsible for the allocation and deallocation of the objects, as well as access to them. Example object types are given below:

Object Type	Underlying Structure	Used For
Process	_EPROCESS	Processes
Thread	_ETHREAD	Threads
File	_FILE_OBJECT	Files
Key	_CM_KEY_BODY	Registry keys
Mutant	_KMUTANT	Mutexes
Driver	_DRIVER_OBJECT	Driver objects
Token	_TOKEN	Security and access controls
SymbolicLink	_OBJECT_SYMBOLIC_LINK	Aliases

All object headers have the same main _OBJECT_HEADER structure, which includes the reference counts to the object, a security descriptor, the object type, and flags. This consumes 0x18 = 24 bytes, with further optional headers.

Different object types have different defaults within each header, e.g., the Process object might lay out memory differently to the File object (e.g., paged vs. non-paged). All objects also carry a 4-byte identifier unique to the object type (the pool tag), which can be useful for telling us what the object is, and where to look for it (e.g., in paged memory, or non-paged memory).

Kernel Pool Allocation

When users interact with the Windows API, they create kernel objects for their actions. Each of these objects is allocated from the system memory pool, as opposed to the user memory. For example, the ObCreateObject() call calls ExAllocatePoolWithTag(), which then creates an allocation and prepends the _POOL_HEADER to the start of the memory allocation.

As all objects are allocated to pools, we have memory pool allocations, with a base at _POOL_HEADER. Somehow, we need to fingerprint this to get a more reliable way of finding all objects.

For objects in a pool, then, we have the pool header, followed by some optional headers, then the main object header, and finally the object body.

Scanning for Objects in Pools

Now we know the pools, we can find the objects that exist within that pool, by looking at the PoolType of the object, then the pool tag, which is derived from the Key in the object. Sometimes these can be protected.

A pool allocation is at least as large as the object structure for its type. The total size of the allocation is given by the block size multiplied by the pool alignment.

When volatility scans the memory dump, it looks for the tags for the object type we want in the type of memory we want to look through. There are different plugins for processes, threads, desktops, etc., which we can use to look for different types of object.
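The core of this scanning idea can be sketched in a few lines: search the dump for the 4-byte tag and keep only hits where the presumed pool header would be properly aligned. The tag offset of 4 bytes and the 8-byte alignment are assumptions that match a 32-bit _POOL_HEADER, and the function name is hypothetical. Real scanners apply further checks (pool type, block size, sanity checks on the object itself) before accepting a hit.

```python
def scan_pool_tags(dump: bytes, tag: bytes,
                   alignment: int = 8, tag_offset: int = 4):
    """Return offsets of presumed pool headers whose tag field matches `tag`.

    Assumes the tag sits `tag_offset` bytes into the header and that headers
    are `alignment`-byte aligned (32-bit layout; an assumption for this sketch).
    """
    hits = []
    pos = dump.find(tag)
    while pos != -1:
        header = pos - tag_offset
        if header >= 0 and header % alignment == 0:
            hits.append(header)
        pos = dump.find(tag, pos + 1)
    return hits

# Toy dump: one plausible allocation (header at 16) and one stray, misaligned
# occurrence of the same bytes that should be rejected.
dump = bytearray(64)
dump[20:24] = b"Proc"   # tag at 20, so the header would start at 16 (aligned)
dump[41:45] = b"Proc"   # tag at 41, header at 37 (misaligned, ignored)

print(scan_pool_tags(bytes(dump), b"Proc"))  # [16]
```

A protected tag has its top bit set in the last byte, so a scan for process objects would typically also try b"Pro\xe3".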

Pool Tagging

Windows tags pools for debugging purposes. These pool tags can be used to see the memory that a particular tag is taking up, which can then be tied to a driver, or process. We use this tagging to help find memory leaks.

Pool tagging is a non-essential system function and can be modified without making the system crash. Pools can be reused for similar or smaller-sized allocations, to save reallocating and causing potentially more external fragmentation of the memory.

Processes

Processes are containers for the resources used when we execute a program. For simplicity and security, each process is assigned its own private address space. The only regions that are not private are shared memory (e.g., for inter-process communication), and most of the kernel memory.

The process object also stores the executable instructions, open handles (e.g., files, registry keys), security contexts, and threads which it is using.

Critical System Processes

In the process list, the following processes should exist:

lsass.exe is the local security authority subsystem, responsible for password changes, access tokens, etc.
smss.exe is the session manager subsystem
services.exe is the service control manager
csrss.exe is the client/server runtime subsystem, handling the creation of processes, threads, console windows, etc.
winlogon.exe is the interactive logon manager
explorer.exe is the file explorer
svchost.exe is the shared service host process.

Process Relationships

We should see that these processes are started mainly by services.exe or svchost.exe. If another program, e.g., a web browser, is the parent of a process on the critical list, then that is an immediate red flag.
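This kind of check is easy to automate once we have a process list with parent PIDs. The sketch below flags processes whose parent is not in an allow-list. The allow-list here contains a single illustrative entry (svchost.exe started by services.exe) and is an assumption for the example; a real one would need to be built per Windows version.

```python
# Hypothetical allow-list: process name -> acceptable parent names.
EXPECTED_PARENTS = {
    "svchost.exe": {"services.exe"},
}

def suspicious_parents(processes, expected=EXPECTED_PARENTS):
    """Return (pid, name, parent_name) for processes with an unexpected parent.

    `processes` is a list of dicts with 'pid', 'ppid' and 'name' keys, e.g.
    as transcribed from a process listing.
    """
    by_pid = {p["pid"]: p for p in processes}
    findings = []
    for p in processes:
        allowed = expected.get(p["name"].lower())
        if allowed is None:
            continue  # no rule for this process name
        parent = by_pid.get(p["ppid"])
        parent_name = parent["name"].lower() if parent else None
        if parent_name not in allowed:
            findings.append((p["pid"], p["name"], parent_name))
    return findings

procs = [
    {"pid": 500, "ppid": 400, "name": "services.exe"},
    {"pid": 600, "ppid": 500, "name": "svchost.exe"},   # normal
    {"pid": 900, "ppid": 800, "name": "firefox.exe"},
    {"pid": 700, "ppid": 900, "name": "svchost.exe"},   # started by a browser
]
print(suspicious_parents(procs))  # [(700, 'svchost.exe', 'firefox.exe')]
```

Note that a parent PID can refer to a process that has already exited, in which case the parent name comes back as None and the entry is flagged for manual review.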

Active Process Lists

KDBG points to the head of a doubly-linked list of processes. All APIs reference this list to get information about the processes on a system. If we can somehow arrange that no next or previous pointer references a process, then we can hide the process from the list, but still have it running.

The list PsActiveProcessHead may contain many terminated processes, in addition to the ones that are actually being used. The list doesn't remove a terminated process until the reference count reaches zero. If the parent process forgets to close the handle, then the subprocess will still have a reference and will thus never be removed. Fortunately, we can look at the exit time of the processes, the number of threads they have open and the number of handles they have open to see if they are still active.
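A small sketch of that triage is shown below. The record type and the exact rule (an exit time is set, or no threads and no handles remain) are assumptions chosen for illustration rather than the logic of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessEntry:
    pid: int
    name: str
    exit_time: Optional[str]  # None while the process is still running
    threads: int
    handles: int

def looks_terminated(p: ProcessEntry) -> bool:
    """Heuristic: an exit time, or no threads and no handles left."""
    return p.exit_time is not None or (p.threads == 0 and p.handles == 0)

entries = [
    ProcessEntry(4, "System", None, 90, 500),
    ProcessEntry(1200, "cmd.exe", "2015-06-10 12:00:03", 0, 0),
    ProcessEntry(1300, "notepad.exe", None, 2, 60),
]
active = [p.pid for p in entries if not looks_terminated(p)]
print(active)  # [4, 1300]
```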

DKOM

DKOM (direct kernel object manipulation) is a technique for hiding a process within the system. We can either map in the physical memory or access the memory through the kernel with debugging functions. We then change the Flink (forward link) and Blink (backward link) pointers of the neighbouring entries in the process list, so that they skip over our process.

Kernel Mode

When we want to use kernel mode on Windows, we have to load a driver to access the kernel mode structures. This driver has to be signed by both the CA for the manufacturer, and the Microsoft CA.

If DKOM hides the process from the list, we still need to find it. One way to do this is to look at the headers of the various memory pools to find where process objects are allocated. From there, we can simply scan the memory of that pool. As much as malware tries to hide itself on the system, e.g., by removing references in the process head, provided we have a memory dump, we can find it somewhere.
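The toy model below shows both halves of this: unlinking an entry from a circular doubly linked list hides it from anyone walking the list, while a scan over all allocated objects (standing in for a pool scan of the dump) still finds it. The class and function names are invented for the example and only mimic the Flink/Blink idea; nothing here touches real kernel structures.

```python
class Proc:
    """Toy process entry with forward (flink) and backward (blink) links."""
    def __init__(self, pid, name):
        self.pid, self.name = pid, name
        self.flink = self.blink = self  # a lone node points at itself

def insert_tail(head, node):
    """Append node just before the list head (circular list)."""
    last = head.blink
    last.flink, node.blink = node, last
    node.flink, head.blink = head, node

def unlink(node):
    """DKOM-style hiding: make the neighbours point past the node."""
    node.blink.flink = node.flink
    node.flink.blink = node.blink

def walk(head):
    """What a list-walking tool sees: PIDs reachable via forward links."""
    pids, cur = [], head.flink
    while cur is not head:
        pids.append(cur.pid)
        cur = cur.flink
    return pids

head = Proc(None, "list head")
# `pool` stands in for every process object found by scanning memory.
pool = [Proc(4, "System"), Proc(500, "services.exe"), Proc(1337, "evil.exe")]
for p in pool:
    insert_tail(head, p)

unlink(pool[2])                      # hide evil.exe from the list
listed = walk(head)
hidden = [p.pid for p in pool if p.pid not in listed]
print(listed)  # [4, 500]
print(hidden)  # [1337]
```

Comparing the two views in this way is the same cross-checking idea as running a list-walking plugin and a pool-scanning plugin over the same dump and looking for differences.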

Csrss.exe

This is the client/server runtime subsystem, and is involved in the creation of all processes and threads. It keeps an open handle to each of these objects for the duration of their lifetime.