WinAPI Hooking

Malware technique

Published on 09/09/2024

TL;DR

If you're only here for the source code:
- Hooking implementation programs here.
- Inline hook detection programs here.

Introduction

This article studies how to hook Windows API functions through injection. We'll explore what hooking is, how it works and how it's implemented. The technique can be used for both attack and defense.
We'll look at its use both for malicious purposes, to take control when a function is called, and for analysis purposes, to monitor program execution at specific points, such as the beginning and end of a function (similar to the API monitor program).
This approach offers several advantages, not least the ability to analyze a program outside a debugger, thus avoiding anti-debug techniques.

The study was carried out on a Windows 11 environment, version 23H2.

API Hooking

API hooking is a technique that consists of taking control of a function by modifying its first instructions to redirect execution to code of our choosing. This method serves several purposes: it allows you to intercept the arguments passed to the function in order to analyze what is going to be executed and how it is structured. This approach is also used by some EDRs.

However, this technique is also used to hide code in places where analysts don't usually think to look, as they often rely on calls such as call VirtualAllocEx because they trust the WinAPI.

The following diagram summarizes the WinAPI:

Here, we take as an example the VirtualAlloc function, exported by Kernel32.dll. It ends up calling the NtAllocateVirtualMemory function in Ntdll.dll, a short stub which, reduced to its essentials, loads the system call number into eax and executes the syscall instruction:

mov eax, 0x0018
syscall

From the syscall instruction, execution switches to kernel mode, where the value 0x0018 is used as an index into the SSDT to retrieve a pointer to the NtAllocateVirtualMemory implementation in NtOsKrnl.exe, which is simply the system kernel.
In a hooking context, we would have the following situation:

Immediately after the VirtualAlloc function call, a jmp instruction placed at the start of the function redirects execution to the HookedVirtualAlloc function containing our code. At the end of this code, a pointer to the real VirtualAlloc function is used to complete the initial request. This is the trampoline technique.
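As a loose analogy only, the same control flow can be sketched in Python, where a name is rebound instead of machine code being patched. The `allocate` function, the `api` dictionary and `install_hook` are hypothetical names chosen for this illustration; the sketch shows a hook that runs code before and after the original function while still completing the caller's request.

```python
import functools

call_log = []  # what the hook observed (illustrative only)

def install_hook(namespace, name, before, after):
    """Replace namespace[name] with a wrapper and return the original callable."""
    original = namespace[name]

    @functools.wraps(original)
    def hooked(*args, **kwargs):
        before(args, kwargs)                # runs before the real function
        result = original(*args, **kwargs)  # completes the initial request
        after(result)                       # runs once the real function has returned
        return result

    namespace[name] = hooked
    return original

def allocate(size):
    """Hypothetical stand-in for the hooked API."""
    return bytearray(size)

api = {"allocate": allocate}
original = install_hook(
    api, "allocate",
    before=lambda args, kwargs: call_log.append(("enter", args)),
    after=lambda result: call_log.append(("leave", len(result))),
)

buf = api["allocate"](16)  # the caller is unaware of the hook
```

The caller still receives its 16-byte buffer, while `call_log` holds one entry for the call and one for the return. An inline hook achieves the same effect at the machine-code level, which is what the rest of the article implements.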

Process isolation

We can't simply modify a WinAPI function once and apply it to all the processes in the system. The operation must be repeated for each process we wish to hook. This constraint is due to process isolation, which ensures that each process has its own memory space and prevents data sharing. However, it is still possible for one process to share part of its memory with another.

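This can be confusing, as a system library such as kernel32.dll is mapped at the same virtual address in every process of the same architecture (x86 or x64). However, each process has its own view of the library: its pages are mapped copy-on-write, so a process that patches a function receives a private copy of the modified page and the other processes are unaffected.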

For example, in the screenshot below we can see that there are several instances of kernel32, yet they all point to the same address:

Although each process uses the same virtual address, a page that has been modified in one process is backed by a different physical page from the one seen by the other processes.

To illustrate this principle, here's a diagram showing two processes using the same function from the same library, one of which has been hooked:

My solution

We have the following constraints:

- We need to inject into several processes, regardless of their architecture (x86 or x64).
- We want to be informed immediately when a new process is created, without using periodic scans.
- The hooking must be totally transparent to the target process and must not interfere with its behavior or actions.
- The hooking process should have two phases:
* The hooking of the target function.
* The hooking of the return address, to control execution both before and after the function call.

To respond to these constraints, we decided to develop a driver that will be notified of the creation of new processes using the PsSetCreateProcessNotifyRoutine callback. This driver will pass information about the new process to two separate injector programs: one designed for 32-bit processes and the other for 64-bit processes.

The purpose of these two programs is to inject their respective DLLs into the newly created process. The injected DLLs will contain the code needed to hook the function, execute the function requested by the original program, and hook the return address.

Here is a diagram to get a better understanding of the solution:

As well as a sequence diagram to help visualize the stages over time:

Initially, the driver will be installed by creating a kernel service. Once the initialization phase is complete, the driver will listen for the creation of new processes via the PsSetCreateProcessNotifyRoutine callback, and will also receive requests from our injectors located in user space.

When a request is initiated by the injectors, the driver returns a structure containing the PID and process name to the 64-bit injector, and only the PID to the 32-bit injector. Once the injectors have received the PID of the process to be processed, they begin the injection phase on the targeted process.

The following programs have been developed:

- Poucave.sys: our driver, which communicates with Injector32.exe and Injector64.exe to send them information.
- Injector32.exe / Injector64.exe: our injectors.
- Hook32.dll / Hook64.dll: our DLLs for hooking.

KernelSpace - Poucave.sys

In this section, we'll describe each section of the driver code in detail. As a reminder, the purpose of this program is to rapidly transmit, in user space, information concerning the creation of a new process to the Injector32.exe program for 32-bit processes and to the Injector64.exe program for 64-bit processes.
The driver declares two IoControlCode keys in order to communicate with the injectors:

#define IOCTL_GET_DATA_32 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_GET_DATA_64 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
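To make these two values concrete, here is a small sketch of what the CTL_CODE macro computes. The bit layout and the numeric constants are those of the Windows headers; the Python function name `ctl_code` is only an illustration.

```python
# Constants as defined in the Windows headers
FILE_DEVICE_UNKNOWN = 0x22
METHOD_BUFFERED = 0
FILE_ANY_ACCESS = 0

def ctl_code(device_type, function, method, access):
    """Pack the four fields the same way the CTL_CODE macro does."""
    return (device_type << 16) | (access << 14) | (function << 2) | method

IOCTL_GET_DATA_32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
IOCTL_GET_DATA_64 = ctl_code(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

print(hex(IOCTL_GET_DATA_32))  # 0x222000
print(hex(IOCTL_GET_DATA_64))  # 0x222004
```

The two codes differ only in the function field, which is how the driver later tells the two injectors apart.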

We also declare:

- The MyProcessInfo structure enables us to store information about the process created, such as its PID and name.
- Variables containing the name of the object and the name of the symbolic link.
- The processInfoLock variable of type KSPIN_LOCK will be used as an argument in the KeAcquireSpinLock and KeReleaseSpinLock functions to lock and unlock resources, as well as in the KeInitializeSpinLock function to initialize the SpinLock.
- The IsProcess32Bit function, which gives us the process architecture based on the value of Wow64Process in the EPROCESS structure:

struct MyProcessInfo {
	HANDLE PID;
	WCHAR processName[256];
};
struct MyProcessInfo processInfo;

UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\AgentDriver");
UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\AgentDriverLnk");
KSPIN_LOCK processInfoLock;

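BOOLEAN IsProcess32Bit(HANDLE PID, NTSTATUS *status) {
	PEPROCESS Process = NULL;
	*status = PsLookupProcessByProcessId(PID, &Process);
	if (!NT_SUCCESS(*status)) {
		// The process may already have exited; Process is not valid here
		return FALSE;
	}

	// Offset of Wow64Process in EPROCESS: undocumented and specific to the Windows build studied
	PVOID Wow64Process = *(PVOID*)((PUCHAR)Process + 0x580);
	ObDereferenceObject(Process);

	// A non-NULL Wow64Process means a 32-bit process running on 64-bit Windows
	return Wow64Process != NULL;
}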

Objects and symbolic links

In this program, we're going to create an object that will be associated with our driver, but this object will only be accessible from kernel space. Consequently, to enable our user-space injectors to manipulate this object, we need to create a symbolic link. This symbolic link will act as an access point enabling injectors to communicate with the driver from user space.

IoCreateDevice(DriverObject, 0, &devName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
IoCreateSymbolicLink(&symLink, &devName);

Callback

The kernel also has an API containing a list of exported functions. You will find a list of callbacks here.
Among these callbacks, we are going to use PsSetCreateProcessNotifyRoutine:

NTSTATUS PsSetCreateProcessNotifyRoutine(
  [in] PCREATE_PROCESS_NOTIFY_ROUTINE NotifyRoutine,
  [in] BOOLEAN                        Remove
);

// The structure of expected function for NotifyRoutine argument
PCREATE_PROCESS_NOTIFY_ROUTINE PcreateProcessNotifyRoutine;
void PcreateProcessNotifyRoutine(
  [in] HANDLE ParentId,
  [in] HANDLE ProcessId,
  [in] BOOLEAN Create
)
{...}

This function allows us to be notified when a process is created (and, with Create set to FALSE, when it exits).
This is how it is initialized in our program:

PsSetCreateProcessNotifyRoutine(sCreateProcessNotifyRoutine, FALSE);

This is the implementation of the function sCreateProcessNotifyRoutine:

void sCreateProcessNotifyRoutine(HANDLE ppid, HANDLE pid, BOOLEAN create) {
	UNREFERENCED_PARAMETER(ppid);
	if (!create) {
		return; // Process exit notifications are ignored
	}

	KIRQL oldIrql;
	PEPROCESS process = NULL;
	UNICODE_STRING* processImageName = NULL;
	struct MyProcessInfo newInfo;

	RtlZeroMemory(&newInfo, sizeof(newInfo));
	newInfo.PID = pid;

	// These lookups must not run at DISPATCH_LEVEL, so they are done before taking the spin lock
	if (NT_SUCCESS(PsLookupProcessByProcessId(pid, &process))) {
		if (NT_SUCCESS(SeLocateProcessImageName(process, &processImageName)) && processImageName != NULL) {
			if (processImageName->Buffer != NULL) {
				size_t length = min(processImageName->Length / sizeof(WCHAR), 255);
				wcsncpy_s(newInfo.processName, sizeof(newInfo.processName) / sizeof(WCHAR), processImageName->Buffer, length);
				newInfo.processName[length] = L'\0';
			}
			ExFreePool(processImageName); // Allocated by SeLocateProcessImageName
		}
		ObDereferenceObject(process);
	}

	// Publish the new entry under the lock
	KeAcquireSpinLock(&processInfoLock, &oldIrql);
	processInfo = newInfo;
	KeReleaseSpinLock(&processInfoLock, oldIrql);
}

We use PsLookupProcessByProcessId to obtain a pointer to the process's EPROCESS structure from its PID, and SeLocateProcessImageName to obtain the process image name. The spin lock prevents a race condition between sCreateProcessNotifyRoutine and DriverAgentSendData, which we will see shortly, as both functions access the data in the MyProcessInfo structure. We want to ensure that this data does not change while a response to the injectors is being built.

IRP requests

IRP requests are data structures used to manage input and output operations between the system and hardware peripherals, and between drivers and user-space programs. In this case, we will use IRP requests to transfer information about newly created processes to the injectors.
Our IRP requests are initialized as follows:

DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DriverAgentSendData;
DriverObject->MajorFunction[IRP_MJ_CREATE] = DriverAgentCreateClose;
DriverObject->MajorFunction[IRP_MJ_CLOSE] = DriverAgentCreateClose;

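The IRP_MJ_CREATE and IRP_MJ_CLOSE types both point to the DriverAgentCreateClose function, which completes the requests generated when an injector opens or closes its handle to the device. Without a handler for IRP_MJ_CREATE, the default dispatch routine rejects the request and the CreateFile call on the injector side fails.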
An overview of this function:

NTSTATUS DriverAgentCreateClose(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
	UNREFERENCED_PARAMETER(DeviceObject);

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return STATUS_SUCCESS;
}

The IRP_MJ_DEVICE_CONTROL type points to the DriverAgentSendData function, which is the heart of the program. It is used to distinguish between requests from Injector32.exe and Injector64.exe, as well as the architecture of newly created processes, in order to send the information to the right injector.
An overview of this function:

NTSTATUS DriverAgentSendData(_In_ PDEVICE_OBJECT DeviceObject, _Inout_ PIRP Irp) {
	UNREFERENCED_PARAMETER(DeviceObject);

	PIO_STACK_LOCATION pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);
	ULONG ioControlCode = pIoStackIrp->Parameters.DeviceIoControl.IoControlCode;
	ULONG outputLength = pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength;
	NTSTATUS status = STATUS_SUCCESS;
	KIRQL oldIrql;
	struct MyProcessInfo localProcessInfo;

	// With METHOD_BUFFERED, SystemBuffer is a kernel buffer; the I/O manager copies
	// IoStatus.Information bytes from it to the caller's output buffer on completion
	PVOID buffer = Irp->AssociatedIrp.SystemBuffer;

	Irp->IoStatus.Information = 0;
	if (buffer == NULL) {
		status = STATUS_INSUFFICIENT_RESOURCES;
		Irp->IoStatus.Status = status;
		IoCompleteRequest(Irp, IO_NO_INCREMENT);
		return status;
	}

	// Take a snapshot of the shared structure under the lock
	KeAcquireSpinLock(&processInfoLock, &oldIrql);
	localProcessInfo = processInfo;
	KeReleaseSpinLock(&processInfoLock, oldIrql);

	if (localProcessInfo.PID != 0) {
		BOOLEAN is32Bit = IsProcess32Bit(localProcessInfo.PID, &status);
		if (!NT_SUCCESS(status)) {
			Irp->IoStatus.Status = status;
			IoCompleteRequest(Irp, IO_NO_INCREMENT);
			return status;
		}

		if ((is32Bit && ioControlCode == IOCTL_GET_DATA_32) ||
			(!is32Bit && ioControlCode == IOCTL_GET_DATA_64)) {
			// Never copy more than the caller's buffer can hold: a 32-bit injector
			// declares a smaller structure because HANDLE is 4 bytes there
			ULONG copyLength = min(outputLength, (ULONG)sizeof(struct MyProcessInfo));
			RtlCopyMemory(buffer, &localProcessInfo, copyLength);
			Irp->IoStatus.Information = copyLength;

			// Mark the entry as consumed, unless a newer process has replaced it
			KeAcquireSpinLock(&processInfoLock, &oldIrql);
			if (processInfo.PID == localProcessInfo.PID) {
				RtlZeroMemory(&processInfo, sizeof(processInfo));
			}
			KeReleaseSpinLock(&processInfoLock, oldIrql);
		}
	}

	Irp->IoStatus.Status = status;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return status;
}

In this code, we only answer a request when a new process is pending. To tell a new process from one that has already been handled, we rely on the processInfo structure: once its contents have been delivered to an injector, the structure is cleared with RtlZeroMemory, which resets the PID to 0. This explains the PID != 0 condition, which checks that the structure does contain information about a new process.

We also determine the origin of the request from the value of pIoStackIrp->Parameters.DeviceIoControl.IoControlCode, which tells us which injector initiated it. Finally, the data is copied into the buffer variable, the system buffer that the I/O manager copies back to the injector's output buffer in user space.
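The hand-off logic can be modelled in a few lines of Python. This is a simplified sketch, not the driver's code: the `ProcessSlot` class is hypothetical, a `threading.Lock` stands in for the spin lock, and the architecture is recorded when the entry is published rather than looked up at request time.

```python
import threading

class ProcessSlot:
    """Single-slot hand-off modelled on processInfo (a simplification)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pid = 0        # 0 means "nothing pending", as in the driver
        self._is32 = False

    def publish(self, pid, is32):
        # Role of the creation callback: overwrite the slot with the newest process
        with self._lock:
            self._pid, self._is32 = pid, is32

    def take(self, want32):
        # Role of the IOCTL handler: deliver only to the matching injector, then clear
        with self._lock:
            if self._pid == 0 or self._is32 != want32:
                return None
            pid, self._pid = self._pid, 0
            return pid

slot = ProcessSlot()
slot.publish(1234, is32=False)
first = slot.take(want32=True)    # 32-bit injector asks: not its architecture
second = slot.take(want32=False)  # 64-bit injector asks: receives the PID
third = slot.take(want32=False)   # slot already consumed

slot.publish(100, is32=True)
slot.publish(200, is32=True)      # created before 100 was consumed
fourth = slot.take(want32=True)
```

In this model, `first` and `third` are `None`, `second` is 1234 and `fourth` is 200: with a single slot, a process created before the previous one has been collected replaces it. A queue of pending entries would be one way to avoid losing notifications under load.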

UserSpace - Injector.exe

This section consists of two parts: the 32-bit version of the program and its 64-bit version. As a reminder, the purpose of these two programs is to inject their respective DLLs: Hook32.dll for the 32-bit version and Hook64.dll for the 64-bit version. Once the DLL has been injected, it needs to be loaded into the target process to execute the code it contains.

The first step is to declare the structure containing the process information and the IoControlCode key.

For Injector32.exe:

#define IOCTL_GET_DATA_32 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

struct MyProcessInfo {
    HANDLE PID;
    WCHAR processName[256];
};

const char* MyDLL32 = "Hook32.dll";

For Injector64.exe:

#define IOCTL_GET_DATA_64 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

struct MyProcessInfo {
    HANDLE PID;
    WCHAR processName[256];
};

const char* MyDLL64 = "Hook64.dll";

Communication with the kernel

A handle is opened on the symbolic link associated with the Poucave.sys driver:

HANDLE hDevice = CreateFile(L"\\\\.\\AgentDriverLnk", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
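The path passed to CreateFile is the user-mode spelling of the symbolic link created by the driver: the Win32 prefix `\\.\` maps to the `\??\` directory of the object namespace. The small helper below only illustrates this correspondence; its name is hypothetical.

```python
def win32_path_from_symlink(symlink_name):
    """Translate a '\\??\\Name' symbolic link into the '\\\\.\\Name' form used by CreateFile."""
    prefix = "\\??\\"
    if not symlink_name.startswith(prefix):
        raise ValueError("expected a name in the \\??\\ directory")
    return "\\\\.\\" + symlink_name[len(prefix):]

kernel_name = "\\??\\AgentDriverLnk"          # value of symLink in the driver
user_path = win32_path_from_symlink(kernel_name)
print(user_path)                              # \\.\AgentDriverLnk
```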

Our requests are made using the DeviceIoControl function:

DeviceIoControl(hDevice, IOCTL_GET_DATA_32, NULL, 0, &processInfo, sizeof(processInfo), &bytesReturned, NULL)
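To visualise what the output buffer contains, here is a sketch that decodes a MyProcessInfo structure from raw bytes. It assumes the 64-bit layout (an 8-byte HANDLE followed by 256 UTF-16 characters, 520 bytes in total); in a 32-bit build HANDLE is 4 bytes, so the same declaration is 516 bytes long. The sample buffer and the path it contains are synthetic.

```python
import struct

PID_FORMAT = "<Q"                                  # 8-byte HANDLE (64-bit layout, assumption)
PID_SIZE = struct.calcsize(PID_FORMAT)
NAME_CHARS = 256
STRUCT_SIZE = PID_SIZE + NAME_CHARS * 2            # 520 bytes

def parse_process_info(buffer):
    """Return (pid, process_name) decoded from a MyProcessInfo-shaped buffer."""
    if len(buffer) < STRUCT_SIZE:
        raise ValueError(f"expected at least {STRUCT_SIZE} bytes, got {len(buffer)}")
    (pid,) = struct.unpack_from(PID_FORMAT, buffer, 0)
    raw_name = buffer[PID_SIZE:STRUCT_SIZE]
    # The name is NUL-terminated UTF-16LE; keep what precedes the first NUL
    name = raw_name.decode("utf-16-le").split("\x00", 1)[0]
    return pid, name

# Synthetic buffer standing in for the bytes returned by the driver
sample_name = "\\Device\\HarddiskVolume3\\Windows\\System32\\notepad.exe"
sample = struct.pack(PID_FORMAT, 4242) + sample_name.encode("utf-16-le").ljust(NAME_CHARS * 2, b"\x00")

pid, name = parse_process_info(sample)
```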

The DLL is injected into the target process using the WhatTheHook32 function:

if (isProcess32(hCurrentProc, &isWow64)) { // new check on the architecture type
    WhatTheHook32(hCurrentProc); // This will be WhatTheHook64() for Injector64.exe
}

Hook injection

To inject and execute the DLL in the target process, we need to use a series of WinAPI functions:

- VirtualAllocEx allocates memory in the target process to store the path of the DLL to be loaded.
- WriteProcessMemory writes the DLL path to the memory allocated in the target process.
- CreateRemoteThread creates a thread in the remote process. The fourth argument to this function is LoadLibraryA, which will be executed by this thread to load the DLL into the target process.

An overview of the function:

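int WhatTheHook32(HANDLE hCurrentProc) {
    SIZE_T pathSize = strlen(MyDLL32) + 1;

    // Allocate room for the DLL path inside the target process
    LPVOID pRemoteAddr = VirtualAllocEx(hCurrentProc, NULL, pathSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (pRemoteAddr == NULL) {
        return -1;
    }

    // Copy the path, then start a remote thread whose entry point is LoadLibraryA
    HANDLE hRemoteThread = NULL;
    if (WriteProcessMemory(hCurrentProc, pRemoteAddr, (LPCVOID)MyDLL32, pathSize, NULL)) {
        hRemoteThread = CreateRemoteThread(hCurrentProc, NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibraryA, pRemoteAddr, 0, NULL);
    }

    if (hRemoteThread != NULL) {
        WaitForSingleObject(hRemoteThread, INFINITE);
        CloseHandle(hRemoteThread);
    }
    VirtualFreeEx(hCurrentProc, pRemoteAddr, 0, MEM_RELEASE);

    return hRemoteThread != NULL ? 0 : -1;
}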

UserSpace - Hook.dll

For the first tests I chose to hook the VirtualAllocEx function and execute the NtSuspendProcess function to pause the process. Pausing the process before and after calling a hooked function seems quite appropriate.

Concerning the hook strategy, we will first deal with VirtualAllocEx in the InstallMyHook function and then we will deal with the return address hook in the HookedVirtualAllocEx function.

First, we need to declare our function pointer types for VirtualAllocEx and NtSuspendProcess, as well as some variables that will be useful during the hook phase:

typedef LPVOID(WINAPI* VirtualAllocExType)(
    HANDLE hProcess,
    LPVOID lpAddress,
    SIZE_T dwSize,
    DWORD flAllocationType,
    DWORD flProtect
    );

typedef LONG(NTAPI* NtSuspendProcessType)(
    IN HANDLE ProcessHandle
    );

VirtualAllocExType TrueVirtualAllocEx = NULL; // Will contain a pointer to the true VirtualAllocEx to hook
BYTE originalBytes[5];   // Will contain the first 5 bytes of VirtualAllocEx
BYTE originalRetAddr[5]; // Will contain the first 5 bytes of code from the return address
void* returnAddress;     // Will contain a pointer to the return address

This program includes two important functions that are used when hooking and restoring: PlaceHook and RestoreOriginalFunction.
Overview of the PlaceHook function:

void PlaceHook(BYTE* pOrigFunc, BYTE* origBytes, DWORD_PTR pHookedFunc) {
    DWORD oldProtect;
    // Make the page holding the function prologue writable
    if (!VirtualProtect(pOrigFunc, pageSize, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        printf("[-] VirtualProtect failed to change protection: %lu\n", GetLastError());
        return;
    }

    // Save the 5 bytes that are about to be overwritten
    memcpy(origBytes, pOrigFunc, 5);

    // rel32 displacement, measured from the end of the 5-byte jmp
    DWORD offset = (DWORD)((pHookedFunc - (DWORD_PTR)pOrigFunc) - 5);
    BYTE jmp[5] = { 0xE9, 0, 0, 0, 0 };
    *(DWORD*)(jmp + 1) = offset;

    memcpy(pOrigFunc, jmp, 5);

    if (!VirtualProtect(pOrigFunc, pageSize, oldProtect, &oldProtect)) {
        printf("[-] VirtualProtect failed to restore protection: %lu\n", GetLastError());
    }
}

PlaceHook is relatively simple: it patches the target function after saving its first 5 bytes in origBytes (here, originalBytes), as these bytes are about to be replaced. Next, a relative jump is built by computing the distance between the address of the function to be patched (VirtualAllocEx) and the address of the replacement function (HookedVirtualAllocEx). Because the displacement of a relative jump is measured from the end of the instruction, the 5 bytes of the jmp itself are subtracted. This distance is then appended to the 0xE9 opcode, which encodes the jmp instruction:

jmp <distance between VirtualAllocEx and HookedVirtualAllocEx>

Once our new instruction is established (contained in 5 bytes) we use it to patch VirtualAllocEx with the memcpy function.
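The displacement arithmetic can be checked with a short worked example. The two addresses below are hypothetical and the helper names are illustrative; the sketch builds the 5 bytes of a `jmp rel32` and decodes them back to the target.

```python
import struct

JMP_SIZE = 5  # 1 opcode byte (0xE9) + 4 displacement bytes

def build_rel32_jmp(patch_address, target_address):
    """Encode a jmp rel32 placed at patch_address that lands on target_address."""
    displacement = target_address - (patch_address + JMP_SIZE)
    if not -2**31 <= displacement < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return b"\xE9" + struct.pack("<i", displacement)

def decode_rel32_jmp(patch_address, code):
    """Return the address a jmp rel32 located at patch_address jumps to."""
    if code[:1] != b"\xE9":
        raise ValueError("not a jmp rel32")
    (displacement,) = struct.unpack_from("<i", code, 1)
    return patch_address + JMP_SIZE + displacement

# Hypothetical addresses of the patched function and of the replacement function
patched_function = 0x75A01000
replacement = 0x10002000

patch = build_rel32_jmp(patched_function, replacement)
target = decode_rel32_jmp(patched_function, patch)
```

The range check also shows why this form only works when the two addresses are less than 2 GB apart, which is not guaranteed in a 64-bit address space.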

The RestoreOriginalFunction function is much simpler, as it simply restores the original bytes of the function that has been hooked:

void RestoreOriginalFunction(BYTE* pOrigFunc, BYTE* origBytes) {
    DWORD oldProtect;
    if (VirtualProtect(pOrigFunc, pageSize, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        memcpy(pOrigFunc, origBytes, 5);
        VirtualProtect(pOrigFunc, pageSize, oldProtect, &oldProtect);
    }
    else {
        printf("[-] VirtualProtect failed: %lu\n", GetLastError());
    }
}

VirtualAllocEx patch

The InstallMyHook function, called when the DLL is loaded, installs a hook on VirtualAllocEx using the PlaceHook function. As a reminder, this function replaces the first 5 bytes of the hooked function just after saving them in the originalBytes variable.
Here, the replacement function will be HookedVirtualAllocEx:

void InstallMyHook() {
    getPageSize();
    TrueVirtualAllocEx = (VirtualAllocExType)GetProcAddress(GetModuleHandle(TEXT("kernel32.dll")), "VirtualAllocEx");
    if (!TrueVirtualAllocEx) {
        printf("[-] Failed to get address of VirtualAllocEx\n");
        return;
    }
    PlaceHook((BYTE*)TrueVirtualAllocEx, (BYTE*)originalBytes, (DWORD_PTR)HookedVirtualAllocEx);
}

Hook observation for 32-bit version:

Return address patch

As a reminder, the HookedVirtualAllocEx function has 4 objectives, which are as follows:

- Suspend the process
- Restore VirtualAllocEx to its original state
- Retrieve a pointer to the return address and place a hook on it
- Execute the real VirtualAllocEx so as not to alter the programme's initial request

LPVOID WINAPI HookedVirtualAllocEx(
    HANDLE hProcess,
    LPVOID lpAddress,
    SIZE_T dwSize,
    DWORD flAllocationType,
    DWORD flProtect
) {
    HANDLE hCurrentProc = GetCurrentProcess();
    suspend(hCurrentProc);
    RestoreOriginalFunction((BYTE*)TrueVirtualAllocEx, (BYTE*)originalBytes);

    void* retAddr = _ReturnAddress();
    returnAddress = retAddr;
    PlaceHook((BYTE*)retAddr, (BYTE*)originalRetAddr, (DWORD_PTR)retHooked);

    CloseHandle(hCurrentProc);
    LPVOID result = TrueVirtualAllocEx(hProcess, lpAddress, dwSize, flAllocationType, flProtect);
    return result;
}

In the suspend function, there is simply a call to the NtSuspendProcess function to pause the process. Restoring VirtualAllocEx is done by calling the RestoreOriginalFunction function.

The return address, obtained with the _ReturnAddress() intrinsic, is the location at which the caller resumes after the call:

.text:00E8105B call    ds:GetCurrentProcess
.text:00E81061 mov     esi, ds:VirtualAllocEx
.text:00E81067 push    eax                     
.text:00E81068 call    esi ; VirtualAllocEx      
.text:00E8106A push    offset Format           ; <--- ReturnAddress
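The marked address follows directly from the listing: `call esi` is encoded on two bytes (FF D6), and the return address pushed by a call is the address of the instruction that follows it. As a quick check:

```python
CALL_ESI = bytes.fromhex("ffd6")   # machine code of "call esi"
call_address = 0x00E81068          # address of the call in the listing above

# A call pushes the address of the next instruction as the return address
return_address = call_address + len(CALL_ESI)
print(hex(return_address))         # 0xe8106a
```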

This return address is stored in retAddr and returnAddress, and is hooked before the real VirtualAllocEx is called to satisfy the initial request.
This hook patches the program at address 0x00E8106A, replacing the first bytes with a jump to the retHooked function:

void __declspec(naked) retHooked() {
    __asm {
        pushad                  // Saves the 8 registers in the stack (EAX, EBX...)
        pushfd                  // Saves status indicator flags in the stack
        call handleRetHooked    // This function contains the execution of NtSuspendProcess
        popfd                   // Restore status indicator flags
        popad                   // Restore the 8 registers in the stack
        jmp returnAddress       // Jump to address 0x00E8106A (original program flow)
    }
}

After calling the handleRetHooked function, which contains a simple NtSuspendProcess, a jump is made to resume the execution flow of the original program.

Return address hook observation for 32-bit version:

For the 64-bit program, there are a few changes. The PlaceHookLongJmp and RestoreOriginalFunctionLongJmp functions are added. They work in a similar way to PlaceHook and RestoreOriginalFunction, except that 12 bytes are patched instead of 5, because an absolute jump is used instead of a relative jump:

memcpy(origBytes, pOrigFunc, 12);

BYTE jump[12];
jump[0] = 0x48;  // REX.W prefix: 64-bit operand size, so the immediate is 8 bytes
jump[1] = 0xBA;  // With REX.W: MOV RDX, imm64
*((void**)(jump + 2)) = (void*)pHookedFunc;  // 64-bit address of the replacement function, loaded into RDX
jump[10] = 0xFF;  // 0xFF 0xE2: JMP RDX
jump[11] = 0xE2;

memcpy(pOrigFunc, jump, 12);

Adding the REX.W prefix (0x48) is mandatory: without it, 0xBA is decoded as mov edx, imm32, so only the low 4 bytes of the address are loaded into EDX and the remaining bytes are decoded as unrelated instructions.
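The layout of the 12-byte patch can be reproduced and checked in a few lines. The target address is hypothetical and the helper names are illustrative; the last line shows what would be loaded if the bytes were decoded without the prefix.

```python
import struct

def build_abs_jmp_rdx(target):
    """48 BA imm64 (mov rdx, imm64) followed by FF E2 (jmp rdx): 12 bytes."""
    return b"\x48\xBA" + struct.pack("<Q", target) + b"\xFF\xE2"

def decode_abs_jmp_rdx(code):
    """Return the jump target if code starts with the pattern above, else None."""
    if len(code) < 12 or code[:2] != b"\x48\xBA" or code[10:12] != b"\xFF\xE2":
        return None
    (target,) = struct.unpack_from("<Q", code, 2)
    return target

hook_address = 0x00007FFB12345678   # hypothetical address of the replacement function
patch = build_abs_jmp_rdx(hook_address)
decoded = decode_abs_jmp_rdx(patch)

# Without REX.W, BA takes a 4-byte immediate: only the low half of the address is loaded
(low_half,) = struct.unpack_from("<I", patch, 2)
```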

Observation without prefix and then with prefix for the 64-bit version:

Detections

Following our detailed analysis of the API hooking (inline hook type), it is relevant to examine how this mechanism can be detected. As this is an inline hook and not an IAT hook, it is not possible simply to identify the hook via the export table on the basis of addresses. For example, comparing the export table of a modified kernel32.dll with that of an unmodified kernel32.dll will not reveal any difference. However, by analysing the bytes of each exported function, we can check whether it begins with the byte 0xe9, which indicates a relative jump, or with a sequence of values enabling us to identify this type of instruction:

mov REG, 0x12345678
jmp REG

As part of a forensic analysis, the same approach can be applied using two methods: live analysis and memory analysis.

For a live analysis, we can examine the list of modules currently loaded in memory by the process and look for any module that stands out from the others (by its path, name, etc.). It is also useful to check whether this module is loaded in other processes to determine whether it is infecting all the processes on the system.
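As a sketch of this triage, the function below takes a mapping from process to loaded module paths and reports the DLLs that are loaded from outside a list of trusted directories, together with the processes in which they appear. How the mapping is collected is left open; the trusted directories, process names and paths are assumptions made for the example.

```python
from collections import defaultdict
import ntpath

# Directories considered normal for system DLLs (assumption, adjust to the system analysed)
TRUSTED_DIRS = ("c:\\windows\\system32", "c:\\windows\\syswow64")

def is_trusted(path):
    directory = ntpath.dirname(path).lower()
    return any(directory == d or directory.startswith(d + "\\") for d in TRUSTED_DIRS)

def unusual_modules(modules_by_process):
    """Map each DLL loaded from an unexpected location to the processes that load it."""
    seen_in = defaultdict(set)
    for process, paths in modules_by_process.items():
        for path in paths:
            if path.lower().endswith(".dll") and not is_trusted(path):
                seen_in[path.lower()].add(process)
    return {path: sorted(processes) for path, processes in seen_in.items()}

# Synthetic input standing in for a real module listing
modules = {
    "notepad.exe-4242": ["C:\\Windows\\System32\\kernel32.dll", "C:\\Users\\Public\\Hook64.dll"],
    "calc.exe-5150": ["C:\\Windows\\System32\\ntdll.dll", "C:\\Users\\Public\\Hook64.dll"],
}
report = unusual_modules(modules)
```

A DLL that appears under an unusual path in many unrelated processes is a good candidate for closer inspection.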

For a memory analysis, we chose to use MemProcFs on a memory dump generated with FTK Imager. MemProcFs is a very powerful tool that can parse various artefacts, which saves us time in recovering the list of active processes, as well as their loaded modules. This list includes all the module folders with their information, including the DLLs reconstituted from the elements extracted from memory under the name pefile.dll.

At this stage, we could simply open the DLL in a disassembler and look for an exported function that has been modified by a hook, but this task can be quite laborious. To speed up the process, we've created a Python script:

import pefile

def isNearJmp(first_bytes):
    # 0xE9 is the opcode of a near relative jump (jmp rel32)
    if first_bytes[:1] == b'\xe9':
        return True, int.from_bytes(first_bytes[1:5], "little", signed=True), None
    else:
        return False, None, None
    
def isAbsoluteJmp(first_bytes):
    instructions32 = {
        b'\xb8':'mov eax',
        b'\xbb':'mov ebx',
        b'\xb9':'mov ecx',
        b'\xba':'mov edx',
    }
    
    instructions64 = {
        b'\x48\xb8':'mov rax',
        b'\x48\xbb':'mov rbx',
        b'\x48\xb9':'mov rcx',
        b'\x48\xba':'mov rdx',
    }
    
    jmp32 = {
       b'\xff\xe0':'jmp eax',
       b'\xff\xe3':'jmp ebx',
       b'\xff\xe1':'jmp ecx',
       b'\xff\xe2':'jmp edx'
    }
    
    jmp64 = {
       b'\xff\xe0':'jmp rax',
       b'\xff\xe3':'jmp rbx',
       b'\xff\xe1':'jmp rcx',
       b'\xff\xe2':'jmp rdx'
    }
    
    # 32-bit mode: mov r32, imm32 (5 bytes) followed by jmp r32 (2 bytes)
    if ( instructions32.get(first_bytes[:1]) ) and ( jmp32.get(first_bytes[5:7]) ):
        return True, int.from_bytes(first_bytes[1:5], "little", signed=True), f'{instructions32[first_bytes[:1]]}::{jmp32[first_bytes[5:7]]}', True
    # 64-bit mode: REX.W mov r64, imm64 (10 bytes) followed by jmp r64 (2 bytes)
    elif ( instructions64.get(first_bytes[:2]) ) and ( jmp64.get(first_bytes[10:12]) ):
        return True, int.from_bytes(first_bytes[2:10], "little", signed=True), f'{instructions64[first_bytes[:2]]}::{jmp64[first_bytes[10:12]]}', False
    else:
        return False, None, None, None

def isAddressValid(pe, hookFuncAddr, base_address, func_name):
    # hookFuncAddr is an offset from the image base; the target is valid if it falls inside a section
    address = base_address + hookFuncAddr
    for section in pe.sections:
        section_start = base_address + section.VirtualAddress
        section_end = section_start + section.Misc_VirtualSize
        if section_start <= address < section_end:
            return True
    return False

def process(cDLL, num_bytes):
    pe = pefile.PE(cDLL)
    if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
        print(f'{cDLL} contains no export table.')
        return
    
    base_address = pe.OPTIONAL_HEADER.ImageBase
    exports = pe.DIRECTORY_ENTRY_EXPORT.symbols
    mask = 0
    for exp in exports:
        if exp.name:
            func_name = exp.name.decode()
            file_offset = pe.get_offset_from_rva(exp.address)
            with open(cDLL, 'rb') as dll_file:
                dll_file.seek(file_offset)
                first_bytes = dll_file.read(num_bytes)
                
            isNear, hookFuncAddr, __ = isNearJmp(first_bytes)
            isAbsolute, hookFuncAddrAbs, instructions, isWow64 = isAbsoluteJmp(first_bytes)
            if isWow64:
                mask = 0xffffffff
            else:
                mask = 0xffffffffffffffff
                
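            if isNear:
                # The displacement is relative to the end of the 5-byte jmp, so the target RVA is exp.address + 5 + displacement
                if not isAddressValid(pe, exp.address + 5 + hookFuncAddr, base_address, func_name):
                    print(f'[+] {func_name} :\n\tjmp {hex(hookFuncAddr & 0xffffffff)}\n')
            elif isAbsolute:
                # Convert the absolute target into an offset from the image base (assumes ImageBase matches the load address)
                if not isAddressValid(pe, (hookFuncAddrAbs & mask) - base_address, base_address, func_name):
                    instruction1, instruction2 = instructions.split('::')
                    print(f'[+] {func_name} :\n\t{instruction1}, {hex(hookFuncAddrAbs & mask)}\n\t{instruction2}\n')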

if __name__ == '__main__':
    cDLL = 'pefile.dll'
    num_bytes = 12
    process(cDLL, num_bytes)

We defined the isAddressValid function to limit false positives: an exported function can legitimately begin with a relative jump, but that jump should land at a valid address inside the module. If it does not, the jump leads to another module, which is why we check whether the address falls within the range of one of the module's sections.
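The range check can be tried on synthetic values without pefile. In this sketch the section list, the function RVA and the helper names are hypothetical; a jump whose target stays inside one of the module's sections is treated as legitimate, and anything else as a candidate hook.

```python
import struct

def near_jmp_target_rva(func_rva, first_bytes):
    """RVA reached by a jmp rel32 at the start of a function, or None if there is none."""
    if len(first_bytes) < 5 or first_bytes[0] != 0xE9:
        return None
    (rel32,) = struct.unpack_from("<i", first_bytes, 1)
    return func_rva + 5 + rel32   # displacement is relative to the next instruction

def lands_in_sections(target_rva, sections):
    """sections is a list of (virtual_address, virtual_size) pairs."""
    return any(start <= target_rva < start + size for start, size in sections)

sections = [(0x1000, 0x8000), (0x9000, 0x2000)]   # hypothetical module layout
func_rva = 0x1500                                  # hypothetical exported function

legit = near_jmp_target_rva(func_rva, b"\xE9" + struct.pack("<i", 0x100))
hooked = near_jmp_target_rva(func_rva, b"\xE9" + struct.pack("<i", -0x7000000))
plain = near_jmp_target_rva(func_rva, bytes.fromhex("4883ec28c3"))

legit_ok = lands_in_sections(legit, sections)     # jump stays inside the module
hooked_ok = lands_in_sections(hooked, sections)   # jump leaves the module
```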

Results:

To test all the functions of all the modules loaded in each captured process, we can adjust the script and run it targeting the MemProcFs mount point. Here are the elements added to the script:

import os
import sys

def checkHookInFunc(cDLL, num_bytes, cProcess, cModule):
    countHook = 0
    try:
        pe = pefile.PE(cDLL)
    except pefile.PEFormatError:
        return countHook
    if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
        return countHook
    
    base_address = pe.OPTIONAL_HEADER.ImageBase
    exports = pe.DIRECTORY_ENTRY_EXPORT.symbols
    mask = 0
    for exp in exports:
        if exp.name:
            func_name = exp.name.decode()
            if func_name in ['GetFileBandwidthReservation', '_mbscpy_s', '_spawnve', '_wexeclpe', 'NtUserAllowForegroundActivation', 'NtUserEnablePerMonitorMenuScaling', 'NtUserIsQueueAttached', 'NtUserYieldTask', 'CPNameUtil_ConvertToRoot', 'g_module_open_utf8']: # recurring false positives, reported for every process
                continue
            
            try:
                file_offset = pe.get_offset_from_rva(exp.address)
            except pefile.PEFormatError:
                continue
            
            with open(cDLL, 'rb') as dll_file:
                dll_file.seek(file_offset)
                first_bytes = dll_file.read(num_bytes)
                
            isNear, hookFuncAddr, __ = isNearJmp(first_bytes)
            isAbsolute, hookFuncAddrAbs, instructions, isWow64 = isAbsoluteJmp(first_bytes)
            if isWow64:
                mask = 0xffffffff
            else:
                mask = 0xffffffffffffffff
                
            if isNear:
                # The displacement is relative to the end of the 5-byte jmp, so the target RVA is exp.address + 5 + displacement
                if not isAddressValid(pe, exp.address + 5 + hookFuncAddr, base_address, func_name):
                    countHook += 1
                    print(f'[+] {cProcess}::{cModule}::{func_name} :\n\tjmp {hex(hookFuncAddr & 0xffffffff)}\n')
            elif isAbsolute:
                # Convert the absolute target into an offset from the image base (assumes ImageBase matches the load address)
                if not isAddressValid(pe, (hookFuncAddrAbs & mask) - base_address, base_address, func_name):
                    countHook += 1
                    instruction1, instruction2 = instructions.split('::')
                    print(f'[+] {cProcess}::{cModule}::{func_name}  :\n\t{instruction1}, {hex(hookFuncAddrAbs & mask)}\n\t{instruction2}\n')
                    
    return countHook
        
        
def process(MemProcFsPath, num_bytes):
    MemProcFsPath += '\\name\\'
    ProcessList = os.listdir(MemProcFsPath)
    for cProcess in ProcessList:
        if cProcess in ['System-4']:
            continue
        ModulesList = os.listdir(MemProcFsPath + cProcess + '\\modules\\')
        totalCountHook = 0
        for cModule in ModulesList:
            if ".dll" not in cModule.lower():
                continue
            cDLL = MemProcFsPath + cProcess + '\\modules\\' + cModule + '\\pefile.dll'
            countHook = checkHookInFunc(cDLL, num_bytes, cProcess, cModule)
            totalCountHook += countHook
        if totalCountHook == 0:
            print(f'No hook found on {cProcess.split("-")[0]}::{cProcess.split("-")[1]} process')
        else:
            print(f'number of hook -> {totalCountHook}')


if __name__ == '__main__':
    MemProcFsPath = sys.argv[1] if len(sys.argv) > 1 else 'M:'
    num_bytes = 12
    process(MemProcFsPath, num_bytes)

Results:

Conclusion

In this study, we explored the API hooking technique, an effective method for intercepting and controlling function calls within the system. By adopting a hybrid solution combining a kernel-space driver and user-space injectors, we demonstrated how to monitor and manipulate the execution of processes discreetly, without altering their normal behaviour.

This technique is used both by security solutions, such as EDRs, and by malware seeking to hide itself, which underlines the importance of knowing how to detect it. With tools like MemProcFs to analyze memory captures, or a debugger and tools like Process Hacker to identify loaded modules, it becomes possible to spot the signs of such manipulation. This research highlights the need for vigilance during investigations, where every function can potentially be the target of a hook.