
fileextractor: linux edition #1788

Open
wants to merge 10 commits into base: main

Conversation

delvinru
Contributor

As @skvl mentioned in #1787, this adds the fileextractor implementation for Linux.

Key features:

  • A different approach from Windows is used: the plugin extracts data directly from the ext4 file system.
  • Only modified files are extracted (thanks a lot for the help, @alex-pentagrid).

There are three main events on which a file is extracted from the system:

  • the file was closed;
  • the file was deleted;
  • the file was mapped with mmap.

Note that if a file was not changed in some way during the analysis, it will not be extracted; this avoids dumping a large number of system files.

To avoid the kernel caching file contents, the do_sys_openat2 call is intercepted and special flags are set on the open.

The ext4 handling lives in a separate library, specifically so that support for other file systems can be added later if necessary.

@drakvuf-jenkins
Collaborator

Can one of the admins verify this patch?

@tklengyel
Owner

@drakvuf-jenkins Test this please

@delvinru
Contributor Author

There is an error on CI:

VMI_ERROR: VMI_ERROR: xen_read_disk: vbd is inactive or error occured

It's hard to say what exactly this error is related to, but my guesses are:

  • The guest OS is not configured with the ext4 file system (although that is unlikely);
  • The disk is configured with lvcreate rather than as a qcow2 image, and this could be the cause; the plugin was tested with the qcow2 format;
  • For some reason libvmi reports that the disk as a whole is not active, which is strange.

@tklengyel
Owner

@drakvuf-jenkins Test this please

@tklengyel
Owner

So there is some regression with this PR on the Linux side. The CI notes two things. First, starting with this plugin enabled on Debian stretch results in an average CPU utilization of 100%, which likely means some type of infinite loop is being hit. Second, subsequent startups with the plugin enabled fail with:

VMI_ERROR: VMI_ERROR: xen_read_disk: vbd is inactive or error occured
[FILEEXTRACTOR] failed to read struct from disk

@tklengyel
Owner

@drakvuf-jenkins Test this please

@delvinru
Contributor Author

delvinru commented May 2, 2024

Okay, let's sort out the errors line by line.

Log message:

1714483235.430306 [libfs] devices_ids[0]=51712
1714483235.430332 [libfs] detect filesystem start
VMI_ERROR: VMI_ERROR: xen_read_disk: vbd is inactive or error occured
1714483235.430422 [FILEEXTRACTOR] failed to read struct from disk
Plugin fileextractor startup failed!
  1. 1714483235.430306 [libfs] devices_ids[0]=51712
    We print all available disks:
    /* by default use first device_id */
    device_id = std::string(devices_ids[0]);
    for (uint32_t i = 0; i < number_of_disks; i++)
    {
        PRINT_ERROR("[libfs] devices_ids[%d]=%s\n", i, devices_ids[i]);
        free(devices_ids[i]);
    }

Everything is OK here: the program receives one disk and prints its ID.

  2. 1714483235.430332 [libfs] detect filesystem start
    The program says it is starting to detect the file system; also OK.
bool BaseFilesystem::detect_filesystem_start()
{
    PRINT_ERROR("[libfs] detect filesystem start\n");
    if (drakvuf_get_os_type(drakvuf_) != VMI_OS_LINUX)
        return false;
    ...
}
  3. VMI_ERROR: VMI_ERROR: xen_read_disk: vbd is inactive or error occured
bool BaseFilesystem::detect_filesystem_start()
{
    PRINT_ERROR("[libfs] detect filesystem start\n");
    if (drakvuf_get_os_type(drakvuf_) != VMI_OS_LINUX)
        return false;

    auto mbr = get_struct_from_disk<mbr_t>(ZERO_OFFSET);
    PRINT_ERROR("[libfs] read mbr from disk successfully\n");
    ...
}

The program tries to read data from the disk but gets an error: we never see the following message, [libfs] read mbr from disk successfully.

Let's look at the get_struct_from_disk function.

    status_t get_raw_from_disk(size_t offset, size_t count, void* buffer)
    {
        auto vmi = vmi_lock_guard(drakvuf_);
        return vmi_read_disk(vmi, device_id.c_str(), offset, count, buffer);
    }

    template <typename T>
    std::unique_ptr<T> get_struct_from_disk(size_t offset)
    {
        std::vector<uint8_t> buffer(sizeof(T));

        if (VMI_FAILURE == get_raw_from_disk(offset, sizeof(T), buffer.data()))
        {
            PRINT_ERROR("[FILEEXTRACTOR] failed to read struct from disk\n");
            throw -1;
        }

        return std::make_unique<T>(*reinterpret_cast<T*>(buffer.data()));
    }

There is a simple chain of calls: get_struct_from_disk -> get_raw_from_disk -> vmi_read_disk. In the error message, libvmi tells us that it cannot read the disk.

  4. 1714483235.430422 [FILEEXTRACTOR] failed to read struct from disk
    And a message from get_struct_from_disk that the read was not successful.

So I think the error is not specifically in my code, since it does not even get a chance to start working properly, but rather something with libvmi and the disk setup on CI. Can you tell me what format the disk was created in?

@tklengyel
Owner

Number  Start   End     Size    Type      File system     Flags
 1      1049kB  20.3GB  20.3GB  primary   ext4
 2      20.3GB  21.5GB  1197MB  extended
 5      20.3GB  21.5GB  1197MB  logical   linux-swap(v1)

@tklengyel
Owner

This is what I see in Xenstore:

xenstore-read /local/domain/36/device/vbd/51712/state
4
