Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making wayrs-client generic over different wayland transports #8

Open
madushan1000 opened this issue Feb 25, 2024 · 64 comments
Open

Making wayrs-client generic over different wayland transports #8

madushan1000 opened this issue Feb 25, 2024 · 64 comments

Comments

@madushan1000
Copy link

Currently wayrs-client crate only supports unix socket as the transport for wayland protocol.

Right now, I'm trying to modify the crate to be generic over the transport so I can use it with virtio-gpu cross-domain protocol(used to allow programs running on crosvm/qemu guests to open windows in host wayland compositor).

Are you interested in merging the changes here? It will break the public crate api, because things like Connection struct will have to be generic over the transport.

@MaxVerevkin
Copy link
Owner

Will it be a breaking change if the new generic has a default, like the allocator in Vec<T, A=Global>?

@madushan1000
Copy link
Author

Oh I didn't realize that was a thing, I'll try to set a default and see if I can avoid breaking the public api, I'll make a PR when I'm done. Thank you :)

@madushan1000
Copy link
Author

@MaxVerevkin do you have any suggestions on how to handle async_fd field in Connection struct? I'm thinking of moving it to BufferedSocket struct and adding helper functions to manipulate it.

@MaxVerevkin
Copy link
Owner

@MaxVerevkin do you have any suggestions on how to handle async_fd field in Connection struct? I'm thinking of moving it to BufferedSocket struct and adding helper functions to manipulate it.

Hard to say since I don't know which problems you've encountered, and how the alternative transport you are working on works.

@madushan1000
Copy link
Author

The biggest problem is the virtio protocol won't use a file descriptor, it works by basically opening /dev/dri/card0 drm device and then doing a bunch of IOCTLs and memcpys to copy wayland packets back and forth.
I looked into implementing file descriptor related traits for this but that looks like a lot of work. If there is no other way, I'll end up doing it anyway.

@MaxVerevkin
Copy link
Owner

How do you know then a new message is available to be read?

You can open a draft PR so I can have some idea how wayland over virtio works, since I've never used it.

@MaxVerevkin
Copy link
Owner

Also, may I ask what you are planning to do with this? Are you writing a forwarding compositor? Just curious.

@madushan1000
Copy link
Author

There is an overview of the protocol here. To receive you have to do a blocking IOCTL and poll the channel, see the receiving data section.

I didn't actually start writing the virtio stuff yet because I couldn't settle on a proper abstraction for transport. I was doing something like this, but looks like that's not going to be enough because of the async functions.

pub trait WaylandSocket: AsRawFd {
    fn connect() -> Result<Self, ConnectError>
    where
        Self: Sized;
    fn get_free_msg_args_as_ref(&mut self) -> &mut Vec<Vec<ArgValue>>;
    fn flush(&mut self, mode: IoMode) -> io::Result<()>;
    fn peek_message_header(&mut self, mode: IoMode) -> io::Result<MessageHeader>;
    fn write_message(&mut self, msg: Message, mode: IoMode) -> Result<(), SendMessageError>;
    fn recv_message(
        &mut self,
        header: MessageHeader,
        iface: &'static Interface,
        version: u32,
        mode: IoMode,
    ) -> io::Result<Message>;
}

impl WaylandSocket for BufferedSocket {
    fn connect() -> Result<Self, ConnectError>
    where
        Self: Sized,
    {
        Self::connect()
    }

    fn get_free_msg_args_as_ref(&mut self) -> &mut Vec<Vec<ArgValue>> {
        &mut self.free_msg_args
    }

    fn peek_message_header(&mut self, mode: IoMode) -> io::Result<MessageHeader> {
        self.peek_message_header(mode)
    }

    fn recv_message(
        &mut self,
        header: MessageHeader,
        iface: &'static Interface,
        version: u32,
        mode: IoMode,
    ) -> io::Result<Message> {
        self.recv_message(header, iface, version, mode)
    }

    fn write_message(&mut self, msg: Message, mode: IoMode) -> Result<(), SendMessageError> {
        self.write_message(msg, mode)
    }

    fn flush(&mut self, mode: IoMode) -> io::Result<()> {
        self.flush(mode)
    }
}

And yes, I'm trying to write a forwarding compositor for android. Something like waydroid but running in a vm.

@MaxVerevkin
Copy link
Owner

MaxVerevkin commented Feb 27, 2024

Thanks for the link!

To receive you have to do a blocking IOCTL and poll the channel, see the receiving data section.

The ioctl is not blocking, the read is, right? It doesn't explicitly state this, but I would be very surprised if the fd you are supposed to read 8 bytes from is not poll()-able. If this is true everything is good, and Connection can use that fd for the async stuff.

And yes, I'm trying to write a forwarding compositor for android. Something like waydroid but running in a vm.

Cool! Since you will also write server-side wayland code, do you think it would be worth it to make BufferedSocket generic enough to be used on the server side too? I've written one wayland server (https://github.com/MaxVerevkin/ewc), but I basically copy-pasted the socket code, adapting it as needed.

By the way, I don't mind a v2 release. For example I wanted to slightly change the Proxy trait so that I could reuse argument buffers from events too. And maybe change some of ArgValue variants.

@madushan1000
Copy link
Author

madushan1000 commented Feb 27, 2024

The ioctl is not blocking, the read is, right?

Oh yes, that's true. I will try to actually implement the cross domain protocol using the device fd and see how far I can get, thanks for the advice :)

do you think it would be worth it to make BufferedSocket generic enough to be used on the server side too?

I think this would be useful in general, but not for what I'm doing. My compositor will accept AIDl calls from android system compositor(surfaceflinger) and basically translate them back and forth into wayland calls.

@MaxVerevkin
Copy link
Owner

pub trait WaylandSocket: AsRawFd {
    fn connect() -> Result<Self, ConnectError>
    where
        Self: Sized;
    fn get_free_msg_args_as_ref(&mut self) -> &mut Vec<Vec<ArgValue>>;
    fn flush(&mut self, mode: IoMode) -> io::Result<()>;
    fn peek_message_header(&mut self, mode: IoMode) -> io::Result<MessageHeader>;
    fn write_message(&mut self, msg: Message, mode: IoMode) -> Result<(), SendMessageError>;
    fn recv_message(
        &mut self,
        header: MessageHeader,
        iface: &'static Interface,
        version: u32,
        mode: IoMode,
    ) -> io::Result<Message>;
}

is it required to send messages one-by-one? If no (from a quick look at sommelier I couldn't find any evidence that it is), I think an API with just sendmsg/recvmsg would be better and avoid a lot of duplication.

@madushan1000
Copy link
Author

I think that makes sense, I will refactor once I have a working example.

@MaxVerevkin
Copy link
Owner

I went ahead and made the low-level BufferedSocket generic over

/// An abstraction over Wayland transport methods
pub trait Transport: AsRawFd {
fn send(
&mut self,
bytes: &[IoSlice],
fds: &VecDeque<OwnedFd>,
mode: IoMode,
) -> io::Result<usize>;
fn recv(
&mut self,
bytes: &mut [IoSliceMut],
fds: &mut VecDeque<OwnedFd>,
mode: IoMode,
) -> io::Result<usize>;
}

Connection is not generic yet, though.

@madushan1000
Copy link
Author

Nice, I will rebase my work on this :D I have something slimier locally now. for connect, I just added a connect method to the Tansport trait and called implementation specific connect from BufferedSocket::connect

@MaxVerevkin
Copy link
Owner

I'm not sure about connect being part of Transport, since it may be used by both clients and servers, and connect doesn't make sense for servers. Perhaps we can define a supertrait in wayrs-client, like trait ClientTransport: Transport { fn connect() -> Result<Self> }, which is specific to clients?

@madushan1000
Copy link
Author

yeah I think that will work fine.

@madushan1000
Copy link
Author

madushan1000 commented Mar 4, 2024

I ran into another problem, while getting the example to work. virtio protocol requires clients use dma-buf. virtio-gpu mesa driver only supports egl 1.4, I'm not exactly sure why because the iris driver supports egl 1.5. And wayrs-egl requires egl 1.5. So wayrs egl clients doesn't work on guests properly. I'm trying to get wayrs-egl running on egl 1.4 but I couldn't wrap my head around all the egl stuff yet.

I think I can just use shm and copy the data to a dma-buf buffer. but I would like to avoid that if possible.

@MaxVerevkin
Copy link
Owner

Lowering the required EGL version should be possible, I've chosen 1.5 because it was the latest but at the same time pretty old (10 years old?). The important part is having the required extensions. https://docs.rs/wayrs-egl/latest/wayrs_egl/ has links to all used extensions.

@madushan1000
Copy link
Author

Am I interpriting this correctly? looks like 2 extensions(EGL_KHR_platform_gbm, and GL_OES_EGL_image are missing from virtio-gpu driver.

XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 eglinfo
EGL client extensions string:
    EGL_EXT_client_extensions, EGL_EXT_device_base,
    EGL_EXT_device_enumeration, EGL_EXT_device_query, EGL_EXT_platform_base,
    EGL_EXT_platform_device, EGL_EXT_platform_wayland, EGL_EXT_platform_x11,
    EGL_EXT_platform_xcb, EGL_KHR_client_get_all_proc_addresses,
    EGL_KHR_debug, EGL_KHR_platform_gbm, EGL_KHR_platform_wayland,
    EGL_KHR_platform_x11, EGL_MESA_platform_gbm,
    EGL_MESA_platform_surfaceless

GBM platform:
EGL API version: 1.4
EGL vendor string: Mesa Project
EGL version string: 1.4
EGL client APIs: OpenGL OpenGL_ES
EGL driver name: virtio_gpu
EGL extensions string:
    EGL_ANDROID_blob_cache, EGL_ANDROID_native_fence_sync, EGL_EXT_buffer_age,
    EGL_EXT_image_dma_buf_import, EGL_EXT_image_dma_buf_import_modifiers,
    EGL_EXT_pixel_format_float, EGL_KHR_cl_event2, EGL_KHR_config_attribs,
    EGL_KHR_context_flush_control, EGL_KHR_create_context,
    EGL_KHR_create_context_no_error, EGL_KHR_fence_sync,
    EGL_KHR_get_all_proc_addresses, EGL_KHR_gl_colorspace,
    EGL_KHR_gl_renderbuffer_image, EGL_KHR_gl_texture_2D_image,
    EGL_KHR_gl_texture_3D_image, EGL_KHR_gl_texture_cubemap_image,
    EGL_KHR_image, EGL_KHR_image_base, EGL_KHR_image_pixmap,
    EGL_KHR_no_config_context, EGL_KHR_reusable_sync,
    EGL_KHR_surfaceless_context, EGL_KHR_wait_sync,
    EGL_MESA_configless_context, EGL_MESA_drm_image,
    EGL_MESA_image_dma_buf_export, EGL_MESA_query_driver,
    EGL_WL_bind_wayland_display

@MaxVerevkin
Copy link
Owner

MaxVerevkin commented Mar 5, 2024

No. EGL_KHR_platform_gbm is listed in the snippet you provided. And GL_OES_EGL_image should be listed under "OpenGL core profile extensions", not "EGL extensions string". Just grep it.

@madushan1000
Copy link
Author

Okay, so all the extensions are there. this is the error I got when I just set the egl version to 1.4. looks like the format table is just empty.

thread 'main' panicked at wayrs-egl/examples/triangle.rs:403:22:
at least one supported format
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@MaxVerevkin
Copy link
Owner

Can you show the output with WAYLAND_DEBUG=1?

@madushan1000
Copy link
Author

madushan1000 commented Mar 5, 2024

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 WAYLAND_DEBUG=1 ./triangle
[wayrs]  -> wl_display@1v1.get_registry(new id wl_registry@2)
[wayrs]  -> wl_display@1v1.sync(new id wl_callback@3)
[wayrs] wl_registry@2v1.global(1, "wl_compositor", 5)
[wayrs] wl_registry@2v1.global(2, "wl_drm", 2)
[wayrs] wl_registry@2v1.global(3, "wl_shm", 1)
[wayrs] wl_registry@2v1.global(4, "wl_output", 4)
[wayrs] wl_registry@2v1.global(5, "zxdg_output_manager_v1", 3)
[wayrs] wl_registry@2v1.global(6, "wl_data_device_manager", 3)
[wayrs] wl_registry@2v1.global(7, "zwp_primary_selection_device_manager_v1", 1)
[wayrs] wl_registry@2v1.global(8, "wl_subcompositor", 1)
[wayrs] wl_registry@2v1.global(9, "xdg_wm_base", 6)
[wayrs] wl_registry@2v1.global(10, "gtk_shell1", 5)
[wayrs] wl_registry@2v1.global(11, "wp_viewporter", 1)
[wayrs] wl_registry@2v1.global(12, "wp_fractional_scale_manager_v1", 1)
[wayrs] wl_registry@2v1.global(13, "zwp_pointer_gestures_v1", 3)
[wayrs] wl_registry@2v1.global(14, "zwp_tablet_manager_v2", 1)
[wayrs] wl_registry@2v1.global(15, "wl_seat", 8)
[wayrs] wl_registry@2v1.global(16, "zwp_relative_pointer_manager_v1", 1)
[wayrs] wl_registry@2v1.global(17, "zwp_pointer_constraints_v1", 1)
[wayrs] wl_registry@2v1.global(18, "zxdg_exporter_v2", 1)
[wayrs] wl_registry@2v1.global(19, "zxdg_importer_v2", 1)
[wayrs] wl_registry@2v1.global(20, "zxdg_exporter_v1", 1)
[wayrs] wl_registry@2v1.global(21, "zxdg_importer_v1", 1)
[wayrs] wl_registry@2v1.global(22, "zwp_linux_dmabuf_v1", 4)
[wayrs] wl_registry@2v1.global(23, "wp_single_pixel_buffer_manager_v1", 1)
[wayrs] wl_registry@2v1.global(24, "zwp_keyboard_shortcuts_inhibit_manager_v1", 1)
[wayrs] wl_registry@2v1.global(25, "zwp_text_input_manager_v3", 1)
[wayrs] wl_registry@2v1.global(26, "wp_presentation", 1)
[wayrs] wl_registry@2v1.global(27, "xdg_activation_v1", 1)
[wayrs] wl_registry@2v1.global(28, "zwp_idle_inhibit_manager_v1", 1)
[wayrs] wl_callback@3v1.done(160)
[wayrs]  -> wl_registry@2v1.bind(22, new id 4@zwp_linux_dmabuf_v1v4)
[wayrs]  -> wl_registry@2v1.bind(1, new id 5@wl_compositorv5)
[wayrs]  -> wl_registry@2v1.bind(9, new id 6@xdg_wm_basev4)
[wayrs]  -> wl_compositor@5v5.create_surface(new id wl_surface@7)
[wayrs]  -> zwp_linux_dmabuf_v1@4v4.get_surface_feedback(new id zwp_linux_dmabuf_feedback_v1@8, 7)
[wayrs]  -> xdg_wm_base@6v4.get_xdg_surface(new id xdg_surface@9, 7)
[wayrs]  -> xdg_surface@9v4.get_toplevel(new id xdg_toplevel@10)
[wayrs]  -> wl_surface@7v5.set_buffer_transform(6)
[wayrs]  -> xdg_toplevel@10v4.set_app_id("wayrs-egl")
[wayrs]  -> xdg_toplevel@10v4.set_title("TITLE")
[wayrs]  -> wl_surface@7v5.commit()
[wayrs] wl_display@1v1.delete_id(3)
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.format_table(fd 4, 240)
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.main_device(<array>)
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.tranche_target_device(<array>)
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.tranche_flags(0)
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.tranche_formats(<array>)
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.tranche_done()
[wayrs] zwp_linux_dmabuf_feedback_v1@8v4.done()
[wayrs] xdg_toplevel@10v4.configure_bounds(1920, 1048)
[wayrs] xdg_toplevel@10v4.configure(0, 0, <array>)
[wayrs] xdg_surface@9v4.configure(10)
thread 'main' panicked at wayrs-egl/examples/triangle.rs:403:22:
at least one supported format
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

btw this is just running in a vm guest using the compositor in the guest. not using the virto wayland protocol

@MaxVerevkin
Copy link
Owner

I've added some debug prints in d2d70de, could you try it?

@madushan1000
Copy link
Author

$XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 ./triangle
DRM render node: /dev/dri/renderD128
Feedback tranche contains 15 formats, 0 of them supported by EGL
thread 'main' panicked at wayrs-egl/examples/triangle.rs:410:22:
at least one supported format
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@MaxVerevkin
Copy link
Owner

That's weird. What what does this print?

diff --git a/wayrs-egl/examples/triangle.rs b/wayrs-egl/examples/triangle.rs
index 367f209..6e9bfc3 100644
--- a/wayrs-egl/examples/triangle.rs
+++ b/wayrs-egl/examples/triangle.rs
@@ -370,6 +370,8 @@ impl DmabufFeedbackHandler for State {
 
         let egl_display = EglDisplay::new(self.linux_dmabuf, render_node).unwrap();
 
+        dbg!(egl_display.supported_formats());
+
         let format_table = self.surf.dmabuf_feedback.format_table();
         let mut formats = HashMap::<Fourcc, Vec<u64>>::new();
 

@madushan1000
Copy link
Author

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 ./triangle
DRM render node: /dev/dri/renderD128
[wayrs-egl/examples/triangle.rs:372:9] egl_display.supported_formats() = {
    XB48: [],
    AB4H: [],
    R16 : [],
    XR30: [],
    AR30: [],
    RG16: [],
    GR32: [],
    AB30: [],
    AB48: [],
    XB24: [],
    XR24: [],
    GR88: [],
    AB24: [],
    R8  : [],
    AR24: [],
    XB30: [],
    XB4H: [],
}
Feedback tranche contains 15 formats, 0 of them supported by EGL
thread 'main' panicked at wayrs-egl/examples/triangle.rs:411:22:
at least one supported format
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@MaxVerevkin
Copy link
Owner

Huh, is zero modifier implicitly sported? Does this work? Not sure how correct that is, though.

diff --git a/wayrs-egl/src/egl.rs b/wayrs-egl/src/egl.rs
index 1d17db4..07dfede 100644
--- a/wayrs-egl/src/egl.rs
+++ b/wayrs-egl/src/egl.rs
@@ -243,6 +243,10 @@ unsafe fn get_supported_formats(
         }
         unsafe { mods_buf.set_len(mods_len as usize) };
 
+        if mods_buf.is_empty() {
+            mods_buf.push(0);
+        }
+
         retval.insert(Fourcc(format as u32), mods_buf);
     }
 

@madushan1000
Copy link
Author

still similar issue, I wonder if this is a crosvm issue, I'll try running it in qemu too.

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 ./triangle
DRM render node: /dev/dri/renderD128
[wayrs-egl/examples/triangle.rs:372:9] egl_display.supported_formats() = {
    XB30: [
        0,
    ],
    XB24: [
        0,
    ],
    AB30: [
        0,
    ],
    XB4H: [
        0,
    ],
    AB48: [
        0,
    ],
    XR24: [
        0,
    ],
    RG16: [
        0,
    ],
    GR32: [
        0,
    ],
    AR30: [
        0,
    ],
    R16 : [
        0,
    ],
    AB24: [
        0,
    ],
    XR30: [
        0,
    ],
    R8  : [
        0,
    ],
    AB4H: [
        0,
    ],
    AR24: [
        0,
    ],
    GR88: [
        0,
    ],
    XB48: [
        0,
    ],
}
Feedback tranche contains 15 formats, 0 of them supported by EGL
thread 'main' panicked at wayrs-egl/examples/triangle.rs:411:22:
at least one supported format
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@MaxVerevkin
Copy link
Owner

What does the compositor advertise then?

diff --git a/wayrs-egl/examples/triangle.rs b/wayrs-egl/examples/triangle.rs
index 367f209..a444ab6 100644
--- a/wayrs-egl/examples/triangle.rs
+++ b/wayrs-egl/examples/triangle.rs
@@ -391,6 +391,12 @@ impl DmabufFeedbackHandler for State {
                         .entry(Fourcc(fmt.fourcc))
                         .or_default()
                         .push(fmt.modifier);
+                } else {
+                    eprintln!(
+                        "Format {:?}:{} not supported by EGL",
+                        Fourcc(fmt.fourcc),
+                        fmt.modifier
+                    );
                 }
             }
             eprintln!(

Huh, is zero modifier implicitly sported? Does this work? Not sure how correct that is, though.

Apparently not. Maybe assuming that if no modifiers where given, then all modifiers from the tranche are supported would work? Can you also try:

diff --git a/wayrs-egl/src/egl.rs b/wayrs-egl/src/egl.rs
index 1d17db4..8121a8c 100644
--- a/wayrs-egl/src/egl.rs
+++ b/wayrs-egl/src/egl.rs
@@ -151,7 +151,7 @@ impl EglDisplay {
     /// Check whether a fourcc/modifier pair is supported
     pub fn is_format_supported(&self, fourcc: Fourcc, modifier: u64) -> bool {
         match self.supported_formats.get(&fourcc) {
-            Some(mods) => mods.contains(&modifier),
+            Some(mods) => mods.is_empty() || mods.contains(&modifier),
             None => false,
         }
     }

@madushan1000
Copy link
Author

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 eglinfo | grep 'OpenGL ES profile version'
OpenGL ES profile version: OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1
OpenGL ES profile version: OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1
OpenGL ES profile version: OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1
OpenGL ES profile version: OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1
OpenGL ES profile version: OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1

with the egl-mod-invalid branch

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 ./triangle
DRM render node: /dev/dri/renderD128
Feedback tranche contains 15 formats, 11 of them supported by EGL
EGL v1.4
thread 'main' panicked at wayrs-egl/examples/triangle.rs:39:14:
called `Result::unwrap()` on an `Err` value: Egl(BadAttribute)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@MaxVerevkin
Copy link
Owner

Added more prints...

@madushan1000
Copy link
Author

yeah the context creation is the problem

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 ./triangle
DRM render node: /dev/dri/renderD128
Feedback tranche contains 15 formats, 11 of them supported by EGL
EGL v1.4
eglCreateContext failed
thread 'main' panicked at wayrs-egl/examples/triangle.rs:39:14:
called `Result::unwrap()` on an `Err` value: Egl(BadAttribute)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@MaxVerevkin
Copy link
Owner

Maybe EGL_CONTEXT_OPENGL_DEBUG is problematic, I've removed it temporarily.

@madushan1000
Copy link
Author

madushan1000 commented Mar 5, 2024

oh yeah, now it works! nice. Thanks :D

$ XDG_RUNTIME_DIR=/run/user/1000/ WAYLAND_DISPLAY=wayland-0 ./triangle
DRM render node: /dev/dri/renderD128
Feedback tranche contains 15 formats, 11 of them supported by EGL
EGL v1.4
OpenGL-ES v3.2
mod is invalid, using gbm_bo_create
mod is invalid, using gbm_bo_create

@MaxVerevkin
Copy link
Owner

Turns out EGL_CONTEXT_OPENGL_DEBUG was introduced only in 1.5. Fixed in #12.

@madushan1000
Copy link
Author

thank you for the fix :)

@madushan1000
Copy link
Author

madushan1000 commented Mar 9, 2024

@MaxVerevkin looks like I have to do a custom implementation of zwp_linux_buffer_params_v1@<>.add function, because I have to set correct stride and modifiers. I can't figure out how to do this for the auto generated functions. Any pointers?

ps: dma-buf metadata differs between virto-gpu driver and host gpu driver, so I have to do some ioctls and figure out the host values and use that in the above call.

@MaxVerevkin
Copy link
Owner

Not sure what your mean, the params::add function accepts the stride and modifiers as arguments.

@madushan1000
Copy link
Author

I want to have two implementations of params::add for params::add<D, UnixStream> and params::add<D, VirtgpuTransport>. So the client doesn't have to know about the implementation details of how to fix the buffer metadata.

@MaxVerevkin
Copy link
Owner

MaxVerevkin commented Mar 9, 2024

The code generated by wayrs-scanner must not have any hidden logic and must represent the protocol as defined in the xml file.

I don't know what exactly you mean by "fix the buffer metadata", I'm not even sure how you allocate buffers. A client using virtio will have to figure out the correct params, sure. And virtio seems to be niche enough to not be explicitly supported by the core project. If everything is generic enough, then a separate crate may implement all the virtio quirks.

@madushan1000
Copy link
Author

The problem I'm having now is when I'm using wayrs-egl the client doesn't directly call ZwpLinuxBufferParamsV1::add. It's called here. I can add a new fix_metadata function to the ClientTransport trait and call it from Buffer::alloc, is that a good solution?

the underlying technical issue is that when a program asks to allocate a dma-buf on virtio-gpu, it doesn't actually allocate anything. It just calls down to virglrenderer running on the host, and virgl allocates a dma-buf on host gpu. so guest metadata and host metadata about the same buffer can be different(ex: intel GPU drivers expect stride to be 16 byte aligned but virtio-gpu driver doesn't). I have to query the correct metadata from the host, and use it in ZwpLinuxBufferParamsV1::add. Otherwise, host compositor refuses to import the buffer.

@MaxVerevkin
Copy link
Owner

MaxVerevkin commented Mar 10, 2024

The problem is that wayrs-egl is very GBM-centric, and if I understand correctly, in your case, you do not use it. Maybe it should be generic over the allocator, GBM being the default.

@madushan1000
Copy link
Author

madushan1000 commented Mar 10, 2024

I think that's fine, allocations still go trough guest gbm, things happening underneath is pretty much hidden from the guest.

I also manged to get the client working by patching metadata checks in host mesa driver. So wayrs-egl is already working fine.

@MaxVerevkin
Copy link
Owner

Okay, I just don't understand how it works. Are there any docs?

@madushan1000
Copy link
Author

I couldn't find good docs on any of this unfortunately. there are some info about the virglrenderer here.

From what I understand, when a egl client on guest wants to allocate a dma-buf, it'll request the buffer via virgl mesa driver. mesa will request a buffer from virtio-gpu kernel driver. kernel driver will request a buffer from the hypervisor(crosvm or qemu). hypervisor will request a buffer form virglrenderer running on the host, and it will finally request a real buffer on the host gpu using host mesa driver.
When the client wants to render something onto this buffer, it'll do opengl calls, which will be serialized by virgl mesa driver and sent to the host virglrenderer, which will de-serialize the opengl calls and render stuff on the the allocated buffer. Because the buffer allocation request goes trough two mesa drivers, host and gust has two different "views" of the buffer.

@madushan1000
Copy link
Author

I got it to work. I'll update the draft pr with the changes later.
Screencast from 2024-03-10 13-31-11.webm

@MaxVerevkin
Copy link
Owner

Nice!

@madushan1000
Copy link
Author

I've hit another snag :/ so far I've been using an ubuntu VM to test the virtio suff, now I have to start developing it on android which is where I intended to use it.
It looks like mesa on android doesn't have EGL_KHR_platform_gbm extension. Is it required that I use wayrs-egl(or whatever egl) to use dam-buf with wayland? I don't understand this aspect of wayland very well.

@MaxVerevkin
Copy link
Owner

It looks like mesa on android doesn't have EGL_KHR_platform_gbm extension.

That's bad but it is not critical. AFAIK EGL_EXT_platform_device can be used instead, but it is more annoying. Does it have it? And what about the other extensions?

Is it required that I use wayrs-egl(or whatever egl) to use dam-buf with wayland?

No, wayrs-egl just glues GBM, EGL and linux-dmabuf-feedback-v1 protocol together:

  • You need GBM to get a buffer on linux. There may be other allocators. BTW, android has it, right?
  • EGL is a cross platform API to create an OpenGL context, to render to a previously allocated buffer. Again, there may be others.
  • You need the dmabuf feedback protocol to create a wl_buffer. This is required to tell the compositor that you have a GPU buffer.

@madushan1000
Copy link
Author

My compositor would just pass the buffers rendered by android to the host compositor and hopefully don't have to do any allocations. Android will do all the buffer allocations. Maybe I can get away with just using linux-dmabuf-feedback-v1 protocol and not using GBM and EGL at all.

About GBM on android, they have a system service to allocate graphics buffers called android.hardware.graphics.allocator and there is also a graphics mapper service. I'm not very clear on how they're supposed to work. Need to read more.

Looks like bellow are the available extensions on android. EGL_KHR_platform_gbm, EGL_EXT_image_dma_buf_import_modifiers, and GL_OES_EGL_image are missing.

EGL extensions string:
    EGL_ANDROID_front_buffer_auto_refresh
    EGL_ANDROID_get_native_client_buffer EGL_ANDROID_presentation_time
    EGL_EXT_surface_CTA861_3_metadata EGL_EXT_surface_SMPTE2086_metadata
    EGL_KHR_get_all_proc_addresses EGL_KHR_swap_buffers_with_damage
    EGL_ANDROID_get_frame_timestamps EGL_ANDROID_image_native_buffer
    EGL_ANDROID_native_fence_sync EGL_ANDROID_recordable
    EGL_EXT_buffer_age EGL_EXT_pixel_format_float EGL_KHR_config_attribs
    EGL_KHR_create_context EGL_KHR_create_context_no_error
    EGL_KHR_fence_sync EGL_KHR_gl_colorspace
    EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image
    EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image
    EGL_KHR_image EGL_KHR_image_base EGL_KHR_no_config_context
    EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_KHR_wait_sync

Default display:
EGL API version: 1.4
EGL vendor string: Android
EGL version string: 1.4 Android META-EGL
EGL client APIs: OpenGL_ES
EGL extensions string:
    EGL_ANDROID_front_buffer_auto_refresh
    EGL_ANDROID_get_native_client_buffer EGL_ANDROID_presentation_time
    EGL_EXT_surface_CTA861_3_metadata EGL_EXT_surface_SMPTE2086_metadata
    EGL_KHR_get_all_proc_addresses EGL_KHR_swap_buffers_with_damage
    EGL_ANDROID_get_frame_timestamps EGL_ANDROID_image_native_buffer
    EGL_ANDROID_native_fence_sync EGL_ANDROID_recordable
    EGL_EXT_buffer_age EGL_EXT_pixel_format_float EGL_KHR_config_attribs
    EGL_KHR_create_context EGL_KHR_create_context_no_error
    EGL_KHR_fence_sync EGL_KHR_gl_colorspace
    EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image
    EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image
    EGL_KHR_image EGL_KHR_image_base EGL_KHR_no_config_context
    EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_KHR_wait_sync

@MaxVerevkin
Copy link
Owner

It doesn't support any EGL_*_platform_*...

EGL_KHR_platform_gbm, EGL_EXT_image_dma_buf_import_modifiers, and GL_OES_EGL_image are missing.

That's bad.

Regarding GBM, is this related?

How do you get buffers from the android compositor? As dmabufs with known params? If so, and if you don't want to render anything additional (postprocessing?), sure, you can skip gbm/egl thing and just forward the buffers.

@madushan1000
Copy link
Author

Regarding GBM, is this related?

yes, android supports both minigbm and mesa libgbm. But clients access it via the system graphics allocator service.

Composer clients sends buffers using this interface, it uses a NativeHandle I think that's a file descriptor with some additional metadata(need to read some android platform code to figure out what).

I'll try the pass trough proxy approach first. I might have to do some post processing later(mutter doesn't support the graphics buffer format android uses properly AFAIK, so colors are off). I'll figure it out later.

@madushan1000
Copy link
Author

madushan1000 commented Mar 14, 2024

I tried a couple of ways to get async working, but I couldn't. The gpu fd doesn't seem to support epoll well. for read, it works fine. But for writes, it just waits forever. I guess because you can just write to a gpu driver fd whenever you want. so waiting for write availability is sort of meaningless.

Cross domain wayland channel is a GEM object, you can convert it to a fd by using an ioctl. So I tried using two fds for read(gpu fd), and write(cross domain channel fd). But this caused 100% cpu usage.

I think there is a "proper" way to do async stuff in gpu workloads using gpu fences. But I don't understand the documentation on this at all. So I think I'll just keep using the sync wayland client for now.

@MaxVerevkin
Copy link
Owner

If writing never blocks, then there is no problem at all. async_flush just needs to be modified to first try self.flush(IoMode::NonBlocking) and only in case of WouldBlock await the fd.

I'm pretty sure gpu fences have nothing to do with this.

@MaxVerevkin
Copy link
Owner

Actually this seems like a nice optimization even for unixstream, so I did that in 8b078fe.

@madushan1000
Copy link
Author

Interesting, I will give it another try.

So the trasnport channels virtio-gpu uses to send/receive messages from host wayland are GEM objects. Looks like I can use gpu fences to synchronize access to these in an async way just like with actual graphics buffers. But I probably won't need this.

@madushan1000
Copy link
Author

I tried with your latest commits to main, and async works okay now. It still uses three time the cpu as unix sockets implementation (~12% vs %4 cpu). But I think that's probably some other overhead. Nice!

@madushan1000
Copy link
Author

I created a separate crate for this transport https://github.com/madushan1000/wayrs-client-transport-virtgpu,
the code is not pretty and pipes(for copy pasting will not work). I'll clean things up in the future and add missing functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants