Question about the InternVL2 C++ code #52

w8501 · 2024-10-29T01:05:22Z

void InternVL2::vit_launch(std::vector<float> &pixel_values, std::vector<int> &img_offset) {
  auto &out_mem = net_embed->stages[0].output_mems[0];
  auto &vit_in_mem = net_vit->stages[0].input_mems[0];
  auto &vit_out_mem = net_vit->stages[0].output_mems[0];
  for (size_t i = 0; i < img_offset.size(); i++) {
    int vit_in_size = bm_mem_get_device_size(vit_in_mem);
    assert(vit_in_size);
    
    int offset = i * pixel_values.size() / img_offset.size(); // assuming IMAGE_BYTES is the flattened size of each image
    bm_memcpy_s2d(bm_handle, vit_in_mem, (void *)(pixel_values.data() + offset));
    
    net_launch(net_vit);

    int vit_out_size = bm_mem_get_device_size(vit_out_mem);
    int dst_offset = img_offset[i] * HIDDEN_SIZE * 2;
    bm_memcpy_d2d_byte(bm_handle, out_mem, dst_offset, vit_out_mem, 0, vit_out_size);
  }
}

Why is dst_offset = img_offset[i] * HIDDEN_SIZE * 2 here, rather than img_offset[i] * HIDDEN_SIZE? What is the purpose of multiplying by 2?
Earlier, in the Python code in the sophon-demo-release repository, I saw that the offset is img_offset[i] * HIDDEN_SIZE:
self.tensors[self.name_embed]["output"][0].sync_d2d(self.vit_output, 0, int(img_offset * self.HIDDEN_SIZE), np.prod(self.vit_output.shape()))

w8501 · 2024-10-29T01:56:55Z

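Is it because the output type is BFLOAT16? I have one more question: the sample code now accepts multi-image input, but the official model was quantized with the token count fixed at 512, so it shouldn't be able to handle multiple images, right? With multiple images, wouldn't the memory access go out of bounds?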

chuxiaoyi2023 · 2024-11-12T13:49:37Z

That's right: a bfloat16 value takes two bytes, hence the x2.

Multi-image input is now supported~
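
The arithmetic behind the x2 can be sketched as below. This is only an illustration, not code from the repository: `token_byte_offset` and `copy_fits` are hypothetical helpers, and `hidden_size` stands in for HIDDEN_SIZE. It assumes that `bm_memcpy_d2d_byte` takes its offsets in bytes (as the `_byte` suffix and the answer above suggest), while the Python `sync_d2d` call presumably counts in elements, which would explain why the factor is absent there. The second helper shows one possible way to guard against the out-of-bounds copy raised in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// bfloat16 is a 16-bit format, so one element occupies two bytes.
constexpr std::size_t kBytesPerBf16 = sizeof(std::uint16_t);

// Hypothetical helper: convert a token index into a byte offset inside a
// [num_tokens x hidden_size] bfloat16 buffer.
std::size_t token_byte_offset(std::size_t token_index, std::size_t hidden_size) {
  assert(hidden_size > 0);
  return token_index * hidden_size * kBytesPerBf16;
}

// Hypothetical guard: true if copying `copy_bytes` bytes starting at the
// token's byte offset stays inside a destination buffer of `dst_bytes` bytes.
bool copy_fits(std::size_t token_index, std::size_t hidden_size,
               std::size_t copy_bytes, std::size_t dst_bytes) {
  const std::size_t start = token_byte_offset(token_index, hidden_size);
  return start <= dst_bytes && copy_bytes <= dst_bytes - start;
}
```

In `vit_launch`, a check of this kind would sit just before the device-to-device copy, with `dst_bytes` taken from `bm_mem_get_device_size(out_mem)` and `copy_bytes` from `vit_out_size`.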
