From my point of view, using the Python interface lets us insert cudaProfilerStart() and cudaProfilerStop() to profile our program more precisely. If we use trtexec, SuperBench starts another thread to execute it, so nvprof cannot correctly profile the real command; moreover, profiling trtexec directly would capture both the compilation stage and the runtime stage, while in most cases we only need the latter.
TensorRT Python interface example:
```python
import tensorrt as trt
import common
import time
import pycuda.driver as cuda
import torch
import os

TRT_LOGGER = trt.Logger()

def inference(context, test_data):
    inputs, outputs, bindings, stream = common.allocate_buffers(context.engine)
    result = []
    inputs[0].host = test_data
    _, elapsed_time = common.do_inference_v2(
        context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
    return result, elapsed_time

# This function builds an engine from an ONNX model.
def build_engine(model_file, batch_size=32):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(common.EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser, \
         builder.create_builder_config() as trt_config:
        # Note: max_batch_size should be set to 1 because of the
        # implementation of allocate_buffers.
        builder.max_batch_size = 1
        # builder.max_workspace_size = common.GiB(1)
        trt_config.max_workspace_size = common.GiB(4)
        # Parse the ONNX model.
        with open(model_file, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        # This design may not be correct if there is more than one output.
        """
        for i in range(network.num_layers):
            layer = network.get_layer(i)
            layer.precision = trt.int8
            layer.set_output_type(0, trt.int8)
        """
        # network.mark_output(model_tensors.find(ModelData.OUTPUT_NAME))
        # Build engine and do int8 calibration.
        # engine = builder.build_cuda_engine(network)
        engine = builder.build_engine(network, trt_config)
        return engine

onnx_path = "/workspace/v-leiwang3/benchmark/nnfusion_models/resnet50.float32.1.onnx"
dummy_input = torch.rand(1, 3, 224, 224).numpy()

engine = build_engine(onnx_path)
context = engine.create_execution_context()

# warmup
for i in range(5):
    _, time = inference(context, dummy_input)

# iteration
time_set = []
for i in range(100):
    _, time = inference(context, dummy_input)
    time_set.append(time)

print(f'average time: {sum(time_set)/len(time_set)*1000} ms')
```
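To limit nvprof to the measurement loop only, the profiler calls can be wrapped in a small context manager. This is a minimal sketch, not part of the script above: `profiler_window` is a hypothetical helper, and on a real GPU box the `start`/`stop` hooks would be `pycuda.driver.start_profiler()` / `pycuda.driver.stop_profiler()` (pycuda's wrappers around cudaProfilerStart/cudaProfilerStop, assuming a pycuda version that exposes them). Passing the hooks as parameters keeps the pattern visible without requiring CUDA.

```python
from contextlib import contextmanager

@contextmanager
def profiler_window(start=None, stop=None):
    """Run the enclosed block inside an explicit profiling window.

    start/stop are callables, e.g. pycuda.driver.start_profiler and
    pycuda.driver.stop_profiler on a machine with CUDA installed.
    """
    if start is not None:
        start()              # e.g. cudaProfilerStart() via pycuda
    try:
        yield
    finally:
        if stop is not None:
            stop()           # e.g. cudaProfilerStop() via pycuda
```

The iteration loop would then go inside `with profiler_window(cuda.start_profiler, cuda.stop_profiler): ...`, and nvprof would be launched with `--profile-from-start off` so that only the region between the two calls (i.e. the runtime, not the engine build) is captured.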