
inference speed is slow #6

Open
tlxnulixuexi opened this issue Dec 17, 2023 · 7 comments

Comments

@tlxnulixuexi

Hello @tersekmatija,
I'm sorry to bother you; I have a few questions I'd like to ask.
When I train the eWaSR model on the LaRS dataset, training is very fast, but inference is very slow. Compared with WaSR, the inference speed shows no big advantage, which is far from the roughly tenfold speedup reported in your paper. What could be the reason?

@tersekmatija
Owner

Hey @tlxnulixuexi ,
What script do you use for prediction? Training on a different dataset should not have a noticeable impact on the inference speed, since the only thing that should change are the model's weights.

@tlxnulixuexi
Author

Could you help me check whether there is a problem with the code below? I am using the predict.py file in eWaSR, with code added to calculate FPS, and running inference on the GPU, but the measured result is only 22.7 FPS on average. Thanks.

```python
import time

...

def predict(args):
    if args.dataset == "mods":
        dataset = MODSDataset(args.dataset_config, normalize_t=PytorchHubNormalization())
    else:
        dataset = MaSTr1325Dataset(args.dataset_config, normalize_t=PytorchHubNormalization(), include_original=True)

    dl = DataLoader(dataset, batch_size=args.batch_size, num_workers=1)

    # Prepare model
    model = models.get_model(args.model, num_classes=args.num_classes, pretrained=False,
                             mixer=args.mixer, enricher=args.enricher, project=args.project)
    state_dict = load_weights(args.weights)
    model.load_state_dict(state_dict)
    predictor = Predictor(model, args.fp16)

    output_dir = Path(args.output_dir)
    if not output_dir.exists():
        output_dir.mkdir(parents=True)

    start_time = time.time()
    processed_images = 0

    for features, labels in tqdm(iter(dl), total=len(dl)):
        pred_masks = predictor.predict_batch(features)

        for i, pred_mask in enumerate(pred_masks):
            pred_mask = SEGMENTATION_COLORS[pred_mask]
            orig_img = features["image_original"][i].numpy()
            pred_mask = np.transpose(orig_img, (1, 2, 0)) * 0.7 + pred_mask * 0.3
            pred_mask = pred_mask.astype(np.uint8)
            mask_img = Image.fromarray(pred_mask)

            out_file = output_dir / labels["mask_filename"][i]
            mask_img.save(out_file)

            processed_images += 1

    end_time = time.time()
    elapsed_time = end_time - start_time
    fps = processed_images / elapsed_time
    print(f"Processed {processed_images} images in {elapsed_time:.2f} seconds. FPS: {fps:.2f}")

...
```

@tersekmatija
Owner

Hey @tlxnulixuexi ,

Sorry for the late reply. The way you measure FPS is not correct: you need to call torch.cuda.synchronize(), which waits for all CUDA streams to complete before the time measurement is taken. More insights into why here.
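For illustration, here is a minimal self-contained sketch of that synchronized timing pattern. The Conv2d layer, input size, and iteration counts are placeholders, not the repo's actual benchmark; the same bracketing applies to any model:

```python
import time
import torch

# Placeholder model and input; any module works the same way.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 8, 3).to(device).eval()
x = torch.randn(1, 3, 64, 64, device=device)

with torch.no_grad():
    for _ in range(5):          # warm-up, so one-time initialization is excluded
        model(x)

if device == "cuda":
    torch.cuda.synchronize()    # drain queued kernels before starting the clock
start = time.perf_counter()
with torch.no_grad():
    for _ in range(20):
        model(x)
if device == "cuda":
    torch.cuda.synchronize()    # wait for the last kernel before stopping it
fps = 20 / (time.perf_counter() - start)
print(f"{fps:.2f} FPS")
```

Without the second synchronize, the clock stops while kernels are still running and the FPS number measures launch overhead, not inference.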

I pushed the slightly modified benchmarking code used for the paper to tools/benchmark.py. After installing the dev requirements with `pip install -r requirements-dev.txt`, you can run it with `python3 tools/benchmark.py -d GPU -niter 300`.

The benchmark should produce lines with latency and FPS like:

```
eWaSR                          ----- 008.737 [009.06, 001.57] ms latency ----- 114.45 FPS
WaSR                           ----- 089.036 [089.74, 002.47] ms latency ----- 011.23 FPS
```

It will also visualize the density of the latency measurements of each prediction.
[Attached plot: latency density per model (models_full_GPU)]

Let me know if this works for you.

@tlxnulixuexi
Author

Hello @tersekmatija,
Thank you very much for your reply. I will run it again using the method you provided. Thank you again for your advice.

@tersekmatija
Owner

Thanks @tlxnulixuexi ,
Feel free to close the issue if you can replicate the results.

Best,
Matija

@tlxnulixuexi
Author

Hello Matija Teršek,
Sorry to bother you again. I have just entered this field, so my foundation is relatively weak and there is a lot I don't understand well yet. Regarding the benchmark.py file you uploaded to eWaSR, a few things are unclear to me:

1. How can I use weight files trained on other datasets in benchmark.py to run inference and calculate FPS?
2. You use a single image file for inference in benchmark.py. If I modify the code to use an image folder, will it affect other parts?
3. From which point should the FPS measurement start, and why did my way of calculating FPS give such a bad result?

I would be very grateful if you could answer these questions.

@tersekmatija
Owner

> How can I use weight files trained on other datasets in benchmark.py to run inference and calculate FPS?

The weights shouldn't affect the speed. If you want, you can load your weights onto the models here.
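As a minimal sketch of that weight-loading step: the Linear layer and file name below are placeholders standing in for the repo's model and your trained checkpoint, following the torch.save/load_state_dict pattern already used in predict.py above:

```python
import torch

# Placeholder model; in the repo this would be the model built for benchmarking.
model = torch.nn.Linear(4, 2)

# Stand-in for your own trained checkpoint saved with torch.save(state_dict).
torch.save(model.state_dict(), "my_weights.pth")

# Load the checkpoint and swap the weights into the model before benchmarking.
state_dict = torch.load("my_weights.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # weights change the outputs, not the latency
```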

> You use a single image file for inference in benchmark.py. If I modify the code to use an image folder, will it affect other parts?

It depends on the loading/pre-processing speed, but it should not.

> From which point should the FPS measurement start, and why did my way of calculating FPS give such a bad result?

A short article that explains it: https://www.speechmatics.com/company/articles-and-news/timing-operations-in-pytorch.
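To summarize the point for this thread: the clock should bracket only the forward pass (with a synchronize on GPU), leaving data loading, colorizing, and mask saving outside the timed region. A sketch with placeholder model and inputs:

```python
import time
import torch

# Placeholder model and pre-made batches; in predict.py these come from the DataLoader.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 8, 3).to(device).eval()
batches = [torch.randn(1, 3, 64, 64, device=device) for _ in range(10)]

infer_time = 0.0
for x in batches:
    if device == "cuda":
        torch.cuda.synchronize()        # start the clock on an idle device
    t0 = time.perf_counter()
    with torch.no_grad():
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()        # make sure the forward pass has finished
    infer_time += time.perf_counter() - t0
    # colorizing and saving the mask would happen here, outside the timed span

fps = len(batches) / infer_time
print(f"pure-inference FPS: {fps:.2f}")
```

Timing the whole loop (including Image.fromarray and mask_img.save, as in the snippet earlier in this thread) measures disk I/O as much as the model, which is one reason the measured 22.7 FPS was so low.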
