Crash in Bootleg during request #174

Open
gcampax opened this issue Jul 14, 2021 · 6 comments
Labels
bug (Something isn't working) · P1 (We're working on it right now) · server (Issues with serving and dynamic inference-time)

Comments

@gcampax
Contributor

gcampax commented Jul 14, 2021

The following command reliably triggers a crash in Bootleg with the currently deployed model on staging: "find me a movie with chris pratt".

[E 210714 17:56:44 web:1789] Uncaught exception POST /v1/models/x40org-thingpedia-models-defaultx2fen:predict (127.0.0.1)
    HTTPServerRequest(protocol='http', host='x40org-thingpedia-models-defaultx2fen-predictor-default.staging.svc.cluster.local', method='POST', uri='/v1/models/x40org-thingpedia-models-defaultx2fen:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/usr/local/lib64/python3.8/site-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/usr/local/lib/python3.8/site-packages/kfserving/handlers/http.py", line 79, in post
        response = (await model.predict(request)) if inspect.iscoroutinefunction(model.predict) else model.predict(request)
      File "/opt/genienlp/genienlp/kfserver.py", line 55, in predict
        results = self.server.handle_request(request)
      File "/opt/genienlp/genienlp/server.py", line 142, in handle_request
        output = generate_with_model(
      File "/opt/genienlp/genienlp/validate.py", line 60, in generate_with_model
        return generate_with_seq2seq_model(
      File "/opt/genienlp/genienlp/validate.py", line 124, in generate_with_seq2seq_model
        generated = model.generate(
      File "/opt/genienlp/genienlp/models/transformer_seq2seq.py", line 175, in generate
        generated = self.model.generate(
      File "/usr/local/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/generation_utils.py", line 970, in generate
        return self.greedy_search(
      File "/usr/local/lib/python3.8/site-packages/transformers/generation_utils.py", line 1327, in greedy_search
        if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
    RuntimeError: CUDA error: device-side assert triggered
[E 210714 17:56:44 web:2239] 500 POST /v1/models/x40org-thingpedia-models-defaultx2fen:predict (127.0.0.1) 395.41ms
[E 210714 17:57:02 web:1789] Uncaught exception POST /v1/models/x40org-thingpedia-models-defaultx2fen:predict (127.0.0.1)
    HTTPServerRequest(protocol='http', host='x40org-thingpedia-models-defaultx2fen-predictor-default.staging.svc.cluster.local', method='POST', uri='/v1/models/x40org-thingpedia-models-defaultx2fen:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/usr/local/lib64/python3.8/site-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/usr/local/lib/python3.8/site-packages/kfserving/handlers/http.py", line 79, in post
        response = (await model.predict(request)) if inspect.iscoroutinefunction(model.predict) else model.predict(request)
      File "/opt/genienlp/genienlp/kfserver.py", line 55, in predict
        results = self.server.handle_request(request)
      File "/opt/genienlp/genienlp/server.py", line 109, in handle_request
        extract_features_with_annotator(examples, self.bootleg_annotator, self.args, task)
      File "/opt/genienlp/genienlp/data_utils/bootleg.py", line 96, in extract_features_with_annotator
        bootleg_labels = bootleg_annotator.label_mentions(bootleg_inputs)
      File "/usr/local/lib/python3.8/site-packages/bootleg/end2end/bootleg_annotator.py", line 551, in label_mentions
        batch_example_aliases_locs_start = torch.tensor(
    RuntimeError: CUDA error: device-side assert triggered
@gcampax gcampax added the bug (Something isn't working) label Jul 14, 2021
@gcampax
Contributor Author

gcampax commented Jul 14, 2021

Actually, after the first crash, any command now causes a crash. I assume this is because the CUDA error was not recovered from correctly.

@gcampax
Contributor Author

gcampax commented Jul 14, 2021

Yeah, the error doesn't seem to be Bootleg-related. There is this warning, though:

Token indices sequence length is longer than the specified maximum sequence length for this model (1461 > 1024). Running this sequence through the model will result in indexing errors

What's going on here?

@gcampax
Contributor Author

gcampax commented Jul 14, 2021

This is quite interesting: we pass truncation=True when we call Tokenizer.batch_encode_plus, which should truncate the sequence to the model's maximum length (1024). Why doesn't that happen?
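
For reference, a minimal sketch of the kind of call in question (the model name and exact arguments are assumptions for illustration, not the actual genienlp call site):

    # Sketch only: assumes a Hugging Face tokenizer whose model_max_length is 1024
    # (e.g. BART); this is not the actual genienlp code path.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")

    long_input = "find me a movie with chris pratt " * 300  # well over 1024 tokens

    encoded = tokenizer.batch_encode_plus(
        [long_input],
        truncation=True,                        # expected to cap the sequence
        max_length=tokenizer.model_max_length,  # at the model maximum (1024)
    )
    print(len(encoded["input_ids"][0]))  # expected to print at most 1024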

@Mehrad0711
Member

Mehrad0711 commented Jul 14, 2021

Truncation is used only for the token classification task (where input words and labels need to be aligned), but not for the general encoding, which happens in the encode_batch method.
I think we should raise an error if any input exceeds the model's maximum length instead of truncating. This forces the user to inspect their input and make sure it's not a dataset bug (a missing end of line, etc.). If their task truly needs to handle long sequences, e.g. document classification or QA with long history, they can add a new task with task-specific preprocessing (similar to what I did for the ambigqa task).
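
A minimal sketch of the kind of check I have in mind (the helper name and message are illustrative, not actual genienlp code):

    # Illustrative only: raise instead of silently truncating over-length inputs.
    def check_input_length(token_ids, model_max_length):
        if len(token_ids) > model_max_length:
            raise ValueError(
                f"Input is {len(token_ids)} tokens, exceeding the model maximum of "
                f"{model_max_length}. Inspect the input for dataset bugs (e.g. a missing "
                f"end of line), or add a task with preprocessing for long inputs."
            )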

@Mehrad0711
Member

Mehrad0711 commented Jul 14, 2021

Alternatively, we can make truncation optional and add a flag for it, so the user can decide what to do. That said, I prefer the first approach, to avoid silent bugs.
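
For concreteness, such a flag could look roughly like this (the flag name is hypothetical; genienlp's actual argument parsing may differ):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--truncate_long_inputs",
        action="store_true",
        help="Truncate inputs longer than the model maximum length instead of raising an error.",
    )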

@gcampax
Contributor Author

gcampax commented Jul 14, 2021

I agree that needing truncation indicates a bug, but the current failure mode takes down the whole server until it is manually restarted. Raising an error is fine if we catch it in the server code and report it to the API caller correctly (not as a 500 error). Otherwise, logging a warning and truncating is better than nothing.
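
Roughly what I have in mind, as a sketch (the class, method, and error handling are illustrative, not the actual genienlp server code):

    # Illustrative sketch: catch the over-length error in the request handler and
    # return a structured client error instead of letting an uncaught exception
    # surface as a 500 and take down the worker.
    class RequestHandler:
        def __init__(self, generate_fn):
            self.generate_fn = generate_fn

        def handle_request(self, request):
            try:
                return {"predictions": self.generate_fn(request)}
            except ValueError as err:  # e.g. the over-length input error above
                return {"error": str(err), "code": 400}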

@gcampax gcampax added the server (Issues with serving and dynamic inference-time) label Jul 15, 2021
@nrser nrser added the P1 (We're working on it right now) label Aug 4, 2021