How to disable TOKENIZERS_PARALLELISM=(true | false) warning?


Solution 1

Set the environment variable to the string "false", either by

TOKENIZERS_PARALLELISM=false

in your shell, or by

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

in your Python script.
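For example, a minimal sketch of the second approach (the AutoTokenizer import and the checkpoint name are illustrative additions, not part of the original answer); setting the variable before the tokenizer is first used is the safe way to make sure it takes effect:

import os

# Set this before the tokenizer is first used so the setting is picked up
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
encoded = tokenizer("Hello world", return_tensors="pt")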

Solution 2

I'm going to leave this comment here to help anyone wondering whether it is possible to keep the parallelism and save valuable time during training, and also because this is the first Stack Overflow page that comes up when searching for the error on Google.

According to this comment on GitHub, the FastTokenizers seem to be the issue. Also, according to another comment on gitmemory, you shouldn't use the tokenizer before forking the process (which basically means before iterating through your dataloader).

So the solution is either to not use the FastTokenizers before training/fine-tuning, or to use the normal ("slow") Tokenizers.

Check the Hugging Face documentation to find out whether you really need the FastTokenizer.
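As a sketch of the first option (keeping the fast tokenizer and its parallelism), you can defer tokenization to the Dataset's __getitem__ so the tokenizer is only used inside the already-forked DataLoader workers. The class name, checkpoint, and parameters below are illustrative assumptions, not taken from the linked comments:

from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

class LazyTokenizedDataset(Dataset):
    """Illustrative dataset that defers tokenization to __getitem__."""

    def __init__(self, texts):
        self.texts = texts
        # Creating the tokenizer here is fine; the warning is triggered by
        # *using* it (encoding text) in the parent process before the fork.
        self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Encoding happens inside the worker process, i.e. after the fork
        enc = self.tokenizer(self.texts[idx], truncation=True,
                             padding="max_length", max_length=128,
                             return_tensors="pt")
        return {k: v.squeeze(0) for k, v in enc.items()}

loader = DataLoader(LazyTokenizedDataset(["some text", "more text"]),
                    batch_size=2, num_workers=2)

for batch in loader:
    # batch["input_ids"] has shape [batch_size, 128]
    pass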

Solution 3

I solved this problem by downgrading Hugging Face's transformers library from version 3.0.0 to 2.11.0, and the tokenizers library from version 0.8.0rc4 to 0.7.0.
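For reference, one way to pin those versions with pip (this exact command is my addition, not part of the original answer):

pip install transformers==2.11.0 tokenizers==0.7.0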

It seems to be a problem with version 0.8.0rc4 of Hugging Face's tokenizers library. Currently, there seems to be no way to set TOKENIZERS_PARALLELISM=(true | false) as the error message suggests.

Reference: https://github.com/ThilinaRajapakse/simpletransformers/issues/515


Comments

  • snowzjy
    snowzjy almost 2 years

    I use PyTorch to train a huggingface-transformers model, but every epoch it always outputs the warning:

    The current process just got forked. Disabling parallelism to avoid deadlocks... To disable this warning, please explicitly set TOKENIZERS_PARALLELISM=(true | false)
    

    How can I disable this warning?

  • Ritwik
    Ritwik almost 3 years
    So does this warning message mean that the training/fine-tuning is not happening in a parallel manner?
  • mingaflo
    mingaflo almost 3 years
    According to my experience, yes
  • Ritwik
    Ritwik over 2 years
    Not according to my experience. I ran two experiments: (a) one with this warning message, (b) another without it. I just saved the dataloader from (a) with torch.save() and simply loaded it with torch.load(). Both experiments finished in approximately the same time (1 hour per epoch, for 3 epochs).
  • alelom
    alelom over 2 years
    Worked for me. Still, worth having a look at this answer that points out that using Fast Tokenizers may be the source of this, and that you may need to be wary of any consequences of using them.
  • Alaa M.
    Alaa M. almost 2 years
    Could you give an example of how to use the FastTokenizers after training, and an example using a "normal" Tokenizer?
  • mingaflo
    mingaflo almost 2 years
    Why do you want to use FastTokenizers after training? You should use them during training/inference. The docs tell you how to use "normal" Tokenizers.
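For reference, a minimal sketch contrasting the fast and the "normal" (slow) tokenizer, assuming the transformers AutoTokenizer API and an illustrative checkpoint name:

from transformers import AutoTokenizer

# Fast (Rust-backed) tokenizer -- the default in recent transformers versions
fast_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

# "Normal" (pure-Python, slow) tokenizer -- should avoid the parallelism warning entirely
slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)

# Both expose the same call interface for encoding text
print(fast_tokenizer("Hello world")["input_ids"])
print(slow_tokenizer("Hello world")["input_ids"])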