How to disable TOKENIZERS_PARALLELISM=(true | false) warning?
Solution 1
Set the environment variable to the string "false", either in your shell:
export TOKENIZERS_PARALLELISM=false
(or prefix the command, e.g. TOKENIZERS_PARALLELISM=false python train.py)
or in the Python script itself:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
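The variable should be set as early as possible, before the tokenizer is first imported or used, since the fast tokenizers read it when they decide whether to spin up their thread pool. A minimal sketch (the transformers import is shown commented out as a placeholder, not part of this answer):

```python
import os

# Set the flag as early as possible -- before any tokenizer is
# imported or used -- so the tokenizers library sees it.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Imports that pull in the tokenizers library come afterwards, e.g.:
# from transformers import AutoTokenizer

print(os.environ["TOKENIZERS_PARALLELISM"])  # prints "false"
```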
Solution 2
I'm going to leave this comment here to help anyone wondering whether it is possible to keep the parallelism and save valuable time during training, and also because this is the first Stack Overflow page that comes up when searching the error directly on Google.
According to this comment on GitHub, the FastTokenizers seem to be the issue. Also, according to another comment on gitmemory, you shouldn't use the tokenizer before forking the process (which basically means before iterating through your dataloader).
So the solution is to either not use FastTokenizers before training/fine-tuning, or to use the normal (slow) Tokenizers instead.
Check the Hugging Face documentation to find out whether you really need the FastTokenizer.
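The "don't use the tokenizer before forking" advice can be sketched with the standard library alone; the commented lines mark where real tokenizer calls would go (names such as AutoTokenizer are the usual transformers API and are an assumption here, not taken from this answer):

```python
import multiprocessing as mp

def encode(text):
    # In real code the tokenizer would be created and used here, inside
    # the worker, i.e. only after the fork -- for example (hypothetical):
    #   tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    #   return tokenizer(text)["input_ids"]
    # A stand-in so this sketch is self-contained:
    return [len(word) for word in text.split()]

if __name__ == "__main__":
    # No tokenizer has been touched in the parent process yet, so
    # forking worker processes here does not trigger the warning.
    with mp.Pool(2) as pool:
        ids = pool.map(encode, ["hello world", "foo bar baz"])
    print(ids)  # [[5, 5], [3, 3, 3]]
```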
Solution 3
I solved this problem by downgrading Hugging Face's transformers library from version 3.0.0 to 2.11.0, and the tokenizers library from version 0.8.0rc4 to 0.7.0.
It seems to be a problem with version 0.8.0rc4 of Hugging Face's tokenizers library. Currently, there seems to be no way to fix it by setting TOKENIZERS_PARALLELISM=(true | false) as the error message suggests.
Reference: https://github.com/ThilinaRajapakse/simpletransformers/issues/515
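If you go the downgrade route, the pins from this answer could be captured in a requirements.txt fragment (the version numbers are the ones the answer reports; they are quite old by now, so treat this as a last resort):

```text
transformers==2.11.0
tokenizers==0.7.0
```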
snowzjy
Updated on June 19, 2022
Comments
-
snowzjy almost 2 years
I use PyTorch to train a huggingface-transformers model, but every epoch it always outputs the warning:
The current process just got forked. Disabling parallelism to avoid deadlocks... To disable this warning, please explicitly set TOKENIZERS_PARALLELISM=(true | false)
How to disable this warning?
-
Ritwik almost 3 years
So does this warning message mean that the training/fine-tuning is not happening in a parallel manner?
-
mingaflo almost 3 years
According to my experience, yes.
-
Ritwik over 2 years
Not according to my experience. I ran two experiments: (a) one with this warning message, (b) another without it. I just saved my dataloader from (a) and simply loaded it using torch.save() and torch.load(). Both experiments finished in approximately the same time (1 hour per epoch, for 3 epochs).
-
alelom over 2 years
Worked for me. Still, it is worth having a look at this answer, which points out that using Fast Tokenizers may be the source of this, and that you may need to be wary of any consequences of using them.
-
Alaa M. almost 2 years
Could you give an example of how to use the FastTokenizers after training, and an example using a "normal" Tokenizer?
-
mingaflo almost 2 years
Why do you want to use FastTokenizers after training? You should use them during training/inference. The docs tell you how to use "normal" Tokenizers.