How to store the best model checkpoints, not only the newest 5, in the TensorFlow Object Detection API?


Solution 1

You can change the arguments passed to tf.train.Saver in the legacy trainer, either by hardcoding them in your fork or by opening a pull request that adds the options to the protos:

https://github.com/tensorflow/models/blob/master/research/object_detection/legacy/trainer.py#L376-L377

You will probably want to set:

  • max_to_keep: Maximum number of recent checkpoints to keep. Defaults to 5.
  • keep_checkpoint_every_n_hours: How often to keep checkpoints. Defaults to 10,000 hours.
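To see how these two settings interact, here is a toy, pure-Python model of the retention rule as the tf.train.Saver documentation describes it. The max_to_keep newest checkpoints are kept. A checkpoint that ages out of that window is preserved anyway if at least keep_checkpoint_every_n_hours have passed since the last preserved one. This is an approximation written for illustration (the function surviving_checkpoints is hypothetical), not TensorFlow's own code.

```python
from typing import List


def surviving_checkpoints(save_times_h: List[float],
                          max_to_keep: int,
                          keep_every_n_hours: float) -> List[float]:
    """Toy model of which checkpoints stay on disk (not TensorFlow code).

    save_times_h: training time in hours at which each checkpoint is saved,
    in increasing order. Returns the save times of checkpoints not deleted.
    """
    recent: List[float] = []        # sliding window of the newest checkpoints
    preserved: List[float] = []     # checkpoints kept permanently
    next_keep = keep_every_n_hours  # earliest time the next one may be preserved
    for t in save_times_h:
        recent.append(t)
        # Once the window overflows, the oldest checkpoint ages out.
        while len(recent) > max_to_keep:
            oldest = recent.pop(0)
            if oldest >= next_keep:
                # Enough time has passed: keep it instead of deleting it.
                preserved.append(oldest)
                next_keep += keep_every_n_hours
            # Otherwise the aged-out checkpoint is simply dropped (deleted).
    return preserved + recent


if __name__ == "__main__":
    hourly = list(range(1, 25))  # one checkpoint per hour for a day
    print(surviving_checkpoints(hourly, max_to_keep=5, keep_every_n_hours=10000))
    print(surviving_checkpoints(hourly, max_to_keep=5, keep_every_n_hours=10))
```

With the defaults, a day of hourly checkpoints leaves only the last five. With a 10-hour interval, one older checkpoint also survives. Neither setting picks checkpoints by a metric such as mAP, which is what Solutions 3 and 4 address.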

Solution 2

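You can change the Estimator's run configuration. The RunConfig constructor in run_config.py exposes keep_checkpoint_max and keep_checkpoint_every_n_hours. Both are ordinary constructor arguments, so rather than editing the library source you can also pass them wherever RunConfig is instantiated.

Excerpt from run_config.py, with keep_checkpoint_max raised from its default of 5 to 10: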

class RunConfig(object):
  """This class specifies the configurations for an `Estimator` run."""

  def __init__(self,
           model_dir=None,
           tf_random_seed=None,
           save_summary_steps=100,
           save_checkpoints_steps=_USE_DEFAULT,
           save_checkpoints_secs=_USE_DEFAULT,
           session_config=None,
           keep_checkpoint_max=10,
           keep_checkpoint_every_n_hours=10000,
           log_step_count_steps=100,
           train_distribute=None,
           device_fn=None,
           protocol=None,
           eval_distribute=None,
           experimental_distribute=None):

Solution 3

You may be interested in this TensorFlow GitHub thread, which tackles the newest-versus-best checkpoint issue. A user developed his own wrapper, chekmate, around tf.train.Saver to keep track of the best checkpoints.
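The core idea behind such a wrapper can be sketched in plain Python: keep a min-heap of the k best (metric, checkpoint) pairs, and after each evaluation report which checkpoint can be deleted. The class BestCheckpointTracker and its methods are hypothetical and are not chekmate's API. Actually deleting the files, and producing the mAP values, is left to the caller.

```python
import heapq
from typing import List, Optional, Tuple


class BestCheckpointTracker:
    """Keeps track of the k checkpoints with the highest metric (e.g. mAP)."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # Min-heap of (metric, path) pairs: the worst kept checkpoint sits at index 0.
        self._heap: List[Tuple[float, str]] = []

    def add(self, metric: float, path: str) -> Optional[str]:
        """Record a new checkpoint and return the path that should be deleted, if any."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (metric, path))
            return None
        if metric > self._heap[0][0]:
            # The new checkpoint beats the worst kept one, which is evicted.
            _, evicted = heapq.heapreplace(self._heap, (metric, path))
            return evicted
        # Not good enough to keep: the new checkpoint itself can be deleted.
        return path

    def best(self) -> List[Tuple[float, str]]:
        """Return the kept checkpoints, best first."""
        return sorted(self._heap, reverse=True)


if __name__ == "__main__":
    tracker = BestCheckpointTracker(k=2)
    # Hypothetical (step, mAP) pairs from periodic evaluation.
    for step, mean_ap in [(1000, 0.30), (2000, 0.42), (3000, 0.38), (4000, 0.35)]:
        to_delete = tracker.add(mean_ap, f"model.ckpt-{step}")
        print(step, "-> delete:", to_delete)
    print(tracker.best())
```

A heap keeps each update at O(log k). Because only the metric decides what survives, a late, overfitted checkpoint is discarded rather than the best early one.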

Solution 4

You can follow this PR. With it, your best checkpoint is saved in a sub-directory named best inside your checkpoint directory.

You just need to integrate best_saver() (and its call in _run_checkpoint_once()) into ../object_detection/eval_util.py.

It will also create a JSON file for all_evaluation_metrices.
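The same behaviour can be sketched outside the Object Detection API as a minimal sketch. It assumes TF-style checkpoint files that share a prefix (model.ckpt-N.index, model.ckpt-N.data-*, and possibly model.ckpt-N.meta) and a metrics dict produced by your own evaluation step. The function update_best_checkpoint and the file name all_evaluation_metrics.json are hypothetical. This is not the code from the PR, which hooks into eval_util.py instead.

```python
import glob
import json
import os
import shutil
from typing import Dict


def update_best_checkpoint(checkpoint_prefix: str,
                           metrics: Dict[str, float],
                           metric_key: str = "mAP") -> bool:
    """Log metrics and copy the checkpoint into <ckpt_dir>/best if it improved.

    checkpoint_prefix: e.g. "train/model.ckpt-1000" (its files share this prefix).
    Returns True if this checkpoint became the new best.
    """
    ckpt_dir = os.path.dirname(checkpoint_prefix)
    best_dir = os.path.join(ckpt_dir, "best")
    # Hypothetical file name for the running log of every evaluation.
    history_path = os.path.join(ckpt_dir, "all_evaluation_metrics.json")

    history = []
    if os.path.exists(history_path):
        with open(history_path) as f:
            history = json.load(f)
    previous_best = max((h["metrics"][metric_key] for h in history), default=None)

    # Record this evaluation whether or not it is the new best.
    history.append({"checkpoint": os.path.basename(checkpoint_prefix),
                    "metrics": metrics})
    with open(history_path, "w") as f:
        json.dump(history, f, indent=2)

    if previous_best is not None and metrics[metric_key] <= previous_best:
        return False

    # New best: replace whatever is in best/ with this checkpoint's files.
    shutil.rmtree(best_dir, ignore_errors=True)
    os.makedirs(best_dir)
    for path in glob.glob(glob.escape(checkpoint_prefix) + ".*"):
        shutil.copy2(path, best_dir)
    return True
```

The copy does not include TensorFlow's extension-less checkpoint state file, so restoring from best/ means passing the copied prefix explicitly rather than relying on the latest-checkpoint lookup.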

Author: Piotr Januszewski

Updated on July 28, 2022

Comments

  • Piotr Januszewski, almost 2 years ago

    I'm training MobileNet on the WIDER FACE dataset and have run into a problem I couldn't solve. The TF Object Detection API stores only the last 5 checkpoints in the train dir, but I would like to save the best models according to the mAP metric (or at least keep many more checkpoints in the train dir before they are deleted). For example, today I looked at TensorBoard after another night of training and saw that the model had overfitted overnight, and I can't restore the best checkpoint because it has already been deleted.

    EDIT: I just use the TensorFlow Object Detection API; by default it saves the last 5 checkpoints in the train dir I point it to. I'm looking for a configuration parameter or anything else that will change this behavior.

    Does anyone have a code fix, a config parameter to set, or a workaround for this? It seems like I'm missing something: what actually matters is the best model, not the newest one (which may have overfitted).

    Thanks!

  • Piotr Januszewski, about 6 years ago
    I found github.com/tensorflow/models/pull/3802, so it seems someone has already done this, but it is currently waiting for its merge conflicts to be resolved. Thanks for your help!
  • hafiz031, almost 4 years ago
    Which parameter should I change to save the best model? Where is the file run_config.py located?