double free or corruption when running multithreaded


Solution 1

Okay, since you've stated that it works correctly in the single-threaded case, "normal" debugging methods won't help here. You need to do the following:

  • find all variables that are accessed in parallel
  • pay special attention to those that are modified
  • don't call delete on a shared resource
  • look at all library functions that operate on shared resources and check whether they allocate or deallocate memory internally

This is the list of shared variables that are candidates for being double-deleted:

shared(nb_examples_test, error_validation,features_test, labels_test, nb_try, ks)

Also, this code might not be thread safe:

      for (int i = 0; i < ks[j]; i++) {
         result+=_labels[ nnIdx[i] ]; 
      }    
      if (result*label<0) errors[j]++;  

This is because two or more threads may try to write to the errors array at the same time.

And one big piece of advice: while in the threaded region, try not to access (and especially not to modify!) anything that is not a parameter to the function!
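For the errors race specifically, here is a minimal sketch of one way to fix it, assuming the same classify_various_k signature and loop as in your question: make only the increment of the shared counter atomic, so the rest of the loop still runs in parallel.

// Sketch only: same loop as above; errors[] is shared across threads
// (as in tune_complexity), everything else here is private to the thread.
for (int j = 0; j < nb_ks; j++) {
    scalar_t result = 0.0;
    for (int i = 0; i < ks[j]; i++) {
        result += _labels[ nnIdx[i] ];
    }
    if (result * label < 0) {
        // atomic update of the shared counter
        #pragma omp atomic
        errors[j]++;
    }
}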

Solution 2

I don't know if this is your problem, but:

void KNNClassifier::train(int nb_examples, int dim, double **features, int * labels) {
  ...
  delete _search_struct;
  if(strcmp(_search_neighbors, "brutal") == 0) {
    _search_struct = new ANNbruteForce(_dataPts, _nPts, dim);
  }else if(strcmp(_search_neighbors, "kdtree") == 0) {  
    _search_struct = new ANNkd_tree(_dataPts, _nPts, dim);
  }
}  

What happens if you don't fall into either the if or the else if branch? You've deleted _search_struct and left it pointing at freed memory. You should set it to NULL right after the delete.
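A minimal sketch of what I mean (same function as above, with the pointer nulled out right after the delete, so the "no branch taken" case leaves it NULL instead of dangling):

void KNNClassifier::train(int nb_examples, int dim, double **features, int * labels) {
  ...
  delete _search_struct;
  _search_struct = NULL;  // never leave the pointer dangling
  if(strcmp(_search_neighbors, "brutal") == 0) {
    _search_struct = new ANNbruteForce(_dataPts, _nPts, dim);
  } else if(strcmp(_search_neighbors, "kdtree") == 0) {
    _search_struct = new ANNkd_tree(_dataPts, _nPts, dim);
  }
  // if neither strcmp matched, _search_struct is now NULL rather than garbage
}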

If this isn't the problem, you could try replacing:

delete p;

with:

assert(p != NULL);
delete p;
p = NULL;

(and similarly for the delete[] sites). This would probably pose a problem for the first invocation of KNNClassifier::train, however, since _search_struct has never been allocated at that point.
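One way around the first-invocation problem is to make sure the pointer starts out as NULL, since delete on a null pointer is a no-op. This is only a sketch: I have not seen your constructor, so its parameters here are hypothetical and the only point being made is the initializer.

// Hypothetical constructor signature; initialize the pointer so that
// the unconditional delete in train() is safe on the very first call.
KNNClassifier::KNNClassifier(/* your actual parameters */)
    : _search_struct(NULL)
{
    // ...
}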

Also, obligatory: do you really need to do all of these manual allocations and deallocations? Why aren't you at least using std::vector instead of new[]/delete[] (which are almost always bad)?
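For instance, the two new[]/delete[] pairs in classify_various_k could become local std::vectors; a sketch, assuming ANN's annkSearch accepts plain ANNidx*/ANNdist* buffers so that .data() can be passed straight through:

#include <vector>

// Sketch: the buffers are released automatically when the function returns,
// so there is nothing left to free twice.
std::vector<ANNidx>  nnIdx(k_max);
std::vector<ANNdist> dists(k_max);

_search_struct->annkSearch(queryPt, k_max, nnIdx.data(), dists.data(), _eps);
// ... use nnIdx[i] exactly as before; the trailing delete [] calls go away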


Updated on June 09, 2022

Comments

  • Tim almost 2 years

    I met a runtime error "double free or corruption" in my C++ program that calls a reliable library, ANN, and uses OpenMP to parallelize a for loop.

    *** glibc detected *** /home/tim/test/debug/test: double free or corruption (!prev): 0x0000000002527260 ***     
    

    Does it mean that the memory at address 0x0000000002527260 is freed more than once?

    The error happens at "_search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);" inside function classify_various_k(), which is in turn inside the OpenMP for-loop inside function tune_complexity().

    Note that the error happens when there is more than one OpenMP thread, and does not happen in the single-thread case. I am not sure why.

    Following is my code. If it is not enough to diagnose the problem, just let me know. Thanks for your help!

      void KNNClassifier::train(int nb_examples, int dim, double **features, int * labels) {                         
          _nPts = nb_examples;  
    
          _labels = labels;  
          _dataPts = features;  
    
          setting_ANN(_dist_type,1);   
    
        delete _search_struct;  
        if(strcmp(_search_neighbors, "brutal") == 0) {                                                                 
          _search_struct = new ANNbruteForce(_dataPts, _nPts, dim);  
        }else if(strcmp(_search_neighbors, "kdtree") == 0) {  
          _search_struct = new ANNkd_tree(_dataPts, _nPts, dim);  
          }  
    
      }  
    
    
          void KNNClassifier::classify_various_k(int dim, double *feature, int label, int *ks, double * errors, int nb_ks, int k_max) {            
            ANNpoint      queryPt = 0;                                                                                                                
            ANNidxArray   nnIdx = 0;                                                                                                         
            ANNdistArray  dists = 0;                                                                                                         
    
            queryPt = feature;     
            nnIdx = new ANNidx[k_max];                                                               
            dists = new ANNdist[k_max];                                                                                
    
            if(strcmp(_search_neighbors, "brutal") == 0) {                                                                               
              _search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps);    
            }else if(strcmp(_search_neighbors, "kdtree") == 0) {    
              _search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps); // where error occurs    
            }    
    
            for (int j = 0; j < nb_ks; j++)    
            {    
              scalar_t result = 0.0;    
              for (int i = 0; i < ks[j]; i++) {                                                                                      
                  result+=_labels[ nnIdx[i] ];    
              }    
              if (result*label<0) errors[j]++;    
            }    
    
            delete [] nnIdx;    
            delete [] dists;    
    
          }    
    
          void KNNClassifier::tune_complexity(int nb_examples, int dim, double **features, int *labels, int fold, char *method, int nb_examples_test, double **features_test, int *labels_test) {    
              int nb_try = (_k_max - _k_min) / scalar_t(_k_step);    
              scalar_t *error_validation = new scalar_t [nb_try];    
              int *ks = new int [nb_try];    
    
              for(int i=0; i < nb_try; i ++){    
                ks[i] = _k_min + _k_step * i;    
              }    
    
              if (strcmp(method, "ct")==0)                                                                                                                     
              {    
    
                train(nb_examples, dim, features, labels );// train once for all nb of nbs in ks                                                                                                
    
                for(int i=0; i < nb_try; i ++){    
                  if (ks[i] > nb_examples){nb_try=i; break;}    
                  error_validation[i] = 0;    
                }    
    
                int i = 0;    
          #pragma omp parallel shared(nb_examples_test, error_validation,features_test, labels_test, nb_try, ks) private(i)    
                {    
          #pragma omp for schedule(dynamic) nowait    
                  for (i=0; i < nb_examples_test; i++)         
                  {    
                    classify_various_k(dim, features_test[i], labels_test[i], ks, error_validation, nb_try, ks[nb_try - 1]); // where error occurs    
                  }    
                }    
                for (i=0; i < nb_try; i++)    
                {    
                  error_validation[i]/=nb_examples_test;    
                }    
              }
    
              ......
         }
    

    UPDATE:

    Thanks! I am now trying to fix the problem of conflicting writes to the same memory in classify_various_k() by using "#pragma omp critical":

    void KNNClassifier::classify_various_k(int dim, double *feature, int label, int *ks, double * errors, int nb_ks, int k_max) {   
      ANNpoint      queryPt = 0;    
      ANNidxArray   nnIdx = 0;      
      ANNdistArray  dists = 0;     
    
      queryPt = feature; //for (int i = 0; i < Vignette::size; i++){ queryPt[i] = vignette->content[i];}         
      nnIdx = new ANNidx[k_max];                
      dists = new ANNdist[k_max];               
    
      if(strcmp(_search_neighbors, "brutal") == 0) {// search  
        _search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps);  
      }else if(strcmp(_search_neighbors, "kdtree") == 0) {  
        _search_struct->annkSearch(queryPt, k_max,  nnIdx, dists, _eps);  
      }  
    
      for (int j = 0; j < nb_ks; j++)  
      {  
        scalar_t result = 0.0;  
        for (int i = 0; i < ks[j]; i++) {          
            result+=_labels[ nnIdx[i] ];  // Program received signal SIGSEGV, Segmentation fault
        }  
        if (result*label<0)  
        {  
        #pragma omp critical  
        {  
          errors[j]++;  
        }  
        }  
    
      }  
    
      delete [] nnIdx;  
      delete [] dists;  
    
    }
    

    However, there is a new segmentation fault at "result+=_labels[ nnIdx[i] ];". Any ideas? Thanks!