Spring Data MongoRepository save(T) not working... sometimes


Solution 1

Problem solved.
A different async call from the JS client, to a different endpoint in the Java backend, was overwriting my updated document in a different thread with the original values.

Both update operations were calling findById before saving. The problem was that they did so at the same time, so both got the same original values.
Each then updated its relevant fields and called save at the end, so the slower thread effectively overwrote my changes.
Each call logged only its relevant modified fields, so I didn't realize that one of them was overwriting the changes of the other.
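The race can be reproduced with plain Java, no MongoDB required: two callers read the same full "document", each changes only its own field, and each writes the whole document back, so the later writer silently restores the stale value it read. A minimal stdlib-only sketch (the Map stands in for the Mongo document; all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LostUpdateDemo {
    // The "database": one document ("ID_1") with two independent fields.
    static final Map<String, Map<String, Integer>> db = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Map<String, Integer> doc = new HashMap<>();
        doc.put("currentPage", 1);
        doc.put("bookmarks", 0);
        db.put("ID_1", doc);

        // Both endpoints findById "at the same time": each gets the same snapshot.
        Map<String, Integer> copyA = new HashMap<>(db.get("ID_1"));
        Map<String, Integer> copyB = new HashMap<>(db.get("ID_1"));

        // Each modifies only its own field...
        copyA.put("currentPage", 2);
        copyB.put("bookmarks", 5);

        // ...but each save() writes the FULL document, so the second save
        // silently restores the stale currentPage=1 from its old snapshot.
        db.put("ID_1", copyA);
        db.put("ID_1", copyB);

        System.out.println("currentPage=" + db.get("ID_1").get("currentPage")); // 1 -- update lost
        System.out.println("bookmarks=" + db.get("ID_1").get("bookmarks"));     // 5
    }
}
```

The interleaving is written out sequentially here to make it deterministic, but two real threads hitting findById within the same ~500 ms window produce exactly this outcome.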

Once I added systemLog.verbosity: 3 to MongoDB's config.cfg so that it logged all operations, it became clear that two different WRITE operations were happening almost simultaneously (~500 ms apart) but with different values.
Then it was just a matter of moving the findById closer to the save and ensuring that the JS calls were executed in order (by making one of the promises depend on the other).
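The actual fix was on the JS side by chaining the promises; the same sequencing idea, sketched in Java with CompletableFuture standing in for the two endpoint calls (method and label names are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class OrderedCallsDemo {
    // Chain the two "endpoint calls" so the second only starts after the
    // first has completed -- the Java analogue of making one JS promise
    // depend on the other.
    static String runInOrder() {
        StringBuilder log = new StringBuilder();
        CompletableFuture<Void> first = CompletableFuture.runAsync(() -> log.append("savePage;"));
        CompletableFuture<Void> second = first.thenRun(() -> log.append("saveStats;"));
        second.join();  // join() establishes happens-before, so reading log here is safe
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runInOrder());  // savePage;saveStats;
    }
}
```

With the calls chained like this, the second operation's findById always sees the values the first operation saved.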

In hindsight, this probably wouldn't have happened if I had used MongoOperations or MongoTemplate, which offer single-step update and findAndModify methods that also allow single-field operations, instead of MongoRepository, where I was forced to do it in three steps (find, modify the returned entity, save) and to work with the full document.


EDIT: I didn't really like my first "move findById closer to save" approach, so in the end I did what felt right and implemented custom save methods that use MongoTemplate's finer-grained update API. Final code:

/* MongoRepository provides entity-based default Spring Data methods */
/* BookDataRepositoryCustom provides field-level update methods */
public interface BookDataRepository extends MongoRepository<BookData, String>, BookDataRepositoryCustom {

    BookData findById(String id);

}

/* Interface for the custom methods */
public interface BookDataRepositoryCustom {

    int saveCurrentPage(String id, Integer currentPage);
}

/* Custom implementation using MongoTemplate. */
@SuppressWarnings("unused")
public class BookDataRepositoryImpl implements BookDataRepositoryCustom {
    @Inject
    MongoTemplate mongoTemplate;

    @Override
    public int saveCurrentPage(String id, Integer currentPage) {
        Query query = new Query(Criteria.where("_id").is(id));
        Update update = new Update();
        update.set("currentPage", currentPage);

        WriteResult result = mongoTemplate.updateFirst(query, update, BookData.class);

        return result == null ? 0 : result.getN();
    }
}

// Old code: get the entity from the DB, update it, save it back. Three steps with plenty of room for interference.
//        BookData bookData = bookDataRepository.findById(bookDataId);
//        bookData.setCurrentPage(currentPage);
//        bookDataRepository.save(bookData);
// New code: update a single field. One step, no interference.
        bookDataRepository.saveCurrentPage(bookDataId, currentPage);

By doing so, each endpoint can update as frequently as needed via MongoTemplate without ever worrying about overwriting unrelated fields, and I still keep the entity-based MongoRepository methods for things like new entity creation, findBy methods, annotated @Query methods, etc.

Solution 2

MongoDB behaves, in effect, like a cache store, by which I mean the contents you read are not guaranteed to be the latest or necessarily correct. I haven't been able to find the configuration options for flush time (but they would be configured in the DB itself); however, MongoDB has added features so that you can choose fast-but-dirty or slow-but-clean behavior. This "freshness" factor is most likely your issue if you are seeing this kind of problem. (Even if you aren't running distributed, there is a timing difference between a request being acknowledged and it being committed.)

Here is a link to a post regarding "clean reads" (the key point is in the following quote):

http://www.dagolden.com/index.php/2633/no-more-dirty-reads-with-mongodb/

I encourage MongoDB users to place themselves (or at least, their application activities) into one of the following groups:

"I want low latency" – Dirty reads are OK as long as things are fast. Use w=1 and read concern 'local'. (These are the default settings.)

"I want consistency" – Dirty reads are not OK, even at the cost of latency or slightly out-of-date data. Use w='majority' and read concern 'majority'.

A Perl example from that post:

use MongoDB v1.2.0;

my $mc = MongoDB->connect(
    $uri,
    {
        read_concern_level => 'majority',
        w => 'majority',
    }
);
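The same options can also be expressed as MongoDB connection-string parameters, which the Java driver accepts as well; a minimal sketch (host, port and database name are placeholders):

```java
public class MajorityUriDemo {
    // readConcernLevel and w are standard MongoDB connection-string options;
    // host, port and database name here are placeholders.
    static String majorityUri(String host, int port, String db) {
        return "mongodb://" + host + ":" + port + "/" + db
                + "?readConcernLevel=majority&w=majority";
    }

    public static void main(String[] args) {
        System.out.println(majorityUri("localhost", 27017, "pdfviewer"));
        // mongodb://localhost:27017/pdfviewer?readConcernLevel=majority&w=majority
    }
}
```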

Further reading that may or may not be useful.

Update

If running in a multi-threaded environment, make sure your threads aren't trampling one another's updates. You can verify whether this is happening by raising the system or query logging verbosity to 5. https://docs.mongodb.com/manual/reference/log-messages/#log-messages-configure-verbosity
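For example, the verbosity can be raised in mongod's YAML configuration file (restart required; 0 is the default, 5 logs everything):

```yaml
systemLog:
  verbosity: 5
  # Or raise it only for the relevant components:
  # component:
  #   query:
  #     verbosity: 5
  #   write:
  #     verbosity: 5
```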



Author: walen

Updated on April 23, 2020

Comments

  • walen
    walen about 4 years

So there's this little Angular + Java + Spring Boot + MongoDB app I'm working with. It's been getting quite a lot of action (read: code modifications) lately, but the data access classes have gone largely untouched AFAIK.
    However, it seems like MongoRepository has suddenly decided to stop persisting the changes I'm save()-ing to the DB.

    Inspecting mongod.log this is what I see when the save() works:

    2018-04-11T15:04:06.840+0200 I COMMAND  [conn6] command pdfviewer.bookData command: find { find: "bookData", filter: { _id: "ID_1" }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:1 docsExamined:1 idhack:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:716 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 102ms
    2018-04-11T17:30:19.615+0200 I WRITE    [conn7] update pdfviewer.bookData query: { _id: "ID_1" } update: { _class: "model.BookData", _id: "ID_1", config: { mode: "normal", offlineEnabled: true }, metadata: { title: "PDFdePrueba3pag   copia  6 ", ...}, downloaded: false, currentPageNumber: 2, availablePages: 3, bookmarks: [], stats: { _id: "c919e517-3c68-462c-8396-d4ba391762e6", dateOpen: new Date(1523460575872), dateClose: new Date(1523460575951), timeZone: "+2", ... }, ... } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:1 numYields:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 315ms
    2018-04-11T17:30:19.615+0200 I COMMAND  [conn7] command pdfviewer.$cmd command: update { update: "bookData", ordered: false, updates: [ { q: { _id: "ID_1" }, u: { _class: "model.BookData", _id: "ID_1", config: { mode: "normal", offlineEnabled: true }, metadata: { title: "PDFdePrueba3pag   copia  6 ", ...}, downloaded: false, currentPageNumber: 2, availablePages: 3, bookmarks: [], stats: { _id: "c919e517-3c68-462c-8396-d4ba391762e6", dateOpen: new Date(1523460575872), dateClose: new Date(1523460575951), timeZone: "+2", ... }, ... }, upsert: true } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 316ms
    

    And this is what I see when it doesn't:

    2018-04-11T18:13:21.864+0200 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:64271 #1 (1 connection now open)
    2018-04-11T18:18:51.425+0200 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:64329 #2 (2 connections now open)
    2018-04-11T18:19:06.967+0200 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:64346 #3 (3 connections now open)
    

    By doing a tail -f[1] on the log file while debugging, I've seen those connections appear right when my code calls findById() or save(), so it seems like the app can reach the DB.

    This is (more or less) the relevant Java code:

    /* BookData.java */
    @Document
    public class BookData {
    
        @Id private String id;
        // Some more non-Id Strings...
        private Config config;
        private Metadata metadata;
        private Boolean downloaded;
        private Integer currentPageNumber;
        private int availablePages;
        private List<Bookmark> bookmarks;
        private StatsModel stats;
    
        @Transient private byte[] contents;
    
        public BookData() {}
    
        // getters and setters
    }
    
    /* BookDataRepository.java */
    // MongoRepository comes from spring-boot-starter-parent-1.4.5.RELEASE
    public interface BookDataRepository extends MongoRepository<BookData, String> {
        BookData findById(String id);
    }
    
    /* BookDataServiceImpl.java */
    public BookData updateBookData(String id, BookData newData) {
        final BookData original = bookDataRepository.findById(id);
        if (original == null) {
            return null;
        }
        original.setCurrentPageNumber(Optional.ofNullable(newData.getCurrentPageNumber()).orElseGet(original::getCurrentPageNumber));
        // similar code for a couple other fields
    
        return bookDataRepository.save(original);
    }
    

    I've stepped through that part a hundred times while debugging and everything seems to be OK:

    • findById(id) correctly returns the expected BookData original object: check ✓
    • newData contains the expected values to be used for updating: check ✓
    • right before calling save(original), original has been correctly modified using newData values: check ✓
    • save() executes without errors: check ✓
    • save() returns a new BookData with correctly updated values: to my own surprise, check ✓
    • after save() returns, a db.bookData.find() query in Mongo Shell shows that the values have been updated: fail.
    • after save() returns, the BookData object retrieved by new calls to findById() contains the updated values: fail (sometimes it does, sometimes it doesn't).

    It just looks like MongoDB is waiting for some kind of flush(), but this is not a JPA repository where one could call saveAndFlush() instead.

    Any ideas why this might be happening?

    EDIT: versions (as requested):

    • Java 8
    • Spring Boot 1.4.5
    • MongoDB 3.2.6
    • Windows 10

    I also included BookData above.

    • walen
      walen about 6 years
      [1] Actually a Get-content mongod.log -Tail 10 -Wait in PowerShell, but not really relevant.
    • Paizo
      Paizo about 6 years
      I suggest you add some logging in updateBookData, since there is a null check and in some cases the original would be left untouched. Logging the bean before the changes and right before the save could help catch app errors. Can you post your BookData bean?
    • Anatoly Shamov
      Anatoly Shamov about 6 years
      Have you customized WriteResultChecking or WriteConcern policies?
    • walen
      walen about 6 years
      @Paizo Thanks for chiming in. Using step-by-step execution I confirmed that original is being modified as expected before the save (as noted in the question), so that is not the problem. I'll add BookData as requested.
    • walen
      walen about 6 years
      @AnatolyShamov Nope, everything regarding Spring Data and MongoRepository is as vanilla as it can be.
    • Tarun Lalwani
      Tarun Lalwani about 6 years
      @LuisG., can you add information about the mongo server as well? Version, OS etc..
    • walen
      walen about 6 years
      @TarunLalwani Done.
    • Tezra
      Tezra about 6 years
      @LuisG. Do you have a test case to reliably reproduce the issue?
    • walen
      walen about 6 years
      @TarunLalwani Our production MongoDB instance is shared by several apps including mine, so upgrading to 3.4 or 3.6 would require non-negligible effort by other teams to test their apps against the new version etc. -- something I'd rather avoid for now, unless there's some documented problem with 3.2.x that could justify upgrading.
    • walen
      walen about 6 years
      @Tezra Sadly, no. I cannot get even the actual app to fail consistently, so...
    • Tarun Lalwani
      Tarun Lalwani about 6 years
      Could it be that the Mongo client library version you are using has some problem? I thought this was a local instance and that's why I suggested an upgrade.
    • walen
      walen about 6 years
      @TarunLalwani Yes, for development I'm using a local instance. Both my local instance and the prod one are 3.2.6. What I mean is that if I upgrade my local instance, and then everything works... I still wouldn't know why it sometimes failed before and I'd probably need a better explanation than that before I get DevOps and the other teams involved.
    • Tarun Lalwani
      Tarun Lalwani about 6 years
      It's hard to say, because a recent patch or something on your machine may have impacted the behavior. There could be a lot of combinations, and it could even be specific to your machine. Unless you correlate across different environments and configs, it's hard to say.
    • walen
      walen about 6 years
      @Tezra As said in the question and in one of my comments above, I've confirmed via step-by-step execution that updateBookData is being executed every time, the original object is being modified before saving, and the save call even returns an updated object. But the changes are not persisted to the actual DB. Smells like caching to me, but I'm not aware of any caching options for MongoRepository.
    • Tezra
      Tezra about 6 years
      @LuisG. Run some more tests, but add a 61-second sleep between save and return. If that works, then this is definitely a flush issue (or a locking issue).
    • s7vr
      s7vr about 6 years
      Could you enable debug-level logging for both the Spring Mongo and MongoDB driver dependencies? That way, when the save fails you will have more information. Try logging.level.org.springframework.data.mongodb.core.MongoTemplate=DEBUG logging.level.org.mongodb.driver.protocol.insert=DEBUG logging.level.org.mongodb.driver.protocol.query=DEBUG logging.level.org.mongodb.driver.protocol.command=DEBUG logging.level.org.mongodb.driver.protocol.update=DEBUG in application.properties and compare the logs. Hopefully this will help you narrow down the issue.
    • s7vr
      s7vr about 6 years
      logs like MongoTemplate : findOne using query: { "_id" : { "$oid" : "somevalue"}} in db.collection: db.somecollection protocol.command: Sending command {find : BsonString{value='somecollection'}} to database [connectionId{localValue:3, serverValue:67}] to server 127.0.0.1:27017 protocol.command: Command completed MongoTemplate: Calling update using query: { "_id" : { "$oid" : "somevalue"} , "version" : 4} and update: update json in collection: some collection protocol.update: Updating in namespace db.somecollection on connection serverValue:67}] protocol.update: Update completed
    • walen
      walen about 6 years
      @Veeram I found the issue, see this comment. I'm torn between making one of those despised "EDIT: I solved it, it was because of X" edits, and writing some kind of answer about logging levels and how to detect if some other thread is overwriting the changes.
    • s7vr
      s7vr about 6 years
      Glad you solved it, but you should set up some logging that allows you to debug the issue when it happens. What happens when you enable the logging on the application side? It should capture logs when another thread overwrites the changes.
    • walen
      walen about 6 years
      @Veeram Both update operations were logged both at the JS client and at the Java server, but each log showed only the "relevant" modified fields, so I didn't realize the problem until I raised the DB log level.
    • Paizo
      Paizo about 6 years
      So the solution to unmask the problem was to put a conditional breakpoint, filtering on the bookId, in the repository's save(BookData bookData)? :D
    • walen
      walen about 6 years
      @Paizo I thought the problem was on MongoDB's side, so my solution was to raise the verbosity level (see my answer) and that's how I discovered the two updates. If I had already been suspicious that "two updates instead of one" on the Java side was the root cause, then I would've probably done as you say, and indeed the problem would have surfaced too since I would have seen two save calls coming from different endpoints. Hindsight is 20/20.
    • Tezra
      Tezra about 6 years
      Since this question is resolved, you should accept an answer so that others know it has been resolved.
    • walen
      walen about 6 years
      @Tezra Well, it is solved for me indeed, but (as you yourself said) other users who find this question might have a different problem. 11 people voted the question, but only 2 voted my (rather specific) answer, and only I voted yours (even though I had already figured out the thread problem by the time you mentioned it): maybe none of the answers is good enough for the community? Hence why the question and its bounty will stay open til the grace period ends, in case anybody posts a more complete / general answer (or enhances a current one ;). Don't rush things that need not be rushed.
    • Tezra
      Tezra about 6 years
      You accept based on what helped you personally. You can wait till the bounty expires, but accept on your own judgement, not the community's. Others with a similar issue (if it ever exists) can open their own question for that scenario. :3
  • walen
    walen about 6 years
    Thanks for answering. I get that MongoDB's "eventual consistency" might lead to some dirty reads now and then when operations are just a few seconds apart. However, what I'm seeing is that changes are never persisted. I just did an update, waited for 15 minutes before querying the DB again, and the changes weren't anywhere to be seen. Surely 15 minutes should be enough for MongoDB to propagate changes in a non-sharded, single instance config?
  • walen
    walen about 6 years
    (BTW both bottom links are purple to me, checked them last week. The second one was very interesting but none of what it says helped, since that "flush every 60 s, journal every 100 ms" doesn't seem to match the behavior I'm seeing. But thanks anyway!)
  • Tezra
    Tezra about 6 years
    @LuisG. I imagine for most people who find this question, this will be their actual problem. In your case, most likely not. Of course, there are a lot of things it COULD be. If it's not a flush issue, do a properties equal compare of the pre-post save objects to verify they are the same (ie, the ID didn't change or anything), and run a pre-post dbstats (docs.mongodb.com/manual/reference/command/dbStats) to see if ANYTHING changed in the DB. If you are multi threaded, it is also possible another thread is trampling your changes.
  • walen
    walen about 6 years
    "If you are multi threaded, it is also possible another thread is trampling your changes." I increased MongoDB's systemLog.verbosity to 3 and after careful inspection, that seems to be the actual problem. Some other thread is updating the same document barely 500 ms after my update, but with different changes. So my changes are being overwritten almost instantly and that's why I cannot see them in the next call to findById() nor by running db.bookData.find() in Mongo Shell. Oh well.
  • walen
    walen about 6 years
    The question about why mongod.log stopped showing the updates is still open, though. Might have something to do with slowms in logging configuration (as in, the queries I was seeing were slower than slowms and that's the only reason I was seeing them at all, the default being not to see any queries unless verbosity is increased).
  • Tezra
    Tezra about 6 years
    @LuisG. The default log configuration treats 100+ ms queries as "slow", and the logger only logs "slow" queries (explained here: docs.mongodb.com/manual/reference/method/db.setProfilingLevel). To log all queries, you need to change logLevel from 0 (min) to 5 (max) (docs.mongodb.com/manual/reference/log-messages/… (Set query verbosity)). Unless you changed the default settings, this is what you are seeing.
  • walen
    walen about 6 years
    Yes. Two promises from the JS client invoked two different endpoints from the Java backend which in turn made two findById at the same time. Each then made their own modifications and called save, the last one to do so effectively overriding the first.
  • Tezra
    Tezra about 6 years
    @LuisG. Good. ^_^ I've updated the answer to include the multi-thread safety case for future readers so they don't have to dig through the comments, and since that was the actual answer for your question. Glad I could be of help. :3
  • tinesoft
    tinesoft almost 5 years
    You could also make use of Optimistic Locking (see docs.spring.io/spring-data/mongodb/docs/current/reference/html/…) to prevent those concurrent updates. Basically, you'll need to add a 'version' field of type 'long' in your BookData entity, annotated with @Version, and Spring Data will take care of the rest for you!
  • Khanna111
    Khanna111 over 2 years
    If you would like to prevent dirty reads, non-repeatable reads, use Transactions. If T1 reads a document D1 and then T2 also reads and updates before T1, then T1 would get a write error.
  • walen
    walen over 2 years
    @Khanna111 MongoDB transactions didn't exist back in April 2018. They were introduced in MongoDB 4.2 -- I was using MongoDB 3.2.6.
  • Khanna111
    Khanna111 over 2 years
    k, ic. Thanks. I'll leave the comment on as a placeholder for a possible current solution along with your comment on the version that introduced transactions.