
At the moment I am developing a Spring Boot application that mainly pulls product review data from a message queue (~5 concurrent consumers) and stores it in a MySQL DB. Each review is uniquely identified by its reviewIdentifier (String), which is the primary key, and a review can belong to one or more products (e.g. products with different colors). Here is an excerpt of the data model:

    // Excerpt; getters/setters omitted. Both classes use javax.persistence.* and java.util.Set.
    @Entity
    public class ProductPlacement implements Serializable {
        private static final long serialVersionUID = 1L;

        @Id
        @GeneratedValue(strategy = GenerationType.AUTO)
        @Column(name = "product_placement_id")
        private long id;

        // inverse side: CustomerReview.productPlacements owns the join table
        @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "productPlacements")
        private Set<CustomerReview> customerReviews;
    }

    @Entity
    public class CustomerReview implements Serializable {
        private static final long serialVersionUID = 1L;

        @Id // reviewIdentifier is the primary key
        @Column(name = "customer_review_id")
        private String reviewIdentifier;

        @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
        @JoinTable(
            name = "tb_miner_review_to_product",
            joinColumns = @JoinColumn(name = "customer_review_id"),
            inverseJoinColumns = @JoinColumn(name = "product_placement_id"))
        private Set<ProductPlacement> productPlacements;
    }

One message from the queue contains 1-15 reviews and a productPlacementId. Now I want an efficient method to persist the reviews for the product. There are basically two cases that need to be considered for each incoming review:

  • The review is not in the database -> insert review with reference to the product that is contained in the message
  • The review is already in the database -> just add the product reference to the Set productPlacements of the existing review.
Currently my method for persisting the reviews is not optimal. It looks as follows (it uses Spring Data JpaRepositories):

    @Override
    @Transactional
    public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
        ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
        for (CustomerReview review : customerReviews) {
            // one SELECT per review to decide between update and insert
            CustomerReview cr = customerReviewRepository.findOne(review.getReviewIdentifier());
            if (cr != null) {
                // review already exists: only add the product reference
                cr.getProductPlacements().add(placement);
                customerReviewRepository.saveAndFlush(cr);
            } else {
                // new review: insert it with a reference to this product
                Set<ProductPlacement> productPlacements = new HashSet<>();
                productPlacements.add(placement);
                review.setProductPlacements(productPlacements);
                customerReviewRepository.saveAndFlush(review);
            }
        }
    }
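The two cases can also be decided with one lookup per message instead of one findOne call per review. The sketch below is a minimal, hypothetical helper (ReviewPartition and split are illustrative names, not part of the question's code). It assumes the set of already-stored identifiers comes from a single batched query, for example CrudRepository.findAll(Iterable<ID>) in the Spring Data 1.x API the question uses (renamed findAllById in Spring Data 2.x). It only reduces the number of SELECTs; it does not by itself remove the race between concurrent consumers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper: splits one message's reviews into "already stored" and "new". */
public final class ReviewPartition {

    /**
     * @param incoming    reviews from one queue message (1-15 in the question)
     * @param idOf        extracts the reviewIdentifier from a review
     * @param existingIds identifiers already in the DB, assumed to come from ONE batched query
     * @return key true  -> existing reviews (only add the product reference),
     *         key false -> new reviews (insert them with the product reference)
     */
    public static <R> Map<Boolean, List<R>> split(List<R> incoming,
                                                  Function<R, String> idOf,
                                                  Set<String> existingIds) {
        Map<Boolean, List<R>> result = new LinkedHashMap<>();
        result.put(true, new ArrayList<>());
        result.put(false, new ArrayList<>());
        Set<String> seen = new HashSet<>();
        for (R review : incoming) {
            String id = idOf.apply(review);
            if (!seen.add(id)) {
                continue; // assumption: a review repeated within one message is kept only once
            }
            result.get(existingIds.contains(id)).add(review);
        }
        return result;
    }
}
```

In the service, the `false` list would be persisted with the placement attached, while each review in the `true` list would only get the placement added to its productPlacements set.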

Questions:

  • I sometimes get ConstraintViolationExceptions because the unique constraint on reviewIdentifier is violated. This is obviously because I (concurrently) check whether the review is already present and then insert or update it. How can I avoid that?
  • Is it better to use save() or saveAndFlush() in my case? I get ~50-80 reviews per second. Will Hibernate flush automatically if I just use save(), or will that greatly increase memory usage?
  • Update to question 1: Would a simple @Lock on my review repository prevent the unique-constraint exception?

    @Lock(LockModeType.PESSIMISTIC_WRITE)
    CustomerReview findByReviewIdentifier(String reviewIdentifier);
    

What happens when findByReviewIdentifier returns null? Can Hibernate lock the reviewIdentifier for a potential insert even if the method returns null?

Thank you!

To get rid of race conditions, either make saveAllReviews() synchronized or implement explicit locking based on the key of the review (the property that is constrained). In our organization we also need to deal with such situations. Over 3+ years of trying and testing we have not found a method better than locking by key... maybe there is another practice, and I'd also like to learn it. – Alex Salauyou Apr 1, 2016 at 12:44

Thank you for your response. Do you think there is a difference (performance-wise) between making the method synchronized and locking by key? – JuHarm89 Apr 1, 2016 at 12:52

Of course key locking will be much more effective, because you can safely allow concurrent writes for different keys. But this approach requires implementation effort. You may first try synchronized, then think about a more advanced technique if the performance isn't satisfactory. – Alex Salauyou Apr 1, 2016 at 12:55
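The key-locking idea from these comments can be sketched with lock striping: each reviewIdentifier is hashed onto one of a fixed number of ReentrantLocks, and all stripes a message needs are acquired in ascending index order so that two consumers cannot deadlock. This is a minimal, hypothetical helper (StripedKeyLock and withLocks are illustrative names). It assumes a single application instance; with several JVMs consuming the queue, an in-memory lock does not help. It also assumes the lock is taken outside the @Transactional method: with Spring's proxy-based transactions the commit happens after the annotated method returns, so a lock released inside that method (including a synchronized method) can let another consumer run its find step before the insert is committed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/**
 * Hypothetical helper: per-key locking via lock striping.
 * Keys (e.g. reviewIdentifier values) are hashed onto a fixed array of locks,
 * so memory stays bounded no matter how many distinct keys are seen.
 */
public final class StripedKeyLock {
    private final ReentrantLock[] stripes;

    public StripedKeyLock(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    private int stripeFor(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), stripes.length);
    }

    /** Runs task while holding the locks for all given keys; releases them even if task throws. */
    public <T> T withLocks(Collection<?> keys, Supplier<T> task) {
        // Distinct stripe indices in ascending order: every thread locks in the same order, so no deadlock
        SortedSet<Integer> indices = new TreeSet<>();
        for (Object key : keys) {
            indices.add(stripeFor(key));
        }
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (int index : indices) {
                ReentrantLock lock = stripes[index];
                lock.lock();
                held.add(lock);
            }
            return task.get();
        } finally {
            // Release in reverse acquisition order
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

A consumer could then wrap the transactional call, e.g. `locks.withLocks(ids, () -> { service.saveAllReviews(reviews, placementId); return null; })`. Striping trades a little false contention (unrelated keys sharing a stripe) for bounded memory, compared with keeping one lock per distinct key in a map.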

From a performance point of view, I would consider evaluating the solution with the following changes.

  • Changing from a bidirectional ManyToMany to a bidirectional OneToMany.

I had the same question about which one is more efficient in terms of the DML statements that get executed. Quoting from Typical ManyToMany mapping versus two OneToMany:

    The option one might be simpler from a configuration perspective, but it yields less efficient DML statements.

    Use the second option because whenever the associations are controlled by @ManyToOne associations, the DML statements are always the most efficient ones.

Enabling batching support would result in fewer round trips to the database for inserting/updating the same number of records.

Quoting from batch INSERT and UPDATE statements:

    hibernate.jdbc.batch_size = 50
    hibernate.order_inserts = true
    hibernate.order_updates = true
    hibernate.jdbc.batch_versioned_data = true

The current code loads the ProductPlacement and then calls saveAndFlush for each review, so none of the DML statements can be batched.
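A rough back-of-the-envelope estimate of what batching saves for the review INSERTs of one message, assuming all 15 reviews are new, hibernate.jdbc.batch_size = 50, and ignoring the SELECTs and the join-table rows. With saveAndFlush after every review, each INSERT is sent on its own; with a single flush at the end, the INSERTs can be grouped. The helper name BatchEstimate is hypothetical.

```java
/** Hypothetical helper for a rough JDBC round-trip estimate (ignores SELECTs and driver details). */
public final class BatchEstimate {

    /** Without batching (batchSize <= 1) every statement is its own round trip; otherwise ceil(n / batchSize). */
    static int roundTrips(int statements, int batchSize) {
        if (statements <= 0) {
            return 0;
        }
        if (batchSize <= 1) {
            return statements;
        }
        return (statements + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int insertsPerMessage = 15; // worst case from the question: 15 new reviews in one message
        System.out.println(roundTrips(insertsPerMessage, 1));  // saveAndFlush per review: prints 15
        System.out.println(roundTrips(insertsPerMessage, 50)); // one flush, batch_size = 50: prints 1
    }
}
```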

Instead, I would consider loading the ProductPlacement entity, adding the List<CustomerReview> customerReviews to its Set<CustomerReview> customerReviews field, and finally calling merge once at the end, with these two changes:

  • Making the ProductPlacement entity the owner of the association, i.e. moving the mappedBy attribute onto the Set<ProductPlacement> productPlacements field of the CustomerReview entity.
  • Making the CustomerReview entity implement equals and hashCode using the reviewIdentifier field. I believe reviewIdentifier is unique and user-assigned.
  • Finally, since this is performance tuning, baseline the performance of the current code first. Then make the changes and check whether they actually result in any significant performance improvement for your solution.
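A minimal sketch of the equals/hashCode suggestion, assuming reviewIdentifier is always assigned before the object is added to any Set. The JPA annotations from the question are left out so the class compiles on its own; a real entity would also need a no-arg constructor and a non-final field. Comparing through getReviewIdentifier() and instanceof (rather than getClass()) is a common choice so that Hibernate proxies of the entity, which are subclasses with uninitialized fields, still compare equal.

```java
import java.util.Objects;

// Sketch only: @Entity, @Id, etc. are omitted so this compiles without javax.persistence.
public class CustomerReview {
    private final String reviewIdentifier; // assumed unique and assigned by the application

    public CustomerReview(String reviewIdentifier) {
        this.reviewIdentifier = Objects.requireNonNull(reviewIdentifier);
    }

    public String getReviewIdentifier() {
        return reviewIdentifier;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // instanceof also accepts proxy subclasses; use the getter because proxy fields may be unset
        if (!(o instanceof CustomerReview)) {
            return false;
        }
        return reviewIdentifier.equals(((CustomerReview) o).getReviewIdentifier());
    }

    @Override
    public int hashCode() {
        return reviewIdentifier.hashCode();
    }
}
```

With this in place, a Set<CustomerReview> treats two instances with the same reviewIdentifier as one element, so re-adding a review that is already in a placement's set does not create a duplicate entry.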

While this all improves performance, no doubt, how does it help to avoid race conditions on concurrent find-insert cycles? – Alex Salauyou Apr 1, 2016 at 17:27

@SashaSalauyou That is true. This mainly addresses the performance aspect of the problem. For the race condition I would lean towards the synchronized approach, but I am wondering whether there is a better way; I'm not sure at this point. – Madhusudana Reddy Sunnapu Apr 1, 2016 at 17:49

@MadhusudanaReddySunnapu Thanks for your input. I also thought about making productPlacement the owner side of the relationship, but let's say one product has 2.5k reviews. Wouldn't that result in fetching 2.5k reviews just to add 10 to the set? Is it possible to add items to a lazy-loaded collection? – JuHarm89 Apr 1, 2016 at 18:50

@JuHarm89 Yes, it would result in fetching all the reviews in that case. How about this: since the reviewIdentifier is manually assigned to a new CustomerReview at some point in the code, we can add a transient boolean isNew field to CustomerReview that is set to true or false depending on whether it is a new review. While saving the reviews in the above code, we could then use HQL to insert the new reviews and create the mapping rows. That looks more efficient, and I think it could solve the ConstraintViolationException as well. One downside, though, is that the second-level cache (SLC) is invalidated by HQL. – Madhusudana Reddy Sunnapu Apr 2, 2016 at 2:50
