Cascaded Deletes (again)

Jakob_Hatzl · August 22, 2022, 1:03pm

I originally asked the question about cascade-deleting in the Google group more than 2 years ago and have come a long way since then (including working with @Morlack on a similar cascading use-case in a 1-on-1 session during an AxonIQ consultancy, which, if I guess right, he also indirectly references in his post).

Coincidentally, I’m currently refactoring some of the cascade-deletion patterns I created back then, so this is all quite fresh in my mind. There are two things I’d like to add to the discussion:

Regarding the last question of @Morlack (separate child aggregates vs aggregate members): for us it was also the performance requirement, as 1 project could contain 100s of images and, moreover, 1 image could contain 10,000s of annotations, so a single aggregate would accumulate a huge number of events pretty fast. @Morlack, since you state that this level of performance requirement is very rare, could it be a sign of bad design/modeling of the domain? And to the whole Axon community: how do you handle large parent-child relationships with Axon Framework differently? Are there other patterns for modeling large parent-child relationships properly in DDD?

The second thing is about propagating commands down a tree using a command → event → command propagation: We encountered some serious issues with that pattern.

Every time a command is sent from within an existing unit of work (UoW), the cleanup phase of that command is attached to the UoW and (if routed to the same application instance) the parent unit of work only finishes when all child UoWs are finished - this blocked the first parent for the whole process. This hit us fully, because we were using only a single application instance and CommandGateway#sendAndWait for propagating - but even if you’re using multiple application instances, you cannot know which id-‘segments’ are processed on which instance. Because of this behaviour, we ran out of database connections in the Axon Framework db connection pool pretty fast when processing a large tree of objects. To be honest, I think I never fully understood everything that went on in the depths of the framework regarding that part - by now we have refactored that whole cascading use-case and avoid the pattern altogether.
There is a 10k limit on the command queue (at least we experienced this with Axon Server Standard Edition), so your commands will be rejected if the queue in Axon Server grows beyond that limit (most probably when the processing side is not fast enough). As a workaround we throttled command sending, as suggested here: Perfomance tuning initial load from large context - #3 by allardbz
Be aware that when sending a large number of commands, the receiving side must be fast enough to process them as well, otherwise you’ll get command timeouts (the default is 5 min, I think) - however, I’m not sure whether commands that have already been fetched by the processing side really get cancelled, or whether Axon Server only reports the timeout back to the sending side (which I think is the case).
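To illustrate the throttling workaround mentioned above, here is a minimal sketch of what I mean by capping the number of in-flight commands. It is not the code from the linked post: ThrottledCommandSender, maxInFlight and sendAll are names I made up, and the `send` function is only a stand-in for an asynchronous send that returns a CompletableFuture (for example something like `commandGateway::send`, which is an assumption about your setup).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical helper: caps how many commands are awaiting a result at any one time. */
final class ThrottledCommandSender {

    private final Semaphore permits;
    // Stand-in for an asynchronous command send (assumption, not a framework type).
    private final Function<Object, CompletableFuture<?>> send;

    ThrottledCommandSender(int maxInFlight, Function<Object, CompletableFuture<?>> send) {
        this.permits = new Semaphore(maxInFlight);
        this.send = send;
    }

    /**
     * Dispatches all commands in order. The calling thread blocks whenever
     * maxInFlight commands are still pending. The returned future completes
     * once every command has completed (exceptionally if any of them failed).
     */
    CompletableFuture<Void> sendAll(List<?> commands) {
        CompletableFuture<?>[] results = new CompletableFuture<?>[commands.size()];
        for (int i = 0; i < commands.size(); i++) {
            permits.acquireUninterruptibly(); // wait for a free slot
            CompletableFuture<?> result;
            try {
                result = send.apply(commands.get(i));
            } catch (RuntimeException e) {
                permits.release(); // dispatch itself failed, give the slot back
                throw e;
            }
            // Free the slot when the command completes, successfully or not.
            results[i] = result.whenComplete((r, e) -> permits.release());
        }
        return CompletableFuture.allOf(results);
    }
}
```

Since sendAll blocks the calling thread, it should be called from a thread that is not itself needed to process those commands (so not from inside a command handler's unit of work), otherwise you are back at the blocking problem from the first point. A sensible value for maxInFlight depends on how fast the handling side is and has to stay well below the queue limit.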

I’m currently in the middle of outlining our approach for cascade-deleting child aggregates, and once I reach a final conclusion I would be happy to share it here. Roughly, I plan to validate the creation of child aggregates against a command model (check if the parent exists) and, once an aggregate is deleted, mark it and all its children as deleted in this same command model to block the creation of more children under the deleted parent or any child below it. Then I can simply collect a flat list of all child entities and send a (cascade-)delete command to each one.
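To make the rough plan a bit more concrete, here is a small in-memory sketch of such a command model. It is not our final solution: HierarchyCommandModel and all its method names are hypothetical, it is not thread-safe, and a real version would be persisted and updated consistently with the create/delete handling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical command model: tracks the parent-child tree and deletion marks. */
final class HierarchyCommandModel {

    // Every known aggregate id maps to the ids of its direct children.
    private final Map<String, List<String>> childrenOf = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();

    void registerRoot(String id) {
        childrenOf.putIfAbsent(id, new ArrayList<>());
    }

    /** A child may only be created under a parent that exists and is not marked deleted. */
    boolean canCreateChild(String parentId) {
        return childrenOf.containsKey(parentId) && !deleted.contains(parentId);
    }

    void registerChild(String parentId, String childId) {
        if (!canCreateChild(parentId)) {
            throw new IllegalStateException("Parent missing or deleted: " + parentId);
        }
        childrenOf.get(parentId).add(childId);
        childrenOf.putIfAbsent(childId, new ArrayList<>());
    }

    /**
     * Marks the aggregate and everything below it as deleted and returns the
     * flat list of newly marked descendants (breadth-first, excluding the
     * aggregate itself), i.e. the targets for the cascade-delete commands.
     */
    List<String> markDeleted(String id) {
        if (!childrenOf.containsKey(id)) {
            throw new IllegalArgumentException("Unknown aggregate: " + id);
        }
        List<String> descendants = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        deleted.add(id);
        queue.add(id);
        while (!queue.isEmpty()) {
            for (String child : childrenOf.get(queue.remove())) {
                // Skip subtrees that were already marked by an earlier delete.
                if (deleted.add(child)) {
                    descendants.add(child);
                    queue.add(child);
                }
            }
        }
        return descendants;
    }
}
```

The point of the flat list is that the delete commands no longer depend on each other: nothing has to be propagated command → event → command down the tree, and the list can be sent at whatever rate the handling side can keep up with.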

@rhubarb, I am very curious about how you finally solved the cascade deletion and would be happy if you shared it.

Best Regards,
Jakob