
Fabric Surge Protection Testing Notes

This blog post summarizes some learnings from my first and second surge protection experiments.



What is so perfect about Friday evenings?

🤓 You have the chance to kill your capacity, because it will recover by Monday


...so I did, one week after Surge Protection was released


Use-Case

Surge protection is meant to protect your capacity from being overloaded by background jobs to the point where users can no longer use the data products.


There are two parameters under Capacity Settings > Surge Protection: the background threshold and the recovery threshold.




Setup

I set the background threshold to 75% and the recovery threshold to 50%.

(After the test I understood that I had originally misunderstood something about their interdependence.)


I created a Dataflow Gen2 called "Capacity Crasher" that consists mostly of permutations that produce heavy load on the capacity.




First Steps

I started the Dataflow 3 times in a row, with about 30 min runtime per run, and the capacity reached 75%.



Other background jobs started to fail with a message that told me about the Surge Protection Limits ✅ 



First Night Run

I expected the CU usage to recover fast and scheduled a pipeline for 2 AM to start the Dataflow again and again until 1 PM on Saturday, to get more test results.


When I woke up, I noticed what you can see in the chart:


- the CU usage went neither down nor up during the night


- the Dataflow starts during the night failed, unfortunately with no messages or useful logs



...and the strangest thing:


At 7:45 AM the Dataflow started successfully and ended about 30 min later.



How could this be possible when the load was already at about 95%?



🔍 I looked closer at the detail logs and noticed:


Even in the morning, the Dataflow runs from the evening were in the list with about 18% CU usage each.
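
To make this carry-over concrete, here is a minimal Python sketch. It assumes that background jobs are smoothed over a 24-hour window, and it treats the roughly 18% per run from the detail logs as a constant share for that whole window. The run end times and the function name smoothed_share are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Assumption: background operations are smoothed over 24 hours.
SMOOTHING_WINDOW = timedelta(hours=24)


def smoothed_share(job_end, share_pct, at):
    """Return the CU share (in % of capacity) a finished job still contributes at time `at`."""
    if job_end <= at < job_end + SMOOTHING_WINDOW:
        return share_pct
    return 0.0


# Hypothetical end times of the three Friday evening runs.
runs = [
    datetime(2025, 1, 3, 18, 30),
    datetime(2025, 1, 3, 19, 0),
    datetime(2025, 1, 3, 19, 30),
]
saturday_morning = datetime(2025, 1, 4, 7, 45)

# Each run showed about 18% CU usage in the detail logs.
carried = sum(smoothed_share(end, 18.0, saturday_morning) for end in runs)
print(f"Carried over from Friday evening: {carried:.0f}%")
```

Under these assumptions, all three evening runs are still inside their smoothing window on Saturday morning, so their usage is still counted long after the runs themselves have finished.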




Conclusions

Smoothing was active the whole time, even at the start when the CU usage was under 75%.


Every background job creates "recovery debt", and the 95% at 8:45 AM was mostly recovery usage, so the other background usage WAS under 75% and the Dataflow started.


Background and recovery must be viewed separately, and they add up when it comes to total consumption. Even though recovery happens in the background, it does not count as part of the background jobs when it comes to CU consumption.


...before the experiment I had expected recovery to start when the load gets over 75%, and I had expected the 50% to be part of the 75%


but it turned out that the 75% and 50% add up, and the usage can still get over 100%


...at least this is my conclusion - I appreciate feedback if anyone has other ideas 🙂 



Key Learning

If you want to reserve CU for interactive events, the sum of the background and recovery thresholds should be less than 100%.
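
As a quick sanity check, this rule can be written as a small Python sketch. It only encodes my conclusion that the two thresholds add up, which is not confirmed product behavior, and the function name interactive_headroom is made up for this example. The two pairs are the settings used in these experiments.

```python
def interactive_headroom(background_pct, recovery_pct):
    """Headroom (in % of capacity) left for interactive use,
    assuming the background and recovery thresholds add up."""
    return 100.0 - (background_pct + recovery_pct)


# Settings from the first (75% / 50%) and second (60% / 30%) experiment.
for background, recovery in [(75, 50), (60, 30)]:
    headroom = interactive_headroom(background, recovery)
    status = "ok" if headroom > 0 else "can exceed 100%"
    print(f"{background}% + {recovery}% -> headroom {headroom:+.0f}% ({status})")
```

With 75% / 50% the headroom is negative, which matches the overload seen in the first run. With 60% / 30% about 10% stays free for interactive events.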




Second Night Run

To verify my conclusion, I did a second run with a 60% / 30% setting.


Above 90% the new runs were rejected, keeping the CU consumption of the capacity under the 100% line.





 
 
 

1 comment

Gabriel Melo
Apr 14

Hello. I would like to know if you managed to find a good Background Rejection -> Background usage relation. In my Power BI capacities I set 50% on surge protection to limit background to 80%
