We're seeing a massive performance drop in our Map/Reduce scripts
# performance
m (original poster):
We're seeing a massive performance drop in our Map/Reduce scripts this week, going from a few minutes to over 30. It's all hanging up in the map stage. We didn't make any changes, but we did get the 24.1 pre-patch update on Sunday, and it's the only thing I can think of that is different. Is anyone else seeing performance issues?
i (Israel Gonzalez):
Did you get any resolution on this? We've been having some issues with scheduled and Map/Reduce scripts as well. I checked APM and there is a dramatic performance difference between the week before and the week after our 24.1 pre-upgrade patch.
m:
I think I have something. Prior to the patch, the old recommendation was that I use only one processor to avoid potential issues with duplicates and the like. A contact of mine there reached out to the support team and suggested that we turn off (as in leave blank) the concurrency limit on the deployment and let them manage it (yeah, that sounds frightening to me too). I've tested it twice this morning so far, and we're back down to 5 minutes, so that seems to be the answer. At least it's worth trying.
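For anyone who would rather script the same change than edit the deployment in the UI, here is a minimal SuiteScript 2.1 sketch (written in TypeScript). The deployment internal ID and the 'concurrencylimit' field ID are assumptions on my part, so double-check them in your account; clearing the Concurrency Limit field on the deployment record form accomplishes the same thing.

```typescript
/**
 * Hypothetical sketch: blank the Concurrency Limit on a Map/Reduce
 * script deployment so NetSuite manages concurrency itself.
 * Assumes SuiteScript 2.1 with TypeScript definitions; the field ID
 * 'concurrencylimit' is an assumption and should be verified.
 */
import record from 'N/record';

export function clearConcurrencyLimit(deploymentInternalId: number): void {
    // Load the existing Map/Reduce script deployment record.
    const deployment = record.load({
        type: record.Type.SCRIPT_DEPLOYMENT,
        id: deploymentInternalId,
    });

    // An empty value means "no fixed limit": let NetSuite decide how
    // many processors the map stage can use.
    deployment.setValue({ fieldId: 'concurrencylimit', value: '' });

    deployment.save();
}
```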
i:
Thanks. Our issue is a little different but seems like it may be related. Our scheduled and Map/Reduce scripts are getting stuck in the initial Pending/Waiting state for 5-15 minutes, whereas previously it was closer to 5-15 seconds. We have enough processors, so it's not an issue of them maxing out.
m:
For us it wasn't maxing out, it was limiting. I have no evidence of this, but I think that before the patch, if we assigned one processor and others were free, it would use them. Now (again, no proof) it seems to lock to the number of concurrent processors we set on the deployment.
t (Timothy Wong):
@Israel Gonzalez I've had this issue before during upgrades. My hypothesis was that during an upgrade, CPU for the entire cluster (yours and the other accounts on it) is downgraded because it's being consumed by the upgrade. That leaves capacity unused and jobs stuck in Pending, with 5-15 minutes between a job's date created and its start date, so you get lower performance than usual.
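If it helps anyone quantify that gap, below is a rough SuiteScript 2.1 sketch (TypeScript) that searches recent scheduled script instances and logs the difference between each job's Date Created and Start Date. The 'scheduledscriptinstance' search type, the column and filter IDs, and the date handling are assumptions based on how this is commonly done; the same numbers are visible in an equivalent saved search or on the script status page.

```typescript
/**
 * Hypothetical sketch: log how long recent scheduled / Map/Reduce jobs
 * sat in the queue (Start Date minus Date Created).
 * The 'scheduledscriptinstance' search type and column IDs are
 * assumptions; verify them in your account first.
 */
import search from 'N/search';
import log from 'N/log';

export function logRecentQueueWaits(): void {
    search
        .create({
            type: 'scheduledscriptinstance',
            filters: [['startdate', 'within', 'lastweektodate']],
            columns: ['script', 'status', 'datecreated', 'startdate'],
        })
        .run()
        .each((result) => {
            // Values come back as strings formatted per account
            // preferences, so the Date parsing may need adjusting.
            const created = new Date(result.getValue({ name: 'datecreated' }) as string);
            const started = new Date(result.getValue({ name: 'startdate' }) as string);
            const waitMinutes = (started.getTime() - created.getTime()) / 60000;

            log.audit('Queue wait', {
                script: result.getText({ name: 'script' }),
                status: result.getValue({ name: 'status' }),
                waitMinutes: Math.round(waitMinutes),
            });

            return true; // continue through the result set
        });
}
```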
i:
@Timothy Wong does this continue after the upgrade, or are you referring only to the time while the upgrade is happening? Our upgrade was two weeks ago and we are still seeing wait times of ~15 minutes. We already have a NetSuite case open and it is currently being investigated, but there's still no update.