# suitescript
m
What is the best way to handle this case?... a script with multiple deployments, all accumulating certain amounts from 100K records and saving it in the same journal? Right now, they are overriding each other. Is there something similar to transactions in NS? Or do you recommend using N/cache? (which I fear a little bit)
r
Wouldn't optimizing the script and using a single deployment be an option? What exactly happens in each stage? Can you elaborate?
m
That's what we were doing initially, but it was taking a lot of time. The script goes over a record called shipment, and the amounts that get calculated for each shipment and added to the journal (we create one journal per customer per day) differ based on the type of customer and other factors. So what we are doing now (we actually rolled it back, but this was the idea)... is creating a deployment for customers with odd internal IDs and another for the ones with even IDs, and we do that with shipment IDs too. But we kind of forgot that the deployments will keep overriding each other while saving to the same journal.
We just thought of a solution... We will just divide the customers into groups and create a deployment for each group. No deployment would override another that way. We will just have to divide the groups so that the average number of shipments per day is roughly the same across groups. This will also scale well, without the need to handle any parallel calculations.
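A minimal sketch of how the group-per-deployment idea could be scoped, assuming each deployment gets a script parameter naming its customer group and the customers carry a matching group field. All ids here (custscript_customer_group, custentity_je_group, customrecord_shipment, custrecord_ship_*) are hypothetical placeholders, not the actual record model.

```javascript
// Sketch only: scoping one deployment's input search to a single customer group.
define(['N/search', 'N/runtime'], (search, runtime) => {

    function shipmentsForThisDeployment() {
        // Hypothetical deployment parameter holding the group this deployment owns.
        const group = runtime.getCurrentScript()
            .getParameter({ name: 'custscript_customer_group' });

        // Only shipments whose customer belongs to this group, so no two
        // deployments ever upsert the same customer's journal.
        return search.create({
            type: 'customrecord_shipment',
            filters: [
                ['custrecord_ship_processed', 'is', 'F'], 'AND',
                ['custrecord_ship_customer.custentity_je_group', 'anyof', group]
            ],
            columns: ['custrecord_ship_customer', 'custrecord_ship_amount']
        });
    }

    return { shipmentsForThisDeployment };
});
```

Since the groups partition the customers, each journal has exactly one deployment writing to it, which is what removes the overriding problem described above.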
a
you haven't even told us what type of script this is? are you using a scheduled script? a map/reduce?
it's not obvious to me what is taking the time and making it slow... 100K read operations and a single write operation? how long is the script taking? what is "too slow"?
m
Sorry, it is a map/reduce (the one that creates the journals). The script fetches 400 shipments... maps them to customers, calculates the amounts, and upserts the journal. There are also around 5 scheduled scripts that usually run at the same time; some of them process new shipments (just creating them, not calculating anything), and some of them send data to the client's server. All of that usually took around 8 hours. Now the number of shipments is sometimes more than 200K per day, and the script that aggregates the amounts runs all day... and still some shipments do not get processed, and we have to change their dates (using another scheduled script) so they get processed the next day.
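For reference, a rough skeleton of the flow described here: the map stage keys each shipment by customer so one reduce invocation sees all of that customer's shipments, totals them, writes the journal, and stamps the shipments. Every record/field id, the subsidiary, and the account numbers are placeholders, and the real per-customer/per-day "upsert" is simplified to a plain create.

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search', 'N/record'], (search, record) => {

    const getInputData = () => search.create({
        type: 'customrecord_shipment',                       // hypothetical record id
        filters: [['custrecord_ship_processed', 'is', 'F']],
        columns: ['custrecord_ship_customer', 'custrecord_ship_amount']
    });

    // Key by customer: every shipment of a customer lands in the same reduce call.
    const map = (context) => {
        const result = JSON.parse(context.value);
        context.write({
            key: result.values.custrecord_ship_customer.value,
            value: {
                shipmentId: result.id,
                amount: parseFloat(result.values.custrecord_ship_amount) || 0
            }
        });
    };

    // One reduce call per customer: aggregate, write one journal, stamp the shipments.
    const reduce = (context) => {
        const shipments = context.values.map((v) => JSON.parse(v));
        const total = shipments.reduce((sum, s) => sum + s.amount, 0);

        const je = record.create({ type: record.Type.JOURNAL_ENTRY, isDynamic: true });
        je.setValue({ fieldId: 'subsidiary', value: 1 });                                  // placeholder
        je.selectNewLine({ sublistId: 'line' });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'account', value: 111 });  // placeholder
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'debit', value: total });
        je.commitLine({ sublistId: 'line' });
        je.selectNewLine({ sublistId: 'line' });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'account', value: 222 });  // placeholder
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'credit', value: total });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'entity', value: context.key });
        je.commitLine({ sublistId: 'line' });
        const jeId = je.save();

        // Write the calculation result back onto each shipment as well
        // (the extra writes mentioned later in the thread).
        shipments.forEach((s) => record.submitFields({
            type: 'customrecord_shipment',
            id: s.shipmentId,
            values: { custrecord_ship_processed: true, custrecord_ship_journal: jeId }
        }));
    };

    return { getInputData, map, reduce };
});
```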
r
How many journals is it creating/updating? Is the create/update of those journals happening in the map or the reduce stage? I feel like you are overcomplicating your solution. Ideally it could all happen in a single map/reduce script deployment and should not take longer than an hour (depending on the volume of transactions affected, not on the number of customers you have to deal with), unless other things are slowing the operations down.
a
8 hours? what? that's insane, something you're doing is wildly inefficient
m
It is straightforward. Can I share it in private and take some of your time to take a look at it?
a
no thanks, it's my Sunday, I'm not volunteering for work 😄
m
Still thankful... I will try to find the issue.
I tried to check... in a certain run, it fetched 134 shipments, and ended up upserting 62 journals, in 3 minutes.
Will do some debugging...
a
wait i thought you said it only ever created 1 JE?
I guess I misunderstood
m
No, 1 journal for each customer. Customers here are the companies whose shipments our client delivers. They used to deliver or ship around 100K shipments per day, and now it is more than double that. So 200K shipments... I don't remember how many customers... let's say 5K... we need to process the shipments and create a journal for each customer.
The script in a single run can't handle this amount for sure. So we have to accumulate it.
a
why does each customer need its own JE? customer is at the line level on a JE, you can have multiple customers per JE
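To picture that suggestion: on a journal the customer goes on each line via the entity column, so one JE can carry many customers. A sketch under that assumption (subsidiary and account internal ids are placeholders):

```javascript
// Sketch: one journal with a line per customer, plus a single balancing line.
define(['N/record'], (record) => {

    function createSummaryJournal(perCustomerTotals) {
        // perCustomerTotals: e.g. [{ customerId: 101, amount: 250.0 }, ...]
        const je = record.create({ type: record.Type.JOURNAL_ENTRY, isDynamic: true });
        je.setValue({ fieldId: 'subsidiary', value: 1 });                                     // placeholder

        let grandTotal = 0;
        perCustomerTotals.forEach((row) => {
            grandTotal += row.amount;
            je.selectNewLine({ sublistId: 'line' });
            je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'account', value: 111 }); // placeholder
            je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'credit', value: row.amount });
            // 'entity' on the journal line is where the customer sits.
            je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'entity', value: row.customerId });
            je.commitLine({ sublistId: 'line' });
        });

        // One balancing debit line for the whole journal.
        je.selectNewLine({ sublistId: 'line' });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'account', value: 222 });     // placeholder
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'debit', value: grandTotal });
        je.commitLine({ sublistId: 'line' });

        return je.save();
    }

    return { createSummaryJournal };
});
```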
m
Would that make a difference?
That would be even harder.
And you can't split the script into multiple deployments anymore.
a
I'm wondering if you could do 1 JE per shipment, with multiple customers, and then just have it be a UE on whatever this shipment record is instead of a scheduled MR process
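A sketch of that idea, assuming the shipment is a custom record with hypothetical field ids; the thread drops this approach right after because of the line counts, so this is only to illustrate the shape of the suggestion.

```javascript
/**
 * @NApiVersion 2.1
 * @NScriptType UserEventScript
 */
// Sketch: create one journal per shipment from an afterSubmit user event.
define(['N/record'], (record) => {

    const afterSubmit = (context) => {
        if (context.type !== context.UserEventType.CREATE) {
            return;
        }
        const shipment = context.newRecord;
        const amount = parseFloat(shipment.getValue({ fieldId: 'custrecord_ship_amount' })) || 0; // hypothetical
        const customerId = shipment.getValue({ fieldId: 'custrecord_ship_customer' });            // hypothetical

        const je = record.create({ type: record.Type.JOURNAL_ENTRY, isDynamic: true });
        je.setValue({ fieldId: 'subsidiary', value: 1 });                                  // placeholder
        je.selectNewLine({ sublistId: 'line' });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'account', value: 111 });  // placeholder
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'debit', value: amount });
        je.commitLine({ sublistId: 'line' });
        je.selectNewLine({ sublistId: 'line' });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'account', value: 222 });  // placeholder
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'credit', value: amount });
        je.setCurrentSublistValue({ sublistId: 'line', fieldId: 'entity', value: customerId });
        je.commitLine({ sublistId: 'line' });
        je.save();
    };

    return { afterSubmit };
});
```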
m
That's what we were doing at the beginning, but we started to hit the transaction line limit.
a
the transaction line limit on a JE? with one JE per shipment?
m
The tier lines limit. With a journal for each shipment, you get around (200K * 5) lines. With a journal for each customer, you get around (5K * 5) lines.
Way less
a
right, account tiers, I wasn't thinking of that
m
Do you still think that this is slow?
it fetched 134 shipments, and ended up upserting 62 journals, in 3 minutes
a
not at all, I was thinking a single write operation... you have 62 write operations, that's actually pretty performant
m
Great! You made me doubt everything. 😅
a
sorry!
m
No... thanks very much for your time. Hearing another opinion about it is always good.
a
so what are these JEs doing from a business/accounting perspective? moving funds associated with shipping costs from one account to another?
m
I am not aware of all of it. But it records how much money they should keep (shipping costs) and how much money they should send back to the customers. And maybe some other stuff that I am not aware of.
btw, the script doesn't only upsert journals, it also writes the result of the calculation back onto the shipments. So in our case, in those 3 mins, it wasn't only 62 writes, it was around 190.
a
okay then yeah that's performing well it seems, I don't think you're gonna gain efficiency by changing the code itself. I'd want to take a step back and look more broadly at the whole process... what are we trying to do? is this approach the best way to handle it?
I'm not sure that will give you anything either, it really sounds like it's just a bitch of a process, but maybe when you look at the bigger picture you realize: oh hey, we thought we needed this level of detail, but we actually don't, we can summarize instead and not do all this work at all... or something
but you'll be getting into tradeoff territory most likely, which means convincing some business person that they don't actually need something they currently insist that they need 😉
m
Yeah, exactly. But this is more functional work than technical work. I remember telling them at the beginning to keep these calculations on their system and send us only the results... they refused, they want NS to do this work. When you ship something through a shipping company, the shipping company takes the money from the buyer and returns it to you. It may deduct its costs from that money, or give it all back to you and then ask for its costs separately. That depends on the contract. They also issue these payouts based on the contracts. So some customers want their money the next day, some want it once per week, etc. And there are other factors too. You need to create journals based on these factors and decide how much money should be sent to customers today, make the bank transfer, and send the information back to the client's system so the customer can see on their dashboards that their shipments got delivered, the bank transfer was made, and it is linked to these specific shipments.
This is a summary of the process. There are other factors too, as I mentioned, like whether they were delivering something or the buyer was returning something, and whether the buyer paid online... these are conditions that affect the journals at the end.
I wonder whether N/cache would help to make it even faster or not. I usually think twice before I rely on cache.
a
depends what you're doing... if you're querying the same thing over and over somewhere then yeah putting in cache will make the access to it much faster
m
Important: A cached value is not guaranteed to stay in the cache for the full duration of the ttl value. The ttl value represents the maximum time that the cached value may be stored. Cached data is not persistent, so you should consider using the Cache.get(options) method with the options.loader parameter to set and retrieve data.
Ignore it 🙃
...
a
that's just saying the best practice for cache is to get it from the cache, but if it's not in the cache, do the search and load it INTO the cache... you'd never NOT set up cache that way, so it's weird they express it in the help like that
like it's some kind of gotcha
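A small sketch of the pattern described here: Cache.get with a loader, so a cache miss falls through to the lookup and the result is written into the cache automatically. The cache name, customer field ids, and TTL are assumptions for illustration.

```javascript
// Sketch: read-through caching of per-customer config with N/cache's loader.
define(['N/cache', 'N/search'], (cache, search) => {

    function getCustomerConfig(customerId) {
        const configCache = cache.getCache({
            name: 'customer_config',                 // hypothetical cache name
            scope: cache.Scope.PROTECTED
        });

        const raw = configCache.get({
            key: 'customer_' + customerId,
            ttl: 3600,                               // seconds; an upper bound, not a guarantee
            // The loader only runs on a cache miss; its return value is stored in the cache.
            loader: () => {
                const fields = search.lookupFields({
                    type: search.Type.CUSTOMER,
                    id: customerId,
                    columns: ['terms', 'custentity_payout_schedule'] // hypothetical field
                });
                return JSON.stringify(fields);
            }
        });

        return JSON.parse(raw);
    }

    return { getCustomerConfig };
});
```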
m
depends what you're doing
Yeah, it could be useful later if I wanted to speed up some parts, but not for the solution I had in mind right now. I was thinking of relying on it 100%: not saving the journal until all the accumulation ends... not gonna work, I guess.
Hmm, got it.
Makes sense.
a
oh wow yeah I would only ever store search/query results in cache to make them quickly accessible, I wouldn't write and update the values to it mid process
m
Got it.
Thanks ❤️
Will keep it in mind, and maybe use it in a different way later to speed up small parts of the code.