Hey smart people, I have a question about a Map Reduce (NetSuite Professionals, #suitescript)

jen

03/23/2023, 9:35 PM

Hey smart people, I have a question about a Map Reduce. I have a list of about 70,000 things that I need to compare nightly with another list of about 1000 things, using some fuzzy string matching logic. The second list (of 1000 things) is a record type that has a couple of different fields to compare against the bigger list. I’m thinking of using an MR to go through each entry in the big list, but would rather retrieve the smaller list only once (rather than retrieving it from the database again for each of the 70,000 things). Is there a way to get that smaller list once and just pass it around the MR? I suppose I could dump it into every entry of the key-value pairs, but that seems… extreme.

Anthony OConnor

03/23/2023, 9:39 PM

you could maybe use the cache module. I've never used it for anything like a list of 1000 things so I'm not sure what limitations it might have, but that's the first thing I'd look into.

Anthony OConnor

03/23/2023, 9:43 PM

most of my other suggestions would be around making the list sizes smaller 🙂, which isn't what you asked for, and I don't know enough about your data to know how viable that would be; presumably you've thought of this already.

Edgar Valdés

03/23/2023, 9:45 PM

Can't we use a global variable in the map reduce script?

Anthony OConnor

03/23/2023, 9:45 PM

nope, if you have a global it will only be available in the getInputData (GIS) and summarize stages, not the map or reduce stages

Anthony OConnor

03/23/2023, 9:48 PM

could you use a script param? get the list of ~1000 data in a scheduled script or another MR, and then call your main MR with the stringified data as a script param?
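A minimal sketch of the script-parameter idea, assuming a hypothetical long-text parameter `custscript_small_list` on the Map/Reduce deployment and a hypothetical `Candidate` shape. Only the serialize/parse helpers are shown so it runs outside NetSuite; the comments indicate roughly where `task.create` (N/task) and `runtime.getCurrentScript().getParameter` (N/runtime) would plug in.

```typescript
// Hypothetical shape of one entry in the ~1000-item list
interface Candidate {
  id: string
  name: string
  altName: string
}

// Hypothetical parameter ID; assumes a long-text script parameter on the MR deployment
const PARAM_ID = 'custscript_small_list'

// Scheduled script side: build the params object handed to the MR task
function buildTaskParams(candidates: Candidate[]): { [paramId: string]: string } {
  return { [PARAM_ID]: JSON.stringify(candidates) }
}

// MR side: turn the raw parameter value back into the list (empty if unset)
function parseCandidates(rawParam: string | null): Candidate[] {
  return rawParam ? (JSON.parse(rawParam) as Candidate[]) : []
}

// In SuiteScript this would be wired up roughly as (not runnable outside NetSuite):
//   task.create({ taskType: task.TaskType.MAP_REDUCE, scriptId: '...', deploymentId: '...',
//                 params: buildTaskParams(list) }).submit()
//   parseCandidates(runtime.getCurrentScript().getParameter({ name: PARAM_ID }) as string)
```

The parameter is read once per script context and parsed with `JSON.parse`, so the list only has to be fetched from the database by the calling script.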

Anthony OConnor

03/23/2023, 9:52 PM

oh and your initial "extreme" solution, I think, is totally fine. I've done similar before, but again never at that size, I don't think, so I'm not sure at what point you'll run into memory issues
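To make the trade-off of the "extreme" option concrete, here is a sketch of a `getInputData` payload where every big-list row carries the whole small list, plus a rough size estimate. The item shapes are hypothetical; as far as I know, when `getInputData` returns an array, each element is serialized and handed to `map` as that pair's value.

```typescript
// Hypothetical row/candidate shapes; the real fields come from the SuiteQL results
interface BigItem { value: string }
interface Candidate { id: string; name: string }

// One input entry per big-list row, each repeating the full candidate list
function buildInputWithCandidates(bigList: BigItem[], candidates: Candidate[]) {
  return bigList.map(item => ({ item, candidates }))
}

// Rough number of characters serialized across all rows: every row repeats the list's JSON
function estimateSerializedChars(bigListSize: number, candidates: Candidate[]): number {
  return bigListSize * JSON.stringify(candidates).length
}
```

For example, if the 1000-item list serialized to a hypothetical 50,000 characters, 70,000 rows would repeat it for about 3.5 billion characters in total, which is why it feels extreme.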

Anthony OConnor

03/23/2023, 9:56 PM

also... why not just do 70k DB reads? Reads are cheap compared to writes. I know it seems inefficient, but if this is an overnight batch process, I'm not sure that would be an issue... actually, if the 1000 list can be gotten from a search, you could create the search in the UI; I think NS automatically optimizes those searches based on usage, so yours would likely qualify 😄

jen

03/23/2023, 10:06 PM

I’m doing all the searches w/SuiteQL

jen

03/23/2023, 10:06 PM

the 70k isn’t actually 70k records, it’s a UNION of three different SELECT DISTINCTs that has about 70k results
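A sketch of feeding a UNION like that straight into the Map/Reduce, with hypothetical table and column names standing in for the three real SELECTs. My understanding is that `getInputData` can return a SuiteQL object reference of the form `{ type: 'suiteql', query, params }`; treat that shape as an assumption and check the Map/Reduce docs for your account. Note that plain `UNION` (unlike `UNION ALL`) already removes duplicate rows, so the `DISTINCT` in each branch is redundant, though harmless.

```typescript
// Hypothetical branches standing in for the three real SELECT DISTINCTs
const BRANCHES = [
  'SELECT DISTINCT companyname AS label FROM customer WHERE isinactive = ?',
  'SELECT DISTINCT companyname AS label FROM vendor WHERE isinactive = ?',
  'SELECT DISTINCT companyname AS label FROM partner WHERE isinactive = ?',
]

// Joins the branches with UNION, which de-duplicates the combined result
function buildUnionQuery(branches: string[]): string {
  return branches.join('\nUNION\n')
}

// Assumed SuiteQL object-reference shape; one bind value per '?' placeholder, in order
function getInputData() {
  return {
    type: 'suiteql',
    query: buildUnionQuery(BRANCHES),
    params: ['F', 'F', 'F'],
  }
}
```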

jen

03/23/2023, 10:15 PM

I’m thinking N/cache might be the way to go, though this would be my first try with that module. If I can cache my 1000 list at the start, that should work, I think.

Anthony OConnor

03/23/2023, 10:18 PM

it's been a while, but if I remember rightly there are limits on the number of keys you can have in a cache, but you can do something really dumb like nest it and then have as many as you want... something goofy like that, I remember. NS may have "fixed" it since.

tech_ph2019

03/24/2023, 12:35 AM

Make sure to also use the loader function when getting from the cache, so that it repopulates the cache if the value ever comes back null. Also note the maximum size limit (500 KB) for each value you put in the cache.
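A minimal sketch of the loader advice, with the cache passed in through a small interface so it runs outside NetSuite. In a real script you would pass the object returned by `cache.getCache({ name, scope })` from N/cache, whose `get` accepts `key`, `loader` and `ttl` options; `CacheLike` mirrors only that subset and is an assumption, as are the key name, TTL and `Candidate` shape. Character count is used as a rough proxy for the 500 KB per-value limit.

```typescript
// Hypothetical shape of one entry in the ~1000-item list
interface Candidate {
  id: string
  name: string
}

// Assumed subset of N/cache's Cache.get options used by this sketch
interface CacheLike {
  get(options: { key: string; loader?: () => string; ttl?: number }): string | null
}

const CACHE_KEY = 'fuzzy_candidates' // hypothetical key name
const MAX_VALUE_CHARS = 500 * 1024 // ~500 KB per cached value (chars approximate bytes)

function getCandidates(c: CacheLike, loadFromDb: () => Candidate[]): Candidate[] {
  const raw = c.get({
    key: CACHE_KEY,
    // Runs only on a cache miss; its return value is what gets stored
    loader: () => {
      const json = JSON.stringify(loadFromDb())
      if (json.length > MAX_VALUE_CHARS) {
        throw new Error(`Candidate list is ${json.length} chars; too large for one cache value`)
      }
      return json
    },
    ttl: 3600, // seconds; hypothetical value
  })
  return raw ? (JSON.parse(raw) as Candidate[]) : []
}
```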

jen

03/24/2023, 4:51 PM

Thanks guys

Watz

03/24/2023, 8:16 PM

My experience is that a single map instance shares its global variables across iterations until it yields. Load the list at the beginning of the Map stage and it can probably be reused many times, until the instance yields because it has run out of governance points or time. Regarding fuzzy search, we've been using string similarity with good results.
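For the fuzzy-matching side, here is a small Dice's-coefficient scorer over character bigrams in plain TypeScript, in the spirit of the string-similarity package (which, as far as I know, is also based on Dice's coefficient). The normalization (lower-casing, dropping whitespace) and the `bestMatch` helper are assumptions for illustration, not Watz's code.

```typescript
// Counts each two-character substring (bigram) in a string
function bigrams(s: string): Map<string, number> {
  const counts = new Map<string, number>()
  for (let i = 0; i < s.length - 1; i++) {
    const bg = s.slice(i, i + 2)
    counts.set(bg, (counts.get(bg) ?? 0) + 1)
  }
  return counts
}

// Dice's coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b), from 0 to 1
function diceCoefficient(a: string, b: string): number {
  const x = a.toLowerCase().replace(/\s+/g, '')
  const y = b.toLowerCase().replace(/\s+/g, '')
  if (x === y) return 1
  if (x.length < 2 || y.length < 2) return 0
  const bx = bigrams(x)
  const by = bigrams(y)
  let overlap = 0
  bx.forEach((n, bg) => {
    overlap += Math.min(n, by.get(bg) ?? 0)
  })
  return (2 * overlap) / (x.length - 1 + (y.length - 1))
}

// Highest-scoring candidate; index is -1 if nothing shares a bigram with the target
function bestMatch(target: string, candidates: string[]): { index: number; rating: number } {
  let best = { index: -1, rating: 0 }
  candidates.forEach((c, i) => {
    const rating = diceCoefficient(target, c)
    if (rating > best.rating) best = { index: i, rating }
  })
  return best
}
```

At 70,000 × 1,000 that is 70 million comparisons, so precomputing each candidate's bigram map once per map instance, alongside the loaded list, would avoid redoing that work per key-value pair.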

Anthony OConnor

03/24/2023, 9:51 PM

...if you're loading the list at the beginning of the map stage, and there are 70,000 map stages, you're going to load the list 70,000 times?

Watz

03/24/2023, 9:51 PM

One map instance will run multiple key-value pairs.

Anthony OConnor

03/24/2023, 9:52 PM

how do i write code to execute in the map instance, but not for each key-value pair?

Watz

03/24/2023, 9:55 PM

We have this function in a map/reduce. fetchEmployees() is called at the beginning of the map function, and then we access EMPLOYEE_LIST throughout the map instance.

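import * as search from 'N/search'

// Assumed constants; the original references EMPLOYEE.FIELDS.ENTITYID from a module not shown
const EMPLOYEE = { FIELDS: { ENTITYID: 'entityid' } } as const

// Module-level state: set up once per map instance (per Watz's experience), then reused
// by every key-value pair that instance processes until it yields
const EMPLOYEE_LIST: { fetched: boolean; employees: { id: string; entityid: string }[] } = {
    fetched: false,
    employees: [],
}
const fetchEmployees = () => {
    if (!EMPLOYEE_LIST.fetched) {
        // Note: ResultSet.each() stops after 4,000 results; use runPaged() for larger sets
        search.create({
            type: search.Type.EMPLOYEE,
            filters: [['isinactive', search.Operator.IS, 'F']],
            columns: [EMPLOYEE.FIELDS.ENTITYID],
        }).run().each(result => {
            const entityId = result.getValue(EMPLOYEE.FIELDS.ENTITYID) as string
            EMPLOYEE_LIST.employees.push({
                id: result.id,
                [EMPLOYEE.FIELDS.ENTITYID]: entityId.toLowerCase(),
            })
            return true
        })
        EMPLOYEE_LIST.fetched = true
    }
}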
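The same once-per-instance idea, reduced to plain TypeScript so the behaviour can be checked outside NetSuite. `loadSmallList` is a hypothetical stand-in for the real SuiteQL or search call, and the map context is simplified to a key, a value and a write callback. Per Watz's description, NetSuite evaluates the module fresh for each map instance, so in a real script the load would run once per instance rather than once per key-value pair.

```typescript
// Hypothetical stand-in for the expensive SuiteQL/search call; counts its invocations
let loadCount = 0
function loadSmallList(): string[] {
  loadCount++
  return ['acme corp', 'apex co']
}

// Module-level cache: empty when the module is evaluated, filled on first use,
// then shared by every key-value pair handled in the same script context
let smallList: string[] | null = null
function getSmallList(): string[] {
  if (smallList === null) {
    smallList = loadSmallList()
  }
  return smallList
}

// Simplified map context; the real one is NetSuite's map context object
interface SimpleMapContext {
  key: string
  value: string
  write: (key: string, value: string) => void
}

function map(context: SimpleMapContext): void {
  const candidates = getSmallList()
  // Exact-match placeholder; swap in a fuzzy scorer for real use
  const hit = candidates.find(c => c === context.value.toLowerCase())
  if (hit !== undefined) {
    context.write(context.key, hit)
  }
}
```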

Anthony OConnor

03/24/2023, 9:58 PM

so your EMPLOYEE_LIST declaration is outside the map? And when a new instance is created, it will reset the global to empty?

Watz

03/24/2023, 9:59 PM

Yes

Watz

03/24/2023, 10:00 PM

Technically, I'm thinking that when a new instance is created, the global doesn't exist in that context yet, so it's not reset per se.

Anthony OConnor

03/24/2023, 10:01 PM

right, what I mean is, the declaration will be run again in the instance...

Watz

03/24/2023, 10:01 PM

Yes, once per instance.

Anthony OConnor

03/24/2023, 10:02 PM

good to know, presumably that works in the reduce stage too

Watz

03/24/2023, 10:02 PM

I'd assume so
