# suiteql
a
I’m trying to reconstruct the transaction detail report for a given period using the query in 🧵, but I’m getting way more line items back than what shows up if I pull the report directly from the UI. Does anyone know why? The main difference comes from a greater-than-expected number of `CustInvc` type rows, which makes me think the query is pulling in draft objects that the transaction detail report hides. How can I exclude those?
```sql
SELECT
    tl.id AS transaction_line_id,
    tal.amount,
    tal.account,
    BUILTIN.DF(tal.account) AS account_name,
    t.id,
    t.type,
    t.trandate,
    t.tranid,
    t.memo AS memo,
    t.trandisplayname AS name,
    t.currency,
    tal.exchangerate,
    t.postingperiod,
    tl.subsidiary,
    BUILTIN.DF(t.status) AS status_description
FROM transactionLine tl
LEFT JOIN transaction t
    ON tl.transaction = t.id
LEFT JOIN transactionAccountingLine tal
    ON (tl.id = tal.transactionline AND tl.transaction = tal.transaction)
WHERE t.posting = 'T' AND t.postingperiod = '152'
ORDER BY transaction_line_id
```
c
What do you mean by "draft object"? Are you seeing multiple `tal.id`s per `tl.id`?
a
Multiple `tal.id`s per `t.id` are expected. I’m actually not counting `tal.id`s (you can see I’m not selecting them), but I could. By draft I mean transactions that have not actually posted.
c
Whether or not you're selecting `tal.id`, you're getting back results from multiple `tal` lines, correct?
a
yeah, but that’s expected right? as in we want all line items in every transaction
c
It is expected, just checking to see whether this was contributing to the extra line count you're experiencing.
The environment I'm familiar with working in does not have any kind of Draft status for customer invoices -- what's that about?
a
I am not sure - that’s what one of the accountants on the team mentioned could be a reason for the discrepancy
c
Can you compare the difference between the number of lines you expect and the number you're getting?
Like you expect to see x but you get y?
a
yeah, it’s like 100k over, and most of them come from invoices, as mentioned. A few types match 1:1. Counts by type from the transaction detail report:
```
Bill                     2213
Bill Credit              50
Bill Payment             1610
Credit Memo              16471
Currency Revaluation     166
Customer Refund          2080
Deposit                  103188
Invoice                  279945
Journal                  14831
Payment                  182136
Transfer                 20
Grand Total              602711
```
vs from the API:
```
CustCred                 22206
CustInvc                 374884
CustPymt                 182134
CustRfnd                 2082
Deposit                  103188
FxReval                  166
Journal                  14829
Transfer                 18
VendBill                 2038
VendCred                 32
VendPymt                 805
Grand Total              702382
```
If I create a composite key of `t.id` and `transaction_line_id`, I can tell there are some uniqueness problems here, too… total rows = 702382, unique rows = 700753.
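One way to check whether that duplication comes from the query itself (rather than from the export) is to group on the composite key directly. This is just a sketch against the same tables and filters as the main query; the `HAVING` surfaces keys that come back more than once:
```sql
-- Sketch: surface (transaction, line) keys that appear more than once
-- using the same joins and filters as the main query.
SELECT
    tl.transaction,
    tl.id AS transaction_line_id,
    COUNT(*) AS row_count
FROM transactionLine tl
LEFT JOIN transaction t
    ON tl.transaction = t.id
LEFT JOIN transactionAccountingLine tal
    ON (tl.id = tal.transactionline AND tl.transaction = tal.transaction)
WHERE t.posting = 'T' AND t.postingperiod = '152'
GROUP BY tl.transaction, tl.id
HAVING COUNT(*) > 1
```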
t
Is it possible that some of the transaction accounting lines aren't posting? If so, then adding this additional filter might help: `AND ( TAL.Posting = 'T' )`
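In context, that just extends the existing WHERE clause, e.g. (sketch):
```sql
WHERE t.posting = 'T'
  AND t.postingperiod = '152'
  AND tal.posting = 'T'
```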
a
That helped a lot @tdietrich! Down to 603422, though still not there… I am now running another batch with `tl.posting = 'T'` to see if that gets rid of the remaining ones, since ultimately this could come from simple join explosions.
Which by the way makes me wonder… could this be further optimized by using CTEs? I tried a couple of things and kept getting syntax errors, so ultimately I decided to let it be. Adding these `and`s to the `where` clause made it go from ~25s/1k records to ~35s/1k. Are CTEs and their behavior in SuiteQL well documented anywhere?
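For reference, the kind of CTE restructuring in question would look roughly like this in standard SQL. This is a sketch only; whether SuiteQL accepts `WITH` clauses at all is unclear, which may explain the syntax errors:
```sql
-- Standard-SQL sketch of the CTE idea: pre-filter transactions, then join.
-- May not parse in SuiteQL.
WITH posted_txn AS (
    SELECT id, type, trandate, tranid, memo, trandisplayname,
           currency, postingperiod, status
    FROM transaction
    WHERE posting = 'T' AND postingperiod = '152'
)
SELECT
    tl.id AS transaction_line_id,
    tal.amount,
    tal.account,
    t.id,
    t.type,
    t.trandate
FROM posted_txn t
INNER JOIN transactionLine tl
    ON tl.transaction = t.id
INNER JOIN transactionAccountingLine tal
    ON tal.transaction = tl.transaction AND tal.transactionline = tl.id
WHERE tal.posting = 'T'
ORDER BY transaction_line_id
```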
t
Glad that helped. You might also want to filter out voided transactions:
```sql
AND ( T.Voided = 'F' )
AND ( T.Void = 'F' )
```
I don't know if CTEs will be of much help with what you're trying to do. I would optimize your query by doing a few simple things, such as starting with the Transaction table and using INNER JOINs to get the related transaction lines and the transaction accounting lines. For the WHERE clause, add the most restrictive filters first. For example, I'd specify `postingperiod = '152'` first.
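Putting those suggestions together, the reworked query might look something like this (an untested sketch reusing the column list from the original query):
```sql
SELECT
    tl.id AS transaction_line_id,
    tal.amount,
    tal.account,
    BUILTIN.DF(tal.account) AS account_name,
    t.id,
    t.type,
    t.trandate,
    t.tranid,
    t.memo AS memo,
    t.trandisplayname AS name,
    t.currency,
    tal.exchangerate,
    t.postingperiod,
    tl.subsidiary,
    BUILTIN.DF(t.status) AS status_description
FROM transaction t
INNER JOIN transactionLine tl
    ON tl.transaction = t.id
INNER JOIN transactionAccountingLine tal
    ON tal.transaction = tl.transaction AND tal.transactionline = tl.id
WHERE t.postingperiod = '152'   -- most restrictive filter first
  AND t.posting = 'T'
  AND tal.posting = 'T'
  AND t.voided = 'F'
ORDER BY transaction_line_id
```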
We don't have access to query execution plans, so optimizing queries is tricky.
a
Yeah, I did some research around that, too… my guess was that filtering in CTEs would reduce both sides of the join, but then again, with no query plan it's hard to tell.
Filtering on `tl.posting` actually didn't make any difference, by the way.
t
If you get desperate, use SELECT DISTINCT. Not ideal, but sometimes it comes to that.
a
Flipping the filter order and moving to `INNER JOIN`s did seem to speed things up, back to ~20-25 seconds.
@tdietrich I ended up taking your idea of running the `DISTINCT` (I do want to get to the bottom of it, but wanted to get the ball rolling…) and interestingly I got fewer results compared to without the distinct, but I still got some dupes. It looks like there are missing values on both sides. Is it possible that I’m hitting some kind of internal cache limit? Maybe it’s because I’m doing an order by `transaction_line_id` and those are not granular enough? I am using your rownum pagination trick, so maybe there’s something going on with the cursor at the boundaries? I spun up another one ordering by transaction id, but that didn’t seem to fix it either.
Still haven’t really closed the loop all the way here, but as a heads-up for folks, one of the issues I was getting was that lines were being duplicated with different formatting… for example, getting back a row with the transaction date = `2023-03-10` and another with `2023-3-10`, which I then have to dedupe… adding `to_char(trandate, 'YYYY-MM-DD') as transaction_date,` dropped the error rate quite a bit because those rows then get deduped correctly. Now I’m down to the last 0.1% of errors, which are mostly AR lines missing on payment records… All of these are a single line missing in a two-line transaction. Feels almost like an off-by-one.
@tdietrich I found the bug, and it was an off-by-one hiding in plain sight in your `ROWNUM` pagination recommendation. I was iterating 1k records at a time, so going `BETWEEN 0 AND 1000`, then `BETWEEN 1000 AND 2000`, etc. This causes an off-by-one when the 1000th record is the first line of a two-leg transaction, which then gets skipped by the cursor. The solution is to paginate instead from 0 to 999 and from 1000 to 1999, ensuring you pull 1k records on each run with no duplication. Also, FWIW it seems like this trick breaks once you get past 1M records in a query; after that you have to introduce additional partitioning.
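For anyone who hits this later, here's roughly what the fixed pagination wrapper looks like. It's a sketch of the usual Oracle-style `ROWNUM` pattern with non-overlapping page bounds; the inner query is just a stand-in for the full query from earlier in the thread:
```sql
-- Sketch: ROWNUM pagination with non-overlapping page bounds.
-- Page 1: BETWEEN 0 AND 999, Page 2: BETWEEN 1000 AND 1999, and so on.
SELECT *
FROM (
    SELECT ROWNUM AS rn, paged.*
    FROM (
        -- Stand-in for the full transaction / line / accounting-line query,
        -- including its ORDER BY so row numbering is stable across pages.
        SELECT tl.id AS transaction_line_id, t.id AS transaction_id
        FROM transaction t
        INNER JOIN transactionLine tl
            ON tl.transaction = t.id
        WHERE t.posting = 'T' AND t.postingperiod = '152'
        ORDER BY tl.id
    ) paged
)
WHERE rn BETWEEN 0 AND 999   -- next batch: BETWEEN 1000 AND 1999
```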