0

Apologies in advance if this is a common question, I tried researching it but can't seem to find something that fits.

I have a query that pulls data the way I like but would like to add a parameter that will tell me only of any values that occur 5 times or more in a 60 second period;

select from_unixtime(dateTimeOrigination), callingPartyNumber,
        originalCalledPartyNumber, finalCalledPartyNumber, duration, origDeviceName, destDeviceName
 from cdr_records
 where (from_unixtime(dateTimeOrigination) like '2016-05-20%') and 
       (callingPartyNumber not like 'b00%') and
       (originalCalledPartyNumber not like 'b00%') and 
       (finalCalledPartyNumber not like 'b00%')
 order by originalCalledPartyNumber, dateTimeOrigination;

This query already filters for results in a specified day and orders the results the way I like, but it pulls everything. Can someone tell me how I can say, "only tell me about value originalCalledPartyNumber if it shows up 5 times or more in any 60 second period."?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Kimomaru
  • 993
  • 4
  • 14
  • 30

1 Answers1

1

If we want to filter out the rows where there aren't at least four preceding rows within the past 60 seconds, assuming that dateTimeOrigination is integer type, a 32-bit unix-style timestamp, we can do something like this:

SELECT FROM_UNIXTIME(r.dateTimeOrigination) AS dateTimeOrigination
     , r.callingPartyNumber
     , r.originalCalledPartyNumber
     , r.finalCalledPartyNumber
     , r.duration
     , r.origDeviceName
     , r.destDeviceName
  FROM cdr_records r
 WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
   AND r.dateTimeOrigination  < UNIX_TIMESTAMP('2016-05-21')
   AND r.callingPartyNumber NOT LIKE 'b00%'
   AND r.originalCalledPartyNumber NOT LIKE 'b00%'
   AND r.finalCalledPartyNumber NOT LIKE 'b00%'

   AND ( SELECT COUNT(1)
           FROM cdr_records c
          WHERE c.originalCalledPartyNumber = r.originalCalledPartyNumber
            AND c.dateTimeOrigination       > r.dateTimeOrigination - 60
            AND c.dateTimeOrigination      <= r.dateTimeOrigination
       ) > 4

 ORDER
    BY r.originalCalledPartyNumber
     , r.dateTimeOrigination

NOTE: For performance, we prefer to have predicates on bare columns.

With a form like this, with the column wrapped in an expression:

 WHERE FROM_UNIXTIME(r.dateTimeOrigination) LIKE '2016-05-20%'

MySQL will evaluate the function for every row in the table, and then compare the return from the function to the literal.

With a form like this:

 WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
   AND r.dateTimeOrigination  < UNIX_TIMESTAMP('2016-05-21')

MySQL will evaluate the expressions on the right side one time, as literals. Which allows MySQL to make effective use of a range scan operation on a suitable index.

FOLLOWUP

For best performance of the outer query, the best index would likely be an index with leading column of dateTimeOrigination, preferably containing

... ON cdr_records (dateTimeOrigination
    ,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber)

For best performance, a covering index, to avoid lookups to the pages in the underlying table. For example:

... ON cdr_records (dateTimeOrigination
    ,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber
    ,duration,origDeviceName,destDeviceName)

With that, we'd expect EXPLAIN to show "Using index".

For the correlated subquery, we'd want an index with leading columns like this:

... ON cdr_records (originalCalledPartyNumber,dateTimeOrigination)

I strongly recommend you look at the output from EXPLAIN to see which indexes MySQL is using for the query.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • thank you for the reply. I've tried this query, it has been running for 4 hours. Is there, maybe, a query that will run a little faster? – Kimomaru May 22 '16 at 01:10
  • 1
    I recommend you get the EXPLAIN output, and make sure the execution plan is using suitable indexes. Answer updated with recommended indexes. – spencer7593 May 22 '16 at 03:19
  • the recommended indexes above don't seem to work (some of this pretty new territory for me. I've set up indexes in the past, but not like this). MySQL seems to not like r.duration or anything that is r. Is there a step I'm skipping? – Kimomaru May 22 '16 at 17:43
  • 1
    The "r." qualifier was mistakenly included in the last three columns of the covering index. (Those got copied from the SELECT statement.) I've corrected the statement. – spencer7593 May 22 '16 at 17:48
  • Ahhh, okay. Thanks, spencer. I've added the second index to the table (the corrected version) and am waiting for it to complete before adding the third. The first had already been added for other operations. Hopefully this will speed things up. – Kimomaru May 22 '16 at 17:57
  • Hi @spencer7593 - thank you for your response on this question. In my case, I'm running two indexes on the table and adding two new ones caused extreme slowness - hardware wasn't originally spec'd to do this much. We're offloading this query to another system. Thanks again for your help. – Kimomaru May 26 '16 at 17:31