
Can anyone give me some pointers with regard to PHP command execution and best practices?

I'm currently trying to parse some NetBackup data, but I am running into issues related to the massive amount of data the system call returns. In order to cut down the amount of data I'm retrieving, I'm doing something like this:

$awk_command = "awk -F, '{print $1\",\"$2\",\"$3\",\"$4\",\"$5\",\"$6\",\"$7\",\"$9\",\"$11\",\"$26\",\"$32\",\"$33\",\"$34\",\"$35\",\"$36\",\"$37\",\"$38\",\"$39\",\"$40}'";
exec("sudo /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -M $master_name -all_columns | $awk_command", $get_backups, $null);
foreach ($get_backups as $backup_detail) {
    process_the_data($backup_detail);
    write_data_to_db($backup_detail);
}

I'm using awk to limit the amount of data being received. Without it I end up receiving nearly ~150 MB of data, and with it I get a much more manageable ~800 KB.

You don't need to tell me that the awk shit is nasty - I know that already... But in the interests of bettering myself (and my code), can anyone suggest an alternative?

I was thinking of something like proc_open but I'm really not sure whether that would provide any benefit.
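
In my head that would look roughly like this - an untested sketch only, reusing $master_name, $awk_command, process_the_data() and write_data_to_db() from the snippet above:

// proc_open() sketch: read the piped command's stdout one line at a time
// instead of letting exec() buffer the whole result set into an array.
$cmd = "sudo /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -M $master_name -all_columns | $awk_command";

$descriptors = array(
    0 => array('pipe', 'r'),              // child stdin (unused)
    1 => array('pipe', 'w'),              // child stdout - this is what we read
    2 => array('file', '/dev/null', 'a'), // discard stderr
);

$proc = proc_open($cmd, $descriptors, $pipes);
if (is_resource($proc)) {
    fclose($pipes[0]);                               // nothing to send to the command
    while (($line = fgets($pipes[1])) !== false) {   // one record at a time
        $backup_detail = rtrim($line, "\n");
        process_the_data($backup_detail);
        write_data_to_db($backup_detail);
    }
    fclose($pipes[1]);
    proc_close($proc);
}

Would that actually buy me anything over the exec() version?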

Mark V
  • There is a considerable benefit to using `proc_open()` because you can process the data one line at a time in PHP and you don't need to load the whole 800K into memory at once. You would probably do better to use the simpler [`popen()`](http://php.net/popen) here since you don't need 2-way communication. I personally don't see anything wrong with using `awk` here if it is doing what you want - it will be much more efficient than doing the same job in PHP. Using a stream (from `proc_open()`/`popen()`) also enables you to use `fgetcsv()` to retrieve arrays instead of having to do it yourself. – DaveRandom Aug 02 '12 at 09:07
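
For reference, the popen()/fgetcsv() approach described in the comment above would look roughly like this (untested sketch; $master_name, $awk_command and the two helper functions are as defined in the question):

$cmd = "sudo /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -M $master_name -all_columns | $awk_command";

// popen() returns a read-only stream of the command's stdout; fgetcsv()
// then hands back each comma-separated record as an array of fields.
$handle = popen($cmd, 'r');
if ($handle !== false) {
    while (($fields = fgetcsv($handle)) !== false) {
        // $fields holds the columns awk kept, one backup job per iteration
        process_the_data($fields);
        write_data_to_db($fields);
    }
    pclose($handle);
}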

1 Answer


Use exec to write the data to a file instead of reading it all into your script at once.

exec("sudo /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -M $master_name -all_columns | $awk_command > /tmp/output.data");

Then use any memory-efficient method to read the file in parts.

Have a look here: Least memory intensive way to read a file in PHP
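
A rough sketch of reading the dump back one line at a time (untested; /tmp/output.data is the file written by the exec() call above, and the two helper functions are the OP's):

$fh = fopen('/tmp/output.data', 'r');
if ($fh !== false) {
    while (($line = fgets($fh)) !== false) {   // only one line is held in memory at a time
        $backup_detail = rtrim($line, "\n");
        process_the_data($backup_detail);
        write_data_to_db($backup_detail);
    }
    fclose($fh);
    unlink('/tmp/output.data');                // remove the temp file when done
}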

DhruvPathak
  • I see no reason for dumping to disk; it just adds overhead. If `proc_open()`/`popen()` are available, why not just cut out the disk writes/reads and read `awk`'s STDOUT directly? – DaveRandom Aug 02 '12 at 09:09
  • @DaveRandom That would help in reducing the script's peak memory usage. It's better not to have the whole 800 KB in memory; having ~8 KB per line in a loop seems better in spite of the added overhead of file seeks. – DhruvPathak Aug 02 '12 at 09:10
  • I'd have to benchmark it, but I'm fairly confident keeping it all in memory will be noticeably faster, enough to make it worth doing. If we were talking about more data then I'd agree with you 100%, but 800k is nothing these days. After all, the server must have a reasonably large amount of spare memory when this is done - `awk` just managed to use 150MB! I'd rather just get the job done as quick as possible so the OS can have all its memory back, but there are many factors to consider I suppose - bus speeds, server load etc etc – DaveRandom Aug 02 '12 at 09:16
  • @DaveRandom Yes, memory would not be a constraint. But to me it seemed the OP was more interested in cutting down memory usage. That might help if there are multiple concurrent instances of this script running. However, if it's a standalone or few-instance script, then your suggestion makes more sense; it should do fine on any server with decent memory. – DhruvPathak Aug 02 '12 at 09:19
  • Agreed, and regardless, I think you deserve a +1 for your troubles :-) – DaveRandom Aug 02 '12 at 09:21
  • Your conversation has given me some (intelligent) food for thought, thanks. I'll give both methods a go and see which is the most suitable ;) – Mark V Aug 02 '12 at 10:38
  • @MarkV That's great. Do share your findings. – DhruvPathak Aug 02 '12 at 10:44
  • @DhruvPathak Ended up sticking with the above implementation. Discovered that in this instance, there is no point: the output isn't a stream, it's just dumping a bucketload of output. I played around with setting each line to `null` after processing, in an effort to release resources as quickly as possible, but there was little/no gain in it, so I just ended up cutting it out. – Mark V Mar 28 '13 at 06:03