18

So I wrote a script that accesses a bunch of servers using nc on the command line, and originally I was using Python's commands module and calling commands.getoutput() and the script ran in about 45 seconds. Since commands is deprecated, I want to change everything over to using the subprocess module, but now the script takes 2m45s to run. Anyone have an idea of why this would be?

What I had before:

output = commands.getoutput("echo get file.ext | nc -w 1 server.com port_num")

now I have

p = Popen('echo get file.ext | nc -w 1 server.com port_num', shell=True, stdout=PIPE)
output = p.communicate()[0]

Thanks in advance for the help!

yasar
  • 13,158
  • 28
  • 95
  • 160
thedrick
  • 211
  • 1
  • 2
  • 6
  • 1
    how long does the command take, when you run from the shell? – wroniasty Jun 04 '12 at 21:49
  • As long as the server doesn't time out, the result is almost instantaneous real 0m0.274s user 0m0.071s sys 0m0.134s – thedrick Jun 04 '12 at 21:55
  • do other commands like 'ls', 'uname -a' etc. also take a long time to execute with the subprocess module? – wroniasty Jun 04 '12 at 21:58
  • No, not from what I am seeing. Also when I run the same things (inside the python interpreter) as above, but with time in front, subprocess is actually a bit faster. Not sure why it would be slower once incorporated into my code... – thedrick Jun 04 '12 at 22:01
  • 1
    it seems the problem is not with the subprocess module then... – wroniasty Jun 04 '12 at 22:03
  • Yeah, the only thing that is a little strange is that when I run Popen in the interpreter, it hangs and waits for me to hit enter before returning to the ">>>" prompt – thedrick Jun 04 '12 at 22:04
  • The real question is why aren't you using the socket module to communicate with the server instead of shelling out to nc? – Kamil Kisiel Jun 04 '12 at 23:41
  • @KamilKisiel it's the UNIX way :) – wroniasty Jun 05 '12 at 08:03

2 Answers2

20

I would expect subprocess to be slower than command. Without meaning to suggest that this is the only reason your script is running slowly, you should take a look at the commands source code. There are fewer than 100 lines, and most of the work is delegated to functions from os, many of which are taken straight from c posix libraries (at least in posix systems). Note that commands is unix-only, so it doesn't have to do any extra work to ensure cross-platform compatibility.

Now take a look at subprocess. There are more than 1500 lines, all pure Python, doing all sorts of checks to ensure consistent cross-platform behavior. Based on this, I would expect subprocess to run slower than commands.

I timed the two modules, and on something quite basic, subprocess was almost twice as slow as commands.

>>> %timeit commands.getoutput('echo "foo" | cat')
100 loops, best of 3: 3.02 ms per loop
>>> %timeit subprocess.check_output('echo "foo" | cat', shell=True)
100 loops, best of 3: 5.76 ms per loop

Swiss suggests some good improvements that will help your script's performance. But even after applying them, note that subprocess is still slower.

>>> %timeit commands.getoutput('echo "foo" | cat')
100 loops, best of 3: 2.97 ms per loop
>>> %timeit Popen('cat', stdin=PIPE, stdout=PIPE).communicate('foo')[0]
100 loops, best of 3: 4.15 ms per loop

Assuming you are performing the above command many times in a row, this will add up, and account for at least some of the performance difference.

In any case, I am interpreting your question as being about the relative performance of subprocess and command, rather than being about how to speed up your script. For the latter question, Swiss's answer is better.

Community
  • 1
  • 1
senderle
  • 145,869
  • 36
  • 209
  • 233
  • Ah, thank you! I'm glad I'm not just going crazy. I'm on python 2.6 so I don't even have the option of using check_output. – thedrick Jun 04 '12 at 22:19
  • 2
    I suspect this is not actually the problem here. – Swiss Jun 04 '12 at 22:27
  • @Swiss, I agree that not using the shell would be better. I guess I was assuming that user1436110 was doing so out of necessity, and answered accordingly. Looking more closely, I see that that's probably not the case. But even after applying the improvements you suggest, `subprocess` is slower. See the new timings above. – senderle Jun 04 '12 at 23:11
  • 1
    @senderle: It may be true that subprocess is slower. However, a difference of 1 or 2 ms is extremely minor, and does not account for the 2+ minute difference the original question mentions. – Swiss Jun 04 '12 at 23:27
  • @user1436110, just to clarify: are you downloading large files? About what size? And are you calling `Popen` multiple times in your script? Is the runtime you mention in the comments the runtime for one call to `nc` or many? – senderle Jun 05 '12 at 02:45
20

There seems to be at least two separate issues here.

First, you are improperly using Popen. Here are the problems I see:

  1. Spawning multiple processes with one Popen.
  2. Passing one string in as args instead of splitting args.
  3. Using the shell to pass text to process rather than the builtin communicate method.
  4. Using shell rather than directly spawning processes.

Here is a corrected version of your code

from subprocess import PIPE

args = ['nc', '-w', '1', 'server.com', 'port_num']
p = subprocess.Popen(args, stdin=PIPE, stdout=PIPE)
output = p.communicate("get file.ext")
print output[0]

Second, the fact that you suggest it ends faster when manually run than when run through subprocess suggests that the issue here is that you are not passing the correct string to nc. What is probably happening is that the server is waiting for a terminating string to end the connection. If you are not passing this, then the connection probably remains open until it times out.

Run nc manually, figure out what the terminating string is, then update the string passed to communicate. With these changes it should run much faster.

Mel
  • 5,837
  • 10
  • 37
  • 42
Swiss
  • 5,556
  • 1
  • 28
  • 42
  • 2
    Is there a reason for 2? Why is the string version deemed inefficient? – Mooncrater Jun 16 '18 at 16:24
  • 1
    Hi @Mooncrater, I'm not the OP but I believe the recommendation for 2) is because you want to avoid using Shell=True. So you need to add the command as a list, rather than string. – AllynH Jan 15 '19 at 16:09