It's been a while since I asked this question. To simplify, I just want a lifecycle configuration in AWS SageMaker which can successfully install a private GitHub repo.
I'm trying to install a private github repo with a bash script. The script does the following:
- makes sure there's an ssh agent active
- adds the ssh key from a persistent portion of memory
- attempts to install the github repo
This is all happening in a SageMaker AWS EC2 instance via a lifecycle configuration. The implementation looks something like this:
HOME=/home/ec2-user/
ENVPIP=$HOME/anaconda3/envs/tensorflow2_p36/bin/pip
eval "$(ssh-agent -s)"
ssh-add ${HOME}SageMaker/Setup/id_rsa
yes | $ENVPIP install git+ssh://git@github.com/...
Running this, I get the following error:
ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com/...' /tmp/pip-req-build-ysacff_l Check the logs for full command output.
Here's all the pertinent output from cloudwatch:
Agent pid 5146
Identity added: /home/ec2-user/SageMaker/Setup/id_rsa (/home/ec2user/SageMaker/Setup/id_rsa)
2020-09-07T17:11:00.605-04:00
Collecting git+ssh://****@github.com/********1/*****-*****Library
Cloning ssh://****@github.com/********1/*****-*****Library to /tmp/pip-req-build-ysacff_l
2020-09-07T17:11:00.605-04:00
Copy
ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com/********1/*****-*****Library' /tmp/pip-req-build-ysacff_l Check the logs for full command output.
looking into it, this seems like an issue with the cloning protocol, but I couldn't find anything pertinent to ssh.
P.s.
- running the same few lines in the terminal works
- I sanity checked the url to the repo, went right to it, so I don't think its a problem with anything after the
...
Updates:
- tried updating git with
yum install git
. Apparently my version is up to date, so doing this resulted in the same error. - I commented out the pip install so that the EC2 Instance would start up successfully, then ran
curl http://www.google.com
, which resulted in a bunch of html. So it appears, at least after the EC2 instance boots, outbound traffic is allowed. - running
curl http://www.google.com
within the bash script (lifecycle configuration, with the problematic code commented out) results in the same html output, and the instance started up perfectly. this leads me to believe that there is, indeed, outbound traffic allowed on instance startup - a lot of people have viewed this question, and no one has answered it. I'm not married to the specific way I'm trying to install the repo, so if there are any working alternatives I'll gladly take them.
- Is it possible that I'm encountering a race condition with some other system? this is happening close to when the instance starts. Are their any way to check that all dependent systems are running?
- while doing some other stuff, in console I got the same error. I reinitialized the ssh agent, added the key, and it worked. I wonder if it's a race condition between
eval "$(ssh-agent -s)"
andyes | $ENVPIP install git+ssh://git@github.com/...
?