I have a DataFrame common_ips containing IPs as shown below.

[image: df]

I need to achieve two basic tasks:

  1. Identify private and public IPs.
  2. Check organisation for public IPs.

Here is what I am doing:

import json
import urllib.request    # the code below calls urllib.request.urlopen
import re
baseurl = 'http://ipinfo.io/'    # no HTTPS supported (at least: not without a plan)

def isIPpublic(ipaddress):
    return not isIPprivate(ipaddress)

def isIPprivate(ipaddress):
    if ipaddress.startswith("::ffff:"): 
        ipaddress=ipaddress.replace("::ffff:", "")
    # IPv4 Regexp from https://stackoverflow.com/questions/30674845/
    if re.search(r"^(?:10|127|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\..*", ipaddress):
        # Yes, so match, so a local or RFC1918 IPv4 address
        return True
    if ipaddress == "::1":
        # Yes, IPv6 localhost
        return True
    return False

def getipInfo(ipaddress):
    url = '%s%s/json' % (baseurl, ipaddress)
    try:
        urlresult = urllib.request.urlopen(url)
        jsonresult = urlresult.read()          # get the JSON
        parsedjson = json.loads(jsonresult)    # put parsed JSON into dictionary
        return parsedjson
    except:
        return None

def checkIP(ipaddress):
    if (isIPpublic(ipaddress)):
        if bool(getipInfo(ipaddress)):
            if 'bogon' in getipInfo(ipaddress).keys():
                return 'Private IP'
            elif bool(getipInfo(ipaddress).get('org')):
                return getipInfo(ipaddress)['org']
            else:
                return 'No organization data'
        else:
            return 'No data available'
    else:
        return 'Private IP'

I then apply it to my common_ips DataFrame with:

common_ips['Info'] = common_ips.IP.apply(checkIP)

But it takes longer than I expected, and for some IPs it gives incorrect Info.

For instance:

[image: wrong ip]

where it should have been AS19902 Department of Administrative Services as I cross-checked it by

[image: check ip]

and

[image: organisation ip]

What am I missing here? And how can I achieve these tasks in a more Pythonic way?

Mohammad Zain Abbas
  • Please paste the data as text, not as images of text. – tripleee Feb 11 '19 at 07:54
  • "Organization" is not well-defined. You seem to be looking for the Autonomous System number ...? – tripleee Feb 11 '19 at 07:55
  • How are you calling the `checkIP` function? – tripleee Feb 11 '19 at 07:57
  • @tripleee I am calling/applying `checkIP` function on my `common_ips` df via `common_ips['Info'] = common_ips.IP.apply(checkIP)`. And to cross-check I have called `checkIP` function like `checkIP()` – Mohammad Zain Abbas Feb 11 '19 at 08:02
  • @tripleee I am sorry I didn't get it. What do you mean by 'Organization" is not well-defined ?? – Mohammad Zain Abbas Feb 11 '19 at 08:03
  • It's a dataframe? I see nothing in the code to support this. How are you reading it into the dataframe? Are you sure it doesn't have trailing control characters or something like that? – tripleee Feb 11 '19 at 08:04
  • An IP address simultaneously belongs to a number of enclosing netblocks. The AS is a well-defined subdivision of this, but certainly not the only way to interpret "organization". – tripleee Feb 11 '19 at 08:05
  • @tripleee `common_ips` is a DataFrame. I have only shared the code for IP address check here. And I am positive that it doesn't have trailing control characters as I have extracted these ips from the corpus with this regex `re.compile(r"\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b")` – Mohammad Zain Abbas Feb 11 '19 at 08:13

1 Answer

A blanket except: is basically always a bug. You are returning None instead of handling any anomalous or error response from the server, and of course the rest of your code has no way to recover.

As a first debugging step, simply take out the try/except handling. Maybe then you can find a way to put back a somewhat more detailed error handler for some cases which you know how to recover from.

def getipInfo(ipaddress):
    url = '%s%s/json' % (baseurl, ipaddress)
    urlresult = urllib.request.urlopen(url)
    jsonresult = urlresult.read()          # get the JSON
    parsedjson = json.loads(jsonresult)    # put parsed JSON into dictionary
    return parsedjson

Perhaps the calling code in checkIP should have a try/except instead, and e.g. retry after sleeping for a bit if the server indicates that you are going too fast.
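
A rough sketch of what that could look like, assuming the server signals rate limiting with an HTTP 429 status (that is an assumption about this service, not something verified here). The name check_ip_with_retry and the fetch, retries, delay and sleep parameters are made up for illustration; fetch would be the getipInfo above, without its try/except. It also performs the lookup once per address instead of once per branch.

```python
import time
import urllib.error


def check_ip_with_retry(ipaddress, fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Look up one address, retrying when the server says to slow down.

    fetch is any callable returning the parsed JSON dict for an address
    (hypothetical parameter; e.g. getipInfo without its try/except).
    """
    attempts = max(1, retries)             # always try at least once
    for attempt in range(attempts):
        try:
            info = fetch(ipaddress)        # one request per attempt
            break
        except urllib.error.HTTPError as err:
            # Assumption: 429 means "too many requests". Any other status,
            # or running out of attempts, is re-raised to the caller.
            if err.code != 429 or attempt == attempts - 1:
                raise
            sleep(delay * (attempt + 1))   # wait a little longer each time

    if 'bogon' in info:
        return 'Private IP'
    return info.get('org') or 'No organization data'
```

Any failure other than the one case it knows how to recover from still surfaces as an exception, so a wrong answer is never silently turned into 'No data available'.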

(In the absence of an authorization token, it looks like you are using the free version of this service, which is probably not in any way guaranteed anyway. Also maybe look at using their recommended library -- I haven't looked at it in more detail, but I would imagine it at the very least knows better how to behave in the case of a server-side error. It's almost certainly also more Pythonic, at least in the sense that you should not reinvent things which already exist.)
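
In the same spirit of not reinventing things, the standard library's ipaddress module could probably stand in for the hand-written regex in isIPprivate. This is only a sketch: the function name is_private_ip is made up here, and is_private covers more reserved ranges than the original regex (link-local and documentation ranges, for example), so the two will not agree on every address.

```python
import ipaddress


def is_private_ip(text):
    """Return True for loopback, RFC 1918 and other reserved addresses."""
    addr = ipaddress.ip_address(text)   # raises ValueError on malformed input
    # Unwrap IPv4-mapped IPv6 such as ::ffff:10.0.0.1, as the original code does.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return addr.is_private or addr.is_loopback
```

Unlike the regex, this raises ValueError on a string that is not an address at all, which may be preferable to quietly treating it as public.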

tripleee