An Example of Resolving Domain Names with Python Multiprocessing
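For work, I needed to find the resolved IPs of more than ten thousand domain names and check whether each one pointed to the right place. My first thought was the ping command, but its output is awkward to process. After some searching, I settled on socket.gethostbyname(). At first I wrote it the ordinary sequential way, and processing the ten-thousand-plus records took several hours, which was very inefficient. The main bottleneck was gethostbyname itself, which blocks until each lookup returns, but I could not find a better way to resolve IPs at the time. Later, at a colleague's suggestion, I switched to Python multiprocessing, which cut the processing time by more than half and went some way toward making up for gethostbyname's shortcomings. The complete example follows (the data is fake):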
IPs to check against (txt format, one IP per line)
...
192.168.0.2
192.168.9.2
...
Raw domain data (txt format, one domain per line)
...
xxx.cn
xxxx.com
...
Processed data (txt format, one domain + IP + verdict per line)
...
xxx.cn 192.168.0.1 in
xx2x.cn 192.168.0.2 not in
xx3x.cn unresolved unresolved
...
The processing script is as follows:
# coding: utf-8
import socket
from multiprocessing import Pool

# Expected IPs, one per line. Loaded at import time so that worker
# processes see them too; kept in a set for fast membership tests.
with open("/path/to/ip.txt", "r") as fip:
    ipList = set(line.strip() for line in fip if line.strip())


def URL2IP(url):
    """Resolve one domain and report whether its IP is in ipList."""
    url = url.strip()
    try:
        ip = socket.gethostbyname("www." + url)
        tip = "in" if ip in ipList else "not in"
    except (socket.error, UnicodeError):
        # Resolution failed, or the name is not a valid hostname.
        print(url + " this URL 2 IP ERROR ")
        ip = "unresolved"
        tip = "unresolved"
    return url + "\t" + ip + "\t" + tip


if __name__ == '__main__':
    # Domains, one per line
    with open("/path/to/domain.txt", "r", encoding='utf-8') as urllist:
        allUrls = urllist.readlines()

    p = Pool(8)  # recommended: set this to the number of CPU cores
    resultList = p.map(URL2IP, allUrls)
    p.close()
    p.join()

    # Write the results to a file
    with open("/path/to/resolve.txt", "w") as resovelist:
        resovelist.write("\n".join(resultList))
    print("complete !")
For more on how to use Python multiprocessing, a quick search will turn up plenty of material.
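One thing that helped me while debugging was being able to run the per-domain step without touching the network. The sketch below is only an illustration, not part of the script above: classify and fake_resolve are hypothetical names, and the resolve parameter is my own addition so that a stand-in lookup table can replace socket.gethostbyname during a dry run.

```python
import socket


def classify(domain, expected_ips, resolve=socket.gethostbyname):
    """Return 'domain<TAB>ip<TAB>verdict' for a single domain.

    resolve is injectable (an assumption of this sketch) so the logic
    can be exercised without real DNS lookups.
    """
    domain = domain.strip()
    try:
        ip = resolve("www." + domain)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError.
        return domain + "\tunresolved\tunresolved"
    verdict = "in" if ip in expected_ips else "not in"
    return domain + "\t" + ip + "\t" + verdict


def fake_resolve(host):
    """Hypothetical stand-in for DNS, backed by a tiny lookup table."""
    table = {"www.xxx.cn": "192.168.0.2"}
    try:
        return table[host]
    except KeyError:
        raise socket.gaierror("unknown host: " + host)


expected = frozenset(["192.168.0.2", "192.168.9.2"])
print(classify("xxx.cn\n", expected, fake_resolve))
print(classify("xxxx.com\n", expected, fake_resolve))
```

With the default resolver, the same function could be handed to Pool.map through functools.partial(classify, expected_ips=expected), since a partial of a module-level function can be pickled and sent to worker processes.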