I am new to Python, coming from PHP, and I'm having some difficulty decoding this JSON response:

import json
import mechanize
# Map each extension to its number of result pages
extensions = {'com': '512', 'net': '55', 'co': '21', 'org': '62'}
# Loop over the extension dictionary
for (ext, pages) in extensions.items():
    # Set up lists and counters for later use
    visited_urls = []
    found = 0
    member = 0
    not_found = 0
    repeated_url = 0
    added = 0
    # Loop over the page numbers
    page_number = 1
    while page_number <= 1:
        # Set the target URL
        target = "http://punkspider.hyperiongray.com/service/search/domain/?searchkey=url&searchvalue=." + ext + "&pagesize=10&pagenumber=" + str(page_number) + "&filtertype=AND&sqli=1"
        br = mechanize.Browser()
        html = br.open(target).read()
        json_data = json.loads(html)
        for (key, val) in json_data.items():
            print val['id']
        page_number += 1

The target is just a JSON request that the page makes for the search query. Here is the JSON:

{"data":{"numberOfPages":512,"domainSummaryDTOs":[{"id":"http://www.cdfdmy.com/","timestamp":"Tue May 14 12:59:28 GMT 2013","title":"【推荐】成都空压机|四川空压机|成都空气压缩机|四川空气压缩机|成都螺杆空压机|四川螺杆空压机|成都双螺杆空压机|四川双螺杆空压机|成都福道贸易有限公司","exploitabilityLevel":4,"bsqli":2,"sqli":2,"url":"http://www.cdfdmy.com/","xss":0},{"id":"http://www.chushijob.com/","timestamp":"Tue May 14 12:59:28 GMT 2013","title":"餐饮世界人才网-中国厨师人才网-中国酒店人才网","exploitabilityLevel":5,"bsqli":3,"sqli":2,"url":"http://www.chushijob.com/","xss":2},{"id":"http://www.hbenshi.com/","timestamp":"Tue May 14 12:59:28 GMT 2013","title":"恩施旅游网--恩施大峡谷 腾龙洞 利川 清江闯滩 土司城 欢迎您!","exploitabilityLevel":5,"bsqli":3,"sqli":4,"url":"http://www.hbenshi.com/","xss":1},{"id":"http://bbs.laiyb.com/","timestamp":"Mon Apr 29 03:30:09 GMT 2013","title":"莱阳论坛_莱阳吧_莱阳人的网络社区 -","exploitabilityLevel":4,"bsqli":4,"sqli":1,"url":"http://bbs.laiyb.com/","xss":0},{"id":"http://photostudio-town.com/","timestamp":"Mon Apr 29 03:30:09 GMT 2013","title":"フォトスタジオ・タウン-就職証明写真・お受験写真・オーディション写真-","exploitabilityLevel":5,"bsqli":1,"sqli":1,"url":"http://photostudio-town.com/","xss":1},{"id":"http://sp.sosfang.com/","timestamp":"Mon Apr 29 03:30:09 GMT 2013","title":"上海商铺出租/转让,上海门面房出租信息/上海门面转让-上海商铺网","exploitabilityLevel":2,"bsqli":0,"sqli":1,"url":"http://sp.sosfang.com/","xss":0},{"id":"http://www.msdssafe.com/","timestamp":"Sat Apr 06 11:03:33 GMT 2013","title":"MSDS查询网 英文MSDS查询网 MSDS MSDS报告 MSDS下载 msds是什么意思 MSDS安全网","exploitabilityLevel":4,"bsqli":15,"sqli":3,"url":"http://www.msdssafe.com/","xss":0},{"id":"http://www.tiananjidian.com/","timestamp":"Sat Apr 06 11:15:03 GMT 2013","title":"上海精工阀门厂总代理★上海精工阀门|上工牌阀门|精工阀门厂|上海阀门|精工阀门|广东阀门|广州阀门|惠州阀门|东莞阀门|佛山阀门|深圳阀门|中山阀门|潮州阀门|珠海阀门|河源阀门|汕头阀门|肇庆阀门|","exploitabilityLevel":3,"bsqli":0,"sqli":2,"url":"http://www.tiananjidian.com/","xss":1},{"id":"http://www.ywscocie.com/","timestamp":"Sat Apr 06 11:20:46 GMT 2013","title":"","exploitabilityLevel":2,"bsqli":0,"sqli":2,"url":"http://www.ywscocie.com/","xss":0},{"id":"http://bookingsbarbados.com/","timestamp":"Wed May 15 00:54:31 GMT 2013","title":"Bookings Caribbean | Barbados Bookings Center. Book barbados Hotels and Activities. Search, tourism ","exploitabilityLevel":5,"bsqli":4,"sqli":2,"url":"http://bookingsbarbados.com/","xss":18}],"rowsFound":5115,"qTime":1}}

I'm trying to get the 'id' key from the JSON, which represents a URL; however, it gives me an error saying that the key 'id' does not exist.

Since you'd have exactly the same behavior in PHP, or JavaScript or any other language, I'm not sure why you think differences between Python and PHP are at all relevant here. – abarnert Dec 24, 2013 at 23:45

And in Python you can just run a for loop and use the value['id']. It's exactly the same thing. The problem isn't that you don't know how to write loops in Python, it's that you're looping over the wrong thing. – abarnert Dec 24, 2013 at 23:53

Indeed, the only top-level key that exists is 'data', and the value associated with that key does not have an 'id' key:

>>> json_data['data'].keys()
[u'numberOfPages', u'domainSummaryDTOs', u'rowsFound', u'qTime']
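
To make the failure concrete, here is a minimal reproduction of the loop from the question. It is only a sketch: the payload is a shortened stand-in for the real response, the variable names top_level and error are illustrative, and print() is used so that it also runs on Python 3.

```python
import json

# Shortened stand-in for the response shown in the question.
html = ('{"data": {"numberOfPages": 512, "domainSummaryDTOs": '
        '[{"id": "http://www.cdfdmy.com/"}], "rowsFound": 5115, "qTime": 1}}')
json_data = json.loads(html)

# Iterating the top level yields a single pair: ('data', {...}).
top_level = list(json_data.items())
print([key for key, _ in top_level])

# The value under 'data' is a dict with no 'id' key, so val['id'] raises KeyError.
error = None
for key, val in top_level:
    try:
        val['id']
    except KeyError as exc:
        error = exc
print(repr(error))
```

The loop body runs exactly once, with key set to 'data', which is why the lookup fails no matter how many entries the response contains.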

The id keys are found in the json_data['data']['domainSummaryDTOs'] list of dictionaries:

for entry in json_data['data']['domainSummaryDTOs']:
    print entry['id']

Demo:

>>> import json
>>> json_data = json.loads('''{"data":{"numberOfPages":512,"domainSummaryDTOs":[{"id":"http://www.cdfdmy.com/","timestamp":"Tue May 14 12:59:28 GMT 2013","title":"【推荐】成都空压机|四川空压机|成都空气压缩机|四川空气压缩机|成都螺杆空压机|四川螺杆空压机|成都双螺杆空压机|四川双螺杆空压机|成都福道贸易有限公司","exploitabilityLevel":4,"bsqli":2,"sqli":2,"url":"http://www.cdfdmy.com/","xss":0},{"id":"http://www.chushijob.com/","timestamp":"Tue May 14 12:59:28 GMT 2013","title":"餐饮世界人才网-中国厨师人才网-中国酒店人才网","exploitabilityLevel":5,"bsqli":3,"sqli":2,"url":"http://www.chushijob.com/","xss":2},{"id":"http://www.hbenshi.com/","timestamp":"Tue May 14 12:59:28 GMT 2013","title":"恩施旅游网--恩施大峡谷 腾龙洞 利川 清江闯滩 土司城 欢迎您!","exploitabilityLevel":5,"bsqli":3,"sqli":4,"url":"http://www.hbenshi.com/","xss":1},{"id":"http://bbs.laiyb.com/","timestamp":"Mon Apr 29 03:30:09 GMT 2013","title":"莱阳论坛_莱阳吧_莱阳人的网络社区 -","exploitabilityLevel":4,"bsqli":4,"sqli":1,"url":"http://bbs.laiyb.com/","xss":0},{"id":"http://photostudio-town.com/","timestamp":"Mon Apr 29 03:30:09 GMT 2013","title":"フォトスタジオ・タウン-就職証明写真・お受験写真・オーディション写真-","exploitabilityLevel":5,"bsqli":1,"sqli":1,"url":"http://photostudio-town.com/","xss":1},{"id":"http://sp.sosfang.com/","timestamp":"Mon Apr 29 03:30:09 GMT 2013","title":"上海商铺出租/转让,上海门面房出租信息/上海门面转让-上海商铺网","exploitabilityLevel":2,"bsqli":0,"sqli":1,"url":"http://sp.sosfang.com/","xss":0},{"id":"http://www.msdssafe.com/","timestamp":"Sat Apr 06 11:03:33 GMT 2013","title":"MSDS查询网 英文MSDS查询网 MSDS MSDS报告 MSDS下载 msds是什么意思 MSDS安全网","exploitabilityLevel":4,"bsqli":15,"sqli":3,"url":"http://www.msdssafe.com/","xss":0},{"id":"http://www.tiananjidian.com/","timestamp":"Sat Apr 06 11:15:03 GMT 2013","title":"上海精工阀门厂总代理★上海精工阀门|上工牌阀门|精工阀门厂|上海阀门|精工阀门|广东阀门|广州阀门|惠州阀门|东莞阀门|佛山阀门|深圳阀门|中山阀门|潮州阀门|珠海阀门|河源阀门|汕头阀门|肇庆阀门|","exploitabilityLevel":3,"bsqli":0,"sqli":2,"url":"http://www.tiananjidian.com/","xss":1},{"id":"http://www.ywscocie.com/","timestamp":"Sat Apr 06 11:20:46 GMT 2013","title":"","exploitabilityLevel":2,"bsqli":0,"sqli":2,"url":"http://www.ywscocie.com/","xss":0},{"id":"http://bookingsbarbados.com/","timestamp":"Wed May 15 00:54:31 GMT 2013","title":"Bookings Caribbean | Barbados Bookings Center. Book barbados Hotels and Activities. Search, tourism ","exploitabilityLevel":5,"bsqli":4,"sqli":2,"url":"http://bookingsbarbados.com/","xss":18}],"rowsFound":5115,"qTime":1}}
... ''')
>>> for entry in json_data['data']['domainSummaryDTOs']:
...     print entry['id']
http://www.cdfdmy.com/
http://www.chushijob.com/
http://www.hbenshi.com/
http://bbs.laiyb.com/
http://photostudio-town.com/
http://sp.sosfang.com/
http://www.msdssafe.com/
http://www.tiananjidian.com/
http://www.ywscocie.com/
http://bookingsbarbados.com/
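
When the structure of a response is unfamiliar, it can also help to let the code locate the key. The following is a rough sketch rather than anything from the json module: find_key_paths is a hypothetical helper, and the sample payload is a shortened version of the response above. It walks nested dicts and lists and reports the path to every occurrence of a key.

```python
import json

def find_key_paths(node, wanted, path=()):
    """Yield the path (a tuple of dict keys and list indexes) to each `wanted` key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted:
                yield path + (key,)
            # Keep descending: the key may also appear deeper in the tree.
            for found in find_key_paths(value, wanted, path + (key,)):
                yield found
    elif isinstance(node, list):
        for index, item in enumerate(node):
            for found in find_key_paths(item, wanted, path + (index,)):
                yield found

# Shortened stand-in for the response in the question.
sample = json.loads(
    '{"data": {"numberOfPages": 512, "domainSummaryDTOs": ['
    '{"id": "http://www.cdfdmy.com/", "xss": 0}, '
    '{"id": "http://www.chushijob.com/", "xss": 2}], '
    '"rowsFound": 5115, "qTime": 1}}'
)

paths = list(find_key_paths(sample, 'id'))
for path in paths:
    print(path)
```

Each reported path reads left to right as the sequence of subscripts to apply, so ('data', 'domainSummaryDTOs', 0, 'id') corresponds to json_data['data']['domainSummaryDTOs'][0]['id'].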

It generally helps to prettify your JSON first into a more readable tree. You can use the online JSONLint service, or you can run Python's json.tool module from the command line on a file:

python -m json.tool filename.json

For your input, JSONLint produces:

{
    "data": {
        "numberOfPages": 512,
        "domainSummaryDTOs": [
            {
                "id": "http://www.cdfdmy.com/",
                "timestamp": "Tue May 14 12:59:28 GMT 2013",
                "title": "【推荐】成都空压机|四川空压机|成都空气压缩机|四川空气压缩机|成都螺杆空压机|四川螺杆空压机|成都双螺杆空压机|四川双螺杆空压机|成都福道贸易有限公司",
                "exploitabilityLevel": 4,
                "bsqli": 2,
                "sqli": 2,
                "url": "http://www.cdfdmy.com/",
                "xss": 0
            },
            {
                "id": "http://www.chushijob.com/",
                "timestamp": "Tue May 14 12:59:28 GMT 2013",
                "title": "餐饮世界人才网-中国厨师人才网-中国酒店人才网",
                "exploitabilityLevel": 5,
                "bsqli": 3,
                "sqli": 2,
                "url": "http://www.chushijob.com/",
                "xss": 2
            },
            {
                "id": "http://www.hbenshi.com/",
                "timestamp": "Tue May 14 12:59:28 GMT 2013",
                "title": "恩施旅游网--恩施大峡谷 腾龙洞 利川 清江闯滩 土司城 欢迎您!",
                "exploitabilityLevel": 5,
                "bsqli": 3,
                "sqli": 4,
                "url": "http://www.hbenshi.com/",
                "xss": 1
            },
            {
                "id": "http://bbs.laiyb.com/",
                "timestamp": "Mon Apr 29 03:30:09 GMT 2013",
                "title": "莱阳论坛_莱阳吧_莱阳人的网络社区 -",
                "exploitabilityLevel": 4,
                "bsqli": 4,
                "sqli": 1,
                "url": "http://bbs.laiyb.com/",
                "xss": 0
            },
            {
                "id": "http://photostudio-town.com/",
                "timestamp": "Mon Apr 29 03:30:09 GMT 2013",
                "title": "フォトスタジオ・タウン-就職証明写真・お受験写真・オーディション写真-",
                "exploitabilityLevel": 5,
                "bsqli": 1,
                "sqli": 1,
                "url": "http://photostudio-town.com/",
                "xss": 1
            },
            {
                "id": "http://sp.sosfang.com/",
                "timestamp": "Mon Apr 29 03:30:09 GMT 2013",
                "title": "上海商铺出租/转让,上海门面房出租信息/上海门面转让-上海商铺网",
                "exploitabilityLevel": 2,
                "bsqli": 0,
                "sqli": 1,
                "url": "http://sp.sosfang.com/",
                "xss": 0
            },
            {
                "id": "http://www.msdssafe.com/",
                "timestamp": "Sat Apr 06 11:03:33 GMT 2013",
                "title": "MSDS查询网 英文MSDS查询网 MSDS MSDS报告 MSDS下载 msds是什么意思 MSDS安全网",
                "exploitabilityLevel": 4,
                "bsqli": 15,
                "sqli": 3,
                "url": "http://www.msdssafe.com/",
                "xss": 0
            },
            {
                "id": "http://www.tiananjidian.com/",
                "timestamp": "Sat Apr 06 11:15:03 GMT 2013",
                "title": "上海精工阀门厂总代理★上海精工阀门|上工牌阀门|精工阀门厂|上海阀门|精工阀门|广东阀门|广州阀门|惠州阀门|东莞阀门|佛山阀门|深圳阀门|中山阀门|潮州阀门|珠海阀门|河源阀门|汕头阀门|肇庆阀门|",
                "exploitabilityLevel": 3,
                "bsqli": 0,
                "sqli": 2,
                "url": "http://www.tiananjidian.com/",
                "xss": 1
            },
            {
                "id": "http://www.ywscocie.com/",
                "timestamp": "Sat Apr 06 11:20:46 GMT 2013",
                "title": "",
                "exploitabilityLevel": 2,
                "bsqli": 0,
                "sqli": 2,
                "url": "http://www.ywscocie.com/",
                "xss": 0
            },
            {
                "id": "http://bookingsbarbados.com/",
                "timestamp": "Wed May 15 00:54:31 GMT 2013",
                "title": "Bookings Caribbean | Barbados Bookings Center. Book barbados Hotels and Activities. Search, tourism ",
                "exploitabilityLevel": 5,
                "bsqli": 4,
                "sqli": 2,
                "url": "http://bookingsbarbados.com/",
                "xss": 18
            }
        ],
        "rowsFound": 5115,
        "qTime": 1
    }
}

which is perhaps a little easier to decipher.
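
The same kind of tree can be produced from inside a script. This is a small sketch assuming Python 3 and a shortened one-entry payload; json.dumps with indent re-serializes the parsed data with nesting made visible, and ensure_ascii=False keeps the non-ASCII titles readable instead of escaping them.

```python
import json

# Shortened stand-in for the response in the question (one entry only).
raw = ('{"data":{"numberOfPages":512,"domainSummaryDTOs":[{"id":"http://www.chushijob.com/",'
       '"title":"餐饮世界人才网-中国厨师人才网-中国酒店人才网","xss":2}],'
       '"rowsFound":5115,"qTime":1}}')

# Parse, then re-serialize with four-space indentation.
pretty = json.dumps(json.loads(raw), indent=4, ensure_ascii=False)
print(pretty)
```

Printing the indented form once while debugging is usually enough to see which keys hold dicts and which hold lists.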

Thanks man, I voted your answer up but checked the other one just because I saw it first. Thank you though, it worked perfectly. – user3051232 Dec 24, 2013 at 23:47

Thanks! If it makes you feel any better about it, I was 30 seconds faster with posting my answer as well. :-P – Martijn Pieters Dec 24, 2013 at 23:55

Your "id" elements are in a list within domainSummaryDTOs:

print json_data['data']['domainSummaryDTOs'][0]['id']

...will get you the first element. You'll need to fix your loop to follow that structure:

for item in json_data['data']['domainSummaryDTOs']:
    print item['id']
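
If some pages might come back without the expected keys, the same loop can be wrapped defensively. This is a sketch under that assumption: extract_ids is a hypothetical helper name, the sample payload is shortened, and it is written to run on Python 3.

```python
import json

def extract_ids(payload):
    """Return the 'id' of every entry under data -> domainSummaryDTOs.

    Missing keys produce an empty list instead of raising KeyError.
    """
    entries = payload.get('data', {}).get('domainSummaryDTOs', [])
    return [entry['id'] for entry in entries if 'id' in entry]

# Shortened stand-in for the response in the question.
sample = json.loads(
    '{"data": {"numberOfPages": 512, "domainSummaryDTOs": ['
    '{"id": "http://www.cdfdmy.com/"}, {"id": "http://www.chushijob.com/"}], '
    '"rowsFound": 5115, "qTime": 1}}'
)

ids = extract_ids(sample)
print(ids)
print(extract_ids({}))
```

Whether silently returning an empty list is preferable to an exception depends on the scraper; for a long crawl, skipping a malformed page may be more convenient than stopping.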
        
