Item Crawler

Crawls product (item) information and reviews.

Usage

Import the crawler into your project, for example:

Create an item crawler instance

from taobao_crawler.crawler.item import ItemCrawler

# Search keywords to crawl
keywords = ['手机', 'Phone']
# db is a database handle; see DB
crawler = ItemCrawler(keywords, db)

For the db argument of ItemCrawler(keywords, db), see DB.
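The DB page is the authoritative description of db. As a rough sketch only, assuming db is a plain pymongo database handle (as the parameter description in the class reference suggests), it could be obtained as follows. The helper name connect_db, the connection URI, and the database name "taobao" are hypothetical and are not part of this project.

```python
def connect_db(uri="mongodb://localhost:27017", name="taobao"):
    """Return a pymongo database handle (hypothetical helper).

    The URI and database name are illustrative defaults, not values
    defined by taobao_crawler.
    """
    # Imported here so this module can be loaded without pymongo installed.
    from pymongo import MongoClient

    # Creating the client is not expected to contact the server right away;
    # connection errors would surface on the first database operation.
    client = MongoClient(uri)
    return client[name]
```

The returned handle would then be passed as the second argument, for example ItemCrawler(keywords, connect_db()).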

Run the item crawler

crawler.run()

Sample data

{
    "is_crawled" : true,
    "seller_id" : "360622108",
    "sellerLoc" : "广东 深圳",
    "location" : "广东 深圳",
    "title" : "4+64G指纹识别!全网通4G智能手机5.5寸大屏",
    "item_id" : "561319321061",
    "price" : "529.00",
    "area" : "深圳",
    "sold" : "0"
}
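Note that price and sold are stored as strings in the sample record. A minimal sketch of converting them to numeric types after reading a record back is shown below. It assumes that price always parses as a decimal number and sold as an integer, which the single sample suggests but the source does not guarantee. The function name normalize_item is hypothetical.

```python
from decimal import Decimal


def normalize_item(record):
    """Return a copy of a crawled item with numeric fields converted.

    Hypothetical helper: assumes "price" and "sold" are numeric strings,
    as in the sample record.
    """
    item = dict(record)  # shallow copy, so the original record is untouched
    item["price"] = Decimal(record["price"])  # Decimal avoids float rounding
    item["sold"] = int(record["sold"])
    return item


# Subset of the sample record shown above
sample = {
    "is_crawled": True,
    "seller_id": "360622108",
    "item_id": "561319321061",
    "price": "529.00",
    "area": "深圳",
    "sold": "0",
}
normalized = normalize_item(sample)
```

Identifier fields such as item_id and seller_id are left as strings here, since they are labels rather than quantities.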

Class attributes

class crawler.item.ItemCrawler(keywords, db, timeout=3)

Bases: object

Crawls Taobao mobile phone product records and inserts them into a MongoDB database. Example of an inserted record: { "is_crawled" : true, "seller_id" : "360622108", "sellerLoc" : "广东 深圳", "location" : "广东 深圳", "title" : "4+64G指纹识别!全网通4G智能手机5.5寸大屏", "item_id" : "561319321061", "price" : "529.00", "area" : "深圳", "sold" : "0" }

__init__(keywords, db, timeout=3)

Initializes an ItemCrawler instance.

Parameters:
    keywords – list of search keywords, e.g. ['手机', 'Phone']
    db – an instance of pymongo.MongoClient.db
    timeout – crawl timeout; defaults to 3

run()

Runs the item crawler and inserts the results into the database.