How do I get scraped data to show up in a MySQL database in the terminal?

Time: 2019-05-22 13:55:48

Tags: python mysql web-scraping scrapy mysql-python

I am trying to get my scraped data to show up in a MySQL database. I am following a course, but it is not working.

I made sure the fields (title, rating, upc, product_type) are in the same order as the columns in the CSV file.
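The column-order check described above can be sketched as a small helper. This is only an illustration: `header_matches` and `EXPECTED_COLUMNS` are hypothetical names, and it assumes the CSV file starts with a header row, as Scrapy's CSV feed export normally writes. If the order ever differs, Scrapy's `FEED_EXPORT_FIELDS` setting can be used to fix the exported column order.

```python
import csv

# Column order expected by the INSERT statement (taken from the question).
EXPECTED_COLUMNS = ['title', 'rating', 'upc', 'product_type']


def header_matches(csv_path, expected=EXPECTED_COLUMNS):
    """Return True if the first row of the CSV equals the expected column list."""
    with open(csv_path, newline='') as handle:
        header = next(csv.reader(handle), [])
    return header == list(expected)
```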

Here is my code.

In the code editor:

# -*- coding: utf-8 -*-
import os
import csv
import glob
import MySQLdb
from scrapy import Spider
from scrapy.http import Request

def product_info(response, value):
    return response.xpath('//th[text()="' + value + '"]/following-sibling::td/text()').extract()[0]

class BooksSpider(Spider):
    name = 'books'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com']

    def parse(self, response):
        books = response.xpath('//h3/a/@href').extract()
        for book in books:
            absolute_url = response.urljoin(book)
            yield Request(absolute_url, callback=self.parse_book)

    def parse_book(self, response):
        title = response.css('h1::text').extract_first()
        rating = response.xpath('//*[contains(@class, "star-rating")]/@class').extract()[0]
        rating = rating.replace('star-rating ', '')

        # product information data points
        upc = product_info(response, 'UPC')
        product_type = product_info(response, 'Product Type')

        yield{
        'title': title,
        'rating': rating,
        'upc': upc,
        'product_type': product_type,
        }

        def close(self, reason):
            csv_file = max(glob.iglob('*.csv'), key=os.path.getctime)
            mydb = MySQLdb.connect(host='localhost',
                                   user='shay',
                                   passwd='foo',
                                   db='books_db')

            cursor = mydb.cursor()

            csv_data = csv.reader(file(csv_file))

            row_count = 0
            for row in csv_data:
                if row_count != 0:
                    cursor.execute('INSERT IGNORE INTO books_table(title, rating, upc, product_type) VALUES(%s, %s, %s, $s)', row)
                row_count += 1

            mydb.commit()
            cursor.close()

In the terminal:

mysql -u shay -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.26-0ubuntu0.18.04.1 (Ubuntu)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE books_db;
Query OK, 1 row affected (0.00 sec)

mysql> USE books_db;
Database changed

mysql> CREATE TABLE books_table(
    -> title VARCHAR(20), 
    -> rating VARCHAR(20),
    -> upc VARCHAR(20), 
    -> product_type VARCHAR(20));
Query OK, 0 rows affected (0.30 sec)

mysql> SELECT * FROM books_table;
Empty set (0.00 sec)

mysql> SELECT * FROM books_table;
Empty set (0.00 sec)

The expected result is a table of the scraped data in the terminal.

I run the code and then run the second query (SELECT * FROM books_table;). The table should show up, but it is still an empty set.

Any help is much appreciated, thank you!
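One possible reading of the empty table, offered as a sketch rather than a confirmed diagnosis: in the code as posted, `close` is indented inside `parse_book`, so it is only a nested function that nothing calls; the fourth placeholder is written `$s` rather than `%s`; and `file()` exists only in Python 2. The CSV lookup also assumes the spider was run with a CSV feed export (for example `scrapy crawl books -o books.csv`). The helper below, `load_csv`, is a hypothetical name that is not part of the question; it accepts any DB-API connection so the loading step can be tried on its own.

```python
import csv

# SQL from the question, with the fourth placeholder corrected to %s.
# The %s style is what MySQLdb expects; other drivers may use a different style.
INSERT_SQL = (
    'INSERT IGNORE INTO books_table(title, rating, upc, product_type) '
    'VALUES(%s, %s, %s, %s)'
)


def load_csv(connection, csv_path, insert_sql=INSERT_SQL):
    """Insert each data row of the CSV through a DB-API connection.

    Returns the number of rows passed to the database.
    """
    cursor = connection.cursor()
    inserted = 0
    # open() replaces the Python 2-only file() builtin.
    with open(csv_path, newline='') as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:  # ignore blank lines
                continue
            cursor.execute(insert_sql, row)
            inserted += 1
    connection.commit()
    cursor.close()
    return inserted
```

Inside the spider, a method defined at class level (same indentation as `parse_book`) could open the connection with `MySQLdb.connect(...)` as in the question and call `load_csv(mydb, csv_file)`. Scrapy's documentation describes `closed(self, reason)` as the spider method called when the spider closes. Separately, `title VARCHAR(20)` is probably too short for many of the book titles, so a wider column may be needed to avoid truncated values.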

0 answers
