MySQL - Importing a HUGE CSV with 'LOAD DATA INFILE'

Time: 2016-05-29 02:35:20

Tags: mysql csv

I need some help. I created my table structure as follows:

CREATE TABLE `my_data` (
`Date` VARCHAR(45) NOT NULL, 
`test1` double,
`check1` int,
`test2` double,
`check2` int,
`No` INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(No)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

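My CSV data is a huge file of 5 GB or more. It captures data every second. The data for a given second may be identical to the previous one, but every row is valid. How do I import all of the duplicates? When I try the command below, the system keeps dropping the duplicate rows.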

LOAD DATA LOCAL INFILE 'D:/mydatatable.csv' INTO TABLE my_data FIELDS TERMINATED BY ',' enclosed by '"' lines terminated by '\n' IGNORE 1 LINES

Here is a sample of the CSV records:

Table 1

+----------------+---+---+---+
| date/time      | A | B | C |
+----------------+---+---+---+
| 2/23/2015 0:42 | 3 | 4 | 2 |
| 2/23/2015 0:42 | 3 | 4 | 2 |
| 2/23/2015 0:42 | 3 | 4 | 2 |
+----------------+---+---+---+

CSV data:

2/23/2015 0:42,3,4,2
2/23/2015 0:42,3,4,2
2/23/2015 0:42,3,4,2

1 Answer:

Answer 0 (score: 0)

Personally, I cannot reproduce this issue. Are you sure your lines are terminated by \n and not \r\n?
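One way to answer that question before running the import is to count the terminators in the first part of the file. The following is a minimal Python sketch, not part of the original answer. The function name `count_line_endings` and the 1 MiB sample size are assumptions made for illustration.

```python
def count_line_endings(path, sample_bytes=1 << 20):
    """Count CRLF, bare LF and bare CR terminators in the first sample_bytes of a file."""
    # Binary mode, so Python performs no newline translation.
    with open(path, "rb") as f:
        chunk = f.read(sample_bytes)
    crlf = chunk.count(b"\r\n")
    lf = chunk.count(b"\n") - crlf   # LF not preceded by CR
    # CR not followed by LF; a CRLF pair split exactly at the sample
    # boundary would be counted here as one bare CR.
    cr = chunk.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}

# Example call (path taken from the question):
# print(count_line_endings("D:/mydatatable.csv"))
```

If the `crlf` count dominates, the file was most likely written on Windows, and LINES TERMINATED BY '\r\n' is the matching clause. With '\n' on such a file, the trailing \r stays attached to the last field of every row.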

First, I would try moving the auto_increment column so that it is the first column.

ALTER TABLE `my_data`
    CHANGE COLUMN `NO` `NO` INT(11) NOT NULL AUTO_INCREMENT FIRST;

Then I would list the columns explicitly, so that the imported fields are mapped to them by name instead of by implied position.

LOAD DATA LOCAL INFILE 'D:/mydatatable.csv' 
    IGNORE INTO TABLE `my_data` 
    FIELDS TERMINATED BY ',' 
    OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\n' 
    IGNORE 1 LINES (`DATE`, `test1`, `check1`, `test2`, `check2`);

This ensures that any extra column data in your CSV is ignored, and that the auto_increment column is not polluted by values such as '' or 0.

End result

mysql> LOAD DATA LOCAL INFILE 'D:/mydatatable.csv'
    -> IGNORE INTO TABLE `test`.`my_data`
    -> FIELDS TERMINATED BY ','
    -> OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n'
    -> IGNORE 1 LINES (`DATE`, `test1`, `check1`, `test2`, `check2`);
Query OK, 3 rows affected, 3 warnings (0.04 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 3
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1261 | Row 1 doesn't contain data for all columns |
| Warning | 1261 | Row 2 doesn't contain data for all columns |
| Warning | 1261 | Row 3 doesn't contain data for all columns |
+---------+------+--------------------------------------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM my_data;
+----+----------------+-------+--------+-------+--------+
| NO | DATE           | test1 | check1 | test2 | check2 |
+----+----------------+-------+--------+-------+--------+
|  1 | 2/23/2015 0:42 |     3 |      4 |     2 |   NULL |
|  2 | 2/23/2015 0:42 |     3 |      4 |     2 |   NULL |
|  3 | 2/23/2015 0:42 |     3 |      4 |     2 |   NULL |
+----+----------------+-------+--------+-------+--------+
3 rows in set (0.00 sec)
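
The three 1261 warnings in the output above are consistent with the sample data: each CSV row has four fields, while the column list names five columns, so `check2` is left NULL. For a multi-gigabyte file it may help to confirm the field count on a sample of rows before importing. Below is a rough Python sketch of such a check. The function name `field_count_histogram`, the row limit, and the UTF-8 encoding are assumptions, not something the original post specifies.

```python
import csv
from collections import Counter
from itertools import islice

def field_count_histogram(path, max_rows=100_000, skip_header=True):
    """Tally how many fields each of the first max_rows data rows contains."""
    counts = Counter()
    # newline="" lets the csv module handle \n and \r\n itself.
    # The encoding is an assumption; adjust it to match the file.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        if skip_header:
            next(reader, None)       # mirrors IGNORE 1 LINES
        for row in islice(reader, max_rows):
            counts[len(row)] += 1    # blank lines are counted as 0 fields
    return counts

# Example call (path taken from the question):
# print(field_count_histogram("D:/mydatatable.csv"))
```

A histogram with a single key equal to the number of names in the LOAD DATA column list suggests the rows and columns line up. Any other key points at rows that would raise warnings like the ones shown.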