Ruby - Manipulating CSV data and inserting it into a database

Time: 2017-11-18 21:46:05

Tags: ruby-on-rails ruby database csv

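I'm using a rake task that requires Ruby's CSV class to import rows of property data, and I want to manipulate the data before inserting it into the database.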

CSV

PID,City,Address,Sold Date,Sold Price
100-200-300,Vancouver,510 1700 Nelson Street,01/01/2017,"$500,000 "
200-300-400,Vancouver,304 68 Smithe Street,02/02/2017,"600,000"
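To see what each `row` looks like inside the rake task, here is a minimal sketch that parses the two sample rows from a string (instead of the file under `public/spreadsheets`) with the same `headers:` and `header_converters: :symbol` options. It deliberately omits `converters: :all`, so every field stays a String; with `:all`, CSV would also try numeric and date/time conversion, so a price written without `$` or commas could arrive as an Integer.

```ruby
require 'csv'

# The two sample rows from the question, embedded so the sketch runs
# without the CSV file. Single-quoted heredoc: no interpolation.
data = <<~'DATA'
  PID,City,Address,Sold Date,Sold Price
  100-200-300,Vancouver,510 1700 Nelson Street,01/01/2017,"$500,000 "
  200-300-400,Vancouver,304 68 Smithe Street,02/02/2017,"600,000"
DATA

# header_converters: :symbol downcases each header and replaces spaces
# with underscores, so "Sold Date" becomes :sold_date.
rows = CSV.parse(data, headers: true, header_converters: :symbol)

rows.each { |row| p row.to_h }
```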

Residences table (shortened for brevity)

+-----+------+------+---------------+-------------+
| pid | city | unit | street_number | street_name |
+-----+------+------+---------------+-------------+
|     |      |      |               |             |
+-----+------+------+---------------+-------------+

Rake task (what I have so far)

require 'csv'

desc 'Upload CSV data into database'
task residences: :environment do
  residences = Array.new
  counter    = 0
  csv_file   = "#{Rails.root}/public/spreadsheets/unformatted-addresses.csv"

  CSV.foreach(csv_file, headers: true, header_converters: :symbol, converters: :all, skip_blanks: true, encoding: 'UTF-8') do |row|

    #is this the right place to create the hash?
    residences << row.to_hash

    #is this the right way to format each cell?
    residences[counter][:pid]
    residences[counter][:city].downcase
    residences[counter][:address].downcase.split(" ")
    residences[counter][:sold_date]
    residences[counter][:sold_price].delete('$ ,').to_i

    Residence.create( #what to put here? )

    counter += 1
  end

  puts "Imported #{counter} rows."
end
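On the "is this the right way to format each cell?" comment: `String#strip`, `#downcase` and `#delete` return new strings rather than changing the row, so the lines above compute each cleaned value and then throw it away. A minimal sketch that keeps the results in a hash instead; `residence_attributes` is a hypothetical helper name, and it assumes the dates are month/day/year (the samples 01/01 and 02/02 don't settle that). The address is left out because the table stores it in three separate columns.

```ruby
require 'date'

# Hypothetical helper: returns a hash of cleaned values for one row.
# Each transformation's return value is stored, not discarded.
# to_s guards against fields that CSV converters already turned into numbers.
def residence_attributes(row)
  {
    pid:        row[:pid].to_s.strip,
    city:       row[:city].to_s.strip.downcase,
    # Assumption: the file uses month/day/year.
    sold_date:  Date.strptime(row[:sold_date].to_s.strip, '%m/%d/%Y'),
    sold_price: row[:sold_price].to_s.delete('$ ,').to_i
  }
end

# A plain hash standing in for the first CSV::Row of the sample data.
row = {
  pid: '100-200-300', city: 'Vancouver', address: '510 1700 Nelson Street',
  sold_date: '01/01/2017', sold_price: '$500,000 '
}

attrs = residence_attributes(row)
p attrs
```

Inside the `CSV.foreach` block the same method could take `row` directly, since `CSV::Row#[]` accepts the symbol keys. Merged with the split address columns, the hash is what would be passed to `Residence.create`, assuming the table has those columns.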

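What I'm trying to achieve is to format each cell's contents individually and then insert them into the appropriate columns. For example, the address should be split into:

"unit", "street_number", "street_name"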
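One way to get those three parts is a regex with named captures. This is a sketch under the assumption that every address has the shape `[unit] street_number street_name`, with the unit optional; `ADDRESS_PATTERN` and `split_address` are hypothetical names. Because the unit is optional, a street name that itself starts with a number would be read as having a unit, and the pattern alone can't resolve that.

```ruby
# Hypothetical pattern: an optional unit (digits plus an optional letter,
# e.g. "12A"), then the street number, then the rest as the street name.
ADDRESS_PATTERN = /\A(?:(?<unit>\d+[a-z]?)\s+)?(?<street_number>\d+)\s+(?<street_name>.+)\z/i

# Returns a hash of the three parts, or nil when the address doesn't fit.
def split_address(address)
  match = ADDRESS_PATTERN.match(address.to_s.strip)
  return nil unless match # let the caller decide how to handle bad rows

  {
    unit:          match[:unit], # nil when the address has no unit
    street_number: match[:street_number],
    street_name:   match[:street_name]
  }
end

p split_address('510 1700 Nelson Street')
p split_address('1700 Nelson Street')
```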

Any help is greatly appreciated!

3 answers:

Answer 0 (score: 2)

Building on my previous answer, you should be able to do something like this:

require 'csv'

address_regex = /(^\d+[a-z]?)+\s+(\d+)+\s+(.*)/i

desc 'Upload CSV data into database'
task residences: :environment do
  counter    = 0
  csv_file   = "#{Rails.root}/public/spreadsheets/unformatted-addresses.csv"

  CSV.foreach(csv_file, headers: true, header_converters: :symbol, converters: :all, skip_blanks: true, encoding: 'UTF-8') do |row|

    address = address_regex.match(row[:address])
    next unless address # the regex requires a unit, so skip addresses it can't split

    Residence.create(
      pid:           row[:pid],
      city:          row[:city],
      unit:          address[1],
      street_number: address[2],
      street_name:   address[3]
    )

    counter += 1
  end

  puts "Imported #{counter} rows."
end

Answer 1 (score: 1)

Here's what I ended up with:

require 'csv'
require 'time'

namespace :csv do
  desc 'Upload CSV data into database'
  task residences: :environment do
    counter       = 0
    csv_file      = "#{Rails.root}/public/spreadsheets/unformatted-addresses.csv"
    address_regex = /^(\d+[a-z]?)+\s+(\d+)+\s+(.+(?=\W))+\s+(.*)/i

    CSV.foreach(csv_file, headers: true, header_converters: :symbol, converters: :all, skip_blanks: true, encoding: 'UTF-8') do |row|
      address = address_regex.match(row[:address])
      next unless address # skip addresses that don't fit unit / number / name / type

      unit          = address[1]
      street_number = address[2]
      street_name   = address[3]
      street_type   = address[4]
      pid           = row[:pid].strip
      city          = row[:city].strip.downcase
      date          = Date.parse(row[:sold_date])
      sold_date     = date.strftime("%m-%d-%Y")
      sold_price    = row[:sold_price].strip.delete('$ ,').to_i

      puts "#{address}, #{pid}, #{city}, #{sold_date}, #{sold_price}"

      Residence.create(
        pid:           pid,
        city:          city,
        unit:          unit,
        street_number: street_number,
        street_name:   street_name,
        street_type:   street_type,
        sold_date:     sold_date,
        sold_price:    sold_price
      )

      counter += 1
    end

    puts "Imported #{counter} rows."
  end
end
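A caveat on the date handling: `Date.parse` reads a slash-separated date as day/month/year. The question's sample dates (01/01 and 02/02) come out the same either way, but if the file is really month/day/year, `Date.strptime` with an explicit format is safer. A minimal comparison using a hypothetical value:

```ruby
require 'date'

# Hypothetical US-style value meaning March 4, 2017.
us_date = '03/04/2017'

# Date.parse treats "aa/bb/yyyy" as day/month/year.
parsed = Date.parse(us_date)

# Date.strptime follows the format string exactly: month/day/year here.
explicit = Date.strptime(us_date, '%m/%d/%Y')

puts "Date.parse:    #{parsed.iso8601}"
puts "Date.strptime: #{explicit.iso8601}"
```

Passing the resulting `Date` object to `create`, rather than a `'%m-%d-%Y'` string, also avoids having the database layer re-parse an ambiguous string.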

Answer 2 (score: 0)

This should work for what you're trying to do, assuming every address has a unit (it also handles units that contain letters, such as '12A'):

address_regex = /(^\d+[a-z]?)+\s+(\d+)+\s+(.*)/i

matches = address_regex.match(residences[counter][:address])

unit          = matches[1]
street_number = matches[2]
street_name   = matches[3]


Note that this isn't the most efficient code; I wrote it this way for clarity.
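To check the pattern from this answer, here it is unchanged, run against the question's two sample addresses plus two hypothetical ones: one with a lettered unit and one with no unit at all, for which `match` returns `nil`.

```ruby
# The pattern from this answer, unchanged.
address_regex = /(^\d+[a-z]?)+\s+(\d+)+\s+(.*)/i

# The question's sample addresses, plus a hypothetical lettered unit
# and a hypothetical address without a unit.
samples = ['510 1700 Nelson Street', '304 68 Smithe Street',
           '12A 1700 Nelson Street', '1700 Nelson Street']

results = samples.map do |address|
  matches = address_regex.match(address)
  # match returns nil when the address doesn't fit the pattern (e.g. no unit)
  matches && { unit: matches[1], street_number: matches[2], street_name: matches[3] }
end

samples.zip(results).each { |address, parts| puts "#{address} => #{parts.inspect}" }
```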
