Question

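I have a large file that comes from a proprietary archive format. Unpacking the archive gives a file with no extension, but the data inside is comma-delimited. Adding a .csv extension, or simply opening the file with Excel, works.

I have about 375-400 of these files, and I am trying to extract a chunk of rows (about 13,500 out of 1.2M+ rows) between the keyword "Point A" and another keyword "Point B".

I found some code on this site that I believe extracts the data correctly, but I get this error when trying to save the file:

AttributeError: 'list' object has no attribute 'rows'

Can somebody help me get this data saved to a CSV?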

import re
import csv
import time

print(time.ctime())

file = open('C:/Users/User/Desktop/File with No Extension That\'s Very Similar to CSV', 'r')
data = file.read()
x = re.findall(r'Point A(.*?)Point B', data,re.DOTALL)

name = "C:/Users/User/Desktop/testoutput.csv"
with open(name, 'w', newline='') as file2:
    savefile = csv.writer(file2)
    for i in x.rows:
        savefile.writerow([cell.value for cell in i])

print(time.ctime())

Thanks in advance; any help is much appreciated.

Answer 1

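The following should work nicely. As mentioned, your regular expression was almost correct. You can still use the Python CSV library for the CSV processing by converting the matched text into a StringIO object and passing that to the CSV reader: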

import re
import csv
import time
import StringIO

print(time.ctime())

input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"

with open(input_name, 'r') as f_input, open(output_name, 'wb') as f_output:
    # Read whole file in
    all_input = f_input.read()  

    # Extract interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]

    # Convert into a file object and parse using the CSV reader
    fab_input = StringIO.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)

    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())

You have not provided a sample of your CSV file, so if there are problems, you may need to configure the CSV 'dialect' to process it better.

Tested using Python 2.7
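As a rough illustration of what configuring a dialect can look like, here is a minimal sketch in Python 3 syntax. The sample text, the semicolon delimiter and the dialect name `semicolon` are assumptions made for the example, since the real file layout is not shown in the question.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited, with a space after each delimiter
sample = "a; b; c\n1; 2; 3\n"

# Register a named dialect describing that layout (the name is arbitrary)
csv.register_dialect('semicolon', delimiter=';', skipinitialspace=True)

# Parse the sample with the registered dialect
rows = list(csv.reader(io.StringIO(sample), dialect='semicolon'))
print(rows)
```

The same `dialect='semicolon'` argument can be passed to `csv.writer` if the output should use that layout too.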

Answer 2

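There are two problems here: the first is related to the regular expression, the other to the list syntax.

Getting what you want

The way you are using the regular expression returns a list with a single value (all the lines as one single string).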
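A small sketch makes this visible. The `sample` string is a made-up stand-in for the file contents, not the asker's data.

```python
import re

# Hypothetical stand-in for the contents of the extension-less file
sample = "header\nPoint A\n1,2,3\n4,5,6\nPoint B\nfooter\n"

matches = re.findall(r'Point A(.*?)Point B', sample, re.DOTALL)

print(type(matches))             # a plain Python list
print(len(matches))              # number of "Point A ... Point B" blocks found
print(repr(matches[0]))          # the whole block as one string
print(hasattr(matches, 'rows'))  # lists have no 'rows' attribute
```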

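There is probably a better way to do this, but for now I would go with something like this:
```
with open('bla', 'r') as input:
    data = input.read()
    x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)[0]
    x = x.splitlines(False)[1:]
```
It is not pretty, but it returns a list with all the lines between those two points.

Using the list

Lists do not have a `rows` attribute. You just need to iterate over the list:
```
for i in x:
    pass  # do what you have to do
```
Note that I am not very familiar with the `csv` library, but it looks like you will have to do some processing on the values before handing them to it.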
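One possible way to finish the job is sketched below, assuming Python 3 and fields that contain no embedded newlines. `csv.reader` accepts any iterable of lines, so the list `x` can be parsed directly and each row handed to `csv.writer`. The sample text and the in-memory `out` buffer are placeholders for the real input and output files.

```python
import csv
import io
import re

# Hypothetical stand-in for the file contents
data = "header\nPoint A\n1,2,3\n\n4,5,6\nPoint B\nfooter\n"

x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)[0]
x = x.splitlines(False)[1:]   # drop the remainder of the "Point A" line

out = io.StringIO()           # stand-in for open(name, 'w', newline='')
writer = csv.writer(out)

for row in csv.reader(x):     # parse each line into a list of fields
    if row:                   # skip empty lines
        writer.writerow(row)

print(out.getvalue())
```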

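IMHO, I would avoid the CSV format, since it is somewhat "locale-dependent": depending on the settings the end user has on their OS, it may not work as expected.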

Answer 3

Updated code so that @Martin Evans' answer works with the latest Python version.

Also, using 'wt' instead of 'wb' avoids

"TypeError: a bytes-like object is required, not 'str'"
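A minimal sketch of what such a Python 3 port of Answer 1 might look like is given below. It assumes `io.StringIO` in place of the Python 2 `StringIO` module and a text-mode output file opened with `newline=''`, as the `csv` module documentation recommends. The function name `extract_block` is made up for this example.

```python
import csv
import io
import re


def extract_block(input_name, output_name):
    """Copy the CSV rows found between 'Point A' and 'Point B'."""
    with open(input_name, 'r') as f_input, \
         open(output_name, 'wt', newline='') as f_output:
        # Read the whole file in
        all_input = f_input.read()

        # Extract the first block between the two markers
        ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]

        # Wrap the text in a file-like object and parse it with the CSV reader
        csv_input = csv.reader(io.StringIO(ab_input))
        csv_output = csv.writer(f_output)

        for input_row in csv_input:
            # Skip empty rows, write everything else
            if input_row:
                csv_output.writerow(input_row)


# Usage, with the paths from the question:
# extract_block("C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV",
#               "C:/Users/User/Desktop/testoutput.csv")
```

Opening the output in text mode is what lets `csv.writer` write `str` values in Python 3; binary mode ('wb') was only appropriate for the Python 2 `csv` module.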

Extracting rows of data from a {CSV}-like file using Python