Python split does not recognize the hyphen

Time: 2017-07-07 03:47:43

Tags: python

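I imported a table listing the years in which each coach was a football coach. Some of the years are listed like this: "1903-1910,1917,1919"

My goal is to get [1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1917, 1919]

In my original DataFrame, this list is an object.

I have tried: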


x = "1903–1910, 1917, 1919"

x[0].split('-')

re.split(r'\s|-', x[0])

x[0].replace('-', ' ').split(' ')

None of these attempts splits on the hyphen.

What am I doing wrong? Why doesn't Python find the hyphen?

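3 answers:

Answer 0: (score: 3)

The hyphen you are seeing is not actually a hyphen. It is probably some other character, such as a Unicode en dash, which looks very similar.

Try copying the actual character from your data and pasting it into the string you split on.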
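As a minimal sketch of that suggestion, assuming the separator really is an en dash (U+2013): writing it as the escape `\u2013` avoids depending on how the character survives a copy-paste. The variable names are illustrative only.

```python
# Assumption: the range separator is U+2013 (EN DASH), not ASCII '-'.
x = "1903\u20131910, 1917, 1919"

first_item = x.split(", ")[0]  # the range part, "1903–1910"

# Splitting on the ASCII hyphen finds nothing to split on.
by_hyphen = first_item.split("-")

# Splitting on the en dash separates the two years.
by_en_dash = first_item.split("\u2013")

print(by_hyphen)
print(by_en_dash)
```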

Looking at the text you posted, the difference is:

➜  ~ echo '1903–1910' | xxd
00000000: 3139 3033 e280 9331 3931 300a            1903...1910.
➜  ~ echo '1903-1910' | xxd
00000000: 3139 3033 2d31 3931 300a                 1903-1910.

The character in the first case is U+2013 (EN DASH): https://unicode-table.com/en/2013/
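The same check can be done from Python. The sketch below uses the standard `unicodedata` module to name every non-ASCII character in a string; `describe_non_ascii` is a hypothetical helper name, not something from the original post.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, "U+{:04X}".format(ord(ch)), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

sample = "1903\u20131910"
print(describe_non_ascii(sample))
print(sample.encode("utf-8"))  # the en dash encodes as the three bytes e2 80 93
```

An empty list means the string contains only ASCII characters, so a plain '-' split should work on it.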

Answer 1: (score: 1)

Your character is not a hyphen; it is a dash.

Answer 2: (score: 0)

This works, but it is not optimal:

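# -*- coding: utf-8 -*-
x = "1903–1910, 1917, 1919"
endash = '–'  # U+2013 EN DASH, not the ASCII hyphen '-'
years = x.split(', ')
new_list = []
for year in years:
    if endash in year:
        # Expand a range such as "1903–1910" into every year it covers.
        start, finish = year.split(endash)
        new_list.extend(range(int(start), int(finish) + 1))
    else:
        new_list.append(int(year))
print(new_list)  # written as a call so it also runs on Python 3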

Output: [1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1917, 1919]
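A slightly more general variant, offered only as a sketch: if the source data might mix ASCII hyphens with en or em dashes, a regular-expression character class can accept any of them. The function name `expand_years` and the set of accepted dash characters are assumptions made for illustration.

```python
import re

# Assumption: ranges may use '-' (U+002D), en dash (U+2013) or em dash (U+2014).
DASHES = re.compile("[-\u2013\u2014]")

def expand_years(text):
    """Expand a string such as '1903–1910, 1917' into a list of ints."""
    years = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. from a trailing comma
        bounds = DASHES.split(part)
        if len(bounds) == 2:
            # A range: include both endpoints.
            start, finish = int(bounds[0]), int(bounds[1])
            years.extend(range(start, finish + 1))
        else:
            # A single year; int() raises ValueError on malformed input.
            years.append(int(part))
    return years

print(expand_years("1903\u20131910, 1917, 1919"))
```

For a DataFrame column, something like df["years"].apply(expand_years) could apply this row by row, where "years" is a placeholder column name.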