Difference between unicode.isdigit() and unicode.isnumeric()

Time: 2014-06-24 10:57:46

Tags: python unicode

What is the difference between the unicode.isdigit() and unicode.isnumeric() methods?

5 answers:

Answer 0: (score: 26)

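The Python 3 documentation is clearer than the Python 2 documentation:

str.isdigit()
  [...] Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

str.isnumeric()
  Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.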
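As a rough illustration of the documentation quoted above, the following Python 3 sketch applies the predicates to a few sample characters. The sample characters are chosen here for illustration and are not from the original answer; str.isdecimal() is included only for contrast, since it corresponds to Numeric_Type=Decimal alone.

```python
# Python 3 sketch; the sample characters are illustrative choices.
samples = {
    "5": "DIGIT FIVE",
    "\u00b2": "SUPERSCRIPT TWO",
    "\u2155": "VULGAR FRACTION ONE FIFTH",
    "x": "LATIN SMALL LETTER X",
}

results = {}
for ch, label in samples.items():
    # Each predicate is at least as permissive as the one before it.
    results[ch] = (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    print(f"U+{ord(ch):04X} {label}: "
          f"isdecimal={results[ch][0]} "
          f"isdigit={results[ch][1]} "
          f"isnumeric={results[ch][2]}")
```

The superscript is a digit but not a decimal, and the fraction is numeric but not a digit, which matches the three property values named in the documentation.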

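So isnumeric() additionally tests for Numeric_Type=Numeric. Quoting the historical proposal for the official numeric type definitions:

Numeric_Type=Decimal
  Characters used in positional decimal systems, the standard radix-ten system with contiguous digits 0..9 and the most significant digit first (in backing store order). By definition, they are coextensive with General_Category=Decimal_Number.

Numeric_Type=Digit
  Variants of positional decimal characters (Numeric_Type=Decimal) or sequences thereof. These include super/subscripts, and characters enclosed or decorated by the addition of characters such as parentheses, dots, or commas.

Numeric_Type=Numeric
  Characters with a numerical value, but that are neither decimal nor digit.

So: any numeric character other than decimals or their variants. Think fractions, Roman numerals, glyphs for combined numbers, and any numbering system that is not decimal-based.

These include (from a Python 2 session):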

>>> import unicodedata
>>> for codepoint in range(2**16):
...     chr = unichr(codepoint)
...     if chr.isnumeric() and not chr.isdigit():
...         print u'{:04x}: {} ({})'.format(codepoint, chr, unicodedata.name(chr, 'UNNAMED'))
... 
00bc: ¼ (VULGAR FRACTION ONE QUARTER)
00bd: ½ (VULGAR FRACTION ONE HALF)
00be: ¾ (VULGAR FRACTION THREE QUARTERS)
09f4: ৴ (BENGALI CURRENCY NUMERATOR ONE)
09f5: ৵ (BENGALI CURRENCY NUMERATOR TWO)
09f6: ৶ (BENGALI CURRENCY NUMERATOR THREE)
09f7: ৷ (BENGALI CURRENCY NUMERATOR FOUR)
09f8: ৸ (BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR)
09f9: ৹ (BENGALI CURRENCY DENOMINATOR SIXTEEN)
0bf0: ௰ (TAMIL NUMBER TEN)
0bf1: ௱ (TAMIL NUMBER ONE HUNDRED)
0bf2: ௲ (TAMIL NUMBER ONE THOUSAND)
0c78: ౸ (TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR)
0c79: ౹ (TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF FOUR)
0c7a: ౺ (TELUGU FRACTION DIGIT TWO FOR ODD POWERS OF FOUR)
0c7b: ౻ (TELUGU FRACTION DIGIT THREE FOR ODD POWERS OF FOUR)
0c7c: ౼ (TELUGU FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR)
0c7d: ౽ (TELUGU FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR)
0c7e: ౾ (TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR)
0d70: ൰ (MALAYALAM NUMBER TEN)
0d71: ൱ (MALAYALAM NUMBER ONE HUNDRED)
0d72: ൲ (MALAYALAM NUMBER ONE THOUSAND)
0d73: ൳ (MALAYALAM FRACTION ONE QUARTER)
0d74: ൴ (MALAYALAM FRACTION ONE HALF)
0d75: ൵ (MALAYALAM FRACTION THREE QUARTERS)
0f2a: ༪ (TIBETAN DIGIT HALF ONE)
0f2b: ༫ (TIBETAN DIGIT HALF TWO)
0f2c: ༬ (TIBETAN DIGIT HALF THREE)
0f2d: ༭ (TIBETAN DIGIT HALF FOUR)
0f2e: ༮ (TIBETAN DIGIT HALF FIVE)
0f2f: ༯ (TIBETAN DIGIT HALF SIX)
0f30: ༰ (TIBETAN DIGIT HALF SEVEN)
0f31: ༱ (TIBETAN DIGIT HALF EIGHT)
0f32: ༲ (TIBETAN DIGIT HALF NINE)
0f33: ༳ (TIBETAN DIGIT HALF ZERO)
1372: ፲ (ETHIOPIC NUMBER TEN)
1373: ፳ (ETHIOPIC NUMBER TWENTY)
1374: ፴ (ETHIOPIC NUMBER THIRTY)
1375: ፵ (ETHIOPIC NUMBER FORTY)
1376: ፶ (ETHIOPIC NUMBER FIFTY)
1377: ፷ (ETHIOPIC NUMBER SIXTY)
1378: ፸ (ETHIOPIC NUMBER SEVENTY)
1379: ፹ (ETHIOPIC NUMBER EIGHTY)
137a: ፺ (ETHIOPIC NUMBER NINETY)
137b: ፻ (ETHIOPIC NUMBER HUNDRED)
137c: ፼ (ETHIOPIC NUMBER TEN THOUSAND)
16ee: ᛮ (RUNIC ARLAUG SYMBOL)
16ef: ᛯ (RUNIC TVIMADUR SYMBOL)
16f0: ᛰ (RUNIC BELGTHOR SYMBOL)
17f0: ៰ (KHMER SYMBOL LEK ATTAK SON)
17f1: ៱ (KHMER SYMBOL LEK ATTAK MUOY)
17f2: ៲ (KHMER SYMBOL LEK ATTAK PII)
17f3: ៳ (KHMER SYMBOL LEK ATTAK BEI)
17f4: ៴ (KHMER SYMBOL LEK ATTAK BUON)
17f5: ៵ (KHMER SYMBOL LEK ATTAK PRAM)
17f6: ៶ (KHMER SYMBOL LEK ATTAK PRAM-MUOY)
17f7: ៷ (KHMER SYMBOL LEK ATTAK PRAM-PII)
17f8: ៸ (KHMER SYMBOL LEK ATTAK PRAM-BEI)
17f9: ៹ (KHMER SYMBOL LEK ATTAK PRAM-BUON)
2150: ⅐ (VULGAR FRACTION ONE SEVENTH)
2151: ⅑ (VULGAR FRACTION ONE NINTH)
2152: ⅒ (VULGAR FRACTION ONE TENTH)
2153: ⅓ (VULGAR FRACTION ONE THIRD)
2154: ⅔ (VULGAR FRACTION TWO THIRDS)
2155: ⅕ (VULGAR FRACTION ONE FIFTH)
2156: ⅖ (VULGAR FRACTION TWO FIFTHS)
2157: ⅗ (VULGAR FRACTION THREE FIFTHS)
2158: ⅘ (VULGAR FRACTION FOUR FIFTHS)
2159: ⅙ (VULGAR FRACTION ONE SIXTH)
215a: ⅚ (VULGAR FRACTION FIVE SIXTHS)
215b: ⅛ (VULGAR FRACTION ONE EIGHTH)
215c: ⅜ (VULGAR FRACTION THREE EIGHTHS)
215d: ⅝ (VULGAR FRACTION FIVE EIGHTHS)
215e: ⅞ (VULGAR FRACTION SEVEN EIGHTHS)
215f: ⅟ (FRACTION NUMERATOR ONE)
2160: Ⅰ (ROMAN NUMERAL ONE)
2161: Ⅱ (ROMAN NUMERAL TWO)
2162: Ⅲ (ROMAN NUMERAL THREE)
2163: Ⅳ (ROMAN NUMERAL FOUR)
2164: Ⅴ (ROMAN NUMERAL FIVE)
2165: Ⅵ (ROMAN NUMERAL SIX)
2166: Ⅶ (ROMAN NUMERAL SEVEN)
2167: Ⅷ (ROMAN NUMERAL EIGHT)
2168: Ⅸ (ROMAN NUMERAL NINE)
2169: Ⅹ (ROMAN NUMERAL TEN)
216a: Ⅺ (ROMAN NUMERAL ELEVEN)
216b: Ⅻ (ROMAN NUMERAL TWELVE)
216c: Ⅼ (ROMAN NUMERAL FIFTY)
216d: Ⅽ (ROMAN NUMERAL ONE HUNDRED)
216e: Ⅾ (ROMAN NUMERAL FIVE HUNDRED)
216f: Ⅿ (ROMAN NUMERAL ONE THOUSAND)
2170: ⅰ (SMALL ROMAN NUMERAL ONE)
2171: ⅱ (SMALL ROMAN NUMERAL TWO)
2172: ⅲ (SMALL ROMAN NUMERAL THREE)
2173: ⅳ (SMALL ROMAN NUMERAL FOUR)
2174: ⅴ (SMALL ROMAN NUMERAL FIVE)
2175: ⅵ (SMALL ROMAN NUMERAL SIX)
2176: ⅶ (SMALL ROMAN NUMERAL SEVEN)
2177: ⅷ (SMALL ROMAN NUMERAL EIGHT)
2178: ⅸ (SMALL ROMAN NUMERAL NINE)
2179: ⅹ (SMALL ROMAN NUMERAL TEN)
217a: ⅺ (SMALL ROMAN NUMERAL ELEVEN)
217b: ⅻ (SMALL ROMAN NUMERAL TWELVE)
217c: ⅼ (SMALL ROMAN NUMERAL FIFTY)
217d: ⅽ (SMALL ROMAN NUMERAL ONE HUNDRED)
217e: ⅾ (SMALL ROMAN NUMERAL FIVE HUNDRED)
217f: ⅿ (SMALL ROMAN NUMERAL ONE THOUSAND)
2180: ↀ (ROMAN NUMERAL ONE THOUSAND C D)
2181: ↁ (ROMAN NUMERAL FIVE THOUSAND)
2182: ↂ (ROMAN NUMERAL TEN THOUSAND)
2185: ↅ (ROMAN NUMERAL SIX LATE FORM)
2186: ↆ (ROMAN NUMERAL FIFTY EARLY FORM)
2187: ↇ (ROMAN NUMERAL FIFTY THOUSAND)
2188: ↈ (ROMAN NUMERAL ONE HUNDRED THOUSAND)
2189: ↉ (VULGAR FRACTION ZERO THIRDS)
2469: ⑩ (CIRCLED NUMBER TEN)
246a: ⑪ (CIRCLED NUMBER ELEVEN)
246b: ⑫ (CIRCLED NUMBER TWELVE)
246c: ⑬ (CIRCLED NUMBER THIRTEEN)
246d: ⑭ (CIRCLED NUMBER FOURTEEN)
246e: ⑮ (CIRCLED NUMBER FIFTEEN)
246f: ⑯ (CIRCLED NUMBER SIXTEEN)
2470: ⑰ (CIRCLED NUMBER SEVENTEEN)
2471: ⑱ (CIRCLED NUMBER EIGHTEEN)
2472: ⑲ (CIRCLED NUMBER NINETEEN)
2473: ⑳ (CIRCLED NUMBER TWENTY)
247d: ⑽ (PARENTHESIZED NUMBER TEN)
247e: ⑾ (PARENTHESIZED NUMBER ELEVEN)
247f: ⑿ (PARENTHESIZED NUMBER TWELVE)
2480: ⒀ (PARENTHESIZED NUMBER THIRTEEN)
2481: ⒁ (PARENTHESIZED NUMBER FOURTEEN)
2482: ⒂ (PARENTHESIZED NUMBER FIFTEEN)
2483: ⒃ (PARENTHESIZED NUMBER SIXTEEN)
2484: ⒄ (PARENTHESIZED NUMBER SEVENTEEN)
2485: ⒅ (PARENTHESIZED NUMBER EIGHTEEN)
2486: ⒆ (PARENTHESIZED NUMBER NINETEEN)
2487: ⒇ (PARENTHESIZED NUMBER TWENTY)
2491: ⒑ (NUMBER TEN FULL STOP)
2492: ⒒ (NUMBER ELEVEN FULL STOP)
2493: ⒓ (NUMBER TWELVE FULL STOP)
2494: ⒔ (NUMBER THIRTEEN FULL STOP)
2495: ⒕ (NUMBER FOURTEEN FULL STOP)
2496: ⒖ (NUMBER FIFTEEN FULL STOP)
2497: ⒗ (NUMBER SIXTEEN FULL STOP)
2498: ⒘ (NUMBER SEVENTEEN FULL STOP)
2499: ⒙ (NUMBER EIGHTEEN FULL STOP)
249a: ⒚ (NUMBER NINETEEN FULL STOP)
249b: ⒛ (NUMBER TWENTY FULL STOP)
24eb: ⓫ (NEGATIVE CIRCLED NUMBER ELEVEN)
24ec: ⓬ (NEGATIVE CIRCLED NUMBER TWELVE)
24ed: ⓭ (NEGATIVE CIRCLED NUMBER THIRTEEN)
24ee: ⓮ (NEGATIVE CIRCLED NUMBER FOURTEEN)
24ef: ⓯ (NEGATIVE CIRCLED NUMBER FIFTEEN)
24f0: ⓰ (NEGATIVE CIRCLED NUMBER SIXTEEN)
24f1: ⓱ (NEGATIVE CIRCLED NUMBER SEVENTEEN)
24f2: ⓲ (NEGATIVE CIRCLED NUMBER EIGHTEEN)
24f3: ⓳ (NEGATIVE CIRCLED NUMBER NINETEEN)
24f4: ⓴ (NEGATIVE CIRCLED NUMBER TWENTY)
24fe: ⓾ (DOUBLE CIRCLED NUMBER TEN)
277f: ❿ (DINGBAT NEGATIVE CIRCLED NUMBER TEN)
2789: ➉ (DINGBAT CIRCLED SANS-SERIF NUMBER TEN)
2793: ➓ (DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN)
2cfd: ⳽ (COPTIC FRACTION ONE HALF)
3007: 〇 (IDEOGRAPHIC NUMBER ZERO)
3021: 〡 (HANGZHOU NUMERAL ONE)
3022: 〢 (HANGZHOU NUMERAL TWO)
3023: 〣 (HANGZHOU NUMERAL THREE)
3024: 〤 (HANGZHOU NUMERAL FOUR)
3025: 〥 (HANGZHOU NUMERAL FIVE)
3026: 〦 (HANGZHOU NUMERAL SIX)
3027: 〧 (HANGZHOU NUMERAL SEVEN)
3028: 〨 (HANGZHOU NUMERAL EIGHT)
3029: 〩 (HANGZHOU NUMERAL NINE)
3038: 〸 (HANGZHOU NUMERAL TEN)
3039: 〹 (HANGZHOU NUMERAL TWENTY)
303a: 〺 (HANGZHOU NUMERAL THIRTY)
3192: ㆒ (IDEOGRAPHIC ANNOTATION ONE MARK)
3193: ㆓ (IDEOGRAPHIC ANNOTATION TWO MARK)
3194: ㆔ (IDEOGRAPHIC ANNOTATION THREE MARK)
3195: ㆕ (IDEOGRAPHIC ANNOTATION FOUR MARK)
3220: ㈠ (PARENTHESIZED IDEOGRAPH ONE)
3221: ㈡ (PARENTHESIZED IDEOGRAPH TWO)
3222: ㈢ (PARENTHESIZED IDEOGRAPH THREE)
3223: ㈣ (PARENTHESIZED IDEOGRAPH FOUR)
3224: ㈤ (PARENTHESIZED IDEOGRAPH FIVE)
3225: ㈥ (PARENTHESIZED IDEOGRAPH SIX)
3226: ㈦ (PARENTHESIZED IDEOGRAPH SEVEN)
3227: ㈧ (PARENTHESIZED IDEOGRAPH EIGHT)
3228: ㈨ (PARENTHESIZED IDEOGRAPH NINE)
3229: ㈩ (PARENTHESIZED IDEOGRAPH TEN)
3251: ㉑ (CIRCLED NUMBER TWENTY ONE)
3252: ㉒ (CIRCLED NUMBER TWENTY TWO)
3253: ㉓ (CIRCLED NUMBER TWENTY THREE)
3254: ㉔ (CIRCLED NUMBER TWENTY FOUR)
3255: ㉕ (CIRCLED NUMBER TWENTY FIVE)
3256: ㉖ (CIRCLED NUMBER TWENTY SIX)
3257: ㉗ (CIRCLED NUMBER TWENTY SEVEN)
3258: ㉘ (CIRCLED NUMBER TWENTY EIGHT)
3259: ㉙ (CIRCLED NUMBER TWENTY NINE)
325a: ㉚ (CIRCLED NUMBER THIRTY)
325b: ㉛ (CIRCLED NUMBER THIRTY ONE)
325c: ㉜ (CIRCLED NUMBER THIRTY TWO)
325d: ㉝ (CIRCLED NUMBER THIRTY THREE)
325e: ㉞ (CIRCLED NUMBER THIRTY FOUR)
325f: ㉟ (CIRCLED NUMBER THIRTY FIVE)
3280: ㊀ (CIRCLED IDEOGRAPH ONE)
3281: ㊁ (CIRCLED IDEOGRAPH TWO)
3282: ㊂ (CIRCLED IDEOGRAPH THREE)
3283: ㊃ (CIRCLED IDEOGRAPH FOUR)
3284: ㊄ (CIRCLED IDEOGRAPH FIVE)
3285: ㊅ (CIRCLED IDEOGRAPH SIX)
3286: ㊆ (CIRCLED IDEOGRAPH SEVEN)
3287: ㊇ (CIRCLED IDEOGRAPH EIGHT)
3288: ㊈ (CIRCLED IDEOGRAPH NINE)
3289: ㊉ (CIRCLED IDEOGRAPH TEN)
32b1: ㊱ (CIRCLED NUMBER THIRTY SIX)
32b2: ㊲ (CIRCLED NUMBER THIRTY SEVEN)
32b3: ㊳ (CIRCLED NUMBER THIRTY EIGHT)
32b4: ㊴ (CIRCLED NUMBER THIRTY NINE)
32b5: ㊵ (CIRCLED NUMBER FORTY)
32b6: ㊶ (CIRCLED NUMBER FORTY ONE)
32b7: ㊷ (CIRCLED NUMBER FORTY TWO)
32b8: ㊸ (CIRCLED NUMBER FORTY THREE)
32b9: ㊹ (CIRCLED NUMBER FORTY FOUR)
32ba: ㊺ (CIRCLED NUMBER FORTY FIVE)
32bb: ㊻ (CIRCLED NUMBER FORTY SIX)
32bc: ㊼ (CIRCLED NUMBER FORTY SEVEN)
32bd: ㊽ (CIRCLED NUMBER FORTY EIGHT)
32be: ㊾ (CIRCLED NUMBER FORTY NINE)
32bf: ㊿ (CIRCLED NUMBER FIFTY)
3405: 㐅 (CJK UNIFIED IDEOGRAPH-3405)
3483: 㒃 (CJK UNIFIED IDEOGRAPH-3483)
382a: 㠪 (CJK UNIFIED IDEOGRAPH-382A)
3b4d: 㭍 (CJK UNIFIED IDEOGRAPH-3B4D)
4e00: 一 (CJK UNIFIED IDEOGRAPH-4E00)
4e03: 七 (CJK UNIFIED IDEOGRAPH-4E03)
4e07: 万 (CJK UNIFIED IDEOGRAPH-4E07)
4e09: 三 (CJK UNIFIED IDEOGRAPH-4E09)
4e5d: 九 (CJK UNIFIED IDEOGRAPH-4E5D)
4e8c: 二 (CJK UNIFIED IDEOGRAPH-4E8C)
4e94: 五 (CJK UNIFIED IDEOGRAPH-4E94)
4e96: 亖 (CJK UNIFIED IDEOGRAPH-4E96)
4ebf: 亿 (CJK UNIFIED IDEOGRAPH-4EBF)
4ec0: 什 (CJK UNIFIED IDEOGRAPH-4EC0)
4edf: 仟 (CJK UNIFIED IDEOGRAPH-4EDF)
4ee8: 仨 (CJK UNIFIED IDEOGRAPH-4EE8)
4f0d: 伍 (CJK UNIFIED IDEOGRAPH-4F0D)
4f70: 佰 (CJK UNIFIED IDEOGRAPH-4F70)
5104: 億 (CJK UNIFIED IDEOGRAPH-5104)
5146: 兆 (CJK UNIFIED IDEOGRAPH-5146)
5169: 兩 (CJK UNIFIED IDEOGRAPH-5169)
516b: 八 (CJK UNIFIED IDEOGRAPH-516B)
516d: 六 (CJK UNIFIED IDEOGRAPH-516D)
5341: 十 (CJK UNIFIED IDEOGRAPH-5341)
5343: 千 (CJK UNIFIED IDEOGRAPH-5343)
5344: 卄 (CJK UNIFIED IDEOGRAPH-5344)
5345: 卅 (CJK UNIFIED IDEOGRAPH-5345)
534c: 卌 (CJK UNIFIED IDEOGRAPH-534C)
53c1: 叁 (CJK UNIFIED IDEOGRAPH-53C1)
53c2: 参 (CJK UNIFIED IDEOGRAPH-53C2)
53c3: 參 (CJK UNIFIED IDEOGRAPH-53C3)
53c4: 叄 (CJK UNIFIED IDEOGRAPH-53C4)
56db: 四 (CJK UNIFIED IDEOGRAPH-56DB)
58f1: 壱 (CJK UNIFIED IDEOGRAPH-58F1)
58f9: 壹 (CJK UNIFIED IDEOGRAPH-58F9)
5e7a: 幺 (CJK UNIFIED IDEOGRAPH-5E7A)
5efe: 廾 (CJK UNIFIED IDEOGRAPH-5EFE)
5eff: 廿 (CJK UNIFIED IDEOGRAPH-5EFF)
5f0c: 弌 (CJK UNIFIED IDEOGRAPH-5F0C)
5f0d: 弍 (CJK UNIFIED IDEOGRAPH-5F0D)
5f0e: 弎 (CJK UNIFIED IDEOGRAPH-5F0E)
5f10: 弐 (CJK UNIFIED IDEOGRAPH-5F10)
62fe: 拾 (CJK UNIFIED IDEOGRAPH-62FE)
634c: 捌 (CJK UNIFIED IDEOGRAPH-634C)
67d2: 柒 (CJK UNIFIED IDEOGRAPH-67D2)
6f06: 漆 (CJK UNIFIED IDEOGRAPH-6F06)
7396: 玖 (CJK UNIFIED IDEOGRAPH-7396)
767e: 百 (CJK UNIFIED IDEOGRAPH-767E)
8086: 肆 (CJK UNIFIED IDEOGRAPH-8086)
842c: 萬 (CJK UNIFIED IDEOGRAPH-842C)
8cae: 貮 (CJK UNIFIED IDEOGRAPH-8CAE)
8cb3: 貳 (CJK UNIFIED IDEOGRAPH-8CB3)
8d30: 贰 (CJK UNIFIED IDEOGRAPH-8D30)
9621: 阡 (CJK UNIFIED IDEOGRAPH-9621)
9646: 陆 (CJK UNIFIED IDEOGRAPH-9646)
964c: 陌 (CJK UNIFIED IDEOGRAPH-964C)
9678: 陸 (CJK UNIFIED IDEOGRAPH-9678)
96f6: 零 (CJK UNIFIED IDEOGRAPH-96F6)
a6e6: ꛦ (BAMUM LETTER MO)
a6e7: ꛧ (BAMUM LETTER MBAA)
a6e8: ꛨ (BAMUM LETTER TET)
a6e9: ꛩ (BAMUM LETTER KPA)
a6ea: ꛪ (BAMUM LETTER TEN)
a6eb: ꛫ (BAMUM LETTER NTUU)
a6ec: ꛬ (BAMUM LETTER SAMBA)
a6ed: ꛭ (BAMUM LETTER FAAMAE)
a6ee: ꛮ (BAMUM LETTER KOVUU)
a6ef: ꛯ (BAMUM LETTER KOGHOM)
a830: ꠰ (NORTH INDIC FRACTION ONE QUARTER)
a831: ꠱ (NORTH INDIC FRACTION ONE HALF)
a832: ꠲ (NORTH INDIC FRACTION THREE QUARTERS)
a833: ꠳ (NORTH INDIC FRACTION ONE SIXTEENTH)
a834: ꠴ (NORTH INDIC FRACTION ONE EIGHTH)
a835: ꠵ (NORTH INDIC FRACTION THREE SIXTEENTHS)
f96b: 參 (CJK COMPATIBILITY IDEOGRAPH-F96B)
f973: 拾 (CJK COMPATIBILITY IDEOGRAPH-F973)
f978: 兩 (CJK COMPATIBILITY IDEOGRAPH-F978)
f9b2: 零 (CJK COMPATIBILITY IDEOGRAPH-F9B2)
f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)

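However, the distinction between Numeric_Type=Digit and Numeric_Type=Numeric is no longer considered useful, and as of Unicode 6.3.0 Numeric_Type=Digit is no longer used for new characters. Quoting Unicode Standard Annex #44:

  Starting with Unicode 6.3.0, no newly encoded numeric characters will be given Numeric_Type=Digit, nor will existing characters with Numeric_Type=Numeric be changed to Numeric_Type=Digit. The distinction between those two types is not considered useful.

So DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO and other characters that would formerly have been given Numeric_Type=Digit are given Numeric_Type=Numeric instead, and isdigit() returns False for them.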
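The original demonstration did not survive on this page, so here is a minimal Python 3 sketch of the same point. It assumes an interpreter whose bundled Unicode database already contains U+1F10C (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO); the older character U+2460 (CIRCLED DIGIT ONE) is my own choice for contrast.

```python
import unicodedata

# U+1F10C: DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO (the character named above).
# U+2460: CIRCLED DIGIT ONE, an older character picked here for contrast.
newer = "\U0001F10C"
older = "\u2460"

# Unicode version bundled with this interpreter.
print(unicodedata.unidata_version)

for ch in (older, newer):
    print(unicodedata.name(ch, "UNNAMED"),
          "isdigit:", ch.isdigit(),
          "isnumeric:", ch.isnumeric())
```

Both characters are numeric, but only the older one counts as a digit.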

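Answer 1: (score: 5)

unicode.isnumeric()

Return True if there are only numeric characters in S, False otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH.

str.isdigit()

Return true if all characters in the string are digits and there is at least one character, false otherwise.

For 8-bit strings, this method is locale-dependent.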
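Both descriptions apply to the whole string, not to a single character. A small Python 3 sketch of what that means in practice (the test strings are my own examples):

```python
# Python 3 sketch: every character must qualify, and the empty string never does.
cases = ["2014", "", "3.14", "-7", "\u00bd", "4\u00bd"]

outcomes = {}
for s in cases:
    outcomes[s] = (s.isdigit(), s.isnumeric())
    print(repr(s), "isdigit:", outcomes[s][0], "isnumeric:", outcomes[s][1])
```

Note that a decimal point or a minus sign makes both methods return False, so neither method is a test for "this string can be parsed as a number".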

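Answer 2: (score: 2)

From the manual:

  The method isnumeric() checks whether the string consists of only numeric characters. This method is present only on unicode objects.

  Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.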
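To see which property value a given character has, one option is to probe the unicodedata module. The helper below is a hypothetical sketch (numeric_type is not a standard library function); it assumes that unicodedata.decimal(), unicodedata.digit() and unicodedata.numeric() reflect the Decimal, Digit and Numeric values respectively, which is how CPython exposes the Unicode character database.

```python
import unicodedata

def numeric_type(ch):
    """Best-effort Numeric_Type of one character, or None (hypothetical helper)."""
    # Check the narrowest category first: decimals also have digit and numeric values.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return None

for ch in ("5", "\u00b2", "\u00bd", "a"):
    print(f"U+{ord(ch):04X}", numeric_type(ch))
```

With this helper, isdigit() corresponds to a result of "Decimal" or "Digit", and isnumeric() to any result other than None.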

Answer 3: (score: 1)

From Python's built-in documentation:

>>> unicode.isdigit.__doc__
'S.isdigit() -> bool\n\nReturn True if all characters in S are digits\nand there is at least one character in S, False otherwise.'
>>> unicode.isnumeric.__doc__
'S.isnumeric() -> bool\n\nReturn True if there are only numeric characters in S,\nFalse otherwise.'

Answer 4: (score: 0)

At the time of writing this answer, the snippet provided by @Martijn Pieters does not work on the latest Python version (3.7).

Here is an updated snippet.

import unicodedata

# Count characters in the Basic Multilingual Plane that are numeric but not digits.
count = 0
for codepoint in range(2**16):
    ch = chr(codepoint)
    if ch.isnumeric() and not ch.isdigit():
        print('{:04x}: {} ({})'.format(codepoint, ch, unicodedata.name(ch, 'UNNAMED')))
        count += 1
print(f'Total Number of Numeric and Non-Digit Unicode Characters = {count}')

Output:

...
f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)
Total Number of Numeric and Non-Digit Unicode Characters = 335

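Note: I am using f-strings for formatting. This is a newer way of formatting strings, introduced in Python 3.6 under PEP 498 and also known as literal string interpolation. You can read more about it in PEP 498 and in the official documentation.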
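One limitation of the snippets in this thread is that range(2**16) covers only the Basic Multilingual Plane. As a minimal sketch (the helper name numeric_not_digit is made up for this example), the same scan can be run over every code point. The counts are deliberately not quoted here because they depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def numeric_not_digit(limit):
    """Code points below `limit` that are numeric but not digits (hypothetical helper)."""
    return [cp for cp in range(limit)
            if chr(cp).isnumeric() and not chr(cp).isdigit()]

bmp = numeric_not_digit(2**16)      # same range as the snippets above
full = numeric_not_digit(0x110000)  # every Unicode code point

print(len(bmp), len(full))

# Show the first few matches that lie beyond the Basic Multilingual Plane.
for cp in full[len(bmp):][:5]:
    print(f"{cp:05x}: {unicodedata.name(chr(cp), 'UNNAMED')}")
```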
