What is the equivalent of perluniprops in Python?

Asked: 2016-01-27 15:06:30

Tags: python regex perl unicode

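In Perl there is the perluniprops index (http://perldoc.perl.org/perluniprops.html), and I can do the following to pad open and close punctuation with spaces:

s/(\p{Open_Punctuation})/ $1 /g; s/(\p{Close_Punctuation})/ $1 /g;

What is the full list of open/close punctuation characters that get padded when using Perl? What is the equivalent in Python?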

Related question: Padding multiple character with space - python. This question is asked separately because an answerer there suggested that it should be its own question.

1 answer:

Answer 0 (score: 1)

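Are you asking how to determine which closing punctuation character corresponds to a given opening punctuation character? The Open_Punctuation and Close_Punctuation categories do not define such a pairing. In fact, the two sets are not even in a 1:1 relationship.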

$ unichars '\p{Open_Punctuation}' | wc -l
75

$ unichars '\p{Close_Punctuation}' | wc -l
73

However, building your own mapping should be relatively easy.

$ unichars '\p{Open_Punctuation}' | cat
 (  U+0028 LEFT PARENTHESIS
 [  U+005B LEFT SQUARE BRACKET
 {  U+007B LEFT CURLY BRACKET
 ༺  U+0F3A TIBETAN MARK GUG RTAGS GYON
 ༼  U+0F3C TIBETAN MARK ANG KHANG GYON
 ᚛  U+169B OGHAM FEATHER MARK
 ‚  U+201A SINGLE LOW-9 QUOTATION MARK
 „  U+201E DOUBLE LOW-9 QUOTATION MARK
 ⁅  U+2045 LEFT SQUARE BRACKET WITH QUILL
 ⁽  U+207D SUPERSCRIPT LEFT PARENTHESIS
 ₍  U+208D SUBSCRIPT LEFT PARENTHESIS
 ⌈  U+2308 LEFT CEILING
 ⌊  U+230A LEFT FLOOR
 〈 U+2329 LEFT-POINTING ANGLE BRACKET
 ❨  U+2768 MEDIUM LEFT PARENTHESIS ORNAMENT
 ❪  U+276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
 ❬  U+276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❮  U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❰  U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❲  U+2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
 ❴  U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
 ⟅  U+27C5 LEFT S-SHAPED BAG DELIMITER
 ⟦  U+27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
 ⟨  U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
 ⟪  U+27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
 ⟬  U+27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
 ⟮  U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
 ⦃  U+2983 LEFT WHITE CURLY BRACKET
 ⦅  U+2985 LEFT WHITE PARENTHESIS
 ⦇  U+2987 Z NOTATION LEFT IMAGE BRACKET
 ⦉  U+2989 Z NOTATION LEFT BINDING BRACKET
 ⦋  U+298B LEFT SQUARE BRACKET WITH UNDERBAR
 ⦍  U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦏  U+298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦑  U+2991 LEFT ANGLE BRACKET WITH DOT
 ⦓  U+2993 LEFT ARC LESS-THAN BRACKET
 ⦕  U+2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
 ⦗  U+2997 LEFT BLACK TORTOISE SHELL BRACKET
 ⧘  U+29D8 LEFT WIGGLY FENCE
 ⧚  U+29DA LEFT DOUBLE WIGGLY FENCE
 ⧼  U+29FC LEFT-POINTING CURVED ANGLE BRACKET
 ⸢  U+2E22 TOP LEFT HALF BRACKET
 ⸤  U+2E24 BOTTOM LEFT HALF BRACKET
 ⸦  U+2E26 LEFT SIDEWAYS U BRACKET
 ⸨  U+2E28 LEFT DOUBLE PARENTHESIS
 ⹂  U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
 〈 U+3008 LEFT ANGLE BRACKET
 《 U+300A LEFT DOUBLE ANGLE BRACKET
 「 U+300C LEFT CORNER BRACKET
 『 U+300E LEFT WHITE CORNER BRACKET
 【 U+3010 LEFT BLACK LENTICULAR BRACKET
 〔 U+3014 LEFT TORTOISE SHELL BRACKET
 〖 U+3016 LEFT WHITE LENTICULAR BRACKET
 〘 U+3018 LEFT WHITE TORTOISE SHELL BRACKET
 〚 U+301A LEFT WHITE SQUARE BRACKET
 〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
 ﴿  U+FD3F ORNATE RIGHT PARENTHESIS
 ︗ U+FE17 PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
 ︵ U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
 ︷ U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
 ︹ U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
 ︻ U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
 ︽ U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
 ︿ U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
 ﹁ U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
 ﹃ U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
 ﹇ U+FE47 PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
 ﹙ U+FE59 SMALL LEFT PARENTHESIS
 ﹛ U+FE5B SMALL LEFT CURLY BRACKET
 ﹝ U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
 ( U+FF08 FULLWIDTH LEFT PARENTHESIS
 [ U+FF3B FULLWIDTH LEFT SQUARE BRACKET
 { U+FF5B FULLWIDTH LEFT CURLY BRACKET
 ⦅ U+FF5F FULLWIDTH LEFT WHITE PARENTHESIS
 「  U+FF62 HALFWIDTH LEFT CORNER BRACKET

$ unichars '\p{Close_Punctuation}' | cat
 )  U+0029 RIGHT PARENTHESIS
 ]  U+005D RIGHT SQUARE BRACKET
 }  U+007D RIGHT CURLY BRACKET
 ༻  U+0F3B TIBETAN MARK GUG RTAGS GYAS
 ༽  U+0F3D TIBETAN MARK ANG KHANG GYAS
 ᚜  U+169C OGHAM REVERSED FEATHER MARK
 ⁆  U+2046 RIGHT SQUARE BRACKET WITH QUILL
 ⁾  U+207E SUPERSCRIPT RIGHT PARENTHESIS
 ₎  U+208E SUBSCRIPT RIGHT PARENTHESIS
 ⌉  U+2309 RIGHT CEILING
 ⌋  U+230B RIGHT FLOOR
 〉 U+232A RIGHT-POINTING ANGLE BRACKET
 ❩  U+2769 MEDIUM RIGHT PARENTHESIS ORNAMENT
 ❫  U+276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
 ❭  U+276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
 ❯  U+276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❱  U+2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
 ❳  U+2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
 ❵  U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
 ⟆  U+27C6 RIGHT S-SHAPED BAG DELIMITER
 ⟧  U+27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
 ⟩  U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
 ⟫  U+27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
 ⟭  U+27ED MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
 ⟯  U+27EF MATHEMATICAL RIGHT FLATTENED PARENTHESIS
 ⦄  U+2984 RIGHT WHITE CURLY BRACKET
 ⦆  U+2986 RIGHT WHITE PARENTHESIS
 ⦈  U+2988 Z NOTATION RIGHT IMAGE BRACKET
 ⦊  U+298A Z NOTATION RIGHT BINDING BRACKET
 ⦌  U+298C RIGHT SQUARE BRACKET WITH UNDERBAR
 ⦎  U+298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦐  U+2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦒  U+2992 RIGHT ANGLE BRACKET WITH DOT
 ⦔  U+2994 RIGHT ARC GREATER-THAN BRACKET
 ⦖  U+2996 DOUBLE RIGHT ARC LESS-THAN BRACKET
 ⦘  U+2998 RIGHT BLACK TORTOISE SHELL BRACKET
 ⧙  U+29D9 RIGHT WIGGLY FENCE
 ⧛  U+29DB RIGHT DOUBLE WIGGLY FENCE
 ⧽  U+29FD RIGHT-POINTING CURVED ANGLE BRACKET
 ⸣  U+2E23 TOP RIGHT HALF BRACKET
 ⸥  U+2E25 BOTTOM RIGHT HALF BRACKET
 ⸧  U+2E27 RIGHT SIDEWAYS U BRACKET
 ⸩  U+2E29 RIGHT DOUBLE PARENTHESIS
 〉 U+3009 RIGHT ANGLE BRACKET
 》 U+300B RIGHT DOUBLE ANGLE BRACKET
 」 U+300D RIGHT CORNER BRACKET
 』 U+300F RIGHT WHITE CORNER BRACKET
 】 U+3011 RIGHT BLACK LENTICULAR BRACKET
 〕 U+3015 RIGHT TORTOISE SHELL BRACKET
 〗 U+3017 RIGHT WHITE LENTICULAR BRACKET
 〙 U+3019 RIGHT WHITE TORTOISE SHELL BRACKET
 〛 U+301B RIGHT WHITE SQUARE BRACKET
 〞 U+301E DOUBLE PRIME QUOTATION MARK
 〟 U+301F LOW DOUBLE PRIME QUOTATION MARK
 ﴾  U+FD3E ORNATE LEFT PARENTHESIS
 ︘ U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
 ︶ U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
 ︸ U+FE38 PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
 ︺ U+FE3A PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
 ︼ U+FE3C PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
 ︾ U+FE3E PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
 ﹀ U+FE40 PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
 ﹂ U+FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
 ﹄ U+FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
 ﹈ U+FE48 PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
 ﹚ U+FE5A SMALL RIGHT PARENTHESIS
 ﹜ U+FE5C SMALL RIGHT CURLY BRACKET
 ﹞ U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
 ) U+FF09 FULLWIDTH RIGHT PARENTHESIS
 ] U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
 } U+FF5D FULLWIDTH RIGHT CURLY BRACKET
 ⦆ U+FF60 FULLWIDTH RIGHT WHITE PARENTHESIS
 」  U+FF63 HALFWIDTH RIGHT CORNER BRACKET
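
One way to build such a mapping without typing it by hand is sketched below. It is only a heuristic, not something the answer above prescribes: it assumes that a closing partner can be found by replacing LEFT with RIGHT in the character name and checking that the result is in category Pe. The helper names chars_in_category and build_bracket_map are made up for this illustration, and the number of pairs found depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def chars_in_category(category):
    """Return all characters whose General_Category equals `category`."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]


def build_bracket_map():
    """Pair Ps characters with Pe characters by swapping LEFT -> RIGHT in the name.

    Heuristic only: characters whose names do not follow this pattern are skipped.
    """
    pairs = {}
    for ch in chars_in_category("Ps"):
        name = unicodedata.name(ch, "")
        if "LEFT" not in name:
            # e.g. the Tibetan GYON/GYAS marks or the low-9 quotation marks
            continue
        try:
            partner = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
        except KeyError:
            # no character with the swapped name exists
            continue
        if unicodedata.category(partner) == "Pe":
            pairs[ch] = partner
    return pairs


bracket_map = build_bracket_map()
print(len(bracket_map), "pairs found by name")
print(bracket_map["("], bracket_map["["], bracket_map["{"])
```

Anything the name swap misses (for example the Tibetan marks, the Ogham feather marks, or the ornate parentheses) would have to be added to the dictionary manually.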

After installing Unicode::Tussle from CPAN (which provides unichars), in Python:

>>> import subprocess
>>> # raw string, so that \p and \n reach the shell unchanged
>>> cmd = r"unichars '\p{Open_Punctuation}' | cut -f2 -d' ' | tr -d '\n'"
>>> open_punct = subprocess.check_output(cmd, shell=True).decode('utf8')
Smartmatch is experimental at /usr/local/bin/unichars line 546.
>>> print (open_punct)
([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「
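
If calling out to Perl is not an option, roughly the same result can probably be obtained with the standard library alone. The sketch below assumes that Perl's Open_Punctuation and Close_Punctuation correspond to the General_Category values Ps and Pe as reported by unicodedata.category; the helper names chars_in_category and pad_brackets are invented for this example. The exact character set can differ from Perl's, because each interpreter ships its own Unicode version.

```python
import re
import sys
import unicodedata


def chars_in_category(category):
    """Return a string of all characters whose General_Category equals `category`."""
    return "".join(chr(cp) for cp in range(sys.maxunicode + 1)
                   if unicodedata.category(chr(cp)) == category)


open_punct = chars_in_category("Ps")    # Open_Punctuation
close_punct = chars_in_category("Pe")   # Close_Punctuation

# One character class covering both sets; re.escape protects [, ], ( and so on.
bracket_re = re.compile("([{}])".format(re.escape(open_punct + close_punct)))


def pad_brackets(text):
    """Surround every open/close punctuation character with spaces."""
    return bracket_re.sub(r" \1 ", text)


print(unicodedata.unidata_version)
print(pad_brackets("(a)"))
```

Here pad_brackets("(a)") returns " ( a ) ", which matches what the two Perl substitutions produce for the same input. The third-party regex package is also documented to accept \p{Ps} and \p{Pe} directly in patterns, which would avoid building the character class by hand.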