Remove special characters from a database field

Time: 2011-05-20 02:43:01

Tags: mysql sql database

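I have a database with several thousand records, and I need to strip down one of the fields so that it only contains certain characters (alphanumerics, spaces, and single quotes). What SQL can I use to strip every other character (e.g. slashes) from that field across the whole database?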

10 answers:

Answer 0 (score: 22)

update mytable
set FieldName = REPLACE(FieldName,'/','')

This is a good place to start.

Answer 1 (score: 5)

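The Replace() function is the first choice. However, special characters can be tricky to type in the console. For those, you can combine REPLACE with the Char() function.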

E.g. to remove €:

Update products set description = replace(description, char(128), '');

You can find all the ASCII values here.
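Strictly speaking, € is not ASCII. Byte 0x80 (128) is € in the Windows-1252 encoding, and MySQL's `latin1` character set is actually cp1252. In a UTF-8 column the same symbol is stored as the three bytes E2 82 AC, so `char(128)` would presumably not match it there. The small Python check below makes the difference visible. `bytes_for` is a hypothetical helper used only for illustration.

```python
def bytes_for(symbol: str, encoding: str) -> str:
    """Return the upper-case hex bytes of `symbol` in the given encoding."""
    return symbol.encode(encoding).hex().upper()


# In Windows-1252 (what MySQL calls latin1) the euro sign is the single byte 0x80.
print(bytes_for("€", "cp1252"))  # 80
print(int("80", 16))             # 128, i.e. char(128)
# In UTF-8 it is a three-byte sequence, so char(128) alone would not match it.
print(bytes_for("€", "utf-8"))   # E282AC
```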

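Ideally you could use a regular expression to find all the special characters, but apparently that's not possible with MySQL. (MySQL 8.0 later added REGEXP_REPLACE(), which some of the answers below use.)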

Beyond that, you'd need to run the data through your favorite scripting language.
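The following Python sketch shows the scripting-language route. It applies the whitelist from the question (letters, digits, space, single quote) to one value, for example before writing a row back. It assumes "alphanumeric" means ASCII letters and digits only. `DISALLOWED` and `clean_field` are hypothetical names.

```python
import re

# Anything that is NOT an ASCII letter, digit, space or single quote.
DISALLOWED = re.compile(r"[^A-Za-z0-9 ']")


def clean_field(value: str) -> str:
    """Strip every character outside the whitelist from the question."""
    return DISALLOWED.sub("", value)


print(clean_field("O'Brien / Smith & Co. ©2011"))  # O'Brien  Smith  Co 2011
```

Note that removing a character leaves the spaces around it, so you may also want to collapse repeated spaces afterwards.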

Answer 2 (score: 3)

This may also be useful.

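First, you need to know the character set of the database and/or table. For example, suppose you have a UTF-8 environment and want to strip symbols such as the circled registered sign, the circled copyright sign, and the trademark sign from a field. Search the internet (Bing, Yahoo, or Google) for the hex code values of these symbols in UTF-8: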

Symbol                       Utf-8 Hex
=======                      =========
circled copyright              C2A9
circled registered             C2AE
Trademark (i.e., TM)           E284A2

The SQL that uses HEX/UNHEX and REPLACE to clean field f1 of table t1 would then most likely look like this:

SELECT cast(unhex(replace(replace(replace(hex(f1),'C2A9',''),'C2AE',''),'E284A2','')) AS char) AS cleanf1 FROM t1 ;

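In the query above, the original field to be scrubbed is f1, the table is t1, and the output column heading is cleanf1. The "as char" cast is necessary because without it, MySQL 5.5.8 (the version I tested) returns a BLOB. Hope this helps.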
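The HEX/UNHEX trick can be mimicked in Python to see where it can misfire. REPLACE works on the hex string and ignores byte boundaries, so a code such as C2AE can match the second half of one byte plus the first half of the next. The example "L*€" has hex 4C2AE282AC and gets corrupted by the hex approach. Removing whole decoded characters leaves it intact. This is a sketch under those assumptions, not a MySQL test; the function names are hypothetical.

```python
# UTF-8 hex codes from the table above: ©, ®, ™
SYMBOL_CODES = ["C2A9", "C2AE", "E284A2"]


def strip_via_hex(text: str) -> bytes:
    """Mimic UNHEX(REPLACE(...REPLACE(HEX(f1), code, '')...)) on a UTF-8 value."""
    h = text.encode("utf-8").hex().upper()  # MySQL's HEX() is upper-case too
    for code in SYMBOL_CODES:
        h = h.replace(code, "")  # plain substring replace: ignores byte boundaries
    return bytes.fromhex(h)


def strip_via_chars(text: str) -> str:
    """Alternative: decode each code to its character and remove whole characters."""
    for code in SYMBOL_CODES:
        text = text.replace(bytes.fromhex(code).decode("utf-8"), "")
    return text


print(strip_via_hex("Acme© Widget™").decode("utf-8"))  # Acme Widget
print(strip_via_chars("L*€"))                          # L*€
print(strip_via_hex("L*€"))  # b'B\x82\xac': "C2AE" matched across byte boundaries
```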

Answer 3 (score: 2)

Elaborating on Vinnie's answer, you can use the following (note the escaping in the last two statements):

-- mytable / FieldName are placeholders (TABLE and COLUMN are reserved words in MySQL)
update mytable set FieldName = REPLACE(FieldName,"`","");
update mytable set FieldName = REPLACE(FieldName,"~","");
update mytable set FieldName = REPLACE(FieldName,"!","");
update mytable set FieldName = REPLACE(FieldName,"@","");
update mytable set FieldName = REPLACE(FieldName,"#","");
update mytable set FieldName = REPLACE(FieldName,"$","");
update mytable set FieldName = REPLACE(FieldName,"%","");
update mytable set FieldName = REPLACE(FieldName,"^","");
update mytable set FieldName = REPLACE(FieldName,"&","");
update mytable set FieldName = REPLACE(FieldName,"*","");
update mytable set FieldName = REPLACE(FieldName,"(","");
update mytable set FieldName = REPLACE(FieldName,")","");
update mytable set FieldName = REPLACE(FieldName,"-","");
update mytable set FieldName = REPLACE(FieldName,"_","");
update mytable set FieldName = REPLACE(FieldName,"=","");
update mytable set FieldName = REPLACE(FieldName,"+","");
update mytable set FieldName = REPLACE(FieldName,"{","");
update mytable set FieldName = REPLACE(FieldName,"}","");
update mytable set FieldName = REPLACE(FieldName,"[","");
update mytable set FieldName = REPLACE(FieldName,"]","");
update mytable set FieldName = REPLACE(FieldName,"|","");
update mytable set FieldName = REPLACE(FieldName,";","");
update mytable set FieldName = REPLACE(FieldName,":","");
-- this also removes single quotes, which the question wants to keep; omit it in that case
update mytable set FieldName = REPLACE(FieldName,"'","");
update mytable set FieldName = REPLACE(FieldName,"<","");
update mytable set FieldName = REPLACE(FieldName,",","");
update mytable set FieldName = REPLACE(FieldName,">","");
update mytable set FieldName = REPLACE(FieldName,".","");
update mytable set FieldName = REPLACE(FieldName,"/","");
update mytable set FieldName = REPLACE(FieldName,"?","");
update mytable set FieldName = REPLACE(FieldName,"\\","");
update mytable set FieldName = REPLACE(FieldName,"\"","");
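To see what these 32 statements do together, here is a hedged Python emulation. It is not part of the answer; `REMOVED` and `apply_replaces` are hypothetical names. Each REPLACE(..., ch, '') deletes one character everywhere, so running them in sequence is a fold over the list. The emulation shows two things. First, the list also deletes the single quote, which the question wants to keep. Second, because it is a blacklist, characters that are not listed (such as ©) pass through untouched.

```python
from functools import reduce

# The characters removed by the statements above, in the same order.
REMOVED = "`~!@#$%^&*()-_=+{}[]|;:'<,>./?\\\""


def apply_replaces(value: str, chars: str = REMOVED) -> str:
    """Emulate one REPLACE(column, ch, '') per character, applied in sequence."""
    return reduce(lambda acc, ch: acc.replace(ch, ""), chars, value)


print(apply_replaces("O'Brien-Smith (Jr.)"))                          # OBrienSmith Jr
print(apply_replaces("O'Brien-Smith (Jr.)", REMOVED.replace("'", "")))  # O'BrienSmith Jr
print(apply_replaces("Acme©"))                                        # Acme©
```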

Answer 4 (score: 2)

I've created a simple function for this:

DELIMITER $$

DROP FUNCTION IF EXISTS `regex_replace`$$

CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000)) RETURNS VARCHAR(1000) CHARSET utf8mb4
    DETERMINISTIC
BEGIN
    DECLARE temp VARCHAR(1000);
    DECLARE ch VARCHAR(1);
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    -- Only walk the string if the pattern matches somewhere in it
    IF original REGEXP pattern THEN
        loop_label: LOOP
            IF i > CHAR_LENGTH(original) THEN
                LEAVE loop_label;
            END IF;

            -- Test each character on its own against the pattern
            SET ch = SUBSTRING(original, i, 1);

            IF NOT ch REGEXP pattern THEN
                SET temp = CONCAT(temp, ch);
            ELSE
                SET temp = CONCAT(temp, replacement);
            END IF;

            SET i = i + 1;
        END LOOP;
    ELSE
        SET temp = original;
    END IF;

    RETURN temp;
END$$

DELIMITER ;

Usage example:

SELECT <field-name> AS NormalText, regex_replace('[^A-Za-z0-9 ]', '', <field-name>) AS RegexText FROM <table-name>
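Here is a hedged Python port of the stored function's loop, for illustration only. The whole string is tested once. After that, each character is tested individually with REGEXP, which behaves like a search, hence `re.search`. As a result, the pattern only makes sense if it matches a single character, like the character class in the usage example. A multi-character pattern such as 'ab' never matches one character, so the input comes back unchanged.

```python
import re


def regex_replace(pattern: str, replacement: str, original: str) -> str:
    """Port of the stored function: replace each character that matches `pattern`."""
    if not re.search(pattern, original):
        return original
    return "".join(replacement if re.search(pattern, ch) else ch for ch in original)


print(regex_replace("[^A-Za-z0-9 ]", "", "Hello, World! #42"))  # Hello World 42
print(regex_replace("[0-9]", "#", "a1b2"))                      # a#b#
print(regex_replace("ab", "X", "cab"))                          # cab (no single char matches 'ab')
```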

Answer 5 (score: 1)

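Take a look at LIB_MYSQLUDF_PREG. It needs to be compiled into the MySQL server, but it provides advanced regular-expression functions such as preg_replace that will help you with this task.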

Answer 6 (score: 1)

Adeel's answer is the best and simplest.

The OP needs to update the database, and so did I. So I thought I'd post this here for the next poor soul like me, so they don't have to redo what I did.

Double-check first: run a SELECT and scan the results to make sure you're getting the right rows before you update.

SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName;

Count the rows as a safety check:

SELECT count(*) from tableName WHERE columnName REGEXP '[^\\x20-\\x7E]';

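For some names I first had to do an extra mapping so they wouldn't lose their meaning, e.g. Ramón to Ramon, because the o carries an umlaut, accent, or circumflex. I used this for the mapping: https://theasciicode.com.ar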

Then the update, which I ran after all the mapping updates. Change the LIMIT to a number higher than the count from the query above:

UPDATE tablename SET columnName = REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') WHERE columnName REGEXP '[^\\x20-\\x7E]' LIMIT 1;
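The answer maps accented letters by hand before stripping. A swapped-in alternative, not what the answerer used, is Unicode NFKD normalization, which does much of that mapping automatically. It splits "ó" into "o" plus a combining accent, and the same [^\x20-\x7E] filter then drops the accent. The sketch below uses hypothetical names. Letters without a decomposition, such as "ø" or "ß", are still simply removed.

```python
import re
import unicodedata

# Same character range as the SQL pattern: keep printable ASCII 0x20-0x7E only.
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")


def to_ascii(value: str) -> str:
    """Decompose accented letters (NFKD), then drop anything outside 0x20-0x7E."""
    decomposed = unicodedata.normalize("NFKD", value)
    return NON_PRINTABLE_ASCII.sub("", decomposed)


print(NON_PRINTABLE_ASCII.sub("", "Ramón"))  # Ramn  (filter alone loses the letter)
print(to_ascii("Ramón"))                     # Ramon
print(to_ascii("Müller\t€5"))                # Muller5
```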

Answer 7 (score: 0)

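My version of MySQL doesn't have REGEXP_REPLACE(), so I use the following two workarounds:

1. Remove the specified characters (if you know which characters you want to remove):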

    DELIMITER $$
    create function fn_remove_selected_characters
        (v_input_string varchar(255),
         v_unacceptable_characters varchar(255))
    RETURNS varchar(255)
    DETERMINISTIC
    BEGIN

    -- declare variables
    declare i int;
    declare unacceptable_values varchar(255);
    declare this_character char(1);
    declare output_string varchar(255);
    declare input_length int;
    declare boolean_value int;
    declare space varchar(3);

    -- Set variable values
    set input_length = char_length(v_input_string);
    -- MySQL string positions start at 1; SUBSTRING(s, 0, 1) returns ''
    set i = 1;
    set unacceptable_values = v_unacceptable_characters;
    set output_string = '';
    set boolean_value = 0;
    set space = 'no';

    begin
    -- Leave spaces if they aren't in the exclude list
    if instr( unacceptable_values, ' ') = 0 then
        begin
        -- use <= so that the last character is processed too
        while i <= input_length do
            SET this_character = SUBSTRING( v_input_string, i, 1 );
                -- If the current character is a space,
                -- then concatenate a space to the output
                -- Although it seems redundant to explicitly add a space,
                -- SUBSTRING() equates a space to the empty string
                if this_character = ' ' then
                    set output_string = concat(output_string, ' ');
                -- if the current character is not a space, remove it if it's unwanted
                elseif instr(unacceptable_values, this_character) then
                    set output_string = concat(output_string, '');
                -- otherwise include the character
                else set output_string = concat(output_string, this_character);
                end if;
            set i = i + 1;
        end while;
        end;
    else
        begin
        while i <= input_length do
            begin
            SET this_character = SUBSTRING( v_input_string, i, 1 );
            if instr(unacceptable_values, this_character) > 0 then
                set output_string = concat(output_string, '');
            else set output_string = concat(output_string, this_character);
            end if;
            end;
            set i = i + 1;
        end while;
        end;
    end if;
    end;
        RETURN output_string;
    END$$
    DELIMITER ;
2. Keep only the characters you want:
    DELIMITER $$
    create function fn_preserve_selected_characters
        (v_input_string varchar(255),
         v_acceptable_characters varchar(255))
    returns varchar(255)
    deterministic

    begin
    declare i int;
    declare acceptable_values varchar(255);
    declare this_character char(1);
    declare output_string varchar(255);
    declare input_length int;
    declare boolean_value int;
    declare space varchar(3);

    set input_length = char_length(v_input_string);
    -- MySQL string positions start at 1
    set i = 1;
    set acceptable_values = v_acceptable_characters;
    set output_string = '';
    set boolean_value = 0;
    set space = 'no';

    begin

    -- check whether spaces are among the acceptable characters
    if instr( acceptable_values, ' ') then
        begin
        -- use <= so that the last character is processed too
        while i <= input_length do
            -- SUBSTRING() treats spaces as empty strings
            -- so handle them specially
            SET this_character = SUBSTRING( v_input_string, i, 1 );
                if this_character = ' ' then
                    set output_string = concat(output_string, ' ');
                elseif instr(acceptable_values, this_character) then
                    set output_string = concat(output_string, this_character);
                else set output_string = concat(output_string, '');
                end if;
            set i = i + 1;
        end while;
        end;
    -- if spaces are not among the acceptable characters,
    -- simply keep the listed characters
    else
        begin
        while i <= input_length do
            SET this_character = SUBSTRING( v_input_string, i, 1 );
            -- if the current character exists in the acceptable string
            if LOCATE( this_character, acceptable_values ) > 0 THEN
                set output_string = concat(output_string, this_character);
            end if;
            set i = i+1;
        end while;
        end;
    end if;
    end;
        RETURN output_string;
    END$$
    DELIMITER ;
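Both functions walk the string with SUBSTRING(v_input_string, i, 1). MySQL string positions are 1-based, and SUBSTRING(s, 0, 1) returns an empty string. A loop like this therefore has to run i from 1 to CHAR_LENGTH(s) inclusive; starting at 0 and stopping with `<` skips the last character. The rough Python sketch below models that indexing together with the keep/remove logic. The helper names are hypothetical, and the special handling of spaces is left out.

```python
def mysql_substring(s: str, pos: int, length: int) -> str:
    """Mimic MySQL SUBSTRING(s, pos, length) for pos >= 0 (positions are 1-based)."""
    if pos < 0:
        raise ValueError("negative positions (counted from the end in MySQL) are not modelled")
    if pos == 0:
        return ""  # MySQL returns an empty string for position 0
    return s[pos - 1:pos - 1 + length]


def preserve_selected_characters(value: str, acceptable: str) -> str:
    """Keep only characters listed in `acceptable`, walking i = 1..CHAR_LENGTH(value)."""
    out = []
    for i in range(1, len(value) + 1):
        ch = mysql_substring(value, i, 1)
        if ch in acceptable:
            out.append(ch)
    return "".join(out)


def remove_selected_characters(value: str, unacceptable: str) -> str:
    """Drop characters listed in `unacceptable`, walking i = 1..CHAR_LENGTH(value)."""
    out = []
    for i in range(1, len(value) + 1):
        ch = mysql_substring(value, i, 1)
        if ch not in unacceptable:
            out.append(ch)
    return "".join(out)


# A loop that starts at 0 and stops before CHAR_LENGTH misses the last character:
print([mysql_substring("abc", i, 1) for i in range(0, 3)])  # ['', 'a', 'b']
print(preserve_selected_characters("a-b c!", "abc "))       # ab c
print(remove_selected_characters("a/b,c", "/,"))            # abc
```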

Answer 8 (score: 0)

This approach uses no regex replace. Use the following code to replace all special characters with "-":

UPDATE <table> SET <column> = REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (<column>, '/', '-'), ',', '-'), '.', '-'), '<', '-'), '>', '-'), '?', '-'), ';', '-'), ':', '-'), '"', '-'), "'", '-'), '|', '-'), '\\', '-'), '=', '-'), '+', '-'), '*', '-'), '&', '-'), '^', '-'), '%', '-'), '$', '-'), '#', '-'), '@', '-'), '!', '-'), '~', '-'), '`', '-'), '', '-'), '{', '-' ), '}', '-' ), '[', '-' ), ']', '-' ), '(', '-' ), ')', '-' )

Formatted code:

UPDATE
    <table>
SET
    <column> =
REPLACE
    (
    REPLACE
        (
        REPLACE
            (
            REPLACE
                (
                REPLACE
                    (
                    REPLACE
                        (
                        REPLACE
                            (
                            REPLACE
                                (
                                REPLACE
                                    (
                                    REPLACE
                                        (
                                        REPLACE
                                            (
                                            REPLACE
                                                (
                                                REPLACE
                                                    (
                                                    REPLACE
                                                        (
                                                        REPLACE
                                                            (
                                                            REPLACE
                                                                (
                                                                REPLACE
                                                                    (
                                                                    REPLACE
                                                                        (
                                                                        REPLACE
                                                                            (
                                                                            REPLACE
                                                                                (
                                                                                REPLACE
                                                                                    (
                                                                                    REPLACE
                                                                                        (
                                                                                        REPLACE
                                                                                            (
                                                                                            REPLACE
                                                                                                (
                                                                                                REPLACE
                                                                                                    (
                                                                                                    REPLACE
                                                                                                        (
                                                                                                        REPLACE
                                                                                                            (
                                                                                                            REPLACE
                                                                                                                (
                                                                                                                REPLACE
                                                                                                                    (
                                                                                                                    REPLACE
                                                                                                                        (
                                                                                                                    REPLACE
                                                                                                                        (<column>, '/', '-'),
                                                                                                                        ',',
                                                                                                                        '-'
                                                                                                                    ),
                                                                                                                    '.',
                                                                                                                    '-'
                                                                                                                ),
                                                                                                                '<',
                                                                                                                '-'
                                                                                                            ),
                                                                                                            '>',
                                                                                                            '-'
                                                                                                        ),
                                                                                                        '?',
                                                                                                        '-'
                                                                                                    ),
                                                                                                    ';',
                                                                                                    '-'
                                                                                                ),
                                                                                                ':',
                                                                                                '-'
                                                                                            ),
                                                                                            '"',
                                                                                            '-'
                                                                                        ),
                                                                                        "'",
                                                                                        '-'
                                                                                    ),
                                                                                    '|',
                                                                                    '-'
                                                                                ),
                                                                                '\\',
                                                                                '-'
                                                                            ),
                                                                            '=',
                                                                            '-'
                                                                        ),
                                                                        '+',
                                                                        '-'
                                                                    ),
                                                                    '*',
                                                                    '-'
                                                                ),
                                                                '&',
                                                                '-'
                                                            ),
                                                            '^',
                                                            '-'
                                                        ),
                                                        '%',
                                                        '-'
                                                    ),
                                                    '$',
                                                    '-'
                                                ),
                                                '#',
                                                '-'
                                            ),
                                            '@',
                                            '-'
                                        ),
                                        '!',
                                        '-'
                                    ),
                                    '~',
                                    '-'
                                ),
                                '`',
                                '-'
                            ),
                            '',
                            '-'
                        ),
                        '{',
                        '-'
                    ),
                    '}',
                    '-'
                ),
                '[',
                '-'
            ),
            ']',
            '-'
        ),
        '(',
        '-'
    ),
    ')',
    '-'
)
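Writing 31 nested REPLACE calls by hand is error-prone. Below is a hedged Python sketch that generates the same kind of expression for any list of characters; it is not part of the answer, and the helper names are hypothetical. Each literal is quoted for MySQL's default sql_mode (backslash is an escape character) by doubling backslashes and single quotes.

```python
def sql_literal(s: str) -> str:
    """Quote `s` as a MySQL string literal (default sql_mode: backslash is an escape)."""
    return "'" + s.replace("\\", "\\\\").replace("'", "''") + "'"


def nested_replace_sql(column: str, chars: str, replacement: str = "-") -> str:
    """Build REPLACE(REPLACE(column, 'c1', '-'), 'c2', '-') ... for each character."""
    expr = column
    for ch in chars:
        expr = f"REPLACE({expr}, {sql_literal(ch)}, {sql_literal(replacement)})"
    return expr


# Hypothetical table/column names, for illustration only.
print(f"UPDATE mytable SET FieldName = {nested_replace_sql('FieldName', '/,.')};")
```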

Answer 9 (score: 0)

This might be useful.

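This solution doesn't require creating procedures or functions, or long chains of REPLACE calls. Instead, it relies on the fact that all non-special ASCII characters lie in the range \x20-\x7E (hex). Source: the "ASCII" article on Wikipedia, the free encyclopedia. Here are all the characters in that range: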

      _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_    SP !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
3_    0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
4_    @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
5_    P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
6_    `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
7_    p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~
(SP = space, 0x20; the range ends at 0x7E, ~)

A simple regex replace does the job:

SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName;

PHP custom query string:

$query = "select REGEXP_REPLACE(columnName, '(.*)[(].*[)](.*)', CONCAT('\\\\1', '\\\\2')) `Alias` FROM table_Name";

The statement above removes the parentheses together with the content between them.
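Here is a Python sketch of the same pattern. It mirrors only the regex logic, not MySQL's replacement syntax, and Python writes the backreferences as \1 and \2. Because the first (.*) is greedy, only the last parenthesized part is removed, and the spaces on both sides of it are kept. `drop_parenthesized` is a hypothetical name.

```python
import re

# Same pattern as in the PHP query: keep group 1 and group 2, drop "( ... )".
PAREN_PATTERN = re.compile(r"(.*)[(].*[)](.*)")


def drop_parenthesized(value: str) -> str:
    return PAREN_PATTERN.sub(r"\1\2", value)


print(drop_parenthesized("Widget (blue) large"))  # Widget  large
print(drop_parenthesized("a (b) c (d) e"))        # a (b) c  e
print(drop_parenthesized("no parens"))            # no parens
```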

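PS: If you run any DML (SELECT, UPDATE, ...) through a prepared statement in a stored procedure, or from PHP by building a custom query string, remember to escape the backslashes: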

SET @sql = CONCAT("SELECT REGEXP_REPLACE(columnName, '[^\\\\x20-\\\\x7E]', '') from tableName");
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

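The SQL statement above performs a simple regex replace that removes all special characters. The pattern describes the characters to keep, and every character outside that set is replaced with an empty string.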

Pattern explanation:

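A character class starts with a square bracket. If its first character is a caret (^), the class is negated: it matches any character that is NOT listed inside the brackets. In other words, it selects the complement of the listed characters.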

To summarize the statement above:

Kept: all ASCII letters and digits, punctuation, arithmetic operators, and the space character.

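Removed: all non-ASCII Unicode characters (only the basic Latin letters A-Z and a-z survive, so accented letters are dropped) and control characters such as tabs and newlines.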