Problem saving non-English characters

Time: 2015-08-04 16:25:24

Tags: mysql spring hibernate spring-mvc utf-8

We are working on an application in which we need to save data in the Gujarati language.

The technologies used in the application are listed below.

  • Spring MVC Version 4.1.6.RELEASE
  • Hibernate version 4.3.5.Final
  • MySQL 6.0.11

My JSP is configured with

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Hibernate configuration

<prop key="hibernate.connection.useUnicode">true</prop>
<prop key="hibernate.connection.characterEncoding">UTF-8</prop>
<prop key="hibernate.connection.charSet">UTF-8</prop>

MySQL URL

jdbc:mysql://host:port/dbName?useUnicode=true&connectionCollation=utf8_general_ci&characterSetResults=utf8

The POJO has a String field to store this data.

The MySQL column is a VARCHAR with charset = utf8 and collation = utf8_general_ci.

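When I try to save any non-English (Gujarati) characters, garbage characters are shown instead; for example, "ગુજ" comes back garbled.

What other configuration am I missing here?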

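5 answers:

Answer 0: (score: 7)

I had the same problem when inserting "Tamil" characters into the database. After a lot of searching I found a working solution that fixed it. I am sharing it here in the hope that it clears up your doubts about non-English characters.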

INSERT INTO 
STUDENT(name,address) 
VALUES 
(N'பெயர்', N'முகவரி');

I am using a sample here because you did not provide your table structure or field names.

Answer 1: (score: 5)

I am assuming you want ગુજ (GA JA with Vowel sign U)?

I think you somehow specified "latin5". (Yes I see you have UTF-8 everywhere, but "latin5" is the only way I can make things work.)

CONVERT(CONVERT(UNHEX('C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C')
       USING utf8) USING latin5) = 'ગુજ'

Plus you ended up with "double encoding"; I suspect this is what happened:

  • The client had characters encoded as utf8 (good); and
  • SET NAMES latin5 was used, but it lied by claiming that the client had latin5 encoding; and
  • The column in the table declared CHARACTER SET utf8 (good).

If possible, it would be better to start over -- empty the tables, be sure to have SET NAMES utf8 or establish utf8 when connecting from your client to the database. Then repopulate the tables.

If you would rather try to recover the existing data, this might work:

UPDATE ... SET col = CONVERT(BINARY(CONVERT(
                         CONVERT(UNHEX(col) USING utf8)
                         USING latin5)) USING utf8);

But you would need to do that for each messed up column in each table.

A partial test of that code is to do

SELECT CONVERT(BINARY(CONVERT(
                         CONVERT(UNHEX(col) USING utf8)
                         USING latin5)) USING utf8)
     FROM table;

I say "partial test" because looking right may not prove that it is right.

After the UPDATE, SELECT HEX(col) should return E0AA97E0AB81E0AA9C for ગુજ. Note that most Gujarati hex should be of the form E0AAyy or E0AByy. You might also find 20 for a blank space.

I apologize for not being more certain. I have been tackling Character Set issues for a decade, but this is a new variant.
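The "double encoding" described above can be reproduced outside the database. The following is only a sketch, not what MySQL does internally: it uses Java's ISO-8859-1 as a stand-in for MySQL's latin5, on the assumption that both map the six bytes involved here to the same code points. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Render bytes as upper-case hex, similar to what MySQL's HEX() shows.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulate the mis-declared connection: the UTF-8 bytes are
    // interpreted as if each byte were one single-byte character.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: turn each character back into its single byte,
    // then decode the resulting bytes as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Gujarati GA, vowel sign U, JA (written as escapes to keep the source ASCII-only)
        String gujarati = "\u0A97\u0AC1\u0A9C";
        String garbled = misread(gujarati);

        // Correctly stored text: E0AA97E0AB81E0AA9C
        System.out.println(hex(gujarati.getBytes(StandardCharsets.UTF_8)));
        // Double-encoded text: C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C
        System.out.println(hex(garbled.getBytes(StandardCharsets.UTF_8)));
        // Round trip restores the original: true
        System.out.println(repair(garbled).equals(gujarati));
    }
}
```

The second hex string is the same one passed to UNHEX in the query above, which is why the stored value looks like a run of accented Latin characters rather than Gujarati.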

Answer 2: (score: 4)

There may be a few things you have missed. I had the same problem with MySQL on Linux, and all I had to do was edit my.cnf like this:

[client]
default-character-set = utf8

[mysqld]
character-set-server = utf8

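On CentOS, for example, this file is located at /etc/my.cnf; on Windows (my PC) it is at C:\ProgramData\MySQL\MySQL Server 5.5\my.ini. Note that ProgramData may be hidden.

If you use Tomcat, another thing is that you must select UTF-8 for URI encoding. Just edit server.xml and modify the main Connector element: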

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443" />

Also make sure you have added a character encoding filter to your application:

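import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;

// Forces UTF-8 on every request and response that passes through the application.
@WebFilter(filterName = "CharacterEncodingFilter", urlPatterns = {"/*"})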
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig)
            throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) servletRequest;

        request.setCharacterEncoding("UTF-8");
        servletResponse.setContentType("text/html; charset=UTF-8");

        filterChain.doFilter(request, servletResponse);
    }

    @Override
    public void destroy() {
    }

}

Hope this helps.
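To see why the request encoding matters, here is a small sketch using only the JDK. It decodes the same percent-encoded form value twice, once as UTF-8 and once as ISO-8859-1, which is what a servlet container may fall back to when no encoding is set. The class name and the sample value are illustrative assumptions, not part of the application above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Decode a percent-encoded value with the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Three Gujarati characters as a browser would submit them from a UTF-8 page.
        String submitted = "%E0%AA%97%E0%AB%81%E0%AA%9C";

        String right = decode(submitted, "UTF-8");
        String wrong = decode(submitted, "ISO-8859-1");

        // Decoded as UTF-8 the value has 3 characters.
        System.out.println(right.length());
        // Decoded as ISO-8859-1 every byte becomes its own character: 9 in total.
        System.out.println(wrong.length());
    }
}
```

If the value is already nine characters long by the time it reaches the controller, the damage happened before Hibernate or MySQL were involved.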

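Answer 3: (score: 3)

Another tip: do not rely only on setting characterEncoding as a Hibernate property (<prop key="hibernate.connection.characterEncoding">UTF-8</prop>). Make sure you also add it explicitly as a connection parameter on the DB URL, like so: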

jdbc:mysql://host:port/dbName?useUnicode=true&characterEncoding=UTF-8&connectionCollation=utf8_general_ci&characterSetResults=utf8

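Also, since there are many layers at which the encoding can get lost, you could try to isolate the layer and update the question, e.g. whether it happens when storing to the DB or at some point before that.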
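One way to isolate the layer is to log the code points of the value at each step (in the controller, just before saving, and after reading back), so the result does not depend on the console's own encoding. This is a rough sketch; EncodingProbe and its methods are hypothetical helpers, and the "intact" check is a simple heuristic that assumes the field should hold only ASCII and Gujarati text.

```java
import java.util.stream.Collectors;

public class EncodingProbe {

    // Describe a string as a list of code points, e.g. "U+0A97 U+0AC1 U+0A9C".
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Heuristic: the value is considered intact if every code point is
    // either ASCII or inside the Gujarati Unicode block.
    static boolean isIntact(String s) {
        return s.codePoints().allMatch(cp ->
                cp < 0x80 || Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.GUJARATI);
    }

    public static void main(String[] args) {
        String good = "\u0A97\u0AC1\u0A9C";   // what the user typed
        String bad = "\u00E0\u00AA\u0097";    // first character after a wrong decode

        System.out.println(codePoints(good) + " intact=" + isIntact(good));
        System.out.println(codePoints(bad) + " intact=" + isIntact(bad));
    }
}
```

The first layer at which the value stops being intact is the one whose configuration needs attention.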

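Answer 4: (score: 2)

Your applicationContext file should look like this:

To make a Spring MVC application support internationalization, register two beans:

  1. SessionLocaleResolver: Register a "SessionLocaleResolver" bean and name it exactly "localeResolver". It resolves the locale by reading a predefined attribute from the user's session. Note: if you do not register any "localeResolver", the default AcceptHeaderLocaleResolver is used, which resolves the locale by checking the accept-language header in the HTTP request.

  2. LocaleChangeInterceptor: Register a "LocaleChangeInterceptor" interceptor and reference it from any handler mapping that needs to support multiple languages. "paramName" is the name of the request parameter used to set the locale.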

    <bean id="localeResolver"
        class="org.springframework.web.servlet.i18n.SessionLocaleResolver">
        <property name="defaultLocale" value="en" />
    </bean>
    
    <bean id="localeChangeInterceptor"
        class="org.springframework.web.servlet.i18n.LocaleChangeInterceptor">
        <property name="paramName" value="language" />
    </bean>
    
    <bean class="org.springframework.web.servlet.mvc.support.ControllerClassNameHandlerMapping" >
        <property name="interceptors">
           <list>
            <ref bean="localeChangeInterceptor" />
           </list>
        </property>
    </bean>
    
    <!-- Register the bean -->
    <bean class="com.common.controller.WelcomeController" />
    
    <!-- Register the welcome.properties -->
    <bean id="messageSource"
        class="org.springframework.context.support.ResourceBundleMessageSource">
        <property name="basename" value="welcome" />
    </bean>
    
    <bean id="viewResolver"
        class="org.springframework.web.servlet.view.InternalResourceViewResolver" >
        <property name="prefix">
            <value>/WEB-INF/pages/</value>
        </property>
        <property name="suffix">
            <value>.jsp</value>
        </property>
    </bean>
    

  3. native2ascii is a handy tool bundled with the JDK that converts files containing non-Latin-1 or non-Unicode characters into Unicode-escaped (\uXXXX) form.

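    Native2ascii example

    1. Create a file (source.txt)
       Create a file named "source.txt", put some Chinese characters in it, and save it in UTF-8 format.

    2. Run native2ascii
       Use the native2ascii command to convert it to Unicode escapes.

        C:\> native2ascii -encoding utf8 c:\source.txt c:\output.txt

       native2ascii reads all characters from "c:\source.txt", interpreting the file as UTF-8, and writes them to "c:\output.txt" with non-ASCII characters as \uXXXX escapes.

    3. Read the output
       Open "c:\output.txt" and you will see the encoded characters, e.g. \ufeff\u6768\u6728\u91d1

    welcome.properties

        welcome.springmvc = \u5feb\u4e50\u5b66\u4e60

    Read the string above and store its value in the database.

    If you want to display it in a JSP page, remember to add the line

        <%@ page contentType="text/html;charset=UTF-8" %>

    at the top of the JSP page; otherwise the page may not display UTF-8 (Chinese) characters correctly.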
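For readers without the native2ascii tool at hand (as far as I know it is no longer shipped with newer JDKs), the same escaping can be sketched in a few lines of Java. This is only an approximation of what the tool does for characters in the Basic Multilingual Plane; the class and method names are made up for illustration. It also shows that java.util.Properties turns the escapes back into the original characters when the file is loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class UnicodeEscapeDemo {

    // Replace every non-ASCII char with a backslash-u escape of four hex digits.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Parse properties-file text and return the value for one key.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The four Chinese characters used in welcome.properties above.
        String chinese = "\u5feb\u4e50\u5b66\u4e60";

        // Build the properties line with the value in escaped form.
        String line = "welcome.springmvc = " + escape(chinese);
        System.out.println(line);

        // Loading the line decodes the escapes again: prints true.
        System.out.println(loadValue(line, "welcome.springmvc").equals(chinese));
    }
}
```

Note that this only concerns how message bundles are stored on disk; it does not change how user-entered text travels from the browser to the database.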