Shredding SQL Server XML with Unicode characters

Asked: 2013-04-12 18:21:57

Tags: xml sql-server-2008

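In my previous question, SQL Server XML String Manipulation,

I got the following answer (thanks to Mikael Eriksson) for shredding an XML document and removing unwanted words. I now need to go one step further and remove Unicode characters with code points above 255. When the XML contains such characters, they are stored as question marks in the @T table variable (in the code below). How can I get these characters through as actual Unicode characters so that I can remove them?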

I have a function that removes unwanted characters just fine, but because the Unicode characters arrive as question marks, it does not touch them.

-- A table to hold the bad words
declare @BadWords table
(
  ID int identity,
  Value nvarchar(10)
)

-- These are the bad ones.
insert into @BadWords values
('one'),
('three'),
('five'),
('hold')

-- XML that needs cleaning
declare @XML xml = '
<root>
  <itemone ID="1one1">1one1</itemone>
  <itemtwo>2two2</itemtwo>
  <items>
    <item>1one1</item>
    <item>2two2</item>
    <item>onetwothreefourfive</item>
  </items>
  <hold>We hold these truths to be self evident</hold>
</root>
'

-- A helper table to hold the values to modify
declare @T table
(
  ID int identity,
  Pos int,
  OldValue nvarchar(max),
  NewValue nvarchar(max),
  Attribute bit
)

-- Get all attributes from the XML
insert into @T(Pos, OldValue, NewValue, Attribute)
select row_number() over(order by T.N),
       T.N.value('.', 'nvarchar(max)'),
       T.N.value('.', 'nvarchar(max)'),
       1
from @XML.nodes('//@*') as T(N)

-- Get all values from the XML
insert into @T(Pos, OldValue, NewValue, Attribute)
select row_number() over(order by T.N),
       T.N.value('text()[1]', 'nvarchar(max)'),
       T.N.value('text()[1]', 'nvarchar(max)'),
       0
from @XML.nodes('//*') as T(N)

declare @ID int
declare @Pos int
declare @Value nvarchar(max)
declare @Attribute bit

-- Remove the bad words from @T, one bad word at a time
select @ID = max(ID) from @BadWords
while @ID > 0
begin
  select @Value = Value
  from @BadWords
  where ID = @ID

  update @T
  set NewValue = replace(NewValue, @Value, '')

  set @ID -= 1
end

-- Write the cleaned values back to the XML
select @ID = max(ID) from @T
while @ID > 0
begin
  select @Value = nullif(NewValue, OldValue),
         @Attribute = Attribute,
         @Pos = Pos
  from @T
  where ID = @ID

  print @Attribute

  if @Value is not null
    if @Attribute = 1  
      set @XML.modify('replace value of ((//@*)[sql:variable("@Pos")])[1] 
                       with sql:variable("@Value")')
    else
      set @XML.modify('replace value of ((//*)[sql:variable("@Pos")]/text())[1] 
                           with sql:variable("@Value")')
  set @ID -= 1
end

select @XML
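
The character-removal function mentioned above is not shown in the question. As a rough sketch only (not the asker's actual function), removing code units above 255 from an nvarchar value could look something like the following; the variable names @Input, @Clean, @Len and @i are hypothetical, and the sample string is made up for illustration.

```sql
-- Hypothetical sketch: strip every UTF-16 code unit above 255 from a string.
-- Assumes the input really is nvarchar, i.e. it has not already been
-- converted to question marks by passing through varchar.
declare @Input nvarchar(max) = N'abc' + nchar(1046) + N'def'  -- nchar(1046) is a Cyrillic letter
declare @Clean nvarchar(max) = N''
declare @Len int = datalength(@Input) / 2  -- datalength counts trailing spaces, len() does not
declare @i int = 1

while @i <= @Len
begin
  -- unicode() returns the integer code of the first character of its argument
  if unicode(substring(@Input, @i, 1)) <= 255
    set @Clean += substring(@Input, @i, 1)

  set @i += 1
end

select @Clean as CleanValue
```

A check like this can only work once the value reaches it intact: a literal question mark has code 63, so it passes the test and is left alone, which matches the behaviour described in the question.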

1 answer:

Answer 0 (score: 2)

This is the part of concern:

insert into @BadWords values
('one'),
('three'),
('five'),
('hold')

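You need the N prefix on Unicode string literals. Without the N, your code treats them as VARCHAR, and you get question marks in place of characters that the VARCHAR code page cannot represent. There are other places where you must use Unicode-friendly strings as well, for example the string literal assigned to @XML. XML is usually UTF-8, so it should be able to handle Unicode characters, although the XML standard discourages the use of certain characters. Your code should look like this: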

insert into @BadWords values
(N'one'),
(N'three'),
(N'five'),
(N'hold')
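
To see the effect of the prefix on the XML itself, here is a minimal sketch that assigns the same document with and without N. The variable names and the sample document are made up for illustration, and the result assumes a database whose default code page does not contain the Cyrillic letter used (for example a Latin1 collation); under that assumption the first column would be expected to show a question mark where the second shows the real character.

```sql
-- Hypothetical comparison: the only difference is the N prefix on the literal.
declare @WithoutN xml = '<root><item>abcЖdef</item></root>'   -- varchar literal
declare @WithN    xml = N'<root><item>abcЖdef</item></root>'  -- nvarchar literal

select @WithoutN.value('(/root/item)[1]', 'nvarchar(max)') as WithoutN,
       @WithN.value('(/root/item)[1]', 'nvarchar(max)')    as WithN
```

The conversion happens when the literal is parsed, before the xml type ever sees it, so reading the value back as nvarchar(max) cannot recover the lost character.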