What is the fastest way to get words from a T-SQL table?

Time: 2016-07-31 13:50:14

Tags: c# sql-server sql-server-2008 datatable

I have a SQL Server 2008 R2 table dbo.Forum_Posts with columns Subject (nvarchar(255)) and Body (nvarchar(max)).

I want to get all words of length >= 3 from the columns Subject and Body and insert them into the table dbo.Search_Word (column Word, nvarchar(100)) and the table dbo.SearchItem (column Title, nvarchar(200)).

I also want to get the newly generated SearchWordsID (primary key, autoincrement, int) from dbo.Search_Word and the SearchItemID (primary key, autoincrement, int) from dbo.SearchItem, and insert them into the table dbo.SearchItemWord (columns SearchWordsID (foreign key, int, not null) and SearchItemID (foreign key, int, not null)).

What is the fastest way to do this in T-SQL? Or do I have to use C#? Thanks in advance for any help.

3 answers:

Answer 0: (score: 1)

As requested, this keeps the ID, so you get a DISTINCT list of words by ID.

It is slightly different from the first answer, but easily done with an Outer Apply.

You will have to edit the initial query: Select KeyID=[YourKeyID],Words=[YourField1]+' '+[YourField2] from [YourTable]

Declare @String    varchar(max) = ''
Declare @Delimeter varchar(25)  = ' '

-- Generate and Strip special characters
Declare @StripChar table (Chr varchar(10));Insert Into @StripChar values ('.'),(','),('/'),('('),(')'),(':')  -- Add/Remove as needed

-- Generate Base Data and Expand via Outer Apply
Declare @XML xml
Set @XML = (
            Select A.KeyID
                  ,B.Word
             From ( Select KeyID=[YourKeyID],Words=[YourField1]+' '+[YourField2] from [YourTable]) A
             Outer Apply (
                          Select Word=split.a.value('.', 'varchar(150)') 
                           From  (Select Cast ('<x>' + Replace(A.Words, @Delimeter, '</x><x>')+ '</x>' AS XML) AS Data) AS A 
                           Cross Apply data.nodes ('/x') AS Split(a)
             ) B
 For XML RAW)

-- Convert XML to varchar(max) for Global Search & Replace (could be promoted to Outer Apply)
Select @String = Replace(Replace(cast(@XML as varchar(max)),Chr,' '),'  ',' ') From @StripChar
Select @XML    = cast(@String as XML)

Select Distinct
       KeyID = t.col.value('@KeyID', 'int')
      ,Word  = t.col.value('@Word', 'varchar(150)')
 From  @XML.nodes('/row') AS t (col)
 Where Len(t.col.value('@Word', 'varchar(150)'))>3  -- keeps words of 4+ characters; use >=3 to match the question's "length >= 3"
 Order By 1

Returns

KeyID   Word
0       UNDEF
0       Undefined
1       HIER
1       System
2       Control
2       UNDEF
3       JOBCONTROL
3       Market
3       Performance
...
87      Analyitics
87      Market
87      UNDEF
88      Branches
88      FDIC
88      UNDEF
...
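
To go from KeyID/Word pairs like these to the three tables in the question, one possible approach is sketched below. It is only a sketch under assumptions: #PostWords is a hypothetical temp table holding the pairs returned above, PostID is an assumed name for the key column of dbo.Forum_Posts (the question does not name it), and dbo.SearchItem.Title is assumed to take the post's Subject. MERGE with an always-false join condition is used because, unlike a plain INSERT, its OUTPUT clause can return source columns next to the new identity value.

```sql
-- Hypothetical staging table: fill it with the KeyID/Word pairs from the query above.
-- Word is sized to match dbo.Search_Word.Word (nvarchar(100)).
CREATE TABLE #PostWords (PostID int NOT NULL, Word nvarchar(100) NOT NULL);

-- 1) Add only the words that dbo.Search_Word does not contain yet.
INSERT INTO dbo.Search_Word (Word)
SELECT DISTINCT pw.Word
FROM   #PostWords AS pw
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Search_Word AS sw WHERE sw.Word = pw.Word);

-- 2) Create one dbo.SearchItem row per post and remember which post produced which ID.
--    ON 1 = 0 never matches, so every source row is inserted.
DECLARE @ItemMap TABLE (PostID int PRIMARY KEY, SearchItemID int NOT NULL);

MERGE dbo.SearchItem AS tgt
USING (SELECT PostID, Subject FROM dbo.Forum_Posts) AS src   -- PostID is an assumed column name
   ON 1 = 0
WHEN NOT MATCHED THEN
     INSERT (Title) VALUES (LEFT(src.Subject, 200))          -- Title is nvarchar(200), Subject is nvarchar(255)
OUTPUT src.PostID, inserted.SearchItemID INTO @ItemMap (PostID, SearchItemID);

-- 3) Link each post's SearchItemID to the SearchWordsID of every word found in that post.
INSERT INTO dbo.SearchItemWord (SearchWordsID, SearchItemID)
SELECT DISTINCT sw.SearchWordsID, im.SearchItemID
FROM   #PostWords      AS pw
JOIN   dbo.Search_Word AS sw ON sw.Word   = pw.Word
JOIN   @ItemMap        AS im ON im.PostID = pw.PostID;
```

Looking the SearchWordsID up by joining on Word, rather than capturing it at insert time, also covers words that already existed in dbo.Search_Word before this run.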

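Answer 1: (score: 0)

You will need T-SQL to insert into the tables. Your biggest challenge is splitting the posts into words.

My suggestion is to read the posts into C#, split each post into words (you can use the Split method to split on spaces or punctuation), filter the resulting word collection, and then do the inserts from C#.

If you use Entity Framework or a similar ORM, you can avoid writing T-SQL directly.

Don't try to split the posts into words in T-SQL unless you really want a pure SQL solution and are willing to spend the time to get it right. And yes, it will be slow: T-SQL is not fast at string manipulation.

You could also look into full-text indexing, which I believe supports keyword search.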
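
As a rough illustration of what full-text search does with a piece of text, the sketch below runs a sample string through SQL Server's word breaker using the sys.dm_fts_parser function. It assumes that the Full-Text Search feature is installed and that the caller has the elevated permissions the function requires (sysadmin, per the documentation). The sample text is made up; 1033 is the English locale ID, the first 0 selects the system stoplist, and the last 0 turns accent sensitivity off.

```sql
-- Made-up sample text; in practice this would come from Subject + ' ' + Body.
DECLARE @Raw  nvarchar(3900) = N'Market performance and job control';

-- The parser expects a full-text query string, so the text is wrapped in double
-- quotes as a single phrase; embedded double quotes are replaced to keep it valid.
DECLARE @Text nvarchar(4000) = N'"' + REPLACE(@Raw, N'"', N' ') + N'"';

SELECT display_term
FROM   sys.dm_fts_parser(@Text, 1033, 0, 0)
WHERE  special_term = N'Exact Match'      -- skips stoplist words and sentence markers
  AND  LEN(display_term) >= 3;            -- the question's minimum word length
```

This only shows the word breaking for one string. Whether it is practical to run it over every row of dbo.Forum_Posts is a separate question that would need testing.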

Answer 2: (score: 0)

Maybe this will help

Declare @String varchar(max) = ''
Declare @Delimeter varchar(25)  = ' '

Select @String = @String + ' '+Words
  From (
         Select Words=[YourField1]+' '+[YourField2] from [YourTable]
       ) A

-- Generate and Strip special characters
Declare @StripChar table (Chr varchar(10));Insert Into @StripChar values ('.'),(','),('/'),('('),(')'),(':')  -- Add/Remove as needed
Select @String = Replace(Replace(@String,Chr,' '),'  ',' ') From @StripChar

-- Convert String into XML and Split Delimited String
Declare @Table Table (RowNr int Identity(1,1), String varchar(100))
Declare @XML xml = Cast('<x>' + Replace(@String,@Delimeter,'</x><x>')+'</x>' as XML)
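Insert Into @Table Select String.value('.', 'varchar(100)') From @XML.nodes('x') as T(String)  -- varchar(100) matches the column, so an over-long token is cut instead of raising a truncation error

-- Generate Final Results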
Select Distinct String
 From  @Table
 Where Len(String)>3  -- keeps words of 4+ characters; use >=3 to match the question's "length >= 3"
 Order By 1

Returns (sample)

    String
    ------------------
    Access
    Active
    Adminstrators
    Alternate
    Analyitics
    Applications
    Branches
    Cappelletti
    City
    Class
    Code
    Comments
    Contact
    Control
    Daily
    Data
    Date
    Definition
    Deleted
    Down
    Email
    FDIC
    Variables
    Weekly