Extract the words before and after a specific word

Time: 2018-09-12 11:30:55

Tags: tsql sql-server-2012

I need to extract the word before and the word after every word that matches a pattern such as '%don%' in an ntext column.

table A, column name: Text

Example text:

where it was done it will retrieve the... at the end of the trip clare done everything to improve it is the only one done in these times

I want the following result:

was done it
clare done everything
one done in

I am using T-SQL, and the LEFT and RIGHT functions do not work on the ntext data type of the column that holds the text.
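For illustration, a minimal sketch of the usual workaround: casting the ntext value to nvarchar(max) before calling string functions. The table variable @A and its contents are hypothetical stand-ins for table A; only the column name Text comes from the question.

```sql
-- Hypothetical stand-in for table A and its [Text] column.
DECLARE @A TABLE ([Text] ntext);
INSERT INTO @A VALUES (N'where it was done it will retrieve the...');

-- LEFT([Text], 5) is rejected because ntext is not a valid argument type
-- for LEFT or RIGHT. Casting to nvarchar(max) first makes both usable.
SELECT LEFT(CAST([Text] AS nvarchar(max)), 5)  AS first_five,
       RIGHT(CAST([Text] AS nvarchar(max)), 6) AS last_six
FROM @A;
```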

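1 answer:

Answer 0 (score: 2)

As others have said, you can use a string-splitting function to split out each word and then return the ones you need. Using the previously linked DelimitedSplit8K: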

CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
go

declare @t table (t ntext);
insert into @t values('where it was done it will retrieve the...'),('at the end of the trip clare done everything to improve'),('we don''t take donut donations here'),('ending in don');

with t as (select cast(t as nvarchar(max)) as t from @t)
    ,d as (select t.t
                 ,case when patindex('%don%',s.Item) > 0 then 1 else 0 end as d
                 ,s.ItemNumber as i
                 ,lag(s.Item,1,'') over (partition by t.t order by s.ItemNumber) + ' '
                  + s.Item + ' '
                  + lead(s.Item,1,'') over (partition by t.t order by s.ItemNumber) as r
           from t
               cross apply dbo.DelimitedSplit8K(t.t, ' ') as s
          )
select t
      ,r
from d
where d = 1
order by t
        ,i;

Output:

+---------------------------------------------------------+-----------------------+
|                            t                            |           r           |
+---------------------------------------------------------+-----------------------+
| at the end of the trip clare done everything to improve | clare done everything |
| ending in don                                           | in don                |
| we don't take donut donations here                      | we don't take         |
| we don't take donut donations here                      | take donut donations  |
| we don't take donut donations here                      | donut donations here  |
| where it was done it will retrieve the...               | was done it           |
+---------------------------------------------------------+-----------------------+
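The neighbouring words in column r come from the LAG and LEAD window functions, which read the previous and next row in ItemNumber order. A small self-contained sketch of that step alone, using a hypothetical three-word list in place of the splitter output:

```sql
-- Hypothetical word list standing in for the splitter's (ItemNumber, Item) rows.
SELECT w.pos,
       w.word,
       LAG(w.word, 1, '')  OVER (ORDER BY w.pos) AS prev_word,  -- '' when there is no previous row
       LEAD(w.word, 1, '') OVER (ORDER BY w.pos) AS next_word   -- '' when there is no next row
FROM (VALUES (1, 'was'), (2, 'done'), (3, 'it')) AS w(pos, word)
ORDER BY w.pos;
```

The third argument supplies an empty string at either edge, which is why a match on the last word of a text, such as "ending in don", is returned without a following word. In the full query the window is partitioned by the text itself, so two rows holding identical text would probably be treated as one sequence; partitioning by a key column instead may be safer if duplicates are possible.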

There is also a working example:

http://rextester.com/RND43071
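One caveat worth checking before relying on this approach: DelimitedSplit8K declares its input as VARCHAR(8000), so an nvarchar(max) value passed to it is implicitly converted. As far as I know, that cuts the text off at 8,000 characters without an error and may replace characters that do not exist in the database code page. The sketch below counts the rows that would be affected by the length limit; @A and its sample rows are hypothetical stand-ins for table A.

```sql
-- Hypothetical stand-in for table A: one short row and one 10,000-character row.
DECLARE @A TABLE ([Text] ntext);
INSERT INTO @A
VALUES (N'short row'),
       (REPLICATE(CAST(N'done ' AS nvarchar(max)), 2000));

-- ntext stores 2 bytes per character, so DATALENGTH / 2 gives the character count.
SELECT COUNT(*) AS rows_over_8000_chars
FROM @A
WHERE DATALENGTH([Text]) / 2 > 8000;
```

If the count is above zero for the real table, words past the 8,000th character would not be searched, and a splitter that accepts longer input would be needed.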