3

I'm in the process of creating a temporary procedure in SQL because I have a value of a table which is written in markdown, so it appear as rendered HTML in the web browser (markdown to HTML conversion).

String of the column currently look like this:

Questions about **general computing hardware and software** are off-topic for Stack Overflow unless they directly involve tools used primarily for programming. You may be able to get help on [Super User](http://superuser.com/about)

I'm currently working with bold and italic text. This mean (in the case of bold text) I will need to replace odd N times the pattern**with<b>and even times with</b>.
I saw replace() but it perform the replacement on all the patterns of the string.

So How I can replace a sub-string only if it is odd or only it is even?

Update: Some peoples wonder what schemas I'm using so just take a look here.

One more extra if you want: The markdown style hyperlink to html hyperlink doesn't look so simple.

Community
  • 1
  • 1
user2284570
  • 2,891
  • 3
  • 26
  • 74
  • 2
    What the hell is an "impair substring?" – Robert Harvey Jul 30 '14 at 23:42
  • @RobertHarvey For example, If in the String, I have `** ** ** ** ** **`, then this sub-string/pattern is present six times in the current string. Now, imagine I assign a number to each pattern based on their position, I will access each of them with`1; 2; 3; 4; 5`, So impair sub-string correspond to the positions`1; 3; 5;` and pairs sub-string to the positions `2; 4; 6`. – user2284570 Jul 30 '14 at 23:49
  • 1
    Odd and even positions, then? – Robert Harvey Jul 30 '14 at 23:51
  • @RobertHarvey : Sorry, I thought it was the same word as in French. So yes, you are right. – user2284570 Jul 30 '14 at 23:53
  • I think if you can identify what instance of the string it is (first, second, etc), you can use the [MODULO](http://msdn.microsoft.com/en-us/library/ms190279.aspx) function: ` % 2`. If it returns 0, it's an even number, non-zero it's odd. – Andrew Jul 31 '14 at 00:40
  • Here is a [similar question](http://stackoverflow.com/questions/8845961/sql-server-2008-replacing-only-the-nth-instance-of-character) on how to use STUFF() – Be Kind To New Users Jul 31 '14 at 00:42
  • Why are you doing markdown formatting in SQL? The markdown specification and current implementations are just one big bag of special cases, you're going to have a **really** tough time doing this. Why not just offload this formatting to the GUI layer, where presumably existing markdown libraries can be used? In other words, *why are you doing text->html conversion in SQL*? – Lasse V. Karlsen Jul 31 '14 at 07:32
  • @LasseV.Karlsen : just look at the TAGs. Then, look at [`CloseAsOffTopicReasonTypes.MarkdownMini`](http://meta.stackexchange.com/a/2678/242800). So I can't modify any part of the GUI. – user2284570 Jul 31 '14 at 07:36

4 Answers4

2

Using theSTUFFfunction and a simpleWHILEloop:

CREATE FUNCTION dbo.fn_OddEvenReplace(@text nvarchar(500), 
                                      @textToReplace nvarchar(10), 
                                      @oddText nvarchar(10), 
                                      @evenText nvarchar(500))
RETURNS varchar(max)
AS
BEGIN
    DECLARE @counter tinyint
    SET @counter = 1

    DECLARE @switchText nvarchar(10)
    WHILE CHARINDEX(@textToReplace, @text, 1) > 0
    BEGIN
        SELECT @text = STUFF(@text, 
                    CHARINDEX(@textToReplace, @text, 1), 
                    LEN(@textToReplace), 
                    IIF(@counter%2=0,@evenText,@oddText)),
                @counter = @counter + 1
    END
    RETURN @text
END

And you can use it like this:

SELECT dbo.fn_OddEvenReplace(column, '**', '<b>', '</b>')
FROM table

UPDATE:

This is re-written as an SP:

CREATE PROC dbo.##sp_OddEvenReplace @text nvarchar(500), 
                                  @textToReplace nvarchar(10), 
                                  @oddText nvarchar(10), 
                                  @evenText nvarchar(10),
                                  @returnText nvarchar(500) output
AS
BEGIN
    DECLARE @counter tinyint
    SET @counter = 1

    DECLARE @switchText nvarchar(10)
    WHILE CHARINDEX(@textToReplace, @text, 1) > 0
    BEGIN
        SELECT @text = STUFF(@text, 
                    CHARINDEX(@textToReplace, @text, 1), 
                    LEN(@textToReplace), 
                    IIF(@counter%2=0,@evenText,@oddText)),
                @counter = @counter + 1
    END
    SET @returnText = @text
END
GO

And to execute:

DECLARE @returnText nvarchar(500)
EXEC dbo.##sp_OddEvenReplace '**a** **b** **c**', '**', '<b>', '</b>', @returnText output

SELECT @returnText
user2284570
  • 2,891
  • 3
  • 26
  • 74
Alireza
  • 4,976
  • 1
  • 23
  • 36
1

As per OP's request I have modified my earlier answer to perform as a temporary stored procedure. I have left my earlier answer as I believe the usage against a table of strings to be useful also.

If a Tally (or Numbers) table is known to already exist with at least 8000 values, then the marked section of the CTE can be omitted and the CTE reference tally replaced with the name of the existing Tally table.

create procedure #HtmlTagExpander(
     @InString   varchar(8000) 
    ,@OutString  varchar(8000)  output
)as 
begin
    declare @Delimiter  char(2) = '**';

    create table #t( 
         StartLocation  int             not null
        ,EndLocation    int             not null

        ,constraint PK unique clustered (StartLocation desc)
    );

    with 
          -- vvv Only needed in absence of Tally table vvv
    E1(N) as ( 
        select 1 from (values
            (1),(1),(1),(1),(1),
            (1),(1),(1),(1),(1)
        ) E1(N)
    ),                                              --10E+1 or 10 rows
    E2(N) as (select 1 from E1 a cross join E1 b),  --10E+2 or 100 rows
    E4(N) As (select 1 from E2 a cross join E2 b),  --10E+4 or 10,000 rows max
    tally(N) as (select row_number() over (order by (select null)) from E4),
          -- ^^^ Only needed in absence of Tally table ^^^

    Delimiter as (
        select len(@Delimiter)     as Length,
               len(@Delimiter)-1   as Offset
    ),
    cteTally(N) AS (
        select top (isnull(datalength(@InString),0)) 
            row_number() over (order by (select null)) 
        from tally
    ),
    cteStart(N1) AS 
        select 
            t.N 
        from cteTally t cross join Delimiter 
        where substring(@InString, t.N, Delimiter.Length) = @Delimiter
    ),
    cteValues as (
        select
             TagNumber = row_number() over(order by N1)
            ,Location   = N1
        from cteStart
    ),
    HtmlTagSpotter as (
        select
             TagNumber
            ,Location
        from cteValues
    ),
    tags as (
        select 
             Location       = f.Location
            ,IsOpen         = cast((TagNumber % 2) as bit)
            ,Occurrence     = TagNumber
        from HtmlTagSpotter f
    )
    insert #t(StartLocation,EndLocation)
    select 
         prev.Location
        ,data.Location
    from tags data
    join tags prev
       on prev.Occurrence = data.Occurrence - 1
      and prev.IsOpen     = 1;

    set @outString = @Instring;

    update this
    set @outString = stuff(stuff(@outString,this.EndLocation,  2,'</b>')
                                           ,this.StartLocation,2,'<b>')
    from #t this with (tablockx)
    option (maxdop 1);
end
go

Invoked like this:

declare @InString   varchar(8000) 
       ,@OutString  varchar(8000);

set @inString = 'Questions about **general computing hardware and software** are off-topic **for Stack Overflow.';
exec #HtmlTagExpander @InString,@OutString out; select @OutString;

set @inString = 'Questions **about** general computing hardware and software **are off-topic** for Stack Overflow.';
exec #HtmlTagExpander @InString,@OutString out; select @OutString;
go

drop procedure #HtmlTagExpander;
go

It yields as output:

Questions about <b>general computing hardware and software</b> are off-topic **for Stack Overflow.

Questions <b>about</b> general computing hardware and software <b>are off-topic</b> for Stack Overflow.
Pieter Geerkens
  • 11,775
  • 2
  • 32
  • 52
0

This solution makes use of techniques described by Jeff Moden in this article on the Running Sum problem in SQL. This solution is lengthy, but by making use of the Quirky Update in SQL Server over a clustered index, holds the promise of being much more efficient over large data sets than cursor-based solutions.

Update - amended below to operate off a table of strings

Assuming the existence of a tally table created like this (with at least 8000 rows):

create table dbo.tally (
     N int not null
    ,unique clustered (N desc)
);
go

with 
E1(N) as ( 
    select 1 from (values
        (1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1)
    ) E1(N)
),                                              --10E+1 or 10 rows
E2(N) as (select 1 from E1 a cross join E1 b),  --10E+2 or 100 rows
E4(N) As (select 1 from E2 a cross join E2 b)   --10E+4 or 10,000 rows max
insert dbo.tally(N)
select row_number() over (order by (select null)) from E4;
go

and a HtmlTagSpotter function defined like this:

create function dbo.HtmlTagSPotter(
     @pString       varchar(8000)
    ,@pDelimiter    char(2))
returns table with schemabinding as
return
   WITH 
        Delimiter as (
        select len(@pDelimiter)     as Length,
               len(@pDelimiter)-1   as Offset
    ),
    cteTally(N) AS (
        select top (isnull(datalength(@pstring),0)) 
            row_number() over (order by (select null)) 
        from dbo.tally
    ),
    cteStart(N1) AS (--==== Returns starting position of each "delimiter" )
        select 
            t.N 
        from cteTally t cross join Delimiter 
        where substring(@pString, t.N, Delimiter.Length) = @pDelimiter
    ),
    cteValues as (
        select
             ItemNumber = row_number() over(order by N1)
            ,Location   = N1
        from cteStart
    )
    select
         ItemNumber
        ,Location
    from cteValues
go

then running the following SQL will perform the required substitution. Note that the inner join at the end prevents any trailing "odd" tags from being converted:

create table #t( 
     ItemNo         int             not null
    ,Item           varchar(8000)       null
    ,StartLocation  int             not null
    ,EndLocation    int             not null

    ,constraint PK unique clustered (ItemNo,StartLocation desc)
);

with data(i,s) as ( select i,s from (values
        (1,'Questions about **general computing hardware and software** are off-topic **for Stack Overflow.')
       ,(2,'Questions **about **general computing hardware and software** are off-topic **for Stack Overflow.')
          --....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....0
    )data(i,s)
),
tags as (
    select 
         ItemNo         = data.i
        ,Item           = data.s
        ,Location       = f.Location
        ,IsOpen         = cast((TagNumber % 2) as bit)
        ,Occurrence     = TagNumber
    from data
    cross apply dbo.HtmlTagSPotter(data.s,'**') f
)
insert #t(ItemNo,Item,StartLocation,EndLocation)
select 
     data.ItemNo
    ,data.Item
    ,prev.Location
    ,data.Location
from tags data
join tags prev
   on prev.ItemNo       = data.ItemNo
  and prev.Occurrence = data.Occurrence - 1
  and prev.IsOpen     = 1

union all

select 
    i,s,8001,8002
from data
;

declare @ItemNo     int
       ,@ThisStting varchar(8000);

declare @s varchar(8000);
update this
    set @s = this.Item = case when this.StartLocation > 8000
                              then this.Item
                              else stuff(stuff(@s,this.EndLocation,  2,'</b>')
                                                 ,this.StartLocation,2,'<b>')
                         end
from #t this with (tablockx)
option (maxdop 1);

select
    Item
from (
    select 
         Item
        ,ROW_NUMBER() over (partition by ItemNo order by StartLocation) as rn
    from #t
) t
where rn = 1
go

yielding:

Item
------------------------------------------------------------------------------------------------------------
Questions about <b>general computing hardware and software</b> are off-topic **for Stack Overflow.
Questions <b>about </b>general computing hardware and software<b> are off-topic </b>for Stack Overflow.
Pieter Geerkens
  • 11,775
  • 2
  • 32
  • 52
  • @user2284570: Pardon? I don't understand. – Pieter Geerkens Jul 31 '14 at 22:03
  • Just look at my question. That's simple I've asked how to do this by creating a function, but I can't creates functions... only temporary procedures are allowed. So May you update your answer please? – user2284570 Jul 31 '14 at 22:06
  • @user2284570 What parameters are you passing in to the procedure? Just the string to be transformed? DO you want the result in an output string or as a result set? Perhaps you want to pass in a table name and column name? please be more specific. – Pieter Geerkens Jul 31 '14 at 22:19
0

One option is to use a Regular Expression as it makes replacing such patterns very simple. RegEx functions are not built into SQL Server so you need to use SQL CLR, either compiled by you or from an existing library.

For this example I will use the SQL# (SQLsharp) library (which I am the author of) but the RegEx functions are available in the Free version.

SELECT SQL#.RegEx_Replace
(
   N'Questions about **general computing hardware and software** are off-topic\
for Stack Overflow unless **they** directly involve tools used primarily for\
**programming. You may be able to get help on [Super User]\
(https://superuser.com/about)', -- @ExpressionToValidate
   N'\*\*([^\*]*)\*\*', -- @RegularExpression
   N'<b>$1</b>', -- @Replacement
   -1, -- @Count (-1 = all)
   1, - @StartAt
   'IgnoreCase' -- @RegEx options
);

The above pattern \*\*([^\*]*)\*\* just looks for anything surrounded by double-asterisks. In this case you don't need to worry about odd / even. It also means that you won't get a poorly-formed <b>-only tag if for some reason there is an extra ** in the string. I added two additional test cases to the original string: a complete set of ** around the word they and an unmatched set of ** just before the word programming. The output is:

Questions about <b>general computing hardware and software</b> are off-topicfor Stack Overflow unless <b>they</b> directly involve tools used primarily for **programming. You may be able to get help on [Super User](https://superuser.com/about)

which renders as:

Questions about general computing hardware and software are off-topicfor Stack Overflow unless they directly involve tools used primarily for **programming. You may be able to get help on Super User

Community
  • 1
  • 1
Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
  • Regex would be very nice, but I added one more tag to my question. – user2284570 Jul 31 '14 at 07:25
  • @user2284570: tag? Do you mean the **sql** tag? Is CLR not an option? If so, maybe adding a tag for **T-SQL** would help clarify as well as mentioning that in the text of the question (i.e. that CLR is not an option). Though if not an option, why not? Just curious. – Solomon Rutzky Jul 31 '14 at 07:31
  • I added the [tag:dataexplorer] tag. For more information just look at my [query](http://data.stackexchange.com/stackoverflow/query/193616/highest-flag-close-votes-per-questions-with-more-control-than-in-the-flag-queue?Limit=38369&opt.withExecutionPlan=true). – user2284570 Jul 31 '14 at 07:43
  • @user2284570: ok, I see that now. I will say that it is still a bit unclear from the text of your question what your constraints are. I think it would help if you stated clearly that you are using the GUI version of SEDE _and_ the field you have an issue with is `CloseAsOffTopicReasonTypes.MarkdownMini` which is already formatted in Markdown. I guess I can always suggest to SE that they load SQL# and make the RegEx and String functions available to people writing queries in SEDE :-). – Solomon Rutzky Jul 31 '14 at 07:56
  • @user2284570: also, how is it that you are creating a function? I don't see how SEDE allows for that. – Solomon Rutzky Jul 31 '14 at 08:09