-5

I've table in database which contains HTML of 500 pages. All these content contains <img width="100%".....

I want to remove width="100%" from all the images without affecting rest of the content.

e.g. Current string

<img width="100%" src="/images/skyline.jpg" alt="Image showing sky line" />

or

<img src="/images/skyline.jpg" width="100%" alt="Image showing sky line" />

W3C validation detects this as an error - "Bad value 100% for attribute width on element img: Expected a digit but saw % instead."

Expected string

<img src="/images/skyline.jpg" alt="Image showing sky line" />
hotfix
  • 3,376
  • 20
  • 36
Manish Dhariwal
  • 61
  • 1
  • 1
  • 9

2 Answers2

0

The solution is for the Oracle DB, as it was not clear which DBMS the solution is being searched for

If width is always after the img tag, you can do that in the Oracle database with the Replace function, for example

replace (<column>,'<img width="100%" ','<img ');

update statement could be like

update <table>
set <column> = replace (<column>,'<img width="100%" ','<img ')

If attribute width does not come directly after the img tag then you have to find the IMG tag first. This can you do with a regular expression. How to find the img tags was discussed here more than once.

check this question

This modified regular expression of the user sln could be useful to you:

 <img\s[^>]*?(width\s*=\s*[''\"]([^''\"]*?)[''\"])([^>]*?)>

First, you have to find the img tags and filter out the information

replace(REGEXP_SUBSTR(txt,'<img\s[^>]*?width\s*=\s*[''\"]([^''\"]*?)[''\"][^>]*?>'),'width="100%" ','') 

then you can replace the img tags with the filtered tags in the whole html, that could look like that:

REGEXP_REPLACE(txt
             , '<img\s[^>]*?(width\s*=\s*[''\"]([^''\"]*?)[''\"])([^>]*?)>'
             , replace(REGEXP_SUBSTR(txt,'<img\s[^>]*?width\s*=\s*[''\"]([^''\"]*?)[''\"][^>]*?>'),'width="100%" ','')          
             )

SQLFiddel Test

This may not be the optimal option, but it may be helpful

hotfix
  • 3,376
  • 20
  • 36
  • As per the question I asked above this doesn't work because there can be anything else inside the img tag before the width attribute. – TomC Aug 30 '18 at 22:09
  • @TomC if you have no more detailed information in the question, then you can not give a more precise answer. I have changed the answer now – hotfix Aug 31 '18 at 06:32
  • No problem. I've got an alternative for sqlserver too but as its currently on hold cant add it. We don't have regex_replace in sql server which is a shame as it makes it relatively straightforward, once you get the regex right! Good to see you understood the question too. Dont know why it was put on hold. – TomC Aug 31 '18 at 06:44
0

This will work for sql server. I've included a few sample img tags to show how it works.

However be aware that it will reformat the html which may not be ideal. It will be the same html but without extra white space and line breaks. This may not be what you want.

The reason I have to do this is that the when I convert to xml and then find the nodes, the results are trimmed of extra white space and they can not be found back in the original string. If your html doesn't have the extra whitespace inside the img tags then you can leave out this step:

select convert(varchar(max),convert(xml,x)) as trimmed from @t

If this doesnt work for you then hopefully the identification of the matching nodes will at least help. Just add a select * from matching after the matching cte to see what you've got.

It also works on a document by document basis, so you may have to put this in a loop to work through your documents. Or update it to work across the lot but I'm guessing this is a one-time operation so probably not required.

declare @t table(x varchar(max))
insert @t values ('<html><div><img width="50%" /><img    width="100%" src="foo" alt="bar" />
<span class="c">sometext</span>
<div><img src="foo" alt="bah" width="100%"  /></div></div>
</html>')

;with [xml] as (
    --- convert the string to xml
    select convert(xml,x) as x from @t
)
,matching as (
    -- Find all the img nodes that have width=100%
    select convert(varchar(max),c.query('.')) as matches
    from [xml]
    cross apply x.nodes('//img[@width="100%"]') as t(c)
) 
,source as (
    -- Tidy up the source, as it has multiple spaces that prevent the matches from finding the nodes
    select convert(varchar(max),convert(xml,x)) as trimmed from @t
)
,replaced as (
    -- Work through recursively removing the 100% from matching nodes
    select trimmed from source
    union all
    select replace(trimmed,matches,replace(matches,'width="100%"','')) as replaced from matching
    cross join replaced
    where charindex(matches,replaced.trimmed)>0
)
-- Get the shortest (or last) string from above
select top 1 * from replaced order by len(trimmed) 

Output:

<html><div><img width="50%"/><img  src="foo" alt="bar"/><span class="c">sometext</span><div><img src="foo" alt="bah" /></div></div></html>
TomC
  • 2,759
  • 1
  • 7
  • 16