4

I have two MySQL tables, and I want to find and replace text strings in one using data in another.

Table texts:

+---------------------+
|      messages       |
+---------------------+
| 'thx guys'          |
| 'i think u r great' |
| 'thx again'         |
| ' u rock'           |
+---------------------+

Table dictionary:

+--------------+---------------+
| bad_spelling | good_spelling |
+--------------+---------------+
|    'thx'     |    'thanks'   |
|    ' u '     |    ' you '    |
|    ' r '     |    ' are '    |
+--------------+---------------+

I want SQL to go through and look at every row in messages and replace every instance of bad_spelling with good_spelling, and to do this for all the pairs of bad_spelling and good_spelling.

The closest I have gotten is this:

update texts, dictionary
set texts.message = replace(texts.message,
                            dictionary.bad_spelling,
                            dictionary.good_spelling)

But this only changes "thx" to "thanks" (in two rows) and does not go on to replace " u " with " you" or " r " with " are ."

Any ideas how to make it use all the rows in dictionary in the replace statement?


PS forgot to mention that this is a small example and in the real thing I will have a lot of find/replace pairs, which may get added to over time.

Pops
  • 30,199
  • 37
  • 136
  • 151
Charlie
  • 43
  • 1
  • 4

4 Answers4

2

I've never used MySql, so this is just a theory based on my other database work. When reading the other answers, trying to use REPLACE(), I thought I could post this and get someone with MySql syntax experience a few ideas to make a set base solution.

here is some SQL Server code to that does most of the work for you:

DECLARE @Source table (Texts varchar(50))
INSERT @Source VALUES ('thx guys')
INSERT @Source VALUES ('i think u r great')
INSERT @Source VALUES ('thx again')
INSERT @Source VALUES ('u rock')

DECLARE @Dictionary table (bad_spelling varchar(50), good_spelling varchar(50))
INSERT @Dictionary VALUES ('thx', 'thanks')
INSERT @Dictionary VALUES ('u', 'you')
INSERT @Dictionary VALUES ('r', 'are')

SELECT
    t.Texts,COALESCE(d.good_spelling,c.ListValue) AS WordToUse
    FROM @Source                                     t
        CROSS APPLY dbo.FN_ListToTable(' ',t.Texts)  c
        LEFT OUTER JOIN @Dictionary                  d ON c.ListValue=d.bad_spelling

OUTPUT:

Texts              WordToUse
------------------ ---------
thx guys           thanks
thx guys           guys
i think u r great  i
i think u r great  think
i think u r great  you
i think u r great  are
i think u r great  great
thx again          thanks
thx again          again
u rock             you
u rock             rock

(11 row(s) affected)

It would be better to use a "real" PK than the actual "Texts" in the query above, but the OP doesn't list many columns in that table, so I use "Texts".

Using SQL Server you need to use a some funky XML syntax to join the rows back together (so I won't show that code, as it doesn't matter), but using MySql's GROUP_CONCAT() you should be able to concatenate the word rows back together into phrase rows.

the code for the (SQL Server) split function and how it works can be found here: SQL Server: Split operation

Community
  • 1
  • 1
KM.
  • 101,727
  • 34
  • 178
  • 212
0

You have to call Replace multiple times on the text:

Update ...
Set texts.message = Replace(
                        Replace(
                            Replace( texts.message, 'thx ', 'thanks ' )
                            , ' u ', ' you ')
                        , ' r ', ' are ')

EDIT Given that you said you had numerous replacements, you would need to do this in a cursor with multiple UPDATE statement calls. Something like (I haven't tested this at all, so beware):

Create Temporary Table ReplaceValues 
    (
    BeforeText varchar(100) not null
    , AfterText varchar(100) not null
    )

Insert ReplaceValues(BeforeText, AfterText) Values('thx ', 'thanks ')
Insert ReplaceValues(BeforeText, AfterText) Values(' u ', ' you ')
Insert ReplaceValues(BeforeText, AfterText) Values(' r ', ' are ')

DECLARE done int DEFAULT(0)
DECLARE BeforeValue varchar(100);
DECLARE AfterValue varchar(100);
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

DECLARE ReplaceList CURSOR FOR Select BeforeText, AfterText From ReplaceValues;

OPEN ReplaceList;   

REPEAT
    If NOT done THEN
        FETCH ReplaceList INTO BeforeValue, AfterValue;

        Update texts
        Set texts.message = REPLACE(texts.message, BeforeValue, AfterValue);
    END IF  
UNTIL done END REPEAT;
CLOSE ReplaceList;

You could wrap all this up into a procedure so that you can call it again later.

Thomas
  • 63,911
  • 12
  • 95
  • 141
  • Thanks Thomas, that is the correct thing to do in the situation I've shown you but I forgot to mention that for the real thing I'm going to have a LOT of pairs to replace (and they may change over time), which I why I want to list them in a separate table. – Charlie Mar 24 '10 at 17:21
  • Unfortunately, the answer is the same (for a single statement): multiple replace calls. The only other way would be to do it in a loop via a cursor where you populate a temp table with your "before" and "after" text and call the update statement each time in the loop. – Thomas Mar 24 '10 at 17:24
0

It does not go all the way because even though the replace had been run x times (where x is the number of rows in dictionary) only one update is retained (the last one).

Transactions don't write down intermediate results and therefore can't see them as input values for the next batch of replacements.

As (AFAIK) MySQL does not support recursive queries you'll have to resort to procedural approach.

Unreason
  • 12,556
  • 2
  • 34
  • 50
0

You need to execute your query many times anyways. Since this is the operation of clean-up type, which you usually do occasionally, i suggest you perform the following query until there was something updated. I do not know how to do it with MySql, but in SQL Server it would be to check the number of rows updated (which is result of this UPDATE query execution), and run the UPDATE again, until no rows are updated.

update  texts, 
        dictionary
set     texts.message = replace(texts.message, dictionary.bad_spelling, dictionary.good_spelling)
where   texts.message <> replace(texts.message, dictionary.bad_spelling, dictionary.good_spelling)
van
  • 74,297
  • 13
  • 168
  • 171
  • Thanks Van, I think this is a winner :) I'll be using PHP to run a whole bunch of SQL queries on the same dataset, so I guess I can use mysql_affected_rows and tell it to repeat the query until 0 rows are affected. – Charlie Mar 25 '10 at 09:45