
I am trying to detect all the paragraphs (the <paragraph> elements) in an XML file.

To do so, I used this code:

    Pattern p = Pattern.compile("<paragraph>\\s*?(.*?)\\s*?(.*?)\\s*?(.*?)</paragraph>");
    Matcher m = p.matcher(ne);
    int occur = 1;

    while (m.find()) {
        System.out.print("Word = " + ne.substring(m.start(), m.end()) + "\n");
    }

The problem is that it only detects the first paragraph. How can I get all of them?

Dreamer

2 Answers


Here's a one-liner using commons-lang:

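    // import org.apache.commons.lang3.StringUtils;  (Commons Lang 3.x)
    // Each element is the text between the tags; the tags themselves are not included.
    String[] paragraphs = StringUtils.substringsBetween(ne, "<paragraph>", "</paragraph>");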
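For comparison, here is a rough standard-library sketch of the same idea, assuming you would rather not add the Commons Lang dependency. The class SubstringsBetweenSketch and its helper substringsBetween are hypothetical names, not the library's code; unlike the library method, which has its own handling of null and empty input, this sketch simply returns an empty list when nothing matches. Because it scans with String.indexOf instead of a regex, paragraphs that span several lines need no special handling.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringsBetweenSketch {

    // Hypothetical helper: collects every substring found between `open` and `close`,
    // scanning left to right with String.indexOf (no regex, so line breaks are ordinary characters).
    static List<String> substringsBetween(String str, String open, String close) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = str.indexOf(open, pos);
            if (start < 0) {
                break; // no more opening tags
            }
            start += open.length();
            int end = str.indexOf(close, start);
            if (end < 0) {
                break; // opening tag without a matching closing tag
            }
            result.add(str.substring(start, end));
            pos = end + close.length(); // continue searching after this closing tag
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input: the first paragraph spans two lines.
        String ne = "<doc>\n<paragraph>first\nline</paragraph>\n<paragraph>second</paragraph>\n</doc>";
        for (String p : substringsBetween(ne, "<paragraph>", "</paragraph>")) {
            System.out.println("Paragraph = " + p);
        }
    }
}
```

Note that this, like any tag-matching shortcut, assumes <paragraph> elements are not nested inside each other.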
Matt Flowers

Dreamer, since you said this is for a "simple Java project", here is a version that uses only the standard library:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical class name, added only so the snippet runs on its own
    public class ParagraphDemo {
        public static void main(String[] args) {
            StringBuilder text = new StringBuilder();
            text.append("<html><something>");
            text.append("<paragraph><Sentence>text 1 qwe</Sentence></paragraph>");
            text.append("<paragraph><Sentence>text 2 qwe</Sentence></paragraph>");
            text.append("<zzz>this text wont go</zzz>");
            text.append("<paragraph><Sentence>text 3 qwe</Sentence></paragraph>");
            text.append("</something></html>");
            System.out.println(text.toString());

            // The lazy (.*?) stops at the nearest </paragraph>, so each paragraph is a separate match.
            // Without Pattern.DOTALL, '.' does not match line breaks.
            Pattern p = Pattern.compile("<paragraph>(.*?)</paragraph>");
            Matcher m = p.matcher(text.toString());

            while (m.find()) {
                System.out.println("Word = " + m.group());
            }
        }
    }

Output:

    <html><something><paragraph><Sentence>text 1 qwe</Sentence></paragraph><paragraph><Sentence>text 2 qwe</Sentence></paragraph><zzz>this text wont go</zzz><paragraph><Sentence>text 3 qwe</Sentence></paragraph></something></html>
    Word = <paragraph><Sentence>text 1 qwe</Sentence></paragraph>
    Word = <paragraph><Sentence>text 2 qwe</Sentence></paragraph>
    Word = <paragraph><Sentence>text 3 qwe</Sentence></paragraph>
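The pattern in this answer defines a capture group, (.*?), but the loop prints m.group(), which is the whole match including the <paragraph> tags. If only the content between the tags is wanted, Matcher.group(1) returns what the first group captured. A minimal sketch, where ParagraphGroups and innerTexts are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphGroups {

    // Returns the text captured by group 1 of each match, i.e. what lies between the tags.
    static List<String> innerTexts(String xml) {
        Pattern p = Pattern.compile("<paragraph>(.*?)</paragraph>");
        Matcher m = p.matcher(xml);
        List<String> inner = new ArrayList<>();
        while (m.find()) {
            inner.add(m.group(1)); // m.group() would also include the <paragraph> tags
        }
        return inner;
    }

    public static void main(String[] args) {
        String text = "<paragraph><Sentence>text 1 qwe</Sentence></paragraph>"
                + "<zzz>this text wont go</zzz>"
                + "<paragraph><Sentence>text 2 qwe</Sentence></paragraph>";
        for (String s : innerTexts(text)) {
            System.out.println("Word = " + s);
        }
    }
}
```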
  • Actually, my code worked fine once I changed the regular expression to "([\\s\\S]*?)". Thank you for your help! – Dreamer, May 14 '15 at 20:16
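That fix suggests the paragraphs in the real file span several lines. In java.util.regex, '.' does not match line terminators unless the DOTALL flag is set, so .*? cannot cross a line break, whereas the character class [\s\S] matches any character. The sketch below compares three patterns on a made-up two-paragraph input; the class MultilineParagraphs and its matches helper are hypothetical names. Compiling with Pattern.DOTALL (or adding the inline flag (?s)) is an equivalent alternative to [\s\S].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineParagraphs {

    // Collects group 1 of every match of `p` in `input`.
    static List<String> matches(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample: the second paragraph spans two lines.
        String xml = "<paragraph>one line</paragraph>\n"
                + "<paragraph>first line\nsecond line</paragraph>\n";

        // '.' stops at line terminators by default, so the two-line paragraph is missed.
        Pattern plain = Pattern.compile("<paragraph>(.*?)</paragraph>");
        // [\s\S] matches any character, including '\n'.
        Pattern anyChar = Pattern.compile("<paragraph>([\\s\\S]*?)</paragraph>");
        // DOTALL lets '.' match line terminators as well.
        Pattern dotAll = Pattern.compile("<paragraph>(.*?)</paragraph>", Pattern.DOTALL);

        System.out.println(matches(plain, xml).size());   // 1
        System.out.println(matches(anyChar, xml).size()); // 2
        System.out.println(matches(dotAll, xml).size());  // 2
    }
}
```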