I cannot find a solution.

How can I check a string containing HTML code for syntax errors?

Example:

<p><o:p></o:p></p> 
<p> <br /> </p> 
<p><b style=\"font-weight: bold;\"><b>Desc: </b>AnyText.</p> 
 <br /> </p> 
<p><b>Color:</b> green<
<p> <b>Param 2: AU55688</p> 
<p><b>Param 3: </b>420 x 562</p> 
<p><b>Height: </b>1425</p>

If there are unclosed or unopened tags, return the string; if all is well, skip it.

I found and modified a function, but it does not work properly:

function closetag($html)
{
    $ignore_tags = array('img', 'br', 'hr');

    preg_match_all ( "#<([a-z]+)( .*)?(?!/)>#iU", mb_strtolower($html), $result1);
    preg_match_all ( "#</([a-z]+)>#iU", mb_strtolower($html), $result2);
    $results_start = $result1[1];
    $results_end = $result2[1];

    $result = array();
    foreach($results_start AS $startag)
    {
        if (!in_array($startag, $results_end) && !in_array($startag, $ignore_tags))
        {
            $result['start_tags'][] = $startag;
        }
    }
    foreach($results_end AS $endtag)
    {
        if (!in_array($endtag, $results_start) && !in_array($endtag, $ignore_tags))
        {
            $result['end_tags'][] = $endtag;
        }
    }

    return ($result) ? $result : false;
}

I do not need to correct the HTML; I only need to determine whether its syntax is incorrect.

An example of the result I want:

$getTexts = $this->getTexts();

$no_valid = array();
foreach($getTexts AS $text)
{
    $_valid = check_html_systax_function($text);
    if (!$_valid)
    {
        $no_valid[] = $text;
    }
}

check_html_systax_function checks a text for correct HTML syntax.

$no_valid is the array of texts that contain HTML syntax errors.

P.S. Sorry for my English!

Andrey
  • If you can explain more about what you are going to do with the text you extract from the HTML, there may be an easier alternative, such as DOMDocument (a proper HTML parser), PHP Tidy (to repair unclosed tags), or HTML Purifier. – PHCJS May 20 '15 at 09:42
  • @PHJCJO I select records from the database that have a description to check. I just need to determine which texts contain HTML syntax errors. I updated the question and added sample code. – Andrey May 20 '15 at 11:22
  • NEVER use regex to parse HTML! – René Roth May 20 '15 at 11:24

4 Answers

Do not use Regex to parse or validate HTML.

For PHP, there is the class DOMDocument. You can use this as follows:

$dom = new DOMDocument;
$dom->loadHTML($html);
if ($dom->validate()) {
    //valid HTML code
}
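
The warnings that loadHTML() emits on malformed markup can be collected instead of displayed by switching on libxml's internal error buffer. The following is a minimal sketch, not a strict validator: the helper name html_parse_errors is hypothetical, and which problems get reported depends on the installed libxml2 version, so the result is best treated as a hint.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of any library):
// returns the parser messages libxml produced while loading $html.
function html_parse_errors($html)
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input, so skip it
    }

    // Buffer libxml errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with public line and message properties.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

// Usage: a non-empty array means the parser complained about something.
print_r(html_parse_errors('<p><o:p></o:p></p>'));
```

In a loop like the one in the question, a text could be added to $no_valid whenever the returned array is not empty.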

If you're looking for a library that offers more configurability and detailed error reporting, check out HTML Purifier.

René Roth
  • When checking, I get an error: Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag o:p invalid in Entity, line: 1.... The error is caused by the "o:p" tag. – Andrey May 20 '15 at 11:44
  • @PHJCJO Yes, notices and warnings)) I am using PHP 5.3. My code: [link](http://pastebin.com/7LrJHGJZ). Checking **$html** gives an error: Warning: DOMDocument::validate() [domdocument.validate]: No declaration for element html..... Checking **$html2** gives an error: Warning: DOMDocument::validate(http://www.w3.org/TR/REC-html40/loose.dtd) [domdocument.validate]: failed to open stream: HTTP request failed! HTTP/1.0 500 Server Error. What am I doing wrong? – Andrey May 21 '15 at 07:12
  • See https://github.com/Masterminds/html5-php/issues/21 and http://stackoverflow.com/questions/4062792/domdocumentvalidate-problem – PHCJS May 21 '15 at 10:45

You can check the following links for PHP HTML DOM parsers:

MNR

You can check whether the HTML is valid with the following code:

function closetags($html) {
    preg_match_all('#<(?!meta|img|br|hr|input\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];
    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    if (count($closedtags) == $len_opened) {
        echo 'valid html'; 
    } else {
        echo 'invalid html';
    }
} 

$html = '<p>This is some text and here is a <strong>bold text then the post stop here....</p>';
closetags($html);
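
Comparing only the number of opening and closing tags cannot tell markup such as `<b><i></b></i>` from properly nested markup, because both have equal counts. A stack-based variant can catch that case. The sketch below is a different technique from the function above, and the name tags_balanced is hypothetical. It assumes that attribute values contain no ">" character and that comments or scripts do not contain tag-like text, and it treats tags whose end tag HTML allows to be omitted (such as p or li) as errors when they are left unclosed.

```php
<?php
// Hypothetical helper: true when every non-void tag is closed in the
// reverse order in which it was opened.
function tags_balanced($html)
{
    // Elements that never take a closing tag.
    $void = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                  'input', 'link', 'meta', 'source', 'track', 'wbr');
    $stack = array();

    // Group 1: "/" for a closing tag, group 2: tag name,
    // group 3: "/" when the tag is self-closed, as in <br />.
    preg_match_all('#<(/?)([a-z][a-z0-9:]*)\b[^>]*?(/?)>#i', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if (in_array($name, $void, true) || $m[3] === '/') {
            continue; // void or self-closed: nothing to balance
        }
        if ($m[1] === '') {
            $stack[] = $name; // opening tag: remember it
        } elseif (array_pop($stack) !== $name) {
            return false; // closing tag does not match the last opened tag
        }
    }

    return count($stack) === 0; // anything left over was never closed
}

var_dump(tags_balanced('<p>This is some text and here is a <strong>bold text</p>')); // bool(false)
var_dump(tags_balanced('<p><b>Height: </b>1425</p>'));                              // bool(true)
var_dump(tags_balanced('<b><i></b></i>'));                                          // bool(false)
```

This still relies on a regular expression to find tags, so it shares the usual limits of regex-based HTML handling and is only a rough syntax check.
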
Charvi

I've created a method based on the regex by Charvi.

It is available in the Text-utilities repository: https://github.com/Alex-K-O-R/Text-utilities

Alex K