Hello ?????!

random code random code

Lorem ipsum.

Then I have two sentences stored in variables:

$begin = 'Hello ?????!';
$end = 'Lorem ipsum.';

I want to search $html for these two sentences, and strip everything before and after them. So $html will become:

$html = 'Hello ?????! random code random code

Lorem ipsum.';

How can I achieve this? Note that the $begin and $end variables do not have html tags but the sentences in $html very likely do have tags as shown above.

Maybe a regex approach?

What I've tried so far

A strpos() approach. The problem is that the sentences in $html contain tags, so the $begin and $end sentences do not match. I can run strip_tags($html) before strpos(), but then I obviously end up with $html without the tags.
Searching for only part of a variable, such as Hello, but that is never safe and will give many matches.
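
To make the strpos() problem concrete, here is a minimal sketch. The sample markup and the phrase 'Hello World!' are assumptions chosen for illustration, not the original input.

```php
<?php
// Hypothetical sample input: the tags inside the sentences are an assumption.
$html  = '<div><p>Intro</p><p><b>Hello</b> <i>World</i>!</p><p>Lorem <u>ipsum</u>.</p></div>';
$begin = 'Hello World!';

// Searching the raw HTML fails because tags interrupt the sentence.
$rawPos = strpos($html, $begin);

// Searching the stripped text succeeds, but the tags are gone from $plain.
$plain    = strip_tags($html);
$plainPos = strpos($plain, $begin);

var_dump($rawPos);   // bool(false)
var_dump($plainPos); // int(5)
```

The offset found in the stripped text cannot be used directly on the original string, which is the gap the answers below try to close.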

Answer by Druzion for Search HTML for 2 phrases (ignoring all tags) and strip everything else

You could try this RegEx:

(.*?)  # Data before sentences (to be removed)
(      # Capture Both sentences and text in between
   H.*?e.*?l.*?l.*?o.*?\s    # Hello[space]
   (<.*?>)*                  # Optional Opening Tag(s)
   ?.*??.*??.*??.*??.*?   # ?????
   (<\/.*?>)*                # Optional Closing Tag(s)
   (.*?)                     # Optional Data in between sentences
   (<.*?>)*                  # Optional Opening Tag(s)
   L.*?o.*?r.*?e.*?m.*?\s    # Lorem[space]
   (<.*?>)*                  # Optional Opening Tag(s)
   i.*?p.*?s.*?u.*?m.*?      # ipsum
)
(.*)   # Data after sentences (to be removed)

Substitute the whole match with the 2nd capture group.

Live Demo on Regex101

The Regex can be shortened to:

(.*?)(H.*?e.*?l.*?l.*?o.*?\s(<.*?>)*?.*??.*??.*??.*??.*?(<\/.*?>)*(.*?)(<.*?>)*L.*?o.*?r.*?e.*?m.*?\s(<.*?>)*i.*?p.*?s.*?u.*?m.*?)(.*)

Answer by Tim007 for Search HTML for 2 phrases (ignoring all tags) and strip everything else

Just for fun

).*/s";   $str = "        \n        \n        Hello Moto!\n        random code\n        random code\n        
Lorem ipsum.\n        \n        \n        ";   $subst = "$1";     $result = preg_replace($re, $subst, $str);  echo $result."\n";  ?>

Input

$begin = 'Hello Moto!';  $end = 'Lorem ipsum.';                    Hello Moto!      random code      random code      
Lorem ipsum.

Output

Hello Moto! random code random code Lorem ipsum.

Answer by Paul for Search HTML for 2 phrases (ignoring all tags) and strip everything else

This is probably far from the optimal solution, but I love racking my brain over such "riddles", so here's my approach.

      Hello Lydia!
   random code   random code   Lorem ipsum.
      ';

$begin = 'Hello Lydia!';
$end = 'Lorem ipsum.';

$begin_chars = str_split($begin);
$end_chars = str_split($end);

$begin_re = '';
$end_re = '';

foreach ($begin_chars as $c) {
    if ($c == ' ') {
        $begin_re .= '(\s|(<[a-z/]+>))+';
    }
    else {
        $begin_re .= $c . '(<[a-z/]+>)?';
    }
}
foreach ($end_chars as $c) {
    if ($c == ' ') {
        $end_re .= '(\s|(<[a-z/]+>))+';
    }
    else {
        $end_re .= $c . '(<[a-z/]+>)?';
    }
}

$re = '~(.*)((' . $begin_re . ')(.*)(' . $end_re . '))(.*)~ms';

$result = preg_match( $re, $subject , $matches );
$start_tag = preg_match( '~(<[a-z/]+>)$~', $matches[1] , $stmatches );

echo $stmatches[1] . $matches[2];

This outputs:

Hello Lydia!
   random code   random code   Lorem ipsum.

This matches the sample case, but it would need some more logic to escape regex special characters such as periods.

In general, what this snippet does:

Splitting the strings into arrays, with each array value representing a single character. This is needed because Hello must still match when tags are interleaved between its characters.
To do that, an additional (<[a-z/]+>)? is appended after each character in the regex, with a special case for the space character.
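
The per-character idea, with the missing escaping added, might be sketched as follows. The helper name phrasePattern() and the sample markup are assumptions for illustration; str_split() works on bytes, so this sketch assumes ASCII phrases.

```php
<?php
// Hypothetical helper (not from the answer): builds a tag-tolerant pattern
// for one phrase, escaping regex metacharacters with preg_quote().
function phrasePattern(string $phrase): string
{
    $tag = '(?:<[a-z/]+>)';   // same simple tag shape as the snippet above
    $out = '';
    foreach (str_split($phrase) as $c) {
        if ($c === ' ') {
            // A space may be any run of whitespace and/or tags.
            $out .= '(?:\s|' . $tag . ')+';
        } else {
            // Literal character, optionally followed by one tag.
            $out .= preg_quote($c, '~') . $tag . '?';
        }
    }
    return $out;
}

// Assumed sample markup with tags inside the phrase.
$subject = '<p>x</p><p>H<b>el</b>lo <i>you</i>.</p><p>after</p>';
preg_match('~' . phrasePattern('Hello you.') . '~s', $subject, $m);
echo $m[0], "\n"; // H<b>el</b>lo <i>you</i>.</p>
```

Because the period is quoted, a phrase such as 'a.b' no longer matches 'axb'.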

Answer by Wiktor Stribiżew for Search HTML for 2 phrases (ignoring all tags) and strip everything else

Here is a short, yet - I believe - working solution based on a lazy dot matching regex (that can be improved by creating a longer, unrolled regex, but should be enough unless you have really large chunks of text).

$html = "\n\n
H
ello
 ?   ????!
\nrandom code\nrandom code\nLorem ipsum.
\n\n ";
$begin = 'Hello     ?????!';
$end = 'Lorem ipsum.';
$begin = preg_replace_callback('~\s++(?!\z)|(\s++\z)~u', function ($m) { return !empty($m[1]) ? '' : ' '; }, $begin);
$end = preg_replace_callback('~\s++(?!\z)|(\s++\z)~u', function ($m) { return !empty($m[1]) ? '' : ' '; }, $end);
$begin_arr = preg_split('~(?=\X)~u', $begin, -1, PREG_SPLIT_NO_EMPTY);
$end_arr = preg_split('~(?=\X)~u', $end, -1, PREG_SPLIT_NO_EMPTY);
$reg = "(?s)(?:<[^<>]+>)?(?:&#?\\w+;)*\\s*"
    . implode("", array_map(function($x, $k) use ($begin_arr) {
          return ($k < count($begin_arr) - 1
              ? preg_quote($x, "~") . "(?:\s*(?:<[^<>]+>|&#?\\w+;))*"
              : preg_quote($x, "~"));
      }, $begin_arr, array_keys($begin_arr)))
    . "(.*?)"
    . implode("", array_map(function($x, $k) use ($end_arr) {
          return ($k < count($end_arr) - 1
              ? preg_quote($x, "~") . "(?:\s*(?:<[^<>]+>|&#?\\w+;))*"
              : preg_quote($x, "~"));
      }, $end_arr, array_keys($end_arr)));
echo $reg .PHP_EOL;
preg_match('~' . $reg . '~u', $html, $m);
print_r($m[0]);

See the IDEONE demo

Algorithm:

Create a dynamic regex pattern by splitting the delimiter strings into single graphemes (since these can be Unicode characters, I suggest using preg_split with the '~(?<!^)(?=\X)~u' pattern) and imploding back by adding an optional tag matching pattern (?:<[^<>]+>)?.

Then, (?s) enables DOTALL mode, in which . matches any character including a newline, and .*? will match 0+ characters from the leading to the trailing delimiter.

Regex details:

'~(?<!^)(?=\X)~u' matches every location other than at the start of the string before each grapheme
  (sample final regex) (?s)(?:<[^<>]+>)?(?:&#?\w+;)*\s*H(?:\s*(?:<[^<>]+>|&#?\w+;))*e(?:\s*(?:<[^<>]+>|&#?\w+;))*l(?:\s*(?:<[^<>]+>|&#?\w+;))*l(?:\s*(?:<[^<>]+>|&#?\w+;))*o(?:\s*(?:<[^<>]+>|&#?\w+;))* (?:\s*(?:<[^<>]+>|&#?\w+;))*?(?:\s*(?:<[^<>]+>|&#?\w+;))*?(?:\s*(?:<[^<>]+>|&#?\w+;))*?(?:\s*(?:<[^<>]+>|&#?\w+;))*?(?:\s*(?:<[^<>]+>|&#?\w+;))*?(?:\s*(?:<[^<>]+>|&#?\w+;))*\!(?:\s*(?:<[^<>]+>|&#?\w+;))* + (.*?) + L(?:\s*(?:<[^<>]+>|&#?\w+;))*o(?:\s*(?:<[^<>]+>|&#?\w+;))*r(?:\s*(?:<[^<>]+>|&#?\w+;))*e(?:\s*(?:<[^<>]+>|&#?\w+;))*m(?:\s*(?:<[^<>]+>|&#?\w+;))* (?:\s*(?:<[^<>]+>|&#?\w+;))*i(?:\s*(?:<[^<>]+>|&#?\w+;))*p(?:\s*(?:<[^<>]+>|&#?\w+;))*s(?:\s*(?:<[^<>]+>|&#?\w+;))*u(?:\s*(?:<[^<>]+>|&#?\w+;))*m(?:\s*(?:<[^<>]+>|&#?\w+;))*\. - the leading and trailing delimiters with optional subpatterns for tag matching and a (.*?) (capturing might not be necessary) inside.
The ~u modifier is necessary since Unicode strings are to be processed.
UPDATE: To account for 1+ spaces, any whitespace in the begin and end patterns can be replaced with the \s+ subpattern to match any kind of 1+ whitespace characters in the input string.
UPDATE 2: The auxiliary $begin = preg_replace('~\s+~u', ' ', $begin); and $end = preg_replace('~\s+~u', ' ', $end); are necessary to account for 1+ whitespace in the input string.
To account for HTML entities, add another subpattern to the optional parts: &#?\\w+;. It will also match entities like &nbsp; and &#123;. It is also prepended with \s* to match optional whitespace, and quantified with * (can be zero or more).
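
As a reduced illustration of the optional part described above, the following sketch joins the quoted characters of a short phrase with the tag-or-entity filler. The phrase 'Hi!' and the sample markup are assumptions, and the split here is per code point (the common preg_split with an empty 'u' pattern) rather than per grapheme.

```php
<?php
// Filler allowed between two characters of a phrase: optional whitespace
// followed by a tag or an HTML entity, repeated zero or more times.
$filler = '(?:\s*(?:<[^<>]+>|&#?\w+;))*';

// Split the phrase into UTF-8 characters and quote each one for the regex.
$chars  = preg_split('~~u', 'Hi!', -1, PREG_SPLIT_NO_EMPTY);
$quoted = array_map(function ($c) { return preg_quote($c, '~'); }, $chars);

// The filler goes between characters only, not after the last one.
$pattern = '~' . implode($filler, $quoted) . '~su';

// Assumed sample markup: a tag and an entity sit between the characters.
$html = '<p>H<b>i</b>&nbsp;<em>!</em></p>';
preg_match($pattern, $html, $m);
echo $m[0], "\n"; // H<b>i</b>&nbsp;<em>!
```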
  
  
Answer by v7d8dpo4 for Search HTML for 2 phrases (ignoring all tags) and strip everything else

How about this?
$escape=array('\\'=>1,'^'=>1,'?'=>1,'+'=>1,'*'=>1,'{'=>1,'}'=>1,'('=>1,')'=>1,'['=>1,']'=>1,'|'=>1,'.'=>1,'$'=>1,'+'=>1,'/'=>1);
$pattern='/';
for($i=0;isset($begin[$i]);$i++){
    if(ord($c=$begin[$i])<0x80||ord($c)>0xbf){
        if(isset($escape[$c]))
            $pattern.="([ \t\r\n\v\f]*<\\/?[a-zA-Z]+>[ \t\r\n\v\f]*)*\\$c";
        else
            $pattern.="([ \t\r\n\v\f]*<\\/?[a-zA-Z]+>[ \t\r\n\v\f]*)*$c";
    }
    else
        $pattern.=$c;
}
$pattern.="(.|\n|\r)*";
for($i=0;isset($end[$i]);$i++){
    if(ord($c=$end[$i])<0x80||ord($c)>0xbf){
        if(isset($escape[$c]))
            $pattern.="([ \t\r\n\v\f]*<\\/?[a-zA-Z]+>[ \t\r\n\v\f]*)*\\$c";
        else
            $pattern.="([ \t\r\n\v\f]*<\\/?[a-zA-Z]+>[ \t\r\n\v\f]*)*$c";
    }
    else
        $pattern.=$c;
}
$pattern[17]='?';
$pattern.='(<\\/?[a-zA-Z]+>)?/';
preg_match($pattern,$html,$a);
$match=$a[0];
  
Answer by Dávid Horváth for Search HTML for 2 phrases (ignoring all tags) and strip everything else

I really wanted to write a regex solution, but others beat me to it with some nice and complex ones. So here is a non-regex solution.

Short explanation: The major problem is keeping the HTML tags. Searching the text would be easy if the tags were stripped. So: strip them! We can then search the stripped content and determine the substring we want to cut (see the 'Begin-End' part). Then we try to cut this substring from the HTML while keeping the tags.

Advantages:

Searching is easy and independent of HTML; you can search with a regex too if you need to
Requirements are scalable: you can easily add full multibyte support, support for entities and white-space collapse, and so on
Relatively fast (although a direct regex may be faster)
Does not touch the original HTML, and is adaptable to other markup languages

A static utility class for this scenario:

Explanation comments will be added later!
class HtmlExtractUtil
{

    const FAKE_MARKUP = '<>';
    const MARKUP_PATTERN = '#<[^>]+>#u';

    static public function extractBetween($html, $startTextToFind, $endTextToFind)
    {
        $strippedHtml = preg_replace(self::MARKUP_PATTERN, '', $html);
        $startPos = strpos($strippedHtml, $startTextToFind);
        $lastPos = strrpos($strippedHtml, $endTextToFind);

        if ($startPos === false || $lastPos === false) {
            return "";
        }

        $endPos = $lastPos + strlen($endTextToFind);
        if ($endPos <= $startPos) {
            return "";
        }

        return self::extractSubstring($html, $startPos, $endPos);
    }

    static public function extractSubstring($html, $startPos, $endPos)
    {
        preg_match_all(self::MARKUP_PATTERN, $html, $matches, PREG_OFFSET_CAPTURE);
        $start = -1;
        $end = -1;
        $previousEnd = 0;
        $stripPos = 0;
        $matchArray = $matches[0];
        $matchArray[] = [self::FAKE_MARKUP, strlen($html)];
        foreach ($matchArray as $match) {
            $diff = $previousEnd - $stripPos;
            $textLength = $match[1] - $previousEnd;
            if ($start == (-1)) {
                if ($startPos >= $stripPos && $startPos < $stripPos + $textLength) {
                    $start = $startPos + $diff;
                }
            }
            if ($end == (-1)) {
                if ($endPos > $stripPos && $endPos <= $stripPos + $textLength) {
                    $end = $endPos + $diff;
                    break;
                }
            }
            $tagLength = strlen($match[0]);
            $previousEnd = $match[1] + $tagLength;
            $stripPos += $textLength;
        }

        if ($start == (-1)) {
            return "";
        } elseif ($end == (-1)) {
            return substr($html, $start);
        } else {
            return substr($html, $start, $end - $start);
        }
    }

}

Usage:

$html = '      Any string before
  Hello ?????!
  random code  random code  Lorem ipsum.
  Any string after
      ';
$startTextToFind = 'Hello ?????!';
$endTextToFind = 'Lorem ipsum.';

$extractedText = HtmlExtractUtil::extractBetween($html, $startTextToFind, $endTextToFind);

header("Content-type: text/plain; charset=utf-8");
echo $extractedText . "\n";

Answer by Steve Chambers for Search HTML for 2 phrases (ignoring all tags) and strip everything else

PHP solution:

PHPFiddle Demo

$html = '                              Hello ?????!
          random code          random code          Lorem ipsum.
                              ';
$begin = 'Hello ?????!';
$end = 'Lorem ipsum.';

$matchHtmlTag = '(?:<.*?>)?';
$matchAllNonGreedy = '(?:.|\r?\n)*?';
$matchUnescapedCharNotAtEnd = '([^\\\\](?!$)|\\.(?!$))';
$matchBeginWithTags = preg_replace(
    $matchUnescapedCharNotAtEnd, '$0' . $matchHtmlTag, preg_quote($begin));
$matchEndWithTags = preg_replace(
    $matchUnescapedCharNotAtEnd, '$0' . $matchHtmlTag, preg_quote($end));
$pattern = '/' . $matchBeginWithTags . $matchAllNonGreedy . $matchEndWithTags . '/';

preg_match($pattern, $html, $matches);
$html = $matches[0];

Generated regex ($pattern):

Regex101 Demo

H(?:<.*?>)?e(?:<.*?>)?l(?:<.*?>)?l(?:<.*?>)?o(?:<.*?>)? (?:<.*?>)??(?:<.*?>)??(?:<.*?>)??(?:<.*?>)??(?:<.*?>)??(?:<.*?>)?!(?:.|\r?\n)*?L(?:<.*?>)?o(?:<.*?>)?r(?:<.*?>)?e(?:<.*?>)?m(?:<.*?>)? (?:<.*?>)?i(?:<.*?>)?p(?:<.*?>)?s(?:<.*?>)?u(?:<.*?>)?m(?:<.*?>)?\.

Answer by trincot for Search HTML for 2 phrases (ignoring all tags) and strip everything else

Regular expressions have their limitations when it comes to parsing HTML. Like many have done before me, I will refer to this famous answer.

Potential Problems when relying on Regular Expressions

For instance, imagine this tag appears in the HTML before the part that must be extracted:

This comes before the match

Many regexp solutions will stumble over this, and return a string that starts in the middle of this opening p tag.

Or consider a comment inside the HTML section that has to be matched:

Or, some loose less-than and greater-than signs appear (let's say in a comment, or attribute value):

What will those regexes do with that?

These are just examples... there are countless other situations that pose problems to regular expression based solutions.
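
One such situation can be reproduced in a few lines. The input below is a hypothetical example with a greater-than sign inside an attribute value, and the tag pattern is the simple kind used by several regex-based solutions.

```php
<?php
// Hypothetical input: an attribute value that contains a greater-than sign.
$html = '<p title="a > b">text</p>';

// A simple tag pattern stops at the first ">", which lies inside the attribute,
// so part of the opening tag is left behind as if it were text.
$viaRegex = preg_replace('#<[^>]+>#', '', $html);

var_dump($viaRegex); // string(8) " b">text"
```

The intended text content is just "text"; the leftover ` b">` is a fragment of the opening tag.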

There are more reliable ways to parse HTML.

Load the HTML into a DOM

I will suggest here a solution based on the DOMDocument interface, using this algorithm:

Get the text content of the HTML document and identify the two offsets where both sub strings (begin/end) are located.
Then go through the DOM text nodes keeping track of the offsets where these nodes fit in. In the nodes where either of the two bounding offsets are crossed, a predefined delimiter (|) is inserted. That delimiter should not be present in the HTML string. Therefore it is doubled (||, ||||, ...) until that condition is met;
Finally split the HTML representation by this delimiter and extract the middle part as the result.
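
The delimiter step on its own can be sketched like this. The helper name unusedDelimiter() and the sample strings are assumptions for illustration.

```php
<?php
// Hypothetical helper: doubles the delimiter until it no longer occurs in $html.
function unusedDelimiter(string $html): string
{
    for ($delim = '|'; strpos($html, $delim) !== false; $delim .= $delim);
    return $delim;
}

$delim = unusedDelimiter('a|b||c');
echo $delim, "\n"; // ||||

// Once the delimiter is injected at both offsets, the middle part is the result.
$marked = 'before' . $delim . 'a|b||c' . $delim . 'after';
echo explode($delim, $marked)[1], "\n"; // a|b||c
```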

Here is the code:

function extractBetween($html, $begin, $end) {
    $dom = new DOMDocument();
    // Load HTML in DOM, making sure it supports UTF-8; double HTML tags are no problem
    $dom->loadHTML('                        ' . $html);
    // Get complete text content
    $text = $dom->textContent;
    // Get positions of the beginning/ending text; exit if not found.
    if (($from = strpos($text, $begin)) === false) return false;
    if (($to = strpos($text, $end, $from + strlen($begin))) === false) return false;
    $to += strlen($end);
    // Define a non-occurring delimiter by repeating `|` enough times:
    for ($delim = '|'; strpos($html, $delim) !== false; $delim .= $delim);
    // Use XPath to traverse the DOM
    $xpath = new DOMXPath($dom);
    // Go through the text nodes keeping track of total text length.
    // When exceeding one of the two offsets, inject a delimiter at that position.
    $pos = 0;
    foreach($xpath->evaluate("//text()") as $node) {
        // Add length of node's text content to total length
        $newpos = $pos + strlen($node->nodeValue);
        while ($newpos > $from || ($from === $to && $newpos === $from)) {
            // The beginning/ending text starts/ends somewhere in this text node.
            // Inject the delimiter at that position:
            $node->nodeValue = substr_replace($node->nodeValue, $delim, $from - $pos, 0);
            // If a delimiter was inserted at both beginning and ending texts,
            // then get the HTML and return the part between the delimiters
            if ($from === $to) return explode($delim, $dom->saveHTML())[1];
            // Delimiter was inserted at beginning text. Now search for ending text
            $from = $to;
        }
        $pos = $newpos;
    }
}

You would call it like this:

// Sample input data
$html = '                              This comes before the match
          Hey! Hello ?????!
          random code          random code          Lorem ipsum. la la la
          This comes after the match
                              ';

$begin = 'Hello ?????!';
$end = 'Lorem ipsum.';

// Call
$html = extractBetween($html, $begin, $end);

// Output result
echo $html;

Output:

Hello ?????! random code random code

Lorem ipsum.

You'll find this code is also easier to maintain than regex alternatives.

See it run on eval.in.

Answer by Quasimodo's clone for Search HTML for 2 phrases (ignoring all tags) and strip everything else

There are several different approaches to searching content in HTML source. They all have advantages and disadvantages. If unknown structure in the code is an issue, the safest way is to use an XML parser; however, those are complex and therefore rather slow.

Regular expressions are designed for text processing. Although regexps are not the quickest tool because of their overhead, the preg_ functions are a reasonable compromise that keeps code small and concise without too much performance impact, if and only if you keep the patterns from becoming too complex.

Analysis of HTML structures is doable with recursive regular expressions. Since they slow down processing and are hard to debug, I prefer to code the base logic in PHP and use the preg_ functions for smaller, quick tasks.

Here is a solution in OOP: a tiny class intended to process many searches on the same HTML source. It already goes some way toward handling extended, similar problems such as adding preceding and succeeding content up to the next tag boundary. It does not claim to be a perfect solution yet, but it is easily extendable.

The logic is: pay some runtime at initialization to store the tag positions relative to the plain text, strip the tags, and store the strings between <...> as well as the running sums of their lengths. Then, on each content search, match the needles against the plain content. Locate the start/end position in the HTML source by binary search.

Binary search works like this: a sorted list is required. You store the index of the first element and of the last element + 1. Calculate the midpoint by adding the two and doing an integer division by 2; division with floor is done efficiently by a right bit shift. If the value found is too low, set the lower index variable to the current index, otherwise the upper one. Stop when the index difference is 1. If you are searching for an exact value, break early when the element is found. Example: 0,(14+1) => 7 ; 7,15 => 11 ; 7,11 => 9 ; 7,9 => 8 ; 8-7 = diff. 1. Instead of 15 iterations, only 4 are done. The larger the list, the more time is saved, since the number of steps grows only logarithmically.
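
The search described above can be sketched in isolation as follows. The function name lastIndexNotGreater() and the sample offsets are assumptions, and the sketch assumes the first element is never greater than the value searched for (the class below guarantees this with a dummy tag at position 0).

```php
<?php
// Hypothetical helper: returns the index of the last element <= $needle
// in a sorted array, halving the index range with a right shift.
function lastIndexNotGreater(array $sorted, int $needle): int
{
    $lo = 0;               // index whose value is known to be <= $needle
    $hi = count($sorted);  // one past the last index
    while ($hi - $lo > 1) {
        $mid = ($lo + $hi) >> 1;   // integer division by 2
        if ($sorted[$mid] <= $needle) {
            $lo = $mid;            // value too low or equal: raise the lower bound
        } else {
            $hi = $mid;            // value too high: lower the upper bound
        }
    }
    return $lo;
}

// Assumed sample: tag positions expressed as offsets in the plain text.
$positions = [0, 3, 8, 8, 15, 21, 30];
echo lastIndexNotGreater($positions, 9), "\n";  // 3
echo lastIndexNotGreater($positions, 30), "\n"; // 6
```

With equal offsets (two adjacent tags), the last of them is returned, which matches the "<=" comparison used in translate_pos_plain2html().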

PHP class:

set_html($html);
  }

  public function set_html($html)
  {
    $this->html = $html;
    $regexp = '~<.*?>~su';
    preg_match_all($regexp, $html, $this->tags, PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);
    $this->tags = $this->tags[0];
    # we use exact the same algorithm to strip html
    $this->heystack = preg_replace($regexp, '', $html);

    # convert positions to plain content
    $sum_length = 0;
    foreach($this->tags as &$tag)
    { $tag['pos_in_content'] = $tag[1] - $sum_length;
      $tag['sum_length'    ] = $sum_length += strlen($tag[0]);
    }

    # zero length dummy tags to mark start/end position of strings not beginning/ending with a tag
    array_unshift($this->tags , [0 => '', 1 => 0, 'pos_in_content' => 0, 'sum_length' => 0 ]);
    array_push   ($this->tags , [0 => '', 1 => strlen($html)-1]);
  }

  public function translate_pos_plain2html($content_position)
  {
    # binary search
    $idx = [true => 0, false => count($this->tags)-1];
    while(1 < $idx[false] - $idx[true])
    { $i = ($idx[true] + $idx[false]) >>1;                               // integer half of both array indexes
      $idx[$this->tags[$i]['pos_in_content'] <= $content_position] = $i; // hold one index less and the other greater
    }

    $this->current_tag_idx = $idx[true];
    return $this->tags[$this->current_tag_idx]['sum_length'] + $content_position;
  }

  public function &find_content($needle_start, $needle_end = '', $result_modifiers = self::RESULT_NO_MODIFICATION)
  {
    $needle_start = preg_quote($needle_start, '~');
    $needle_end   = '' == $needle_end ? '' : preg_quote($needle_end  , '~');
    if((self::MATCH_BLANK_MULTIPLE | self::MATCH_BLANK_AS_WHITESPACE) & $result_modifiers)
    {
      $replacement  = self::MATCH_BLANK_AS_WHITESPACE & $result_modifiers ? '\s' : ' ';
      if(self::MATCH_BLANK_MULTIPLE & $result_modifiers)
      { $replacement .= '+';
        $multiplier = '+';
      }
      else
        $multiplier = '';
      $repl_pattern = "~ $multiplier~";
      $needle_start = preg_replace($repl_pattern, $replacement, $needle_start);
      $needle_end   = preg_replace($repl_pattern, $replacement, $needle_end);
    }

    $icase = self::MATCH_CASE_INSENSITIVE & $result_modifiers ? 'i' : '';
    $search_pattern = "~{$needle_start}.*?{$needle_end}~su$icase";
    preg_match_all($search_pattern, $this->heystack, $matches, PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);

    foreach($matches[0] as &$match)
    { $pre = $post = '';

      $pos_start = $this->translate_pos_plain2html($match[1]);
      if(self::RESULT_PREPEND_TAG_CONTENT & $result_modifiers)
        $pos_start = $this->tags[$this->current_tag_idx][1]
          +( self::RESULT_PREPEND_TAG & $result_modifiers ? 0 : strlen ($this->tags[$this->current_tag_idx][0]) );
      elseif(self::RESULT_PREPEND_TAG     & $result_modifiers)
        $pre = $this->tags[$this->current_tag_idx][0];

      $pos_end   = $this->translate_pos_plain2html($match[1] + strlen($match[0]));
      if(self::RESULT_APPEND_TAG_CONTENT & $result_modifiers)
      { $next_tag = $this->tags[$this->current_tag_idx+1];
        $pos_end = $next_tag[1]
          +( self::RESULT_APPEND_TAG  & $result_modifiers ? strlen ($next_tag[0]) : 0);
      }
      elseif(self::RESULT_APPEND_TAG     & $result_modifiers)
        $post = $this->tags[$this->current_tag_idx+1][0];

      $match = $pre . substr($this->html, $pos_start, $pos_end - $pos_start) . $post;
    };
    return $matches[0];
  }
}

Some test case:

$html_source = get($_POST['html'], <<< ___            He said: "Hello ?????!"
      random code      random code      Lorem ipsum. foo bar








Discussion of Coding

Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Sunday, May 1, 2016

Search HTML for 2 phrases (ignoring all tags) and strip everything else
