
Wednesday, October 12, 2016

Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


I'm devastatingly miserable at complex regular expressions, but I would love a nudge in the right direction. I'm trying to parse some authors' names by removing initials, when the full names are used later. I realize there probably won't be a "perfect" solution that catches all exceptions, but I'm looking for a "good enough" solution.

Example input

    C S Clive Staples Lewis
    T H Terence Hanbury White
    R Salvatore
    George R R Martin
    J R R John Ronald Reuel Tolkien
    J K Rowling

Ideal output

    Clive Staples Lewis
    Terence Hanbury White
    R Salvatore
    George R R Martin
    John Ronald Reuel Tolkien
    J K Rowling

I have something along the lines of this:

    $str = preg_replace('#(?:\s+\S{1,2})+\s+#', ' ', $str);

This obviously misses the first single character, because the pattern requires whitespace before each initial. Changing that, however, would also remove the R in R Salvatore and the J K in J K Rowling.
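To make the problem with that attempt concrete, here is a small JavaScript sketch (JavaScript because the answers below already mix it with PHP and Perl). The helper name `attempt` is made up for illustration, and the `g` flag stands in for `preg_replace` replacing every match.

```javascript
// Hypothetical helper: the pattern from the question, written as a JavaScript regex literal.
function attempt(name) {
  // Each initial must be preceded by whitespace, so one at the very start is never matched.
  return name.replace(/(?:\s+\S{1,2})+\s+/g, ' ');
}

console.log(attempt('C S Clive Staples Lewis')); // C Clive Staples Lewis
console.log(attempt('George R R Martin'));       // George Martin
```

Besides leaving the leading `C` in place, the pattern also strips initials in the middle of a name, which is not wanted for George R R Martin.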

Thank you for any insight.

Answer by anubhava for Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


You can use it like this:

    $str = 'C S Clive Staples Lewis';
    $str = preg_replace('#^([A-Z]\s)+(?=([A-Z]+\s+){2,})#i', '', $str);
    echo $str; // Clive Staples Lewis

    $str = 'J K Rowling';
    $str = preg_replace('#^([A-Z]\s)+(?=([A-Z]+\s+){2,})#i', '', $str);
    echo $str; // J K Rowling
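The pattern uses nothing PCRE-specific, so it can be tried unchanged as a JavaScript regex. The sketch below runs it over all six names from the question; the helper name `stripLeadingInitials` is an assumption of mine, not part of the answer.

```javascript
// Same pattern as the PHP answer: leading single letters, removed only when
// the lookahead finds at least two further words, each followed by whitespace.
const leadingInitials = /^([A-Z]\s)+(?=([A-Z]+\s+){2,})/i;

// Hypothetical helper name, for illustration only.
function stripLeadingInitials(name) {
  return name.replace(leadingInitials, '');
}

const names = [
  'C S Clive Staples Lewis',
  'T H Terence Hanbury White',
  'R Salvatore',
  'George R R Martin',
  'J R R John Ronald Reuel Tolkien',
  'J K Rowling',
];

for (const name of names) {
  console.log(stripLeadingInitials(name));
}
```

One thing to watch: the lookahead counts a word only if whitespace follows it, so a trailing space changes the result. `'J K Rowling '` (with a trailing space) would lose its `J`, because `K` and `Rowling` then look like two full words. Trimming the input first avoids that.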

Answer by Casimir et Hippolyte for Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


You can use this:

$result = preg_replace('~^(?:[A-Z]\h){2,}~m', '', $str);   

If you want to add exceptions, you can do it like this:

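    // sample input taken from the question
    $str = <<<LOD
    C S Clive Staples Lewis
    T H Terence Hanbury White
    R Salvatore
    George R R Martin
    J R R John Ronald Reuel Tolkien
    J K Rowling
    LOD;

    $pattern = <<<'LOD'
    ~
    # definitions (the group name "exceptions" is an assumed placeholder)
    (?(DEFINE)
        (?<exceptions>
            J \h K \h      Rowling
          | J \h F \h      Kennedy
          | C \h P \h E \h Bach
        )
    )

    # pattern
    ^(?!\g<exceptions>)
    (?:[A-Z]\h){2,}
    ~xm
    LOD;

    $result = preg_replace($pattern, '', $str);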
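The exception idea carries over to engines without `(?(DEFINE)...)` or `\h`. As a rough JavaScript sketch, the list of exceptions can be joined into a negative lookahead, with `[ \t]` standing in for `\h`. The names `exceptions` and `stripInitials` are hypothetical, and the exception strings are assumed to contain no regex metacharacters.

```javascript
// Hypothetical exception list, using the three names from the answer above.
const exceptions = ['J K Rowling', 'J F Kennedy', 'C P E Bach'];

// Builds ^(?!J K Rowling|J F Kennedy|C P E Bach)(?:[A-Z][ \t]){2,}
// [ \t] approximates PCRE's \h; the m flag makes ^ match at every line start.
const pattern = new RegExp(
  '^(?!' + exceptions.join('|') + ')(?:[A-Z][ \\t]){2,}',
  'gm'
);

function stripInitials(text) {
  return text.replace(pattern, '');
}

const input = [
  'C S Clive Staples Lewis',
  'T H Terence Hanbury White',
  'R Salvatore',
  'George R R Martin',
  'J R R John Ronald Reuel Tolkien',
  'J K Rowling',
].join('\n');

console.log(stripInitials(input));
```

Without the lookahead, `J K Rowling` would be reduced to `Rowling`, which is exactly the case the exception list is there to protect.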

Answer by SmokeyPHP for Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


This seems to do what you're after:

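    var t = [
        'C S Clive Staples Lewis',
        'T H Terence Hanbury White',
        'R Salvatore',
        'George R R Martin',
        'J R R John Ronald Reuel Tolkien',
        'J K Rowling'
    ];
    for (var i = 0, c = t.length; i < c; i++) {
        // loop body assumed to mirror the PHP version below
        var newStr = t[i].replace(/^([A-Z]) ([A-Z])((?: [A-Z])?) (\1\w+ \2\w+( \3\w+)?.+)$/, '$4');
        console.log(newStr);
    }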

Do note, however, that this method is limited to 3 initials (though I can't see you ever having more than that!)

On the plus side, this checks that the initials match up with names starting with those letters before removing them.

If you need PHP:

    $t = array(
        'C S Clive Staples Lewis',
        'T H Terence Hanbury White',
        'R Salvatore',
        'George R R Martin',
        'J R R John Ronald Reuel Tolkien',
        'J K Rowling'
    );
    for ($i = 0, $c = count($t); $i < $c; $i++) {
        $newStr = preg_replace('/^([A-Z]) ([A-Z])((?: [A-Z])?) (\1\w+ \2\w+( \3\w+)?.+)$/', '$4', $t[$i]);
        var_dump($newStr);
    }

Answer by gpmurthy for Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


Consider the following Regex...

(?(^(\w\s)+\w{2,}(\s\w{2,}){1,})^(\w\s)+)
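The pattern above is a conditional group, `(?(condition)yes)`, with a whole expression as the condition. That reads like .NET syntax, which is my assumption since the answer names no engine. JavaScript has no conditional groups, so the sketch below restates the same idea with a plain lookahead: consume the leading initials only if they are followed by at least two longer words. The function name is hypothetical.

```javascript
// Lookahead restatement of the conditional pattern (an interpretation, not the answer's own code).
// (?= ... ) plays the role of the condition; the final (?:\w\s)+ is the part that gets removed.
const conditionalInitials = /^(?=(?:\w\s)+\w{2,}(?:\s\w{2,})+)(?:\w\s)+/;

function stripIfTwoNamesFollow(name) {
  return name.replace(conditionalInitials, '');
}

console.log(stripIfTwoNamesFollow('C S Clive Staples Lewis')); // Clive Staples Lewis
console.log(stripIfTwoNamesFollow('J K Rowling'));             // J K Rowling
console.log(stripIfTwoNamesFollow('R Salvatore'));             // R Salvatore
```

Note that `\w` also matches digits and underscores, so this is looser than the `[A-Z]` used in the other answers.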

Answer by edi_allen for Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


Even though you are using PHP you did not specify a language. So this is a sample in Perl.

    use strict;
    use warnings;

    open my $data_fh, '<', 'Data1.txt'
        or die "Can't open Data1.txt $!";

    while (my $line = <$data_fh>) {
        # Match an initial only if there is a word starting with that initial later in the string.
        $line =~ s/\b([A-Z])\b (?=.*?\b\1[A-Z]+\b)//xig;
        $line =~ s/^\s*|\s*$//g; # strip leading or trailing space
        print "$line\n";
    }

    #OUTPUT
    # Clive Staples Lewis
    # Terence Hanbury White
    # R Salvatore
    # George R R Martin
    # John Ronald Reuel Tolkien
    # J K Rowling
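For comparison, the same substitution can be sketched in JavaScript. There is no `x` flag here, so the space after the initial is matched literally and removed together with it, whereas under Perl's `/x` the space in the pattern is ignored and the leftover whitespace is cleaned up by the second substitution. The function name `dropMatchedInitials` is an assumption for illustration.

```javascript
// Remove a single-letter word (plus the space after it) only when a longer word
// starting with the same letter appears later in the string.
function dropMatchedInitials(line) {
  return line
    .replace(/\b([A-Z])\b (?=.*?\b\1[A-Z]+\b)/gi, '')
    .trim(); // mirrors the Perl leading/trailing whitespace strip
}

console.log(dropMatchedInitials('J R R John Ronald Reuel Tolkien')); // John Ronald Reuel Tolkien
console.log(dropMatchedInitials('George R R Martin'));               // George R R Martin
console.log(dropMatchedInitials('J K Rowling'));                     // J K Rowling
```

Because the pattern is not anchored to the start of the string, it would also remove an initial in the middle of a name whenever a matching full name follows it.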

Answer by Teneff for Regular Expression to remove single characters from the beginning of string, only if there are 2 or more


You can use the following regular expression:

^(?:([A-Z])(?=.*?\1[a-z]+)\s)+  

It will match:

    ^                    // from the beginning of the string
    (?:                  // non-capturing group
        ([A-Z])          // capture one uppercase letter
        (?=.*?\1[a-z]+)  // positive lookahead: the captured letter appears later, followed by one or more lowercase letters
        \s               // followed by a whitespace character
    )+                   // repeated one or more times
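As a quick check of that breakdown, the pattern can be run as-is in JavaScript, since it only uses a backreference and a lookahead. The helper name `stripMatchedInitials` is hypothetical.

```javascript
// Teneff's pattern, unchanged: each leading initial must reappear later
// as the first letter of a capitalised word.
const initialsWithNames = /^(?:([A-Z])(?=.*?\1[a-z]+)\s)+/;

// Hypothetical helper name, for illustration only.
function stripMatchedInitials(name) {
  return name.replace(initialsWithNames, '');
}

console.log(stripMatchedInitials('T H Terence Hanbury White')); // Terence Hanbury White
console.log(stripMatchedInitials('R Salvatore'));               // R Salvatore
console.log(stripMatchedInitials('J K Rowling'));               // J K Rowling
```

`R Salvatore` and `J K Rowling` are left alone because no later word starts with `R`, `J` or `K` followed by lowercase letters, so the group never matches even once.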


