
Tuesday, February 16, 2016

How to trim leading and trailing whitespace in R?


I am having some trouble with leading and trailing whitespace in a data.frame. For example, I want to look at a specific row in a data.frame based on a certain condition:

    > myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
    [1] codeHelper     country        dummyLI    dummyLMI       dummyUMI
    [6] dummyHInonOECD dummyHIOECD    dummyOECD
    <0 rows> (or 0-length row.names)

I was wondering why I didn't get the expected output since the country Austria obviously existed in my data.frame. After looking through my code history and trying to figure out what went wrong I tried:

    > myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
       codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
    18        AUT Austria        0        0        0              0           1
       dummyOECD
    18         1

The only thing I changed in the command is an additional space after Austria.

Further annoying problems obviously arise, for example when I want to merge two data frames based on the country column. One data.frame uses "Austria " while the other has "Austria", so the matching doesn't work.

  1. Is there a nice way to 'show' the whitespace on my screen so that I am aware of the problem?
  2. And can I remove the leading and trailing whitespace in R?

So far I have used a simple Perl script to remove the whitespace, but it would be nice if I could do it inside R.

Answer by f3lix for How to trim leading and trailing whitespace in R?


Probably the best way is to handle the trailing whitespace when you read your data file. If you use read.csv or read.table, you can set the parameter strip.white=TRUE (in read.table it takes effect only when sep is specified; read.csv already sets sep=",").
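As a minimal sketch of the read-time approach, the snippet below reads the same data twice, with and without strip.white. The inline CSV text and its column values are made up for illustration; a real call would pass a file name instead of text=.

```r
# Hypothetical inline CSV standing in for the real data file.
csv_text <- "codeHelper,country,dummyOECD
AUT,Austria ,1
BEL, Belgium,1"

# Default behaviour: padding around unquoted character fields is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading/trailing whitespace from unquoted fields.
clean <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)

raw$country == "Austria"    # no row matches because of the trailing space
clean$country == "Austria"  # the first row now matches
```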

If you want to clean strings afterwards you could use one of these functions:

    # returns string w/o leading whitespace
    trim.leading <- function (x)  sub("^\\s+", "", x)

    # returns string w/o trailing whitespace
    trim.trailing <- function (x) sub("\\s+$", "", x)

    # returns string w/o leading or trailing whitespace
    trim <- function (x) gsub("^\\s+|\\s+$", "", x)

To use one of these functions on myDummy$country:

 myDummy$country <- trim(myDummy$country)  
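For the merge problem from the question, one possible approach is to trim every character column before merging. The sketch below assumes two small made-up data frames (a and b) and repeats the trim() helper so that it runs on its own.

```r
# Same helper as defined above, repeated so this sketch is self-contained.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical frames: one key has a trailing space, the other does not.
a <- data.frame(country = c("Austria ", "Belgium"), dummyOECD = c(1, 1),
                stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), code = c("AUT", "BEL"),
                stringsAsFactors = FALSE)

nrow(merge(a, b, by = "country"))  # only Belgium matches

# Trim all character columns of a, then merge again.
is_chr <- vapply(a, is.character, logical(1))
a[is_chr] <- lapply(a[is_chr], trim)

merged <- merge(a, b, by = "country")
nrow(merged)  # both countries match now
```

If the key column is a factor rather than a character vector, trim() returns a character vector, so the column changes type after trimming.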

To 'show' the whitespace you could use:

 paste(myDummy$country)  

which will show you the strings surrounded by quotation marks (") making whitespaces easier to spot.
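A related option, sketched here with made-up values, is base R's encodeString(), which adds the quotes explicitly and also escapes characters such as tabs, so they become visible too.

```r
# Hypothetical values: a trailing space and a trailing tab.
country <- c("Austria ", "Belgium", "Denmark\t")

# Wrap each value in double quotes and escape non-printing characters.
shown <- encodeString(country, quote = '"')

# One value per line; the tab is displayed as \t.
cat(shown, sep = "\n")
```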

Answer by Jyotirmoy Bhattacharya for How to trim leading and trailing whitespace in R?


Use grep or grepl to find observations with whitespaces and sub to get rid of them.

    names<-c("Ganga Din\t","Shyam Lal","Bulbul ")
    grep("[[:space:]]+$",names)
    [1] 1 3
    grepl("[[:space:]]+$",names)
    [1]  TRUE FALSE  TRUE
    sub("[[:space:]]+$","",names)
    [1] "Ganga Din" "Shyam Lal" "Bulbul"
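The same idea can be extended to flag values padded on either side. This is a small sketch with a made-up vector; the pattern combines a leading and a trailing alternative.

```r
# Hypothetical vector with leading and trailing padding.
country <- c("Austria ", "Belgium", " Chile", "Denmark\t")

# TRUE where a value starts or ends with whitespace.
padded <- grepl("^[[:space:]]+|[[:space:]]+$", country)
which(padded)

# Number of whitespace characters that trimming would remove from each value.
removed <- nchar(country) - nchar(gsub("^[[:space:]]+|[[:space:]]+$", "", country))
removed
```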

Answer by Marek for How to trim leading and trailing whitespace in R?


Regarding 1): to see whitespace, you could call print.data.frame directly with modified arguments:

    print(head(iris), quote=TRUE)
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
    # 1        "5.1"       "3.5"        "1.4"       "0.2" "setosa"
    # 2        "4.9"       "3.0"        "1.4"       "0.2" "setosa"
    # 3        "4.7"       "3.2"        "1.3"       "0.2" "setosa"
    # 4        "4.6"       "3.1"        "1.5"       "0.2" "setosa"
    # 5        "5.0"       "3.6"        "1.4"       "0.2" "setosa"
    # 6        "5.4"       "3.9"        "1.7"       "0.4" "setosa"

See also ?print.data.frame for other options.

Answer by userJT for How to trim leading and trailing whitespace in R?


To trim whitespace, use str_trim() from the stringr package. The package manual is dated Feb 15, 2013, and the package is on CRAN. The function also handles character vectors.

    install.packages("stringr", dependencies=TRUE)
    require(stringr)
    example(str_trim)
    d4$clean2<-str_trim(d4$V2)

(credit goes to commenter: R. Cotton)

Answer by Bernhard Kausler for How to trim leading and trailing whitespace in R?


A simple function to remove leading and trailing whitespace:

    trim <- function( x ) {
      gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
    }

Usage:

    > text = "   foo bar  baz 3 "
    > trim(text)
    [1] "foo bar  baz 3"

Answer by KAA for How to trim leading and trailing whitespace in R?


I would have preferred to add this as a comment on user56's answer, but I am not yet able to comment, so I am posting it as a separate answer. Leading and trailing blanks can also be removed with the trim() function from the gdata package:

    require(gdata)
    example(trim)

Usage example:

    > trim("   Remove leading and trailing blanks    ")
    [1] "Remove leading and trailing blanks"

Answer by wligtenberg for How to trim leading and trailing whitespace in R?


As of R 3.2.0 a new function was introduced for removing leading/trailing whitespaces:

trimws()  

See: http://stat.ethz.ch/R-manual/R-patched/library/base/html/trimws.html
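A short sketch of how trimws() can be called, assuming R 3.2.0 or later; the input string is made up for illustration. The which argument selects the side to trim and defaults to "both".

```r
x <- "  some text  "  # hypothetical input

both  <- trimws(x)                   # leading and trailing (the default)
left  <- trimws(x, which = "left")   # leading only
right <- trimws(x, which = "right")  # trailing only

# Show the results with quotes so the remaining padding is visible.
print(c(both, left, right))
```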

(now the only issue is getting on top as the best answer... :) )

Answer by TMOTTM for How to trim leading and trailing whitespace in R?


Another related problem occurs if you have multiple spaces in between words:

> a <- "  a string         with lots   of starting, inter   mediate and trailing   whitespace     "  

You can then easily split this string into "real" tokens by passing a regular expression as the split argument:

    > strsplit(a, split=" +")
    [[1]]
     [1] ""           "a"          "string"     "with"       "lots"
     [6] "of"         "starting,"  "inter"      "mediate"    "and"
    [11] "trailing"   "whitespace"

Note that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
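One way to avoid the empty first element, sketched here under the assumption that trimws() is available (R 3.2.0 or later), is to trim the string before splitting. The same string can also be collapsed to single spaces with gsub().

```r
a <- "  a string         with lots   of starting, inter   mediate and trailing   whitespace     "

# Trim first so the split pattern cannot match at the start of the string.
tokens <- strsplit(trimws(a), split = " +")[[1]]
tokens[1]       # the first token is now a real word
length(tokens)

# Alternative: keep one string but collapse runs of spaces to a single space.
collapsed <- gsub(" +", " ", trimws(a))
collapsed
```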

Answer by Jaap for How to trim leading and trailing whitespace in R?


Another option is to use the stri_trim function from the stringi package which defaults to removing leading and trailing whitespace:

    > library(stringi)
    > x <- c("  leading space","trailing space   ")
    > stri_trim(x)
    [1] "leading space"  "trailing space"

For only removing leading whitespace, use stri_trim_left. For only removing trailing whitespace, use stri_trim_right. When you want to remove other leading or trailing characters, you have to specify that with pattern =.

See also ?stri_trim for more info.

Answer by Partha Roy for How to trim leading and trailing whitespace in R?


trimws(): by default, this function removes whitespace from both sides of a string and returns the trimmed string.

