How to trim leading and trailing whitespace in R?
I am having some trouble with leading and trailing whitespace in a data.frame. For example, I want to look at a specific row in a data.frame based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper     country        dummyLI        dummyLMI       dummyUMI
[6] dummyHInonOECD dummyHIOECD    dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn't get the expected output, since the country Austria obviously existed in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1
All I have changed in the command is an additional whitespace after Austria.
Further annoying problems obviously arise, e.g. when I want to merge two frames based on the country column. One data.frame uses "Austria " while the other frame has "Austria". The matching doesn't work.
- Is there a nice way to 'show' the whitespace on my screen so that I am aware of the problem?
- And can I remove the leading and trailing whitespace in R?
So far I have used a simple Perl script that removes the whitespace, but it would be nice if I could somehow do it inside R.
Answer by f3lix for How to trim leading and trailing whitespace in R?
Probably the best way is to handle the trailing whitespace when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
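A minimal sketch of the effect of strip.white, assuming a small inline CSV (the text, column names, and values below are made up for illustration and stand in for a real data file):

```r
# Hypothetical CSV content with a trailing and a leading space in 'country'.
csv_text <- "codeHelper,country\nAUT,Austria \nBEL, Belgium\n"

# Default behaviour: whitespace around unquoted fields is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With strip.white = TRUE the surrounding whitespace is dropped on import.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

raw$country == "Austria"    # FALSE FALSE
clean$country == "Austria"  # TRUE FALSE
```

Note that strip.white only affects unquoted fields, so values that are quoted in the file may still need trimming afterwards.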
If you want to clean strings afterwards you could use one of these functions:
# returns string w/o leading whitespace
trim.leading <- function (x) sub("^\\s+", "", x)
# returns string w/o trailing whitespace
trim.trailing <- function (x) sub("\\s+$", "", x)
# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
To use one of these functions on myDummy$country:
myDummy$country <- trim(myDummy$country)
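If several columns are affected, the same idea can be applied to every character column at once. The following is only a sketch: the data frame df and its contents are invented for illustration, and it assumes the columns are character vectors rather than factors.

```r
# Same trim() helper as above, repeated so the example is self-contained.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical data frame with stray whitespace in two character columns.
df <- data.frame(
  codeHelper = c("AUT ", " BEL"),
  country    = c("Austria ", " Belgium"),
  dummyOECD  = c(1, 1),
  stringsAsFactors = FALSE
)

# Find the character columns and trim only those; numeric columns are untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], trim)

df$country  # "Austria" "Belgium"
```

For factor columns, one option would be to trim levels(column) instead of the values themselves.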
To 'show' the whitespace you could use:
paste(myDummy$country)
which will show you the strings surrounded by quotation marks (") making whitespaces easier to spot.
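Other ways to make stray whitespace visible are to wrap each value in brackets, compare string lengths, or flag the affected entries directly. This is a sketch using a made-up vector, not the asker's data:

```r
# Hypothetical values: one with a trailing space, one clean, one with a leading space.
country <- c("Austria ", "Belgium", " Chile")

# Brackets make the string boundaries obvious.
sprintf("[%s]", country)  # "[Austria ]" "[Belgium]" "[ Chile]"

# Unexpected lengths hint at hidden characters.
nchar(country)            # 8 7 6

# Flag entries that start or end with a whitespace character.
flagged <- grepl("^\\s|\\s$", country)
country[flagged]          # "Austria " " Chile"
```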
Answer by Jyotirmoy Bhattacharya for How to trim leading and trailing whitespace in R?
Use grep or grepl to find observations with whitespaces and sub to get rid of them.
names<-c("Ganga Din\t","Shyam Lal","Bulbul ")
grep("[[:space:]]+$",names)
[1] 1 3
grepl("[[:space:]]+$",names)
[1]  TRUE FALSE  TRUE
sub("[[:space:]]+$","",names)
[1] "Ganga Din" "Shyam Lal" "Bulbul"
Answer by Marek for How to trim leading and trailing whitespace in R?
ad1) To see white spaces you could directly call print.data.frame with modified arguments:
print(head(iris), quote=TRUE)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
# 1        "5.1"       "3.5"        "1.4"       "0.2" "setosa"
# 2        "4.9"       "3.0"        "1.4"       "0.2" "setosa"
# 3        "4.7"       "3.2"        "1.3"       "0.2" "setosa"
# 4        "4.6"       "3.1"        "1.5"       "0.2" "setosa"
# 5        "5.0"       "3.6"        "1.4"       "0.2" "setosa"
# 6        "5.4"       "3.9"        "1.7"       "0.4" "setosa"
See also ?print.data.frame for other options.
Answer by userJT for How to trim leading and trailing whitespace in R?
To manipulate whitespace, use str_trim() in the stringr package. The package manual is dated Feb 15, 2013, and the package is on CRAN. The function also handles character vectors.
install.packages("stringr", dependencies=TRUE)
require(stringr)
example(str_trim)
# trim column V2 of a data frame d4 (not defined here)
d4$clean2<-str_trim(d4$V2)
(credit goes to commenter: R. Cotton)
Answer by Bernhard Kausler for How to trim leading and trailing whitespace in R?
A simple function to remove leading and trailing whitespace:
trim <- function( x ) { gsub("(^[[:space:]]+|[[:space:]]+$)", "", x) }
Usage:
> text = " foo bar baz 3 "
> trim(text)
[1] "foo bar baz 3"
Answer by KAA for How to trim leading and trailing whitespace in R?
I would have preferred to add this as a comment on user56's answer, but I am not yet able to, so I am writing it as a separate answer. Leading and trailing blanks can also be removed with the trim() function from the gdata package:
require(gdata)
example(trim)
Usage example:
> trim(" Remove leading and trailing blanks ")
[1] "Remove leading and trailing blanks"
Answer by wligtenberg for How to trim leading and trailing whitespace in R?
As of R 3.2.0 a new function was introduced for removing leading/trailing whitespaces:
trimws()
See: http://stat.ethz.ch/R-manual/R-patched/library/base/html/trimws.html
(now the only issue is getting on top as the best answer... :) )
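As a sketch of how trimws() could address the merge problem from the question (the two data frames a and b below are invented for illustration), the key column can be trimmed before merging. The which argument restricts trimming to one side:

```r
# Hypothetical frames: 'a' has a trailing space in one key, 'b' does not.
a <- data.frame(country = c("Austria ", "Belgium"), x = 1:2,
                stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), y = 3:4,
                stringsAsFactors = FALSE)

# Before trimming, only "Belgium" matches.
rows_before <- nrow(merge(a, b, by = "country"))  # 1

# Trim the key column, then merge again: both countries match.
a$country <- trimws(a$country)
merged <- merge(a, b, by = "country")
nrow(merged)                                      # 2

# 'which' can be "both" (the default), "left", or "right".
trimws("  x  ", which = "left")                   # "x  "
```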
Answer by TMOTTM for How to trim leading and trailing whitespace in R?
Another related problem occurs if you have multiple spaces in between inputs:
> a <- " a string with lots of starting, inter mediate and trailing whitespace "
You can then easily split this string into "real" tokens by passing a regular expression to the split argument:
> strsplit(a, split=" +")
[[1]]
 [1] ""           "a"          "string"     "with"       "lots"
 [6] "of"         "starting,"  "inter"      "mediate"    "and"
[11] "trailing"   "whitespace"
Note that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
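One way to avoid the empty first element is to trim the string and collapse internal runs of whitespace before splitting. This is a sketch under the assumption that the example string contained runs of several spaces; the string below is a made-up stand-in, and trimws() requires R 3.2.0 or later:

```r
# Hypothetical input with leading, trailing, and repeated internal spaces.
a <- "  a string   with lots of   starting, inter   mediate and trailing   whitespace   "

# Trim both ends, then replace every run of whitespace with a single space.
squished <- gsub("\\s+", " ", trimws(a))

# Splitting on a literal single space now yields no empty tokens.
tokens <- strsplit(squished, " ", fixed = TRUE)[[1]]
length(tokens)  # 11
```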
Answer by Jaap for How to trim leading and trailing whitespace in R?
Another option is to use the stri_trim function from the stringi package, which defaults to removing leading and trailing whitespace:
> library(stringi)
> x <- c(" leading space","trailing space ")
> stri_trim(x)
[1] "leading space"  "trailing space"
For only removing leading whitespace, use stri_trim_left. For only removing trailing whitespace, use stri_trim_right. When you want to remove other leading or trailing characters, you have to specify that with pattern =.
See also ?stri_trim for more info.
Answer by Partha Roy for How to trim leading and trailing whitespace in R?
trimws() removes whitespace from both sides of a string and returns the trimmed string.