Fill NA values with the trailing row value times a growth rate?
Fill NA values with the trailing row value times a growth rate?
What would be a good way to populate NA
values with the previous value times (1+growth)
?
df <- data.frame(year=0:6, price1=c(1.1, 2.1, 3.2, 4.8, NA, NA, NA), price2=c(1.1, 2.1, 3.2, NA, NA, NA, NA)) growth <- .02
In this case, I would want the missing values in price1
to be filled with 4.8*1.02
, 4.8*1.02^2
, and 4.8*1.02^3
. Similarly, I would want the missing values in price2
to be filled with 3.2*1.02
, 3.2*1.02^2
, 3.2*1.02^3
, and 3.2*1.02^4
.
I've tried this, but I think it needs to be set to repeat somehow (apply
?):
library(dplyr) df %>% mutate(price1=ifelse(is.na(price1), lag(price1)*(1+growth), price1))
I'm not using dplyr
for anything else (yet), so something from base R or plyr
or similar would be appreciated.
Answer by Batanichek for Fill NA values with the trailing row value times a growth rate?
You can try such function
test <- function(x,n) { if (!is.na(df[x,n])) return (df[x,n]) else return (test(x-1,n)*(1+growth)) } a=1:nrow(df) lapply(a, FUN=function(i) test(i,2)) unlist(lapply(a, FUN=function(i) test(i,2)))
[1] 1.100000 2.100000 3.200000 4.800000 4.896000 4.993920 5.093798
Answer by Ben Bolker for Fill NA values with the trailing row value times a growth rate?
Assuming only trailing NAs:
NAgrow <- function(x,growth=0.02) { isna <- is.na(x) lastval <- tail(x[!isna],1) x[isna] <- lastval*(1+growth)^seq(sum(isna)) return(x) }
If there are interior NA
values as well this would get a little trickier.
Apply to all columns except the first:
df[-1] <- lapply(df[-1],NAgrow) ## year price1 price2 ## 1 0 1.100000 1.100000 ## 2 1 2.100000 2.100000 ## 3 2 3.200000 3.200000 ## 4 3 4.800000 3.264000 ## 5 4 4.896000 3.329280 ## 6 5 4.993920 3.395866 ## 7 6 5.093798 3.463783
Answer by cr1msonB1ade for Fill NA values with the trailing row value times a growth rate?
It looks like dplyr
can't handle access newly assigned lag values. Here is a solution that should work even if the NA
's are in the middle of a column.
df <- apply( df, 2, function(x){ if(sum(is.na(x)) == 0){return(x)} ## updated with optimized portion from @josilber r <- rle(is.na(x)) na.loc <- which(r$values) b <- rep(cumsum(r$lengths)[na.loc-1], r$lengths[na.loc]) lastValIs <- 1:length(x) lastValI[is.na(x)] <- b x[is.na(x)] <- sapply(which(is.na(x)), function(i){ return(x[lastValIs[i]]*(1 + growth)^(i - lastValIs[i])) }) return(x) })
Answer by josliber for Fill NA values with the trailing row value times a growth rate?
The following solution based on rle
works with NA in any position and does not rely on looping to fill in the missing values:
NAgrow.rle <- function(x) { if (is.na(x[1])) stop("Can't have NA at beginning") r <- rle(is.na(x)) na.loc <- which(r$values) b <- rep(cumsum(r$lengths)[na.loc-1], r$lengths[na.loc]) x[is.na(x)] <- ave(x[b], b, FUN=function(y) y[1]*(1+growth)^seq_along(y)) x } df[,-1] <- lapply(df[,-1], NAgrow.rle) # year price1 price2 # 1 0 1.100000 1.100000 # 2 1 2.100000 2.100000 # 3 2 3.200000 3.200000 # 4 3 4.800000 3.264000 # 5 4 4.896000 3.329280 # 6 5 4.993920 3.395866 # 7 6 5.093798 3.463783
I'll drop in two additional solutions using for loops, one in base R and one in Rcpp:
NAgrow.for <- function(x) { for (i in which(is.na(x))) { x[i] <- x[i-1] * (1+growth) } x } library(Rcpp) cppFunction( "NumericVector NAgrowRcpp(NumericVector x, double growth) { const int n = x.size(); NumericVector y(x); for (int i=1; i < n; ++i) { if (R_IsNA(x[i])) { y[i] = (1.0 + growth) * y[i-1]; } } return y; }")
The solutions based on rle
(crimson
and josilber.rle
) take about twice as long as the simple solution based on a for loop (josilber.for
), and as expected the Rcpp solution is the fastest, running in about 0.002 seconds.
set.seed(144) big.df <- data.frame(ID=1:100000, price1=sample(c(1:10, NA), 100000, replace=TRUE), price2=sample(c(1:10, NA), 100000, replace=TRUE)) crimson <- function(df) apply(df[,-1], 2, function(x){ if(sum(is.na(x)) == 0){return(x)} ## updated with optimized portion from @josilber r <- rle(is.na(x)) na.loc <- which(r$values) b <- rep(cumsum(r$lengths)[na.loc-1], r$lengths[na.loc]) lastValIs <- 1:length(x) lastValIs[is.na(x)] <- b x[is.na(x)] <- sapply(which(is.na(x)), function(i){ return(x[lastValIs[i]]*(1 + growth)^(i - lastValIs[i])) }) return(x) }) ggrothendieck <- function(df) { growthfun <- function(x, y) if (is.na(y)) (1+growth)*x else y lapply(df[,-1], Reduce, f = growthfun, acc = TRUE) } josilber.rle <- function(df) lapply(df[,-1], NAgrow.rle) josilber.for <- function(df) lapply(df[,-1], NAgrow.for) josilber.rcpp <- function(df) lapply(df[,-1], NAgrowRcpp, growth=growth) library(microbenchmark) microbenchmark(crimson(big.df), ggrothendieck(big.df), josilber.rle(big.df), josilber.for(big.df), josilber.rcpp(big.df)) # Unit: milliseconds # expr min lq mean median uq max neval # crimson(big.df) 98.447546 131.063713 161.494366 152.477661 183.175840 379.643222 100 # ggrothendieck(big.df) 437.015693 667.760401 822.530745 817.864707 925.974019 1607.352929 100 # josilber.rle(big.df) 59.678527 115.220519 132.874030 127.476340 151.665657 262.003756 100 # josilber.for(big.df) 21.076516 57.479169 73.860913 72.959536 84.846912 178.412591 100 # josilber.rcpp(big.df) 1.248793 1.894723 2.373469 2.190545 2.697246 5.646878 100
Answer by G. Grothendieck for Fill NA values with the trailing row value times a growth rate?
A compact base R solution can be obtained using Reduce
:
growthfun <- function(x, y) if (is.na(y)) (1+growth)*x else y replace(df, TRUE, lapply(df, Reduce, f = growthfun, acc = TRUE))
giving:
year price1 price2 1 0 1.100000 1.100000 2 1 2.100000 2.100000 3 2 3.200000 3.200000 4 3 4.800000 3.264000 5 4 4.896000 3.329280 6 5 4.993920 3.395866 7 6 5.093798 3.463783
Note: The data in the question has no non-trailing NA values but if there were some then we could use na.fill
from zoo to first replace the trailing NAs with a special value, such as NaN, and look for it instead of NA:
library(zoo) DF <- as.data.frame(na.fill(df, c(NA, NA, NaN))) growthfun <- function(x, y) if (is.nan(y)) (1+growth)*x else y replace(DF, TRUE, lapply(DF, Reduce, f = growthfun, acc = TRUE))
Fatal error: Call to a member function getElementsByTagName() on a non-object in D:\XAMPP INSTALLASTION\xampp\htdocs\endunpratama9i\www-stackoverflow-info-proses.php on line 72
0 comments:
Post a Comment