Stata to R Replace Values based on Condition
Sebastian Wright
I'm trying to do something very simple in R that I can do in Stata but I can't quite get it right.
Here is a sample of my data:
data <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)
In Stata I can do this:
replace VAR1 = 1 if C1 == 2 & C2 == 20 & year == 1990
replace VAR2 = 60 if C1 == 2 & C2 == 20 & year == 1990
replace VAR3 = 70 if C1 == 2 & C2 == 20 & year == 1990
Annoyingly, Stata syntax does not allow this:
replace VAR1 = 1 & VAR2 = 60 & VAR3 = 70 if C1 == 2 & C2 == 20 & year == 1990
Using the first Stata code, this
data1 <- data.frame(C1 = 2, C2 = 20, year = 1990, VAR1 = NA, VAR2 = NA, VAR3 = NA)
becomes this
data2 <- data.frame(C1 = 2, C2 = 20, year = 1990, VAR1 = 1, VAR2 = 60, VAR3 = 70)
I can't find anything similar to this problem (quite possibly because I'm not searching for the right phrase).
I'd like to do either of these in R, preferably the second, single-command form.
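For comparison, here is a minimal base R sketch of a literal, line-by-line translation of the three Stata replace commands. It rebuilds the sample data so it runs on its own; the name cond is only illustrative. The logical vector cond plays the role of Stata's if qualifier, and each replace becomes a subset assignment on one column.

```r
# Sample data from the question
data <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)

# Logical row selector, the counterpart of Stata's `if` qualifier
cond <- data$C1 == 2 & data$C2 == 20 & data$year == 1990

# One subset assignment per Stata `replace` line
data$VAR1[cond] <- 1
data$VAR2[cond] <- 60
data$VAR3[cond] <- 70

# Show the rows that were changed
data[cond, ]
```

Rows where cond is FALSE keep their original NA values, just as Stata leaves observations outside the if condition untouched.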
Answers
If the condition is the same for all the columns, you can compute the logical row index once and assign all the values in a single step.
inds <- with(data, C1 == 2 & C2 == 20 & year == 1990)
data[inds, paste0("VAR", 1:3)] <- as.list(c(1, 60, 70))
data
# C1 C2 year VAR1 VAR2 VAR3
#1 2 20 1990 1 60 70
#2 2 20 1991 NA NA NA
#3 2 20 1992 NA NA NA
#4 2 20 1993 NA NA NA
#5 2 20 1994 NA NA NA
#6 20 70 1990 NA NA NA
#7 20 70 1991 NA NA NA
#8 20 70 1992 NA NA NA
#9 20 70 1993 NA NA NA
#10 20 70 1994 NA NA NA
#11 70 80 1990 NA NA NA
#12 70 80 1991 NA NA NA
#13 70 80 1992 NA NA NA
#14 70 80 1993 NA NA NA
#15 70 80 1994 NA NA NA
If you have different conditions for different columns, the dplyr package makes this kind of replacement easier with pipes:
library(dplyr)
data %>%
  mutate(VAR1 = replace(VAR1, C1 == 2 & C2 == 20 & year == 1990, 1),
         VAR2 = replace(VAR2, C1 == 2 & C2 == 20 & year == 1990, 60),
         VAR3 = replace(VAR3, C1 == 2 & C2 == 20 & year == 1990, 70))
Here is one option using data.table:
library(data.table)
nm1 <- grep("VAR", names(data))
setDT(data)[C1 == 2 & C2 == 20 & year == 1990, (nm1) := .(1, 60, 70)]
data
# C1 C2 year VAR1 VAR2 VAR3
# 1: 2 20 1990 1 60 70
# 2: 2 20 1991 NA NA NA
# 3: 2 20 1992 NA NA NA
# 4: 2 20 1993 NA NA NA
# 5: 2 20 1994 NA NA NA
# 6: 20 70 1990 NA NA NA
# 7: 20 70 1991 NA NA NA
# 8: 20 70 1992 NA NA NA
# 9: 20 70 1993 NA NA NA
#10: 20 70 1994 NA NA NA
#11: 70 80 1990 NA NA NA
#12: 70 80 1991 NA NA NA
#13: 70 80 1992 NA NA NA
#14: 70 80 1993 NA NA NA
#15: 70 80 1994 NA NA NA
Another option is to set the key while creating the data.table and then pass the key values in i:
setDT(data, key = c("C1", "C2", "year"))
data[.(2, 20, 1990), (nm1) := .(1, 60, 70)]
Or using the tidyverse:
library(tidyverse)
i1 <- with(data, C1 == 2 & C2 == 20 & year == 1990)
data %>%
  select(starts_with("VAR")) %>%
  map2_df(., c(1, 60, 70), ~ replace(.x, i1, .y)) %>%
  bind_cols(data %>% select(1:3), .)
Data used in this answer:
data <- structure(list(C1 = c(2, 2, 2, 2, 2, 20, 20, 20, 20, 20, 70,
70, 70, 70, 70), C2 = c(20, 20, 20, 20, 20, 70, 70, 70, 70, 70,
80, 80, 80, 80, 80), year = c(1990L, 1991L, 1992L, 1993L, 1994L,
1990L, 1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L,
1994L), VAR1 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_), VAR2 = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_), VAR3 = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_)),
class = "data.frame", row.names = c(NA,
-15L))
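If you need this kind of multi-column replacement often, a small wrapper can get close to the single-command form the question asks for. The sketch below uses base R only; replace_if is a hypothetical helper written for illustration, not an existing function in base R or any package, and it assumes the target columns already exist (as Stata's replace also requires).

```r
# Sample data with the same layout as the question
data <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)

# Hypothetical helper (illustration only): for the rows where `cond` is TRUE,
# set each column named in `values` to its value, similar in spirit to a
# multi-variable Stata `replace ... if`.
replace_if <- function(df, cond, values) {
  missing_cols <- setdiff(names(values), names(df))
  if (length(missing_cols) > 0) {
    # Like Stata's replace, only existing variables can be changed
    stop("unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  # which() drops NA, so rows where the condition is NA are left unchanged
  rows <- which(cond)
  for (col in names(values)) {
    df[[col]][rows] <- values[[col]]
  }
  df
}

data <- replace_if(
  data,
  with(data, C1 == 2 & C2 == 20 & year == 1990),
  list(VAR1 = 1, VAR2 = 60, VAR3 = 70)
)
data[1:3, ]
```

Because the condition is passed as an ordinary logical vector, the same helper can be called once per condition when different columns need different conditions.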