Velvet Star Monitor

Standout celebrity highlights with iconic style.

updates

Split column at delimiter in data frame [duplicate]

Writer Andrew Henderson

I would like to split one column into two within at data frame based on a delimiter. For example,

a|b
b|c

to become

a b
b c

within a data frame.

Thanks!

1

6 Answers

@Taesung Shin is right, but then just some more magic to make it into a data.frame. I added a "x|y" line to avoid ambiguities:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
foo <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

Or, if you want to replace the columns in the existing data.frame:

within(df, FOO<-data.frame(do.call('rbind', strsplit(as.character(FOO), '|', fixed=TRUE))))

Which produces:

 ID FOO.X1 FOO.X2
1 11 a b
2 12 b c
3 13 x y
7

The newly popular tidyr package does this with separate. It uses regular expressions so you'll have to escape the |

df <- data.frame(ID=11:13, FOO=c('a|b', 'b|c', 'x|y'))
separate(data = df, col = FOO, into = c("left", "right"), sep = "\\|")
# ID left right
# 1 11 a b
# 2 12 b c
# 3 13 x y

though in this case the defaults are smart enough to work (it looks for non-alphanumeric characters to split on).

separate(data = df, col = FOO, into = c("left", "right"))
2

Hadley has a very elegant solution to do this inside data frames in his reshape package, using the function colsplit.

require(reshape)
> df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
> df ID FOO
1 11 a|b
2 12 b|c
3 13 x|y
> df = transform(df, FOO = colsplit(FOO, split = "\\|", names = c('a', 'b')))
> df ID FOO.a FOO.b
1 11 a b
2 12 b c
3 13 x y
5

Just came across this question as it was linked in a recent question on SO.

Shameless plug of an answer: Use cSplit from my "splitstackshape" package:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
library(splitstackshape)
cSplit(df, "FOO", "|")
# ID FOO_1 FOO_2
# 1 11 a b
# 2 12 b c
# 3 13 x y

This particular function also handles splitting multiple columns, even if each column has a different delimiter:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'), BAR = c("A*B", "B*C", "C*D"))
cSplit(df, c("FOO", "BAR"), c("|", "*"))
# ID FOO_1 FOO_2 BAR_1 BAR_2
# 1 11 a b A B
# 2 12 b c B C
# 3 13 x y C D

Essentially, it's a fancy convenience wrapper for using read.table(text = some_character_vector, sep = some_sep) and binding that output to the original data.frame. In other words, another A base R approach could be:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
cbind(df, read.table(text = as.character(df$FOO), sep = "|")) ID FOO V1 V2
1 11 a|b a b
2 12 b|c b c
3 13 x|y x y
5
strsplit(c('a|b','b|c'),'|',fixed=TRUE)

Combining @Ramnath and @Tommy's answers allowed me to find an approach that works in base R for one or more columns.

Basic usage:

> df = data.frame(
+ id=1:3, foo=c('a|b','b|c','c|d'),
+ bar=c('p|q', 'r|s', 's|t'), stringsAsFactors=F)
> transform(df, test=do.call(rbind, strsplit(foo, '|', fixed=TRUE)), stringsAsFactors=F) id foo bar test.1 test.2
1 1 a|b p|q a b
2 2 b|c r|s b c
3 3 c|d s|t c d

Multiple columns:

> transform(df, lapply(list(foo,bar),
+ function(x)do.call(rbind, strsplit(x, '|', fixed=TRUE))), stringsAsFactors=F) id foo bar X1 X2 X1.1 X2.1
1 1 a|b p|q a b p q
2 2 b|c r|s b c r s
3 3 c|d s|t c d s t

Better naming of multiple split columns:

> transform(df, lapply({l<-list(foo,bar);names(l)=c('foo','bar');l},
+ function(x)do.call(rbind, strsplit(x, '|', fixed=TRUE))), stringsAsFactors=F) id foo bar foo.1 foo.2 bar.1 bar.2
1 1 a|b p|q a b p q
2 2 b|c r|s b c r s
3 3 c|d s|t c d s t