Velvet Star Monitor

Standout celebrity highlights with iconic style.

news

Extract first and last row of a dataframe in pandas

Writer Olivia Zamora

How can I extract the first and last rows of a given dataframe as a new dataframe in pandas?

I've tried to use iloc to select the desired rows and then concat as in:

df=pd.DataFrame({'a':range(1,5), 'b':['a','b','c','d']})
pd.concat([df.iloc[0,:], df.iloc[-1,:]])

but this does not produce a pandas dataframe:

a 1
b a
a 4
b d
dtype: object

6 Answers

I think the most simple way is .iloc[[0, -1]].

df = pd.DataFrame({'a':range(1,5), 'b':['a','b','c','d']})
df2 = df.iloc[[0, -1]]
print(df2) a b
0 1 a
3 4 d
2

You can also use head and tail:

In [29]: pd.concat([df.head(1), df.tail(1)])
Out[29]: a b
0 1 a
3 4 d
1

The accepted answer duplicates the first row if the frame only contains a single row. If that's a concern

df[0::len(df)-1 if len(df) > 1 else 1]

works even for single row-dataframes.

Example: For the following dataframe this will not create a duplicate:

df = pd.DataFrame({'a': [1], 'b':['a']})
df2 = df[0::len(df)-1 if len(df) > 1 else 1]
print df2 a b
0 1 a

whereas this does:

df3 = df.iloc[[0, -1]]
print df3 a b
0 1 a
0 1 a

because the single row is the first AND last row at the same time.

1

I think you can try add parameter axis=1 to concat, because output of df.iloc[0,:] and df.iloc[-1,:] are Series and transpose by T:

print df.iloc[0,:]
a 1
b a
Name: 0, dtype: object
print df.iloc[-1,:]
a 4
b d
Name: 3, dtype: object
print pd.concat([df.iloc[0,:], df.iloc[-1,:]], axis=1) 0 3
a 1 4
b a d
print pd.concat([df.iloc[0,:], df.iloc[-1,:]], axis=1).T a b
0 1 a
3 4 d
0

Here is the same style as in large datasets:

x = df[:5]
y = pd.DataFrame([['...']*df.shape[1]], columns=df.columns, index=['...'])
z = df[-5:]
frame = [x, y, z]
result = pd.concat(frame)
print(result)

Output:

 date temp
0 1981-01-01 00:00:00 20.7
1 1981-01-02 00:00:00 17.9
2 1981-01-03 00:00:00 18.8
3 1981-01-04 00:00:00 14.6
4 1981-01-05 00:00:00 15.8
... ... ...
3645 1990-12-27 00:00:00 14
3646 1990-12-28 00:00:00 13.6
3647 1990-12-29 00:00:00 13.5
3648 1990-12-30 00:00:00 15.7
3649 1990-12-31 00:00:00 13

Alternatively you can use take:

In [3]: df.take([0, -1])
Out[3]: a b
0 1 a
3 4 d

Your Answer

Sign up or log in

Sign up using Google Sign up using Facebook Sign up using Email and Password

Post as a guest

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy