How to sum a value grouped by product and year in SAS?
Mia Lopez
I need to get the sum of volume for each stock in each year. The data looks like this:
Date ID volume
2009 BA 100
2009 BA 20
2011 BA 100
2009 VOD 100
2009 VOD 150
2009 VOD 100
2013 BT 300
... ... ...
What I want is:
Date ID sumvolume
2009 BA 120
2011 BA 100
2009 VOD 350
2013 BT 300
... ... ...
I used this code:
proc sql;
create table want as
select *, (select sum(volume) from data as sub where sub.date=main.date) as sumvolume
from data as main;
quit;
But this only gave the sumvolume for each year instead of the sumvolume for each stock in each year.
Can anyone help me with the code? Thanks in advance!
3 Answers
You can use a group by clause to apply summary functions (like sum()) within the groups defined by the variables listed in the group by clause.
proc sql;
  create table want as
  select date, id, sum(volume) as sumvolume
  from data
  group by id, date;
quit;

You are getting the total volume per year because your subquery only uses where sub.date=main.date. If you add and sub.ID = main.ID to the where clause, you get the sum per product and year. That still does not give your expected output, because the * in your select statement and the lack of a group by clause keep every individual observation.
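As a minimal sketch of the fix described above, assuming the input table is work.data with the columns Date, ID and volume from the question, the subquery version with both conditions could look like this. The distinct keyword is an addition here to collapse the repeated rows that the outer query would otherwise keep, and the output table name work.want_subquery is hypothetical.

```sas
proc sql;
  /* work.want_subquery is a hypothetical table name used for illustration */
  create table work.want_subquery as
  select distinct
         main.Date,
         main.ID,
         /* correlated subquery: total volume for the same year AND the same stock */
         (select sum(sub.volume)
            from work.data as sub
           where sub.Date = main.Date
             and sub.ID   = main.ID) as sumvolume
  from work.data as main;
quit;
```

The subquery is evaluated for every row of the outer query, so on large tables this form is likely to be slower than a single grouped query.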
Instead of a subquery on the data table, you could use group by to get the result you want.
data work.data;
input Date ID $ volume;
datalines;
2009 BA 100
2009 BA 20
2011 BA 100
2009 VOD 100
2009 VOD 150
2009 VOD 100
2013 BT 300
;
data work.want;
input Date ID $ sumvolume;
datalines;
2009 BA 120
2011 BA 100
2009 VOD 350
2013 BT 300
;
proc sql;
create table work.wanted as
select Date, ID, sum(volume) as sumvolume
from work.data
group by Date, ID
;
quit;
I'm leaving one thing to you: sorting the resulting table.
If your source data isn't already sorted then do that first:
proc sort data=input;
  by ID date;
run;
Then you can do it in one simple pass:
data output(drop=volume);
  retain sumvolume;
  set input;
  by ID date;
  if first.date then sumvolume=volume;
  else sumvolume=sum(sumvolume,volume);
  if last.date then output;
run;