python – How to number rows incrementally within a groupby in a DataFrame
Published: 2020-12-30 13:16:22  Category: Python  Source: Internet
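I need to compute an activity_months number for each product in a pandas DataFrame. Here is my data and code so far:

from pandas import DataFrame
from datetime import datetime

data = [
    ('product_a', '08/31/2013'), ('product_b', '08/31/2013'), ('product_c', '08/31/2013'),
    ('product_a', '09/30/2013'), ('product_b', '09/30/2013'), ('product_c', '09/30/2013'),
    ('product_a', '10/31/2013'), ('product_b', '10/31/2013'), ('product_c', '10/31/2013'),
]
product_df = DataFrame(data, columns=['prod_desc', 'activity_month'])

# Convert each MM/DD/YYYY string to a YYYY-MM-DD string
for index, row in product_df.iterrows():
    parsed = datetime.strptime(row['activity_month'], '%m/%d/%Y')
    product_df.loc[index, 'activity_month'] = datetime.strftime(parsed, '%Y-%m-%d')

# DataFrame.sort is gone from newer pandas releases; sort_values replaces it
product_df = product_df.sort_values(['prod_desc', 'activity_month'])
product_df['month_num'] = product_df.groupby(['prod_desc']).size()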
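However, this returns NaN for month_num. This is what I want:

prod_desc  activity_month  month_num
product_a  2014-08-31      1
product_a  2014-09-30      2
product_a  2014-10-31      3
product_b  2014-08-31      1
product_b  2014-09-30      2
product_b  2014-10-31      3
product_c  2014-08-31      1
product_c  2014-09-30      2
product_c  2014-10-31      3

Solution

groupby is the right idea, but size() returns one value per group, indexed by the group key, so it cannot align with the row index of the frame. The right method is cumcount, which numbers the rows within each group. (The output below comes from a frame whose key column is named product_desc; for the frame in the question, group by 'prod_desc' instead.)

>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount()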
>>> product_df
  product_desc activity_month  prod_count    pct_ch  month_num
0    product_a     2014-01-01          53       NaN          0
3    product_a     2014-02-01          52 -0.018868          1
6    product_a     2014-03-01          50 -0.038462          2
1    product_b     2014-01-01          44       NaN          0
4    product_b     2014-02-01          43 -0.022727          1
7    product_b     2014-03-01          41 -0.046512          2
2    product_c     2014-01-01          36       NaN          0
5    product_c     2014-02-01          35 -0.027778          1
8    product_c     2014-03-01          34 -0.028571          2 
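
To see why the size() attempt produces NaN while cumcount does not, here is a minimal sketch. The small frame and the column names via_size and via_cumcount are made up for illustration; only prod_desc and activity_month come from the question.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, question's column names).
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_b", "product_a", "product_b"],
    "activity_month": ["2013-08-31", "2013-08-31", "2013-09-30", "2013-09-30"],
})

# size() yields one value per group, indexed by the group key
# ('product_a', 'product_b'), not by the frame's row labels 0..3.
sizes = df.groupby("prod_desc").size()

# Column assignment aligns on index labels. None of the group keys
# match the row labels, so every cell becomes NaN.
df["via_size"] = sizes

# cumcount() yields one value per original row and keeps the original
# index, so the assignment lines up row by row.
df["via_cumcount"] = df.groupby("prod_desc").cumcount()

print(df)
```

The difference is in the shape of the result: size() aggregates each group down to a single number, whereas cumcount() keeps one entry per row.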
If you really want it to start from 1, just add 1:

>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount() + 1
>>> product_df
  product_desc activity_month  prod_count    pct_ch  month_num
0    product_a     2014-01-01          53       NaN          1
3    product_a     2014-02-01          52 -0.018868          2
6    product_a     2014-03-01          50 -0.038462          3
1    product_b     2014-01-01          44       NaN          1
4    product_b     2014-02-01          43 -0.022727          2
7    product_b     2014-03-01          41 -0.046512          3
2    product_c     2014-01-01          36       NaN          1
5    product_c     2014-02-01          35 -0.027778          2
8    product_c     2014-03-01          34 -0.028571          3
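
Putting the pieces together for the frame in the question, a hedged end-to-end sketch might look like the following. It assumes the nine (product, date) pairs implied by the desired output, and it swaps the row-by-row strptime loop for pd.to_datetime, which is a substitution on my part rather than something the answer prescribes.

```python
import pandas as pd

# Assumed data: three products, three month-end dates each.
data = [
    ("product_a", "08/31/2013"), ("product_b", "08/31/2013"), ("product_c", "08/31/2013"),
    ("product_a", "09/30/2013"), ("product_b", "09/30/2013"), ("product_c", "09/30/2013"),
    ("product_a", "10/31/2013"), ("product_b", "10/31/2013"), ("product_c", "10/31/2013"),
]
product_df = pd.DataFrame(data, columns=["prod_desc", "activity_month"])

# Parse the whole column at once instead of looping over rows.
product_df["activity_month"] = pd.to_datetime(
    product_df["activity_month"], format="%m/%d/%Y"
)

# cumcount numbers rows in the order they appear within each group,
# so sort by product and month before numbering.
product_df = product_df.sort_values(["prod_desc", "activity_month"])

# Add 1 so that numbering starts at 1 rather than 0.
product_df["month_num"] = product_df.groupby("prod_desc").cumcount() + 1

print(product_df)
```

Sorting before the cumcount call matters: the counter follows row order within each group, so an unsorted frame would number the months in whatever order they happened to be stored.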
                  
