Binning a pandas column based on quantiles

Time: 2018-09-06 15:30:58

Tags: python pandas quantile

My pandas DataFrame "train" looks like this:

Name   Comb   Sales
Joy     A123   102
John    A134   112
Aby     A123   140
Amit    A123   190
Andrew  A134   210
Pren    A123   109
Abry    A134   230
Hulk    A134   188  
...

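For each unique Comb, I want to find the quartiles (25% quantiles) of the corresponding Sales and create bins from them. For example, the quartile bin edges for the Sales with Comb = 'A123' are (102.00, 107.25, 124.50, 152.50, 190.00). I then want to use these edges to bin all Sales with Comb = 'A123'. The resulting data would be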

Name   Comb   Sales  Bin  Bin_Low  Bin_High
Joy     A123   102    1    102     107.25
John    A134   112    1    112     169
Aby     A123   140    3    124.50  152.50
Amit    A123   190    4    152.50  190
Andrew  A134   210    3    199     215
Pren    A123   109    2    107.25  124.50
Abry    A134   230    4    215     230
Hulk    A134   188    2    169     199
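
As a quick check of the bin edges quoted above, here is a minimal sketch that recomputes the A123 quartiles with `Series.quantile`. The variable names are illustrative and not part of the original post.

```python
import pandas as pd

# Sales values for Comb == 'A123', taken from the table in the question
sales_a123 = pd.Series([102, 140, 190, 109])

# Minimum, the three quartiles, and maximum (default linear interpolation)
edges = sales_a123.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(edges.tolist())
```

With the default linear interpolation this should reproduce the edges 102, 107.25, 124.5, 152.5 and 190 listed in the question.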

I wrote the following code, but the final DataFrame is not formatted correctly.

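    import numpy as np
    import pandas as pd

    quant = pd.DataFrame()
    for i in train.Comb.unique():
        # Quartile-bin the Sales of the current Comb group
        a = pd.qcut(train[train.Comb == i]['Sales'], 4, duplicates='drop')
        df = pd.DataFrame(np.array(a))
        # df has a fresh 0..n-1 index, so it does not line up with the train rows
        comp = pd.concat([train[train.Comb == i], df], axis=1)
        quant = pd.concat([quant, comp])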

Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

You can use qcut on the DataFrame, grouped by Comb. Then assign the left side of each interval to Bin_low and the right side to Bin_max. Note that qcut intervals are open on the left side, so the lowest edge in each group will be slightly below the value in your expected output, but the result is essentially the same.
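
The answer's original code did not survive on this page. The following is a minimal sketch of the approach it describes, not the answerer's exact code: the explicit loop over groups, the variable names, and the sample frame rebuilt from the question's table are all assumptions made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
train = pd.DataFrame({
    "Name": ["Joy", "John", "Aby", "Amit", "Andrew", "Pren", "Abry", "Hulk"],
    "Comb": ["A123", "A134", "A123", "A123", "A134", "A123", "A134", "A134"],
    "Sales": [102, 112, 140, 190, 210, 109, 230, 188],
})

parts = []
for comb, group in train.groupby("Comb"):
    # Quartile-bin this group's Sales; each row is labelled with an interval
    binned = pd.qcut(group["Sales"], 4)
    codes = binned.cat.codes.to_numpy()   # 0-based bin index per row
    edges = binned.cat.categories         # IntervalIndex holding the four bins
    parts.append(group.assign(
        Bin=codes + 1,
        Bin_low=edges.left.to_numpy()[codes],
        Bin_max=edges.right.to_numpy()[codes],
    ))

# Reassemble the groups and restore the original row order
result = pd.concat(parts).sort_index()
print(result)
```

Because the first interval of each group is left-open, its Bin_low comes out just below the group minimum (slightly under 102 for A123) rather than exactly equal to it. If exact edges are preferred, `pd.qcut(..., retbins=True)` also returns the bin edges as an array.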
