Question

So let me try this again. I created a pivot table using the following:

df3.reset_index(inplace=True)
surveys = df3.groupby(['cohort','nps']).agg({'id': pd.Series.nunique})
surveys['%'] = surveys['id'] / surveys.id.sum()

This returns the number of unique ids for every cohort and nps value, plus that number as a fraction of the overall total, and so on for each cohort.

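I am trying to figure out how to get the sums within each cohort, so that each percentage is relative to its own cohort's total.

So ideally: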

cohort   status  count         %
--------------------------------
2017-01  sad       188  0.009276
         ok         53  0.002615
         happy     253  0.012483
2017-02  sad       174  0.008585
         ok        113  0.005575
         happy     247  0.012187
2017-03  sad       221  0.010904
         ok         60  0.002960
         happy     299  0.014752

Then I could unstack nps and subtract the detractors from the promoters, ignoring the passives.

Has anyone here done something like this?

Answer 1

It is hard to say without knowing how you get your original data, but you can get the desired result by replacing the last line:

surveys['%'] = surveys['id'] / surveys.id.sum()

with:

surveys['%'] = surveys.groupby('cohort')['count'].transform(lambda x: x/sum(x))
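If surveys is still the grouped frame from the question, it is indexed by cohort and nps and the counts sit in the id column rather than in count, so the line above may need adapting. A minimal sketch under that assumption (the numbers are borrowed from the example table, with the status labels standing in for nps values):

```python
import pandas as pd

# Assumed shape: the groupby result from the question, indexed by
# (cohort, nps), with the unique-id counts in a column named 'id'.
index = pd.MultiIndex.from_product(
    [["2017-01", "2017-02"], ["sad", "ok", "happy"]],
    names=["cohort", "nps"],
)
surveys = pd.DataFrame({"id": [188, 53, 253, 174, 113, 247]}, index=index)

# Total per cohort, broadcast back onto every row of that cohort.
cohort_total = surveys.groupby(level="cohort")["id"].transform("sum")
surveys["%"] = surveys["id"] / cohort_total

print(surveys)
```

Grouping on the index level keeps the (cohort, nps) index intact, so a later unstack of nps should still work on the result.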

For example, starting from a dataframe like yours:

>>> surveys
    cohort status  count
0  2017-01    sad    188
1  2017-01     ok     53
2  2017-01  happy    253
3  2017-02    sad    174
4  2017-02     ok    113
5  2017-02  happy    247
6  2017-03    sad    221
7  2017-03     ok     60
8  2017-03  happy    299

Then:

>>> surveys['%'] = surveys.groupby('cohort')['count'].transform(lambda x: x/sum(x))
>>> surveys
    cohort status  count         %
0  2017-01    sad    188  0.380567
1  2017-01     ok     53  0.107287
2  2017-01  happy    253  0.512146
3  2017-02    sad    174  0.325843
4  2017-02     ok    113  0.211610
5  2017-02  happy    247  0.462547
6  2017-03    sad    221  0.381034
7  2017-03     ok     60  0.103448
8  2017-03  happy    299  0.515517
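Continuing from that result, the follow-up step mentioned in the question (unstack, then promoters minus detractors, ignoring passives) might look like the sketch below. It assumes that happy, ok and sad stand for promoters, passives and detractors, which the question does not state explicitly.

```python
import pandas as pd

surveys = pd.DataFrame({
    "cohort": ["2017-01"] * 3 + ["2017-02"] * 3 + ["2017-03"] * 3,
    "status": ["sad", "ok", "happy"] * 3,
    "count": [188, 53, 253, 174, 113, 247, 221, 60, 299],
})
surveys["%"] = surveys.groupby("cohort")["count"].transform(lambda x: x / x.sum())

# One row per cohort, one column per status.
wide = surveys.set_index(["cohort", "status"])["%"].unstack("status")

# Assumption: 'happy' = promoters, 'sad' = detractors, 'ok' = passives (ignored).
nps = wide["happy"] - wide["sad"]
print(nps)
```

The result is a Series with one score per cohort, expressed as a fraction; multiplying by 100 would give the conventional NPS scale.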

Answer 2

You might want to use pivot_table instead.

It depends on what your next goal is, but this may be a better format...
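A minimal sketch of what a pivot_table version could look like, assuming the example frame from Answer 1 (the column names cohort, status and count come from that example, not from the original data). This is an illustration and not necessarily the call this answer had in mind:

```python
import pandas as pd

surveys = pd.DataFrame({
    "cohort": ["2017-01"] * 3 + ["2017-02"] * 3 + ["2017-03"] * 3,
    "status": ["sad", "ok", "happy"] * 3,
    "count": [188, 53, 253, 174, 113, 247, 221, 60, 299],
})

# Cohorts as rows, statuses as columns, counts as values.
wide = surveys.pivot_table(
    index="cohort", columns="status", values="count", aggfunc="sum"
)

# Divide each row by its own total to get within-cohort shares.
shares = wide.div(wide.sum(axis=1), axis=0)
print(shares)
```

In this wide layout each cohort is a single row, so column arithmetic such as shares["happy"] - shares["sad"] needs no further reshaping.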

Aggregation within a pivot table index
