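Find the unique values of each column

Time: 2017-07-12 11:45:52

Tags: python pandas numpy dataframe counter

I want to find, for each column of a DataFrame, the values that are unique across the entire DataFrame (that is, values that occur only once in the whole frame).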

        Col1         Col2            Col3
1        A             A               B
2        C             A               B
3        B             B               F

Col1 has C as a unique value, Col2 has none, and Col3 has F.

Any genius ideas? Thanks!

2 answers:

Answer 0: (score: 3)

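You can use stack to create a Series, then drop_duplicates with keep=False to drop every value that occurs more than once, remove the first index level with reset_index, and finally reindex by the original columns:

s = (df.stack()
       .drop_duplicates(keep=False)
       .reset_index(level=0, drop=True)
       .reindex(index=df.columns))
print (s)
Col1      C
Col2    NaN
Col3      F
dtype: object

The solution above works well if each column has at most one unique value (with several, the result has duplicate index labels, which reindex cannot handle).

I tried to create a more general solution:

print (df)
  Col1 Col2 Col3
1    A    A    B
2    C    A    X
3    B    B    F

s = df.stack().drop_duplicates(keep=False).reset_index(level=0, drop=True)
print (s)
Col1    C
Col3    X
Col3    F
dtype: object

s = s.groupby(level=0).unique().reindex(index=df.columns)
print (s)
Col1       [C]
Col2       NaN
Col3    [X, F]
dtype: object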
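For comparison, here is a minimal self-contained sketch of the same idea (values that occur exactly once in the whole frame, grouped by column) using collections.Counter instead of stack. The sample frame and the names counts and singletons are illustrative assumptions, not part of the answer's code.

```python
from collections import Counter

import pandas as pd

# Sample data: Col3 holds two values that occur only once in the frame.
df = pd.DataFrame(
    {"Col1": ["A", "C", "B"], "Col2": ["A", "A", "B"], "Col3": ["B", "X", "F"]},
    index=[1, 2, 3],
)

# Count how often each value occurs anywhere in the frame.
counts = Counter(df.to_numpy().ravel())

# For each column, keep the values that were seen exactly once overall.
singletons = {
    col: [value for value in df[col] if counts[value] == 1]
    for col in df.columns
}
print(singletons)
# {'Col1': ['C'], 'Col2': [], 'Col3': ['X', 'F']}
```

This returns an empty list rather than NaN for a column without unique values, which may or may not be preferable depending on how the result is consumed.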


Answer 1: (score: 0)

I'm not sure this is exactly what you want, but as useful information: you can find the distinct values of a DataFrame with numpy's np.unique(), like this:

>>> np.unique(df[['Col1', 'Col2', 'Col3']])
['A' 'B' 'C' 'F']

You can also get the distinct values of a specific column, for example Col3:

>>> df.Col3.unique()
['B' 'F']
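Note that np.unique() returns every distinct value, not only those that occur once. If the goal is the values that appear exactly once in the whole frame, one possible extension (a sketch, assuming the question's sample data; the names once and mask are illustrative) is to ask np.unique for counts as well:

```python
import numpy as np
import pandas as pd

# The question's sample data.
df = pd.DataFrame(
    {"Col1": ["A", "C", "B"], "Col2": ["A", "A", "B"], "Col3": ["B", "B", "F"]},
    index=[1, 2, 3],
)

# Distinct values of the flattened frame together with their frequencies.
values, counts = np.unique(df.to_numpy().ravel(), return_counts=True)

# Keep only the values that occur exactly once in the whole frame.
once = values[counts == 1]
print(list(once))
# ['C', 'F']

# Boolean mask marking where those values sit; other cells become NaN.
mask = df.isin(once)
print(df.where(mask))
```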