Grouping a pandas time-series DataFrame by a specific time interval

Time: 2015-11-04 01:19:08

Tags: python csv pandas

I have a large CSV file with timestamped data in the format 2015-04-01 10:26:41. The data spans several months, and the entries range from 30 seconds to several hours apart. Its columns are id, time and speed.

Ultimately I want to group the data into 15-minute intervals and then calculate the average speed, since many entries fall within each 15-minute period.

I'm trying to use pandas, since it seems to have a solid set of time-series tools that should make this easy, but I've fallen at the first hurdle.

So far I have imported the CSV into a DataFrame, and all of the columns have dtype object. I have sorted the data by date, and now I'm trying to group the entries by time interval, which is where I'm stuck. Based on some googling I tried to resample the data with df.resample('5min', how=sum), but that gives the error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex. I was also considering a groupby with a lambda, perhaps df.groupby(lambda x: x.minutes + 5), but that raises AttributeError: 'str' object has no attribute 'minutes', presumably because of the object dtype.
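Roughly, the failing attempts look like this (a minimal sketch of what is described above; 'data.csv' stands in for my file, and the raw time strings are set as the index for the groupby attempt):

import pandas as pd

df = pd.read_csv('data.csv')
print(df.dtypes)  # every column comes back as object

# Attempt 1: resample on the default integer index
# -> TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex
df.resample('5min', how=sum)

# Attempt 2: group with a lambda over the (still string) timestamps
# -> AttributeError: 'str' object has no attribute 'minutes'
df.set_index('time').groupby(lambda x: x.minutes + 5)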

Basically I'm confused as to a) whether pandas recognises this as time-series data at all, given that the dtype is object, and b) why I can't seem to get the grouping by time interval to work.

Keen to hear if anyone can point me in the right direction.

The DF looks like this:

        0        1                    2          3
0      id  boat_id                 time      speed
1  386226       32  2015-01-15 05:14:32  4.2343243
2  386285       32  2015-01-15 05:44:57    3.45234


2 answers:

Answer 0 (score: 2)

First, it looks like you've read in a spurious first row (notice the header names showing up as data in row 0). You may want to skip the first row of the file: pd.read_csv(filename, skiprows=1).
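For example (a sketch; filename stands in for your file, and the column names are taken from the sample in the question):

import pandas as pd

filename = 'data.csv'  # stand-in for the actual file
# Skip the stray first row and name the columns explicitly
df = pd.read_csv(filename, skiprows=1,
                 names=['id', 'boat_id', 'time', 'speed'])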

You should use pd.to_datetime() to convert the text representation of the times into a DatetimeIndex:

df.set_index(pd.to_datetime(df['time']), inplace=True)

You should then be able to resample:

df.resample('15min', how=np.mean)
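(In current pandas versions the how= argument has been removed, and the aggregation is spelled as a method on the resampler. Putting the steps together, as a sketch with the file name and column names assumed from the question:)

import pandas as pd

df = pd.read_csv('data.csv', skiprows=1,
                 names=['id', 'boat_id', 'time', 'speed'])
df.set_index(pd.to_datetime(df['time']), inplace=True)

# Mean speed per 15-minute bucket
mean_speed = df['speed'].resample('15min').mean()
print(mean_speed.head())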

Answer 1 (score: 0)

Alexander's answer is correct; note also that you can do

df = pd.read_csv('myfile.csv', parse_dates=True)

and, if the format is reasonable, your date column should come through with a datetime dtype. You can then set the index and resample as described above.
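(Strictly speaking, parse_dates=True only tries to parse the index, so in practice you usually name the date column or make it the index. A sketch, again assuming the column layout from the question:)

import pandas as pd

df = pd.read_csv('myfile.csv', skiprows=1,
                 names=['id', 'boat_id', 'time', 'speed'],
                 parse_dates=['time'], index_col='time')

# 'time' is now a DatetimeIndex, so resampling works directly
mean_speed = df['speed'].resample('15min').mean()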