R package/function for comparing descriptive statistics across multiple datasets

Time: 2017-07-05 18:28:45

Tags: r statistics anova

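I have a dataset that was split into two parts for training and testing a predictive model. Performance differs markedly between the two subsets, so I want to dig in and compare the descriptive statistics of the two groups.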

I know there are functions that do this for a single dataset (e.g. summary, describe), but I don't know of any built for comparing two groups.

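Is there a function that combines descriptive statistics (mean, median, max, min, frequently occurring values, NA counts, etc.) from two or more datasets so they can be compared easily?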

Ideally laid out like this:

{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "#{myS3LogsPath}",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "database": {
        "ref": "DatabaseId_WC2j5"
      },
      "name": "DefaultSqlDataNode1",
      "id": "SqlDataNodeId_VevnE",
      "type": "SqlDataNode",
      "selectQuery": "#{myRDSSelectQuery}",
      "table": "#{myRDSTable}"
    },
    {
      "*password": "#{*myRDSPassword}",
      "name": "RDS_database",
      "id": "DatabaseId_WC2j5",
      "type": "RdsDatabase",
      "rdsInstanceId": "#{myRDSId}",
      "username": "#{myRDSUsername}"
    },
    {
      "output": {
        "ref": "S3DataNodeId_iYhHx"
      },
      "input": {
        "ref": "SqlDataNodeId_VevnE"
      },
      "name": "DefaultCopyActivity1",
      "runsOn": {
        "ref": "ResourceId_G9GWz"
      },
      "id": "CopyActivityId_CapKO",
      "type": "CopyActivity"
    },
    {
      "dependsOn": {
        "ref": "CopyActivityId_CapKO"
      },
      "filePath": "#{myS3Container}#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "DefaultS3DataNode1",
      "id": "S3DataNodeId_iYhHx",
      "type": "S3DataNode"
    },
    {
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "instanceType": "m1.medium",
      "name": "DefaultResource1",
      "id": "ResourceId_G9GWz",
      "type": "Ec2Resource",
      "terminateAfter": "30 Minutes"
    }
  ],
  "parameters": [
  ]
}
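
For illustration, the kind of side-by-side summary asked about above can be approximated in base R. This is only a minimal sketch, not an existing package function: `describe_numeric` and `compare_datasets` are hypothetical helper names, and it assumes each dataset is a data frame, that the datasets are passed as named arguments, and that only numeric columns matter (frequently occurring values are not covered).

```r
# Hypothetical helper: per-column descriptive statistics for one data frame.
# Only numeric columns are summarised; NAs are ignored in the statistics
# and counted separately in n_na.
describe_numeric <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    variable = names(num),
    mean     = vapply(num, mean,   numeric(1), na.rm = TRUE),
    median   = vapply(num, median, numeric(1), na.rm = TRUE),
    min      = vapply(num, min,    numeric(1), na.rm = TRUE),
    max      = vapply(num, max,    numeric(1), na.rm = TRUE),
    n_na     = vapply(num, function(x) sum(is.na(x)), integer(1)),
    row.names = NULL
  )
}

# Hypothetical helper: stack the summaries of two or more named data frames
# and sort by variable, so the rows for each variable sit next to each other.
compare_datasets <- function(...) {
  sets <- list(...)
  stopifnot(length(sets) >= 2, !is.null(names(sets)), all(names(sets) != ""))
  parts <- lapply(names(sets), function(nm) {
    cbind(dataset = nm, describe_numeric(sets[[nm]]))
  })
  out <- do.call(rbind, parts)
  out[order(out$variable, out$dataset), ]
}
```

A possible usage, with the built-in `iris` data standing in for the real train/test split:

```r
set.seed(1)
idx <- sample(nrow(iris), 100)
compare_datasets(train = iris[idx, ], test = iris[-idx, ])
```

Keeping the result in long form (one row per dataset and variable) makes it straightforward to filter or reshape later, at the cost of a less compact printout than a wide table.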

0 answers:

No answers