How to check if a float pandas column contains only integers?

Date: 2018-03-13 06:46:11

Tags: python pandas floating-point precision

I have a DataFrame:

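df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)

How can I make sure that the numbers in v are whole numbers? I am very concerned about rounding, truncation, and floating-point representation errors.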

4 Answers:

Answer 0 (score: 11)

Comparison with astype(int)

Temporarily convert your column to int and test it against the original with np.array_equal:

np.array_equal(df.v, df.v.astype(int))
True
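As a minimal sketch of how this comparison behaves, the example below runs it on the question's DataFrame and on a copy with one fractional value. The name df_frac is hypothetical and only used for illustration. Note that astype(int) is expected to raise if the column contains NaN or infinite values, so this check assumes a column without missing data.

```python
import numpy as np
import pandas as pd

# The DataFrame from the question: whole numbers stored as floats.
df = pd.DataFrame(data=np.arange(10), columns=["v"]).astype(float)
whole = np.array_equal(df["v"], df["v"].astype(int))

# Hypothetical copy with one fractional value; truncation changes it,
# so the comparison fails.
df_frac = df.copy()
df_frac.loc[0, "v"] = 0.5
frac = np.array_equal(df_frac["v"], df_frac["v"].astype(int))

print(whole, frac)
```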

float.is_integer

You can use this Python function in conjunction with apply:
df.v.apply(float.is_integer).all()
True

Alternatively, use Python's all with a generator expression for space efficiency:

all(x.is_integer() for x in df.v)
True
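Both checks above are exact: a value that differs from a whole number by a single floating-point step is reported as non-integer. If the data comes out of arithmetic and a small tolerance is acceptable, one possible variant compares each value with its rounded counterpart using np.isclose. This is only a sketch; the default tolerances of np.isclose are an assumption and may not suit your data.

```python
import numpy as np
import pandas as pd

# np.nextafter(3.0, 4.0) is the smallest float strictly greater than 3.0,
# standing in for a value affected by representation error.
s = pd.Series([np.nextafter(3.0, 4.0), 4.0])

# Exact check: any deviation from a whole number fails.
exact = bool(s.apply(float.is_integer).all())

# Tolerant check: values within np.isclose's default tolerance pass.
approx = bool(np.isclose(s, s.round()).all())

print(exact, approx)
```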

Answer 1 (score: 8)

Here is a simpler, and probably faster, approach:

(df[col] % 1  == 0).all()

To ignore null values:

(df[col].fillna(-9999) % 1  == 0).all()
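The modulo check can be wrapped in a small helper that drops missing values instead of filling them with a sentinel. The function name is_whole is hypothetical; this is a sketch of the same idea, not part of pandas.

```python
import numpy as np
import pandas as pd

def is_whole(series: pd.Series) -> bool:
    """Return True if every non-missing value has no fractional part."""
    values = series.dropna()
    # x % 1 is 0 for whole numbers (including negatives) and NaN for inf,
    # so infinite values are reported as non-integer.
    return bool((values % 1 == 0).all())

with_nan = is_whole(pd.Series([1.0, -2.0, np.nan]))
with_fraction = is_whole(pd.Series([1.0, 2.5]))
print(with_nan, with_fraction)
```

Dropping the missing values avoids having to pick a fill value such as -9999 that might collide with real data.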

Answer 2 (score: 2)

If you want to check multiple float columns in your DataFrame, you can do the following:

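# Boolean Series: True for each float column whose values are all whole numbers
col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# Assign by column label so the columns are replaced and take the int dtype
# (in recent pandas versions, df.loc[:, cols] = ... writes in place and keeps float)
df[float_to_int_cols] = df[float_to_int_cols].astype(int)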

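Keep in mind that a float column containing only whole numbers will not be selected if it also contains np.NaN values. To cast float columns with missing values to integer, you need to fill or remove the missing values first, for example with median imputation: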

float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round()) # median imputation
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# Assign by column label so the columns are replaced and take the int dtype
df[float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
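If imputing values is not desirable, one alternative (swapped in here, not what the answer above does) is pandas' nullable "Int64" dtype, which keeps missing entries as <NA>. The sketch below assumes a hypothetical DataFrame with columns a, b and c, and converts only those float columns whose non-missing values are all whole.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" is whole numbers plus a missing value,
# "b" contains a fraction, "c" is not a float column.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [0.5, 1.0, 2.0],
    "c": ["x", "y", "z"],
})

for col in df.select_dtypes(include=["float"]).columns:
    values = df[col].dropna()
    if (values % 1 == 0).all():
        # Nullable integer dtype: NaN becomes <NA> instead of being imputed.
        df[col] = df[col].astype("Int64")

print(df.dtypes)
```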

Answer 3 (score: 0)

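For completeness, pandas v1.0+ offers the convert_dtypes() utility, which (among 3 other conversions) performs the requested operation for all DataFrame columns (or Series) that contain only whole numbers.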

If you want to limit the conversion to a single column, you can do the following:

>>> df.dtypes          # inspect previous dtypes
v                      float64

>>> df["v"] = df["v"].convert_dtypes()
>>> df.dtypes          # inspect converted dtypes
v                      Int64
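For comparison, here is a minimal sketch of calling convert_dtypes() on a whole DataFrame. The second column w is hypothetical and added only to show that a column with fractional values is not turned into an integer column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "v": np.arange(10, dtype=float),  # whole numbers stored as floats
    "w": [0.5] * 10,                  # hypothetical column with fractions
})

# convert_dtypes() returns a new DataFrame; the original is left unchanged.
converted = df.convert_dtypes()
print(converted.dtypes)
```

Because the result uses the nullable Int64 dtype, a column that holds whole numbers plus missing values can also be converted without filling the gaps first.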