How to apply multiple normalizations with different ranges to columns of a pandas DataFrame

Asked: 2019-01-18 16:47:38

Tags: python pandas dataframe normalization

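I want to use a for loop to normalize several columns of a pandas DataFrame, with a different target range depending on the column:

Columns A and B: normalize to [-1, +1]

Column C: normalize to [-40, +150]

The normalized values should then replace the originals in the DataFrame, which is saved as a CSV file.

My data is in a txt file that looks like this: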

id_set: 000
     A: 3.29117131
     B: -3.68965849
     C: 345.9876546
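
For a file laid out exactly like the sample above (a repeating group of `id_set`, `A`, `B`, `C` lines in `label: value` form), one possible way to load it is to split each line on the colon. This is only a sketch: the helper name `parse_records` is hypothetical, and it assumes every record starts with an `id_set` line followed by its value lines.

```python
import io
import pandas as pd

def parse_records(lines):
    """Turn repeating 'label: value' lines into a DataFrame indexed by id_set.

    Hypothetical helper; assumes each record starts with an 'id_set' line
    followed by its A, B and C lines.
    """
    records = []
    for line in lines:
        if ':' not in line:
            continue  # skip blank or malformed lines
        label, value = (part.strip() for part in line.split(':', 1))
        if label == 'id_set':
            records.append({'id_set': value})  # keep ids as text so '000' survives
        elif records:
            records[-1][label] = float(value)
    return pd.DataFrame(records).set_index('id_set')

# The sample record from the question, fed in as an in-memory file
sample = io.StringIO(
    "id_set: 000\n"
    "     A: 3.29117131\n"
    "     B: -3.68965849\n"
    "     C: 345.9876546\n"
)
df = parse_records(sample)
print(df)
```

With a real file, the same function can be given an open file handle instead of the `io.StringIO` object.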

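I have defined a normalize function and call it in both the if and the else branch, printing the results to check that it works. What I cannot figure out is how to put the results (new_value) into a new DataFrame named df_norm.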

def normalize(value, min_value, max_value, min_norm, max_norm):
    new_value = ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm
    return new_value

import pandas as pd

#Split data into the id column and three value columns A, B and C
dff = pd.read_csv(r'D:\me4.TXT', header=None)  # raw string keeps the backslash literal
id_set = dff[dff.index % 4 == 0].astype('int').values
A = dff[dff.index % 4 == 1].values
B = dff[dff.index % 4 == 2].values
C = dff[dff.index % 4 == 3].values

#df contains all the data; the dict must exist before it is passed to DataFrame
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}
df = pd.DataFrame(data, columns=['A','B','C'], index=id_set[:,0])

#next iteration create all plots, change the number of cycles
for i in df:
    min_val = df[i].min()
    max_val = df[i].max()
    if 'C' in i:
        #Applying normalization for C between [-40,+150]
        new_value = normalize(df[i].values, min_val, max_val, -40, 150)
    else:
        #Applying normalization for A , B between [-1,+1]
        new_value = normalize(df[i].values, min_val, max_val, -1, 1)

df_norm = pd.df(new_value)
#df_norm = df[i].new_value()
print(df_norm)
df_norm.to_csv('df_norm.csv', header=None, index=None) 

The output I want should look like this:

           A         B           C
000   -0.716746  0.158663  112.403310
010   -0.726023  0.037448  113.289702
020   -0.716746  0.165824  112.567557
030   -0.726040 -0.104426  150.000000
040   -0.693538  0.208556  112.372881
050   -0.104061  0.158573  112.176238
060   -0.735354  0.144351  112.148590
070   -0.712112  0.151505  111.973514
080   -0.336932  0.215719  113.076807
090   -0.698181  0.130189  111.839319
010    0.068357 -0.019388  114.346421
011    0.022007  0.165824  112.381444

Later I would like to extend this normalization by applying a Gaussian function.
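
The question does not say what "Gaussian" refers to. If it means standardizing each column to zero mean and unit standard deviation (a z-score), which is an assumption, a minimal sketch could look like the following. The function name `standardize` and the sample values are made up for illustration.

```python
import pandas as pd

def standardize(column):
    """Z-score a Series: subtract the mean, divide by the population std."""
    return (column - column.mean()) / column.std(ddof=0)

# Made-up values, not the asker's data
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0], 'C': [10.0, 20.0, 30.0, 50.0]})

# apply() calls standardize once per column
df_std = df.apply(standardize)
print(df_std)
```

Unlike min-max scaling, this does not force the values into a fixed interval; it only fixes the mean and the spread.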

2 Answers:

Answer 0: (score: 0)

Maybe try updating one column at a time:

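df_norm = pd.DataFrame(index=df.index)  # empty frame that collects the normalized columns
for i in df.columns:
    min_val = df[i].min()
    max_val = df[i].max()
    if 'C' in i:
        #Applying normalization for C between [-40,+150]
        new_value = normalize(df[i].values, min_val, max_val, -40, 150)
    else:
        #Applying normalization for A , B between [-1,+1]
        new_value = normalize(df[i].values, min_val, max_val, -1, 1)
    df_norm[i] = new_value  # store each column inside the loop, before it is overwritten

# df_norm = pd.df(new_value)  # not needed; pandas has no pd.df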
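
The same idea can be written with the target ranges kept in a dictionary, so the loop body needs no if/else. This is a minimal sketch, assuming the `normalize` function from the question; the `ranges` dict and the sample values are made up for illustration.

```python
import pandas as pd

def normalize(value, min_value, max_value, min_norm, max_norm):
    # Min-max scaling from [min_value, max_value] to [min_norm, max_norm]
    return (max_norm - min_norm) * ((value - min_value) / (max_value - min_value)) + min_norm

# Illustrative data, not the asker's file
df = pd.DataFrame(
    {'A': [3.0, 1.0, 2.0], 'B': [-4.0, 0.0, 4.0], 'C': [300.0, 350.0, 400.0]},
    index=['000', '010', '020'],
)

# Target range per column (hypothetical name)
ranges = {'A': (-1, 1), 'B': (-1, 1), 'C': (-40, 150)}

df_norm = pd.DataFrame(index=df.index)
for col, (lo, hi) in ranges.items():
    # Assign inside the loop so every column is kept, not just the last one
    df_norm[col] = normalize(df[col], df[col].min(), df[col].max(), lo, hi)

print(df_norm)
```

Calling `df_norm.to_csv('df_norm.csv')` afterwards would write the result with the ids and column names kept; the question's `header=None, index=None` arguments drop both.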

Answer 1: (score: 0)

Here is a working example that may be a solution:

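import pandas as pd
import random
import numpy as np

# normalize() as defined in the question, repeated so the example runs on its own
def normalize(value, min_value, max_value, min_norm, max_norm):
    return ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm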

a = list(random.sample(range(0,1000),100))
b = list(random.sample(range(0,1000),100))
c = list(random.sample(range(0,1000),100))

df = pd.DataFrame({'A':a, 'B':b, 'C': c})

my_dct = {'key_a': [],'key_b': [],'key_c': []}
for i in df.columns:
    min_val = df[i].min()
    max_val = df[i].max()
    if i=='C':
        #Applying normalization for C between [-40,+150]
        my_dct['key_c'] = normalize(df[i].values, min_val, max_val, -40, 150)
    elif i=='A':
        #Applying normalization for A , B between [-1,+1]
        my_dct['key_a'] = normalize(df[i].values, min_val, max_val, -1, 1)
    else:
        my_dct['key_b'] = normalize(df[i].values, min_val, max_val, -1, 1)

df2 = pd.DataFrame(my_dct)

df2.to_csv('my_file.csv')
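
The loop can also be avoided. pandas aligns a Series with a DataFrame's columns by label, so all columns can be scaled in one expression. The following is a sketch with made-up data; `lo` and `hi` are hypothetical names for the per-column target bounds.

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({'A': [0.0, 5.0, 10.0], 'B': [2.0, 4.0, 6.0], 'C': [100.0, 150.0, 200.0]})

# Lower and upper target bound for each column, indexed by column name
lo = pd.Series({'A': -1.0, 'B': -1.0, 'C': -40.0})
hi = pd.Series({'A': 1.0, 'B': 1.0, 'C': 150.0})

scaled = (df - df.min()) / (df.max() - df.min())  # each column now spans [0, 1]
df_norm = scaled * (hi - lo) + lo                 # stretch and shift per column

print(df_norm.agg(['min', 'max']))
```

A column whose values are all identical makes `df.max() - df.min()` zero, so that column would come out as NaN and needs separate handling.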