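I want to use a for loop to normalize several columns of a Pandas DataFrame, as follows:
Columns A and B: normalized to the range [-1, +1]
Column C: normalized to the range [-40, +150]
The results should replace the original values in the DataFrame, which is then saved as a CSV file.
My data is in a txt file that looks like this: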
id_set: 000
A: 3.29117131
B: -3.68965849
C: 345.9876546
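For readers who want to reproduce the input, here is a minimal sketch of one way to turn "key: value" lines like the ones above into a DataFrame. It is not the approach used in the question's code, which reads the file with read_csv and selects every fourth row. The second record in the sample string and the helper name parse_records are made up for illustration, and the sketch assumes each record starts with an id_set line.

```python
import io
import pandas as pd

# Sample text in the same layout as the question's file.
# The first record is from the question; the second is a made-up example.
sample = """id_set: 000
A: 3.29117131
B: -3.68965849
C: 345.9876546
id_set: 010
A: 3.1
B: -3.9
C: 350.0
"""

def parse_records(lines):
    """Group 'key: value' lines into one dict per id_set (hypothetical helper)."""
    records, current = [], {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        # A new id_set line closes the previous record.
        if key == "id_set" and current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records

records = parse_records(io.StringIO(sample))
# Keep id_set as a string index so leading zeros such as '000' survive.
df = pd.DataFrame(records).set_index("id_set").astype(float)
print(df)
```

With a real file, the io.StringIO(sample) argument could be replaced by an open file handle.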
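I have defined a normalize function and call it in both the if and else branches, printing the results to check that it works. What I cannot figure out is how to put the result, new_value, into a new DataFrame named df_norm.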
def normalize(value, min_value, max_value, min_norm, max_norm):
    new_value = ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm
    return new_value
import pandas as pd

# Split the data into three arrays A, B and C (the file repeats in groups of four lines)
dff = pd.read_csv(r'D:\me4.TXT', header=None)
id_set = dff[dff.index % 4 == 0].astype('int').values
A = dff[dff.index % 4 == 1].values
B = dff[dff.index % 4 == 2].values
C = dff[dff.index % 4 == 3].values
# df contains all the data
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}
df = pd.DataFrame(data, columns=['A','B','C'], index=id_set[:,0])
# Normalize each column in turn
for i in df:
    min_val = df[i].min()
    max_val = df[i].max()
    if 'C' in i:
        # Applying normalization for C between [-40,+150]
        new_value = normalize(df[i].values, min_val, max_val, -40, 150)
    else:
        # Applying normalization for A, B between [-1,+1]
        new_value = normalize(df[i].values, min_val, max_val, -1, 1)
    # My attempts at storing the result; this is the step that does not work
    df_norm = pd.df(new_value)
    #df_norm = df[i].new_value()
    print(df_norm)
df_norm.to_csv('df_norm.csv', header=None, index=None)
The output I want should look like this:
             A          B           C
000  -0.716746   0.158663  112.403310
010  -0.726023   0.037448  113.289702
020  -0.716746   0.165824  112.567557
030  -0.726040  -0.104426  150.000000
040  -0.693538   0.208556  112.372881
050  -0.104061   0.158573  112.176238
060  -0.735354   0.144351  112.148590
070  -0.712112   0.151505  111.973514
080  -0.336932   0.215719  113.076807
090  -0.698181   0.130189  111.839319
010   0.068357  -0.019388  114.346421
011   0.022007   0.165824  112.381444
Later on, I would like to extend this normalization by applying a Gaussian function.
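The question does not say what the Gaussian step should be. If it means standardizing each column to zero mean and unit standard deviation (a z-score, which is only an assumption here), a minimal sketch could look like this. The data values are made up for illustration.

```python
import pandas as pd

# Made-up values, only for illustration.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "C": [300.0, 320.0, 340.0, 360.0]})

# Z-score: subtract each column's mean, then divide by its standard deviation.
# ddof=0 selects the population standard deviation; pandas defaults to ddof=1.
df_z = (df - df.mean()) / df.std(ddof=0)
print(df_z)
```

Unlike min-max scaling, the result is not confined to a fixed interval, so it would not by itself satisfy the [-1, +1] or [-40, +150] requirement.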
Answer 0 (score: 0)
Maybe try assigning one column at a time:
# Start from an empty DataFrame that shares df's index
df_norm = pd.DataFrame(index=df.index)
for i in df:
    min_val = df[i].min()
    max_val = df[i].max()
    if 'C' in i:
        # Applying normalization for C between [-40,+150]
        new_value = normalize(df[i].values, min_val, max_val, -40, 150)
    else:
        # Applying normalization for A, B between [-1,+1]
        new_value = normalize(df[i].values, min_val, max_val, -1, 1)
    # Store the normalized column under the same name
    df_norm[i] = new_value
    # instead of: df_norm = pd.df(new_value)
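The column-by-column idea above can be sketched end to end as follows. The data values are made up, normalize is a condensed copy of the question's function, and the CSV is rendered to a string here; passing a path such as 'df_norm.csv' to to_csv would write a file instead.

```python
import pandas as pd

def normalize(value, min_value, max_value, min_norm, max_norm):
    # Min-max scaling of value from [min_value, max_value] to [min_norm, max_norm].
    return (max_norm - min_norm) * ((value - min_value) / (max_value - min_value)) + min_norm

# Made-up data, only for illustration.
df = pd.DataFrame(
    {"A": [3.2, 3.0, 3.5], "B": [-3.6, -3.9, -3.1], "C": [345.9, 350.2, 340.0]},
    index=["000", "010", "020"],
)

# Empty frame with the same index; each normalized column is added to it.
df_norm = pd.DataFrame(index=df.index)
for col in df.columns:
    # C is scaled to [-40, 150], every other column to [-1, 1].
    low, high = (-40, 150) if col == "C" else (-1, 1)
    df_norm[col] = normalize(df[col], df[col].min(), df[col].max(), low, high)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df_norm.to_csv()
print(csv_text)
```

Keeping the index in the CSV (the default) preserves the id_set labels next to each row.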
Answer 1 (score: 0)
Here is a working example that may offer a solution:
import pandas as pd
import random

# normalize() is the function defined in the question
# Random sample data: 100 distinct integers per column
a = random.sample(range(0, 1000), 100)
b = random.sample(range(0, 1000), 100)
c = random.sample(range(0, 1000), 100)
df = pd.DataFrame({'A': a, 'B': b, 'C': c})
# Collect the normalized columns in a dict, then build a new DataFrame from it
my_dct = {'key_a': [], 'key_b': [], 'key_c': []}
for i in df.columns:
    min_val = df[i].min()
    max_val = df[i].max()
    if i == 'C':
        # Applying normalization for C between [-40,+150]
        my_dct['key_c'] = normalize(df[i].values, min_val, max_val, -40, 150)
    elif i == 'A':
        # Applying normalization for A between [-1,+1]
        my_dct['key_a'] = normalize(df[i].values, min_val, max_val, -1, 1)
    else:
        # Applying normalization for B between [-1,+1]
        my_dct['key_b'] = normalize(df[i].values, min_val, max_val, -1, 1)
df2 = pd.DataFrame(my_dct)
df2.to_csv('my_file.csv')
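As an alternative to the loop, the same min-max scaling can be written without iterating over columns, because pandas aligns a Series with a DataFrame's columns by label. This is only a sketch: the data values and the names low and high are made up for illustration.

```python
import pandas as pd

# Made-up data, only for illustration.
df = pd.DataFrame({"A": [0, 5, 10], "B": [2, 4, 6], "C": [100, 150, 200]})

# Target range per column, indexed by column name.
low = pd.Series({"A": -1, "B": -1, "C": -40})
high = pd.Series({"A": 1, "B": 1, "C": 150})

# Scale every column to [0, 1], then stretch and shift to its target range.
scaled = (df - df.min()) / (df.max() - df.min())
df_norm = scaled * (high - low) + low
print(df_norm)
```

Adding a column then only requires new entries in low and high rather than another branch in the loop.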