Merging two DataFrames in pandas

Time: 2015-10-04 18:18:26

Tags: python pandas

I am reading in two DataFrames:

extra = pd.read_csv('table1.txt', sep=r'\s+')
data = pd.read_csv('table2.dat', sep=r'\s+')

The output of extra.info() is:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11528 entries, 0 to 11527
Data columns:
a     11528  non-null values
key   11528  non-null values
c     11528  non-null values
d     11528  non-null values
e     11528  non-null values
f     11528  non-null values
g     11528  non-null values
h     11528  non-null values
i     11528  non-null values
j     11528  non-null values
k     11528  non-null values
dtypes: float64(11)None

The output of data.info() is:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11528 entries, 0 to 11527
Data columns:
1      11528  non-null values
2      11528  non-null values
3      11528  non-null values
key    11528  non-null values
5      11528  non-null values
...
79     11528  non-null values
80     11528  non-null values
81     11528  non-null values
dtypes: float64(80), int64(1)None

So both DataFrames have 11528 rows, and they share a common column named key.

I merged the two DataFrames using:

result = pd.merge(data, extra, on='key', sort = False)

The output of result.info() is:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11926 entries, 0 to 11925
Data columns:
1    11926  non-null values
2    11926  non-null values
3    11926  non-null values
key  11926  non-null values
5    11926  non-null values
6    11926  non-null values
...
80   11926  non-null values
81   11926  non-null values
a    11926  non-null values
b    11926  non-null values
...    
j    11926  non-null values
k    11926  non-null values
dtypes: float64(90), int64(1)None

Something is clearly wrong, because the newly merged DataFrame, result, has 11926 rows.

Can someone explain what is happening, and what the correct way to write this is?

Thanks!

Example

df1 =   1  key   3   4
    1   8   90   5  11
    2   7   60   2  30
    3   3   70   3  26
    4   7   60   2  10

df2 =   5   6  key   7
    1   3   2   90  17
    2   9   3   60  42
    3   6   4   70  17
    4   1   5   60  23

The output I want is:

    1  key   3   4   5   6   7
1   8   90   5  11   3   2  17
2   7   60   2  30   9   3  42
3   3   70   3  26   6   4  17
4   7   60   2  10   1   5  23

1 answer:

Answer 0 (score: 1)

What is happening? You have duplicate values of key in one or both of the DataFrames. So if key1 appears 5 times in data and 2 times in extra, you will get 10 key1 entries when you merge the two DataFrames on the key column, because every matching row in one frame is paired with every matching row in the other.
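The multiplication can be reproduced with a minimal sketch. The toy frames and the columns x and y below are made up for illustration and are not the asker's data; only pd.merge and Series.duplicated are real pandas calls.

```python
import pandas as pd

# Hypothetical toy frames: 'key1' appears 5 times in data and 2 times in extra.
data = pd.DataFrame({"key": ["key1"] * 5, "x": range(5)})
extra = pd.DataFrame({"key": ["key1"] * 2, "y": range(2)})

# Same call as in the question: each of the 5 rows pairs with each of the 2 rows.
result = pd.merge(data, extra, on="key", sort=False)
print(len(result))  # 5 * 2 = 10 rows

# Count the rows in each frame whose key repeats an earlier row.
dup_data = int(data["key"].duplicated().sum())
dup_extra = int(extra["key"].duplicated().sum())
print(dup_data, dup_extra)
```

Running the two duplicated() checks on the real data and extra frames should show whether repeated keys account for the extra rows in result.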

The way to solve this is:

key1
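One possible approach, sketched here under the assumption that repeated keys should be paired in order of appearance (the first 60 in df1 with the first 60 in df2, and so on, as in the desired output from the question), is to number the occurrences of each key and merge on both columns. The helper column name occurrence is hypothetical.

```python
import pandas as pd

# The example frames from the question.
df1 = pd.DataFrame(
    {"1": [8, 7, 3, 7], "key": [90, 60, 70, 60], "3": [5, 2, 3, 2], "4": [11, 30, 26, 10]},
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    {"5": [3, 9, 6, 1], "6": [2, 3, 4, 5], "key": [90, 60, 70, 60], "7": [17, 42, 17, 23]},
    index=[1, 2, 3, 4],
)

# 'occurrence' (hypothetical name) is 0 for the first row with a given key,
# 1 for the second row with that key, and so on.
left = df1.assign(occurrence=df1.groupby("key").cumcount())
right = df2.assign(occurrence=df2.groupby("key").cumcount())

# Merging on (key, occurrence) makes every pair unique, so rows no longer multiply.
merged = pd.merge(left, right, on=["key", "occurrence"], sort=False)
merged = merged.drop(columns="occurrence")
print(merged)
```

If the repeated keys are simply unwanted, calling drop_duplicates(subset="key") on each frame before merging is another option, but it discards rows, so it would not produce the four-row output shown in the question.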