Question

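I want to use seaborn's countplot to show the frequency distributions of two different lists of data on a single axis. The problem is that each list contains elements that the other lacks, so I can't simply plot one list on the axis of the larger list.

I tried using Python's Counter object, but because Python dictionaries are unordered, the axis labels did not match the counts shown on the plot.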

import matplotlib.pyplot as plt
import seaborn as sns


first_list = ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b","n"]
second_list = ["a","b","c","d", "e", "e","d","c","e","q"]


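# pass the lists as x explicitly; recent seaborn versions treat the first positional argument as data
sns.countplot(x=first_list, color="blue", alpha=.5)
sns.countplot(x=second_list, color="red", alpha=.5)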


plt.show()

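The code above should display a chart that includes the frequencies of the unique values "n" and "q", but the axis of the chart that is displayed contains only the values from the second list.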

Answer 1

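I think it is better to combine the data into a single DataFrame that is passed to seaborn, rather than drawing two plots on top of each other. I call sns.barplot on the counts instead of using sns.countplot on the raw values.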

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# convert the lists to Series and get the counts
first_list = pd.Series(
    ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b","n"]
).value_counts()

second_list = pd.Series(
    ["a","b","c","d", "e", "e","d","c","e","q"]
).value_counts()

#get the counts as a dataframe
df=pd.concat([first_list,second_list],axis=1)
df.columns=['first','second']

# melt the data frame so it has a "tidy" data format
df=df.reset_index().melt(id_vars=['index'])

df

   index variable  value
0      a    first    3.0
1      b    first    3.0
2      c    first    2.0
3      d    first    1.0
4      e    first    1.0
5      n    first    1.0
6      q    first    NaN
7      a   second    1.0
8      b   second    1.0
9      c   second    2.0
10     d   second    2.0
11     e   second    3.0
12     n   second    NaN
13     q   second    1.0
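The NaN rows in the table mark letters that are absent from one of the lists. If zero counts are preferred over missing values, one option is to fill them before melting. The following is only a sketch of that idea; the names counts, tidy, "letter", "list" and "count" are illustrative assumptions, not part of the answer's code.

```python
import pandas as pd

first_list = ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b", "n"]
second_list = ["a", "b", "c", "d", "e", "e", "d", "c", "e", "q"]

# Count each list separately, then align the counts on the union of letters
counts = pd.concat(
    [pd.Series(first_list).value_counts(), pd.Series(second_list).value_counts()],
    axis=1,
    keys=["first", "second"],
)

# Letters missing from one list become 0 instead of NaN
counts = counts.fillna(0).astype(int).sort_index()

# Long ("tidy") format: one row per (letter, list) pair
tidy = counts.rename_axis("letter").reset_index().melt(
    id_vars="letter", var_name="list", value_name="count"
)
print(tidy)
```

With this variant the plotting call would use x='letter', y='count' and hue='list' instead of the column names shown above.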

#plot a bar graph and assign variable to hue
sns.barplot(
    x='index',
    y='value',
    hue='variable',
    data=df,
    palette=['blue','red'],
    alpha=.5,
    dodge=False,
)

plt.show()
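If you would rather keep sns.countplot itself, a possible alternative (not the approach used above) is to stack both lists into one long-form DataFrame and let hue separate them. This is a minimal sketch; the names long_df, "letter" and "source" are assumptions made for illustration, and by default the bars are drawn side by side rather than overlaid.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

first_list = ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b", "n"]
second_list = ["a", "b", "c", "d", "e", "e", "d", "c", "e", "q"]

# One row per observation, tagged with the list it came from
long_df = pd.DataFrame({
    "letter": first_list + second_list,
    "source": ["first"] * len(first_list) + ["second"] * len(second_list),
})

# A shared, explicit category order so that "n" and "q" both get a slot
order = sorted(long_df["letter"].unique())

ax = sns.countplot(
    data=long_df,
    x="letter",
    hue="source",
    order=order,
    palette=["blue", "red"],
    alpha=0.5,
)
ax.set_ylabel("count")
```

Calling plt.show() afterwards displays the figure. Passing order explicitly is what keeps the axis from depending on which list is encountered first.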

Answer 2

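I don't know of a simple way to do this directly with seaborn's countplot without first creating a DataFrame. Here is a solution built with numpy and matplotlib, based on this example. I'll leave it to you to check whether it is more efficient than using a DataFrame and countplot.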

import numpy as np                # v 1.19.2
import matplotlib.pyplot as plt   # v 3.3.2

first_list = ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b", "n"]
second_list = ["a", "b", "c", "d", "e", "e", "d", "c", "e", "q"]

# Create dictionaries from lists with this format: 'letter':count
dict1 = dict(zip(*np.unique(first_list, return_counts=True)))
dict2 = dict(zip(*np.unique(second_list, return_counts=True)))

# Add missing letters with count=0 to each dictionary so that keys in
# each dictionary are identical
only_in_set1 = set(dict1)-set(dict2)
only_in_set2 = set(dict2)-set(dict1)
dict1.update(dict(zip(only_in_set2, [0]*len(only_in_set2))))
dict2.update(dict(zip(only_in_set1, [0]*len(only_in_set1))))

# Sort dictionaries alphabetically
dict1 = dict(sorted(dict1.items()))
dict2 = dict(sorted(dict2.items()))

# Create grouped bar chart
xticks = np.arange(len(dict1))
bar_width = 0.3
fig, ax = plt.subplots(figsize=(9, 5))
ax.bar(xticks-bar_width/2, dict1.values(), bar_width,
       color='blue', alpha=0.5, label='first_list')
ax.bar(xticks+bar_width/2, dict2.values(), bar_width,
       color='red', alpha=0.5, label='second_list')

# Set annotations, x-axis ticks and tick labels
ax.set_ylabel('Counts')
ax.set_title('Letter counts grouped by list')
ax.set_xticks(xticks)
ax.set_xticklabels(dict1.keys())
ax.legend(frameon=False)
plt.show()

[Figure: grouped bar chart of letter counts for first_list and second_list]
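The same alignment step can also be sketched with the standard library's collections.Counter, which the question mentions trying. A Counter returns 0 for keys it has never seen, so looking up every label from a shared, sorted label list keeps the counts and the axis labels in step. Note that since Python 3.7 dictionaries preserve insertion order, so a mismatch is more likely to come from the two counters holding different keys than from random ordering. The variable names below are illustrative assumptions.

```python
from collections import Counter

first_list = ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b", "n"]
second_list = ["a", "b", "c", "d", "e", "e", "d", "c", "e", "q"]

count1 = Counter(first_list)
count2 = Counter(second_list)

# Shared, sorted label list built from the union of both key sets
labels = sorted(set(count1) | set(count2))

# Counter returns 0 for missing keys, so no padding step is needed
heights1 = [count1[label] for label in labels]
heights2 = [count2[label] for label in labels]

print(labels)    # ['a', 'b', 'c', 'd', 'e', 'n', 'q']
print(heights1)  # [3, 3, 2, 1, 1, 1, 0]
print(heights2)  # [1, 1, 2, 2, 3, 0, 1]
```

The lists heights1 and heights2 can be passed to ax.bar in place of dict1.values() and dict2.values(), with labels used for the tick labels.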

Making two seaborn countplots share the same axis
