How can I make my Postgres query run faster? Can I use Python to iterate faster?

Asked: 2017-09-15 14:54:35

Tags: python postgresql pandas

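This is a two-part question. If you're checking this out, thank you for your time!

  1. Is there a way to make my query faster?

    I previously asked a question here, and was eventually able to solve the problem on my own.

    However, the query I devised to produce my desired results is very slow (25+ minutes) when run against my database, which contains 40,000+ records.

    The query serves its purpose, but I'm hoping one of you brilliant people can point out how to make it run at a more acceptable speed.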

    My query:

    with dupe as (
        select
             json_document->'Firstname'->0->'Content' as first_name,
             json_document->'Lastname'->0->'Content' as last_name,
             identifiers->'RecordID' as record_id
        from (
            select *,  
                   jsonb_array_elements(json_document->'Identifiers') as identifiers
            from staging
        ) sub
        group by record_id, json_document
        order by last_name
    ) 
    
    select * from dupe da where (
      select count(*) from dupe db 
      where db.record_id = da.record_id
    ) > 1;
    

    Again, some sample data:

    Row 1:

    {
            "Firstname": "Bobb",
            "Lastname": "Smith",
            "Identifiers": [
                {
                    "Content": "123",
                    "RecordID": "123",
                    "SystemID": "Test",
                    "LastUpdated": "2017-09-12T02:23:30.817Z"
                },
                {
                    "Content": "abc",
                    "RecordID": "abc",
                    "SystemID": "Test",
                    "LastUpdated": "2017-09-13T10:10:21.598Z"
                },
                {
                    "Content": "def",
                    "RecordID": "def",
                    "SystemID": "Test",
                    "LastUpdated": "2017-09-13T10:10:21.598Z"
                }
            ]
    }
    

    Row 2:

    {
            "Firstname": "Bob",
            "Lastname": "Smith",
            "Identifiers": [
                {
                    "Content": "abc",
                    "RecordID": "abc",
                    "SystemID": "Test",
                    "LastUpdated": "2017-09-13T10:10:26.020Z"
                }
            ]
    }
    
  2. If I were to bring my query results, or part of them, into a Python environment where I can manipulate them with Pandas, how could I iterate over the results of my query (or the sub-query) to get the same end result as my original query?

    Is there an easier way, using Python, to iterate through my un-nested json array in the same way that Postgres does?

    For example, after performing this query:

    select
        json_document->'Firstname'->0->'Content' as first_name,
        json_document->'Lastname'->0->'Content' as last_name,
        identifiers->'RecordID' as record_id
    from (
           select *,  
                  jsonb_array_elements(json_document->'Identifiers') as identifiers
           from staging
         ) sub
    order by last_name;
    

    how can I, using Python / Pandas, take the results of that query and do something like:

    da = datasets[query_results]  # to equal my dupe da query
    db = datasets[query_results]  # to equal my dupe db query
    
    and then perform the equivalent of

    select * from dupe da where (
      select count(*) from dupe db 
      where db.record_id = da.record_id
    ) > 1;
    
    in Python?

  3. I apologize if I have not provided enough information here. I am new to Python. Any and all help is greatly appreciated! Thanks!

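2 answers:

Answer 0 (score: 1)

Consider reading the raw, unparsed values of the Postgres json column and using pandas json_normalize() to bind them into a flat dataframe. From there, use pandas drop_duplicates.

To demonstrate, the code below parses one of your json documents into a three-row dataframe, one row per Identifiers record: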

import json
import pandas as pd

json_str = '''
{
        "Firstname": "Bobb",
        "Lastname": "Smith",
        "Identifiers": [
            {
                "Content": "123",
                "RecordID": "123",
                "SystemID": "Test",
                "LastUpdated": "2017-09-12T02:23:30.817Z"
            },
            {
                "Content": "abc",
                "RecordID": "abc",
                "SystemID": "Test",
                "LastUpdated": "2017-09-13T10:10:21.598Z"
            },
            {
                "Content": "def",
                "RecordID": "def",
                "SystemID": "Test",
                "LastUpdated": "2017-09-13T10:10:21.598Z"
            }
        ]
}
'''

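data = json.loads(json_str)
# pandas >= 1.0 exposes this as pd.json_normalize
# (older releases used pd.io.json.json_normalize)
df = pd.json_normalize(data, 'Identifiers', ['Firstname','Lastname'])

print(df)
# Output as originally posted; column order may differ between pandas versions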
#   Content               LastUpdated RecordID SystemID Lastname Firstname
# 0     123  2017-09-12T02:23:30.817Z      123     Test    Smith      Bobb
# 1     abc  2017-09-13T10:10:21.598Z      abc     Test    Smith      Bobb
# 2     def  2017-09-13T10:10:21.598Z      def     Test    Smith      Bobb

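For your database, consider connecting with your DB-API driver (e.g. psycopg2) or SQLAlchemy, and parse each json document as a string accordingly. Admittedly, there may be other ways to handle json, as shown in the psycopg2 docs, but the code below receives the data as text and parses it on the Python side: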

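import json

import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")

cur = conn.cursor()
# Cast to text so the driver returns raw strings to parse on the Python side
cur.execute("SELECT json_document::text FROM staging;")

# pandas >= 1.0: pd.json_normalize (older releases: pd.io.json.json_normalize)
df = pd.json_normalize([json.loads(row[0]) for row in cur.fetchall()],
                       'Identifiers', ['Firstname','Lastname'])

# Keeps the first row for each RecordID and drops the repeats
df = df.drop_duplicates(['RecordID'])

cur.close()
conn.close()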
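
Note that drop_duplicates discards the repeated rows, whereas the SQL in the question keeps every row whose record_id occurs more than once. If that is the result wanted, a minimal sketch along the following lines may be closer. The two documents are hypothetical stand-ins shaped like the question's sample rows, and the sketch assumes a RecordID does not repeat inside a single document:

```python
import pandas as pd

# Hypothetical sample documents shaped like the question's staging rows.
docs = [
    {"Firstname": "Bobb", "Lastname": "Smith",
     "Identifiers": [{"RecordID": "123"}, {"RecordID": "abc"}, {"RecordID": "def"}]},
    {"Firstname": "Bob", "Lastname": "Smith",
     "Identifiers": [{"RecordID": "abc"}]},
]

# One row per Identifiers element, carrying the parent's name fields along.
df = pd.json_normalize(docs, "Identifiers", ["Firstname", "Lastname"])

# keep=False flags every member of a duplicated group, not only the repeats,
# which mirrors the "count(*) > 1" filter in the question's query.
dupes = df[df.duplicated("RecordID", keep=False)].sort_values("Lastname")
print(dupes)
```

Here only the two rows sharing RecordID "abc" survive the filter, one from each document.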

Answer 1 (score: 1)

Try the following, which eliminates your count(*) and uses exists instead.
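
As a rough illustration of the idea only (this is not the answerer's query), the sketch below shows an EXISTS-based duplicate check. It runs against an in-memory SQLite table purely so that it works without a Postgres server. The flattened `dupe` table and its `doc_id` column are assumptions, standing in for the question's CTE and for whatever distinguishes one staging row from another:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flattened table: one row per (document, identifier) pair.
conn.execute(
    "CREATE TABLE dupe (doc_id INTEGER, first_name TEXT, last_name TEXT, record_id TEXT)"
)
conn.executemany(
    "INSERT INTO dupe VALUES (?, ?, ?, ?)",
    [
        (1, "Bobb", "Smith", "123"),
        (1, "Bobb", "Smith", "abc"),
        (1, "Bobb", "Smith", "def"),
        (2, "Bob", "Smith", "abc"),
    ],
)

# EXISTS can stop at the first matching row instead of counting all of them.
rows = conn.execute(
    """
    SELECT da.first_name, da.last_name, da.record_id
    FROM dupe da
    WHERE EXISTS (
        SELECT 1
        FROM dupe db
        WHERE db.record_id = da.record_id
          AND db.doc_id <> da.doc_id
    )
    ORDER BY da.doc_id
    """
).fetchall()
conn.close()

print(rows)
```

The extra `db.doc_id <> da.doc_id` condition is needed because, unlike a count, EXISTS would otherwise be satisfied by the row matching itself.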
