Loading a large dataset into Pandas (Python)

Time: 2017-06-14 10:06:59

Tags: python csv pandas


Basically, I cannot load orders.csv into a Pandas DataFrame. I would like to learn the best practices for loading large files into Pandas / Python.

3 answers:

Answer 0: (score: 3)


Fortunately, the read_csv method accepts a chunksize parameter.


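    for chunk in pd.read_csv("file.csv", chunksize=somesize):
        process(chunk)

Note: when you pass chunksize to read_csv or read_table, the return value is not a DataFrame but an iterable object of type TextFileReader, which yields one DataFrame of at most chunksize rows at a time.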
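To make the chunked approach concrete, here is a minimal sketch that aggregates a CSV one chunk at a time, so only a single chunk is held in memory. The question does not show what orders.csv contains, so the columns `order_id`, `user_id` and `amount`, the helper `total_amount_per_user`, and the tiny in-memory CSV are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file such as orders.csv; the column
# names are assumptions, not taken from the original question.
csv_data = io.StringIO(
    "order_id,user_id,amount\n"
    "1,10,5.0\n"
    "2,11,7.5\n"
    "3,10,2.5\n"
    "4,12,1.0\n"
    "5,11,4.0\n"
)


def total_amount_per_user(source, chunksize):
    """Sum `amount` per `user_id`, reading `source` one chunk at a time."""
    partial_sums = []
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        partial_sums.append(chunk.groupby("user_id")["amount"].sum())
    # The same user can appear in several chunks, so combine the partial sums.
    return pd.concat(partial_sums).groupby(level=0).sum()


totals = total_amount_per_user(csv_data, chunksize=2)
print(totals)
```

For a real file, pass the path (for example "orders.csv") instead of the StringIO object and pick a chunksize that fits comfortably in memory. This pattern only works when the per-chunk results can be combined afterwards, as sums and counts can.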


Answer 1: (score: 0)


Answer 2: (score: 0)
