How to pivot a very large dataframe using dask?

Posted: 2020-09-27 13:36:33

Tags: python pandas dask

I have a Dask dataframe loaded like this:

import dask.dataframe as dd

dates_devices = dd.read_csv('data_part*.csv.gz', compression='gzip', blocksize=None)
dates_devices['cnt'] = 1
dates_devices = dates_devices.astype({'cnt': 'uint8'})  # make it smaller (astype returns a new frame, so assign it)

I am trying to pivot the table using dask:

final_table = (dates_devices
               .categorize(columns=['date'])
               .pivot_table(index='device', columns='date', values='cnt')
               .fillna(0)
               .astype('uint8'))

This "runs" just fine, but when I trigger the implicit computation with dd.to_parquet() I get:

MemoryError: Unable to allocate 5.42 GiB for an array with shape (727304656,) and data type uint8

I then followed the suggestion from here,

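but the kernel still gets killed. I have 32 GB of RAM and 32 GB of swap on Linux (Xubuntu), so this should easily fit in memory. Is there a way to do this, or to "test" why my kernel is being killed?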
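One way to "test" why the kernel dies is a back-of-the-envelope estimate of the dense pivot before computing anything. The pivoted table holds one cell per (device, date) pair, and with the default `mean` aggregation the pivot is typically produced as float64 (missing pairs are NaN) before `.fillna(0).astype('uint8')` shrinks it. So the peak is roughly rows × columns × 8 bytes, and can be higher while `fillna` and `astype` each build a new copy. The sketch below is plain Python; the function names and the device/date counts are hypothetical. Real counts could come from `dates_devices['device'].nunique().compute()` and `dates_devices['date'].nunique().compute()`.

```python
GIB = 1024 ** 3  # bytes per GiB


def dense_pivot_bytes(n_rows: int, n_cols: int, itemsize: int) -> int:
    """Bytes for a dense n_rows x n_cols block of fixed-size values (ignores index overhead)."""
    return n_rows * n_cols * itemsize


def pivot_memory_report(n_devices: int, n_dates: int) -> dict:
    """Rough lower bounds, in GiB, for the float64 intermediate and the uint8 result."""
    return {
        "float64_intermediate_gib": dense_pivot_bytes(n_devices, n_dates, 8) / GIB,
        "uint8_final_gib": dense_pivot_bytes(n_devices, n_dates, 1) / GIB,
    }


# Hypothetical sizes: 1,000,000 devices observed on 365 dates.
report = pivot_memory_report(1_000_000, 365)
for name, gib in report.items():
    print(f"{name}: {gib:.2f} GiB")
# float64_intermediate_gib: 2.72 GiB
# uint8_final_gib: 0.34 GiB
```

If the float64 figure (times a small factor for copies) approaches the available RAM, the pivot cannot be materialized in one piece, even though the final uint8 table would fit.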

1 Answer

Answer 0 (score: 1)

You can try writing the dask dataframe out in batches to get around the memory limit, for example:


for i in range(final_table.npartitions):
    partition = final_table.get_partition(i)  # one partition at a time
    # ...compute or write `partition` here before moving on to the next one

See what I did in a similar case; you can take the same partition-by-partition approach: https://stackoverflow.com/a/62458085/6366770
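The answer's loop handles the already-pivoted `final_table` one partition at a time. A related way to bound memory, shown here as a swapped-in technique rather than the answerer's code, is to batch the pivot itself. Each device is assigned to one of N buckets by hashing its id, each bucket is pivoted separately against the full list of dates, and each block is written out before the next one is built. Hashing guarantees that a device never appears in two blocks, so the blocks stack into the full table. The sketch uses plain pandas with a hypothetical six-row sample in the question's `device`/`date`/`cnt` layout. `pivot_bucket`, `pivot_in_batches`, and `n_buckets` are illustrative names. `aggfunc="max"` makes each cell a 0/1 presence flag, which matches the question's `mean` here because every `cnt` is 1. In dask, the bucketing step would presumably correspond to shuffling by `device` (for example with `set_index('device')`) before pivoting partition by partition. That port is an assumption and is not tested here.

```python
import pandas as pd


def pivot_bucket(part: pd.DataFrame, all_dates: list) -> pd.DataFrame:
    """Dense device x date presence table (0/1, uint8) for one bucket of devices."""
    table = part.pivot_table(index="device", columns="date", values="cnt", aggfunc="max")
    # Give every bucket the same date columns, then fill gaps and shrink to uint8.
    return table.reindex(columns=all_dates).fillna(0).astype("uint8")


def pivot_in_batches(df: pd.DataFrame, n_buckets: int):
    """Yield one pivoted block per bucket; every device lands in exactly one bucket."""
    all_dates = sorted(df["date"].unique())
    # Stable uint64 hash per device; the modulo picks its bucket.
    bucket = pd.util.hash_pandas_object(df["device"], index=False) % n_buckets
    for b in range(n_buckets):
        part = df[bucket == b]
        if not part.empty:
            yield pivot_bucket(part, all_dates)


# Hypothetical sample in the question's long format.
df = pd.DataFrame({
    "device": ["a", "b", "a", "c", "b", "d"],
    "date": ["2020-09-01", "2020-09-01", "2020-09-02",
             "2020-09-02", "2020-09-03", "2020-09-03"],
    "cnt": 1,
})

blocks = []
for block in pivot_in_batches(df, n_buckets=2):
    # With real data, write each block to disk instead of keeping it in memory.
    blocks.append(block)
    print(block)
```

Each block holds roughly 1/N of the devices, so the dense memory estimate for a single block shrinks by about the same factor.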
