Question

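I'm trying to use lambda and map to create a new column in my DataFrame. Essentially, the new column should take its value from column A if a condition is met, and from column B otherwise. Please see the code below.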

However, when I do this, the function returns the entire column for every item in the new column.

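In plain English: I go through each item y in the Long column. If the item is > 0, take the y-th value from the "Currency" column. Otherwise, take the y-th value from the "Start" column.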

Running the iteration above is also very slow. Are there any other options?

Thanks! James

Answer 1

Just do

df['LS']=np.where(df.Long>0,df.Currency,df.StartDate)

which is the proper vectorized approach.
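As a minimal, self-contained sketch of the np.where approach: only the column names Long, Currency and StartDate come from this thread, and the sample values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "Long": [1, 0, 2, -1],
    "Currency": ["USD", "EUR", "GBP", "JPY"],
    "StartDate": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01"],
})

# Element-wise choice: Currency where Long > 0, otherwise StartDate.
df["LS"] = np.where(df["Long"] > 0, df["Currency"], df["StartDate"])
```

The condition is evaluated once for the whole column, so no Python-level loop over rows is involved.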

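df.Long.map applies the function to each row, but what the function actually returns each time is the whole df.Currency or df.StartDate Series, not a single value.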
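The symptom can be reproduced with a small sketch. The exact lambda from the question is not shown here, so the form below is an assumption about what it probably looked like, and the sample data is invented.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "Long": [1, 0],
    "Currency": ["USD", "EUR"],
    "StartDate": ["2019-01-01", "2019-02-01"],
})

# Assumed form of the problematic pattern: the lambda receives one scalar y,
# but both branches return an entire column rather than a single value.
result = df["Long"].map(lambda y: df["Currency"] if y > 0 else df["StartDate"])

# Each element of the result is itself a Series, i.e. a whole column.
first = result.iloc[0]
print(type(first))
```

Inside the lambda, y carries only the value of Long, not the row position, so there is nothing to select the matching element of the other columns with.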

Another approach to consider is:

df.apply(lambda row: row.iloc[1] if row.iloc[0] > 0 else row.iloc[2], axis=1)

which also works with df.columns = Index(['Long', 'Currency', 'StartDate', ...]), because the lambda picks columns by position.

But it is not a vectorized approach, so it is slow (in this case, about 200 times slower for 1000 rows).
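To check the speed difference on your own machine, a rough timing sketch along these lines may help. The data is randomly generated and hypothetical, the helper names vectorized and row_wise are made up for this example, and the measured ratio will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data with 1000 rows; the values are randomly generated.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Long": rng.integers(-5, 6, size=n),
    "Currency": rng.choice(["USD", "EUR", "GBP"], size=n),
    "StartDate": rng.choice(["2019-01-01", "2019-02-01"], size=n),
})

def vectorized(frame):
    # One call that operates on whole columns.
    return np.where(frame["Long"] > 0, frame["Currency"], frame["StartDate"])

def row_wise(frame):
    # Calls the lambda once per row (columns selected by label here).
    return frame.apply(
        lambda row: row["Currency"] if row["Long"] > 0 else row["StartDate"],
        axis=1,
    )

# Both approaches should produce the same values.
same = list(vectorized(df)) == list(row_wise(df))

t_vec = timeit.timeit(lambda: vectorized(df), number=5)
t_row = timeit.timeit(lambda: row_wise(df), number=5)
print(f"np.where: {t_vec:.4f}s, apply: {t_row:.4f}s, same result: {same}")
```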

Answer 2

You can do the same thing using where:

df['LS'] = df['Currency'].where(df['Long']>0,df['StartDate'])
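A small runnable sketch of this variant, again with invented sample data (only the column names come from this thread):

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "Long": [1, 0, 2, -1],
    "Currency": ["USD", "EUR", "GBP", "JPY"],
    "StartDate": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01"],
})

# Keep Currency where the condition holds; elsewhere take the
# index-aligned value from StartDate.
df["LS"] = df["Currency"].where(df["Long"] > 0, df["StartDate"])
```

Unlike np.where, which returns a plain NumPy array, Series.where returns a Series that keeps the original index.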
