Question

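My application has a memory problem involving nested for loops, and I can't figure out how to improve it. I tried using LINQ, but I suspect it does the same thing internally, because the memory leak is still there.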

Edit: As requested, here is more information about my problem.

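All of my customers (about 400,000) are indexed in a Lucene document store. Each customer can belong to multiple agencies, and some of them exist in 200-300 agencies.

I need to retrieve all customers from the "global" customer index and build a separate index for each agency, containing only the customers that agency is allowed to see. Business rules and security rules have to be applied to each agency index, so for now I can't maintain a single customer index for all agencies.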

My process is as follows:

int numDocuments = 400000;

// Get a Lucene Index Searcher from an Index Factory
IndexSearcher searcher = SearcherFactory.Instance.GetSearcher(Enums.CUSTOMER);

// Builds a query that gets everything in the index
Query query = QueryHelper.GetEverythingQuery();
Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(query));

// Sorts by Agency Id
SortField sortField = new SortField("AgencyId", SortField.LONG);
Sort sort = new Sort(sortField);

TopDocs documents = searcher.Search(query, filter, numDocuments, sort);

// Iterate over the hits actually returned, which may be fewer than numDocuments
for (int i = 0; i < documents.scoreDocs.Length; i++)
{
     Document document = searcher.Doc(documents.scoreDocs[i].doc);

     // Builds a customer object from the lucene document
     Customer customer = new Customer(document);

     // If this nested loop is removed, the memory doesn't grow
     foreach(Agency agency in customer.Agencies)
     {
          // Gets a writer from a factory for the agency id.
          IndexWriter writer = WriterFactory.Instance.GetWriter(agency.Id);

          // Builds an agency-specific document from the customer
          Document customerDocument = customer.GetAgencyDocument(agency.Id);

          // Adds the document to the agency's lucene index
          writer.AddDocument(customerDocument);
     }
}

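Edit: Solution

The problem was that I wasn't reusing the "Document" instance in the inner loop, which caused my service's memory usage to grow out of control. Simply reusing a single Document instance solved my problem.

Thanks, everyone.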

Answer 1

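I believe this is what is happening here:

There is too much object creation inside the loops. If possible, don't use the new keyword inside the loops. Initialize objects that can be reused across iterations and pass them the data to work on. Constructing new objects in that many iterations makes garbage collection a serious problem: the garbage collector may not be able to keep up with you and will defer collection.

If that is true, the first thing you can do is force a garbage collection every X iterations and wait for pending finalizers. If that brings the memory down, you know this is the problem. Solving it is easy: just don't create new instances on every loop iteration.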
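
To make the diagnostic step concrete, here is a minimal sketch of forcing a collection every X iterations. The GcDiagnostic class, the CollectEvery helper and the interval of 10,000 are names and values I made up for illustration; only GC.Collect and GC.WaitForPendingFinalizers are real .NET calls. It is meant for troubleshooting only, not as a permanent fix.

```csharp
using System;

public static class GcDiagnostic
{
    // Hypothetical helper: forces a full collection every `interval` iterations.
    // Returns true when a collection was actually forced.
    public static bool CollectEvery(int iteration, int interval)
    {
        if (interval <= 0 || iteration <= 0 || iteration % interval != 0)
        {
            return false;
        }

        GC.Collect();                  // collect unreachable objects
        GC.WaitForPendingFinalizers(); // let queued finalizers run
        GC.Collect();                  // reclaim objects the finalizers released
        return true;
    }

    // Stand-in for the real indexing loop: allocates on every iteration
    // and forces a collection every 10,000 iterations (arbitrary interval).
    public static long Demo(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            byte[] garbage = new byte[1024];
            garbage[0] = 1;
            CollectEvery(i, 10000);
        }
        return GC.GetTotalMemory(false);
    }
}
```

If memory stays flat with the forced collections in place, the growth was collectable garbage rather than a true leak, and reducing allocations in the loop is the real fix.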

Answer 2

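The key may be how you initialize customers and customer.Agencies. If you can, rather than returning List types, return IEnumerable<Customer> and IEnumerable<Agency>. That may allow deferred execution to happen, which should consume less memory but may make the operation take longer.

Another option is to run the code in batches: use the code above, but fill the List<Customer> customers in batches, for example 10,000 at a time.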
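
Both ideas can be combined. The sketch below is only an illustration: the Batching class and InBatches method are names I invented, and it assumes the customers can be produced lazily as an IEnumerable<T>. It relies on yield return for deferred execution, so at most one batch is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Lazily splits `source` into lists of at most `batchSize` items.
    // Nothing is read from `source` until the result is enumerated.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh batch
            }
        }

        if (batch.Count > 0)
        {
            yield return batch; // final, possibly smaller batch
        }
    }
}
```

In the question's scenario this would be used as foreach (List<Customer> batch in Batching.InBatches(customers, 10000)) { ... }, where customers is whatever lazy sequence builds the Customer objects from the index.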

Answer 3

First, you should reuse the Document and Field instances that you add to the index, to minimize memory usage and relieve pressure on the garbage collector.

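• Reuse Document and Field instances. As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to reuse a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and reuse them by changing their values for each added document. For example, you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc.), and then re-add your Document instance.

Note that you cannot reuse a single Field instance within a Document, and you should not change a Field's value until the Document containing that Field has been added to the index.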

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
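
Here is a rough sketch of what the wiki advice could look like in Lucene.NET. It assumes an older Lucene.NET release (2.9 / 3.0 era) in which Field exposes SetValue(string), the .NET counterpart of the setValue method quoted above; other versions may name it differently. The ReusingIndexer class, the AddAll method and the "Id" / "Name" field names are made up for illustration and are not taken from the question.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class ReusingIndexer
{
    // Hypothetical helper: indexes (id, name) pairs while allocating only
    // one Document and two Field objects for the whole run.
    public static void AddAll(IndexWriter writer,
                              IEnumerable<KeyValuePair<string, string>> customers)
    {
        // Create the fields once; their values are overwritten per customer.
        var idField = new Field("Id", "", Field.Store.YES, Field.Index.NOT_ANALYZED);
        var nameField = new Field("Name", "", Field.Store.YES, Field.Index.ANALYZED);

        // Each Field instance is added to the Document exactly once.
        var document = new Document();
        document.Add(idField);
        document.Add(nameField);

        foreach (KeyValuePair<string, string> customer in customers)
        {
            // Values change only after the previous AddDocument call has returned.
            idField.SetValue(customer.Key);
            nameField.SetValue(customer.Value);
            writer.AddDocument(document);
        }
    }
}
```

In the question's nested loop the same Document could be shared across the per-agency writers, with the agency-specific field values set just before each AddDocument call.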