Question

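I have an application that I am refactoring, and I am trying to follow some of the "Clean Code" principles. The application reads data from multiple different data sources, manipulates/formats that data, and inserts it into another database. I have a data layer with the associated DTOs, repositories, interfaces, and helpers for each data source, as well as a business layer with the matching entities, repositories, and interfaces.

My question comes down to the Import method. I basically have one method that systematically calls each business logic method to read, process, and save the data. There are a lot of calls to make, and even though the Import method itself does not manipulate any data at all, the method is still extremely large. Is there a better way to process this data?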

ICustomer<Customer> sourceCustomerList = new CustomerRepository();
foreach (Customer customer in sourceCustomerList.GetAllCustomers())
{
   // Read some data
   DataObject object1 = iSourceDataType1.GetDataByCustomerID(customer.ID);
   // Format and save the data
   iTargetDataType1.InsertDataType1(object1);

   // Read some data

   // Format the data

   // Save the data

   // ...Rinse and repeat
}

Answer 1

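You should look into the Task Parallel Library (TPL) and TPL Dataflow (the System.Threading.Tasks.Dataflow namespace). It lets you split the import into small blocks that are linked into a pipeline, so no single method has to make every call itself.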

// Requires: using System.Threading.Tasks.Dataflow;
ICustomer<Customer> sourceCustomerList = new CustomerRepository();

var customersBuffer = new BufferBlock<Customer>();
var transformBlock = new TransformBlock<Customer, DataObject>(
    customer => iSourceDataType1.GetDataByCustomerID(customer.ID)
);

// Build your pipeline with TransformBlock, ActionBlock, and many more...
// PropagateCompletion forwards Complete() from the buffer to the linked block.
customersBuffer.LinkTo(transformBlock,
    new DataflowLinkOptions { PropagateCompletion = true });

// Add all the blocks you need here...

// Then feed the first block or use a custom source
foreach (var c in sourceCustomerList.GetAllCustomers())
    customersBuffer.Post(c);
customersBuffer.Complete();
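
To show how the pipeline above might look end to end, here is a minimal sketch that adds a saving stage and waits for the pipeline to drain. Customer and DataObject are hypothetical stand-ins for the question's types, the read and save delegates stand in for iSourceDataType1.GetDataByCustomerID and iTargetDataType1.InsertDataType1, and the parallelism and capacity values are arbitrary assumptions, not recommendations.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-ins for the question's types.
public sealed class Customer
{
    public int ID { get; }
    public Customer(int id) { ID = id; }
}

public sealed class DataObject
{
    public int CustomerID { get; }
    public string Payload { get; }
    public DataObject(int customerId, string payload)
    {
        CustomerID = customerId;
        Payload = payload;
    }
}

public static class ImportPipeline
{
    // read and save are placeholders for the real repository calls.
    // save may run on several threads at once, so it must be thread-safe.
    public static async Task RunAsync(
        IEnumerable<Customer> customers,
        Func<Customer, DataObject> read,
        Action<DataObject> save)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4, // assumed value
            BoundedCapacity = 100       // assumed value; gives back-pressure
        };
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        var readBlock = new TransformBlock<Customer, DataObject>(read, options);
        var saveBlock = new ActionBlock<DataObject>(save, options);
        readBlock.LinkTo(saveBlock, link);

        // SendAsync waits when the bounded block is full instead of dropping items.
        foreach (var c in customers)
            await readBlock.SendAsync(c);

        // Signal the end of input, then wait until the last stage has finished.
        readBlock.Complete();
        await saveBlock.Completion;
    }
}
```

Each additional read/format/save step from the Import method would become one more block linked into the chain, which keeps the method that wires them together short.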

Answer 2

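Your performance will be IO-bound, especially if each iteration makes multiple round trips to the database. So you need to revisit your architecture to minimize IO.

Would it be possible to bring all the records close together in a first pass (possibly in a temporary database), do the record matching and formatting inside the database as a second pass, and then read the results out and save them where they need to go?

(As a side note, we sometimes get carried away with DDD and OO, where everything "needs" to be an object. But that is not always the best approach.)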
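
The two-pass idea can be sketched roughly as follows. This is an in-memory stand-in, not a real staging database: SourceRow, TargetRow, readAllRows, and bulkInsert are hypothetical names, and the trim/upper-case step is only a placeholder for the real formatting. The point is that each source is read once and the target is written once, instead of making several round trips per customer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types for illustration only.
public sealed class SourceRow
{
    public int CustomerID { get; }
    public string Value { get; }
    public SourceRow(int customerId, string value)
    {
        CustomerID = customerId;
        Value = value;
    }
}

public sealed class TargetRow
{
    public int CustomerID { get; }
    public string Value { get; }
    public TargetRow(int customerId, string value)
    {
        CustomerID = customerId;
        Value = value;
    }
}

public static class TwoPassImport
{
    public static List<TargetRow> Run(
        IEnumerable<int> customerIds,
        Func<IEnumerable<SourceRow>> readAllRows,    // assumed: one bulk read
        Action<IReadOnlyList<TargetRow>> bulkInsert) // assumed: one batch write
    {
        // Pass 1: pull everything once and index it by customer.
        ILookup<int, SourceRow> staged = readAllRows().ToLookup(r => r.CustomerID);

        // Pass 2: match and format as a set-based step.
        // A lookup returns an empty sequence for customers without rows.
        var output = customerIds
            .SelectMany(id => staged[id])
            .Select(r => new TargetRow(r.CustomerID, r.Value.Trim().ToUpperInvariant()))
            .ToList();

        // One write for the whole batch.
        bulkInsert(output);
        return output;
    }
}
```

With a real database, pass 1 would typically be a bulk load into a staging table and pass 2 a set-based query over it; the shape of the code stays the same.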

Best design pattern for large data processing methods
