Parallel Computing in .NET 4.0 - Parallel LINQ


.NET Framework 4.0 introduces the Task Parallel Library (TPL) and Parallel LINQ to Objects (PLINQ). This article continues Parallel Computing in .NET 4.0 – Task Parallel Library & PLINQ.

Introduction

This article focuses on PLINQ. For more information on the Task Parallel Library, see the article Parallel Computing in .NET Framework 4.0.

LINQ, introduced in .NET Framework 3.5, provides a unified, type-safe model for querying any System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> data source. PLINQ is a parallel implementation of LINQ to Objects. It partitions the data source into segments and runs the query on each segment concurrently on separate worker threads, so on a multi-core machine the segments can be processed on different cores and the whole query can finish faster.
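As a minimal sketch (a C# 9+ top-level program; the array and the query are illustrative assumptions, not from the original), the same LINQ to Objects query can be run sequentially and then in parallel simply by inserting AsParallel(). Because PLINQ does not preserve source order by default, the parallel result is sorted before the two are compared.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 1_000).ToArray();

// Sequential LINQ to Objects: runs entirely on the calling thread.
int[] sequential = numbers.Where(n => n % 3 == 0).Select(n => n * n).ToArray();

// PLINQ: AsParallel() partitions the source and runs Where/Select on worker threads.
int[] parallel = numbers.AsParallel().Where(n => n % 3 == 0).Select(n => n * n).ToArray();

// Without AsOrdered() the parallel output order is not guaranteed, so sort before comparing.
Array.Sort(parallel);
Console.WriteLine(sequential.SequenceEqual(parallel)); // True
```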

The System.Linq.ParallelEnumerable class exposes methods such as AsParallel, AsOrdered and WithDegreeOfParallelism, which enable and control parallel execution over a data source.

Some example usages:

int[] arr = ...

var q = arr.AsParallel()
           .Where(x => ExpensiveFilter(x));

foreach (var x in q)
{
    Console.WriteLine(x);
}
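A runnable version of the snippet above might look like the following sketch (C# 9+ top-level program). ExpensiveFilter is a hypothetical stand-in that burns CPU with Thread.SpinWait, and the array contents are assumed for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical expensive predicate: spins to simulate CPU-bound work, keeps even numbers.
static bool ExpensiveFilter(int x)
{
    Thread.SpinWait(10_000);
    return x % 2 == 0;
}

int[] arr = Enumerable.Range(0, 200).ToArray();

// Deferred: nothing executes until the query is enumerated.
var q = arr.AsParallel().Where(x => ExpensiveFilter(x));

// foreach merges the parallel results back onto this thread; order is not guaranteed.
int count = 0;
foreach (var x in q)
{
    Console.WriteLine(x);
    count++;
}
```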

var evenNums = from num in numbers.AsParallel().AsOrdered()
               where num % 2 == 0
               select num;

By default, PLINQ does not preserve the order of the source sequence. To turn on ordering, either use the AsOrdered() operator to preserve the original order of the input sequence, or use OrderBy() to sort the values.
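The difference between the two approaches shows up in a small sketch (C# 9+ top-level program; the input array is made up for illustration): AsOrdered() keeps the matching elements in their original input order, while OrderBy() sorts them by value.

```csharp
using System;
using System.Linq;

int[] numbers = { 9, 4, 7, 2, 8, 6, 1 };

// AsOrdered(): results keep the order in which they appear in the input.
int[] ordered = numbers.AsParallel().AsOrdered()
                       .Where(n => n % 2 == 0)
                       .ToArray();

// OrderBy(): results are sorted by value, regardless of input order.
int[] sorted = numbers.AsParallel()
                      .Where(n => n % 2 == 0)
                      .OrderBy(n => n)
                      .ToArray();

Console.WriteLine(string.Join(", ", ordered)); // 4, 2, 8, 6
Console.WriteLine(string.Join(", ", sorted));  // 2, 4, 6, 8
```

AsOrdered() is the cheaper choice when the source order is already the one you want; OrderBy() adds a parallel sort.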

WithDegreeOfParallelism() sets the maximum number of concurrently executing tasks PLINQ uses to process the query. By default, PLINQ uses as many as there are processor cores on the machine.

var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;
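To observe the cap, the following sketch (C# 9+ top-level program) uses a hypothetical Compute function that counts how many calls are in progress at once and records the peak. With WithDegreeOfParallelism(2) the peak should not exceed 2; whether it reaches 1 or 2 depends on scheduling.

```csharp
using System;
using System.Linq;
using System.Threading;

object gate = new object();
int active = 0;     // Compute calls currently running
int maxActive = 0;  // highest number of overlapping calls observed

// Hypothetical CPU-bound function that also tracks overlapping calls.
int Compute(int item)
{
    lock (gate)
    {
        active++;
        maxActive = Math.Max(maxActive, active);
    }
    Thread.SpinWait(20_000); // simulate work outside the lock
    lock (gate) { active--; }
    return item;
}

int[] source = Enumerable.Range(0, 100).ToArray();

var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;

int[] result = query.ToArray();
Console.WriteLine($"results: {result.Length}, peak concurrent Compute calls: {maxActive}");
```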

Parallel and sequential LINQ operators can also be combined in a single query:

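// Paste into PLINQDataSample class.
static void SequentialDemo()
{
    var orders = GetOrders();

    var query = (from ord in orders.AsParallel()
                 orderby ord.CustomerID
                 select new
                 {
                     Details = ord.OrderID,
                     Date = ord.OrderDate,
                     Shipped = ord.ShippedDate
                 }).AsSequential().Take(5);

    // The query is deferred: enumerating it runs the parallel sort,
    // then takes the first five results sequentially.
    foreach (var item in query)
    {
        Console.WriteLine(item);
    }
}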


AsParallel() makes the orderby run in parallel; AsSequential() then switches the rest of the query back to ordinary sequential LINQ to Objects, so Take(5) runs sequentially on the thread that enumerates the query.
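GetOrders() is not shown in the original, so the following self-contained sketch (C# 9+ top-level program) substitutes a hypothetical in-memory array of anonymous-typed orders. The sort runs in parallel, and AsSequential() hands the sorted results to ordinary LINQ to Objects for Take(5).

```csharp
using System;
using System.Linq;

// Hypothetical in-memory orders standing in for GetOrders().
var orders = Enumerable.Range(1, 20).Select(i => new
{
    OrderID = 10_000 + i,
    CustomerID = "C" + (i % 7),
    OrderDate = new DateTime(2010, 1, 1).AddDays(i),
    ShippedDate = new DateTime(2010, 1, 1).AddDays(i + 3)
}).ToArray();

// Sort in parallel, then switch back to sequential LINQ for Take(5).
var query = (from ord in orders.AsParallel()
             orderby ord.CustomerID
             select new
             {
                 Details = ord.OrderID,
                 Date = ord.OrderDate,
                 Shipped = ord.ShippedDate
             }).AsSequential().Take(5);

var top5 = query.ToList();
foreach (var o in top5)
{
    Console.WriteLine($"{o.Details} {o.Date:d} {o.Shipped:d}");
}
```

Because the parallel part of the query is ordered by orderby, AsSequential() yields the results in sorted order, so Take(5) returns the orders of the first customers alphabetically.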

ForAll operator 

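ForAll invokes an action on each result element in parallel, on the worker threads that produced it. Unlike foreach or ToList(), which merge the partitioned results back onto the consuming thread, ForAll skips that merge step. It is therefore useful when the order in which results are processed does not matter and the action is thread-safe.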

 
var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;

query.ForAll(e => Console.WriteLine(Compute(e)));
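Because ForAll runs the action concurrently on several threads, any shared state the action touches must be thread-safe. The sketch below (C# 9+ top-level program, with a hypothetical Compute) collects results in a ConcurrentBag<T> from System.Collections.Concurrent rather than a List<T>, which is not safe for concurrent writes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical CPU-bound function.
static int Compute(int x) => x * 2;

int[] source = Enumerable.Range(0, 100).ToArray();

var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;

// The action runs on the worker threads, so it writes to a thread-safe collection.
var results = new ConcurrentBag<int>();
query.ForAll(e => results.Add(Compute(e)));

Console.WriteLine(results.Count); // 78
```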

It is important to note that PLINQ uses static (range) partitioning for arrays and other collections that implement the IList<T> interface: the source is split up front into as many contiguous ranges as the query's degree of parallelism, which by default equals the number of cores on the machine. If the cost of processing elements varies across the array, static partitioning can cause load imbalance, with one core finishing its range early and sitting idle while another is still working.


To use on-demand, load-balancing partitioning instead, create a partitioner with the Partitioner.Create() method (in System.Collections.Concurrent), passing true for the loadBalance parameter:


int[] input = ...

OrderablePartitioner<int> partitioner = Partitioner.Create(input, true);

var q = partitioner.AsParallel().Select(x => Foo(x)).ToArray();
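A self-contained sketch of the load-balancing case (C# 9+ top-level program), assuming a hypothetical Foo whose cost grows with its argument, so equal-size static ranges would give the last range the most work:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical work function: larger x means more work.
static int Foo(int x)
{
    Thread.SpinWait(x * 10);
    return x * x;
}

int[] input = Enumerable.Range(0, 1_000).ToArray();

// loadBalance: true hands out elements in small chunks on demand
// instead of splitting the array into fixed ranges up front.
OrderablePartitioner<int> partitioner = Partitioner.Create(input, true);

// Not ordered unless AsOrdered() is added, so q is in arbitrary order.
int[] q = partitioner.AsParallel().Select(x => Foo(x)).ToArray();

Console.WriteLine(q.Length);
```

If the original order of the results matters, AsOrdered() can be inserted after AsParallel(); OrderablePartitioner<T> supplies the index keys PLINQ needs for that.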


