Thursday, September 21, 2017

Exception handling of Tasks in .NET

Exception Handling rules


Task Parallel Library (TPL) handles exceptions well.

If a task throws an exception E that goes unhandled:

  • task is terminated
  • E is caught, saved as part of an AggregateException AE, and stored in task object's Exception property.
  • AE is re-thrown when one of the following is called: .Wait, .Result, or .WaitAll

Example with no exception handling:

Task<int> t = Task.Factory.StartNew(code);
int r = t.Result;

Simple example with exception handling:


Task<int> t = Task.Factory.StartNew(code);
try { int r = t.Result; }
catch (AggregateException ae) 
{
Console.WriteLine(ae.InnerException.Message);
}

Working with AggregateException graph (preferred way)

The above handling is not really sufficient. A caught AggregateException can be the root of a tree of exceptions connected via the InnerExceptions property: AggregateException.InnerExceptions can contain ordinary Exceptions as well as further AggregateExceptions. We need to look at the leaves of the tree to get the real exceptions of interest. An easy way to do this is to use the Flatten() method:

Task<int> t = Task.Factory.StartNew(code);
try { int r = t.Result; }
catch (AggregateException ae) 
{
ae = ae.Flatten();
foreach (Exception ex in ae.InnerExceptions)
Console.WriteLine(ex.Message);
}
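
To see why Flatten() matters, here is a minimal sketch in which a parent task and an attached child task both fail. The names FlattenDemo and CollectLeafMessages are made up for illustration. It assumes the child is attached with TaskCreationOptions.AttachedToParent, which is what makes the child's AggregateException show up nested inside the parent's.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class FlattenDemo
{
    // Hypothetical helper: returns the messages of the leaf exceptions
    // raised by a parent task and its attached child task.
    public static string[] CollectLeafMessages()
    {
        Task parent = Task.Factory.StartNew(() =>
        {
            // The attached child's failure is wrapped in its own
            // AggregateException inside the parent's AggregateException.
            Task.Factory.StartNew(
                () => { throw new InvalidOperationException("child failed"); },
                TaskCreationOptions.AttachedToParent);

            throw new NotSupportedException("parent failed");
        });

        try
        {
            parent.Wait();
            return new string[0];
        }
        catch (AggregateException ae)
        {
            // Flatten() lifts the nested exceptions up to a single level.
            return ae.Flatten().InnerExceptions
                     .Select(ex => ex.Message)
                     .OrderBy(m => m, StringComparer.Ordinal)
                     .ToArray();
        }
    }
}
```

Without the call to Flatten(), one of the two entries in InnerExceptions would itself be an AggregateException rather than the child's InvalidOperationException.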


Example of what happens when an exception is NOT "observed"

Task t = Task.Factory.StartNew(() =>
{
int d = 0;
int answer = 100 / d;
}
);

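In .NET 4.0 this will cause the application to crash when the faulted task is garbage-collected and finalized. Starting with .NET 4.5 the default changed: an unobserved exception no longer terminates the process unless ThrowUnobservedTaskExceptions is enabled in the application configuration file.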

Exception handling Design and the need for observing

It is highly recommended that you "observe" all unhandled exceptions. Otherwise the exception is re-thrown when the task is garbage-collected, and that is not the ideal place to handle it.

To observe an exception, do one of the following:
  • call .Wait or touch .Result - the exception is re-thrown at this point
  • call Task.WaitAll - the exception(s) are re-thrown when all have finished
  • touch the task's Exception property after the task has completed
  • subscribe to TaskScheduler.UnobservedTaskException which is particularly useful if a third party block of code throws the exception.
NOTE: Task.WaitAny() does NOT observe the exception.

Wait Example

try { t.Wait(); }
catch (AggregateException ae) { ... }

Exception accessed Example

if (t.Exception != null)
{ ... }

Result accessed Example

try { var r = t.Result; }
catch (AggregateException ae) { ... }
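
As a small sketch of two of the points above, the following shows that Task.WaitAny() returns normally for a faulted task, and that reading the Exception property afterwards is what observes the exception. ObserveDemo and its method name are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class ObserveDemo
{
    // Hypothetical helper: observes a task's exception via its Exception property.
    public static string ObserveViaExceptionProperty()
    {
        Task t = Task.Factory.StartNew(
            () => { throw new InvalidOperationException("boom"); });

        // WaitAny blocks until the task finishes but does not re-throw,
        // so the exception is still unobserved after this line.
        Task.WaitAny(t);

        // Touching Exception marks the exception as observed.
        AggregateException ae = t.Exception;
        return ae.InnerException.Message;
    }
}
```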

Last resort exception handling

If you don't "observe" an exception you can still handle it by subscribing to the TaskScheduler.UnobservedTaskException. 

Use cases for using this method:

  • Speculative tasks that you don't cancel and whose results you no longer care about once the first result arrives. If one of those tasks throws an exception, you don't want it to be re-thrown when garbage collection takes place.
  • Using a third-party library that you don't trust and that may throw exceptions you don't know about.
You only want to subscribe to the event once. Good places to do this are in the application startup code, a static constructor, etc.

How to subscribe:

TaskScheduler.UnobservedTaskException += new EventHandler<UnobservedTaskExceptionEventArgs>(MyErrorHandler);

Example handling the error

static void MyErrorHandler(object sender, UnobservedTaskExceptionEventArgs e)
{
Console.WriteLine($"Unobserved error:{e.Exception.Message}");
e.SetObserved();
}

Note the call to e.SetObserved(). This is what tells .NET that we have observed the exception, so it will not be re-thrown at garbage collection.

Reference

Content is based on Pluralsight video called Introduction to Async and Parallel Programming in .NET 4 by Dr. Joe Hummel. 

Tuesday, September 12, 2017

Waiting for a task to finish and harvesting Result in .NET 4

The below techniques work for code tasks and facade tasks.

Explicitly Waiting for a single task

Sometimes when working with tasks you need to wait for a computation to complete before you can do something with the result (such as write it to the screen), as shown below.

Task t = Task.Factory.StartNew( /* code */ );
t.Wait();
Console.WriteLine(t.Status);

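The status would be one of the following (note that Wait() itself throws an AggregateException if the task faulted or was canceled, so you have to catch it to reach the line that prints those statuses):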
  • RanToCompletion
  • Canceled
  • Faulted
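
A minimal sketch of checking the status after waiting follows. StatusDemo is a hypothetical helper; it swallows the AggregateException only so that the status can be returned.

```csharp
using System;
using System.Threading.Tasks;

public static class StatusDemo
{
    // Hypothetical helper: runs the work as a task, waits for it,
    // and reports the final status.
    public static TaskStatus StatusAfterWait(Action work)
    {
        Task t = Task.Factory.StartNew(work);
        try
        {
            t.Wait();
        }
        catch (AggregateException)
        {
            // Wait() re-throws for a faulted task; ignored here
            // because only the status is of interest.
        }
        return t.Status;
    }
}
```

For example, StatusDemo.StatusAfterWait(() => { }) yields RanToCompletion, while passing a lambda that throws yields Faulted.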

Explicitly Waiting for Multiple tasks (Ordered)

decimal min = 0;
decimal max = 0;
Task t_min = Task.Factory.StartNew(() => {min = data.Price.Min();});
Task t_max = Task.Factory.StartNew(() => {max = data.Price.Max();});

t_min.Wait();
t_max.Wait();

Console.WriteLine(min);
Console.WriteLine(max);

Note: You can also call Wait() on one task from inside another if they depend on each other, but this reduces the benefits of parallelism.

Harvesting Result (Implicitly Waiting for Multiple tasks Sequentially)

While there is nothing wrong with explicitly waiting, there is a cleaner way to do it with less code.

Task<decimal> t_min = Task.Factory.StartNew(() => {return data.Price.Min();});
Task<decimal> t_max = Task.Factory.StartNew(() => {return data.Price.Max();});

Console.WriteLine(t_min.Result);
Console.WriteLine(t_max.Result);

Notice there are no shared variables that could be subject to a race condition if more than one task accessed them. Notice also that there are no calls to Wait(), because Result implicitly calls Wait() before it returns the value.

Explicitly Waiting for All tasks to complete

The above waiting scenario implies that the order that we wait for the tasks to finish matters. If it does not matter and we want to wait for all tasks to complete before continuing then semantically it is better to use WaitAll().

Here is the same code as above, but using WaitAll() instead:

Task<decimal> t_min = Task.Factory.StartNew(() => {return data.Price.Min();});
Task<decimal> t_max = Task.Factory.StartNew(() => {return data.Price.Max();});

Task.WaitAll(new Task[] {t_min, t_max});
Console.WriteLine(t_min.Result);
Console.WriteLine(t_max.Result);

Explicitly Waiting for ANY task to complete

There are scenarios, such as searches, where it makes sense to display the first response that comes back and ignore the others. In this scenario, we don't care which one is first, but we do want to wait for one to complete.

Here is a new example using WaitAny(). It takes an array of tasks as input and returns the index in that array of the task that finished first. That index can be used to get the result of the task that completed first as shown below.

Task<string> t_msn = Task.Factory.StartNew(() => {return SearchMsn();});
Task<string> t_google = Task.Factory.StartNew(() => {return SearchGoogle();});

// Typed as Task<string>[] so that Result is available on the elements.
Task<string>[] tasks = new Task<string>[] {t_msn, t_google};
int firstIndex = Task.WaitAny(tasks);
Task<string> firstTask = tasks[firstIndex];
Console.WriteLine(firstTask.Result);

WaitAllOneByOne pattern

There are scenarios where you want to wait for all tasks to finish, but process results as each one completes. Another way to think of this is, imagine we want to start several tasks and we don't know in what order they will complete, but we want to process them as they are finished. This assumes that we don't need to wait for them all before we start processing the results.

This pattern is useful when:
  • Some tasks may fail - discard / retry
  • Overlap computations with result processing - aka hide latency
There is no built-in feature for this; it is a pattern that you can implement if this is your scenario. Here is one conceptual implementation.

while (tasks.Count > 0)
{
int taskIndex = Task.WaitAny(tasks.ToArray());

// process tasks using tasks[taskIndex].Result;

tasks.RemoveAt(taskIndex);
}

Notice this is NOT a tight while loop that will spin while the tasks are processing. Task.WaitAny() blocks execution of the while loop until a task completes. The loop then runs again, starts another Task.WaitAny(), and continues this way until all tasks have been processed.
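
Here is one way the conceptual loop could be fleshed out into something runnable. It is only a sketch: the work items are simulated with Thread.Sleep, the names (WaitAllOneByOneDemo, ProcessAsCompleted) are invented, and a faulted task is simply skipped rather than retried.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class WaitAllOneByOneDemo
{
    // Hypothetical helper: starts three simulated computations and
    // collects each result as soon as its task completes.
    public static List<int> ProcessAsCompleted()
    {
        var tasks = new List<Task<int>>();
        foreach (int delayMs in new[] { 300, 100, 200 })
        {
            int d = delayMs; // local copy so each lambda captures its own value
            tasks.Add(Task.Factory.StartNew(() => { Thread.Sleep(d); return d; }));
        }

        var results = new List<int>();
        while (tasks.Count > 0)
        {
            // Blocks until at least one of the remaining tasks has finished.
            int taskIndex = Task.WaitAny(tasks.ToArray());
            Task<int> finished = tasks[taskIndex];

            // Reading Exception observes a failure; Result is only read on success.
            if (finished.Exception == null)
            {
                results.Add(finished.Result);
            }

            tasks.RemoveAt(taskIndex);
        }
        return results;
    }
}
```

The order of the returned values depends on which task finishes first, so callers should not rely on it.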

Task Composition (One-to-One)

Sometimes we want the completion of one task to trigger the start of another task. As noted before, this can be done with a simple Wait() inside the second task. TPL also offers its own alternative to Wait() for this.

That might look like this

Task t1 = Task.Factory.StartNew(() => { /* code here */ });
Task t2 = Task.Factory.StartNew(() => { t1.Wait(); /* more code here */ });

The above implementation will work, however t2 actually starts BEFORE t1 is finished. This is probably reasonable if we remember to do a t1.Wait() first thing in the code of t2. From an orchestration perspective and responsibility point of view t2 should not be concerned with waiting for t1. This should be at a higher level. This will make testing the code for t2 much easier.

To address the deficiencies noted above, we can use the ContinueWith() method to delay starting t2 until after t1 has finished. One added benefit is that this allows .NET to optimize the scheduling so that t1 and t2 may run on the same thread, which can optimize away the wait entirely.

That code might look like this.

Task t1 = Task.Factory.StartNew(() => { /* code here */ });
Task t2 = t1.ContinueWith((antecedent) => { /* more code here */ });

Note, the parameter antecedent references the task that just finished. This can be used to check the status of the task (t1 in this case), get the result, etc.

For example to get the result from t1 the above code would be changed to look like this:

Task<int> t1 = Task.Factory.StartNew(() => { /* code here */ return 42; });
Task t2 = t1.ContinueWith((antecedent) => { var result = antecedent.Result; /* more code here */ });
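
A continuation can also return a value of its own, in which case ContinueWith() gives back a Task<TResult>. The following is a minimal sketch with made-up names and numbers:

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Hypothetical helper: chains a second computation onto the first
    // and returns the final result.
    public static int RunChain()
    {
        Task<int> t1 = Task.Factory.StartNew(() => 20);

        // antecedent is t1; the continuation starts only after t1 has finished.
        Task<int> t2 = t1.ContinueWith(antecedent => antecedent.Result + 1);

        return t2.Result;
    }
}
```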

Task Composition (Many-to-one)

TPL has alternatives for WaitAll() and WaitAny() as well. Instead of using WaitAll() and WaitAny(), you can use ContinueWhenAll() and ContinueWhenAny() respectively.

ContinueWhenAll

Using our example for WaitAll() we had:

Task<decimal> t_min = Task.Factory.StartNew(() => {return data.Price.Min();});
Task<decimal> t_max = Task.Factory.StartNew(() => {return data.Price.Max();});

Task.WaitAll(new Task[] {t_min, t_max});
Console.WriteLine(t_min.Result);
Console.WriteLine(t_max.Result);

We could convert that to use ContinueWhenAll() as follows:

Task<decimal> t_min = Task.Factory.StartNew(() => {return data.Price.Min();});
Task<decimal> t_max = Task.Factory.StartNew(() => {return data.Price.Max();});

Task[] tasks = new Task[] {t_min, t_max};

Task.Factory.ContinueWhenAll(tasks, (setOfTasks) => 
{
Console.WriteLine(t_min.Result);
Console.WriteLine(t_max.Result);
});

ContinueWhenAny

Using our example for WaitAny() we had:

Task<string> t_msn = Task.Factory.StartNew(() => {return SearchMsn();});
Task<string> t_google = Task.Factory.StartNew(() => {return SearchGoogle();});

Task<string>[] tasks = new Task<string>[] {t_msn, t_google};
int firstIndex = Task.WaitAny(tasks);
Task<string> firstTask = tasks[firstIndex];
Console.WriteLine(firstTask.Result);

We can convert that to use ContinueWhenAny() as follows:

Task<string> t_msn = Task.Factory.StartNew(() => {return SearchMsn();});
Task<string> t_google = Task.Factory.StartNew(() => {return SearchGoogle();});

// Typed as Task<string>[] so that firstTask is a Task<string> with a Result.
Task<string>[] tasks = new Task<string>[] {t_msn, t_google};

Task.Factory.ContinueWhenAny(tasks, (firstTask) =>
{
Console.WriteLine(firstTask.Result);
});

The code is similar to using WaitAll() and WaitAny(), but again a bit easier to test because the responsibility for the orchestration sits at a higher level. It also saves a few lines of code.
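
As with ContinueWith(), the continuation passed to ContinueWhenAll() can produce a result. The sketch below uses a hypothetical in-memory price array in place of data.Price and states the generic type arguments explicitly so it is clear which overload is meant.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ContinueWhenAllDemo
{
    // Hypothetical helper: computes max - min once both tasks have finished.
    public static decimal PriceRange(decimal[] prices)
    {
        Task<decimal> tMin = Task.Factory.StartNew(() => prices.Min());
        Task<decimal> tMax = Task.Factory.StartNew(() => prices.Max());

        Task<decimal>[] tasks = new Task<decimal>[] { tMin, tMax };

        // The continuation receives the completed tasks in the same order
        // as the input array and returns a new value.
        Task<decimal> range = Task.Factory.ContinueWhenAll<decimal, decimal>(
            tasks,
            done => done[1].Result - done[0].Result);

        return range.Result;
    }
}
```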

Reference

Content is based on Pluralsight video called Introduction to Async and Parallel Programming in .NET 4 by Dr. Joe Hummel. 

Monday, September 11, 2017

Closures and Race Conditions

Closure = code + supporting data environment

int x = 123;

Task.Factory.StartNew(() =>
{
    x = x + 1;
}
);
...
Console.WriteLine(x);

In the example above x is called a closure variable. The compiler has to figure out how to pass x to the lambda expression. It does this by reference, not by value. Think of by reference as pointers to the same memory location for the shared variable x.

So, everywhere you see x in the above example, it refers to the same x. This means that if x is changed by any of the code, including the code in the separate thread, we have a race condition (since the variable is both read and written).

The program could print either 123 or 124, depending on whether the task's increment runs before the main thread reads x. The result is not consistent.

This means we need to take steps to make sure only one thread is reading or writing the variable at a time.
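
One common way to do that is a lock around every access to the shared variable. The sketch below is an illustration only (ClosureRaceDemo and the gate object are invented names); it also waits for all tasks before reading x, which removes the uncertainty about when the read happens.

```csharp
using System;
using System.Threading.Tasks;

public static class ClosureRaceDemo
{
    // Hypothetical helper: increments a shared closure variable from
    // several tasks, guarding each update with a lock.
    public static int IncrementSafely(int taskCount)
    {
        int x = 0;
        object gate = new object();
        var tasks = new Task[taskCount];

        for (int i = 0; i < taskCount; i++)
        {
            tasks[i] = Task.Factory.StartNew(() =>
            {
                // Only one task at a time may read and write x.
                lock (gate)
                {
                    x = x + 1;
                }
            });
        }

        // Wait for every increment to finish before reading x.
        Task.WaitAll(tasks);
        return x;
    }
}
```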

Behind the scenes

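The compiler generates code roughly like the following for the example above (simplified; the real generated names contain characters that are not valid in C# source):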


private sealed class c__DisplayClass2
{
public int x;
public void b__0()
{
this.x = this.x + 1;
}
}

var cgobj = new c__DisplayClass2();
cgobj.x = 123;
Action action = new Action(cgobj.b__0);
Task.Factory.StartNew(action);
...

Console.WriteLine(cgobj.x);

Reference

Content is based on Pluralsight video called Introduction to Async and Parallel Programming in .NET 4 by Dr. Joe Hummel. 

Lambda Expression

With Parameters

A method is a named block of code

public void DoSomething(parameters)
{
    code...
}

Lambda expression = unnamed block of code

Lambda expressions are easy to identify because they contain the => operator.

(parameters) => 
{
     code...
}

No Parameters

A method with NO parameters would look like this

public void DoSomething()
{
    code...
}

The lambda expression would look like

() => 
{
     code...
}


Behind the scenes

Lambda expression = custom class and delegate

The compiler will create a class with an arbitrary name, containing a method with an arbitrary name and the same signature as the lambda expression.

Actions

You can think of Actions as pointers to methods.
Action is a delegate type; an Action instance refers to a method that takes no parameters and returns nothing.
When a method signature requires a parameter of type Action, you can pass it a lambda expression.

NOTE: To create an Action you can also use a named method as shown below. Pass the method name without parentheses; writing DoSomething() would call the method instead of referring to it.
Action action = new Action(DoSomething);
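
To make the two ways of creating an Action concrete, here is a small sketch. ActionDemo, its counter field, and DoSomething are hypothetical; the point is only that a method name and a lambda are interchangeable wherever an Action is expected, including Task.Factory.StartNew().

```csharp
using System;
using System.Threading.Tasks;

public static class ActionDemo
{
    private static int counter;

    private static void DoSomething()
    {
        counter += 1;
    }

    // Hypothetical helper: invokes Actions created from a method and from a lambda.
    public static int Run()
    {
        counter = 0;

        Action fromMethod = DoSomething;                  // method group, no parentheses
        Action fromLambda = () => { counter += 10; };     // lambda expression

        fromMethod();   // counter becomes 1
        fromLambda();   // counter becomes 11

        // An Action can be handed to a task just like a lambda.
        Task t = Task.Factory.StartNew(fromMethod);
        t.Wait();       // counter becomes 12

        return counter;
    }
}
```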

Reference

Content is based on Pluralsight video called Introduction to Async and Parallel Programming in .NET 4 by Dr. Joe Hummel. 

Basic overview of Tasks and Task-based Programming in .NET 4+

What and Why

Async Programming - hide latency of potentially long-running or blocking operations (i.e. I/O) by starting them in the background.

Parallel Programming - reduce time of CPU-bound computations by dividing workload & executing simultaneously.

Here we are talking about Tasks and Task Parallel Library (TPL) which gives us:
  • Cancelling
  • Easier Exception Handling
  • Higher-level constructs
  • ...

NOTE: Before this we had
  • Threads
  • Asynchronous Programming Model (e.g. async delegate invocation)
  • Event-based Async Pattern (e.g. BackgroundWorker class)
  • QueueUserWorkItem

Async / Parallel Components of .NET 4

Task Parallel Library (TPL) - library of functions for tasks and the notion of a task
Task Scheduler - responsible for mapping tasks to available worker threads
Resource Manager - manages pool of worker threads
Parallel LINQ (PLINQ) - like LINQ, but runs in parallel
Concurrent Data Structures - queue, bag, dictionary

Use Cases

Interactive UI
In the UI if there is a non-async chunk of code executing in the UI thread then while the computation is running the UI will be unresponsive. For example you can't drag the window properly or click any other buttons, etc.

Processing requests in parallel such as a website or doing independent computations.

Creating a task

using System.Threading.Tasks;
Task t = new Task( code );
t.Start();

code = computation to perform
Start() = tells .NET that the task *can* start, then returns immediately. The program "forks" and now there are two code streams that are executing concurrently (the original and t).

Task

A task = object representing an ongoing computation
Provides:

  • Check status
  • wait
  • harvest results
  • store exceptions
Think of a task as an object having the following properties
  • Status
  • Result
  • Exception

Types of Tasks

Code Tasks

Executes given operation.
Example: 
Task t1 = Task.Factory.StartNew(() => { /* code */});

Facade Tasks

Task over existing operation. You use a facade task to provide a common API to the different task technologies that have been used over the years. So, instead of rewriting existing async code you can still benefit from the higher level constructs of a Task based api.
Example: 
var op = new TaskCompletionSource<T>();
Task t2 = op.Task;
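
As an illustration of a facade task, the sketch below wraps ThreadPool.QueueUserWorkItem, one of the older technologies, behind a Task<int>. FacadeDemo and FromThreadPool are invented names and this is only one possible way to wire it up: the existing operation runs as before, and the TaskCompletionSource is completed by hand when it finishes or fails.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FacadeDemo
{
    // Hypothetical helper: exposes a thread-pool work item as a Task<int>.
    public static Task<int> FromThreadPool(Func<int> work)
    {
        var op = new TaskCompletionSource<int>();

        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                // Completing the source moves the task to RanToCompletion.
                op.SetResult(work());
            }
            catch (Exception ex)
            {
                // Storing the exception moves the task to Faulted.
                op.SetException(ex);
            }
        });

        // No code is attached to this task; it only reflects the state set above.
        return op.Task;
    }
}
```

Callers can then use Wait(), Result, and continuations on the returned task exactly as they would with a code task.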

Execution model

  • Code-based tasks are executed by a thread on some processor
  • Thread is dedicated to task until task completes.
  • If there is only one core then the threads share the core. This has extra overhead, but allows the UI and tasks to run concurrently. The UI isn't intensive, so sharing one core and switching between the two is not a problem.
  • Ideally there are multiple cores and each task runs on a different core such that they are running concurrently and in parallel with less overhead.

Actions

When you create a new Task object one of the signatures accepts an Action as the parameter. An easy way to pass your chunk of code as an Action is with a Lambda expression as shown below.

Task t = new Task( () => { /* many lines of code here */ } );
t.Start();

Equivalent, but slightly more efficient way of creating the task and starting it in one line:

Task t = Task.Factory.StartNew( () => { /* many lines of code here */ } );

The multiple lines of code will now execute in a background thread.

Reference

Content is based on Pluralsight video called Introduction to Async and Parallel Programming in .NET 4 by Dr. Joe Hummel. 

Friday, September 8, 2017

How to Mock Entity Framework when using Repository pattern and async methods

What we are testing

Imagine you are using the repository pattern or something similar and it is implemented something like this. I like this because the DbContext (InvoiceModel is a subclass of it) is held just for the duration of this method. This is a very simple example with no parameters or corresponding Where() being used.

public class InvoiceRepository
    {
        public async Task<IEnumerable<Invoice>> GetInvoicesAsync()
        {
            using (var ctx = new InvoiceModel())
            {
                return await ctx.Invoices.ToListAsync();
            }
        }
    }

The first challenge and solution

The problem with this is that it is not really possible to mock out the Entity Framework, because we can't get a reference to the DbContext (InvoiceModel) since it is created inside the method. That is no problem: we can use the Factory pattern, which keeps the advantage of holding the context just for the duration of this method but also allows us to mock it. Here is how we might amend the above InvoiceRepository class to use an InvoiceModelFactory.


public class InvoiceRepository
    {
        private IInvoiceModelFactory _invoiceModelFactory;

        public InvoiceRepository(IInvoiceModelFactory invoiceModelFactory)
        {
            _invoiceModelFactory = invoiceModelFactory;
        }


        public async Task<IEnumerable<Invoice>> GetInvoicesAsync()
        {
            using (var ctx = _invoiceModelFactory.Create())
            {
                return await ctx.Invoices.ToListAsync();
            }
        }
    }

What's the factory pattern?

The factory pattern is useful for delaying the creation of an object (our InvoiceModel in this case). If we also create an interface for the factory (IInvoiceModelFactory), it gives us the ability to swap the factory in testing for one that creates whatever kind of InvoiceModel we want, such as a mock.

public interface IInvoiceModelFactory
    {
        InvoiceModel Create();
    }

public class InvoiceModelFactory : IInvoiceModelFactory
    {
        public InvoiceModel Create()
        {
            return new InvoiceModel();
        }
    }

Mock the Entity Framework 

When I say mock the Entity Framework, it really ends up being the DbContext, which is InvoiceModel in our case. I'm using NSubstitute, but any mocking framework should work. To help with mocking the Entity Framework I recommend using EntityFramework.NSubstitute. There are versions of it for most mocking frameworks. It provides the implementation of SetupData() below.

NB. If you are using a newer version of NSubstitute than EntityFramework.NSubstitute requires you can get the source and build it yourself. It is really only 6 files.

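Helper methods for improved readability of tests

There are some methods I created to wire up the required mocks and make the tests easier to read. Note that NSubstitute can only intercept virtual members, so the Invoices property on InvoiceModel must be declared virtual for context.Invoices.Returns(set) to take effect.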

private IInvoiceModelFactory GetSubstituteInvoiceModelFactory(List<Invoice> data)
{
var context = GetSubstituteContext(data);
var factory = Substitute.For<IInvoiceModelFactory>();
factory.Create().Returns(context);
return factory;
}

private InvoiceModel GetSubstituteContext(List<Invoice> data)
{
var set = GetSubstituteDbSet<Invoice>().SetupData(data);
var context = Substitute.For<InvoiceModel>();
context.Invoices.Returns(set);
return context;
}

private DbSet<TEntity> GetSubstituteDbSet<TEntity>() where TEntity : class
{
return Substitute.For<DbSet<TEntity>, IQueryable<TEntity>, IDbAsyncEnumerable<TEntity>>();
}

Writing the Test

Now it is pretty straightforward to write the actual test. We create the invoice data that would be in the database as a List of Invoices.

[TestMethod]
public async Task GetInvoicesAsync_ReturnsInvoices()
{
//Arrange
var invoices = new List<Invoice>
{
new Invoice(),
new Invoice()
};

var factory = GetSubstituteInvoiceModelFactory(invoices);
var invoiceRepo = new InvoiceRepository(factory);

//Act
var result = await invoiceRepo.GetInvoicesAsync();

//Assert
Assert.IsTrue(result.Any());
}

That is it. :)

Friday, September 1, 2017

Testing Private Methods

There are times when it is much easier to test a private method instead of the public method. I could argue this is not a good idea, but let's assume this is what we want to do.


Option 1: Reflection

You can access anything via Reflection, but it can be a bit difficult to read, particularly for a test. This is probably the hardest to figure out and I would not recommend it.
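
For comparison with the other options, a bare-bones reflection call might look like the sketch below. The Calculator class here is a stand-in with a private Add method, and ReflectionDemo is an invented helper; neither is taken from a real project.

```csharp
using System;
using System.Reflection;

// Hypothetical class under test with a private method.
public class Calculator
{
    private int Add(int a, int b)
    {
        return a + b;
    }
}

public static class ReflectionDemo
{
    // Hypothetical helper: looks up the private instance method by name and invokes it.
    public static int InvokePrivateAdd(Calculator calculator, int a, int b)
    {
        MethodInfo method = typeof(Calculator).GetMethod(
            "Add",
            BindingFlags.Instance | BindingFlags.NonPublic);

        // Arguments are passed as an object array; the result comes back boxed.
        return (int)method.Invoke(calculator, new object[] { a, b });
    }
}
```

The method name is a plain string, so a rename of Add will only be caught when the test runs, which is part of why this option is harder to maintain.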

Option 2: PrivateObject

You can use PrivateObject to invoke the private method in your unit test. 

The syntax would be something like this:

var calculator = new Calculator();
var privateLogic = new PrivateObject(calculator);
privateLogic.Invoke("Add", 1, 1);

To make this easier and more user friendly you could wrap this logic up into a method. Going a step further, you could create an extension method for Calculator in your test project, called Add, with the same parameters as the private Add method. Then from the test's perspective it would look as if the private method were public.


Option 3: ReflectionMagic

Another, more intuitive option is ReflectionMagic, available on NuGet, which uses dynamic objects to expose private bits. It can be used as a syntactically easy way to access most anything you can reach using Reflection. The upside is that you don't need to do anything special to access the private method. Unfortunately there is no compiler checking or IntelliSense to help you with the parameters to pass, but no special coding is needed.

You could use it something like this:

var calculator = new Calculator();
calculator.AsDynamic().Add(1,1);

This is so easy, and requires so little special code, that it probably is not worth writing the wrapper method I described for the PrivateObject option.

NOTE: I had mixed results with ReflectionMagic, but the same raw source code did work.