Monday, March 23, 2015

Make sure my Windows Service is enabled and running

The situation

I recently had a need to keep the ASP.NET State Service (aspnet_state) running. For some reason it was getting disabled and stopped sometimes. I assume a corporate security policy does this periodically, but I don't know exactly how or why it happens. What I do know is that every time the ASP.NET State Service goes down, so do the web applications on my web farm, because they use the ASP.NET State Service.

The Workaround

Luckily, PowerShell makes it very easy to set the StartupType to Automatic and then start a Windows Service. There is no need to test the current state before calling these cmdlets, since they only appear to change something when a change is actually needed.

Set-Service -name aspnet_state -StartupType Automatic
Set-Service -name aspnet_state -Status Running

Save the two above lines in a file with any name you like ending in .ps1. I'll use EnsureStateService.ps1 for this example.

Automating the workaround

Every minute I want to execute the above PowerShell file. That way at most I will have 1 minute of downtime (minus the time it takes to enable and start the ASP.NET State Service). To create a scheduled task in Windows Task Scheduler do the following:

1. Create a task.
2. Set the task to be executed as the SYSTEM user.
3. I used the "Windows Vista, Windows Server 2008" configuration option, but I don't think it matters in this case.
4. For the trigger, I set it to run Daily, recurring every 1 day, repeating the task every 1 minute, and stopping the task if it runs for more than 1 minute.
5. For the Action, do the following:
For the Program/script field, enter PowerShell
For the Add arguments (optional) field, enter the path to the PowerShell script file surrounded by double-quotes. In this example, it is "D:\MDM\EnsureStateService.ps1"
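If you prefer to script the task creation instead of clicking through the Task Scheduler UI, a schtasks command along these lines should create an equivalent task (the task name EnsureStateService is just this example's choice):

```
schtasks /Create /TN "EnsureStateService" /RU SYSTEM /SC MINUTE /MO 1 /TR "PowerShell -ExecutionPolicy Bypass -File D:\MDM\EnsureStateService.ps1"
```

Run it from an elevated command prompt; /RU SYSTEM makes the task run as the System account, and /SC MINUTE /MO 1 repeats it every minute.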

Friday, February 20, 2015

Scrum Activities Overview

Sprint Planning Meeting

First Half

Duration: 2 hours (for a 2-week sprint), but should be scaled to the length of the sprint. For example, 4 hours for a 4-week sprint.
Frequency: Before starting a sprint
Attendees:
  • Scrum Master
  • Development Team
  • Product Owner
  • Stakeholders

Input:
  • Product Backlog (user stories) that is prioritized and estimated in terms of hours or difficulty
Description: Agree on what should be in the sprint

Second Half

Duration: 2 hours (for a 2-week sprint), but should be scaled to the length of the sprint. For example, 4 hours for a 4-week sprint.
Frequency: Before starting a sprint

Attendees:
  • Scrum Master
  • Development Team
  • Product Owner
  • Stakeholders

Outputs:
  • Sprint Backlog containing the development tasks that the development team thinks can be completed during the sprint
  • Plot of the ideal burn-down chart
Description: Hash out the plan for the sprint

Daily Scrum Meeting (aka Stand-up meeting)

Duration: 15 minutes
Frequency: Daily

Attendees:
  • Scrum Master
  • Development Team
  • Product Owner
  • Stakeholders

Output:
  • Impediments (recorded by the Scrum Master).
    NOTE: Impediments are not resolved during this meeting.

The Scrum Master asks each person on the team three questions:
  • What did you accomplish yesterday?
  • What are you working on today?
  • What impediments are in your way?

Daily Updates

Update Sprint Burndown Chart: At the end of each day each developer updates the number of hours remaining for the task s/he is working on.

Sprint Review Meeting

Duration: Max of four hours
Frequency: At the end of the sprint
Attendees:
  • Scrum Master
  • Development Team
  • Product Owner
  • Stakeholders

Output:
  • Feedback from the product owner and stakeholders
Description: Demonstration of what was implemented in the sprint

Sprint Retrospective Meeting

Duration: Max of 3 hours
Frequency: At the end of the sprint
Attendees:
  • Scrum Master
  • Development Team
  • Product Owner
  • Stakeholders

Outputs:
  • Document:
    • What went well during the sprint?
    • What could be improved in the next sprint?
  • Improve the process as needed
Description: Reflect on the sprint.


Monday, February 2, 2015

Grant Permissions to create, alter, or view definition of a view in SQL Server

Let's assume you have a SQL Server user called reporting_user that has select-only access to some tables. To give that user the ability to create views, use the following statements (be sure to change reporting_user to the name of your user).

grant create view to reporting_user
grant select on schema::dbo to reporting_user
grant view definition on schema::dbo to reporting_user
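With those grants in place, reporting_user should be able to create and inspect views in the dbo schema. A quick sanity check might look like the following (the view and table names here are made up for illustration):

```sql
create view dbo.reporting_sample as
select Id, Name from dbo.SomeTable
```

If the grants are missing, this fails with a "CREATE VIEW permission denied" error instead.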

Friday, January 16, 2015

HBase Basics

What is HBase

HBase is the Hadoop database, modeled after Google's Bigtable. It is an Apache Top Level Project, which means it is open source; it is also embraced and supported by vendors such as IBM. It is used by industry heavy hitters like Facebook and Twitter to access Big Data. It is written in Java, but there are APIs for other languages as well. It has the following characteristics:

  • Sparse - data is scattered
  • Distributed - spread out over commodity hardware
  • Persistent - data will be saved
  • Multi-dimensional - may be multiple versions of data
  • Sorted Map - need a key to access the data

NoSQL Technology

  • HBase is a NoSQL datastore
  • NoSQL stands for "Not only SQL"
  • Not intended to replace a RDBMS
  • Suited for specific business needs that require
    • Massive scaling to terabytes and petabytes and larger
    • Commodity Hardware used for scaling out solution
    • Not knowing schema upfront

Why HBase

  • HBase CAN replace costly implementations of RDBMS for BigData applications, but is not meant to replace RDBMS entirely because 
    • It doesn't support SQL
    • Not for transactional processing
    • Does not support table joins
  • Horizontal scaling for very large data sets
  • Ability to add commodity hardware without interruption of service
  • Don't know data types in advance. This allows for a flexible schema.
  • Need RANDOM read/write access to BigData. Reads and writes are very quick and efficient.
  • Sharding - automatically partitioning and spreading the data across nodes
NOTE: Everything is stored as an array of bytes (except timestamp which is stored as a long integer).

HBase vs. RDBMS

Hardware architecture
  HBase: Similar to Hadoop; clustered commodity hardware. Very affordable.
  RDBMS: Typically large, scalable multi-processor systems. Very expensive.
Typical database size
  HBase: Terabytes to petabytes - hundreds of millions to billions of rows.
  RDBMS: Gigabytes to terabytes - hundreds of thousands to millions of rows.
Data layout
  HBase: A sparse, distributed, persistent, multi-dimensional, sorted map.
  RDBMS: Row- or column-oriented.
Data types
  HBase: Bytes only.
  RDBMS: Rich data type support.
Transactions
  HBase: ACID support on a single row only.
  RDBMS: Full ACID compliance across rows and tables.
Query language
  HBase: Primitive API commands only, unless combined with Hive or other technologies.
  RDBMS: SQL.
Indexes
  HBase: Row-key only, unless combined with other technologies such as Hive or IBM's BigSQL.
  RDBMS: Yes, on one or more columns.
Throughput
  HBase: Millions of queries per second.
  RDBMS: Thousands of queries per second.
Fault tolerance
  HBase: Built into the architecture. Lots of nodes means each is relatively insignificant; no need to worry about individual nodes.
  RDBMS: Requires configuration of the hardware and the RDBMS with the appropriate high-availability options.

Data Representation Example (RDBMS vs HBase)

An RDBMS table might look something like this:

ID (Primary Key) | LName | FName | Password      | Timestamp
1234             | Smith | John  | Hello, world! | 20130710
5678             | Doe   | Jane  | wysiwyg       | 20120825
5678             | Doe   | Jane  | wisiwig       | 20130916

The logical view of the same data in HBase:

Row-Key | Value (Column-Family, Qualifier, Version)
1234    | info: {'lName': 'Smith', 'fName': 'John'}
        | pwd:  {'password': 'Hello, world!'}
5678    | info: {'lName': 'Doe', 'fName': 'Jane'}
        | pwd:  {'password': 'wisiwig' @ts 20130916,
        |        'password': 'wysiwyg' @ts 20120825}

HBase Physical (How it is stored on disk)

Logical View to Physical View

Let's assume you want to read Row4. You will need data from both physical files. In the case of CF1, you will get two rows, since there are two versions of the data.

HBase Components


Region

  • This is where the rows of a table are stored
  • Each region stores a single column family
  • A table's data is automatically sharded across multiple regions when the data gets too large.

Region Server

  • Contains one or more regions
  • Hosts the tables, performs reads and writes, buffers, etc
  • Clients talk directly to the Region Server for their data.


Master

  • Coordinates the Region Servers
  • Detects the status of the Region Servers and rebalances their load
  • Assigns Regions to Region Servers
  • Multiple Masters are allowed, but only one is the true master, and the others are only backups.
  • Not part of the read/write path
  • Highly available with ZooKeeper


ZooKeeper

  • Critical component for HBase
  • Ensures one Master is running
  • Registers Region and Region server
  • Integral part of the fault tolerance on HBase


HDFS

  • The Hadoop Distributed File System (HDFS) is where the data (physical files) is kept


Client API

  • The Java client API.
  • You can also use SQL if you use Hive to access your data.
Here is how the components relate to each other.

HBase Shell introduction

Starting HBase Instance
$HBASE_HOME/bin/start-hbase.sh

Stopping HBase Instance
$HBASE_HOME/bin/stop-hbase.sh

Start HBase shell
$HBASE_HOME/bin/hbase shell

HBase Shell Commands

See a list of the tables
list

Create a table
create 'testTable', 'cf'
NOTE: testTable is the name of the table and cf is the name of the column family

Insert data into a table
Insert at rowA, column "cf:columnName" with a value of "val1"
put 'testTable', 'rowA', 'cf:columnName', 'val1'

Retrieve data from a table
Retrieve "rowA" from the table "testTable"
get 'testTable', 'rowA'

Delete data from a table
delete 'testTable', 'rowA', 'cf:columnName', ts1
NOTE: ts1 is the timestamp of the cell version to delete

Delete a table:
disable 'testTable'
drop 'testTable'
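One more command worth knowing is scan, which returns every row in a table (fine for small test tables like the one in this example):

```
scan 'testTable'
```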

HBase Clients

HBase Shell - you can do the above CRUD operations using the HBase Shell. However, it will be limiting for more complicated tasks.
Java - you can do the above CRUD operations and more using Java. The work will be executed as a MapReduce job.

NOTE: Some of this material was copied directly from the BigData University online class Using HBase for Real-time Access to your Big Data - Version 2. If you want hands-on labs, more explanation, etc., I suggest you check it out, since all the information in this post comes from there.

Wednesday, December 17, 2014

Query SQL Server from Powershell using ADO.NET objects

If you want to execute a query against a SQL Server database using Powershell there are a couple of options to consider.

1. SQL Server Invoke-Sqlcmd cmdlet

You can always use the Invoke-Sqlcmd cmdlet, but you will need to have the SQL Server client tools installed, get some paths correct, and add the cmdlet so your script can access it.

2. Use ADO.NET objects.

Since ADO.NET has been part of .NET since the beginning and Powershell can access .NET objects, no additional objects are needed to execute a database query.

Here is a function for reuse.

function Invoke-SQL {
    param(
        [string] $dataSource = ".\SQLEXPRESS",
        [string] $username = $(throw "Please specify a username."),
        [string] $password = $(throw "Please specify a password."),
        [string] $database = "master",
        [string] $sqlCommand = $(throw "Please specify a query.")
    )

    $connectionString = "Data Source=$dataSource; " +
            "User Id=$username; Password=$password; " +
            "Initial Catalog=$database"

    $connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
    $command = New-Object System.Data.SqlClient.SqlCommand($sqlCommand, $connection)

    $adapter = New-Object System.Data.SqlClient.SqlDataAdapter $command
    $dataSet = New-Object System.Data.DataSet
    $adapter.Fill($dataSet) | Out-Null

    $connection.Close()

    $dataSet.Tables
}

Here is how you call it.
Invoke-SQL "serverHere" "userHere" "pwdHere" "dbHere" "SELECT GETDATE() AS TimeOfQuery;"  

NOTE: The basis of this script is from here. I did extend it a little bit to work with named users instead of using integrated security.

Friday, November 28, 2014

Pig Basics

What is Pig

Pig is a program that reads files written in PigLatin to generate a MapReduce program automatically. It is generally much easier to use Pig than writing your own MapReduce programs. Grunt is the interactive shell for Pig.

Ways to start Pig

Pig in interactive local mode

Runs in single virtual machine
All files are in the local file system
>pig -x local

Pig in Interactive MapReduce mode

Runs in a Hadoop cluster
Is the default mode
>pig
NOTE: MapReduce mode is the default, so no -x argument is needed (pig -x mapreduce is equivalent)

Pig executing Pig script file in local mode

Script is written in PigLatin
>pig -x local myscript.pig

Pig executing Pig script file in MapReduce mode

Script is written in PigLatin
>pig myscript.pig

Pig Data

Tuple - like a row in a file or a row in a table, except that tuples don't all have to have the same number of fields. They can contain scalar types such as int, chararray, double, etc., or even bags. Example: (Jeff, {apple, orange, pear}). Parentheses are used to indicate the tuple datatype.
Bag - a bag of tuples. Curly braces are used to indicate the bag datatype.
Relation - an outer bag. Generally it is what you get back when you filter, group, sort, join, etc. data. In terms of a database, it is kind of like a view or result set.

Data Type | Description | Example
int       | signed 32-bit integer  | 300
long      | signed 64-bit integer  | 300L or 300l
float     | 32-bit floating point  | 3.2F, 3.2f, 3.2e2f, 3.2E2F
double    | 64-bit floating point  | 3.2, 3.2e2, 3.2E2
chararray | a string               | abcde
bytearray | a blob                 |
tuple     | ordered set of fields  | (4, Brent, 388.25) - kind of like a row in a database
bag       | collection of tuples   | {(4, Brent, 388.25), (20, Amanda, 36.7)} - kind of like multiple rows in a database; could also be thought of as an array, list, collection, etc.
map       | set of key/value pairs |

Pig Latin Basics

Statements terminate with a semi-colon.
/* ... */ is a comment block.
-- is a single-line comment.
Names of relations and fields are case sensitive.
Function names are case sensitive.
Keywords such as LOAD, USING, AS, GROUP, BY, etc. are NOT case sensitive.

Loading Data

A = load '/datadir/datafile' using PigStorage('\t');
NOTE: tab is the default delimiter
NOTE: If the path to a file is a directory then all files in the directory will be loaded

The default is PigStorage, but there is also BinStorage, TextLoader, JsonLoader, and you can code your own loader as well.
You can also define your schema so you can refer to fields by name (f1,f2, f3, etc)
A = load '/datadir/datafile' using PigStorage('\t') as (f1:int, f2:chararray, f3:float);

If you don't specify the schema, you need to reference fields by position. For example, $0 is the first field.
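For example, with no schema declared, the first and third fields can be projected by position (relation names here are illustrative):

```pig
A = load '/datadir/datafile' using PigStorage('\t');
B = foreach A generate $0, $2; -- first and third fields
dump B;
```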


STORE

Opposite of LOAD: writes a relation out to the file system.
Formats: PigStorage(), BinStorage(), PigDump(), JsonStorage()
DUMP writes the results to the screen.


Operators

Arithmetic: +, -, *, /, % (modulo), ? : (bincond)
Boolean: and, or, not
Comparison: ==, !=, <, >, is null, is not null


Parameters

Parameters can be passed into a Pig script via a parameter file or the command line.
Parameters are referenced using $ (for example, $date).

Relational Operators


FILTER

Selects tuples from a relation based on some criteria.
b = filter data by pubyear == 2014;


ORDER BY

Sorts a relation on one or more fields.
b = order data by author ASC;


FOREACH ... GENERATE

Projects fields into a new relation. Under the hood this is just a foreach loop that loops through each of the elements in the data. Use it, for example, to return only a subset of the fields. A calculation can also be done here - for example, arithmetic between fields.


GROUP

Groups together tuples that have the same group key; the group key can be a single field or multiple fields (enclose multiple fields with parentheses). The result of a GROUP is a relation that contains one tuple per group. The tuple has two fields: group, and a bag containing the tuples that share that group key.

data = load '/datadir/datafile' as (f1:int, f2:chararray);
myGroup = group data by f1;



COGROUP

Same as the GROUP operator, but by convention used when grouping multiple (up to 127) relations at the same time. Results are similar to GROUP except that the resulting tuple has one group field and then one field for each relation being cogrouped. So if we are cogrouping two relations, each resulting tuple would be (group, value for relation1, value for relation2), where the values for relation1 and relation2 are bags of tuples, just like with the GROUP operator.
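A minimal sketch, assuming two hypothetical relations (authors and books) that share an authorId field:

```pig
c = cogroup authors by authorId, books by authorId;
-- each tuple in c looks like (authorId, {author tuples}, {book tuples})
dump c;
```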


Allows us to reference a field in a tuple or bag that is outside the scope of the current operator. This can be used with the FOREACH operator.


DISTINCT

Removes duplicate tuples found in a relation.


UNION

Merges the contents of two or more relations.
NOTE: The relations do NOT have to have the same number of fields in them, as they would in SQL.


SPLIT

Partitions a relation into two or more relations based on some condition.


CROSS

Computes the cross product of two or more relations.


JOIN (inner)

Performs an inner join (equijoin) on two or more relations using one or more common field values. Like a SQL join.

JOIN / OUTER (full, right, left)

Performs a join on two or more relations using one or more common fields. Works like you would expect if you are familiar with SQL outer joins.
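Both join forms can be sketched as follows, assuming hypothetical customers and orders relations that share an id field:

```pig
j1 = join customers by id, orders by id;            -- inner join
j2 = join customers by id left outer, orders by id; -- left outer join
```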


Evaluation Functions 


  • COUNT - Counts the number of elements in a bag, excluding null values
  • COUNT_STAR - Computes the number of elements in a bag, including null values
  • MAX - Computes the maximum value in a single-column bag
  • MIN - Computes the minimum value in a single-column bag
  • SUM - Computes the sum of the numeric values in a single-column bag
  • AVG - Computes the average of the numeric values in a single-column bag
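As a quick illustration (relation and field names are made up), counting tuples per group combines GROUP with COUNT:

```pig
grpd = group books by author;
counts = foreach grpd generate group, COUNT(books);
```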


  • CONCAT - Concatenates two columns
  • DIFF - Compares two fields in a tuple
  • IsEmpty - Checks if a bag or map is empty
  • SIZE - Computes the number of elements based on any Pig data type
  • TOKENIZE - splits a string and outputs a bag of words

Math Functions

Based on Java Math class
Samples: ABS, CEIL, etc

String Functions

Based on Java String class

Tuple, Bag, and Map Functions

  • TOTUPLE - converts one or more expressions to tuple
  • TOBAG - converts one or more expressions to type bag
  • TOMAP - converts pairs of expressions into a map

External Type Operators

  • MAPREDUCE - Executes native MapReduce jobs inside a Pig script
  • STREAM - Sends data to an external script or program
  • REGISTER - Registers a JAR file so that the UDFs in the file can be used
  • fs - Invokes any FsShell command from within a script or the Grunt shell
  • exec - Runs a Pig script from within the Grunt shell, e.g. grunt> exec myscript.pig
  • EXPLAIN - Displays the execution plan. Used to review the logical, physical, and MapReduce execution plans


Pig Latin Basics - great reference
Introduction to Pig at the Big Data University - a nice free training class. Nearly all the information above is from this class, in some cases copied.