Showing posts with label project management. Show all posts

Monday, May 25, 2015

Lifecycle of the data analytics project

Data Analytics Project Lifecycle


What is different about data analytics projects


How are data analytics projects (those that build models used for prediction, decision making, classification, etc.) different from traditional software development projects? While the final deliverable is still typically some form of automated software system, the project stages are different.

First of all, such projects have a new essential role: the Data Scientist, a specialist whose skill set is not found in a typical software development team. This person is not a Software Engineer, a Business/System Analyst, or a System Architect. This is a professional who can make sense of raw data, apply statistical and mathematical methods to the datasets to identify hidden relationships in them, and finally validate the candidate models.

Because the success of the whole project depends heavily on the results of the Data Scientist's work, that work largely determines the project lifecycle.

Most popular lifecycle models for data analytics projects


There are several existing lifecycle models for data science and analytics projects. I consider these the most significant:

1. CRISP-DM (Cross-Industry Standard Process for Data Mining), which is adopted by big players such as IBM Corporation with its SPSS Modeler product.
2. EMC Data Analytics Lifecycle, developed by EMC Corporation.
3. SAS Analytical Lifecycle, developed by SAS Institute Inc.

Generic data science related project lifecycle


Although the models mentioned above use slightly different terminology and propose different numbers of lifecycle phases, they have a lot in common. In general, the phases can be described as follows:

1. Everything begins with business domain analysis.
2. The datasets accumulated as a result of business operations are understood and prepared (extracted, transformed, normalized, cleaned up, etc.).
3. A model based on the datasets is planned and built.
4. The model is evaluated and validated (including communicating the results to upper management, since this is a business value validation).
5. The model is operationalized and deployed, including all required software development.

The lifecycle is iterative, and adjacent phases may themselves go through several iterations.

As you can see, only the operationalization phase involves software development in its traditional form, while all the preceding phases are data science work.

Sunday, May 24, 2015

Why not use Data Science for advanced project management?

Data science and big data analytics projects

These days there are a lot of ongoing data science projects that promise and deliver great benefits for businesses in different domains (from telco to retail and so on).
However, project management itself mostly does not benefit from the data that the project generates.

Project plan execution data

During the execution of each project plan there is a constantly growing dataset of day-by-day project execution metrics (current task completion percentage, actual man-hours, velocity, estimated man-hours, current resource allocation and availability, etc.).
Usually there is a more or less static WBS (work breakdown structure) that organizes the identified units of work (tasks) and their functional relationships. The relationships between the tasks and the resources assigned to them are also tracked.
There is also even more static calendar data that records resource availability by date.
Whatever the process methodology (Agile, RUP, etc.), the key underlying metrics, work structure, and calendars are pretty much the same.
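To make the three kinds of data above concrete, here is a minimal sketch in Python of how they might be represented. All class, field, and function names (Task, DailyMetric, ResourceCalendar, projected_total_hours) are hypothetical and chosen for illustration only; they do not come from any particular project management tool. The projection at the end is a simple proportional extrapolation, assumed here only to show how the records combine.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    """One unit of work in the WBS (hypothetical field names)."""
    task_id: str
    name: str
    estimated_hours: float
    parent_id: Optional[str] = None                        # position in the WBS tree
    depends_on: List[str] = field(default_factory=list)    # functional relationships
    assigned_to: List[str] = field(default_factory=list)   # resource ids


@dataclass
class DailyMetric:
    """One day-by-day execution record for a task."""
    day: date
    task_id: str
    percent_complete: float   # 0..100
    actual_hours: float       # cumulative hours spent so far


@dataclass
class ResourceCalendar:
    """Dates on which each resource is NOT available."""
    unavailable: Dict[str, Set[date]] = field(default_factory=dict)

    def is_available(self, resource_id: str, day: date) -> bool:
        return day not in self.unavailable.get(resource_id, set())


def projected_total_hours(task: Task, metric: DailyMetric) -> float:
    """Extrapolate total effort from progress so far (an assumed, naive rule).

    If no progress has been recorded yet, fall back to the original estimate.
    """
    if metric.percent_complete <= 0:
        return task.estimated_hours
    return metric.actual_hours / (metric.percent_complete / 100.0)


if __name__ == "__main__":
    task = Task(task_id="T1", name="Build report", estimated_hours=40.0,
                assigned_to=["alice"])
    today = DailyMetric(day=date(2015, 5, 25), task_id="T1",
                        percent_complete=50.0, actual_hours=30.0)
    print(projected_total_hours(task, today))
```

The point of the sketch is that this structure does not depend on the methodology: a sprint backlog item and a RUP work item would both fit the same Task and DailyMetric shapes.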

Project plan execution with data analytics

To support project management activities, it is possible to automatically monitor and support the execution with data analytics techniques that are widely considered part of the Data Science toolbox.
The areas of project management decision-making support can be as follows:
  1. Key project metrics forecasting (with Time Series Analysis, etc.).
  2. Identifying patterns and correlations in historical project data; proactive alarming based on these patterns (Linear Regression, Logistic Regression, etc.).
  3. What-if analysis (Linear Regression, etc.).
I believe this list could be much longer if elaborated and structured properly. So let's make "The shoemaker's son always goes barefoot" irrelevant to software project development itself :).
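As one illustration of the forecasting and proactive alarming items in the list above, here is a minimal sketch in Python. It fits a straight line to the daily completion percentage by ordinary least squares, which is a deliberately simple stand-in for a proper time series method, and estimates the day on which the task would reach 100%. The function names (fit_line, forecast_completion_day, should_alarm) and the sample numbers are assumptions made up for this example.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_completion_day(days: Sequence[float],
                            percent_complete: Sequence[float]) -> Optional[float]:
    """Day index at which the fitted trend reaches 100%.

    Returns None when the trend is flat or negative (no forecast possible).
    """
    slope, intercept = fit_line(days, percent_complete)
    if slope <= 0:
        return None
    return (100.0 - intercept) / slope


def should_alarm(days: Sequence[float],
                 percent_complete: Sequence[float],
                 deadline_day: float) -> bool:
    """Proactive alarm: True if the forecast misses the deadline."""
    forecast = forecast_completion_day(days, percent_complete)
    return forecast is None or forecast > deadline_day


if __name__ == "__main__":
    # Made-up sample: day index and cumulative completion percentage.
    days = [1, 2, 3, 4, 5]
    progress = [10.0, 20.0, 30.0, 40.0, 50.0]
    print(forecast_completion_day(days, progress))
    print(should_alarm(days, progress, deadline_day=8))
```

A what-if analysis could reuse the same fit: change the inputs (for example, the assumed daily progress after adding a resource) and compare the forecast completion days.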