Sunday, May 31, 2015

Risk Identification in Cloud-based Software Development Projects



Risk Management in Software Development Projects


Managing project risks is a standard area of responsibility for any project manager. Still, it makes sense to recall the main definitions of risk before discussing any specifics of cloud software projects.

According to the ISO 31000:2009, risk is "the effect of uncertainty on objectives".

PMBOK's definition is "Risk is an uncertain event or condition that, if it occurs, has a positive or negative effect on the project's objectives."

So risk is all about uncertainty and its effect.

Does cloud nature add more uncertainty to the software development project?


Of course, cloud infrastructure itself has its own associated risks (the most discussed being security risks in the public cloud, for example). However, I would like to look into the software development project risks that are introduced when the software under development is intended to run in the cloud.

First of all, there are two basic types of clouds, and the risks for projects relying on them are different, so let's consider the risks by type.

Risks introduced to the project by the private cloud 


Let's go through the main (in my opinion) sources of uncertainty one by one. Each of them can introduce multiple risks.

  1. Low "maturity level" of the in-house cloud infrastructure. How comfortable is the organization with its cloud infrastructure? Is this the first project to really rely on it?
  2. Insufficient cloud skill level of internal IT staff. Will it be possible to rely on IT specialists to resolve the possible problems?
  3. Unknown/weak SLA. What are the real service levels for the cloud? Will it be enough to address the project goals?
  4. Other cloud tenants' priority within the organization. Will your resource and communication needs get enough priority for the project to succeed?

Risks introduced to the project by the public cloud 


Let's proceed with the public cloud; I have three specific points (again, each can mean multiple risks caused by it):

  1. External CSP (Cloud Services Provider) dependency. Yes, after all, this is one more third party you depend upon. Public cloud blackouts are still possible, and your link to the CSP is critical too.
  2. Unclear costs, complicated calculations. Modern public clouds (like Amazon Web Services) are notorious for prices that are hard to calculate in advance (yet fully transparent post factum), so your budgeting may not be easy.
  3. Some unexpected limitations can apply. These can be technical constraints like the amount of traffic allowed for your load testing or CPU steal time for some server instance types.
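To make point 2 concrete, here is a minimal Python sketch of an up-front monthly cost estimate. The instance types and hourly rates are hypothetical placeholders I invented for illustration, not real CSP prices:

```python
# Hypothetical hourly rates (illustrative only, not actual CSP pricing)
HOURLY_RATES = {
    "small": 0.05,   # USD per instance-hour
    "large": 0.20,
}

def estimate_monthly_cost(usage_hours, rates=HOURLY_RATES):
    """Estimate a monthly bill from planned instance-hours per instance type."""
    return sum(hours * rates[itype] for itype, hours in usage_hours.items())

# Planned usage: 2 "small" instances full-time (~720 h/month),
# plus 100 hours of a "large" instance for load testing
plan = {"small": 2 * 720, "large": 100}
print(round(estimate_monthly_cost(plan), 2))  # 2*720*0.05 + 100*0.20 = 92.0
```

Real bills add data transfer, storage, and request charges on top, which is exactly why such estimates tend to drift from the post-factum invoice.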

Summary


Of course, the level of cloud risk depends on how confident you are in the cloud-related development skills of your team and in the cloud infrastructure you use. Yet it looks like for the private cloud the risks mostly depend on the level of cloud adoption/maturity within the organization, so if this is not the first project relying on the same private cloud, the chances of success are much higher. For the public cloud, experience matters too, but part of the risks is fully external.

Once we can see the cloud uncertainty sources (and thus identify the cloud development risks), we can come up with a plan to mitigate them. The risk management plan should address those cloud risks, and they should then be monitored and kept under control.

Wednesday, May 27, 2015

Cloud-related projects: when is your backend really based on cloud services?


A person in charge of project leadership, and ultimately project success, should understand what a cloud-based development project really means, as this is a really big point (although this game changer is already mature enough).

Cloud-as-a-Buzzword


Due to the big hype of the last 5-7 years, we encounter the word "cloud" and projects related to "cloud services" development very often. In fact, a large share of such "cloud-based" projects are not about the cloud at all.

Sometimes a backend is called "cloud services" just for marketing purposes (yes, cloud is still cool), but this can also be a lack of understanding of what a "real cloud" means. You can meet a lot of developers (even quite experienced ones) who say something like "the cloud is actually nothing really different from traditional client/server with the backend in some virtualized infrastructure (or even just the Internet)". It is not like this, however.

When do you really develop cloud services, and when is the project about the cloud?


First of all, to be clear, by "cloud services" I mean not the services a CSP (Cloud Service Provider, like Amazon Web Services) provides to consumers, but the services that represent the back end of a system that really benefits from a cloud-based architecture.

Does this mean that any services deployed to a public cloud (AWS, Microsoft Azure, etc.) or a private cloud (based on OpenStack, etc.) automatically become cloud services? Well, no. Only services that can really use the cloud's nature are cloud services.

What is the cloud? I think NIST (National Institute of Standards and Technology) gives a very good definition of this type of infrastructure. According to this widely accepted definition, these are the essential characteristics of cloud infrastructure (with my explanations):


  1. On-demand self-service. This means the cloud consumer should be able to perform all cloud provisioning actions without any help from the cloud provider. This can be fully automated with different APIs or done manually through a "service catalogue" (usually a Web-based user interface).
  2. Broad network access. Stands for the ability to consume the software services in the cloud using different types of clients through standard platform-independent protocols (it is not about broadband connection).
  3. Resource pooling. This is related to the fact that cloud infrastructure is shared among different "tenants" to optimize utilization. This makes it efficient and is usually a big supporting point for the switch to cloud infrastructure.
  4. Rapid elasticity. The truly great ability of the cloud to quickly grow the amount of resources provisioned for "tenants" when needed and to contract them when they are not needed any more.
  5. Measured service. This means that in the cloud all resources are standardized "commodities" and their consumption is metered (like electricity consumption).
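As a toy illustration of "rapid elasticity", here is a Python sketch of the kind of scaling decision a cloud consumer can automate on top of self-service APIs. The function name, thresholds, and bounds are my own assumptions for illustration, not any real CSP API or its defaults:

```python
def desired_instances(current, avg_cpu, low=0.25, high=0.75, min_n=1, max_n=10):
    """Naive autoscaling rule: grow the pool under high average CPU load,
    shrink it under low load, and stay within [min_n, max_n].
    All thresholds here are illustrative assumptions."""
    if avg_cpu > high:
        target = current + 1   # scale out
    elif avg_cpu < low:
        target = current - 1   # scale in
    else:
        target = current       # load is in the comfortable band
    return max(min_n, min(max_n, target))

print(desired_instances(3, 0.90))  # 4 -> scale out under load
print(desired_instances(3, 0.10))  # 2 -> scale in when idle
print(desired_instances(1, 0.10))  # 1 -> respects the lower bound
```

A traditional virtualized data center can run the same services, but this feedback loop only pays off when new capacity can actually be provisioned and released in minutes.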


The most significant enablers for services developers are "rapid elasticity", "broad network access", and "on-demand self-service", to my mind.

OK, how do we use this knowledge?


It is important to understand that the cloud is not just virtualized infrastructure or virtual machines. It is a highly automated, elastic, controllable, efficient environment that should be employed in the right way to benefit from it.

You'd better ask your System Architect, Lead Developer/Expert, or other person in charge of technical decisions in your team how your backend services are compatible with cloud deployment scenarios, how they address scaling in the cloud, what interfaces to them enable service consumption, how cloud-backed automation is leveraged, and so on.

Cloud services can make a real difference for the system/product you develop (especially in terms of scalability and automation) if they really use the features of the cloud infrastructure.



Monday, May 25, 2015

Lifecycle of the data analytics project

Data Analytics Project Lifecycle


What is different about data analytics projects


How are data analytics projects (those related to building models used for predictions, decision making, classification, etc.) different from traditional software development projects? While the final deliverable of the project is still typically some form of automated software system, the project stages are different.

First of all, there is a new essential role in such projects: the Data Scientist, a specialist who possesses a skill set that is not found in a common software development project team. This person is not a Software Engineer, a Business/System Analyst, or a System Architect. This is a professional who can make sense of data arrays, apply statistical and mathematical methods to datasets to identify the hidden relations inside, and finally validate the candidate models.

As the whole project's success greatly depends on the result of the Data Scientist's work, this largely determines the lifecycle of the project.

Most popular models for data analytics projects lifecycle


There are several existing project lifecycle models for data science/analytics projects; I see these as the most significant:

1. CRISP-DM (CRoss Industry Standard Process for Data Mining), widely accepted by big players like IBM Corporation with its SPSS Modeler product.
2. EMC Data Analytics Lifecycle developed by EMC Corporation.
3. SAS Analytical Lifecycle developed by SAS Institute Inc.

Generic data science related project lifecycle


While the popular models mentioned above use slightly different terminology and propose different numbers of lifecycle phases, there are big similarities among all of them. In general, the phases can be described as follows:

1. Everything begins with business domain analysis.
2. Datasets accumulated as a result of business operations are understood and prepared (extracted/transformed/normalized/cleaned up/etc.).
3. A model based on the datasets is planned and built.
4. The model is evaluated/validated (including communication of the results to the upper management as this is a business value validation).
5. Operationalization and deployment of the model, including all required software development.

The lifecycle is iterative, and adjacent phases can themselves go through several iterations.

As you can see, only the operationalization phase is about software development in its traditional form, while all the preceding phases relate to data science.
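As a tiny taste of the "prepare" phase (step 2 above), here is a Python sketch of min-max normalization, one of the routine transformations a Data Scientist applies to a numeric column before modeling; the sample values are invented for illustration:

```python
def min_max_normalize(values):
    """Rescale a numeric column to the [0, 1] range (min-max normalization)."""
    lo, hi = min(values), max(values)
    if lo == hi:                      # constant column: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 30, 40]))  # [0.0, ~0.33, ~0.67, 1.0]
```

In a real project this kind of step runs over many columns and is combined with extraction, cleanup, and outlier handling; the point is that it happens well before any "software development" in the traditional sense.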

Sunday, May 24, 2015

Why not use Data Science for advanced project management?

Data science and big data analytics projects

These days there are a lot of ongoing data science projects that are promising and delivering great benefits for businesses in different domains (from telco to retail and so on).
However, project management itself mostly does not benefit from the value of the data the project generates.

Project plan execution data

During the execution of each project plan there is a constantly growing dataset of day-by-day project execution metrics (current task completion percentage, actual man-hours, estimated man-hours, velocity, current resource allocation/availability, etc.).
Usually there is a more or less static WBS (work breakdown structure) that organizes the identified units of work (tasks) and their functional relationships. The relationships between tasks and the resources assigned to them are tracked as well.
There is also even more static calendar data that includes resource availability by date.
Whatever the process methodology (Agile, RUP, etc.), the key underlying factual metrics, work structure, and calendars are pretty much the same.

Project plan execution with data analytics

To support project management activities, it is possible to automatically monitor and support plan execution with data analytics techniques that are widely considered part of the Data Science toolbox.
The areas of project management decision-making support can be as follows:
  1. Key project metrics forecasting (with Time Series Analysis, etc.).
  2. Identifying project data patterns and correlations based on history; proactive alarming using these patterns (Linear Regression, Logistic Regression, etc.)
  3. What-if analysis (Linear Regression, etc.)
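A minimal sketch of item 1: fitting a linear trend to day-by-day task completion percentages and extrapolating it a few days ahead, with ordinary least squares in pure Python. The daily history values are invented for illustration:

```python
def fit_linear_trend(ys):
    """Ordinary least squares fit of y = a + b*x for x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

def forecast(ys, days_ahead):
    """Extrapolate the fitted trend days_ahead points past the series end."""
    a, b = fit_linear_trend(ys)
    return a + b * (len(ys) - 1 + days_ahead)

# Invented daily completion percentages for one task
history = [5, 12, 18, 26, 33]
print(round(forecast(history, 3), 1))  # 53.8 -> projected completion % in 3 days
```

Of course, real forecasting would use proper time series methods (seasonality, confidence intervals, etc.); the point is that the raw data for such models is already sitting in the project tracking tools.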
I believe this list could be much longer if elaborated and structured sufficiently. So let's make "The shoemaker's son always goes barefoot" irrelevant to software project development itself :).