A data science life cycle is an iterative set of steps you take to deliver a data science project or analysis. Because every project and every team is different, every specific life cycle is different too. However, most data science projects tend to flow through the same general sequence of steps.
We’ll illustrate how a hypothetical project progresses through a typical data science life cycle framework.
Some data science life cycles narrowly focus on just the data, modeling, and assessment steps. Others are more comprehensive and start with business understanding and end with deployment.
The one we’ll walk through is more extensive still because it also covers operations. It also emphasizes agility more than other life cycles do.
This life cycle has five steps:
These are not linear data science steps. You will start with step one and then proceed to step two. However, from there, you should naturally flow among the steps as necessary.
Several small iterative steps are better than a few larger comprehensive phases.
The generic life cycle above is one of dozens (hundreds?) you can find online. We’ll explore some of the more popular ones.
These three classic data mining processes have been grouped under the general umbrella of data science life cycles. All of them date from the 1990s, and they tend to be narrower in scope. Specifically, the KDD Process and SEMMA focus on the data problem and not the business problem. Only CRISP-DM has a deployment phase. None of them has an operations phase.
The life cycles below are more modern approaches that are specific to data science. Like the data mining processes, OSEMN focuses on the core data problem. Most of the others, especially Domino’s, tend to focus on the fuller solution.
There are numerous data science life cycles to choose from. Most communicate the same basic steps necessary to deliver a data science project but often have a distinct angle.
The life cycle presented here stresses the need for agility and the broader data science product life cycle.
Regardless of the life cycle you use, combine it with a collaboration process so that your team members can coordinate effectively with one another and with stakeholders.
Good luck. This journey is a challenge. But it can be fun. Have a blast in your next data science project!