
Building a Predictive Insights Factory

When transforming data into something useful, and in particular something predictive, the process can seem straightforward. In principle, there should be three steps:

 

A simple approach might involve three basic steps.


Simple, right? Well, it is rarely that easy. And even if it were, remember that the end result is driven by the questions we are trying to answer. What if those questions change? What if we need more data? What if we want to change what we are predicting and where we make those predictions? Each of the three basic steps needs to be part of a system that is set up for iterative improvement. At DataInsighter, we use our software development experience to build a stable, maintainable and scalable system that can easily be adapted over time. Consider the process shown below:

 

By dividing each step of the process into smaller parts, each section can be revisited and improved.


Now we see that the process is made up of many pieces, each of which should be continually improved. Using a combination of off-the-shelf tools and custom software, we want to build an insights factory that we can mold over time. Looking more closely, each of our original three steps has been expanded.

 

Data Management

Cleaning the data is only the first step: it should then be transformed and stored in a way that is easy to share and maintain. Once we have our cleaned data, we can look at our goals and decide whether the information is available in a form that will make an impact.
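As a rough illustration of the clean, transform and store stages described above, here is a minimal Python sketch. The column names, the sample rows and the choice of JSON Lines as the storage format are all assumptions made for the example, not a description of any particular system.

```python
import csv
import io
import json

# Hypothetical raw input; in practice this would come from a file or database.
RAW_CSV = """customer_id,signup_date,monthly_spend
1,2021-01-05, 42.50
2,2021-02-11,
3,2021-03-02,17.00
"""

def clean(rows):
    """Drop rows with a missing spend value and strip stray whitespace."""
    cleaned = []
    for row in rows:
        spend = row["monthly_spend"].strip()
        if spend:
            cleaned.append({**row, "monthly_spend": spend})
    return cleaned

def transform(rows):
    """Convert fields to typed values that downstream steps can rely on."""
    return [
        {
            "customer_id": int(row["customer_id"]),
            "signup_month": row["signup_date"][:7],
            "monthly_spend": float(row["monthly_spend"]),
        }
        for row in rows
    ]

def store(rows):
    """Serialize to JSON Lines (an assumed format) so the result is easy to share."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)

raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
stored = store(transform(clean(raw_rows)))
print(stored)
```

Keeping each stage as its own small function means any one of them can be revisited and replaced without touching the others.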

 

Data Analysis

When it comes to modeling, rarely do we know exactly which type of algorithm best suits our needs. Prototyping is essential, and it too should stay focused on the broader goals of the system. Where will the model be deployed? Does it need to generate results in real time, or are we analyzing data in batches? What level of performance are we trying to achieve, and how can we measure it?
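One way to make prototyping concrete is to score several candidate models against the same held-out data with the same metric. The sketch below does this for two deliberately simple candidates, a mean baseline and a least-squares line. The toy data, the candidates and the choice of mean absolute error are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical toy data: (feature, target) pairs.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1),
        (5, 9.8), (6, 12.2), (7, 13.9), (8, 16.1)]
train, holdout = data[:6], data[6:]

def fit_mean(train):
    """Baseline: always predict the average training target."""
    avg = mean(y for _, y in train)
    return lambda x: avg

def fit_line(train):
    """Least-squares line through the training points."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in train)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mean_absolute_error(model, rows):
    """Average absolute gap between predictions and observed targets."""
    return mean(abs(model(x) - y) for x, y in rows)

# Every candidate is trained and scored the same way, so results are comparable.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {name: mean_absolute_error(fit(train), holdout)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Adding a new candidate is a one-line change to the `candidates` dictionary, which keeps the cost of trying another idea low.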

 

Data Reporting

Once we have a model (or models), it is deployed, monitored and visualized. So far we have focused on the data collected to build the model, but what about the data the model is generating? The impact of the system must be measured and quantified. At this point, we can start asking whether we have really achieved our original goal, and then improve or extend our system to meet the next challenge.
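To illustrate what measuring the model's own output might look like, the sketch below records each prediction and scores it once the real outcome arrives. The `PredictionLog` class, the order keys and the review threshold are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field

@dataclass
class PredictionLog:
    """Records predictions so they can be scored once real outcomes arrive."""
    predictions: dict = field(default_factory=dict)
    outcomes: dict = field(default_factory=dict)

    def record_prediction(self, key, value):
        self.predictions[key] = value

    def record_outcome(self, key, value):
        self.outcomes[key] = value

    def mean_absolute_error(self):
        """Average error over keys that have both a prediction and an outcome."""
        scored = self.predictions.keys() & self.outcomes.keys()
        if not scored:
            return None
        total = sum(abs(self.predictions[k] - self.outcomes[k]) for k in scored)
        return total / len(scored)

log = PredictionLog()
log.record_prediction("order-1", 10.0)
log.record_prediction("order-2", 20.0)
log.record_prediction("order-3", 30.0)  # outcome not yet known
log.record_outcome("order-1", 12.0)
log.record_outcome("order-2", 19.0)

error = log.mean_absolute_error()
ERROR_THRESHOLD = 5.0  # hypothetical level above which the model is revisited
needs_review = error is not None and error > ERROR_THRESHOLD
print(error, needs_review)
```

A running error like this gives a simple, quantified signal for deciding when the system should be improved or extended.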

Predictive modeling is not a one-shot process, and any serious system should be built to reflect this. Like good scientists, we want to generate and test multiple hypotheses and let intermediate results inform our next steps. With the right combination of software engineering, statistics and domain expertise, we can unlock the power of your data.