Advanced Analytics
Advanced analytics goes beyond the historical reporting and data aggregation of traditional business intelligence (BI), and uses mathematical, probabilistic, and statistical modeling techniques to enable predictive processing and automated decision making.
Advanced analytics solutions typically involve the following workloads:
- Interactive data exploration and visualization
- Machine learning model training
- Real-time or batch predictive processing
Most advanced analytics architectures include some or all of the following components:
- Data storage. Advanced analytics solutions require data to train machine learning models. Data scientists typically need to explore the data to identify its predictive features and the statistical relationships between those features and the value they predict (known as the label). The predicted label can be a quantitative value, like the financial value of something in the future or the duration of a flight delay in minutes. Or it might represent a categorical class, like “true” or “false,” “flight delay” or “no flight delay,” or categories like “low risk,” “medium risk,” or “high risk.”
- Batch processing. To train a machine learning model, you typically need to process a large volume of training data. Training the model can take some time (on the order of minutes to hours). This training can be performed using scripts written in languages such as Python or R, and can be scaled out to reduce training time using distributed processing platforms like Apache Spark hosted in HDInsight or a Docker container.
- Real-time message ingestion. In production, many advanced analytics solutions feed real-time data streams to a predictive model that has been published as a web service. The incoming data stream is typically captured in some form of queue, and a stream processing engine pulls the data from this queue and applies the prediction to the input data in near real time.
- Stream processing. Once you have a trained model, prediction (or scoring) is typically a very fast operation (on the order of milliseconds) for a given set of features. After capturing real-time messages, the relevant feature values can be passed to the predictive service to generate a predicted label.
- Analytical data store. In some cases, the predicted label values are written to the analytical data store for reporting and future analysis.
- Analysis and reporting. As the name suggests, advanced analytics solutions usually produce some sort of report or analytical feed that includes predicted data values. Often, predicted label values are used to populate real-time dashboards.
- Orchestration. Although the initial data exploration and modeling are performed interactively by data scientists, many advanced analytics solutions periodically retrain models with new data, continually refining the accuracy of the models. This retraining can be automated using an orchestrated workflow.
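To make the flow between these components concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the feature name `departure_backlog`, the toy training rows, the risk thresholds, and every function name are hypothetical. An in-memory `deque` stands in for the message queue, a list stands in for the analytical data store, and a one-feature least-squares fit stands in for a real machine learning model.

```python
from collections import deque
from statistics import mean


def train(rows):
    """Batch processing: fit delay_minutes = slope * backlog + intercept
    by ordinary least squares on the historical (labeled) rows."""
    xs = [r["departure_backlog"] for r in rows]
    ys = [r["delay_minutes"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def predict(model, features):
    """Scoring: apply the trained model to one message's feature values."""
    return model["slope"] * features["departure_backlog"] + model["intercept"]


def risk_class(delay_minutes):
    """Turn the quantitative label into a categorical one.
    The 15- and 45-minute thresholds are arbitrary, for illustration only."""
    if delay_minutes < 15:
        return "low risk"
    if delay_minutes < 45:
        return "medium risk"
    return "high risk"


def process_stream(model, queue, store):
    """Stream processing: pull each message from the queue, score it,
    and write the predicted label to the analytical data store."""
    while queue:
        message = queue.popleft()
        delay = predict(model, message)
        store.append({**message,
                      "predicted_delay": delay,
                      "risk": risk_class(delay)})


# Data storage: historical rows with a known label (toy data).
history = [
    {"departure_backlog": 0, "delay_minutes": 5},
    {"departure_backlog": 10, "delay_minutes": 25},
    {"departure_backlog": 20, "delay_minutes": 45},
]

# Batch processing: train once on the stored data.
model = train(history)  # slope 2.0, intercept 5.0 for this toy data

# Real-time message ingestion: unlabeled messages waiting in a queue.
incoming = deque([
    {"departure_backlog": 2},
    {"departure_backlog": 15},
    {"departure_backlog": 30},
])

# Stream processing feeding the analytical data store.
analytical_store = []
process_stream(model, incoming, analytical_store)

# Orchestration: a scheduled job would retrain once new labeled data arrives.
new_rows = [{"departure_backlog": 30, "delay_minutes": 70}]
retrained_model = train(history + new_rows)

# Analysis and reporting: a dashboard would read from the store.
for row in analytical_store:
    print(row["departure_backlog"], row["predicted_delay"], row["risk"])
```

In a production system each stand-in would be replaced by a real service: training would typically run on a distributed platform, the model would be published behind a web service, and retraining would be triggered by a workflow scheduler rather than called inline.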
We can help you get your data in order and identify what’s most valuable to your business. We’ll start with understanding your business objectives and your vision for growth, underpinned with a solid understanding of your data universe and your existing business processes. And we’ll help you transform your organization into an intelligent enterprise, ready for the future.