Data Analytics: How is it Accomplished?
Updated: May 22, 2019
By Tom Goodwin
As noted in the first blog of this series, data analytics (or analysis) is the process of collecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. The concept is straightforward, but the complexity emerges in the details. The specific questions asked and the way data is handled at each of these stages are what make data analytics useful to an organization.
Data is collected from a variety of sources. It is typically gathered from a company’s operations as well as from additional sources such as sensors (e.g., traffic cameras, satellites, recording devices). It may also be obtained through interviews, downloads from online sources, or reading documentation. The data is the result of customer or operational interactions, is collected by the organization’s IT personnel, and is quite often handed over to a Business Data Analyst, who processes it to answer questions and identify trends.
Once data is collected, it may be incomplete, contain duplicates, or contain errors. Data cleaning is the process of preventing and correcting these issues. The need for it varies with the nature of the data and the way it is entered and stored. Common cleaning tasks include record matching, identifying inaccurate data, assessing the overall quality of existing data, deduplication, and column segmentation.
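As a rough illustration of a few of these cleaning tasks, the sketch below deduplicates records, rejects incomplete or invalid values, and splits one column into two. The records, field names, and validation rule are all hypothetical and chosen only for the example; real cleaning rules depend on the data at hand.

```python
# Hypothetical raw records: names, fields, and values are invented for illustration.
raw_records = [
    {"name": "Ada Lovelace", "email": "ADA@example.com ", "age": "36"},
    {"name": "Ada Lovelace", "email": "ada@example.com", "age": "36"},    # duplicate
    {"name": "Alan Turing", "email": "alan@example.com", "age": ""},      # incomplete
    {"name": "Grace Hopper", "email": "grace@example.com", "age": "-5"},  # invalid age
]


def clean(records):
    """Return (cleaned_rows, rejected_rows) after basic cleaning steps."""
    seen_emails = set()
    cleaned, rejected = [], []
    for row in records:
        # Record matching: normalize the key so near-identical records compare equal.
        email = row["email"].strip().lower()

        # Deduplication: keep only the first record seen for each email.
        if email in seen_emails:
            continue
        seen_emails.add(email)

        # Validation (assumed rule): age must be a non-negative whole number.
        age_text = row["age"].strip()
        if not age_text.isdigit():
            rejected.append(row)
            continue

        # Column segmentation: split the single name column into two columns.
        first, _, last = row["name"].partition(" ")
        cleaned.append({"first_name": first, "last_name": last,
                        "email": email, "age": int(age_text)})
    return cleaned, rejected


cleaned, rejected = clean(raw_records)
print(cleaned)
print(len(rejected), "record(s) set aside for manual review")
```

Setting rejected rows aside rather than silently dropping them is one possible design choice; it keeps the questionable records available for correction at the source.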
Data transformation can be divided into the following steps, each applied as needed depending on the complexity of the transformation required.
Data discovery - apply profiling tools or manually written profiling scripts to understand the structure and characteristics of the data and to decide what transformation is needed.
Data mapping - define how fields are mapped, modified, joined, filtered, aggregated, etc., to produce the final desired output.
Code generation - generate executable code to transform the data based on the defined data mapping rules.
Code execution - execute against the data to create the desired output; typically, this code is tightly integrated into the transformation tool.
Data review - ensure the output data meets the transformation requirements.
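The five steps above can be sketched in miniature as follows. This is only an illustrative outline under stated assumptions: the source rows, field names, and mapping rules are hypothetical, and the "code generation" step is stood in for by a function built from the mapping rather than by a real transformation tool emitting code.

```python
# Hypothetical source rows; the field names are invented for illustration.
source_rows = [
    {"cust_id": "17", "region": "east", "amount": "120.50"},
    {"cust_id": "18", "region": "west", "amount": "80.00"},
    {"cust_id": "19", "region": "east", "amount": "42.25"},
]


def is_number(text):
    """Return True if the text can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Data discovery: profile each field (distinct values, whether it looks numeric).
def profile(rows):
    return {
        field: {
            "distinct": len({r[field] for r in rows}),
            "numeric": all(is_number(r[field]) for r in rows),
        }
        for field in rows[0]
    }


# Data mapping: target field -> (source field, conversion to apply).
mapping = {
    "customer_id": ("cust_id", int),
    "region": ("region", str.upper),
    "amount_usd": ("amount", float),
}


# Code generation (stand-in): build an executable transform from the mapping rules.
def build_transform(rules):
    def transform(row):
        return {target: convert(row[source])
                for target, (source, convert) in rules.items()}
    return transform


# Code execution: run the transform over the data, then aggregate by region.
transform = build_transform(mapping)
output_rows = [transform(r) for r in source_rows]
totals = {}
for row in output_rows:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount_usd"]

# Data review: check the output against the (assumed) transformation requirements.
assert len(output_rows) == len(source_rows)
assert all(isinstance(r["amount_usd"], float) for r in output_rows)

print(profile(source_rows))
print(totals)
```

Keeping the mapping as plain data, separate from the code that applies it, mirrors the split between the mapping and code generation steps: the rules can be reviewed or changed without touching the execution logic.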
Modeling and Algorithms
Mathematical formulas or models called algorithms may be applied to the data to identify relationships among the variables, such as correlation or causation. In general terms, models may be developed to evaluate a variable based on other variable(s) in the data, with some residual error depending on model accuracy (i.e., Data = Model + Error).
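To make the idea of residual error concrete, here is a minimal sketch that fits a straight line to a handful of points by ordinary least squares, measures the correlation between the two variables, and splits each observation into a model prediction plus a residual. The observations are hypothetical, and a straight line is just one simple choice of model assumed for the example.

```python
import math

# Hypothetical observations, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


slope, intercept = fit_line(x, y)
predicted = [slope * a + intercept for a in x]             # the model
residuals = [obs - p for obs, p in zip(y, predicted)]      # the error

print(f"model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"correlation: {pearson_r(x, y):.4f}")
for obs, p, e in zip(y, predicted, residuals):
    # Each observation is the model's prediction plus its residual.
    print(f"observed {obs:5.2f} = predicted {p:5.2f} + error {e:+.2f}")
```

The smaller the residuals, the more closely the model tracks the data. A strong correlation on its own does not establish that one variable causes the other.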
Activities – High and Low Level
Some analytics are used to explore the data or find high-level trends (e.g., looking at types of injuries or illnesses by patient age, time of year, and gender). In other cases, the interest is in finding particular points within a data set to answer specific questions. This is considered lower-level analysis. Such low-level analytic activities include:
The variety of data analysis techniques allows an organization to retrieve valuable and actionable information. Understanding the complexity involved makes it easier to get to that information efficiently and effectively. Armed with this information, organizations gain new insights and ideas about future improvements and growth.
How is your organization taking advantage of data analytics? Share your comments with us or send us a message.
About the Author - Tom Goodwin is the Vice President of Marketing at HigherGround. His background in telecommunications and data networking has been augmented with work in data analytics and automated reporting prior to joining HigherGround.
HigherGround, Inc. provides best-in-class, reliable data capture and interaction storage solutions that enable clients to easily retrieve critical information. Our interaction recording and incident reconstruction solutions transform data into actionable intelligence, allowing optimization of operations, enhanced performance, and cost reduction.