After developing business understanding and data understanding, the next big objective in the CRISP-DM methodology is to prepare the data for modelling and analysis. This involves selecting, cleaning and transforming the data which will be used for the project. While this isn't flashy work, it typically accounts for 60% to 80% of the effort for a project.
Corporate reporting is a prime candidate for automation if you can clearly explain the process to produce it, and the process remains consistent over time. Automating your reports has many potential benefits: it can save time, reduce errors, and alleviate the boredom caused by performing repetitive tasks.
Power BI is Microsoft's data exploration and dashboarding tool. While it hasn't risen to desktop prominence like Excel and Outlook have for the majority of knowledge workers, it is an incredibly capable tool which allows you to quickly visualize data from a number of data sources and explore the data using a graphical interface.
Previously we looked at how you can combine R and Markdown to create reports directly from your R scripts, and also how to send email from R using Microsoft Outlook. In this post, we'll take these concepts a step further and look at how we can use R to embed images in email messages or even use Markdown to create entire messages.
Robotic process automation (or RPA) is transforming the way many businesses handle their repetitious, labour-intensive tasks such as reporting, making basic decisions, and providing services. Using software, these tasks can be automated, reducing the time to complete them while also improving their accuracy and consistency. If you want to get started down the RPA path without incurring licensing costs, there are free tools you can start using today.
Having developed business understanding and a deep knowledge of the problem you are trying to solve, the next step in the CRISP-DM framework is to develop that same level of understanding around the data itself. This step isn't analysis, but rather looking at the structure and shape of the data in order to determine what information is available and how to go about building your analysis.
One underappreciated feature in R is the ability to easily create beautiful reports using Markdown. Markdown files contain a combination of code and text, allowing you to write your analysis alongside your code and publish both the analysis document and code in a wide variety of formats with little effort.
As big data transforms our businesses, governments and society, it also presents us with new moral and ethical dilemmas that we need to consider. As is typical with new technology, we tend to implement first and consider the ethical issues later. Cathy O'Neil's book Weapons of Math Destruction is an introduction to the ethical issues raised by the widespread use of data to drive decisions in our lives.
Buzzwords have the unfortunate tendency to be often used but seldom clearly defined. Today we are going to tackle the popular phrase "big data" and strip it down to a clear definition. Overall the term is fairly self-explanatory: it refers to large data sets. But there are 5 defining characteristics specific to big data which differentiate it from the data sets of yesterday. These 5 characteristics are known as the 5 V's of big data.
When using the CRISP-DM framework, the first step in the data mining process is to develop your business understanding. This stage of the process is about gaining knowledge of the business, the issues it faces, its opportunities for improvement, its objectives and its constraints, and then creating your project plan.