There has been a lot of discussion recently about self-service business intelligence. In my discussions with CIOs within the #CIOChat, it appears that they connect self-service business intelligence with data lake requirements. This means that effective data lakes are not just about data storage but also about access, and about letting end users and data scientists explore the information contained therein on their own. CIOs in general view these concepts positively and feel that the establishment of these new business intelligence options is to the benefit of the community served by IT. Several #CIOChat members’ positions are summarized below:
“Self Service BI allows our broader community to be engaged in relevant, up to date, analytics based decision making.” Higher Educational CIO
“This is a huge opportunity allowing end-users to gain control of their data plus analyze quickly.” Higher Educational CIO
“Information is the power to provide value back to the company, more people having information can add business value back.” CIO Consultant
CIOs relate this topic to the emergence of “Citizen Data Scientists”. Dr. Kirk Borne, Principal Data Scientist at Booz Allen Hamilton, is credited with being the first to espouse the notion of the “Citizen Data Scientist”. According to CIO Isaac Sacolick, Self Service BI and citizen data scientists are “a key ingredient to becoming a data driven organization.” One reason CIOs like this notion is that recruiting large numbers of data scientists has proved difficult. Given this, as self-service becomes popular, it is important to consider not only making data available to data scientists of all stripes but also the need to protect this valuable, often sensitive information from internal or external misuse.
The concept of citizen data scientists hinges on the differences between self-service users and historical business intelligence users. According to Derek Strauss, the former CDO for TD Ameritrade, three personas traditionally made use of business intelligence and data warehousing.
These personas demanded a data warehouse that provided the same answers over and over again, and it tended by nature to contain less sensitive data. Strauss also identifies two personas, explorers and miners, that were left out of traditional data warehousing and that he says have become the real internal drivers for self-service BI and data lakes.
Strauss claims that today, for the first time, we have the technology to address the needs of explorers and miners. These two populations are excited because data lakes give them an opportunity to get fit-for-purpose assets and to be directly involved in the data preparation process themselves. These folks do not require data in a fully curated state because they are looking for new patterns and signals that have yet to be hypothesized or discovered; they want to tell a story with data.
So why haven’t organizations embraced more explorers and miners now that they can be served? There are many reasons, but consider the business risk represented by wider unauthorized proliferation of data, as well as increasing regulatory requirements. Organizations trying to balance autonomy for their new data citizens with security and governance of information clearly face a conundrum. They struggle between the extremes of data anarchy and data tyranny when trying to understand how much freedom should be provided to this emerging group of data citizens. But this struggle is unnecessary: technologies that intelligently understand data can simultaneously enable greater autonomy and control, providing the foundation for data democratization.
Well over 50% of self-service BI and data lake projects are in one way or another focused on sales and marketing. Clearly, there is a race to drive customer intimacy, so protecting data should not impede those trying to use it; yet a derived, total view of the customer creates business risk. Typically, this view combines sensitive personally identifiable information (PII), social media data, purchase history, and more. This derived data is at risk even when the instance is a backroom experiment, as Home Depot discovered.
Data needs to be both secure and easy to use in order to inspire consumer trust, stay compliant, discover relationships, and apply predictive analytics that drive offers to existing and potential customers. Data scientists in particular want access to raw data without people or processes impeding their ability to reduce the time to new insights. The big challenge, therefore, is providing necessary access to information while maintaining appropriate privacy and confidentiality. Put another way, this needs to be about accelerating the time to value for data, so the mantra needs to be “protect and enable.”
What is needed to do this well? CIOs say security today needs to be systematic, with the ability to centrally govern data access and enforce protection policies across every location through which data flows: at rest, in use, and in motion. They see protecting data itself throughout its lifecycle as essential to the success of self-service business intelligence models, regardless of the nature of the data (structured, semi-structured, or unstructured) and irrespective of how it is stored (a traditional database, a big data file system, or a cloud-based BI system like Amazon Redshift).
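To make the “protect and enable” idea concrete, here is a minimal sketch of one way a centrally defined policy could be enforced on records before they reach a self-service user, regardless of where the data is stored. The policy table, field names, and actions here are illustrative assumptions, not a description of any particular product; real deployments would use a governance tool with format-preserving tokenization and role-aware rules.

```python
import hashlib

# Hypothetical central policy: field name -> protection action.
# "allow" passes the value through, "hash" replaces it with a
# consistent token, and "mask" redacts it entirely.
POLICY = {
    "email": "hash",
    "ssn": "mask",
    "purchase_total": "allow",
}

def protect(record, policy=POLICY):
    """Apply the protection policy to one record on its way to a
    self-service user. Unknown fields are masked (default-deny)."""
    out = {}
    for field, value in record.items():
        action = policy.get(field, "mask")
        if action == "allow":
            out[field] = value
        elif action == "hash":
            # Same input always yields the same token, so analysts can
            # still join, group, and count without seeing the raw value.
            out[field] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            out[field] = "***"
    return out

row = {"email": "jane@example.com", "ssn": "123-45-6789", "purchase_total": 42.50}
safe = protect(row)
```

The design choice worth noting is the default-deny fallback: a field that no one has classified is masked rather than exposed, which is what lets explorers and miners work with raw-shaped data while governance keeps control of the sensitive columns.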
So with miners and explorers we have a big opportunity to increase, at lower cost, the business relevance of the things we measure. By enabling them to work together safely in what will effectively become the front end of the business intelligence process, their collective efforts can become something organizations use for competitive advantage and the benefit of their brand.