Can Machine Learning Manage Data Resource Management Issues within Organizations?
Abstract
Doing business in a digital age creates both opportunities and challenges with regard to data resource management (DRM). While some companies, like AirBnB, have revolutionized their businesses with machine learning technologies and generated stupendous growth, others are yet to tap this potential. Through a literature review, this report finds that DRM issues exist as (i) data-related challenges, (ii) process-related challenges and (iii) management challenges, with most issues arising in the process phase. Machine learning, which itself depends heavily on efficient data mining, is observed to be useful in addressing DRM issues in the first two stages. However, further research is required to understand whether it can also address the management challenges pertaining to data.
Introduction
Companies today are constantly inundated by Big Data problems: the sheer volume of data poses a huge challenge to running a business effectively, and the only choice is to manage and leverage the potential of these large data pools with efficient data mining. Data mining is a process that companies adopt to discover patterns in large data sets, thus turning raw data into meaningful information that can grow the business (Twin, 2020). Discovering patterns provides valuable insights into consumer behavior, helping the company to re-strategize. For data mining to be effective, data collection must be effective, along with data warehousing and subsequent data processing (Twin, 2020). Data mining also helps develop machine learning models, which are increasingly found to improve data resource management (DRM) across organizations. The literature suggests that many DRM issues can be addressed with machine learning.
AirBnB, the vacation broker, is one such company that has boosted its business growth by leaps and bounds with effective data mining and machine learning, taking the start-up space by storm. This document explores whether machine learning can help address DRM issues in organisations, drawing a practical reference to the AirBnB business case. The research is based on an extensive literature review, followed by its results and a discussion.
Overview of the Business Issue
Data is at the heart of AirBnB’s business; without it, the company would be crippled, all the more so because AirBnB is an entirely online business that thrives on Big Data. Each day, the company creates 20 TB of data and archives about 1.4 petabytes of data (DeZyre, 2020). Naturally, managing this huge data pool has always been an issue for the company. Every day, it is bombarded with huge volumes of data not just from customers, but also from its hosts, about its locations and about rental demand.
Traditional data warehousing generally generates end-of-day totals, but cannot show interim data. This is a serious loss of data, which an online company like AirBnB cannot afford. As a solution, the company began effective data mining and consequently developed machine learning models. Taking another leap forward, AirBnB launched Zipline, its data management platform, which solved its enterprise-level DRM issues for machine learning (Koidan, 2019). [Refer to Appendix for details of the case.]
Before Zipline, AirBnB’s ML team spent about 60% of its time collecting and scripting transformations for ML functions; with Zipline, this effort has been reduced substantially, from months to only a few days (Simha & Hoh, 2018). Zipline ensures online-offline data consistency, data quality, effective data monitoring, improved data search and integration with the end-to-end workflow (Koidan, 2019).
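The difference between end-of-day totals and interim data can be illustrated with a minimal sketch. The booking events, timestamps and function names below are hypothetical and are not drawn from AirBnB's systems or from Zipline; the sketch only shows how a daily rollup hides an intra-day peak that an hourly rollup keeps visible.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical booking events (timestamp, number of bookings); illustrative only.
EVENTS = [
    ("2020-03-01 09:15", 4),
    ("2020-03-01 09:45", 6),
    ("2020-03-01 14:30", 25),
    ("2020-03-01 21:05", 5),
]

STAMP_FORMAT = "%Y-%m-%d %H:%M"


def daily_totals(events):
    """End-of-day view: one figure per day, so intra-day detail is lost."""
    totals = defaultdict(int)
    for stamp, count in events:
        day = datetime.strptime(stamp, STAMP_FORMAT).strftime("%Y-%m-%d")
        totals[day] += count
    return dict(totals)


def hourly_totals(events):
    """Interim view: one figure per hour, so intra-day peaks stay visible."""
    totals = defaultdict(int)
    for stamp, count in events:
        hour = datetime.strptime(stamp, STAMP_FORMAT).strftime("%Y-%m-%d %H:00")
        totals[hour] += count
    return dict(totals)


if __name__ == "__main__":
    print(daily_totals(EVENTS))
    print(hourly_totals(EVENTS))
```

In this toy data set the daily view reports a single total for the day, while the hourly view shows that most of the bookings arrived in the 14:00 hour, which is the kind of interim signal the end-of-day approach cannot show.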
Research Approach
Using the AirBnB case as a practical reference, this report aims to answer the following research question:
- Can machine learning manage data resource management issues within organisations?
The research approach to answering this question is based entirely on a literature review. In this document, the literature review has two aims –
- To gain understanding about the extent to which organisations suffer from DRM issues, and
- To analyse and discuss how companies are using effective data mining and machine learning to solve DRM issues and drive business growth.
Therefore, the literature review will be undertaken in two parts. The first part studies the extent of DRM issues in today’s organizations, which organizations suffer more from inefficient DRM, and how these issues affect their profitability and business opportunities. The second part explores further business cases in which organisations apply machine learning to solve DRM issues.
The scope of the review does not include investigative techniques such as focus group discussions, interviews with company management, surveys or questionnaires. Instead, the results and observations rest entirely on the literature reviewed, covering both the primary and secondary research sources explored online.
Literature Review
A deluge of data is only natural in a digital age. According to one estimate, 2.5 quintillion bytes of data are produced every day from various sources (Mund, 2016). Indeed, data is generated every second, from everywhere and through all kinds of devices. This data deluge often poses significant challenges for organizations in terms of collecting, sorting, managing and interpreting the data to get real value out of it (Mund, 2016). Managing these numerous data challenges is referred to simply as data resource management (IGI Global, n.d.). With effective data resource management, or DRM, organizations can “describe, interpret, and forecast and provision economic and business activities” and can also “decide for the next direction” (Saleh et al., 2018, p.1383).
To help manage DRM issues, today’s companies deploy a variety of smart techniques for mining and analysing the huge volumes of data flowing in from various sources. While some organizations rely on business intelligence software (PAT Research, n.d.), others depend on artificial intelligence and machine learning to unlock the value embedded in large data sets (Koidan, 2019). Machine learning algorithms help tap market trends, outliers and business boosters in a prompt and seamless manner, so that organizations save vital time while still deriving valuable information from the data deluge (The European Business Review, 2020). According to Saleh et al. (2018), the challenge is all the greater because 90% of the data produced every day is unstructured (generated in the form of images, audio, video, email messages, etc.). This mix of structured and unstructured data leads to DRM issues in data storage, data mining and data analysis. Established data management technologies are constantly failing to keep pace with the immense volume of data generated daily. The better the data mining and analysis in an organisation, the more informed its decision-making (Chen et al., 2013).
These DRM challenges are not industry-specific, although DRM is critical to industries dealing with healthcare, energy, catastrophe forecasts, insurance, economic improvements, manufacturing, banking, etc. (Ramageri & Desai, 2013; Yi et al., 2014). There are also significant DRM opportunities and challenges for retailers, who use data mining tools to understand and predict trends, analyse customer behaviours and apply targeted marketing (Ramageri & Desai, 2013). Most retailers find it hard to identify the appropriate customers for product campaigns, and data mining is of major use to them. The case of AirBnB, the online vacation rental company, also reflects how companies overcome DRM challenges with efficient data mining and machine learning models (Koidan, 2019).
Interestingly, DRM issues existed as far back as the later part of the 20th century. This is evidenced in the work of Rabinovitch (1999), who discussed how ZCMI, a Utah-based department store chain, undertook data mining initiatives to integrate customer data into several merchandising systems and derive business value. Xu and Cheung (1997) discussed the case of a fund-management firm, LBS Capital Management Inc., that used genetic algorithms, neural networks and efficient data systems to handle portfolios worth USD 600 million. Data mining technologies have also been used by numerous other companies, such as American Greetings, Procter and Gamble, Walmart, Coke, Macys West, Pepsi and Penske Logistics (Betts, 2002; Ramageri & Desai, 2013).
Data mining has also benefitted healthcare by improving infection control, hospital ranking, identification of high-risk patients, etc. (Biranbaum, 2004). In manufacturing companies, data mining helps predict machine failures, thus saving maintenance costs (Bergmann, 2012). In the financial sector, data mining has been found to help prevent credit card fraud, predict market trends, manage customer relations successfully, etc. (Preethi & Vijayalakshmi, 2017). The potential of data mining has therefore long been explored across sectors, which also indicates that DRM issues have existed for decades and that the technologies to manage them are still evolving. The future of DRM seems to rest with machine learning and artificial intelligence, which will be fundamentally based on a robust data mining infrastructure.
DRM issues in organisations are varied. Depending on the data lifecycle, Akerkar (2014) and Zicari (2014) categorise these issues into (a) data challenges, (b) process challenges and (c) management challenges.
Sivarajah et al. (2017) studied 227 articles on data management issues in companies and observed that data mining and data cleansing (part of the process challenges) appear to be the most significant DRM issue in organisations today, as about 43% of these articles mentioned the importance of data mining in a world of predominantly unstructured data. Unless patterns are discovered in the heavy load of structured and unstructured data, valuable insights will not emerge for a company to take data-driven decisions. Machine learning, however, goes a step beyond data mining: it takes value to another level by learning from the patterns and predicting future trends. As Davies (2018) observed, data mining acts as the information source on which machine learning relies. Machine learning learns from the training datasets (established by the data mining process) and predicts outcomes. The algorithms are fed data repeatedly, and the resulting computational intelligence offers increasingly accurate predictions that support important decision-making for the company.
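The relationship described above, in which data mining establishes the pattern and machine learning learns from it to predict, can be sketched in a few lines. This is an illustration under stated assumptions only: the raw booking records are invented, and an ordinary least-squares line is used as a deliberately simple stand-in for a machine learning model, not as the method used by any company cited in this report.

```python
from collections import Counter

# Hypothetical raw records: the week number in which each booking was made.
RAW_BOOKING_WEEKS = [1] * 2 + [2] * 4 + [3] * 6 + [4] * 8


def mine_weekly_counts(raw_weeks):
    """'Mining' step: reduce raw records to a pattern (bookings per week)."""
    return dict(Counter(raw_weeks))


def fit_line(xs, ys):
    """'Learning' step: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """'Prediction' step: apply the fitted line to an unseen week."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    weekly = mine_weekly_counts(RAW_BOOKING_WEEKS)
    weeks = sorted(weekly)
    model = fit_line(weeks, [weekly[w] for w in weeks])
    print("Predicted bookings for week 5:", predict(model, 5))
```

The sketch also makes the dependency concrete: if the mining step miscounts the weekly figures, the fitted line and every prediction derived from it inherit that error.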
Results
Based on the literature review above, the following observations and findings emerge:
- Big Data is as much a challenge for companies as it is an opportunity. Therefore, the better managed it is, the more beneficial it is to the company’s bottom line.
- DRM issues are not new; data resource management has existed for decades in organisations across all sectors.
- DRM is more crucial for some sectors, like healthcare and financial services, but it has also benefitted many other sectors, such as retail, manufacturing, and travel and tourism.
- DRM issues can pertain to any or all of the three identified categories – data-related, process-related and management-related.
- Most companies face DRM challenges with regard to data mining and data cleansing. It is a mammoth task to discover patterns from the large pool of data flowing in every day, every minute. But, unless data is mined and cleansed well, it cannot generate meaningful insights for developing business strategies.
- 90% of data produced every day is unstructured data, making data mining even more essential.
- Data deluge leads to common DRM issues like data storage, data mining and data analysis. Of these, research found data mining to be of significant importance today.
- Data mining alone is not enough to manage data. Machine learning is the future of data management in organizations.
- Machine learning is no longer the prerogative of big companies; it is equally accessible to smaller businesses, and start-ups in particular are tapping into its potential and getting promising results.
- Machine learning is based on effective data mining. For machine learning to learn from the discovered patterns and build algorithms, the patterns need to be appropriately identified through data mining.
- Machine learning thrives on data, and on good data in particular. Poor quality data leads to poor machine learning outcomes and faulty predictions.
- Research found that machine learning has been adopted by the majority of companies, with a smaller percentage still awaiting implementation of machine learning.
- The most common reason for machine learning adoption is business analytics. This means that turning raw data into useful information is the primary driver behind the implementation of machine learning technology. However, machine learning is not deterministic: it cannot determine outcomes in absolute terms. Rather, it is stochastic, that is, its predictions are pattern-based and not meant to be precise.
- AirBnB’s ground-breaking data management initiatives are not just exemplary, but also indicate how much potential machine learning holds for DRM.
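The finding that poor quality data leads to faulty predictions can be shown with a small, hedged sketch. The prices, the plausible-range thresholds and the function names are all assumptions made for illustration, and the "model" is simply the average of the observed values, chosen because it makes the effect of uncleansed data easy to see.

```python
# Hypothetical nightly prices: None marks a missing value and 100000 a
# data-entry error. Both the data and the thresholds below are assumptions.
RAW_PRICES = [100, 110, None, 90, 100000, 100]


def drop_missing(prices):
    """Minimal handling: remove missing values only."""
    return [p for p in prices if p is not None]


def cleanse(prices, low=10, high=10000):
    """Cleansing step: drop missing values and values outside a plausible range."""
    return [p for p in prices if p is not None and low <= p <= high]


def mean_estimate(prices):
    """Trivial stand-in for a model: predict the average observed price."""
    return sum(prices) / len(prices)


if __name__ == "__main__":
    print("Estimate without cleansing:", mean_estimate(drop_missing(RAW_PRICES)))
    print("Estimate after cleansing:", mean_estimate(cleanse(RAW_PRICES)))
```

A single erroneous record is enough to push the uncleansed estimate far away from every genuine price in the data, which is why cleansing is treated as a precondition for useful machine learning output.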
Discussions
It is obvious that no business today can afford to ignore data management. We live in an age of Big Data, and without this data (input), significant value generation (output) is practically impossible for any company, big or small. Decades ago, data resource management challenges seem to have been less complex than they are now. Today, most businesses have a digital footprint, making it normal for data to pour in at volume. Earlier, the major concerns around DRM were storage, followed by mining and analysis. Now, the major concern for companies is to scale up with the huge data influx and to adopt technologies that can keep pace with the velocity and volume at which data flows in. Data mining alone cannot suffice, and companies have begun to recognize that a better DRM future rests with machine learning technologies. However, machine learning is itself heavily dependent on efficient data mining. AirBnB’s astronomical growth, which pivoted around remarkable data mining and the machine learning algorithms built on it, serves as a perfect business case for companies that are suffering from DRM challenges and are yet to adopt machine learning.
Suggestions for Future Work
The literature review indicates the applicability of data mining and machine learning in addressing the data and process challenges of the data lifecycle. Most of the research points to the fact that machine learning technologies are capable of managing DRM issues in the first two stages of the data lifecycle, but little is known about their ability to address management challenges such as data governance, data privacy and security. Therefore, it is recommended that further research be conducted to uncover the potential of machine learning in managing data issues in the last stage of the lifecycle.