Special Seminar in CMS
Ehsan Valavi is a Ph.D. candidate in Technology and Operations Management at Harvard Business School. His research interests lie at the interface of information systems, operations management, and strategy. He currently studies the growth of digital firms and the challenges they face across business areas. His recent research focuses on the scalability of Artificial Intelligence (AI)-based solutions and the value of data for digital firms. He completed his undergraduate studies in Electrical Engineering (Telecommunications) at the University of Tehran and holds a master's degree in Communication Systems from the Swiss Federal Institute of Technology in Lausanne (EPFL). He also holds a master's degree in Decision, Risk, and Operations Management from Columbia Business School.
In this presentation, we investigate how effective time-dependent data is at improving the quality of AI-based products and services. Time dependency means that data loses its relevance to a given problem over time. This loss degrades an algorithm's performance and, in turn, reduces the business value it creates. We model time dependency as a shift in the probability distribution that generates the data and derive several counterintuitive results.
We prove theoretically that even an infinite amount of data collected over time may have limited relevance for predicting the future, and that an algorithm trained on a current dataset of bounded size can attain similar performance. Moreover, we show that a firm's ideal growth strategy shifts its attention from the stock of available data, as the primary driver of value, to the flow of new data as the dataset grows.
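The bounded-value claim can be illustrated with a toy model that is not taken from the talk. Assume, hypothetically, that a firm collects an equal-sized batch of data each year and that a batch k years old is worth a fraction r^k of a fresh batch, where r (0 < r < 1) is an assumed yearly retention factor. The current-data-equivalent size of n years of data is then a geometric series that converges to batch / (1 - r), so an unbounded stock of old data is worth no more than a bounded amount of current data. A minimal sketch under that assumption:

```python
def effective_size(batch_mb: float, retention: float, years: int) -> float:
    """Current-data-equivalent size of `years` equal yearly batches.

    Hypothetical assumption for illustration only: a batch that is k years
    old is worth retention**k of a fresh batch, with 0 < retention < 1.
    """
    return sum(batch_mb * retention**k for k in range(years))


def effective_size_limit(batch_mb: float, retention: float) -> float:
    """Limit of effective_size as years -> infinity (sum of a geometric series)."""
    return batch_mb / (1.0 - retention)


if __name__ == "__main__":
    # Hypothetical inputs: 100MB collected per year, 90% of value retained per year.
    for years in (1, 5, 10, 50, 200):
        print(years, round(effective_size(100.0, 0.9, years), 1))
    print("limit", round(effective_size_limit(100.0, 0.9), 1))
```

With 100MB per year and r = 0.9, the stock never exceeds the equivalent of about 1000MB of current data, however many years are collected. The geometric form is chosen only because it gives a closed-form limit; the talk models time dependency as a distribution shift, which need not produce this exact shape.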
We complement our theoretical results with an experiment in which we empirically measure the loss of value in text data for the next-word prediction task. The measurements confirm that time dependency and value depreciation are significant in AI-based businesses. For example, after seven years, 100MB of text data is only as useful as 50MB of current data for next-word prediction.
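The reported figure can be converted into a rough yearly rate. The abstract does not state the shape of the depreciation curve, so the sketch below assumes, purely for illustration, a constant yearly rate (exponential decay). Under that assumption, 100MB being worth 50MB after seven years means a half-life of seven years, a yearly retention factor of 0.5^(1/7), about 0.906, and a loss of roughly 9.4% of value per year. The function name is hypothetical.

```python
def implied_annual_retention(old_mb: float, equivalent_current_mb: float, years: float) -> float:
    """Yearly retention factor r such that old_mb * r**years == equivalent_current_mb.

    Assumption (not stated in the source): value decays at a constant yearly
    rate, i.e. exponentially in the age of the data.
    """
    return (equivalent_current_mb / old_mb) ** (1.0 / years)


if __name__ == "__main__":
    # Figures from the abstract: 100MB of 7-year-old text ~ 50MB of current text.
    r = implied_annual_retention(100.0, 50.0, 7.0)
    print(f"yearly retention ~ {r:.3f}, yearly loss ~ {1 - r:.1%}")
```

A measured depreciation curve could just as well be steeper early and flatter later; the constant-rate reading is only a convenient summary of a single data point.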