Written by Juho Kerttula

Fast Data Apps with Streamlit – Case: How Much Will the Pandemic Cost My Company?

Any business organization must constantly respond to changes in their operating environment. Measure the change, predict the implications for you, and adapt – simple as that. In practice, no matter how data-driven you are, measuring the right things in a new situation can be challenging, and often new data sources and approaches are needed. Collecting data from internal and external sources, building decision-support dashboards and machine learning models with customized user interfaces can be a few-month job for a sizable data team! If the change is sudden, you need to react quickly, – but when the impact is uncertain, launching a large-scale development project might be an expensive overreaction. In this post we applied a fast data app prototyping tool, Streamlit, on exploring revenue impact scenarios in the “new normal”.

The Cost of COVID-19

For COVID-19, there are a number of dashboards available for monitoring every aspect of the pandemic. About the most famous one developed at Johns Hopkins, it was particularly impressive how quickly the creators were able to find and aggregate all the data sources when the virus began to attract public interest (they were professionals before it was cool). A global dashboard, however, is hardly actionable for most organizations. How do we gain relevant insight from this abundance of data for, say, a major European retailer?

The publicly available, high-quality data collected for the above mentioned dashboard seems like a good starting point for this purpose as well. We could also utilize regional mobility trends, e.g. by Google (available for the time being) to better understand current consumer behavior. It would be nice to present the interesting parts of the data together with some internal company data, so maybe we should build data warehouse pipelines and modify existing sales dashboards, or make new ones, . and pPerhaps we would also like to have our data scientists exploring and building machine learning models for supporting difficult decisions and automating simple ones with this up-to-date data. On second thought, maybe just look elsewhere, wait out the pandemic, and hope the company is still standing after the dust has settled? With Streamlit, you can deliver all this before the need to commit to any significant investment – no return calculations necessary.

Based on the “new normal” indicated by retail mobility data, and given current COVID-19 case data, epidemiologic SEIR-simulation, and up-to-date sales data, our example retailer can expect roughly a 35 % deficit in brick-and-mortar sales, and a 5% surplus in e-commerce revenue due to the pandemic in the Finnish market in 2020Q2-Q3.

Enter Streamlit – Tool for the Python Data Specialist

Released a few months ago, steamlit.io is an open-source app framework for building an ad-hoc fast data app “in hours, not weeks”. Streamlit essentially turns a Python script into a responsive web app that you might think was custom-built with a modern front-end framework, all with very little added effort. To create an interactive front-end widget for controlling a variable, just wrap it in a Streamlit call. Caching the result of a data pipeline or an expensive computation only takes a Streamlit decorator. Advanced caching and hot reload make Streamlit scripting as fast-paced and addictive as front-end development, even when working with fairly complex data apps. When you need an interactive interface instead of an API, a notebook does not quite cut it, but a custom full-stack web app would be too much, Streamlit might be just what you need. Attractive use cases include interactive data exploreationsrs, quick machine learning app prototyping, and what-if tooling to reduce iteration between business stakeholders and data developers.

To model the sales impact of COVID-19, we can hop on the citizen epidemiologist bandwagon, and implement a simple model to simulate the susceptible, exposed, infected, and recovered (SEIR) fractions of the population in the region of interest. With data on actual confirmed cases available, we can approximate the current date in terms of simulation days, while adjusting the duration and magnitude of social distancing (reduced disease transmission), as appropriate given the emergency measures of the local government at the moment. With regional mobility trends, we can estimate the reduction in consumer retail activity, and adjust the timeframe of our anomalous event (with case-specific start/end definitions) accordingly. Finally, we can fit a time series model to our sales data prior to the pandemic and estimate the revenue impact for the expected duration of the event e.g. separately for retail stores and ecommerce. Here we used Prophet to readily model e.g. seasonalities and national holidays, but in simple cases reasonable estimates could be achieved by only looking at linear trends and constant intervals. With a single pure-Python script, we end up with the estimated sales impact for user-selectable region of interest, adjustable social distancing effects, and lovely browser-UI responsivity due to cached data pipelines and computation – all in a day’s work without even bothering our busy front-end professionals.

Final thoughts on a fast data app

The above impact calculator barely scratches the surface of Streamlit, which works seamlessly with deep learning models as well. To entertain your bored family at home, for instance, add about 30 lines of Streamlit code to a neural style transfer script to magically turn it into a web app on your idle gaming PC personal GPU-workstation that a kid with a phone can use to make family photos look like they were painted by renaissance masters. There are various tutorials and demos available to get started with Streamlit, one of our favorites being the neat GAN face generator that now also runs on “The Kraken” at our office to hopefully cheer up the occasional quarantine-escapee. If you prefer Docker, you can try our image instead with docker run –gpus all -p 8501:8501 ahtonen/demo_face_gan:latest.

These office mates don’t exist.

These office mates don’t exist.

Written by Juho Kerttula. Juho is a Senior Data Scientist at DAIN Studios, based in Helsinki.

 

Featured photo by Ryan Stone.

#Gettingstartedwith is DAIN Studios’ new blog series for those of you that are eager to get acquainted with data and AI.

Curiosity is in our core, and many DAINians are constantly experimenting on things. We owe it to our customers to stay on top of new technologies, and to be honest, being a bit on the nerdy side, it comes quite naturally! Taking time to learn is also something we as employers want to encourage, and make sure some time is reserved each week for sharing knowledge with co-workers.
The blogs and posts we publish with the #gettingstartedwith tag will be more of an introductory level for those of you that want get familiar with the world of data and AI. It may be tips and tricks, showing how to build an API in 60 minutes, or a fun project on experimenting on text analysis for instance. Or we may be sharing links to good reading or watching, e-learning available or people to follow.

If there is something you’d like us to cover, feel free to reach out by email at info@dainstudios.com!