Exploring the Google Play store: App analysis and recommender

Tawney Kirkland
Mar 26, 2021 · 5 min read
Photo by Christian Wiediger on Unsplash

Introduction

With over 3 million apps, the Google Play app store is a rich resource for both app developers and users. In the figure below, we see that despite a drop in 2018, when Google removed apps affected by malware,¹ the volume of apps is massive and continues to grow.²

Number of apps in the Google Play store (2015–2020)

Against this backdrop, this project aimed to use data from the store to help app developers understand what contributes to a good rating and to provide a fun recommender engine for users to find new apps.

Methods

For the analysis, I used both quantitative and qualitative methods as well as a range of tools. These are summarized in the figure below.

Methods and tools

Data

The sample comprised data for 22,000 apps scraped from the Google Play store using google-play-scraper. I focused on apps with between 1 million and 5 million installs, both to provide insights for new and aspiring developers and to help users find fun, lesser-known apps.
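As a rough illustration of the install filter, here is a minimal sketch. It assumes the Python google-play-scraper package (the post does not say which implementation was used), whose app() function returns a dict that, to my knowledge, includes a minInstalls field. The helper names fetch_app and in_install_range and the sample records are hypothetical.

```python
MIN_INSTALLS = 1_000_000
MAX_INSTALLS = 5_000_000


def fetch_app(app_id):
    """Fetch metadata for one app (needs network access and google-play-scraper)."""
    # Imported lazily so the filtering helper below works without the package.
    from google_play_scraper import app
    return app(app_id, lang="en", country="us")


def in_install_range(details, low=MIN_INSTALLS, high=MAX_INSTALLS):
    """Return True when the app's install count falls inside [low, high]."""
    installs = details.get("minInstalls")
    return installs is not None and low <= installs <= high


# Hypothetical records shaped like the scraper's output, for illustration only.
sample = [
    {"appId": "com.example.small", "minInstalls": 50_000},
    {"appId": "com.example.mid", "minInstalls": 1_000_000},
    {"appId": "com.example.huge", "minInstalls": 100_000_000},
]
kept = [d["appId"] for d in sample if in_install_range(d)]
print(kept)
```

In a real run, each record would come from fetch_app() rather than a hand-written list.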

As you can see in the figure below, there is a left skew in the distribution of app ratings, which is the target for the predictive model. As I progressed through the analysis, it was important to keep this in mind to identify and improve where the model was having issues.

Distribution of app ratings
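One way to confirm a left skew numerically is to compute the sample skewness, which is negative when the long tail sits on the low side. The sketch below uses a handful of made-up ratings, not the project data.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy ratings: most apps cluster above 4 stars, a few sit in the low tail.
ratings = [4.5, 4.4, 4.3, 4.6, 4.2, 4.4, 3.1, 2.0]
skew = sample_skewness(ratings)
print(f"skewness = {skew:.2f}")  # negative means left-skewed
```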

Findings

Topic and sentiment of user reviews

I used non-negative matrix factorization (NMF) to conduct topic modeling on approximately 2 million user reviews. The analysis revealed seven core review topics and the top words within each topic, as shown in the table below.

Topics of user reviews

This part of the analysis was very helpful, as it enabled me to integrate details related to app experience and usability from the perspective of the user.

I then conducted sentiment analysis of the user reviews to provide an additional layer of insight across the seven topics. Below, we see that while the majority of reviews appear to be neutral, the share of negative reviews exceeds the share of positive reviews across most of the topics.

Sentiment of topics

Interestingly, the Payments topic has a larger share of negative reviews than Bugs. This suggests an opportunity to improve the payments experience for the apps included in the analysis.
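The per-topic sentiment shares can be computed along the following lines. This is a sketch under assumptions: the post does not name the sentiment tool, so the code assumes each review already has a topic label and a polarity score between -1 and 1, and the 0.05 neutral band is a hypothetical cut-off. The rows are toy data.

```python
from collections import Counter, defaultdict

NEUTRAL_BAND = 0.05  # assumed cut-off; not taken from the original analysis


def label(polarity, band=NEUTRAL_BAND):
    """Map a polarity score to negative, neutral, or positive."""
    if polarity > band:
        return "positive"
    if polarity < -band:
        return "negative"
    return "neutral"


def sentiment_shares(rows):
    """Return, per topic, the share of reviews in each sentiment class."""
    counts = defaultdict(Counter)
    for topic, polarity in rows:
        counts[topic][label(polarity)] += 1
    return {
        topic: {s: c[s] / sum(c.values()) for s in ("negative", "neutral", "positive")}
        for topic, c in counts.items()
    }


# Toy (topic, polarity) pairs, invented for illustration.
rows = [
    ("Payments", -0.6), ("Payments", -0.3), ("Payments", 0.0), ("Payments", 0.4),
    ("Bugs", -0.5), ("Bugs", 0.0), ("Bugs", 0.01), ("Bugs", 0.2),
]
shares = sentiment_shares(rows)
for topic, dist in shares.items():
    print(topic, dist)
```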

Predicting app ratings

I combined these layers of analysis with the scraped app data to predict the average user rating for each app. The model explains 31% of the variance in user ratings. On average across all apps, its predictions are 0.28 stars above or below the true app score (the mean absolute error). It performed best on apps with higher ratings, generally in the range of 3.8 to 4.5 stars. In contrast, it performed less well on apps with lower average ratings, as these apps were less common, sitting in the left tail of the distribution above.
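To make the two metrics concrete, the sketch below computes the share of variance explained (R²) and the mean absolute error by hand. The true and predicted ratings are toy values, not the model's output.

```python
def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction error, in stars."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy ratings, invented for illustration.
y_true = [4.0, 4.5, 3.0, 4.2]
y_pred = [4.2, 4.3, 3.5, 4.1]

mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MAE = {mae:.2f}, R^2 = {r2:.2f}")
```

Note how the largest single error in the toy data comes from the 3.0-star app, which mirrors the difficulty with low-rated apps described above.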

Even so, the model identified the strongest predictors of app ratings. Some of these are shown in the figure below.

Strongest predictors from the regression model

Interestingly, among all of the topics from the topic modeling, the average sentiment of Feature-related reviews was the strongest positive predictor of app ratings. This was followed by a feature I engineered, which I called Top developer. This feature indicates whether the average score across all of the apps from that developer is above or below a threshold.
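A feature like Top developer could be built roughly as follows. This is a sketch, not the original implementation: the post does not state the threshold, so the 4.0 value is hypothetical, as are the function name and the sample records.

```python
from collections import defaultdict

THRESHOLD = 4.0  # hypothetical; the post does not give the actual value


def add_top_developer(apps, threshold=THRESHOLD):
    """Flag each app with 1 if its developer's mean score meets the threshold."""
    totals = defaultdict(lambda: [0.0, 0])  # developer -> [score sum, app count]
    for a in apps:
        totals[a["developer"]][0] += a["score"]
        totals[a["developer"]][1] += 1
    means = {dev: s / n for dev, (s, n) in totals.items()}
    return [
        {**a, "top_developer": int(means[a["developer"]] >= threshold)}
        for a in apps
    ]


# Toy records, invented for illustration.
apps = [
    {"developer": "A", "score": 4.5},
    {"developer": "A", "score": 4.1},
    {"developer": "B", "score": 3.2},
    {"developer": "B", "score": 4.4},
]
flagged = add_top_developer(apps)
print([a["top_developer"] for a in flagged])
```

Because the flag is derived from the ratings being predicted, one reasonable precaution is to compute the developer averages on the training data only.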

In contrast, apps in the Dating and Simulation categories are more likely to have lower ratings than apps in other categories, and apps that are free and do not contain ads also tend to have lower scores.

Developing an app recommender

Finally, I used NMF on app descriptions to develop an app recommender for users. This is shown in the demo below.

Demo of the app recommender developed in Streamlit

The recommender uses cosine similarity to identify apps that are most similar to the keywords provided by the user. For instance, if I type in ‘rainbow princess games,’ the model returns dress up and hair styling games for children.
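The matching step can be sketched in a few lines. To keep the example dependency-free, it compares raw word counts rather than the NMF topic weights the recommender is built on, so treat it as a simplified stand-in. The app names and descriptions are invented.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(query, descriptions, top_n=2):
    """Return the names of the apps most similar to the query keywords."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(d)), name) for name, d in descriptions.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical app descriptions.
descriptions = {
    "Princess Salon": "dress up princess games with rainbow hair styling",
    "Budget Buddy": "track spending and manage your monthly budget",
    "Rainbow Runner": "endless runner games across rainbow worlds",
}
recs = recommend("rainbow princess games", descriptions)
print(recs)
```

Comparing topic weights instead of raw words is what lets the recommender surface apps that are related to the query without sharing its exact vocabulary.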

I can then filter by app category to get a list of recommendations more tailored to what I am looking for. The results are fun because they are similar to the search term I provided but also include a few that are a bit different, which offers the chance of discovering something unexpected.

Insights

Findings from this analysis provide important insights for app developers and a fun tool for users.

User sentiment regarding features is the strongest predictor of ratings. Therefore, it is important to ensure that the app addresses its core purpose before focusing on the bells and whistles. In other words, give the user what they are looking for when they download the app.

That said, the Payments results suggest there is an opportunity to improve the payment process and experience for apps in this analysis.

Apps by Top Developers tend to have better ratings. Therefore, it is important to establish and maintain the brand as users come to trust the quality of the brand and the apps released.

Apps that are free and do not contain ads tend to have lower scores. This may be due to a number of reasons: such apps have limited opportunities to monetize, or they may have been developed by individuals who just thought the app was fun and wanted to share it with others. However, serious developers should investigate ads and/or payment models in order to generate revenue that can then be invested in improving the app.

[1] F Lardinois, Google says it removed 700K apps from the Play Store in 2017, up 70% from 2016 (2018), TechCrunch.

[2] Number of available apps at Google Play from 2nd quarter 2015 to 4th quarter 2020 (2021), Statista.
