Wikipedia Android: Scalability

In this essay, we investigate the scalability of the Wikipedia Android app. Since our project concerns the client application rather than the Wikipedia infrastructure, and since we described the API design in an earlier essay, we will not focus on infrastructure or API scalability. Rather, we examine how the Wikipedia Android app performs on low-resource devices such as entry-level phones and compare that to its performance on high-end phones. We do this by setting up two emulators, one simulating an entry-level phone and one simulating a high-end Android phone, and collecting several performance metrics.

Android Go phones

For several years, Google has released a version of Android built specifically for entry-level phones: Android Go 1. By using an operating system and system apps designed for low-resource devices, manufacturers can offer phones at very competitive prices. This in turn allows people who cannot afford a regular smartphone to still connect to important services and information sources (such as Wikipedia).

To illustrate the type of device that typically runs Android Go, we list some of the most popular Android Go devices below:

  • Nokia 1.3 2: Has a large 5.71” 720p screen, 1GB of RAM, and 16GB of persistent storage for around €100,-.
  • Alcatel 1 3: Has a 5” 480p screen, 1GB of RAM, and 8GB of persistent storage for around €70,-.
  • ZTE Blade L8 4: Has a 5” 480p screen, 1GB of RAM, and 16GB of persistent storage for around €70,-.

Comparing performance on entry-level and high-end phones

In order to compare the performance and functionality of the Wikipedia Android app across different phones, we set up two virtual devices in Android Studio using the built-in Android Emulator. The specifications of the virtual devices are listed below:

  • Entry-level phone: 5” screen with 960x480 resolution and 1GB of RAM running Android 5.0 (comparable to Android Go phones, according to an Android developer 5).
  • High-end phone: 6.4” screen with 1080x2400 resolution and 8GB of RAM running Android 12.0 (comparable to Google Pixel 6, Google’s latest high-end smartphone).

An important metric for Android applications is cold startup latency: the time it takes an app to finish drawing its first screen after the user taps the app icon, while the app is not running in the background. It is generally advised to keep this time below 500 ms and to have little variation between startup times 6. On both virtual devices, we measured cold startup latency five times. The entry-level device reported startup latencies of 374, 636, 337, 384, and 629 ms (average: 472 ms), and the high-end device reported 2487, 2187, 838, 1594, and 1927 ms (average: 1807 ms). The high-end device reported considerably higher startup times, which was not what we expected. We see two possible explanations for this difference. First, emulating the high-end device might be more expensive for the host computer and introduce more latency than emulating the entry-level device. Second, different Android versions could have different startup latencies because of their different architectures.
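Since the guideline concerns both the average and the variation between runs, it can help to compute both from the measurements. The following is a minimal sketch that uses only the ten latencies reported above; the variable and function names are our own, and with five samples per device the standard deviation is only a rough indicator.

```python
from statistics import mean, stdev

# Cold startup latencies in milliseconds, as reported in the text above.
entry_level_ms = [374, 636, 337, 384, 629]
high_end_ms = [2487, 2187, 838, 1594, 1927]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of latencies."""
    return mean(samples), stdev(samples)


for name, samples in [("entry-level", entry_level_ms), ("high-end", high_end_ms)]:
    avg, sd = summarize(samples)
    print(f"{name}: mean = {avg:.0f} ms, standard deviation = {sd:.0f} ms")
```

Reporting the spread next to the mean shows whether an average below 500 ms still hides individual runs above that threshold.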

In another test, we recorded CPU and memory usage of the Wikipedia app while scrolling through the feed for 20 seconds and then opening an article from the feed. We recorded these metrics using the profiler tool that comes with Android Studio. The recorded traces can be seen in the figures below. From top to bottom we can see: screen touches, opened activity (screen), CPU usage, and memory usage.

Figure: Entry-level virtual device profiled while scrolling the feed and opening an article.

Figure: High-end virtual device profiled while scrolling the feed and opening an article.

From the profiler recordings, we can see two interesting phenomena. First, both CPU and memory usage are lower on the entry-level virtual device than on the high-end virtual device. This suggests either that the operating system or the app itself adapts to the smaller amount of available resources, or that there are significant differences between Android versions. Second, opening an article leads to an increase in memory usage on the entry-level device but not on the high-end device. This might be because memory is pre-allocated on the high-end device, where more of it is available anyway.

Finally, we noticed that the article view doesn’t scale well to low-resolution devices, as can be seen in the screenshots below. Clearly, this is a problem for usage on lower-end devices and should be resolved to achieve better compatibility.

Figure: The article view is not rendered correctly on the low-resolution entry-level device (left) when compared to the high-end device (right).

Use of storage space

Entry-level or older Android phones are infamous for not having sufficient persistent storage for the apps and content that users want to keep on their phones. It is therefore interesting to see how the Wikipedia app scales with regard to storage space requirements. First, we need to account for the space required to store the app itself. On the entry-level device, the app takes up 80 MB of space. On the high-end virtual device, however, it takes up only 15 MB. Here again, we see a major difference in resource utilization across Android versions.

An important feature of the Wikipedia app is the ability to save articles for offline use. This is convenient when the user travels to an area with slow or no cellular reception. However, saved articles also take up storage space. To see how storage requirements scale with the number of offline articles, we saved the current top 44 articles listed on the Wikipedia feed. The articles took up a total of 121 MB, amounting to about 121/44 = 2.75 MB per saved popular article. The offline articles include the full article text, pictures, and references. Unfortunately, there is no option to download lower-quality images or no images at all to save on storage space.
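The per-article figure can be turned into a rough estimate of how many offline articles fit in a given amount of free storage. The sketch below assumes that storage grows linearly with the number of articles and that other articles are about as large as the 44 popular ones we saved; the 1000 MB budget and the function name are hypothetical.

```python
# Figures reported above: 44 saved articles took 121 MB in total.
SAVED_ARTICLES = 44
TOTAL_MB = 121

mb_per_article = TOTAL_MB / SAVED_ARTICLES  # 2.75 MB per article


def articles_that_fit(free_mb):
    """Estimate how many average-sized articles fit in free_mb megabytes."""
    return int(free_mb // mb_per_article)


# Hypothetical budget: 1000 MB of free storage on the device.
print(f"{mb_per_article:.2f} MB per article")
print(articles_that_fit(1000), "articles fit in 1000 MB")
```

Popular articles tend to be long and image-heavy, so the real number for a typical reading list may well be higher.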

Network

When it comes to network usage, we should consider two things: internet speed and data usage. While mobile internet bundles are falling in price in many countries 7, there are still many countries, in both the developed and the developing world, where data is expensive 8. This is especially true in countries with low purchasing power, where data bundles are relatively more expensive. For an application like Wikipedia, whose goal is to provide free, reliable information for everyone, it is important to always be accessible. If the app uses too much data, many people across the world would not use it, simply because they cannot afford to. There is another advantage to optimizing the data usage of an application. 4G and 5G coverage have both been expanding, but neither is complete in many countries yet. The application should still be usable in places where only slower networks are available, and if less data is sent, the app also requires less bandwidth.

Data usage

Below is an example where we scrolled for about 10 seconds on the homepage feed. The resulting network usage was about 1.1 MB. A large part of this data was JSON files containing data about the articles being shown, including everything from mobile and desktop links, to extracts, URLs to the edit history, and view counts of recent days. We feel that much of the data that is being sent could be left out or compressed.

Figure: Network usage when scrolling the homepage feed for 10 seconds.
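To give a feeling for how much leaving out fields or compressing could save, the sketch below builds a made-up feed response and compares payload sizes. The field names, URLs, and values are hypothetical and are not the real API schema; the sketch also does not check whether the real responses are already compressed in transit.

```python
import gzip
import json


def make_item(i):
    """Build one hypothetical feed item (illustrative fields only)."""
    return {
        "title": f"Article {i}",
        "extract": "Short summary of the article shown on the feed card.",
        "mobile_url": f"https://example.org/mobile/Article_{i}",
        "desktop_url": f"https://example.org/desktop/Article_{i}",
        "edit_history_url": f"https://example.org/history/Article_{i}",
        "view_history": [
            {"date": f"2022-03-{day:02d}", "views": 1000 + day}
            for day in range(1, 6)
        ],
    }


feed = [make_item(i) for i in range(20)]

# Assumption for this sketch: a feed card only needs these fields to render.
KEEP = ("title", "extract", "mobile_url")
trimmed = [{key: item[key] for key in KEEP} for item in feed]


def encoded_size(obj):
    """Size in bytes of obj serialized as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def gzipped_size(obj):
    """Size in bytes of the same JSON after gzip compression."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(gzip.compress(raw))


print("full JSON:   ", encoded_size(feed), "bytes")
print("trimmed JSON:", encoded_size(trimmed), "bytes")
print("full, gzip:  ", gzipped_size(feed), "bytes")
```

The exact numbers depend entirely on the made-up payload, so only the direction of the comparison is meaningful here.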

When an article is opened, closed, and then opened again, we can observe that the Wikipedia app caches the article so that it does not have to be retrieved again. This benefits both the user and Wikipedia: users consume less data, and the Wikipedia infrastructure does not have to handle unnecessary duplicate requests.

Figure: Opening an article twice, we can observe that a large portion of the article is cached.
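The effect we observed can be illustrated with a very small cache keyed by article title. This is only a sketch of the general idea and not how the Wikipedia app implements caching; `ArticleCache` and `fake_fetch` are hypothetical names, and the fetch function stands in for a real network request.

```python
class ArticleCache:
    """Return stored articles and only call fetch for titles not seen before."""

    def __init__(self, fetch):
        self._fetch = fetch          # function that performs the "network" request
        self._store = {}             # title -> article content
        self.network_requests = 0    # counts how often fetch was actually called

    def get(self, title):
        if title not in self._store:
            self.network_requests += 1
            self._store[title] = self._fetch(title)
        return self._store[title]


def fake_fetch(title):
    """Stand-in for a network request (hypothetical)."""
    return f"<html>{title}</html>"


cache = ArticleCache(fake_fetch)
cache.get("Android")   # first open: goes to the network
cache.get("Android")   # second open: served from the cache
print(cache.network_requests)  # prints 1
```

A real cache would also need an eviction policy and a way to detect that an article has changed, both of which are left out here.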

Network speed & availability

Globally, mobile internet access and speeds differ drastically. According to the ITU (International Telecommunication Union), in 2020, 65 countries had fewer than 50 mobile broadband subscriptions per 100 inhabitants 9. Most of these countries are in Africa or Asia, and the list also includes many island nations. There is a general trend towards more usage, though: out of the 199 countries measured, 133 saw an increase in mobile broadband usage. Looking at internet speeds per country, the ITU’s statistics show that the available bandwidth goes as low as 0.4375 KB/s per user. If we take the scrolling example from before, it would take 1100/0.4375 ≈ 2514 seconds, or almost 42 minutes, to load the same amount of data. Taking this into consideration, it makes sense to apply every possible optimization so that Wikipedia remains accessible for all.
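The load-time estimate is a simple division of payload size by bandwidth. The sketch below works it out using the 0.4375 KB/s figure and the roughly 1.1 MB (taken as 1100 KB) measured while scrolling the feed; it ignores latency, protocol overhead, and retries, so it is a lower bound rather than a prediction.

```python
def transfer_seconds(payload_kb, bandwidth_kb_per_s):
    """Time in seconds to transfer payload_kb at a constant bandwidth."""
    return payload_kb / bandwidth_kb_per_s


FEED_PAYLOAD_KB = 1100          # about 1.1 MB from the feed scrolling example
LOWEST_BANDWIDTH_KB_S = 0.4375  # lowest per-user bandwidth mentioned above

seconds = transfer_seconds(FEED_PAYLOAD_KB, LOWEST_BANDWIDTH_KB_S)
print(f"{seconds:.0f} s, or about {seconds / 60:.1f} minutes")
```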

Proposals for improving scalability

Rather than suggesting large architectural changes to the Wikipedia Android app, we provide a list of proposals to improve the scalability of the app by increasing its compatibility and accessibility for its global community.

  • The article view is an essential interface of the Wikipedia app and it is important that it scales well on low-resolution devices. The Android developer documentation offers guidelines on ensuring UI compatibility across all devices.
  • Allowing the user to control the storage size of offline articles by exposing an option for lower quality or no pictures could ensure the storage requirements of offline articles scale better for devices that are low on storage space.
  • Similarly, a low-data usage mode could be a welcome addition for users with either slow internet or limited data. In this mode, only crucial information would be sent: no images and only articles a user searches for.
  • Finally, the size of network requests can be decreased further by using a binary serialization format such as Protocol Buffers rather than JSON to reduce communication overhead.

In this essay, we have analyzed the horizontal scalability of the Wikipedia Android app, focusing on compatibility and accessibility for the global Wikipedia community. We analyzed the performance of the application on emulations of both low-end and high-end phones, measuring loading times and storage space. Then, we measured the network usage of the application and compared it to statistics on network availability and usage. Finally, we proposed solutions for making the app more accessible, taking into account that the app should run regardless of the economic situation and location of the user.


  1. https://www.android.com/versions/go-edition/

  2. https://tweakers.net/pricewatch/1540228/nokia-13-zwart/specificaties/

  3. https://tweakers.net/pricewatch/1764134/alcatel-1-2021-8gb-opslag-blauw/specificaties/

  4. https://tweakers.net/pricewatch/1493366/zte-blade-l8-zwart/specificaties/

  5. https://www.reddit.com/r/androiddev/comments/81dnt8/ideas_for_best_emulating_an_android_go_device/

  6. https://developer.android.com/studio/profile/measuring-performance

  7. https://www.cable.co.uk/broadband/pricing/worldwide-comparison/

  8. https://www.cable.co.uk/mobiles/worldwide-data-pricing/

  9. https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx