
Introduction

The Wikimedia Foundation’s Web team is working on researching and building out improvements to the desktop experience to make Wikimedia wikis more welcoming and to increase the utility amongst readers while maintaining utility for existing editors.

As part of this effort, the Web team moved the search bar to a more prominent location at the top of the page. The team ran an A/B test of the new location from 20 October 2020 through 4 November 2020 to assess the efficacy of this feature. The test included all logged-in users on the early adopter wikis (Basque Wikipedia, French Wikipedia, French Wiktionary, Hebrew Wikipedia, Persian Wikipedia, and Portuguese Wikiversity). In the test, 50% of logged-in users saw the search bar in the new location, while the other 50% continued to see it in the previous location.

The new location (as shown in Figure 1) was also deployed as default for anonymous users on our early adopter wikis, and by preference for all other users.

**Figure 1**: Example of the new location of the search bar on desktop. The search bar was moved to a more prominent location on the top of the desktop page.


The primary goal of the AB test was to test the hypothesis that the group with the search bar in the new location would initiate more search sessions. The target was a 2.5% overall increase in search sessions initiated. The other primary questions we wanted to answer were:

  • Which group has a higher rate of search sessions completed? How does this differ per wiki?
  • Have any other interesting search trends emerged?
  • For logged-out users, are there any perceived changes in search behavior before/after the change?

Upon conclusion of the test on 4 November 2020, a total of 61,602 search sessions had been initiated across both groups. You can find more information on this change and other feature deployments on the desktop improvement project page.

Methodology

The AB test was run on a per-wiki basis. Users included in the test were randomly assigned to either the control group (search header in the old location) or the treatment group (search header in the new location) using their user ID, and they received the same treatment for the duration of the test.
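One common way to get a stable, roughly 50/50 assignment from a user ID is to hash the ID and take the result modulo 2. The report does not describe the actual bucketing mechanism, so the sketch below is only an illustration: the function name `assign_group`, the salt string, and the use of SHA-256 are all assumptions, and the original analysis was done in R rather than Python.

```python
import hashlib


def assign_group(user_id: int, salt: str = "search-location-test") -> str:
    """Deterministically map a user ID to 'control' or 'treatment'.

    Hypothetical helper: the salt and hashing scheme are assumptions,
    not the mechanism used in the actual test.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and split on parity.
    bucket = int.from_bytes(digest[:8], "big") % 2
    return "treatment" if bucket == 1 else "control"


# The same user ID always lands in the same group, so a user sees a
# consistent search location for the whole test.
counts = {"control": 0, "treatment": 0}
for uid in range(1, 10001):
    counts[assign_group(uid)] += 1
print(counts)
```

Because the assignment depends only on the user ID and a fixed salt, no per-user state has to be stored to keep the experience consistent.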

Data was collected in the SearchSatisfaction event logging table.

See the following Phabricator tickets for further details regarding the instrumentation and implementation of the AB test:

Search Sessions Initiated

A search session begins when a person starts typing in the search widget. We measured the number of unique search sessions started for each search location type and wiki included in the AB Test.

We initially considered using a Welch two-sample t-test to determine whether there is a statistically significant difference between the two group means; however, since there are only 6 observations per group (each wiki serves as a single observation in this case), there was not enough evidence to confirm a difference in the number of search sessions initiated between the two groups.
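To illustrate why six observations per group give so little power, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from the per-wiki counts reported in Table 1. It is a minimal Python illustration (the original analysis was done in R); the function name `welch_t` is an assumption, and whether the original test was run on raw or transformed counts is not stated in the report.

```python
from math import sqrt
from statistics import mean, variance

# Per-wiki search sessions initiated, in the row order of Table 1.
treatment = [872, 22750, 1943, 1047, 2527, 12]
control = [649, 26235, 1709, 1394, 2457, 7]


def welch_t(a, b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, dof = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The between-wiki variance dominates the difference in means, so the t statistic sits close to zero and the test cannot separate the two groups.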

As an alternative, we conducted a pre and post deployment analysis comparing changes in search sessions initiated for the 6 test wikis to a set of wikis not included in the test from 6 October 2020 through 3 November 2020 (two weeks before and after deployment of the test).

We selected similar wiki projects to each of the test wikis for comparison. These wikis were selected based on their size (number of articles) and desktop pageview traffic during the reviewed time period.

| Test Wiki | Similar Wikis |
|---|---|
| Basque Wikipedia | Slovenian Wikipedia, Belarusian Wikipedia |
| French Wikipedia | German Wikipedia |
| French Wiktionary | Russian Wiktionary, Spanish Wiktionary |
| Hebrew Wikipedia | Catalan Wikipedia, Danish Wikipedia, Romanian Wikipedia |
| Persian Wikipedia | Arabic Wikipedia, Indonesian Wikipedia |
| Portuguese Wikiversity | Japanese Wikiversity |

Note: For wikis not included in the AB test, the new search location was displayed for all users who had enabled the new Vector skin in their user preferences. In this analysis, we excluded these users by reviewing only search sessions initiated on the old (‘legacy’) Vector skin.

Search Sessions Initiated Counts by Search Location and Wiki in AB Test

Table 1: Number of search sessions initiated by search location and wiki during the AB test.

| Wiki | New Search Location (Treatment) | Old Search Location (Control) | Both Groups |
|---|---|---|---|
| Basque Wikipedia | 872 | 649 | 1521 |
| Persian Wikipedia | 22750 | 26235 | 48985 |
| French Wikipedia | 1943 | 1709 | 3652 |
| French Wiktionary | 1047 | 1394 | 2441 |
| Hebrew Wikipedia | 2527 | 2457 | 4984 |
| Portuguese Wikiversity | 12 | 7 | 19 |
| All 6 wikis | 29151 | 32451 | 61602 |

**Figure 2**: Total number of search sessions initiated by search location group and Wiki included in the AB test.


The total number of distinct search sessions initiated ranges from 48,985 search sessions on French Wikipedia to only 19 search sessions on Portuguese Wikiversity.

The search location with the higher number of search sessions initiated varies across the test wikis. Basque Wikipedia, Persian Wikipedia, French Wiktionary, and Portuguese Wikiversity had more search sessions initiated when the search widget was displayed in the new location, while French Wikipedia and Hebrew Wikipedia had more search sessions initiated when the search widget was displayed in the old location.

**Figure 3**: The boxplot shows the distribution of the search sessions initiated across all of the test wikis. A logarithmic scale was applied to the number of search sessions initiated (y-axis) to address the large discrepancies between data for each wiki.


The overall distributions of the two test groups are similar. The middle half of the data for each test group shows little variability in search sessions initiated, and each group has two outliers: French Wikipedia and Portuguese Wikiversity. The new search location has a slightly higher overall number of search sessions initiated compared to the old search location.

Search Sessions Initiated Pre and Post AB Test Deployment

To determine whether the search location move had an impact, we calculated the total number of unique search sessions initiated for each wiki in the AB test and each search location group for a two-week period before and after deployment of the AB test. We compared these results to search sessions initiated on similarly sized wikis over the same time period.

Unfortunately, we are unable to identify logged-in or anonymous users prior to 20 October 2020, when the isAnon field was added to the SearchSatisfaction schema and the AB test was started. As a result, we reviewed both logged-in and logged-out users when comparing search sessions initiated pre and post deployment of the AB test; however, only logged-in users were included in the AB test. Logged-out users on the test wikis received the new search location by default.

**Figure 4**: Total daily search sessions initiated before and following deployment of the AB test (5 October 2020 through 4 November 2020). The figure includes a comparison of overall daily search sessions for Wikis included in the AB test and a similar set of Wikis not included in the AB test. Post deployment search sessions are highlighted in yellow.


Overall, there appears to be no sudden change in daily search sessions initiated following deployment of the AB test on the test wikis.

We fit a linear regression model to correctly infer the impact of the search location move on the number of search sessions initiated and confirm any statistical difference between pre and post deployment data.

**Figure 5**: Search Sessions Initiated Pre AB Test vs Post AB Test by Search Location Group. We applied the log transformation to both the pre and post deployment data to address skew in the data caused by outlier Wikis. The lines represent fitted regression lines to the data for the Wiki Projects Not in the AB Test (red line) and Wiki Projects in the AB test (blue line).


As shown in Figure 5, the slopes for each test group are similar since the overall effect of test group is so small. If there had been a bigger positive effect by the AB test on search sessions initiated, we would see a significantly steeper slope for the blue line (Wiki Project in Test) compared to the red line (Wiki Project Not in Test).

We obtained the following estimates by fitting the model with R's linear regression function (lm). We applied a log transformation to stabilize the variance, since the data are highly skewed by the outliers of French Wikipedia and Portuguese Wikiversity. Because the response variable is on the log scale, we exponentiated the test group coefficient to obtain the multiplicative effect of the treatment and estimated its 95% confidence interval using the delta method.

Response: log(post)

| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | -0.06 | -0.37 – 0.25 | 0.699 |
| pre [log] | 1.01 | 0.98 – 1.04 | <0.001 |
| test_group [WikiProjectInTest] | 0.02 | -0.15 – 0.20 | 0.771 |
| Observations | 16 | | |
| R² / R² adjusted | 0.998 / 0.997 | | |

Table 2: Parameter estimates obtained by fitting the model to the search sessions initiated data using the lm function in R.

| | Estimate | SE | 2.5 % | 97.5 % |
|---|---|---|---|---|
| exp(test_groupWikiProjectInTest) | 1.024328 | 0.0828304 | 0.8619834 | 1.186673 |

While the estimated effect of the new search location is a 2.4% increase in search sessions initiated, the 95% confidence interval is [-13.8%, 18.7%], meaning we do not have sufficient evidence to draw definitive conclusions. Note that the effect of the search move is dampened because it was only deployed to 50% of logged-in users as part of the AB test.
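The delta method step can be sketched as follows. For a coefficient b with standard error se(b), the standard error of exp(b) is approximately exp(b) · se(b), and a symmetric interval is built around exp(b). This is a minimal Python illustration (the original analysis was done in R); the function name `delta_method_exp` is an assumption, the log-scale coefficient and its standard error are backed out from the values in Table 2 rather than taken from the model output, and the use of a normal quantile is an assumption as well.

```python
from math import exp, log
from statistics import NormalDist


def delta_method_exp(beta, se_beta, level=0.95):
    """Delta-method estimate, SE, and confidence interval for exp(beta)."""
    estimate = exp(beta)
    # First-order approximation: Var[g(b)] ~ g'(b)^2 * Var[b], with g = exp.
    se = estimate * se_beta
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, se, estimate - z * se, estimate + z * se


# Backed out from Table 2 (assumption): exp(beta) = 1.024328, SE = 0.0828304.
beta = log(1.024328)
se_beta = 0.0828304 / 1.024328

est, se, lower, upper = delta_method_exp(beta, se_beta)
print(f"estimate = {est:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
print(f"percent change: {100 * (est - 1):.1f}% ({100 * (lower - 1):.1f}%, {100 * (upper - 1):.1f}%)")
```

Under these assumptions the sketch should reproduce the interval in Table 2 to within rounding, which is where the [-13.8%, 18.7%] range comes from.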

Search Sessions Completed

We also calculated the percentage of all search sessions in the AB test that included a click on one of the returned results, by test wiki and search location. Data was restricted to sessions in which at least one result was returned.

We used the internally developed Bayesian Categorical Data Analysis (BCDA) (Popov, n.d.) package for Bayesian statistical analysis and credible intervals.

Clickthroughs by Session

We first reviewed the number of sessions with at least one clickthrough to a provided search result by wiki in the AB test and overall.

Overall


| Group 1 | Group 2 | Pr(Success) in Group 1 | Pr(Success) in Group 2 | Difference | Relative Risk | Odds Ratio |
|---|---|---|---|---|---|---|
| 27875 | 30959 | 3.169% (2.967%, 3.380%) | 2.890% (2.708%, 3.078%) | 0.279% (0.005%, 0.564%) | 1.098 (1.002, 1.204) | 1.101 (1.002, 1.211) |

Table 3: Parameter estimates obtained by fitting the model to the search sessions completed data using the BCDA R package.

**Figure 6**: How likely new search location users were to click on a search result at least once in a session across all Wikis in the AB test. A relative risk greater than 1 indicates the new search location group was more likely to complete a session, while a relative risk less than 1 indicates the new search location group was less likely to complete a session.


Table 3 and Figure 6 show the relative risk: how much more likely the new search location group is to click on a search result at least once in a session than the old search location group. Overall, users who saw the new search location were 1.098 times more likely to click on at least 1 search result in a session than users who saw the header in the old location, with a 95% credible interval of (1.002, 1.204).
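A posterior for the relative risk can be approximated by sampling each group's success probability from a Beta posterior and taking the ratio of the draws. The sketch below is a Python illustration of that general idea, not the BCDA implementation (the original analysis was done in R). The uniform Beta(1, 1) prior, the function name `relative_risk_posterior`, and the success counts (883 and 895) are all assumptions: the counts are not reported in the source and are back-calculated from the group sizes and probabilities in Table 3.

```python
import random


def relative_risk_posterior(x1, n1, x2, n2, draws=20000, seed=0):
    """Monte Carlo median and 95% credible interval for p1 / p2.

    Assumes independent Beta(1, 1) priors on each group's success probability.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(draws):
        # Posterior of a binomial proportion under a Beta(1, 1) prior.
        p1 = rng.betavariate(1 + x1, 1 + n1 - x1)
        p2 = rng.betavariate(1 + x2, 1 + n2 - x2)
        ratios.append(p1 / p2)
    ratios.sort()

    def quantile(q):
        return ratios[int(q * (draws - 1))]

    return quantile(0.5), quantile(0.025), quantile(0.975)


# Group sizes from Table 3; success counts are back-calculated (assumption).
median_rr, lower, upper = relative_risk_posterior(883, 27875, 895, 30959)
print(f"relative risk = {median_rr:.3f} ({lower:.3f}, {upper:.3f})")
```

An interval whose lower bound sits close to 1, as in Table 3, means the data are only narrowly consistent with a positive effect.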

By Wiki

**Figure 7**: Search sessions completed by search location by each Wiki included in the AB Test, where a completed search session is defined as a session with at least 1 click to a provided search result.


Table 4: How likely new search location users were to click on a search result at least once in a session by wiki.

| Wiki | Relative Risk | 95% CI |
|---|---|---|
| Basque Wikipedia | 0.953 | (0.927, 0.978) |
| French Wikipedia | 0.996 | (0.993, 0.999) |
| French Wiktionary | 1.010 | (0.998, 1.022) |
| Hebrew Wikipedia | 0.980 | (0.964, 0.995) |
| Persian Wikipedia | 1.029 | (1.017, 1.043) |
| Portuguese Wikiversity | 1.037 | (0.759, 1.362) |

**Figure 8**: How likely new search location users were to click on a search result at least once in a session by wiki. A relative risk greater than 1 indicates the new search location group was more likely to complete a session, while a relative risk less than 1 indicates the new search location group was less likely to complete a session.


On a per-wiki basis, there were more completed search sessions (defined as sessions with at least 1 click on a search result) for the new search location group than for the old search location group on Persian Wikipedia, French Wiktionary, and Portuguese Wikiversity, while the number of completed search sessions was lower on Hebrew Wikipedia, Basque Wikipedia, and French Wikipedia.

Table 4 and Figure 8 show the relative risk, or how much more likely users in the new search location (test group) are to click on a search result at least once in a session than users in the old search location (control group). On Persian Wikipedia, users who saw the new search location were about 1.029 times more likely to click on at least one result during a session than users who saw the old search location, with a 95% credible interval of (1.017, 1.043). For French Wiktionary and Portuguese Wikiversity, the 95% credible intervals contain 1, indicating that we do not have sufficient evidence to draw definitive conclusions for these wikis. For Basque Wikipedia, French Wikipedia, and Hebrew Wikipedia, the intervals reported in Table 4 fall just below 1.

Clickthroughs by Searches

As an alternate measure of user engagement with the provided results, we also reviewed clickthrough rate defined as the number of clicks to a search result divided by the number of searches by wiki and overall.
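The two engagement measures used in this report can move in different directions, because one counts sessions and the other counts searches. The Python sketch below computes both from the same records to make the distinction concrete (the original analysis was done in R). The session data, field names, and function names are hypothetical and do not come from the SearchSatisfaction table.

```python
# Hypothetical sessions: number of searches made and result clicks recorded.
sessions = [
    {"searches": 3, "clicks": 1},
    {"searches": 1, "clicks": 1},
    {"searches": 5, "clicks": 0},
    {"searches": 2, "clicks": 1},
]


def session_completion_rate(records):
    """Share of sessions with at least one click on a search result."""
    completed = sum(1 for r in records if r["clicks"] >= 1)
    return completed / len(records)


def clickthrough_rate(records):
    """Total clicks on search results divided by total searches."""
    total_clicks = sum(r["clicks"] for r in records)
    total_searches = sum(r["searches"] for r in records)
    return total_clicks / total_searches


completion = session_completion_rate(sessions)
ctr = clickthrough_rate(sessions)
print(f"sessions completed: {completion:.2%}, clickthrough rate: {ctr:.2%}")
```

A group that runs more searches per session can show a lower clickthrough rate even when a similar share of its sessions ends in a click.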

**Figure 9**: Average number of searches and average clicks per session by search location and Wiki included in the AB Test


Users in both the new and the old search location groups have a similar average number of clicks per session, ranging from 0.94 to 1. There is a little more variance in the average number of searches per session across the test wikis and search location groups. Hebrew Wikipedia averages only 1 to 2 searches per session for both search location groups, while the other wikis average 3 to 7 searches per session.

Overall

Table 5: How likely new search location users were to click on a search result during a search across all Wikis in the AB test.

| Comparison | Relative Risk | 95% CI |
|---|---|---|
| Test vs Control | 0.905 | (0.892, 0.919) |

**Figure 10**: How likely new search location users were to click on a search result during a search across all Wikis in the AB test. A relative risk greater than 1 indicates the new search location group was more likely to click on a search result during a search, while a relative risk less than 1 indicates the new search location group was less likely to click on a search result during a search.


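Table 5 and Figure 10 show the relative risk: how much more likely users in the new search location group were to click on a search result during a search than users in the old search location group. Estimates indicate that users in the new search location group were less likely to click on a result during a search overall (relative risk of 0.905, with a 95% credible interval of (0.892, 0.919)), but we do not have sufficient evidence to draw definitive conclusions.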

By Wiki

**Figure 11**: Clickthrough rates for each search location group, split by wiki.


On a per-wiki basis, clickthrough rates (total number of clicks divided by total number of searches) were higher for the new search location than for the old search location on Hebrew Wikipedia, Persian Wikipedia, and Portuguese Wikiversity, while clickthrough rates were lower on French Wikipedia, Basque Wikipedia, and French Wiktionary.

Table 6: How likely new search location users were to click on a search result during a search by wiki.

| Wiki | Relative Risk | 95% CI |
|---|---|---|
| Basque Wikipedia | 0.958 | (0.868, 1.049) |
| French Wikipedia | 0.910 | (0.895, 0.925) |
| French Wiktionary | 0.804 | (0.754, 0.852) |
| Hebrew Wikipedia | 1.082 | (1.021, 1.146) |
| Persian Wikipedia | 1.009 | (0.956, 1.064) |
| Portuguese Wikiversity | 1.206 | (0.370, 2.351) |

**Figure 12**: How likely new search location users were to click on a search result during a search by wiki. A relative risk greater than 1 indicates the new search location group was more likely to click on a search result during a search, while a relative risk less than 1 indicates the new search location group was less likely to click on a search result during a search.


Table 6 and Figure 12 show the relative risk, or how much more likely users in the new search location (test group) are to click on a result during a search than users in the old search location (control group). On Hebrew Wikipedia, users who saw the new header location were about 1.082 times more likely to click through during a search than users who saw the old header location, with a 95% credible interval of (1.021, 1.146). For the other 5 wikis in the test, we do not have sufficient evidence to infer any impact from the search location move.

References

Popov, Mikhail, and Os Keyes. 2021. wmfdata: R Tools For Wikimedia Foundation’s Analysts And Data Scientists. R package version 0.9.1. https://github.com/wikimedia/wmfdata-r.

Popov, Mikhail. n.d. BCDA: Tools for Bayesian Categorical Data Analysis. https://github.com/bearloga/BCDA.

Screenshot by Alex Hollender available on Wikimedia Commons, licensed under CC BY-SA 3.0.