{ RMarkdown Source | Analysis Codebase }

Introduction

The Wikimedia Foundation’s Editing team is working to improve how contributors communicate on Wikipedia using talk pages through a series of incremental improvements that will be released over time.

As part of this effort, the Editing team introduced a new workflow for replying to specific comments, with the intention of making productive participation on talk pages easier and more intuitive. The reply tool is an extra button that appears at the end of a post on a talk page (as shown in the screenshot below). When you click on it, it opens a reply form that automatically signs and indents wikitext talk page comments and offers a quick way to ping other users, among other features [1].

**Figure 1**: Example of the reply tool. The reply tool is an extra button that appears at the end of a post on a talk page.

The team ran an AB test of the Reply Tool from 11 February 2021 through 10 March 2021 [2] to assess the efficacy of this new feature, specifically for Junior Contributors (defined as having under 100 cumulative edits). The test included logged-in users who had not previously interacted with the reply tool (defined as users whose discussiontools-editmode preference is empty) and who viewed one of the 22 participating Wikipedias (see full list of participating Wikipedias) during the AB test. During this test, 50% of the included users had the reply tool automatically enabled, and 50% did not. Users at these Wikipedias were still able to turn the tool on or off in Special:Preferences.

You can find more information about features of this tool and project updates on the project page.

Purpose

The primary goal of the AB test was to test the hypothesis that using the reply tool will increase the likelihood of a Junior Contributor publishing a comment they start without a significant increase in disruption.

For this analysis, our key performance indicator to evaluate the impact of the tool is comment completion rate, which we define as:

Of the contributors that opened the Reply tool or page editing to make a comment, the percent of contributors that successfully published at least one comment on a talk page

We assessed disruption by looking at the percent of comments made to talk pages that are reverted within 48 hours and the percent of contributors who are blocked after making an edit to a talk page.

The results of this analysis will be used to determine if the reply tool should be deployed as default to all Wiki projects, as an opt-out user preference. Please see further details about the hypotheses, key performance indicators, and decision scenarios in the task description.

Methodology

The AB test was run on a per-Wikipedia basis, and contributors included in the test were randomly assigned to either the control group (reply tool disabled by default) or the treatment group (reply tool enabled by default) based on their user ID. Contributors within each group also had the option to explicitly turn the tool on or off in their preferences; however, these contributors remained in the group they were originally bucketed in for the duration of the test.

Upon conclusion of the test on 10 March 2021, we recorded a total of 10,175 comment attempts initiated across both test groups by 4,690 distinct contributors across all experience levels. A total of 2,612 (55.7%) of these contributors were identified as Junior Contributors. Data was collected in EditAttemptStep.

In this test, a user can complete a comment using the Reply Tool or using reply workflows available with wikitext full-page and section editing. For the purpose of this analysis, these two types of editing experiences are defined as follows:

Reply Tool Comment: Any edit to a talk page namespace made with the reply tool. The reply tool allows edits using both wikitext and source mode. Reply tool edits exclude edits to create new sections or new pages on a talk page using either the new discussion tool or wikitext editing. Reply tool events were sampled at 100%.

Recorded in EditAttemptStep as: event.action = 'init', event.integration = 'discussiontools', event.init_type = 'page'

Page Editing Comment: Any edit to a talk page namespace that was not made with the reply tool and that did not create a new section or new page using either the new discussion tool or wikitext editing. Note that some of these edits may have been corrective edits (e.g., fixing a signature), but current instrumentation does not distinguish between this type of edit and a reply. These events were sampled at a rate of 1/16, or 6.25%.

Recorded in EditAttemptStep as: event.action = 'init', event.integration = 'page', event.init_type = 'section' or 'page', event.init_mechanism = 'click'
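The two definitions above can be sketched as a small classifier. This is an illustration only: it assumes each event is available as a flat Python dict holding the fields named above, the function name `classify_init_event` is hypothetical, and the talk namespace filter and the exclusion of new-section edits are assumed to happen elsewhere.

```python
from typing import Optional

# Page editing events were sampled at 1/16, which is 6.25%.
PAGE_EDITING_SAMPLING_RATE = 1 / 16


def classify_init_event(event: dict) -> Optional[str]:
    """Label an init event as a reply tool or page editing comment attempt.

    Returns "reply_tool", "page_editing", or None when the event matches
    neither definition. The dict layout is an assumption for illustration.
    """
    if event.get("action") != "init":
        return None
    integration = event.get("integration")
    init_type = event.get("init_type")
    if integration == "discussiontools" and init_type == "page":
        return "reply_tool"
    if (
        integration == "page"
        and init_type in ("section", "page")
        and event.get("init_mechanism") == "click"
    ):
        return "page_editing"
    return None


example = {"action": "init", "integration": "discussiontools", "init_type": "page"}
print(classify_init_event(example))  # reply_tool
```

Because the two event types were sampled at different rates, raw event counts from the two groups are not directly comparable; per-user rates within each group are.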

See the following Phabricator tickets for further details regarding instrumentation and implementation of the AB test:

Comment completion rate for Junior Contributors

We first calculated the comment completion rate for Junior Contributors by editing experience (i.e. Was the contributor able to successfully save at least one comment with or without using the reply tool?) overall and across each participating Wikipedia.

For this analysis, we are defining the comment completion rate as the percent of contributors that successfully published (event.action = 'saveSuccess') at least one comment after opening a particular editing interface (event.action = 'init') during the time of the AB test. Note that this does not take into account the number of attempts it took for the user to publish or the duration of their editing sessions.
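As a minimal sketch of this definition, the function below counts users with at least one `init` event and, among them, users with at least one `saveSuccess` event. The input shape (pairs of user ID and action for a single editing interface) and the function name are assumptions for illustration, and the sketch does not check that the save happened after the init.

```python
from collections import defaultdict


def comment_completion_rate(events):
    """Return (users attempted, users completed, completion rate).

    `events` is an iterable of (user_id, action) pairs for one editing
    interface. A user counts as completed if they have at least one
    'saveSuccess' event, no matter how many attempts they made.
    """
    actions_by_user = defaultdict(set)
    for user_id, action in events:
        actions_by_user[user_id].add(action)

    attempted = {u for u, acts in actions_by_user.items() if "init" in acts}
    completed = {u for u in attempted if "saveSuccess" in actions_by_user[u]}
    if not attempted:
        return 0, 0, float("nan")
    return len(attempted), len(completed), len(completed) / len(attempted)


# Toy data, not taken from the AB test.
toy_events = [
    ("user_a", "init"), ("user_a", "saveSuccess"),
    ("user_b", "init"), ("user_b", "init"),  # two attempts, nothing published
    ("user_c", "init"), ("user_c", "saveSuccess"), ("user_c", "saveSuccess"),
    ("user_d", "init"),
]
attempted, completed, rate = comment_completion_rate(toy_events)
print(attempted, completed, f"{rate:.1%}")  # 4 2 50.0%
```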

Overall comment completion rate by Junior Contributors

Junior Contributors' comment completion rate across all participating Wikipedias

| Editing experience | Number of users attempted | Number of users completed | Comment completion rate¹ |
|---|---:|---:|---:|
| Page editing² | 1301 | 359 | 27.6% |
| Reply tool³ | 1311 | 956 | 72.9% |

¹ Defined as percent of contributors that made a comment attempt and published at least 1 comment.

² Sampling rate for Non-Reply Tool events is 6.25%

³ Sampling rate for Reply Tool events is 100%

**Figure 2**: Percent of Junior Contributors that completed at least one comment attempt on a talk page during the AB test.

Comment completion rate by participating Wikipedia

Junior Contributors' comment completion rate by participating Wikipedia

| Wikipedia | Editing experience¹,² | Number of users attempted | Number of users completed | Completion rate³ |
|---|---|---:|---:|---:|
| Afrikaans Wikipedia | Reply tool | 1 | 1 | 100% |
| Bengali Wikipedia | Page editing | 17 | 8 | 47.1% |
| Bengali Wikipedia | Reply tool | 8 | 6 | 75% |
| Chinese Wikipedia | Page editing | 52 | 15 | 28.8% |
| Chinese Wikipedia | Reply tool | 43 | 25 | 58.1% |
| Dutch Wikipedia | Page editing | 38 | 9 | 23.7% |
| Dutch Wikipedia | Reply tool | 45 | 40 | 88.9% |
| Egyptian Wikipedia | Page editing | 6 | 2 | 33.3% |
| Egyptian Wikipedia | Reply tool | 3 | 2 | 66.7% |
| French Wikipedia | Page editing | 191 | 65 | 34% |
| French Wikipedia | Reply tool | 303 | 240 | 79.2% |
| Hebrew Wikipedia | Page editing | 91 | 36 | 39.6% |
| Hebrew Wikipedia | Reply tool | 57 | 35 | 61.4% |
| Hindi Wikipedia | Page editing | 25 | 5 | 20% |
| Hindi Wikipedia | Reply tool | 11 | 6 | 54.5% |
| Indonesian Wikipedia | Page editing | 47 | 6 | 12.8% |
| Indonesian Wikipedia | Reply tool | 14 | 9 | 64.3% |
| Italian Wikipedia | Page editing | 174 | 52 | 29.9% |
| Italian Wikipedia | Reply tool | 237 | 173 | 73% |
| Japanese Wikipedia | Page editing | 81 | 13 | 16% |
| Japanese Wikipedia | Reply tool | 44 | 29 | 65.9% |
| Korean Wikipedia | Page editing | 24 | 7 | 29.2% |
| Korean Wikipedia | Reply tool | 15 | 8 | 53.3% |
| Persian Wikipedia | Page editing | 88 | 36 | 40.9% |
| Persian Wikipedia | Reply tool | 55 | 32 | 58.2% |
| Polish Wikipedia | Page editing | 58 | 16 | 27.6% |
| Polish Wikipedia | Reply tool | 74 | 46 | 62.2% |
| Portuguese Wikipedia | Page editing | 83 | 20 | 24.1% |
| Portuguese Wikipedia | Reply tool | 142 | 116 | 81.7% |
| Spanish Wikipedia | Page editing | 241 | 47 | 19.5% |
| Spanish Wikipedia | Reply tool | 209 | 154 | 73.7% |
| Swahili Wikipedia | Page editing | 1 | 0 | 0% |
| Swahili Wikipedia | Reply tool | 1 | 1 | 100% |
| Thai Wikipedia | Page editing | 12 | 3 | 25% |
| Thai Wikipedia | Reply tool | 8 | 6 | 75% |
| Ukrainian Wikipedia | Page editing | 37 | 13 | 35.1% |
| Ukrainian Wikipedia | Reply tool | 28 | 19 | 67.9% |
| Vietnamese Wikipedia | Page editing | 35 | 6 | 17.1% |
| Vietnamese Wikipedia | Reply tool | 13 | 8 | 61.5% |

¹ Sampling rate for Non-Reply Tool events is 6.25%

² Sampling rate for Reply Tool events is 100%

³ Defined as percent of contributors that made a comment attempt and published at least 1 comment.

**Figure 3**: Percent of Junior Contributors that completed a comment attempt on a talk page. There were a limited number of AB test events recorded for Swahili, Afrikaans, and Egyptian Wikipedia and no recorded AB test events for Amharic and Oromo Wikipedia. As a result, these Wikipedia projects were removed from the chart above as we are not able to conclude any effects from the reply tool on these specific projects.

Junior contributors had a much higher success rate posting a comment using the reply tool compared to page editing. Overall, 72.9% of all Junior Contributors that made a comment attempt were able to successfully publish at least 1 comment with the reply tool, while only 27.6% of all Junior Contributors successfully published a comment using page editing. This represents a 164% (2.6x) observed increase in comment completion rate.
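The arithmetic behind these figures can be reproduced from the user counts in the overall table. The variable names below are illustrative only.

```python
# Counts from the overall Junior Contributor table: (completed, attempted).
page_completed, page_attempted = 359, 1301
reply_completed, reply_attempted = 956, 1311

page_rate = page_completed / page_attempted
reply_rate = reply_completed / reply_attempted

ratio = reply_rate / page_rate           # how many times higher
relative_increase = ratio - 1            # percent increase over page editing
point_difference = reply_rate - page_rate  # difference in percentage points

print(f"page editing: {page_rate:.1%}, reply tool: {reply_rate:.1%}")
print(f"ratio: {ratio:.1f}x, relative increase: {relative_increase:.0%}")
print(f"difference: {point_difference * 100:.1f} percentage points")
```

Note that the 164% figure is a relative increase; in absolute terms the completion rate rose by about 45 percentage points.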

This trend is reflected consistently for each participating Wikipedia as well. Junior Contributors had a higher comment completion rate using the reply tool compared to non-reply tool editor interfaces on every participating Wikipedia.

Indonesian, Japanese, Dutch, and Spanish Wikipedias saw the highest percent increases in comment completion rate with the reply tool. We observed the two lowest percent increases in comment completion rate for Persian (42% increase) and Hebrew (55% increase) Wikipedias. These are both right-to-left languages, which might affect the reply tool experience and workflows for contributors on these projects; however, the AB test recorded too little data for right-to-left languages to confirm any impact from language direction.
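The per-wiki percent increases can be recomputed from the counts in the table above. The sketch below copies the counts for the six wikis named in this paragraph; the dictionary layout and function name are illustrative, and results computed from raw counts may differ slightly from ones computed from the rounded rates.

```python
# (users attempted, users completed) per editing experience, from the table above.
counts = {
    "Indonesian": {"page": (47, 6), "reply": (14, 9)},
    "Japanese": {"page": (81, 13), "reply": (44, 29)},
    "Dutch": {"page": (38, 9), "reply": (45, 40)},
    "Spanish": {"page": (241, 47), "reply": (209, 154)},
    "Hebrew": {"page": (91, 36), "reply": (57, 35)},
    "Persian": {"page": (88, 36), "reply": (55, 32)},
}


def relative_increase(page, reply):
    """Relative change in completion rate going from page editing to reply tool."""
    page_rate = page[1] / page[0]
    reply_rate = reply[1] / reply[0]
    return reply_rate / page_rate - 1


increases = {
    wiki: relative_increase(c["page"], c["reply"]) for wiki, c in counts.items()
}

# Print wikis from largest to smallest relative increase.
for wiki, inc in sorted(increases.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{wiki}: {inc:+.0%}")
```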

Modeling the impact of the reply tool

We next explored different models to infer the impact of the reply tool on whether a comment was completed while accounting for the random effects of the user and wiki. This allows us to check whether the observed increase above is statistically significant (that is, unlikely to have occurred by random chance).

Comment attempts made on the same Wikipedia, or by the same user, are related to each other. Therefore, we can more accurately infer the impact of the reply tool by accounting for the effect of the user and wiki on the probability of a Junior Contributor completing a comment.

We used a Bayesian hierarchical regression model to capture this structure. In this model, the user and Wikipedia are random effects, and whether the reply tool was used is the fixed effect (predictor variable).
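One way to write down a model with this structure is shown below. This is a sketch, not the exact specification used in the analysis: the logit link follows from the parameters being reported on the log-odds scale, while the normal distributions for the random effects are a common default and an assumption here, and the priors are not stated in this report. The index $i$ denotes one observation, $x_i$ is 1 when the reply tool was used and 0 for page editing.

```latex
\begin{aligned}
y_i &\sim \operatorname{Bernoulli}(p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1 x_i + u_{\mathrm{user}[i]} + w_{\mathrm{wiki}[i]} \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad w_k \sim \mathcal{N}(0, \sigma_w^2)
\end{aligned}
```

Here $\beta_0$ corresponds to the (Intercept) parameter and $\beta_1$ to the effect of using the reply tool.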

Posterior summary of model parameters

| | Point estimate | 95% CI¹ |
|---|---:|---:|
| **Parameter** | | |
| (Intercept) | −1.036 | (−1.263, −0.832) |
| Using reply tool | 1.974 | (1.775, 2.234) |
| **Function of parameter(s)** | | |
| Multiplicative effect on odds | 7.200 | (5.903, 9.334) |
| Maximum lift | 49.4%² | (44.4%, 55.8%) |
| Average lift | 45.6%³ | (41.6%, 50.5%) |

¹ CI: Credible Interval

² Maximum lift calculated using the divide-by-4 rule

³ Average lift = Pr(Success|Reply Tool) − Pr(Success|Page Editing) = $\operatorname{logit}^{-1}(\beta_0 + \beta_1) - \operatorname{logit}^{-1}(\beta_0)$

Since the model parameters are on the log-odds scale, we needed to apply the following transformations to make sense of them:

  • We used the “divide-by-4” rule suggested by Gelman, Hill, and Vehtari 2021 [3] to approximate the maximum increase in the probability of success corresponding to which editing interface (reply tool or page editing) was used. Using the Bayesian model, we can also directly calculate the average lift.
  • We exponentiated the effect, $\exp(\beta_1)$, to determine the multiplicative effect on the odds of a Junior Contributor successfully publishing at least 1 comment.
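These transformations can be checked against the point estimates in the posterior summary table. The sketch below plugs in the point estimates only; the reported average lift and the credible intervals come from the full posterior, so small differences from the table (for example in the last digit of the average lift) are expected.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1 / (1 + math.exp(-x))


beta_0 = -1.036  # (Intercept): log-odds of success with page editing
beta_1 = 1.974   # Using reply tool: change in log-odds

odds_multiplier = math.exp(beta_1)  # multiplicative effect on odds
max_lift = beta_1 / 4               # divide-by-4 rule: upper bound on the lift
lift_at_point_estimates = inv_logit(beta_0 + beta_1) - inv_logit(beta_0)

print(f"multiplicative effect on odds: {odds_multiplier:.2f}")
print(f"maximum lift (divide-by-4): {max_lift:.3f}")
print(f"lift at point estimates: {lift_at_point_estimates:.3f}")
```

The divide-by-4 rule works because the logistic curve is steepest at a probability of 0.5, where its slope is one quarter of the coefficient.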

Based on estimates from the model, we found that the odds of successfully publishing a comment are about 7 times higher for Junior Contributors who open the reply tool than for Junior Contributors who use page editing.

We also found an average increase of about 46 percentage points (maximum 49 percentage points) in the probability of a Junior Contributor publishing a comment when they switch from using page editing to the reply tool.

We can confirm statistical significance at the 0.05 level for all of these estimates, as indicated by 95% credible intervals that exclude the no-effect value (0 for the reply tool coefficient and the lifts, 1 for the multiplicative effect on odds).

Accounting for experience level

Comment completion rate across all experience levels

In line with the purpose of this test, we primarily focused on determining if the reply tool had an impact on Junior Contributors’ comment completion rate; however, we also reviewed the reply tool's impact on comment completion rates across all contributors to provide insight into differences due to experience level.

In this analysis, we’ve defined two experience levels, junior and non-junior. Junior Contributors are contributors with under 100 cumulative edits, and non-junior contributors are contributors with over 100 cumulative edits.

Note that this binary definition doesn’t fully capture the gradual growth of an editor. For example, using this binary definition, a contributor with 99 edits would only have to make 1 more edit to be redefined as a non-junior contributor, and their modeled probability of completing an edit would suddenly increase. Also, changes at lower edit counts (e.g. 1 to 2 edits) indicate a larger change in experience than changes at higher edit counts (e.g. 5,000 to 5,001 edits). However, we used the binary definition as it aligns with how we’ve defined the target audience and helps simplify the model for the purposes of this analysis.
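The threshold behaviour described above is easy to see in a small sketch of the binary definition. The function and constant names are hypothetical, and because the text only says "under 100" and "over 100", treating exactly 100 edits as non-junior is an assumption made here.

```python
JUNIOR_EDIT_THRESHOLD = 100


def experience_level(cumulative_edits: int) -> str:
    """Binary experience level based on cumulative edit count.

    Assumption: a contributor with exactly 100 edits is treated as
    non-junior; the report does not say which group they fall in.
    """
    if cumulative_edits < JUNIOR_EDIT_THRESHOLD:
        return "junior"
    return "non-junior"


# One extra edit flips the label, even though experience barely changed.
print(experience_level(99))     # junior
print(experience_level(100))    # non-junior
print(experience_level(5001))   # non-junior
```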

Contributors' comment completion rate across all experience levels and participating Wikipedias

| Editing experience | Number of users attempted | Number of users completed | Completion rate¹ |
|---|---:|---:|---:|
| Page editing² | 2650 | 1369 | 51.7% |
| Reply tool³ | 2040 | 1406 | 68.9% |

¹ Defined as percent of contributors that made a comment attempt and published at least 1 comment.

² Sampling rate for Non-Reply Tool events is 6.25%

³ Sampling rate for Reply Tool events is 100%

**Figure 4**: Percent of contributors that completed a comment attempt on a talk page across all contributor experience levels and participating Wikipedias.

Across all contributor experience levels, there was a 33% relative increase in the percent of contributors that were able to successfully publish a comment using the reply tool (68.9%) compared to contributors using page editing methods (51.7%). This change is much smaller than the increase we found when focusing only on Junior Contributors’ comment completion rates, which suggests that experience level strongly affects the impact of the reply tool.

68.9% of contributors across all experience levels were able to publish at least one comment using the reply tool. This is slightly lower than the percent of Junior Contributors (72.9%) that published a comment using the reply tool. However, the comment completion rate for page editing is much higher when looking at contributors across all experience levels: 51.7% were able to complete at least one comment using non-reply tool editing interfaces, compared to only 27.6% of Junior Contributors.

Based on this observed data, it appears that experience level has a small impact on contributors' ability to publish a comment using the reply tool but a large impact on their ability to publish a comment using page editing.

Comment completion rate by experience level

Contributors' comment completion rate by experience level across all participating Wikipedias

| Experience level | Editing experience | Number of users attempted | Number of users completed | Completion rate¹ |
|---|---|---:|---:|---:|
| Non-Junior Contributor² | Page editing | 1365 | 1010 | 74% |
| Non-Junior Contributor | Reply tool | 738 | 450 | 61% |
| Junior Contributor³ | Page editing | 1301 | 359 | 27.6% |
| Junior Contributor | Reply tool | 1311 | 956 | 72.9% |

¹ Defined as percent of contributors that made a comment attempt and published at least 1 comment.

² Defined as having over 100 cumulative edits

³ Defined as having under 100 cumulative edits

**Figure 5**: Percent of contributors that completed a comment attempt on a talk page by contributors’ experience level. A Junior Contributor is a contributor with under 100 cumulative edits and a non-Junior Contributor is a contributor with over 100 cumulative edits

When comparing the comment completion rates of Junior and Non-Junior Contributors, we see a clear difference between the two experience levels. With page editing methods, the Junior Contributors' comment completion rate (27.6%) was much lower than the rate observed for Non-Junior Contributors (74%) during the AB test. However, using the reply tool, the Junior Contributors’ comment completion rate (72.9%) was almost the same as the Non-Junior Contributors' comment completion rate using page editing.

Modeling the impact of the reply tool

Since comment completion rates seem to vary significantly based on the contributor’s experience level, we adjusted the Bayesian Hierarchical Regression Model to include the Contributors’ experience level as an interaction term in the model in addition to the effects of the user and wiki on comment completion rate.
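A model of this form can be sketched as follows. The notation is illustrative rather than the exact specification used: $x_i$ is 1 when the reply tool was used, $z_i$ is 1 for a Junior Contributor, and the choice of reference levels, the normal random effects, and the priors are assumptions not stated in this report.

```latex
\begin{aligned}
y_i &\sim \operatorname{Bernoulli}(p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1 x_i + \beta_2 z_i + \beta_3 x_i z_i
  + u_{\mathrm{user}[i]} + w_{\mathrm{wiki}[i]} \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad w_k \sim \mathcal{N}(0, \sigma_w^2)
\end{aligned}
```

Under this coding, $\beta_1$ is the reply tool effect for non-junior contributors and $\beta_1 + \beta_3$ is the reply tool effect for Junior Contributors, so the interaction coefficient $\beta_3$ captures how much the effect of the tool differs by experience level.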

**Figure 6**: The conditional effects of a Contributor’s experience level and type of editor used on the likelihood of completing a comment.

The above plot shows the predicted effects of the Contributor’s experience level and the type of editor that they used (reply tool vs page editing) on the probability of successfully publishing a comment that they started on a talk page.

Based on the model, we can confirm the following:

  • A junior contributor using the reply tool is significantly more likely to successfully publish an edit than a junior contributor using page editing.
  • A junior contributor using the reply tool is roughly as likely to publish as a non-junior contributor using page editing.
  • A non-junior contributor is more likely to publish an edit using page editing than using the reply tool.

The higher comment completion rate we see for Non-Junior Contributors using page editing may be due to a tendency to stick with what they already know. In addition, our page editing definition currently includes corrective edits made to the page, which are frequently made by more senior contributors and are not possible with the reply tool.

Guardrail analysis

We also wanted to ensure that enabling the reply tool did not result in an increase in the number of disruptive edits being made to talk pages.

To evaluate any disruption caused by the reply tool, we determined the percent of comments made to talk pages that were reverted within 48 hours and the percent of contributors blocked after making a comment to a talk page.

Comment revert rate for Junior Contributors

Methodology

For this analysis, we reviewed data recorded in mediawiki_history to identify the percent of comments posted with the reply tool (identified by the revision tag discussiontools-reply) on talk pages that were reverted within 48 hours [4].

We compared the revert rate for comments published using the reply tool to the revert rate for comments made using page editing during the same timeframe.

The reviewed data excludes wikitext edits to create new pages and edits to start new topics using the new discussion tool.
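A minimal sketch of this comparison is shown below. It assumes each talk page revision has already been filtered as described above and is available as a dict; the keys `tags`, `is_reverted`, and `seconds_to_revert` are hypothetical stand-ins and not the actual mediawiki_history column names. Only the `discussiontools-reply` tag comes from the text. Counting a revert at exactly 48 hours as inside the window is also an assumption.

```python
REVERT_WINDOW_SECONDS = 48 * 60 * 60  # 48 hours


def is_reverted_within_window(revision: dict) -> bool:
    """True when the revision was reverted no later than 48 hours after saving."""
    seconds = revision.get("seconds_to_revert")
    return (
        bool(revision.get("is_reverted"))
        and seconds is not None
        and seconds <= REVERT_WINDOW_SECONDS
    )


def revert_rates(revisions):
    """Return {group: (reverted, published, revert rate)} per editing experience."""
    totals = {"reply_tool": [0, 0], "page_editing": [0, 0]}
    for rev in revisions:
        is_reply = "discussiontools-reply" in rev.get("tags", ())
        group = "reply_tool" if is_reply else "page_editing"
        totals[group][1] += 1
        if is_reverted_within_window(rev):
            totals[group][0] += 1
    return {
        group: (reverted, published, reverted / published if published else float("nan"))
        for group, (reverted, published) in totals.items()
    }


# Toy revisions, not taken from the AB test.
toy_revisions = [
    {"tags": ["discussiontools-reply"], "is_reverted": False, "seconds_to_revert": None},
    {"tags": ["discussiontools-reply"], "is_reverted": True, "seconds_to_revert": 3600},
    {"tags": [], "is_reverted": True, "seconds_to_revert": 200000},  # reverted after 48h
    {"tags": [], "is_reverted": True, "seconds_to_revert": 600},
]
print(revert_rates(toy_revisions))
```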

Overall revert rate by editor type

Junior Contributors' comment revert rate across all participating Wikipedias

| Editing experience¹ | Number of comments reverted | Number of comments published | Revert rate² |
|---|---:|---:|---:|
| Page editing | 2648 | 24615 | 10.76% |
| Reply tool | 60 | 2716 | 2.21% |

¹ Data comes from mediawiki_history

² Defined as percent of comments reverted within 48 hours.

**Figure 7**: Percent of comments made by Junior Contributors on talk pages that are reverted within 48 hours of being published.

Overall, across all participating Wikipedias, we observed a 79.5% decrease in the revert rate for comments made with the reply tool compared to page editing. The reply tool seems to enable Junior Contributors not only to successfully complete a comment but also to reduce the number of errors in the published comment that might lead to the comment being reverted.
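The 79.5% figure is a relative decrease and can be reproduced from the counts in the overall revert rate table. Variable names below are illustrative.

```python
# Junior Contributor counts from the overall revert rate table.
page_reverted, page_published = 2648, 24615
reply_reverted, reply_published = 60, 2716

page_rate = page_reverted / page_published
reply_rate = reply_reverted / reply_published

# Relative decrease in revert rate going from page editing to the reply tool.
relative_decrease = 1 - reply_rate / page_rate

print(f"page editing revert rate: {page_rate:.2%}")
print(f"reply tool revert rate: {reply_rate:.2%}")
print(f"relative decrease: {relative_decrease:.1%}")
```

In absolute terms, this corresponds to a drop of about 8.5 percentage points in the revert rate.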

Revert rate by wiki

Junior Contributors' comment revert rate by participating Wikipedia

| Wikipedia | Editing experience¹ | Number of comments reverted | Number of comments published | Revert rate² |
|---|---|---:|---:|---:|
| Afrikaans Wikipedia | Page editing | 0 | 50 | 0% |
| Afrikaans Wikipedia | Reply tool | 0 | 5 | 0% |
| Amharic Wikipedia | Page editing | 0 | 9 | 0% |
| Amharic Wikipedia | Reply tool | 0 | 1 | 0% |
| Bengali Wikipedia | Page editing | 93 | 898 | 10.36% |
| Bengali Wikipedia | Reply tool | 1 | 11 | 9.09% |
| Chinese Wikipedia | Page editing | 63 | 1003 | 6.28% |
| Chinese Wikipedia | Reply tool | 2 | 89 | 2.25% |
| Dutch Wikipedia | Page editing | 29 | 539 | 5.38% |
| Dutch Wikipedia | Reply tool | 2 | 170 | 1.18% |
| Egyptian Wikipedia | Page editing | 4 | 94 | 4.26% |
| Egyptian Wikipedia | Reply tool | 0 | 2 | 0% |
| French Wikipedia | Page editing | 139 | 3882 | 3.58% |
| French Wikipedia | Reply tool | 11 | 651 | 1.69% |
| Hebrew Wikipedia | Page editing | 160 | 1379 | 11.6% |
| Hebrew Wikipedia | Reply tool | 0 | 110 | 0% |
| Hindi Wikipedia | Page editing | 241 | 726 | 33.2% |
| Hindi Wikipedia | Reply tool | 0 | 15 | 0% |
| Indonesian Wikipedia | Page editing | 64 | 523 | 12.24% |
| Indonesian Wikipedia | Reply tool | 1 | 17 | 5.88% |
| Italian Wikipedia | Page editing | 134 | 2332 | 5.75% |
| Italian Wikipedia | Reply tool | 5 | 493 | 1.01% |
| Japanese Wikipedia | Page editing | 252 | 1636 | 15.4% |
| Japanese Wikipedia | Reply tool | 3 | 103 | 2.91% |
| Korean Wikipedia | Page editing | 163 | 1349 | 12.08% |
| Korean Wikipedia | Reply tool | 1 | 23 | 4.35% |
| Oromo Wikipedia | Page editing | 0 | 3 | 0% |
| Persian Wikipedia | Page editing | 306 | 3367 | 9.09% |
| Persian Wikipedia | Reply tool | 1 | 102 | 0.98% |
| Polish Wikipedia | Page editing | 114 | 877 | 13% |
| Polish Wikipedia | Reply tool | 2 | 143 | 1.4% |
| Portuguese Wikipedia | Page editing | 307 | 1805 | 17.01% |
| Portuguese Wikipedia | Reply tool | 3 | 330 | 0.91% |
| Spanish Wikipedia | Page editing | 429 | 2818 | 15.22% |
| Spanish Wikipedia | Reply tool | 25 | 336 | 7.44% |
| Swahili Wikipedia | Page editing | 0 | 21 | 0% |
| Swahili Wikipedia | Reply tool | 0 | 6 | 0% |
| Thai Wikipedia | Page editing | 14 | 168 | 8.33% |
| Thai Wikipedia | Reply tool | 0 | 18 | 0% |
| Ukrainian Wikipedia | Page editing | 65 | 662 | 9.82% |
| Ukrainian Wikipedia | Reply tool | 2 | 68 | 2.94% |
| Vietnamese Wikipedia | Page editing | 71 | 474 | 14.98% |
| Vietnamese Wikipedia | Reply tool | 1 | 23 | 4.35% |

¹ Data comes from mediawiki_history. Sampling rate is 100% for all events.

² Defined as percent of comments reverted within 48 hours.

**Figure 8**: Percent of comments made by Junior Contributors on talk pages that are reverted within 48 hours of being published. No published reply tool comments were recorded for Oromo Wikipedia during the AB test, and limited data were recorded for Afrikaans, Amharic, Swahili, and Egyptian Wikipedias. As a result, these Wikipedia projects were removed from the chart above, as we are not able to accurately determine a revert rate representative of the population.

Some per-Wikipedia highlights:

  • Comments made with the reply tool had lower revert rates than comments made with non-reply tool editing on each participating Wikipedia where reverts were recorded.
  • We observed the highest reply tool revert rates on Bengali Wikipedia (9.09%), Spanish Wikipedia (7.44%), and Indonesian Wikipedia (5.88%). Reply tool revert rates for all the other participating Wikipedias were under 5%.

Revert rate by experience level

Contributors' comment revert rate by experience level

| Experience level¹ | Editing experience² | Number of comments reverted | Number of comments published | Revert rate³ |
|---|---|---:|---:|---:|
| Non-Junior Contributor | Page editing | 4177 | 272605 | 1.53% |
| Non-Junior Contributor | Reply tool | 136 | 7147 | 1.9% |
| Junior Contributor | Page editing | 2648 | 24615 | 10.76% |
| Junior Contributor | Reply tool | 60 | 2716 | 2.21% |

¹ Junior Contributor is defined as having under 100 cumulative edits. Non-Junior Contributor is defined as having over 100 cumulative edits.

² Data comes from mediawiki_history

³ Defined as percent of comments reverted within 48 hours.