The Open-Source Software Community
Understand the Underlying Inequalities in the Movement That is Changing the Face of Technology


The Problem

Open-source software (OSS) is now ubiquitous and forms much of the digital infrastructure of everyday life. This infrastructure takes tremendous effort to develop and maintain. Without constant support, damaging failures such as the Heartbleed bug can occur. Maintaining OSS is therefore crucial.

However, several obstacles hinder the sustainability of OSS. One major obstacle is inadequate diversity, especially gender diversity, which contributes to an unwelcoming culture. Moreover, because gender diversity has been shown to be associated with higher productivity, increasing it may also improve a team's performance.

Our Mission

This website presents a census on gender diversity, one of the many diversity measures, in the OSS community.
Through this site, you can find the following information:

We used two data sets for pre-processing

Registered OSS Libraries

We downloaded the list of registered OSS libraries, released in Jan 2020, from libraries.io. We treated each package manager as one ecosystem and then retrieved each project's commit history from World of Code. This dataset is denoted as OSS on the website.

GitHub Public Repositories

We used data from GHTorrent, which consists of all GitHub activities until Mar 2021. Because GitHub hosts many repositories for personal or educational use, we excluded projects with fewer than four contributors as a heuristic. This dataset is denoted as PUBLIC on the website.
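As an illustration only, the size heuristic above could be sketched as follows. The input layout (pairs of project name and contributor ID) and the function name `filter_small_projects` are assumptions made for this example, not part of the original pipeline; the sketch also assumes contributor IDs have already been through identity merging and bot removal.

```python
from collections import defaultdict


def filter_small_projects(commits, min_contributors=4):
    """Keep projects that have at least `min_contributors` distinct contributors.

    `commits` is an iterable of (project, contributor_id) pairs.
    This input layout is hypothetical and chosen for illustration.
    """
    contributors_by_project = defaultdict(set)
    for project, contributor in commits:
        contributors_by_project[project].add(contributor)

    # Projects with fewer than four contributors are dropped.
    return {
        project: contributors
        for project, contributors in contributors_by_project.items()
        if len(contributors) >= min_contributors
    }
```

Counting distinct contributors rather than commits means that a busy single-author repository is still excluded.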

*Before counting the number of contributors, we performed identity merging and bot removal.


Gender Inference

We used computational approaches to infer binary gender from names. In this study, we chose to use NamSor, an automatic tool that can infer gender based on one’s name and cultural background, for its high accuracy among similar tools**.

We acknowledge that binary gender does not reflect the current perception of gender. Moreover, name-based inference has limitations.

However, our results can still provide insights into the gender diversity status in OSS. Future researchers could use the results to conduct more targeted studies.

* Please see our annotated bibliography.
** Sebo, P. (2021). Performance of gender detection tools: a comparative study of name-to-gender inference services. Journal of the Medical Library Association: JMLA, 109(3), 414.

Data Analysis

We explored and analyzed the OSS contributor community data based on several different factors.

Year

We aggregated data by time and visualized how gender distribution and female participation in OSS have changed from year to year.
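A minimal sketch of this kind of yearly aggregation is shown below. The record layout (year, contributor ID, inferred gender label) and the function name `female_share_by_year` are assumptions for illustration; the site's actual processing may differ.

```python
from collections import defaultdict


def female_share_by_year(records):
    """Return {year: share of distinct contributors labeled "female"}.

    `records` is an iterable of (year, contributor_id, gender) tuples,
    where gender is "female" or "male". This layout is hypothetical.
    """
    gender_by_year = defaultdict(dict)  # year -> {contributor_id: gender}
    for year, contributor, gender in records:
        # Each contributor is counted once per year, however many commits they made.
        gender_by_year[year][contributor] = gender

    shares = {}
    for year in sorted(gender_by_year):
        genders = list(gender_by_year[year].values())
        shares[year] = sum(g == "female" for g in genders) / len(genders)
    return shares
```

Grouping by an ecosystem name instead of a year would follow the same pattern.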

OSS Ecosystems

Our analysis considered each package manager registered at libraries.io as one ecosystem. We then aggregated data based on each ecosystem and visualized how gender distribution varies across different ecosystems.

Contributors

We divided contributors into two categories: core and peripheral. To identify core contributors for each ecosystem, we first selected the projects whose number of commits was in the top 10% of their ecosystem. Then, within each of these top projects, we identified core developers as those who made more than 10% of the project's commits within a given three-month window. We analyzed and compared gender distributions among core contributors and all contributors.
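The two-step selection described above can be sketched roughly as follows. The function names, the input layouts, and the rounding rule for the top 10% (integer floor, with at least one project kept) are assumptions for illustration; the source does not specify how ties or small ecosystems are handled.

```python
from collections import Counter


def top_projects(commit_counts, top_percent=10):
    """Return the projects in the top `top_percent` percent by commit count.

    `commit_counts` maps project name -> number of commits for one ecosystem.
    Assumption: the cutoff is floored, and at least one project is kept.
    """
    if not commit_counts:
        return set()
    k = max(1, len(commit_counts) * top_percent // 100)
    ranked = sorted(commit_counts, key=commit_counts.get, reverse=True)
    return set(ranked[:k])


def core_contributors(commit_authors, threshold=0.10):
    """Return authors with more than `threshold` of the commits.

    `commit_authors` lists one author ID per commit, for a single project
    within a single three-month window (hypothetical input layout).
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return {author for author, n in counts.items() if n / total > threshold}
```

For example, in a window with ten commits where one author made eight and two others made one each, only the first author counts as core, because exactly 10% does not exceed the threshold.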