Data underlying the BSc project: "An analysis of Java release practices on GitHub"
This dataset contains the following inside a tar.zst file:
- A list of all Java repositories on GitHub, in CSV format
- The POM.xml file from each of those repositories, if there was one at the root of the repository
- A sample of 500 000 repositories that:
  - have been searched recursively for POM.xml files;
  - for those that have a POM.xml file, an "effective" POM.xml has been created;
  - for those that have distribution repositories configured, the GitHub workflow files are included, if they exist.
- A report.json file that contains aggregate information about the sample
The scraper written to retrieve this data is also included.
This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.