Data

The COKI Open Access Dataset is available in JSON Lines format. See below for dataset releases, license, how to cite the website and dataset, attributions and the dataset schema.
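Because each line of a JSON Lines file is a standalone JSON object, the dataset can be read one record at a time with any standard JSON parser. The following is a minimal Python sketch rather than an official loader: the helper name `read_jsonl` and the two sample records are illustrative assumptions, and the file names inside the release zip are not specified here.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


# Illustrative in-memory sample; real records come from the release zip
# and carry the full set of fields described in the schema.
sample = io.StringIO(
    '{"id": "NZL", "name": "New Zealand"}\n'
    '{"id": "AUS", "name": "Australia"}\n'
)
records = list(read_jsonl(sample))
```

The same function accepts an open file handle, since iterating over a text file yields its lines.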

Releases

2024-03-14: Download coki-oa-dataset.zip
2024-01-25: Download coki-oa-dataset.zip
2023-12-19: Download coki-oa-dataset.zip
2023-12-12: Download coki-oa-dataset.zip
2023-11-15: Download coki-oa-dataset.zip

License

The COKI Open Access Dataset © 2022 by Curtin University is licensed under CC BY 4.0.

Citing

To cite the COKI Open Access Dashboard please use the following citation:

Diprose, J., Hosking, R., Rigoni, R., Roelofs, A., Chien, T., Napier, K., Wilson, K., Huang, C., Handcock, R., Montgomery, L., & Neylon, C. (2023). A User-Friendly Dashboard for Tracking Global Open Access Performance. The Journal of Electronic Publishing 26(1). doi: https://doi.org/10.3998/jep.3398

If you use the website code, please cite it as below:

James P. Diprose, Richard Hosking, Richard Rigoni, Aniek Roelofs, Alex Massen-Hane, Kathryn R. Napier, Tuan-Yow Chien, Katie S. Wilson, Lucy Montgomery, & Cameron Neylon. (2022). COKI Open Access Website. Zenodo. https://doi.org/10.5281/zenodo.6374486

If you use this dataset, please cite it as below:

Richard Hosking, James P. Diprose, Aniek Roelofs, Tuan-Yow Chien, Lucy Montgomery, & Cameron Neylon. (2022). COKI Open Access Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6399462

For other citation formats follow the doi.org links in the above citations.

Dataset Attributions

The COKI Open Access Dataset contains information from:

Schema

Field | Type | Description
id | String | The country id; an ISO 3166-1 alpha-3 country code.
name | String | The country name.
subregion | String | The name of the subregion the country is located in.
region | String | The name of the region the country is located in.
start_year | Integer | The start year of data used to calculate the statistics.
end_year | Integer | The end year of data used to calculate the statistics.
stats | PublicationStats | The aggregated publication statistics for this country, for all time.
years | List<Year> | The publication statistics for each year.

Table 1. Country Schema.
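As an illustration of how the country fields in Table 1 might be used, the sketch below selects countries by region. The function name `countries_in_region`, the region and subregion strings, and the numbers in the sample records are all assumptions made for the example, not values taken from the dataset.

```python
from typing import Any, Dict, List


def countries_in_region(countries: List[Dict[str, Any]], region: str) -> List[str]:
    """Return the ISO 3166-1 alpha-3 ids of countries in a region, sorted."""
    return sorted(c["id"] for c in countries if c["region"] == region)


# Made-up records shaped like Table 1; stats and years are abbreviated.
countries = [
    {"id": "NZL", "name": "New Zealand", "subregion": "Australia and New Zealand",
     "region": "Oceania", "start_year": 2000, "end_year": 2022,
     "stats": {"n_outputs": 50}, "years": []},
    {"id": "AUS", "name": "Australia", "subregion": "Australia and New Zealand",
     "region": "Oceania", "start_year": 2000, "end_year": 2022,
     "stats": {"n_outputs": 100}, "years": []},
    {"id": "FRA", "name": "France", "subregion": "Western Europe",
     "region": "Europe", "start_year": 2000, "end_year": 2022,
     "stats": {"n_outputs": 80}, "years": []},
]

oceania_ids = countries_in_region(countries, "Oceania")
```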

Field | Type | Description
id | String | The institution id; a Research Organization Registry identifier.
name | String | The institution name.
country_name | String | The name of the country where the institution is located.
country_code | String | The three-letter ISO 3166-1 alpha-3 code of the country where the institution is located.
subregion | String | The name of the subregion where the institution is located.
region | String | The name of the region where the institution is located.
institution_types | List<String> | A list of institution types that apply to this institution. Each instance can be one of: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, Other.
start_year | Integer | The start year of data used to calculate the statistics.
end_year | Integer | The end year of data used to calculate the statistics.
stats | PublicationStats | The aggregated publication statistics for this institution, for all time.
years | List<Year> | The publication statistics for each year.

Table 2. Institution Schema.
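Since `institution_types` is a list, an institution can carry more than one type, so filtering is a membership test rather than an equality test. A minimal sketch, assuming records shaped like Table 2; the function name `institutions_of_type` and the sample records (including their placeholder ids) are hypothetical.

```python
from typing import Any, Dict, List

# The eight institution types listed in the schema description.
INSTITUTION_TYPES = {
    "Education", "Healthcare", "Company", "Archive",
    "Nonprofit", "Government", "Facility", "Other",
}


def institutions_of_type(
    institutions: List[Dict[str, Any]], institution_type: str
) -> List[str]:
    """Return the names of institutions that list the given type."""
    if institution_type not in INSTITUTION_TYPES:
        raise ValueError(f"unknown institution type: {institution_type!r}")
    return [
        inst["name"]
        for inst in institutions
        if institution_type in inst["institution_types"]
    ]


# Made-up records; the ids are placeholders, not real ROR identifiers.
institutions = [
    {"id": "placeholder-1", "name": "Example University",
     "institution_types": ["Education"]},
    {"id": "placeholder-2", "name": "Example Teaching Hospital",
     "institution_types": ["Healthcare", "Education"]},
    {"id": "placeholder-3", "name": "Example Agency",
     "institution_types": ["Government"]},
]

education = institutions_of_type(institutions, "Education")
```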

Field | Type | Description
n_citations | Integer | The total number of outputs cited.
n_outputs | Integer | The total number of outputs published.
n_outputs_open | Integer | The total number of open outputs.
n_outputs_publisher_open | Integer | The total number of outputs published as Publisher Open.
n_outputs_publisher_open_only | Integer | The total number of outputs published only as Publisher Open (and not Other Platform Open or Closed).
n_outputs_both | Integer | The total number of outputs published that are both Publisher Open and Other Platform Open.
n_outputs_other_platform_open | Integer | The total number of outputs published as Other Platform Open.
n_outputs_other_platform_open_only | Integer | The total number of outputs published only as Other Platform Open (and not Publisher Open or Closed).
n_outputs_closed | Integer | The total number of outputs published as Closed.
n_outputs_oa_journal | Integer | Publisher Open Breakdown: the total number of outputs published in an Open Access Journal.
n_outputs_hybrid | Integer | Publisher Open Breakdown: the total number of outputs made accessible in a Subscription Journal with an open license.
n_outputs_no_guarantees | Integer | Publisher Open Breakdown: the total number of outputs made accessible in a Subscription Publisher with no reuse rights.
p_outputs_open | Float | The percentage of open outputs.
p_outputs_publisher_open | Float | The percentage of outputs published as Publisher Open.
p_outputs_publisher_open_only | Float | The percentage of outputs published only as Publisher Open (and not Other Platform Open or Closed).
p_outputs_both | Float | The percentage of outputs published that are both Publisher Open and Other Platform Open.
p_outputs_other_platform_open | Float | The percentage of outputs published as Other Platform Open.
p_outputs_other_platform_open_only | Float | The percentage of outputs published only as Other Platform Open (and not Publisher Open or Closed).
p_outputs_closed | Float | The percentage of outputs published as Closed.
p_outputs_oa_journal | Float | The percentage of Publisher Open outputs published in an Open Access Journal.
p_outputs_hybrid | Float | The percentage of Publisher Open outputs made accessible in a Subscription Journal with an open license.
p_outputs_no_guarantees | Float | The percentage of Publisher Open outputs made accessible in a Subscription Publisher with no reuse rights.

Table 3. PublicationStats Schema.
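The `p_` fields in Table 3 mirror the `n_` counts, which suggests a way to sanity-check a record. The sketch below makes three assumptions that the schema does not state outright: percentages are on a 0 to 100 scale, the general percentages use `n_outputs` as the denominator while the Publisher Open breakdown uses `n_outputs_publisher_open`, and the four exclusive categories (Publisher Open only, both, Other Platform Open only, Closed) add up to `n_outputs`. The function names and the sample counts are made up for illustration.

```python
from typing import Dict


def percent(part: int, whole: int) -> float:
    """Percentage on a 0-100 scale; 0.0 when the denominator is zero."""
    return 100.0 * part / whole if whole else 0.0


def derive_percentages(stats: Dict[str, int]) -> Dict[str, float]:
    """Recompute a few p_ fields from n_ counts (assumed denominators)."""
    n = stats["n_outputs"]
    n_pub = stats["n_outputs_publisher_open"]
    return {
        "p_outputs_open": percent(stats["n_outputs_open"], n),
        "p_outputs_publisher_open": percent(n_pub, n),
        "p_outputs_closed": percent(stats["n_outputs_closed"], n),
        # Breakdown fields are described as percentages of Publisher Open outputs.
        "p_outputs_oa_journal": percent(stats["n_outputs_oa_journal"], n_pub),
        "p_outputs_hybrid": percent(stats["n_outputs_hybrid"], n_pub),
        "p_outputs_no_guarantees": percent(stats["n_outputs_no_guarantees"], n_pub),
    }


def categories_partition_outputs(stats: Dict[str, int]) -> bool:
    """Check the assumed partition of n_outputs into four exclusive categories."""
    return (
        stats["n_outputs_publisher_open_only"]
        + stats["n_outputs_both"]
        + stats["n_outputs_other_platform_open_only"]
        + stats["n_outputs_closed"]
    ) == stats["n_outputs"]


# Made-up counts, not real data.
stats = {
    "n_outputs": 200, "n_outputs_open": 120,
    "n_outputs_publisher_open": 80, "n_outputs_publisher_open_only": 50,
    "n_outputs_both": 30, "n_outputs_other_platform_open": 70,
    "n_outputs_other_platform_open_only": 40, "n_outputs_closed": 80,
    "n_outputs_oa_journal": 40, "n_outputs_hybrid": 24,
    "n_outputs_no_guarantees": 16,
}
derived = derive_percentages(stats)
```

If recomputed values differ from the published `p_` fields, the likeliest cause is that one of these assumptions does not hold for the dataset, so the published values should take precedence.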

Field | Type | Description
year | Integer | The year that this record applies to.
date | Date | The date that this record applies to, in the format YYYY-MM-DD. The day and month are always the end of the year in question, i.e. the 31st of December.
stats | PublicationStats | The aggregated publication statistics for the year that this record applies to.

Table 4. Year Schema.
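The `years` list of a country or institution can be turned into a simple time series. The sketch below parses each `date`, checks that it falls on 31 December of the record's `year` as Table 4 describes, and collects `p_outputs_open` per year. The function name `open_percentage_by_year` and the sample values are illustrative assumptions.

```python
from datetime import date
from typing import Any, Dict, List


def open_percentage_by_year(years: List[Dict[str, Any]]) -> Dict[int, float]:
    """Map each year to its p_outputs_open value, in ascending year order."""
    series: Dict[int, float] = {}
    for record in years:
        parsed = date.fromisoformat(record["date"])  # expects YYYY-MM-DD
        if (parsed.year, parsed.month, parsed.day) != (record["year"], 12, 31):
            raise ValueError(f"unexpected date for {record['year']}: {record['date']}")
        series[record["year"]] = record["stats"]["p_outputs_open"]
    return dict(sorted(series.items()))


# Made-up Year records with abbreviated stats; not real data.
years = [
    {"year": 2021, "date": "2021-12-31", "stats": {"p_outputs_open": 45.5}},
    {"year": 2020, "date": "2020-12-31", "stats": {"p_outputs_open": 42.0}},
]
series = open_percentage_by_year(years)
```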
