A number of data sets are useful for researchers studying the science of science funding.  A short list is provided here:

Dimensions:  https://www.dimensions.ai/

  • Researchers can access the free Dimensions application, which covers 97 million publications contextualized with grants, patents and clinical trials, at https://app.dimensions.ai/.

Lens:  https://www.lens.org/

  • Lens hosts most of the world's patent information along with scholarly literature (from sources such as PubMed, CrossRef and Microsoft Academic), creating open public innovation portfolios of individuals and institutions.

RISIS:  http://datasets.risis.eu/

  • The EU-funded RISIS covers data sets on public sector research and research careers, as well as a repository of research and innovation policy evaluations.  It includes, inter alia:
      • EUPRO, a data set comprising information on R&D projects and all participating organizations funded by the European Framework Programmes (FP).
      • PROFILE, a longitudinal, multi-cohort panel study focusing on the situation of doctoral candidates and their postdoctoral professional careers.  The sample consists of doctoral candidates at universities and funding organizations in Germany.
      • The Science and Innovation Policy Evaluation Repository (SIPER), a database of science and innovation policy evaluations from across the world.
      • The RISIS-ETER facility, a set of databases providing a register of European Higher Education Institutions and containing basic statistical information on them, including descriptors, geographical information, students and graduates, personnel, finances, and research activities.
      • The CWTS Leiden Ranking, a university ranking database focusing on the output and impact of research.

Marx/Fuegi patent-to-paper linkages:  https://zenodo.org/record/3238722

  • We link non-patent literature (NPL) citations from the front page of USPTO patents since 1947 to academic articles since 1800 from the Microsoft Academic Graph (MAG).  Each linkage has the original NPL, patent number, MAG id, examiner/applicant indicator, and confidence score for the linkage.  Also included is a redistribution of MAG so researchers can merge in DOIs, dates, authors, journals, and other bibliometric info.
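As a rough illustration of how the linkage fields described above might be used, the sketch below keeps only linkages at or above a confidence threshold and attaches a DOI to each one via the MAG id.  The column names (patent, magid, confscore, source, doi), the threshold, and the toy rows are all assumptions made for this example; consult the Zenodo record for the actual file layout.

```python
import csv
import io

# Hypothetical column names and made-up rows, for illustration only.
# The real files on Zenodo may use different names and formats.
LINKS_CSV = """patent,magid,confscore,source
1000001,2001,9,applicant
1000001,2002,3,examiner
1000002,2001,10,examiner
"""

PAPERS_CSV = """magid,doi
2001,10.0000/example.a
2002,10.0000/example.b
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_patents_to_dois(links, papers, min_conf):
    """Keep linkages with confidence >= min_conf and attach each paper's DOI."""
    doi_by_magid = {row["magid"]: row["doi"] for row in papers}
    linked = []
    for row in links:
        if int(row["confscore"]) < min_conf:
            continue  # drop low-confidence matches
        linked.append({
            "patent": row["patent"],
            "magid": row["magid"],
            "doi": doi_by_magid.get(row["magid"]),  # None if no DOI is known
            "source": row["source"],
        })
    return linked


result = link_patents_to_dois(load_rows(LINKS_CSV), load_rows(PAPERS_CSV), min_conf=5)
for r in result:
    print(r)
```

A higher threshold trades coverage for precision, so the cutoff that suits a given study is a judgment call.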

Data on federally-funded patents

Patent text data and code for calculating text-based similarity between any two utility patents granted by the USPTO between 1976 and 2013, or between any two patent portfolios

For more information, see Arts, Sam, Bruno Cassiman, and Juan Carlos Gomez.  "Text matching to measure patent similarity." Strategic Management Journal 39, no. 1 (2018): 62-84.
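To give a feel for what a text-based similarity between two patents, or between two patent portfolios, can look like, here is a minimal sketch that uses the Jaccard overlap of word sets.  This is an illustrative stand-in and not the distributed code: the tokenization, the absence of stop-word filtering, the function names, and the example texts are all assumptions, and the published data and code may preprocess and score text differently.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard index: shared tokens divided by tokens found in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def portfolio_tokens(texts):
    """Pool the tokens of every patent text in a portfolio into one set."""
    pooled = set()
    for text in texts:
        pooled |= tokens(text)
    return pooled


# Made-up example texts, not taken from any real patent.
patent_a = "A method for coating steel wire"
patent_b = "Method of coating copper wire"

pair_similarity = jaccard(tokens(patent_a), tokens(patent_b))
print(pair_similarity)  # 3 shared tokens out of 8 distinct tokens = 0.375

portfolio_similarity = jaccard(
    portfolio_tokens([patent_a, patent_b]),
    portfolio_tokens(["Apparatus for drawing copper wire"]),
)
print(portfolio_similarity)
```

Pooling tokens across a portfolio is only one way to compare portfolios; averaging pairwise patent similarities is another.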