A data paper is a scholarly publication describing a particular dataset or a group of datasets, published in the form of a peer-reviewed article in a scholarly journal.
Unlike a conventional research article, a data paper aims to describe the dataset(s), focusing on collection methods, distinguishing features, access and potential reuse rather than on data processing and analysis. Data papers are usually assigned a unique identifier (e.g. a DOI), so they can be cited just like conventional journal articles.
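To illustrate how a DOI makes a data paper citable, here is a minimal Python sketch that turns a DOI into a resolvable link and builds a simple citation string. It assumes only that DOIs start with "10." and resolve through https://doi.org/. The function names, the citation layout, and the DOI and bibliographic details in the usage line are hypothetical placeholders, not a real paper or a required citation style.

```python
from urllib.parse import quote

# Prefixes people commonly put in front of a bare DOI.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI, a 'doi:' string, or a DOI URL into an https://doi.org/ link."""
    cleaned = doi.strip()
    # Strip a known prefix so that only the bare DOI (e.g. "10.1234/abcd") remains.
    for prefix in _DOI_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return "https://doi.org/" + quote(cleaned, safe="/")


def cite_data_paper(authors: str, year: int, title: str, journal: str, doi: str) -> str:
    """Build a simple citation string (hypothetical layout, not an official style)."""
    return f"{authors} ({year}). {title}. {journal}. {doi_to_url(doi)}"


# Usage with placeholder values (not a real publication):
print(cite_data_paper("Author, A.", 2015, "An example dataset",
                      "Example Data Journal", "doi:10.1234/example.5678"))
```

Check the target journal's own guidelines for the citation format it expects.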
Different publishers may have different requirements, so read their submission guidelines carefully beforehand. As a general guideline, here is what usually happens before a data paper is published:
Refer to the section below for a list of journals where you can publish a data paper.
You may want to make your data freely accessible online for different reasons. Maybe the funder or publisher requires authors to make data publicly available, or you may want more people to find your research or cite your paper.
You can deposit your dataset with SMU Research Data Repository (RDR). The benefits include:
Visit the SMU RDR User Guide for instructions. Email the librarian if you have any questions or need help.
There are numerous free and fee-based data repositories online. Some are subject-based, while others are multi-disciplinary. A good place to start is re3data.org, a registry of online research data repositories.
Listed below are several large data repositories that you can explore:
Repository Name | Subject | Cost/Access Control | About |
DataVerse | Multi-disciplinary | Free for individual researchers | Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others' work more easily. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility. |
FigShare | Multi-disciplinary | Unlimited storage if you make your data publicly available | figshare is a repository where users can make their research data publicly available. SMU is an institutional subscriber of FigShare. Visit SMU Research Data Repository (RDR) to use the service. |
ICPSR | Social Sciences | Fee-based if you want to make your data open to the public; otherwise only ICPSR members can access the data | An international consortium of more than 700 academic institutions and research organizations, ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields. |
Databrary | Psychology; Developmental Science | Free | Databrary is a video data library for developmental science. Share videos, audio files, and related metadata. Discover more, faster. |
CodePlex | Computer Science | Free | CodePlex is Microsoft's free open source project hosting site. You can create projects to share with the world, collaborate with others on their projects, and download open source software. |
GitHub | Computer Science | Free for public and open source projects; Fee-based for unlimited private repositories | GitHub is a web-based repository hosting service. It offers all of the distributed revision control and source code management (SCM) functionality of Git as well as adding its own features. GitHub provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project. |
Launchpad | Computer Science | Free | Launchpad is a software collaboration platform that provides functionality such as code hosting, bug tracking, code reviews, etc. |
SourceForge | Computer Science | Free | SourceForge allows the user to find, create and publish open source software for free. |
Journal | Publisher | Overview | Subject Areas |
Scientific Data | Nature Publishing Group | Scientific Data is a new open-access, online-only publication for descriptions of scientifically valuable datasets. Scientific Data exists to help you publish, discover and reuse research data. | multidisciplinary; natural sciences; social sciences; business and industry |
International Journal of Robotics Research | SAGE Publications | International Journal of Robotics Research (IJRR) was the first scholarly publication on robotics research; it continues to supply scientists and students in robotics and related fields (artificial intelligence, applied mathematics, computer science, electrical and mechanical engineering) with timely, multidisciplinary material on topics from sensors and sensory interpretations to kinematics in motion planning. IJRR also publishes peer-reviewed data papers and multimedia extensions alongside articles. | artificial intelligence, applied mathematics, computer science, electrical and mechanical engineering |
Applied Informatics | SpringerOpen | Applied Informatics covers the theory and application of informatics in various scientific, technological, engineering and social fields. Aiming to inspire new multidisciplinary research, the journal acts as an integrative venue that collects high-quality original research papers and reviews on various aspects of applied informatics, with the foundations of informatics (information theory, statistical modeling, machine learning, etc.) as the driving core and the interactions between essential realms as the promoting focuses. Particularly important are the interactions between (a) life sciences (bioinformatics, medical informatics, bioengineering, etc.), (b) intelligence sciences (neural and cognitive informatics, multimedia, pattern recognition, etc.), and (c) community sciences (social networks, affective computing, big data analytics, etc.). | applied informatics |
SpringerPlus | SpringerOpen | SpringerPlus accepts manuscripts from all disciplines of Science. We accept manuscripts describing original research as well as case descriptions and methods, and we expressly encourage submission of data reports and large datasets. | all disciplines of science, technology, engineering, medicine and humanities & social sciences |
Journal of Open Psychology Data | Ubiquity Press | The Journal of Open Psychology Data (JOPD) features peer reviewed data papers describing psychology datasets with high reuse potential. Data papers may describe data from unpublished work, including replication research, or from papers published previously in a traditional journal. We are working with a number of specialist and institutional data repositories to ensure that the associated data are professionally archived, preserved, and openly available. Equally importantly, the data and the papers are citable, and reuse is tracked. | psychology |
Research Data Journal for the Humanities and Social Sciences | Brill | Research Data Journal for the Humanities and Social Sciences (RDJ) is a peer reviewed e-only open access journal, which is designed to comprehensively document and publish deposited data sets and to facilitate their online exploration. In this way it wants to contribute to transparency of research, accelerate dissemination and foster reuse. The journal concentrates on the Humanities and Social Sciences, covering history, archaeology, language and literature in particular. The publication languages are English and Dutch. The RDJ contains data papers: scholarly publications of medium length (with a maximum of 2500 words) containing a non-technical description of a data set and putting the data in a research context. A data paper gets a persistent identifier and provides publication credits to the author, who is usually (but not necessarily) also the data depositor. Research Data Journal for the Humanities and Social Sciences is published in collaboration with Data Archiving and Networked Services (DANS). | humanities and social sciences |