Updated: Jun 4, 2013
Data Sets from Different Research Projects
To organize data sets generated or processed in different research projects, put them in separate folders named after the research projects. It is good practice to keep a README file in each folder that logs any changes.
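As a minimal sketch of the README practice described above, the following Python function appends one dated entry to a README file inside a project folder. The function name `log_change`, the file name `README.txt`, and the entry layout (date, editor, message) are assumptions made for illustration, not a prescribed format.

```python
from datetime import date
from pathlib import Path


def log_change(project_dir, message, editor, today=None):
    """Append one dated change entry to the README file in project_dir.

    The entry layout "YYYY-MM-DD [editor] message" is an illustrative
    assumption; adapt it to whatever convention your project uses.
    """
    folder = Path(project_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the project folder if it is missing
    stamp = (today or date.today()).isoformat()
    entry = f"{stamp} [{editor}] {message}\n"
    # Open in append mode so earlier entries are never overwritten.
    with (folder / "README.txt").open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Calling `log_change("HousingSurvey", "Removed duplicate respondents", "jtan")` would add one line to `HousingSurvey/README.txt`; the project and editor names here are hypothetical.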
- Use brief and meaningful file names.
- Keep the file name, together with its folder path, under 260 characters, the traditional maximum path length on Windows.
- Recommended name style: ProjectName_DatasetName_Version_YearMonthDay_Editor.XXX (.XXX stands for the file extension, for example ".xls" for Excel files).
- Use the Version segment to indicate the version or type of data set, for example _v1.0.1_ or _regression_.
- Add or delete components in the recommended name style to suit your needs. Make sure the name contains enough information to identify the data set and to distinguish it from other files, especially when there are many data files.
(Source: Organizing project data – files and folders by Florian Hollender)
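The recommended name style above can be sketched as a small Python helper. This is only an illustration: the function name `build_file_name` is hypothetical, the YearMonthDay segment is assumed to mean an eight-digit YYYYMMDD date, and rejecting underscores and spaces inside a component is an added assumption so that the segments stay easy to split apart later.

```python
from datetime import date


def build_file_name(project, dataset, version, editor, extension, on=None):
    """Build ProjectName_DatasetName_Version_YearMonthDay_Editor.XXX."""
    day = (on or date.today()).strftime("%Y%m%d")  # assumed YYYYMMDD layout
    parts = [project, dataset, version, day, editor]
    for part in parts:
        # Underscores separate the segments, so they may not appear inside one.
        if not part or "_" in part or " " in part:
            raise ValueError(f"invalid name component: {part!r}")
    name = "_".join(parts) + "." + extension.lstrip(".")
    if len(name) > 260:
        raise ValueError("file name exceeds 260 characters")
    return name


print(build_file_name("HousingSurvey", "Income", "v1.0.1", "jtan", "xls",
                      on=date(2013, 6, 4)))
# prints HousingSurvey_Income_v1.0.1_20130604_jtan.xls
```

Components you do not need can be dropped from the `parts` list, in line with the advice to adapt the style to your own project.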
When a data set is small, e.g. several megabytes, Git is a good choice for version control. Git is an open-source version control system that is widely adopted by programmers. It is also suitable for versioning other documents, including data set files. If you are totally new to Git, start here. It is innovative and yet easy to use.
Some Git service providers are:
- GitHub offers free accounts with unlimited public repositories, which means others can search and browse your documents.
- Bitbucket is another great option that offers free accounts and unlimited private repositories.
- Assembla also provides free accounts with the option of a private repository where total document size cannot exceed 1GB.
Research data sets will change as the research project progresses. To manage research data efficiently and to ease the re-use of research data sets, keep a well-maintained record of changes in your data sets. To document data sets, you need to choose a metadata standard and record all changes.
Metadata covers several components, including:
- Description of data
- Data collection methods
- Context of data collection
- Algorithms used
- Description of steps taken to clean and manipulate raw data
- Software and systems used for analysis
- Format information
- Creator and contributor information
- Rights information
Choose a metadata standard:
- Adopt the required metadata standard of the data repository service used
- Follow the common practices in your discipline, e.g. for social and behavioral sciences use the DDI metadata standard (examples)
- Adopt Dublin Core (examples)
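To make the Dublin Core option more concrete, here is a minimal Python sketch that writes a few of the components listed earlier (description, creator, format, rights) as Dublin Core elements using the standard library. The function name `dublin_core_record`, the plain `metadata` wrapper element, and the sample values are assumptions for illustration; a repository service may require a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 elements in the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)  # serialize with the usual "dc:" prefix


def dublin_core_record(fields):
    """Return an XML string with one dc:* element per (name, value) pair."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values below are invented for illustration only.
record = dublin_core_record({
    "title": "Household income survey, cleaned",
    "creator": "J. Tan",
    "description": "Raw responses with duplicates removed",
    "format": "text/csv",
    "rights": "Internal use only",
})
print(record)
```

The same record could be saved next to the data file so that the description, creator, format, and rights information travel with the data set.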
Use metadata tools:
- DDI tools endorsed by DDI Alliance.
- DeXtris, a generic tool to explore XML statistical metadata. It supports Statistical Data and Metadata Exchange (SDMX) and the Data Documentation Initiative (DDI), including the draft release of DDI 3.0.
- Nesstar Publisher, an editor for preparing metadata and data for publication in an online catalog. The metadata it produces is compliant with the DDI 2.n and Dublin Core XML metadata standards.
- SDA, a set of programs for the documentation and web-based analysis of survey data, developed and maintained by the Computer-assisted Survey Methods Program (CSM) at the University of California, Berkeley. SDA programs can produce DDI-format metadata from SDA datasets and from other metadata formats.
(Source: Data Management Planning from UC Merced Library; Research Data Management from University of Oregon.)