Research Guides: Research & Data Services: Analysis

Data Cleaning

Description: We offer guidance on preparing messy or raw data for analysis. Our workshops and consultations cover techniques for detecting and removing errors, handling missing values, deduplicating records, transforming variables, reshaping data, and standardizing formats. We support tools and languages such as OpenRefine, Python, and R.

Tools we support: OpenRefine, Python (pandas), R (tidyverse)

How we help: Imagine you've collected 1,000 survey responses, but the data contains inconsistencies and errors and needs cleaning before analysis. We offer:

Introductory workshops on data cleaning tools and techniques
Consultations to discuss your specific data cleaning challenges
Guidance on choosing the right tools and methods for your project
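
As a rough illustration of the cleaning techniques listed in the description above, here is a minimal pandas sketch. The survey data, column names, and the 0 to 120 plausibility range for age are all invented for the example; they are assumptions, not part of any real dataset or of the workshop materials.

```python
import pandas as pd

# Hypothetical raw survey export: every column name and value is invented.
raw = pd.DataFrame({
    "respondent_id": [101, 102, 102, 103, 104],
    "age": ["34", "n/a", "n/a", "29", "290"],
    "country": [" Singapore", "singapore", "singapore", "SINGAPORE ", "Malaysia"],
    "submitted": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07", "2024-01-08"],
})


def clean_survey(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few common cleaning steps and return a new DataFrame."""
    out = df.copy()
    # Standardize formats: trim whitespace and use consistent capitalization.
    out["country"] = out["country"].str.strip().str.title()
    # Transform variables: parse age as a number; unparseable text becomes NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Detect errors: mark implausible ages as missing instead of guessing a fix.
    # The 0-120 range is an assumption chosen for this example.
    out.loc[~out["age"].between(0, 120), "age"] = float("nan")
    # Standardize dates into a proper datetime column.
    out["submitted"] = pd.to_datetime(out["submitted"], format="%Y-%m-%d")
    # Deduplicate records: keep the first row for each respondent.
    out = out.drop_duplicates(subset="respondent_id", keep="first")
    return out.reset_index(drop=True)


cleaned = clean_survey(raw)
print(cleaned)
print("Missing age values:", int(cleaned["age"].isna().sum()))
```

Whether to drop, flag, or impute the missing values that remain depends on the research question, which is the kind of decision a consultation can help with.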

Speech Transcription & Diarization

Description: Imagine you have hours of important audio recordings from meetings, interviews, or lectures that need to be converted into searchable text with speaker identification. We offer:

Consultations to discuss your specific transcription needs and challenges
Guidance on preparing your audio files for optimal transcription results

Tools we support: Whisper and PyAnnote (via Hugging Face)

How we help:

The library will cover the cost of transcription and speaker diarization for up to 4 hours of audio content.
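
To give a sense of how transcription and speaker diarization fit together, the sketch below combines the two kinds of output: timestamped text segments (the sort of result a transcription model produces) and speaker turns (the sort of result a diarization model produces). It does not call Whisper or PyAnnote; the `Segment` and `Turn` classes, the `label_segments` function, and the sample timings are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed stretch of audio (hypothetical structure)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A stretch of audio attributed to one speaker (hypothetical structure)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the number of seconds two time intervals share (0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Assign each text segment to the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


# Invented sample data standing in for real model output.
segments = [
    Segment(0.0, 4.0, "Thanks for joining us today."),
    Segment(4.2, 9.0, "Happy to be here."),
    Segment(9.5, 12.0, "Let's begin."),
]
turns = [
    Turn(0.0, 4.1, "SPEAKER_00"),
    Turn(4.1, 9.2, "SPEAKER_01"),
    Turn(9.2, 12.5, "SPEAKER_00"),
]

for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Picking the speaker with the largest overlap is one simple design choice; segments in which several people talk at once would need more careful handling.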

Customized coding workshops to complete research tasks (co-designed with domain experts)

How we help: For users with no prior experience in Python or R programming, the library offers co-designed workshops tailored to specific research tasks in the business, social science, and economics disciplines.

Tools we support:

For Python: JupyterLab, VS Code
For R: RStudio

Case Example: These are some of the workshops that we have conducted in the past:

Introductory R for Social Science (organized with materials co-created with SOSS)
- Also conducted for IDIS100 class
Python for Macroeconomics Research (materials co-created with SEIC and DSA)

Thematic Coding / Qualitative Data Analysis workshops to complete research tasks

How we help: Analyzing qualitative data such as interview transcripts or free text involves discovering and identifying themes, establishing the relationships between them, and linking themes to theoretical models.

For users with no prior experience in thematic coding or qualitative data analysis, the library offers workshops to understand the underlying principles and how to apply them to your research.

Tools we support:

ATLAS.ti
NVivo

Case Example: These are some of the workshops that we have conducted in the past:

Thematic Coding Workshop (in Prof Tsai Ming-Hong's SMU-X Social Science Practicum)

Data Anonymization

How we help:

We provide guidance on anonymizing or de-identifying sensitive data to protect individual privacy while preserving data utility. This applies to both tabular data and qualitative data such as interview transcripts. We can help by conducting a consultation on introductory concepts, pointing you to tools and resources, or working closely with you to develop a strategy for anonymizing your dataset.

We recommend anonymizing your dataset before analysis or sharing with collaborators. This protects participant information, prepares data for repository deposit (e.g., SMU RDR), ensures compliance with ethical standards and regulations, and facilitates collaboration and data sharing.
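
For a concrete picture of what de-identification can involve, the sketch below applies three common techniques to one invented record: replacing a direct identifier with a keyed pseudonym, generalizing an exact age into a band, and redacting email addresses in free text. It uses only the Python standard library and does not reproduce what Amnesia or Textwash do. The function names, the secret key, and the sample record are hypothetical, and the email pattern is a simplification that will not catch every address.

```python
import hashlib
import hmac
import re

# Hypothetical project secret: keep the real one outside the dataset and code.
SECRET_KEY = b"replace-with-a-project-secret"

# Simplified email pattern (an assumption; not a full address validator).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a stable keyed code."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:10]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[EMAIL]", text)


# Invented sample record for illustration only.
record = {
    "name": "Alex Tan",
    "age": 37,
    "comment": "Contact me at alex.tan@example.com after 5pm.",
}

anonymized = {
    "participant": pseudonymize(record["name"]),
    "age_band": age_band(record["age"]),
    "comment": redact_emails(record["comment"]),
}
print(anonymized)
```

Pseudonyms and redaction on their own may not be enough to prevent re-identification, since combinations of remaining attributes can still single out a person; assessing that risk is part of developing an anonymization strategy.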

Tools we support:

Amnesia
Textwash
Other NER tools

Case Example:

Conducted a 1-hour workshop, "Data Anonymization 101 for Researchers"

Research Data Management

How we help:

We support researchers with various aspects of Research Data Management (RDM), including:

Data management planning
Data organization and documentation
Data anonymization for sensitive/personal data
Data deposit and publishing with SMU Research Data Repository
Enabling FAIR (Findable, Accessible, Interoperable, Reusable) data practices
Compliance with journal, publisher, and institutional requirements for research data availability

We offer guidance and support throughout the research lifecycle to ensure best practices in RDM.

Tools we support:

SMU Research Data Repository for data storage, collaboration, and publishing
DMPTool for data management planning

Access to Research Software

How we help:

We provide researchers and undergraduate and postgraduate students with access to a diverse range of research software spanning the entire research lifecycle. This service offers opportunities to explore new tools, gain access to subscription and license-based software, and enhance research capabilities. Software is available through remote access or on-site at the Investment and Data Studio (IDS).

Tools available:

Quantitative and statistical tools (e.g., SPSS, MATLAB, Stata)
Qualitative analysis software (e.g., NVivo, ATLAS.ti)
Document preparation tools (e.g., Overleaf)

Refer to this page for the full list of available software and tools.

Case Example:

A research staff member wanted to learn Stata but didn't have a license. Through our service, they gained access to Stata at the IDS, allowing them to develop their skills and apply the software to their research project.
A group of undergraduate students needed to use SPSS for a course assignment. They were able to use IDS terminals to complete their assignment without having to purchase the software.
A student used IDS computers to complete a data-intensive assignment that their laptop couldn't handle. The greater processing power and memory of the IDS machines allowed them to run Python efficiently on large datasets, overcoming the limitations of their personal device.

Survey Collection and Analysis

How we help:

We provide customized workshops or guidance on the following:

Structuring survey questions and question types so that they fit your research objectives
Planning a distribution strategy so that your survey reaches your target audience
Developing a data collection and cleaning strategy to prepare your data for analysis
Generating a simple descriptive report on your survey responses
Getting started with Qualtrics or R for survey data analysis and visualization

Tools we support:

Qualtrics (for data collection and descriptive reporting)
R & RStudio (for analysis and visualizations)