Common research activities in this phase include data collection (through existing datasets, first-hand collection, or experiments), data analysis, and documenting/maintaining the protocols or instruments used in the analysis.
OpenRefine is an open-source application for data wrangling (data cleaning, data transformation, data parsing, etc.). You can also use it to fetch additional data from API endpoints such as the Crossref API, among many others.
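In OpenRefine, fetching from an API is typically done with the "Add column by fetching URLs" option, which builds one request URL per row and stores the response. The Python sketch below shows the same request-and-parse pattern against the public Crossref REST API (`https://api.crossref.org/works/{DOI}`). The helper names `crossref_url`, `extract_fields` and `fetch_work` are hypothetical, the DOI used in the usage note is a placeholder, and only a few of the fields Crossref returns are extracted.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Public Crossref REST API endpoint for looking up a single work by DOI.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the lookup URL for one DOI (special characters are URL-encoded, '/' is kept)."""
    return CROSSREF_WORKS + quote(doi.strip(), safe="/")


def extract_fields(payload: dict) -> dict:
    """Pull a few common fields out of a Crossref /works/{doi} JSON response."""
    msg = payload.get("message", {})
    titles = msg.get("title") or [""]            # Crossref returns titles as a list
    journals = msg.get("container-title") or [""]
    return {"doi": msg.get("DOI", ""), "title": titles[0], "journal": journals[0]}


def fetch_work(doi: str) -> dict:
    """Fetch and parse one record (requires network access)."""
    with urlopen(crossref_url(doi), timeout=30) as resp:
        return extract_fields(json.load(resp))
```

For example, `crossref_url("10.1000/xyz123")` produces `https://api.crossref.org/works/10.1000/xyz123`; keeping the parsing in a separate function makes it easy to test without a network connection.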
Power Query is a data preparation tool built into Excel (Office 365 or Excel 2016) and Power BI. It lets you connect to external data sources such as APIs or SharePoint sites, and then combine, reshape, and manipulate the data. See the official documentation of Power Query.
RStudio is an open-source integrated development environment (IDE) for R. It comes with debugging tools, syntax highlighting, and other features that make working with R easier and more manageable. Download and installation instructions here.
Every semester, SMU Libraries conducts a series of R workshops, taught by Assoc Prof Kam Tin Seong from SCIS, that introduce various R packages. Please look out for the announcements on the library website.
The Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text in Julia, Python, or R. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. Installation instructions here.
Every semester, SMU Libraries conducts a series of beginner Python workshops using JupyterLab/Jupyter Notebook. Please look out for the announcements on the library website.
Every semester, SMU Libraries conducts a series of R workshops, taught by Assoc Prof Kam Tin Seong from SCIS, that introduce various R packages (including tidyverse). Please look out for the announcements on the library website.
Pandas is one of the most popular open-source Python libraries for data wrangling. It is best known for its DataFrame, a tabular data structure for transforming and manipulating data. It also comes with basic visualization and descriptive statistics functions so that you can get a quick overview of your data.
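As a rough illustration of a typical pandas cleaning pass, the sketch below builds a small hypothetical DataFrame with common problems (stray whitespace, inconsistent capitalization, a duplicate row, a missing value), cleans it, and prints a quick overview. The column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical survey data with typical problems: stray whitespace,
# inconsistent capitalization, a duplicate row and a missing value.
raw = pd.DataFrame({
    "respondent": [1, 2, 2, 3, 4],
    "faculty": [" Business", "economics ", "economics ", "Law", "Business"],
    "score": [4.0, 5.0, 5.0, None, 3.0],
})

clean = (
    raw.drop_duplicates()                                              # remove the repeated row
       .assign(faculty=lambda d: d["faculty"].str.strip().str.title())  # tidy the text labels
       .dropna(subset=["score"])                                       # drop rows without a score
)

# Quick overview: descriptive statistics and a per-group summary.
print(clean.describe())
print(clean.groupby("faculty")["score"].mean())
```

Chaining the steps keeps the original `raw` data untouched, which makes it easier to document and re-run the cleaning later.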
JASP is a free and open-source statistical software package (similar to SPSS). JASP supports Bayesian statistics, and you can also connect JASP to your OSF account and load data from there. The graphs and tables from your analysis are formatted in APA style.
Jamovi is a free and open-source statistical analysis software package. It is built on top of R and can show the R code for your analysis. Jamovi also comes with spreadsheet editing features. The graphs and tables from your analysis are formatted in APA style.
STATA is a statistical analysis package commonly used for research in sociology, political science, and economics, among other fields. STATA is available to SMU faculty and students upon request and approval. More information on how to request the software here.
Tools with coding required (including R and Python packages/libraries)
NumPy is a Python library for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
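A minimal sketch of the three areas mentioned above: vectorized array arithmetic, solving a small linear system, and a discrete Fourier transform. The matrix and vectors are arbitrary example values.

```python
import numpy as np

# Arrays: element-wise arithmetic without explicit loops.
a = np.array([1.0, 2.0, 3.0])
print(a * 2 + 1)              # [3. 5. 7.]

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)     # approximately [2. 3.]

# Fourier transform: the FFT of a unit impulse is flat (all ones).
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])
```

`np.linalg.solve` is generally preferred over computing an explicit inverse with `np.linalg.inv`, since it is both faster and numerically more stable.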
SciPy is a free and open-source Python library used for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, signal and image processing, and other tasks common in science and engineering.
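As a small illustration of two of the SciPy modules listed above, the sketch below minimizes a one-variable function with `scipy.optimize` and computes a definite integral with `scipy.integrate`. The functions are simple examples chosen so the answers are known in advance.

```python
from scipy import integrate, optimize

# Optimization: find the minimum of f(x) = (x - 2)^2 + 1.
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(res.x, res.fun)          # close to 2.0 and 1.0

# Integration: integral of x^2 from 0 to 1 (exact value 1/3).
area, err = integrate.quad(lambda x: x ** 2, 0, 1)
print(area, err)               # area close to 0.3333, err is an estimate of the absolute error
```

Both functions return numerical approximations, so results should be compared with a tolerance rather than checked for exact equality.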
Voyant Tools is a web-based text analysis and visualization tool for textual data. Note that Voyant Tools does not provide text cleaning functions, so be sure to clean your texts before you load them. See the Getting Started guide.
Lexos is a web-based text analysis tool developed at Wheaton College. It is beginner-friendly, as it guides you through the common steps of text analysis. You can upload multiple text files or load text from URLs that you provide. It comes with text cleaning, visualization, and statistical analysis features. The tool is also available for download here.
Texti is a text cleaning tool that can help you with pre-processing (removing stop words and digits, lemmatizing text, etc.) before text analysis. It accepts PDF input, automatically reads the PDF, and applies the chosen pre-processing steps. It can also generate a simple word cloud based on the analysed text.
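To show what these pre-processing steps do, here is a minimal standard-library sketch that lowercases text, removes digits and punctuation, drops stop words, and counts word frequencies (the raw input for a simple word cloud). The stop-word list is a tiny hypothetical one for illustration, and lemmatization is deliberately left out because it needs a language library such as NLTK or spaCy; this is not how Texti itself is implemented.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list for illustration; real tools use
# much longer lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop digits and punctuation, and remove stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)          # remove digits
    tokens = re.findall(r"[a-z]+", text)      # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]


sample = "In 2023, the library ran 12 workshops for students and the workshops are free."
tokens = preprocess(sample)

# Word frequencies are what a word cloud scales its words by.
print(Counter(tokens).most_common(3))
```

Cleaning before counting matters: without the stop-word filter, words such as "the" would dominate the frequency list and the resulting word cloud.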
EdWordle is a handy web-based tool for quickly creating a word cloud from a text. It comes with basic text cleaning features such as removing stop words and numbers, and grouping similar words together.
Taguette is a free and open-source text tagging/coding tool for qualitative research. It is especially useful for thematic analysis. You can install the software and work locally on your computer, or try the cloud version to see how it works.
Tools with coding required (including R and Python packages/libraries)
NLTK is one of the most popular Python libraries for Natural Language Processing (NLP). See some examples here, as well as the NLTK book for documentation.
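A short NLTK sketch covering tokenization, stemming, a frequency distribution and bigrams. It deliberately uses `wordpunct_tokenize` (regex-based) and `PorterStemmer`, which as far as we know work without downloading extra NLTK data; tools such as `word_tokenize` or the stop-word corpus require a separate `nltk.download(...)` step. The sample sentence is made up.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import bigrams

text = "Researchers are studying how students study and how libraries support studies."

# Regex-based tokenizer; keep alphabetic tokens only, in lowercase.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Stemming reduces inflected forms ("study", "studies", "studying") to a common stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

freq = FreqDist(stems)         # frequency distribution of the stems
pairs = list(bigrams(tokens))  # adjacent word pairs
print(freq.most_common(3))
```

Stems are not always dictionary words (the Porter algorithm maps these forms to "studi"); use a lemmatizer instead if readable base forms are needed.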
spaCy is a free, open-source Python library for Natural Language Processing (NLP). Designed mainly for industrial use, spaCy claims to deliver faster performance than other NLP libraries. Check out spaCy 101 here.
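A minimal spaCy sketch, assuming spaCy is installed. It uses `spacy.blank("en")`, which creates an English pipeline containing only the tokenizer, so no trained model needs to be downloaded. Trained pipelines (for example `en_core_web_sm`, installed separately) add part-of-speech tagging, parsing and named-entity recognition on top of this.

```python
import spacy

# Tokenizer-only English pipeline: no model download required.
nlp = spacy.blank("en")
doc = nlp("SMU Libraries runs 3 Python workshops every semester.")

# Lexical attributes such as is_stop, is_punct and like_num work
# even without a trained model.
content_words = [t.text for t in doc if not (t.is_stop or t.is_punct or t.like_num)]
print(content_words)
```

Processing text through `nlp(...)` returns a `Doc` object, and the same filtering code keeps working if you later swap in a trained pipeline with `spacy.load`.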
VOSviewer is a software tool for constructing and visualizing bibliometric networks. These networks may include journals, researchers, or individual publications, and they can be constructed based on citation, bibliographic coupling, co-citation, or co-authorship relations. It also offers text mining functionality for constructing and visualizing co-occurrence networks of important terms extracted from a body of scientific literature.
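To make the difference between co-citation and bibliographic coupling concrete, the sketch below counts both link types on a small hypothetical set of papers (P1 to P3) and references (R1 to R3). It only shows the raw counts these networks are built from; it is not VOSviewer's implementation, which additionally normalizes link strengths and computes the network layout.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper mapped to the references it cites.
citing_papers = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R1", "R2"],
    "P3": ["R2", "R3"],
}

# Co-citation: two references are linked each time a paper cites both of them.
cocitation = Counter()
for refs in citing_papers.values():
    for pair in combinations(sorted(set(refs)), 2):
        cocitation[pair] += 1

# Bibliographic coupling: two citing papers are linked by the number of references they share.
coupling = {
    (a, b): len(set(citing_papers[a]) & set(citing_papers[b]))
    for a, b in combinations(sorted(citing_papers), 2)
}

print(cocitation)
print(coupling)
```

Term co-occurrence networks follow the same pattern, with documents in place of citing papers and extracted terms in place of references.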
CiteSpace is a free Java application for visualizing and analyzing trends and patterns in the scientific literature. Note that you need a Java Runtime Environment installed on your computer to run this application.
Publish or Perish is used for citation analysis, including the h-index, average citations per year, etc. Click here to view the data sources it retrieves data from to conduct the analysis.
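The h-index is the largest number h such that an author has h papers with at least h citations each. The sketch below computes it from a list of citation counts, along with a simple citations-per-year average. The year-span convention in `citations_per_year` (counting both the first and the current year) is an assumption for illustration; Publish or Perish may define its denominator differently.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citations_per_year(total_citations: int, first_year: int, current_year: int) -> float:
    """Average citations per year; assumes the span counts both end years."""
    years = current_year - first_year + 1
    return total_citations / years
```

For example, `h_index([10, 8, 5, 4, 3])` returns 4: four papers have at least 4 citations each, but there are not five papers with at least 5.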
Bibliometrix is one of the most popular R packages for bibliometric and scientometric analysis, such as the h-index, citation analysis, co-citation analysis, etc. To run it, you need to install R and RStudio.
Tableau Public is a free platform to publicly share and explore data visualizations online. Use this tool to create interactive graphs, maps, and live dashboards with no coding required. Connect to data in a variety of formats like Excel, CSV, and Google Sheets.
Power BI is a cloud-based business analytics tool by Microsoft; the desktop version is also available for download and is free for individual use. It allows you to create interactive dashboards for your data and analysis.