Data gathering, manipulation, visualization and analysis will be carried out in R, a programming language and free software environment for statistical computing and graphics. The R software will be introduced and used comprehensively during the course. We will use online teaching resources from DataCamp (www.datacamp.com) free of charge, especially some of the modules from the career track Data Scientist with R. In addition, we will work on different cases. Each case represents a new "data adventure": analyzing real datasets, exploring different questions and trying out different computational tools.
Students who have successfully completed the course should have achieved the following learning outcomes:
Knowledge and comprehension
Knowledge of the process of Data Science. Understanding of different computational tools that can be used to gather, visualize and analyze data. Students will learn to understand and explore conceptual challenges of inferential reasoning with data.
Be able to identify interesting Data Science opportunities, questions and data sources.
Be able to write code that extracts data from various relevant sources.
Be able to write code that manipulates and transforms data.
Be able to write code that visualizes data.
Be able to write code that models relationships in data.
Be able to use code that produces insight from data, and understand the principles behind reproducible code and projects.
The student should be able to develop competence that adds value to data in the following five fields of Data Science:
1. data collection: data wrangling and cleaning to get data suitable for analysis.
2. data management: manipulating data consistently.
3. exploratory data analysis: generating hypotheses and building intuition from data.
4. prediction or statistical learning from data.
5. communication: presenting the knowledge and insights extracted from data.
Assessment will be based on a portfolio of obligatory assignments and a portfolio project. The portfolio project can be submitted as part of a group. The portfolio project should showcase the ability to ask an interesting scientific or business-relevant question, to gather and clean relevant data, to apply a meaningful analytical model, and to present or visualize the results in an engaging, digestible manner.
The course is graded on a scale with five passing marks, A to E, and F for fail. Only one overall grade is given for the course. There will not be a re-sit exam for this course.
Textbook: R for Data Science. Hadley Wickham & Garrett Grolemund. Available online at: http://r4ds.had.co.nz/
An additional reading list will be published in UiT's LMS at the beginning of the semester.