Year
2021
Credit points
10
Campus offering
No unit offerings are currently available for this unit.
Prerequisites
Nil
Unit rationale, description and aim
Data is often described as the world’s ‘new oil’, and data science is a new interdisciplinary science that employs scientific methods, algorithms, tools and systems to uncover insights, knowledge and value from the massive data generated in different domains. Python, a general-purpose programming language, has gradually become the ‘engine’ of data science. In particular, many data scientists use Python because it provides an efficient coding experience and a wealth of data science tools and libraries.
This unit will first cover basic concepts and tools of Python and data science, and fundamental elements of the Python programming language, including program constructs, data structures, functions and files. It will then delve into various Python data science libraries for data processing, analysis and visualisation. Data ethics and examples of exploratory data analysis in data science will be introduced at the end. The aim of the unit is to provide students with the Python knowledge and skills needed to build data science solutions that can benefit human lives and contribute to the common good.
Learning outcomes
To successfully complete this unit you will need to demonstrate that you have achieved the learning outcomes (LO) detailed below.
Each outcome is informed by a number of graduate capabilities (GC) to ensure your work in this, and every unit, is part of a larger goal of graduating from ACU with the attributes of insight, empathy, imagination and impact.
On successful completion of this unit, students should be able to:
LO1 - Demonstrate an understanding of fundamental Python programming language and data science concepts and tools (GA5, GA8)
LO2 - Experiment with common Python data science libraries and tools for data collection, cleaning, and wrangling (GA5, GA10)
LO3 - Appraise the use of data processing, analysis and visualisation techniques and tools to solve real-world data science problems, such as those in the digital health area (GA4, GA5)
LO4 - Examine data science ethical issues as they impact on human dignity and privacy (GA3, GA5)
Graduate attributes
GA3 - apply ethical perspectives in informed decision making
GA4 - think critically and reflectively
GA5 - demonstrate values, knowledge, skills and attitudes appropriate to the discipline and/or profession
GA8 - locate, organise, analyse, synthesise and evaluate information
GA10 - utilise information and communication and other relevant technologies effectively.
Content
Topics will include:
- Data science and Python introduction
- Data science environment setup: Jupyter notebooks
- Python language syntax, semantics and scalar types
- Python language control flow and basic data structures and sequences
- Python language functions and files
- Python data science packages NumPy and Pandas
- Data processing on data loading, storage, and file formats
- Data processing on data cleaning and preparation
- Data processing on data wrangling: join, combine, and reshape
- Data processing on data aggregation and group operations
- Data plotting and visualisation
- Exploratory data analysis and real-world examples, e.g. in digital health
- Data ethics and potential adverse impacts
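As an illustration only, several of the topics above (files and data loading, basic data structures, data cleaning, and group aggregation) can be sketched with the Python standard library alone. The sample records, column names and function names below are hypothetical and are not taken from the unit materials; in the unit itself, these steps would more typically be carried out with libraries such as Pandas.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data (an assumption for illustration): ward and heart rate.
RAW_CSV = """ward,heart_rate
A,72
A,
B,88
B,92
A,80
"""


def load_rows(text):
    """Load CSV text into a list of dictionaries (data loading)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop rows with a missing heart rate and convert the rest to int (data cleaning)."""
    cleaned = []
    for row in rows:
        value = row["heart_rate"].strip()
        if value:  # skip empty readings
            cleaned.append({"ward": row["ward"], "heart_rate": int(value)})
    return cleaned


def mean_by_ward(rows):
    """Group readings by ward and compute the mean per ward (group aggregation)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["ward"]].append(row["heart_rate"])
    return {ward: mean(values) for ward, values in groups.items()}


rows = clean_rows(load_rows(RAW_CSV))
summary = mean_by_ward(rows)
print(summary)
```

The sketch keeps loading, cleaning and aggregation in separate functions so that each step can be inspected on its own, which mirrors the order in which the topics are listed.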
Learning and teaching strategy and rationale
Multi-mode
This unit will be delivered in multi-mode over a twelve-week semester or equivalent study period. Students will have access to all primary learning materials online through LEO, along with formative and summative assessments, all of which will be available online, to provide a learning experience beyond the classroom. While there are no formal classroom lectures for this unit, students will be required to attend weekly two-hour workshops and fortnightly two-hour lab sessions to achieve the unit learning outcomes. Workshops support learning through theory comprehension and problem solving, while lab sessions focus on hands-on practice; this combination is particularly effective for learning information technology skills.
Students should anticipate undertaking 150 hours of study for this unit, including class attendance, readings, online forum participation and assessment preparation.
Online mode
This unit utilises an active learning approach whereby students will engage in e-module activities, readings and reflections, and opportunities to collaborate with peers in an online environment. This can involve, but is not limited to, online workshops, online discussion forums, chat rooms, guided reading, and webinars. To deliver core content, pre-recorded lectures will be incorporated within the online learning environment and e-modules. In addition, electronic readings will be provided to guide students’ reading and extend other aspects of online learning.
Students should anticipate undertaking 150 hours of study for this unit, including online workshops, workshop webinars, lab task preparation and webinars, readings, online forum participation and assessment preparation.
Assessment strategy and rationale
A range of assessment procedures will be used to meet the unit learning outcomes and develop graduate attributes consistent with University assessment requirements. The first assessment consists of small to medium-sized programming tasks associated with Python environment setup, Python programming syntax, Python data science packages, and data processing and analysis. The purpose is to assess students’ fundamental knowledge of Python programming for data science. The second assessment consists of data preparation tasks using the Python data science ecosystem and libraries. The purpose is to assess students’ use of the Python data science libraries NumPy and Pandas and other related tools to load and explore data. The final assessment is a project assignment involving data processing, analysis and visualisation for real-world datasets, e.g. in digital health. The purpose is to assess students’ knowledge and skills in applying Python programming and data science packages to data processing and exploration, with consideration of data ethics. Fortnightly lab sessions are associated with the assessments and include assessable lab participation and engagement.
The assessments for this unit are designed to demonstrate the achievement of each learning outcome. To pass this unit, students are required to:
- attempt all three assessment items
- obtain an overall mark of at least 50%
Overview of assessments
Brief Description of Kind and Purpose of Assessment Tasks | Weighting | Learning Outcomes | Graduate Attributes |
---|---|---|---|
Assessment 1: Programming tasks. The first assessment item consists of Python environment setup and solving simple Python programming and data science problems. The assessment requires students to demonstrate their understanding and use of fundamental Python programming skills. Submission Type: Individual. Assessment Method: Content knowledge coding tasks. Artefact: Code. | 30% | LO1, LO2 | GA5, GA8 |
Assessment 2: Data preparation tasks. The second assessment item is a data preparation practical using key Python data science ecosystem libraries. The assessment requires students to use the libraries NumPy and Pandas and other related tools for collecting, cleaning and wrangling various types of data. Submission Type: Individual. Assessment Method: Conceptual knowledge coding tasks. Artefact: Code. | 30% | LO2 | GA5, GA10 |
Assessment 3: Data processing and exploration project. The final assessment is a small individual project involving data processing, analysis and visualisation for real-world datasets, e.g. in digital health. The project requires students to apply Python data science skills and techniques in data processing and exploration to solve problems on real-world datasets, with consideration of data ethics. Submission Type: Individual. Assessment Method: Projects of applying skills. Artefact: Code and Report. | 40% | LO3, LO4 | GA3, GA4, GA5 |
Representative texts and references
Wes McKinney, 2018. Python for Data Analysis, 2nd Edition, O'Reilly Media, Inc.
Joel Grus, 2019. Data Science from Scratch, 2nd Edition, O'Reilly Media, Inc.
Eric Matthes, 2019. Python Crash Course: A Hands-On, Project-Based Introduction to Programming, 2nd Edition, No Starch Press.
Luca Massaron and John Paul Mueller, 2019. Python for Data Science, 2nd Edition, For Dummies.
Peter Bruce et al., 2020. Practical Statistics for Data Scientists, 2nd Edition, O'Reilly Media, Inc.
Allen B. Downey, 2015. Think Stats, 2nd Edition, O'Reilly Media, Inc.
Kubben et al. (Eds), 2019. Fundamentals of Clinical Data Science, Springer. Open access, freely available from https://www.springer.com/gp/book/9783319997124
Celi et al. (Eds), 2020. Leveraging Data Sciences for Global Health, Springer. Open access, freely available from https://link.springer.com/book/10.1007%2F978-3-030-47994-7
E.J.S. Hovenga, M.R. Kidd, S. Garde and Hullin (Eds), 2010. Health informatics, an overview. Stud Health Technol Inform.