ICT205 Data Analytics
A data analytics project starts with collecting the data and ends with communicating the results drawn from it. Several steps must be followed in between, and data preprocessing is one of the most important. Data preprocessing itself comprises multiple steps, depending on the nature, type and values of the data.
Data visualisation, in turn, uses visual representations such as charts, graphs and illustrations to explore, make sense of, and communicate data. Many large companies are now moving towards visualisation.
Students are expected to work individually to prepare a report that details the use and applications of data preprocessing and data visualisation techniques on a selected data set. The aim of this assessment is to enable students to create a report that evaluates the use of data preprocessing and data visualisation techniques applied to a given case, and to complete the analysis or model building needed to solve a problem or question based on a business requirement.
Question 1: Students are required to select a data set for classification tasks and answer the following questions: (Marks: 15)
- What is the purpose of the data set, and what kind of insights can be extracted from the chosen data set?
- Have you applied any data cleaning approaches (e.g., missing value handling, noisy data handling) to the chosen data set? Explain in your own words which data cleaning approaches you performed, or why none were required.
- Have you applied any data transformation techniques (normalisation, attribute creation, discretisation etc.) to the chosen data set? Explain in your own words which data transformation techniques you performed, or why no transformation was required.
- Have you applied any data reduction techniques (reducing dimension, reducing volume, balancing data)? If yes, describe the data reduction technique(s) you followed; otherwise, explain why no reduction techniques were required.
- Determine and justify the appropriate data mining task and method for the selected data set.
- Build and evaluate multiple models for the selected data set.
- Design an interactive dashboard using 3-4 charts/graphs/illustrations to represent the data.
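As a rough illustration of the preprocessing steps named in the questions above, the sketch below applies mean imputation (data cleaning), min-max normalisation and equal-width discretisation (data transformation) to a small made-up attribute. The values and the helper names `impute_mean`, `min_max_normalise` and `discretise_equal_width` are assumptions for illustration only and are not prescribed by this assessment; a real report would more likely rely on a library such as pandas or scikit-learn.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalise(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretise_equal_width(values, bins):
    """Map each value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical attribute with one missing value (not from any real data set).
ages = [20, 30, None, 40, 60]
cleaned = impute_mean(ages)          # None is replaced by 37.5
scaled = min_max_normalise(cleaned)  # 20 maps to 0.0 and 60 maps to 1.0
binned = discretise_equal_width(cleaned, bins=2)
```

Whether mean imputation or any of these transformations is appropriate depends on the chosen data set, which is exactly what the answers to the questions above should justify.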
Question 2: Students are required to select a data set for regression tasks and define a question based on a business requirement. This should include: (i) selecting the data set; (ii) exploring, summarising and preparing the data; (iii) defining the problem and requirements; (iv) defining an experiment setup; (v) implementing your approach; and (vi) evaluating and analysing the approach. (Marks: 15)
- Problem: Describe the problem and highlight the business need.
- Approach: Describe your approach. It should cover, for example, learning techniques, features, model tuning and parameter selection, as well as the analysis, e.g., how the analysis will answer your questions.
- Results: Summarise the results and critically analyse them, e.g., limitations of the data, setup or approach, characteristic errors, and possible improvements.
- Conclusion: Conclude with what you have learned from this study that would help you improve as a data analyst. Would you recommend this as a solution to your problem? Provide reasons.
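For the Results step of a regression task, one possible way to quantify model error is sketched below using three common measures: mean absolute error (MAE), root mean squared error (RMSE) and the coefficient of determination (R²). The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; the assessment does not mandate any particular metric.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R^2 for paired actual and predicted values.

    Assumes both lists have the same non-zero length and that the actual
    values are not all identical (otherwise R^2 is undefined).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical target values and model predictions (made up for this sketch).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(actual, predicted)
```

Reporting more than one measure is often useful, because RMSE penalises large errors more heavily than MAE, while R² expresses how much of the variation in the target the model explains.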
Question 3: Suppose that you have built a classifier that can identify whether an email is spam or not spam. After applying the classifier to the training data, you get the following confusion matrix. (Marks: 10)
- Calculate the accuracy, true positive rate, true negative rate, precision, and recall. (3 marks)
- Based on the accuracy value, do you think the classifier is doing a good job identifying spam emails? Justify your answer. (4 marks)
- What is the class imbalance problem? How is it affecting the accuracy in the given scenario? (3 marks)
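The measures named in Question 3 can all be derived from the four cells of a binary confusion matrix. The sketch below shows the standard formulas, treating spam as the positive class. The counts are hypothetical and are not the confusion matrix referred to in the question; the function name `classification_metrics` is likewise an assumption for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp: spam correctly flagged as spam      fn: spam missed
    fp: non-spam wrongly flagged as spam    tn: non-spam correctly passed
    Assumes every denominator below is non-zero.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),  # also called the true positive rate
        "true_negative_rate": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical, heavily imbalanced case: 1,000 emails, of which only 50 are spam.
m = classification_metrics(tp=10, fn=40, fp=5, tn=945)
```

With these made-up counts the accuracy is 95.5% even though only 10 of the 50 spam emails are caught (recall of 20%), which is the kind of effect the class imbalance question asks about.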
Individual Report (40 marks)
Individual Report Due (10th February 2022 Week 12 Friday 11:59pm) Expected word count 3,000 words minimum.
Students are expected to submit their assessments via Turnitin on Moodle. Minimum time expectation: 30 hrs
The following course learning outcomes are assessed by completing this assessment task:
LO1. review and differentiate between the methods of data analysis and presentation;
LO2. analyse internal and external sources of data relevant to business environments including technology and service utilisation data to identify relationships and trends;
LO3. develop and apply skills in spreadsheets to sort, manage, summarise and display data to support managerial decision-making;
For this assignment, students are required to write a 2,500-word report on a specific case study and explain the use and applications of data preprocessing and data visualisation techniques on a selected data set (i.e., one data set for a classification task to answer Question 1 and another data set for a regression task to answer Question 2). Students can choose any suitable data set that is publicly available on the internet, for example from https://archive.ics.uci.edu/ml/datasets.php.
To answer Question 3, students are not required to use any data set.
In week 12, students will be required to submit their report on Moodle. Students are expected to work individually and undertake their own research without collaborating with any other student. Students are expected to prepare a comprehensive report on the application of their knowledge of data preprocessing and visualisation to a given case study.
- All reports must include at least 5 academic references, formatted in APA7 referencing style.
- The case study must assess the value propositions of the chosen data set and discuss what types of business questions can be answered using the data set. It must highlight the suitability of data cleaning approaches for the selected data set. It must highlight the data transformation techniques that are applicable to the data set. Students must also highlight how an interactive dashboard can be designed for the chosen data set to communicate the data effectively.
- This unit requires you to use the APA system of referencing. See Sydney International’s quick reference guide. It should be used in conjunction with the online tool Academic Writer: https://extras.apa.org/apastyle/basics-7e/#/.
- A passing grade will be awarded to assignments adequately addressing all assessment criteria. Higher grades require better quality and more effort. For example, a minimum is set on the wider reading required. A student reading vastly more than this minimum will be better prepared to discuss the issues in depth and consequently their report is likely to be of a higher quality. So before submitting, please read through the assessment criteria very carefully.
All assessments must be submitted through Turnitin on Moodle.
Refer to the attached marking guide.
Feedback will be supplied through Moodle. Authoritative results will be published on Moodle.
To submit your assessment task, you must indicate that you have read and understood, and comply with, the Sydney International School of Technology and Commerce Academic Integrity and Student Plagiarism policies and procedures.
You must also agree that your work has not been outsourced and is entirely your own except where work quoted is duly acknowledged. Additionally, you must agree that your work has not been submitted for assessment in any other course or program.
Please follow the sample structure below for Question 1 and Question 2. To answer Question 3, please write the answer at the end and show your working with formulae to get full marks.
- Coversheet (mandatory)
- Title page
- Table of contents
- Overview of the data
- Data Preprocessing
  - Data Cleaning
  - Data Transformation
  - Data Reduction
- Dashboard Design
Note: Students are allowed to include other sections as they deem necessary based on their case study.
|Absenteeism at work Data Set|
|Bank Marketing Data Set|
|Iranian Churn Dataset Data Set|
|Productivity Prediction of Garment Employees Data Set|
|Real estate valuation data set Data Set|
|Apartment for rent classified Data Set|
|Chronic_Kidney_Disease Data Set|
Individual Report Marking Guide – Marks 40
Weighting: 40 Student IDs: Assessment Criteria
|Presentation/Layout (/05 marks)||Information is well organised, well written, and proper grammar and punctuation are used throughout. Correct layout used.||Information is organised, well written, with proper grammar and punctuation. Correct layout used.||Information is somewhat organised, proper grammar and punctuation mostly used. Correct layout used.||Information is somewhat organised, but proper grammar and punctuation not always used. Some elements of layout incorrect.|
|Structure (/05 marks)||Structure guidelines enhanced.||Structure guidelines followed exactly.||Structure guidelines mostly followed.||Some elements of structure omitted.|
|Introduction (/05 marks)||Introduces the topic of the report in an extremely engaging manner which arouses the reader’s interest. Gives a detailed general background and indicates the overall “plan” of the paper.||Introduces the topic of the report in an engaging manner which arouses the reader’s interest. Gives some general background and indicates the overall “plan” of the paper.||Satisfactorily introduces the topic of the report. Gives a general background. Indicates the overall “plan” of the paper.||Introduces the topic of the report, but omits a general background of the topic and/or the overall “plan” of the paper.|
|Details||All topics are discussed in depth coherently. Significant evidence of critical analysis and reflection.||Consistently detailed discussion. Displays sound understanding with some analysis of topics.||A topic has been adequately discussed. Displays some understanding and analysis of issues.||Inadequate discussion of issues. Little/no demonstrated understanding or analysis of most issues and/or some irrelevant|
|Summary & Conclusion (/03 marks)||An interesting, well written summary of the main points. An excellent final comment on the subject, based on the information provided.||A good summary of the main points. A good final comment on the subject, based on the information provided.||Satisfactory summary of the main points. A final comment on the subject, but introduced new material.||Poor/no summary of the main points. A poor final comment on the subject and/or new material introduced.|
|Referencing (/02 marks)||Correct referencing (APA7 Style). All quoted material in quotes and acknowledged. All paraphrased material acknowledged. Correctly set out reference list.||Mostly correct referencing (APA7 Style). All quoted material in quotes and acknowledged. All paraphrased material acknowledged. Mostly correct setting out of reference list.||Mostly correct referencing (APA7 Style). Some problems with quoted material and paraphrased material. Some problems with the reference list.||Not all material correctly acknowledged. Some problems with the reference list.|
|Total out of 40|