
ICT205 Data Analytics

23 April 2023, 14:49


Overview

A data analytics project starts with collecting the data and ends with communicating the results. In between, there are multiple steps that must be followed, and data preprocessing is one of the most important of them. Data preprocessing itself involves multiple steps, depending on the nature, type and values of the data.
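As a minimal sketch of what two of these preprocessing steps can look like, the Python snippet below fills missing numeric values with the column mean (data cleaning), then rescales the column to the range [0, 1] with min-max normalisation and groups it into equal-width bins (data transformation). The `age` column and its values are hypothetical, and the helper names are illustrative only; a real project would more typically use a library such as pandas or scikit-learn for these steps.

```python
from statistics import mean

def impute_mean(values):
    """Data cleaning: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalise(values):
    """Data transformation: rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Data transformation (discretisation): map each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # Clamp with min() so the maximum value falls into the last bin instead of a bin of its own
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical 'age' column with two missing entries
ages = [25, None, 40, 35, None, 60]
cleaned = impute_mean(ages)
print(cleaned)
print(min_max_normalise(cleaned))
print(equal_width_bins(cleaned, 3))
```

Mean imputation is only one option; if values are missing for a systematic reason, dropping or flagging those rows may be more appropriate, and that choice is what the report is asked to justify.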

Data visualisation, on the other hand, uses visual representations such as charts, graphs and illustrations to explore, make sense of, and communicate data. Many large companies are increasingly moving towards visualisation.

Timelines and Expectations

Students are expected to work individually to prepare a report that details the use and applications of data preprocessing and data visualisation techniques on a selected data set. The aim of this assessment is to enable students to create a report that evaluates the use of data preprocessing and data visualisation techniques applied to a given case, and to complete the analysis or model building needed to solve a problem or answer a question based on a business requirement.

Question 1: Students are required to select a data set for a classification task and answer the following questions: (Marks: 15)

  • What is the purpose of the data set, and what kind of insights can be extracted from the chosen data set?
  • Have you applied any data cleaning approaches (e.g., missing value handling, noisy data handling) to the chosen data set? Explain in your own words what data cleaning approaches you have performed, or why cleaning was not required.
  • Have you applied any data transformation techniques (normalisation, attribute creation, discretisation, etc.) to the chosen data set? Explain in your own words what data transformation techniques you have performed, or why no transformation was required.
  • Have you applied any data reduction techniques (reducing dimensions, reducing volume, balancing data)? If yes, describe the data reduction technique(s) you followed; otherwise, explain why no data reduction was required.
  • Determine and justify the appropriate data mining task and method for the selected data set.
  • Build and evaluate multiple models for the selected data set.
  • Design an interactive dashboard using 3-4 charts/graphs/illustrations to represent the data.
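The data reduction, model building and evaluation steps of Question 1 can be sketched in plain Python. The example below is a minimal illustration under stated assumptions: the two-feature data are hypothetical, the 70/30 split, the random seeds and k = 3 are arbitrary choices, and the two models (a nearest-centroid classifier and a k-nearest-neighbour classifier) are simple stand-ins for whatever methods a report actually justifies. Random undersampling of the majority class is applied to the training split only, so the test split keeps the original class distribution.

```python
import math
import random
from collections import Counter

def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def undersample(rows, labels, seed=0):
    """Data reduction: randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(rows, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_rows, out_labels = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            out_rows.append(x)
            out_labels.append(y)
    return out_rows, out_labels

def nearest_centroid(train_x, train_y):
    """Model 1: assign each point to the class whose mean (centroid) is closest."""
    centroids = {}
    for label in set(train_y):
        pts = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return lambda x: min(centroids, key=lambda label: math.dist(centroids[label], x))

def knn(train_x, train_y, k=3):
    """Model 2: majority vote among the k nearest training points (Euclidean distance)."""
    def predict(x):
        nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict

def accuracy(model, xs, ys):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

# Hypothetical two-feature data: class 0 (majority) near (1, 1), class 1 (minority) near (5, 5)
rows = [(1 + 0.1 * i, 1 + 0.05 * i) for i in range(14)] + [(5 + 0.1 * i, 5 - 0.1 * i) for i in range(6)]
labels = [0] * 14 + [1] * 6

train_x, train_y, test_x, test_y = train_test_split(rows, labels)
bal_x, bal_y = undersample(train_x, train_y)  # balance the training split only

for name, model in [("nearest centroid", nearest_centroid(bal_x, bal_y)), ("3-NN", knn(bal_x, bal_y))]:
    print(name, accuracy(model, test_x, test_y))
```

Libraries such as scikit-learn provide tested versions of each of these steps; the hand-written helpers here only make the mechanics visible.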

Question 2: Students are required to select a data set for a regression task and define a question based on a business requirement. This should include: (i) selection of the data set; (ii) exploring, summarising and preparing the data; (iii) defining the problem and requirements; (iv) defining an experiment setup; (v) implementing your approach; and (vi) evaluating and analysing the approach. (Marks: 15)

  • Problem: Describe the problem and highlight the business need.
  • Approach: Describe your approach. It should focus on, e.g., learning techniques, features, model tuning, parameter selection and analysis (e.g., how the analysis will answer your questions).
  • Results: Summarise and critically analyse the results, e.g., limitations of the data, setup or approach, characteristic errors and possible improvements.
  • Conclusion: Conclude with what you have learned from this study which would improve yourself as a data analyst. Would you recommend this as a solution to your problem? Provide reasons.
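For the regression question, one minimal way to implement and evaluate an approach is to fit a model on training data and report error metrics on held-out data, compared against a naive baseline. The sketch below fits a one-feature linear regression by ordinary least squares and reports mean absolute error (MAE) and root mean squared error (RMSE). The floor-area and price figures are made up for illustration, and the helper names are hypothetical; a real submission would usually involve more features, cross-validation and a library such as scikit-learn.

```python
import math

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one feature: returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx  # requires at least two distinct x values
    return y_bar - slope * x_bar, slope

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalises large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data: floor area (m^2) -> price (in thousands); not taken from any real data set
train_x = [50, 60, 70, 80, 90, 100]
train_y = [150, 172, 188, 212, 230, 248]
test_x = [55, 85]
test_y = [160, 220]

intercept, slope = fit_simple_linear_regression(train_x, train_y)
model_pred = [intercept + slope * x for x in test_x]
baseline_pred = [sum(train_y) / len(train_y)] * len(test_y)  # naive baseline: always predict the training mean

print(f"model:    MAE={mae(test_y, model_pred):.2f}  RMSE={rmse(test_y, model_pred):.2f}")
print(f"baseline: MAE={mae(test_y, baseline_pred):.2f}  RMSE={rmse(test_y, baseline_pred):.2f}")
```

Comparing against a trivial baseline gives the error figures a reference point, which helps when critically analysing whether the model is worth recommending.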

Question 3: Suppose that you have built a classifier that can identify whether an email is spam or not spam. After applying the classifier to the training data, you get the following confusion matrix. (Marks: 10)

  • Calculate the accuracy, true positive rate, true negative rate, precision, and recall. (3 marks)
  • Based on the accuracy value, do you think the classifier is doing a good job of identifying spam emails? Justify your answer. (4 marks)
  • What is the class imbalance problem? How is it affecting the accuracy in the given scenario? (3 marks)
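The calculations in Question 3 follow directly from the four cells of a binary confusion matrix: accuracy = (TP + TN) / (TP + TN + FP + FN), true positive rate (recall) = TP / (TP + FN), true negative rate = TN / (TN + FP), and precision = TP / (TP + FP). The Python sketch below applies these formulas, treating spam as the positive class. The counts used are hypothetical, chosen only to show how class imbalance can make accuracy look good; they are not the confusion matrix referred to in the question.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard metrics from a binary confusion matrix, with 'spam' as the positive class."""
    def ratio(num, den):
        return num / den if den else None  # undefined when the denominator is zero

    total = tp + fn + fp + tn
    return {
        "accuracy": ratio(tp + tn, total),   # (TP + TN) / all emails
        "tpr_recall": ratio(tp, tp + fn),    # TP / actual spam (TPR and recall are the same quantity)
        "tnr": ratio(tn, tn + fp),           # TN / actual non-spam (specificity)
        "precision": ratio(tp, tp + fp),     # TP / emails predicted as spam
    }

# Hypothetical counts (NOT the matrix from the question): 1,000 emails, of which only 50 are spam
model = confusion_metrics(tp=10, fn=40, fp=10, tn=940)
always_not_spam = confusion_metrics(tp=0, fn=50, fp=0, tn=950)  # trivial rule: never predict spam

for name, m in [("classifier", model), ("always 'not spam'", always_not_spam)]:
    print(name, m)
```

With these made-up counts both rows report 95% accuracy, yet the trivial rule never catches a single spam email (recall of 0). When one class dominates the data, accuracy mostly reflects performance on the majority class, so it should be read alongside recall, precision and the true negative rate.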

Individual Report (40 marks)

Individual Report due: Week 12, Friday 10 February 2022, 11:59 pm. Expected word count: 3,000 words minimum.

Students are expected to submit their assessments via Turnitin on Moodle. Minimum time expectation: 30 hrs

Learning Outcomes Assessed

The following course learning outcomes are assessed by completing this assessment task:

LO1. review and differentiate between the methods of data analysis and presentation;

LO2. analyse internal and external sources of data relevant to business environments, including technology and service utilisation data, to identify relationships and trends;

LO3. develop and apply skills in spreadsheets to sort, manage, summarise and display data to support managerial decision-making.

Assessment Details

For this assignment, students are required to write a 2,500-word report on a specific case study, explaining the use and applications of data preprocessing and data visualisation techniques on selected data sets (i.e., one data set for a classification task to answer Question #1 and another for a regression task to answer Question #2). Students can choose any suitable data set that is publicly available on the internet, or one from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php.

For answering Question #3, students are not required to use any dataset.

In Week 12, students will be required to submit their report on Moodle. Students are expected to work individually and conduct their own research without collaborating with any other student. Students are expected to prepare a comprehensive report that applies their knowledge of data preprocessing and visualisation to a given case study.

  1. All reports must include at least 5 academic references, formatted in APA7 reference style.
  2. The case study must assess the value propositions of the chosen data set and discuss what types of business questions can be answered using the data set. It must highlight the suitability of data cleaning approaches for the selected data set. It must highlight the data transformation techniques that are applicable to the data set. Students must also highlight how an interactive dashboard can be designed for the chosen data set to communicate the data effectively.
  3. This unit requires you to use the APA system of referencing. See Sydney International’s quick reference guide, which should be used in conjunction with the online tool Academic Writer: https://extras.apa.org/apastyle/basics-7e/#/.
  4. A passing grade will be awarded to assignments adequately addressing all assessment criteria. Higher grades require better quality and more effort. For example, a minimum is set on the wider reading required. A student reading vastly more than this minimum will be better prepared to discuss the issues in depth and consequently their report is likely to be of a higher quality. So before submitting, please read through the assessment criteria very carefully.

Submission

All assessments must be submitted through Turnitin on Moodle.

Marking Criteria / Rubric

Refer to the attached marking guide.

Feedback

Feedback will be supplied through Moodle. Authoritative results will be published on Moodle.

Academic Misconduct

To submit your assessment task, you must indicate that you have read and understood, and comply with, the Sydney International School of Technology and Commerce Academic Integrity and Student Plagiarism policies and procedures.

You must also agree that your work has not been outsourced and is entirely your own except where work quoted is duly acknowledged. Additionally, you must agree that your work has not been submitted for assessment in any other course or program.

Individual report sample structure

Please follow the sample structure below for Questions #1 and #2. For Question #3, write your answer at the end of the report and show your working with formulae to receive full marks.

  • Coversheet (mandatory)
    • Title page
    • Table of contents
  • Introduction
  • Overview of the data
  • Data Preprocessing
    • Data Cleaning
    • Data Transformation
    • Data Reduction
  • Dashboard Design
  • Conclusions
  • References
  • Appendix

Note: Students are allowed to include other sections as they deem necessary based on their case study.

Sample data set for case study:

Absenteeism at work Data Set
Bank Marketing Data Set
Iranian Churn Dataset Data Set
Productivity Prediction of Garment Employees Data Set
Real estate valuation data set Data Set
Apartment for rent classified Data Set
Chronic_Kidney_Disease Data Set

Individual Report Marking Guide – Marks 40

Weighting: 40

Student IDs:

| Assessment criteria | Very Good | Good | Satisfactory | Unsatisfactory |
|---|---|---|---|---|
| Presentation/Layout (/05 marks) | Information is well organised and well written, and proper grammar and punctuation are used throughout. Correct layout used. | Information is organised and well written, with proper grammar and punctuation. Correct layout used. | Information is somewhat organised; proper grammar and punctuation are mostly used. Correct layout used. | Information is somewhat organised, but proper grammar and punctuation are not always used. Some elements of layout incorrect. |
| Structure (/05 marks) | Structure guidelines enhanced. | Structure guidelines followed exactly. | Structure guidelines mostly followed. | Some elements of structure omitted. |
| Introduction (/05 marks) | Introduces the topic of the report in an extremely engaging manner which arouses the reader’s interest. Gives a detailed general background and indicates the overall “plan” of the paper. | Introduces the topic of the report in an engaging manner which arouses the reader’s interest. Gives some general background and indicates the overall “plan” of the paper. | Satisfactorily introduces the topic of the report. Gives a general background. Indicates the overall “plan” of the paper. | Introduces the topic of the report, but omits a general background of the topic and/or the overall “plan” of the paper. |
| Details (/20 marks) | All topics are discussed in depth, coherently. Significant evidence of critical analysis and reflection. | Consistently detailed discussion. Displays sound understanding with some analysis of topics. | A topic has been adequately discussed. Displays some understanding and analysis of issues. | Inadequate discussion of issues. Little/no demonstrated understanding or analysis of most issues and/or some irrelevant information. |
| Summary & Conclusion (/03 marks) | An interesting, well written summary of the main points. An excellent final comment on the subject, based on the information provided. | A good summary of the main points. A good final comment on the subject, based on the information provided. | Satisfactory summary of the main points. A final comment on the subject, but introduced new material. | Poor/no summary of the main points. A poor final comment on the subject and/or new material introduced. |
| Referencing (/02 marks) | Correct referencing (APA7 style). All quoted material in quotes and acknowledged. All paraphrased material acknowledged. Correctly set out reference list. | Mostly correct referencing (APA7 style). All quoted material in quotes and acknowledged. All paraphrased material acknowledged. Mostly correct setting out of reference list. | Mostly correct referencing (APA7 style). Some problems with quoted material and paraphrased material. Some problems with the reference list. | Not all material correctly acknowledged. Some problems with the reference list. |
| Subtotal | /40 marks | | | |
| Total | out of 40 | | | |