Module Catalogue

ID5059   Knowledge Discovery and Datamining

Academic year(s): 2023-2024

Key information

SCOTCAT credits: 15

ECTS credits: 7

Level: SCQF level 11

Semester: 2

Availability restrictions: Not automatically available to General Degree students

Planned timetable: 11.00 am Mon (odd weeks), Wed and Fri

Contemporary data collection can be automated and conducted on a massive scale (e.g. credit card transaction databases). Large databases potentially carry a wealth of important information that could inform business strategy, identify criminal activities, characterise network faults, etc. Problems on this scale may preclude standard, carefully constructed statistical models, necessitating highly automated approaches. This module covers many of the methods found under the banner of data mining, building from a theoretical perspective but ultimately teaching practical application. Topics covered include: historical/philosophical perspectives, model selection algorithms and optimality measures, tree methods, bagging and boosting, neural nets, and classification in general. Practical applications build sought-after programming skills (typically in R, SAS or Python).

Relationship to other modules

Anti-requisite(s): You cannot take this module if you take CS5014

Learning and teaching methods and delivery

Weekly contact: Lectures, seminars, tutorials and practical classes.

Scheduled learning hours: 35

Guided independent study hours: 115

Assessment pattern

As used by St Andrews: 2-hour Written Examination = 60%, Coursework = 40%

As defined by QAA
Written examinations: 60%
Practical examinations: 0%
Coursework: 40%

Re-assessment: Oral examination = 60%, Existing Coursework = 40%

Personnel

Module coordinator: Dr C M Fell
Module teaching staff: Dr C Fell

Intended learning outcomes

  • Understand the mathematics underpinning common machine-learning/data-mining methods, including parameter estimation
  • Determine what models are applicable for different data and objectives
  • Understand complex regressions from the perspective of basis functions, tree methods, boosting/bagging/ensemble model variants, neural networks, deep learning, and other selected methods
  • Conduct hyperparameter-tuning/model-selection as appropriate to the model
  • Manipulate data, fit models, and summarise/display their results/performance and objectively compare models in R, Python or other suitable language
  • Conduct comprehensive analysis of large real-world data, within a group, covering: data preparation; model fitting, critique & refinement; and presentation of results to a range of audiences