
Data Analysis I (MADI)

Fall 2022

Course description:

The course introduces basic methods used in data mining and network analysis. Students will gain the knowledge and skills needed for further study in this area and the ability to apply them to simple problems. They will be able to assess which methods suit different types of data and to evaluate the results of applying them.


Grading (Attendance at lectures and seminars is compulsory, as well as preparation for the seminars):


Lectures and Seminars (labs):


References and sources:

  • Tools for network analysis and visualization
  • Pajek - Program for Large Network Analysis
  • NodeXL - Template for Excel
  • SNAP - Stanford Network Analysis Project
  • Gephi, Graphviz, etc.
  • Visual Complexity
  • D3.js - JavaScript library for manipulating documents based on data
 

Course Outline:

  1. Data for data mining, types and sources of data
  2. Attributes and their types, sparse data, incomplete and inaccurate data
  3. Algebraic and geometric interpretation of data
  4. Probabilistic interpretation of data
  5. Numerical and categorical attributes, the basic analytical approaches
  6. Data mining, pre-processing and data cleaning
  7. Data representation
  8. Foundations of data analysis (classification, clustering)
  9. Networks and their properties
  10. Types of networks and their representation
  11. Basic measures and metrics
  12. Structure and global properties of networks
  13. Basic network models