Data Analysis I (MADI)
Fall 2022
Course description:
The course introduces the basic methods used in data mining and network analysis. Students will gain the knowledge and skills needed for further work in this area and the ability to apply them to simple problems. They will be able to assess which methods suit different types of data and to evaluate the results of applying them.
Grading (Attendance at lectures and seminars is compulsory, as well as preparation for the seminars):
Lectures and Seminars (labs):
References and sources:
- R language, RStudio, Intro1, Intro2
- Data for R
- Weka
- UC Irvine Machine Learning Repository - data, and further data for Weka
- Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition. The Morgan Kaufmann Series in Data Management Systems.
- Zaki, M. J., Meira Jr, W. (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. (ZAKI dmafca.pdf)
- Bramer, M. (2013). Principles of data mining. Springer.
- Albert-Laszlo Barabasi. Network Science
- Mark Newman. Networks: An Introduction. Oxford University Press, 2010. ISBN 978-0199206650.
- Tools for network analysis and visualization
- Pajek - Program for Large Network Analysis, Pajek
- NodeXL - Template for Excel, NodeXL
- SNAP - Stanford Network Analysis Project, SNAP
- Gephi, Graphviz etc.
- Visual Complexity
- D3.js - JavaScript library for manipulating documents based on data, D3.js
Course Outline:
- Data for data mining, types and sources of data
- Attributes and their types, sparse data, incomplete and inaccurate data
- Algebraic and geometric interpretation of data
- Probabilistic interpretation of data
- Numerical and categorical attributes, the basic analytical approaches
- Data mining, pre-processing and data cleaning
- Data representation
- Foundations of data analysis (classification, clustering)
- Networks and their properties
- Types of networks and their representation
- Basic measures and metrics
- Structure and global properties of networks
- Basic network models