Tutorial Programme

The AIME'09 tutorial will be held on Sunday 19 July in the Department of Computer Science of the University of Verona. The Department is located in the Borgo Roma quarter, in the southern part of the city. See the venue page for further information on reaching it.

This full-day tutorial (9:00 to 17:00) will illustrate, via demonstration and hands-on experience, the application of data mining methodologies to a clinical database. A knowledge discovery life cycle model will be employed as the conceptual framework for the tutorial. Attendees will obtain practical experience in mining a database for use in clinical research, and ultimately for assisting with statistical analysis.

We will explore several well-known datasets in the tutorial, and we will learn and use the Weka data mining suite. Weka is freely available as open-source software and runs on even modestly equipped computers within a Java runtime environment (JRE). Weka and the JRE will be distributed to attendees on CD-ROM free of charge. Attendees are encouraged to bring laptops to the tutorial. Those who do not bring laptops will still benefit from the detailed demonstrations.

We will focus on several families of data mining methodologies, including trees, clustering, Bayesian classification, evolutionary computation, visualization, and statistical classifiers. After a discussion of the general characteristics of biomedical data, such as missing values and feature selection problems, and of methods for preparing biomedical data for mining, we will examine examples of the selected families of tools for mining biomedical data, including thorough algorithmic descriptions, functional examples, and live demonstrations of each on several real-world biomedical datasets. The applications will focus specifically on rule discovery, emergence of clinical prediction rules, classification, and clustering, as appropriate to each method. The advantages and disadvantages of each method will be discussed in detail.

The tutorial will also include a rigorous discussion of methods for evaluating the results obtained from mining biomedical data. Topics include classification and prediction accuracy; test characteristics such as sensitivity, specificity, area under the receiver operating characteristic curve, and predictive values; the choice and use of suitable validation datasets; methods for comparing models; and the use of human expert panels in providing content for qualitative model validation.
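To illustrate the test characteristics named above, here is a minimal Python sketch. It is not part of the tutorial materials, which use Weka. The names ConfusionMatrix, diagnostic_metrics, and roc_auc are hypothetical, and the counts and scores in the usage lines are invented for illustration only. The area under the ROC curve is computed with the pairwise (Mann-Whitney) formulation, which is one common way to obtain it.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    # Return NaN rather than raising when a denominator is zero.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and predictive values."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),  # true positive rate
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),  # true negative rate
        "ppv": _ratio(cm.tp, cm.tp + cm.fp),          # positive predictive value
        "npv": _ratio(cm.tn, cm.tn + cm.fn),          # negative predictive value
    }


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented counts and scores, for illustration only.
    metrics = diagnostic_metrics(ConfusionMatrix(tp=45, fp=10, tn=40, fn=5))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]):.3f}")
```

Sensitivity and specificity describe the classifier itself, whereas the predictive values also depend on how common the condition is in the evaluated sample, which is one reason the choice of validation dataset matters.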