Laboratory IV: Text processing 23-KODU-LABPT4
Topics:
Elements of the text processing pipeline
Basic tools for text processing (e.g. Python - NLTK, Java - OpenNLP)
Corpus creation with simple binary categories (e.g. positive and negative opinions, spam and not-spam e-mails)
Classifier training for binary categories (elements of machine learning)
Classifier evaluation (F-1 score)
Corpora for discourse analysis – data collection and annotation schemes (e.g. Rhetorical Structure Theory, Argument Interchange Format)
Annotation tools for discourse analysis
Collecting and annotating corpora for discourse analysis
Classifier training for complex discourse properties
Classifier evaluation methods for multi-class discourse corpora
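As an illustration of the evaluation topics listed above, the following is a minimal sketch of how the F-1 score could be computed for a binary classifier, with a macro-average as one possible extension to multi-class corpora. The label names ("spam", "ham"), the sample data, and the function names are assumptions made for this example only; they are not part of the course materials.

```python
def f1_score(gold, predicted, positive):
    """Return (precision, recall, F1) for one label treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Average the per-label F1 over every label that occurs in the gold data."""
    labels = sorted(set(gold))
    return sum(f1_score(gold, predicted, label)[2] for label in labels) / len(labels)


# Hypothetical gold annotations and classifier output for six e-mails.
gold = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

precision, recall, f1 = f1_score(gold, predicted, positive="spam")
macro = macro_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f} macro-F1={macro:.2f}")
```

Macro-averaging weights every category equally regardless of its frequency, which is one common choice for imbalanced corpora; other averaging schemes exist.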
Learning objectives
Field of study
Teaching methods enabling the intended learning outcomes (EK) to be achieved
Student workload (ECTS credits)
Course level
Prerequisites in terms of knowledge, skills and competences
Course coordinators
Learning outcomes
After passing the module and verification of the learning outcomes (EU), a student:
Is familiar with the processing stages of NLP
Can create a text corpus using a correct methodology for sampling and annotation
Can perform manual corpus annotation and calculate inter-annotator agreement
Can write a simple computer program for analysing annotated corpora
Uses available literature and other resources to further develop skills and knowledge
Is familiar with the fundamental concepts of computational linguistics and can use them in a written text
Is able to organize information and draw conclusions
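One of the outcomes above involves calculating inter-annotator agreement. The syllabus does not say which coefficient is used; as a hedged sketch, the example below computes Cohen's kappa, a common choice for two annotators, on hypothetical "pos"/"neg" labels. The function name and the data are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of eight opinions by two annotators.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5.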
Assessment criteria
Project 1 - Text classifier: 20 points
1-2 page report
Accompanying code
Accompanying corpus
Project 2 - Discourse processing: 20 points
3-5 page report
Accompanying code
Accompanying corpus
In-class activity: 20 points
Programming tasks
In-class discussion
Case studies
Max: 60 points
Scale:
very good (bdb; 5.0): 55-60 points
good plus (+db; 4.5): 50-54 points
good (db; 4.0): 45-49 points
satisfactory plus (+dst; 3.5): 40-44 points
satisfactory (dst; 3.0): 35-39 points
unsatisfactory (ndst; 2.0): 0-34 points
Participation in at least 80% of the classes is required.
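The point scale above can be read as a simple lookup. The sketch below is an illustration only: the function name and data layout are assumptions, points are assumed to be whole numbers, and the attendance requirement is not modelled.

```python
# Lower bound of each band from the scale above, highest band first.
GRADE_BANDS = [
    (55, "bdb (5.0)"),
    (50, "+db (4.5)"),
    (45, "db (4.0)"),
    (40, "+dst (3.5)"),
    (35, "dst (3.0)"),
    (0, "ndst (2.0)"),
]


def grade_for(points):
    """Map a total score (0-60 points) to the grade band it falls into."""
    if not 0 <= points <= 60:
        raise ValueError("points must be between 0 and 60")
    for minimum, grade in GRADE_BANDS:
        if points >= minimum:
            return grade


# Example: 18 + 15 + 14 points from the two projects and in-class activity.
total = 18 + 15 + 14
print(total, grade_for(total))
```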
Literature
Ingersoll, Grant S., et al. (2013). Taming Text: How to Find, Organize, and Manipulate It. Manning Publications Co.
Stede, Manfred (2012). Discourse Processing. Morgan & Claypool Publishers.
Janier, Mathilde, and Patrick Saint-Dizier (2019). Argument Mining: Linguistic Foundations. John Wiley & Sons.
More information
Additional information (e.g. about the registration calendar, class instructors, and class locations and times) may be available in the USOSweb service: