Based on a publicly available standard data set, we will give a short overview of traditional methods of text analysis, focusing on supervised classification and unsupervised methods such as topic modelling. Then we will turn to more sophisticated embedding methods and compare word2vec, GloVe, fastText, ELMo and BERT (and possibly newer methods introduced during 2019), along with their specific strengths and weaknesses. We will evaluate them in terms of intended use, computing time requirements and result quality.
For all program examples we use Python and open source software.
Text analytics is currently a very dynamic field in machine learning, with several new methods introduced each year. The talk aims to offer best practices and to show possible use cases for the new methods in comparison with more traditional approaches.
Basic knowledge of Python and machine learning is helpful. More sophisticated concepts will be used, but they are introduced first.
You can view Christian’s slides below: