Applied Natural Language Processing – Many Classes, Noisy Data And Imbalanced Training Sets

Gerhard Hausmann

Classifying objects in images across many classes is one of the great successes of digital image processing: one of the tasks in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is to distinguish 1,000 classes. Since 2015, machines have achieved better results than humans on this task.

Is this also possible in NLP? The German fee schedule for doctors distinguishes more than 2,800 fee codes for classifying and billing medical services. Barmenia uses a Deep Learning classifier that assigns fee codes to short texts in doctors' bills. The classifier is built for the 1,800 most frequently used fee codes and achieves an accuracy of 97 per cent.

Objective of the talk

  • Use of deep and very deep Convolutional Neural Networks for text classification
  • Adaptation of CNNs to short, error-prone texts from optical character recognition
  • Construction of a classifier for imbalanced training data
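
The abstract does not say how the class imbalance is handled. One common option, shown here only as a minimal sketch and not as the speaker's method, is to weight each class inversely to its frequency so that rare fee codes contribute more to the training loss. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    The weights are scaled so that a perfectly balanced data set
    would give every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels; "A", "B", "C" are placeholders, not real fee codes.
labels = ["A"] * 90 + ["B"] * 9 + ["C"] * 1
weights = inverse_frequency_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

Such a dictionary can typically be passed to a training routine as per-class loss weights; whether that helps for 1,800 classes depends on the data and is not claimed here.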

Required audience experience

Basic knowledge of Convolutional Neural Networks

Track 2
Date: October 1, 2019
Time: 10:55 am - 11:40 am
Gerhard Hausmann, Barmenia Krankenversicherung