
Applied Recognition Technology Laboratory

Department of Theoretical and Applied Science



This is a support page for the paper Semantic Text Encoding for Text Classification using Convolutional Neural Networks, in which we encode the semantics of a text document in an image so that the same Convolutional Neural Networks (CNNs) that have been successfully employed for image classification can be used to classify text. We use Word2Vec, a technique for estimating word representations in a vector space that preserves the semantic and syntactic relationships among words. The Word2Vec vectors are transformed into graphical words that represent the sequence of words in the text document. The encoded images are classified using the AlexNet architecture. We introduce a new dataset, Text-Ferramenta, gathered from an Italian price comparison website, and we evaluate the encoding scheme on this dataset along with two publicly available datasets, 20news-bydate and StackOverflow. Our scheme outperforms the text classification approach based on Doc2Vec and Support Vector Machine (SVM) when all the words of a text document can be completely encoded in an image. We believe that the results on these datasets are an interesting starting point for many Natural Language Processing works based on CNNs, such as a multimodal approach that could use a single CNN to classify both image and text information.

Please cite the paper Semantic Text Encoding for Text Classification using Convolutional Neural Networks if you use the ste2img.py source code.
Authors: Ignazio Gallo, Shah Nawaz and Alessandro Calefati
Source code: ste2img.py
Word2Vec features: 12 features, 24 features
Last updated: September 27 2017 16:22:18.
ste2img Model


Parameters of the ste2img algorithm:
  • word2vec encoding size
  • superpixel size
  • separator size
  • visualword width
  • image size
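
To make the role of these parameters concrete, here is a minimal sketch of how word vectors could be laid out as an image. It is not the ste2img.py source code. The function name encode_words, the default values, and the rule that three consecutive features form one RGB superpixel are assumptions made for illustration, and the image is a plain nested list instead of a real image object.

```python
def encode_words(vectors, superpixel=4, separator=2, word_width=2, image_size=32):
    """Draw each word vector as a block of coloured superpixels.

    Hypothetical sketch: returns (image, encoded), where image is an
    image_size x image_size grid of (r, g, b) tuples and encoded is the
    number of words that fitted completely in the image.
    """
    image = [[(0, 0, 0)] * image_size for _ in range(image_size)]
    n_sp = len(vectors[0]) // 3            # assumption: 3 features per superpixel
    rows_sp = -(-n_sp // word_width)       # superpixel rows per visual word (ceil)
    word_w = word_width * superpixel       # visual word width in pixels
    word_h = rows_sp * superpixel          # visual word height in pixels

    # Rescale all features to the 0..255 colour range.
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0

    x = y = separator
    encoded = 0
    for vec in vectors:
        if x + word_w > image_size:        # row is full: wrap to the next row
            x = separator
            y += word_h + separator
        if y + word_h > image_size:        # image is full: remaining words are dropped
            break
        for k in range(n_sp):
            color = tuple(int(round((f - lo) * scale)) for f in vec[3 * k:3 * k + 3])
            sx = x + (k % word_width) * superpixel
            sy = y + (k // word_width) * superpixel
            for dy in range(superpixel):
                for dx in range(superpixel):
                    image[sy + dy][sx + dx] = color
        x += word_w + separator
        encoded += 1
    return image, encoded
```

With 12 features per word, each visual word in this sketch holds four superpixels. The returned count shows whether a document was encoded completely, which is the condition under which the scheme is reported to outperform the Doc2Vec and SVM approach.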
Click the thumbnail image of one of the following sample text documents (from the 20news-bydate dataset) to see the encoded image of the txt file:

txt file

txt file

txt file

txt file