RATIONALE

The evolution of social media texts – such as blogs, micro-blogs (e.g., Twitter), and chats (e.g., Facebook messages) – has created many new opportunities for information access and language technology, but also many new challenges, making it one of the prime present-day research areas. Automatic processing of these texts warrants new strategies, in particular because they are often very ‘noisy’: they have a high percentage of spelling errors and contain creative spellings (gr8 for ‘great’), word play (goooood for ‘good’), abbreviations (OMG for ‘Oh my God!’), meta tags (URLs, hashtags), and so forth.

So far, most of the research on social media texts has concentrated on English, whereas most of these texts are now in non-English languages. In social media, non-English speakers do not always use Unicode to write in their own language: they use phonetic typing, frequently insert English elements (through code-mixing and Anglicisms; see Example 1 below), and often mix multiple languages to express their thoughts, making automatic language processing of social media texts a very challenging task. Thus, even though English is still the principal language for web communication, there is a growing need to develop technologies for other languages.

Here we concentrate on social media text in the languages of India, a nation with more than 20 official languages. ICON is a well-established gathering for the industrial and academic research communities both internationally and in India. We therefore believe that it is the best place to draw research attention to developing language technologies for Indian social media text. The workshop will include an embedded tutorial on code-mixing in social media. The three primary goals of the proposed workshop are:

To focus community awareness on language technologies for Indian social media.
To share corpora and resources to promote future research.
To exchange ideas and experiences amongst researchers.

Example 1: ICON isbar Goa mein ho raha hai! Great chance to visit Goa!

TOPICS OF INTEREST

We welcome original and unpublished submissions on all aspects of language technologies for Indian languages in the social media context. Topics of interest include, but are not limited to:

Part-of-Speech (POS) Tagging
Language Detection
Morphological Analysis
Named Entity Recognition (NER)
Dependency Parsing
Lexical Resources
Annotated Corpora
Transliteration
Sentiment Analysis

REFERENCES

Yogarshi Vyas, Spandana Gella, Jatin Sharma, Kalika Bali, and Monojit Choudhury. POS Tagging of English-Hindi Code-Mixed Social Media Content. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 974–979, October 25–29, 2014, Doha, Qatar.
U. Barman, A. Das, J. Wagner, and J. Foster. Code-Mixing: A Challenge for Language Identification in the Language of Social Media. The 1st Workshop on Computational Approaches to Code Switching, EMNLP 2014, October 2014, Doha, Qatar.


The Eleventh International Conference on Natural Language Processing (ICON-2014), 18-21 December, 2014, Goa, India

You might also be interested in: the 2nd FIRE 2014 Shared Task on Transliterated Search, FIRE 2014, December, Bangalore, India
