IR-NLP Lab - CSUI

Research

Information Retrieval

Information Retrieval explores methods and techniques for organizing, representing, storing, and searching information in textual and multimedia forms (speech, images, and music).

Our activities in this area are as follows:

Information Retrieval

Our latest publications in this task:

Theresia V. Rampisela and Evi Yulianti. Semantic-Based Query Expansion for Academic Expert Finding. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP), 4-6 December 2020. paper
Theresia V. Rampisela and Evi Yulianti. Academic Expert Finding in Indonesia using Word Embedding and Document Embedding: A Case Study of Fasilkom UI. In Proceedings of the 8th International Conference on Information and Communication Technology (ICoICT), 24-26 June 2020. paper
Rinaldi Andrian Rahmanda, Mirna Adriani, Dipta Tanaya. Cross Language Information Retrieval Using Parallel Corpus with Bilingual Mapping Method. In Proceedings of the 2019 International Conference on Asian Language Processing (IALP), Shanghai, China, 15-17 Nov. 2019. paper

Recommender System

Our latest publications in this task:

Bayu Yudha Pratama, Indra Budi, Arlisa Yuliawati. Product Recommendation in Offline Retail Industry by using Collaborative Filtering. International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11 No 9, 2020. paper
Ridho Trivonanda, Rahmad Mahendra, Indra Budi, and Rani Aulia Hidayat. Sequential Pattern Mining for e-Commerce Recommender System. In Proceedings of the 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 17-18 Oct. 2020. paper
Aghny Arisya Putra, Rahmad Mahendra, Indra Budi, Qorib Munajat. Two-steps graph-based collaborative filtering using user and item similarities: Case study of E-commerce recommender systems. In Proceedings of the 2017 International Conference on Data and Software Engineering (ICoDSE), 2017. paper

Natural Language Processing

Natural Language Processing (NLP) is a field that models natural language using formal representations, such as rule systems or grammar formalisms. These representations cover several levels of language: phonetics, morphology, syntax, semantics, and discourse. The resulting models are implemented as software that processes language artifacts, including utterances, sentences, and text documents.

Our activities in this area are as follows:

Morphological Analysis, POS Tagging and Syntactic Parsing

Our latest publications in these tasks:

Ika Alfina, Indra Budi, and Heru Suhartanto. Tree Rotations for Dependency Trees: Converting the Head-Directionality of Noun Phrases. Journal of Computer Science, Volume 16 No 11, 2020. paper | dataset
Ika Alfina, Daniel Zeman, Arawinda Dinakaramani, Indra Budi and Heru Suhartanto. Selecting the UD v2 Morphological Features for Indonesian Dependency Treebank. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP), 4-6 December 2020. paper | dataset
Jessica Naraiswari Arwidarasti, Ika Alfina and Adila Alfa Krisnadhi. Adjusting Indonesian Multiword Expression Annotation to the Penn Treebank Format. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP), 4-6 December 2020. paper | dataset
Muhammad Yudistira Hanifmuti and Ika Alfina. Aksara: An Indonesian Morphological Analyzer that Conforms to the UD v2 Annotation Guidelines. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP), 4-6 December 2020. paper | tool

Lexical Normalization
Lexical normalization is the task of transforming non-standard text (for example, informal spellings and abbreviations) into a standard register.
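As a rough illustration of the task, the simplest baseline is a lookup table that maps known non-standard tokens to their standard forms. The sketch below assumes a small, hypothetical lexicon of informal Indonesian spellings; it is not the statistical machine translation or style-transfer approach used in the publications listed below, which handle context and unseen words that a lookup table cannot.

```python
import re

# Hypothetical normalization lexicon: informal Indonesian spelling -> standard form.
NORMALIZATION_LEXICON = {
    "yg": "yang",
    "gak": "tidak",
    "tdk": "tidak",
    "udah": "sudah",
    "sy": "saya",
}


def normalize(text: str, lexicon: dict = NORMALIZATION_LEXICON) -> str:
    """Replace each known non-standard token with its standard form.

    Tokens are matched case-insensitively; unknown tokens are kept unchanged.
    """
    # Split into alternating word and non-word runs so punctuation and spacing survive.
    parts = re.findall(r"\w+|\W+", text)
    return "".join(lexicon.get(part.lower(), part) for part in parts)


print(normalize("Sy udah makan, yg lain gak."))
# saya sudah makan, yang lain tidak.
```

Because replacements are looked up in lowercase, a normalized token loses its original capitalization ("Sy" becomes "saya"); a fuller system would restore casing separately.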

Our latest publications in this task:

Ajmal Kurnia and Evi Yulianti. Statistical Machine Translation Approach for Lexical Normalization on Indonesian Text. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP), 4-6 December 2020. paper
Haryo Akbarianto Wibowo, Tatag Aziz Prawiro, Muhammad Ihsan, Alham Fikri Aji, Radityo Eko Prasojo and Rahmad Mahendra. Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP), 4-6 December 2020. paper
Anab Maulana Barik, Rahmad Mahendra, Mirna Adriani. Normalization of Indonesian-English Code-Mixed Twitter Data. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019). paper

Named Entity Recognition
Named entity recognition (NER) is the task of tagging entities in text with their corresponding type.
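NER output is commonly encoded with token-level BIO tags (B- begins an entity, I- continues it, O marks tokens outside any entity). The sketch below, which assumes the BIO scheme and uses illustrative PER/ORG/LOC types, shows how such tags are grouped back into typed entity spans; it does not reproduce the Bi-LSTM, CRF, or LSTM taggers in the publications below, only the decoding of their kind of output.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and record it.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # B- always begins a new entity of the given type.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_type:
            # I- continues the open entity when the types agree.
            current_tokens.append(token)
        else:
            # "O" closes any open entity; an I- tag with no matching open
            # entity is leniently treated as the start of a new one.
            flush()
            if tag.startswith("I-"):
                current_tokens, current_type = [token], tag[2:]
    flush()
    return spans


tokens = ["Budi", "Santoso", "bekerja", "di", "Universitas", "Indonesia", ",", "Depok"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('Budi Santoso', 'PER'), ('Universitas Indonesia', 'ORG'), ('Depok', 'LOC')]
```

Treating an orphan I- tag as a new entity is one design choice among several; stricter evaluation scripts may instead count it as a tagging error.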

Our latest publications in this task:

Hadi Syah Putra, Faisal Satrio Priatmadji, Rahmad Mahendra. Semi-supervised Named-Entity Recognition for Product Attribute Extraction in Book Domain. In Proceedings of the International Conference on Asian Digital Libraries, November 2020. paper
Eka Qadri Nuranti and Evi Yulianti. Legal Entity Recognition in Indonesian Court Decision Documents Using Bi-LSTM and CRF Approaches. In Proceedings of the 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 17-18 Oct. 2020. paper
Ika Alfina, Septiviana Savitri, Mohamad Ivan Fanany. Modified DBpedia entities expansion for tagging automatically NER dataset. In Proceedings of the International Conference on Advanced Computer Science and Information Systems (ICACSIS 2017), Jakarta, Indonesia, 28-29 October 2017. paper | dataset
Valdi Rachman, Septiviana Savitri, Fithriannisa Augustianti, Rahmad Mahendra. Named Entity Recognition on Indonesian Twitter Posts Using Long Short Term Memory Networks. In Proceedings of the International Conference on Advanced Computer Science and Information Systems (ICACSIS 2017), Jakarta, Indonesia, 28-29 October 2017. paper

Sentiment Analysis
Sentiment analysis is the task of classifying the polarity of a given text.

Our latest publications in this task:

Andi Suciati and Indra Budi. Aspect-based sentiment analysis and emotion detection for code-mixed review. International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11 No 9, 2020. paper
Andi Suciati and Indra Budi. UI at SemEval-2020 Task 8: Text-Image Fusion for Sentiment Classification. In Proceedings of the 14th International Workshop on Semantic Evaluation, December 12, 2020. paper
Majesty Eksa Permana, Handoko Ramadhan, Indra Budi, Aris Budi Santoso, Prabu Kresna Putra. Sentiment Analysis and Topic Detection of Mobile Banking Application Review. In Proceedings of the Fifth International Conference on Informatics and Computing (ICIC), Gorontalo, 3-4 Nov. 2020. paper

Hate Speech and Abusive Language Detection

Our latest publications in this task:

Dimas Sony Dewantara, Indra Budi, and Muhammad Okky Ibrohim. 3218IR at SemEval-2020 Task 11: Conv1D and Word Embedding in Propaganda Span Identification at News Articles. In Proceedings of the 14th International Workshop on Semantic Evaluation, December 12, 2020. paper
Sandy Kurniawan, Indra Budi, and Muhammad Okky Ibrohim. IR3218-UI at SemEval-2020 Task 12: Emoji Effects on Offensive Language Identification. In Proceedings of the 14th International Workshop on Semantic Evaluation, December 12, 2020. paper
Dimas Sony Dewantara and Indra Budi. Combination of LSTM and CNN for Article-Level Propaganda Detection in News Articles. In Proceedings of the Fifth International Conference on Informatics and Computing (ICIC), Gorontalo, 3-4 Nov. 2020. paper
Sandy Kurniawan, Indra Budi. Indonesian Tweets Hate Speech Target Classification using Machine Learning. In Proceedings of the Fifth International Conference on Informatics and Computing (ICIC), Gorontalo, 3-4 Nov. 2020. paper

Language Modeling
Language modeling is the task of predicting the next word or character in a document.
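To make "predicting the next word" concrete, the sketch below builds a count-based bigram model: it estimates the probability of a word given only the preceding word, using maximum-likelihood counts from a tiny, hypothetical Indonesian corpus. This is a classical textbook baseline, not the pretrained neural models benchmarked in IndoNLU; it has no smoothing, so unseen word pairs get probability zero.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram_model(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = [BOS] + sentence + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(model: Dict[str, Counter], prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    followers = model.get(prev)
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


def predict_next(model: Dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most frequent word observed after prev, or None if prev is unseen."""
    followers = model.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [["saya", "makan", "nasi"], ["saya", "makan", "roti"], ["saya", "minum", "teh"]]
model = train_bigram_model(corpus)
print(predict_next(model, "saya"))                     # makan
print(next_word_probability(model, "saya", "makan"))  # 0.6666666666666666
```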

Our latest publications in this task:

Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti. IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020. paper

Natural Language Inference
Natural language inference is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.

Our latest publications in this task:

Kerenza Doxolodeo and Rahmad Mahendra. CSUI at SemEval-2020 Task 4: Commonsense Validation and Explanation by Exploiting Contradiction. In Proceedings of the 14th International Workshop on Semantic Evaluation, December 12, 2020. paper

Question Answering
Question answering is the task of automatically answering questions posed in natural language.

Our latest publications in this task:

Rahmad Mahendra, Abid Nurul Hakim, Mirna Adriani. Towards Question Identification from Online Healthcare Consultation Forum Post in Bahasa. In Proceedings of the International Conference on Asian Language Processing (IALP 2017). paper
Christian Halim, Alfan Farizki Wicaksono, Mirna Adriani. Extracting Disease-Symptom Relationships from Health Question and Answer Forum. In Proceedings of the International Conference on Asian Language Processing (IALP 2017). paper
Abid Nurul Hakim, Rahmad Mahendra, Mirna Adriani. Corpus Development for Indonesian Consumer-Health Question Answering System. In Proceedings of the International Conference on Advanced Computer Science and Information Systems (ICACSIS 2017). paper

Summarization
Summarization is the task of producing a shorter version of one or several documents that preserves most of the input’s meaning.

Our latest publications in this task:

Meganingrum Arista Jiwanggi, Mirna Adriani. Topic Summarization of Microblog Document in Bahasa Indonesia using the Phrase Reinforcement Algorithm. In Procedia Computer Science, Vol 80, 2016. paper
Arlisa Yuliawati, Ruli Manurung. Improving coherence by reordering the output of extractive summarization using Centering Theory through genetic algorithm. In Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Sept. 2013. paper

Machine Translation
Machine translation is the task of translating a sentence in a source language to a different target language.

Our latest publications in this task:

Evi Yulianti, Indra Budi, Achmad N. Hidayanto, Hisar M. Manurung, and Mirna Adriani. Developing Indonesian-English Hybrid Machine Translation System. In Proceedings of the 2011 International Conference on Advanced Computer Science and Information Systems (ICACSIS 2011). Jakarta, 17-18 December 2011. paper
Metti Zakaria Wanagiri. Indonesian-English Machine Translation. In Proceedings of the Fourth International MALINDO Workshop 2010. Jakarta, August 2, 2010.

Speech Processing

Speech processing is the study of speech signals and the methods used to process them.

Our activities in this area are as follows:

Automatic Speech Recognition (ASR)
Automatic speech recognition is the task of automatically converting spoken language into text.

Our latest publications in this task:

Fahmi Fahmi, Meganingrum Arista Jiwanggi, Mirna Adriani. Speech-Emotion Detection in an Indonesian Movie. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), Marseille, 11-16 May 2020. paper
Mei Silviana Saputri and Mirna Adriani. Identifying Indonesian Local Languages on Spontaneous Speech Data. In Proceedings of the 2019 International Conference on Advanced Computer Science and Information Systems (ICACSIS). paper
Nur Endah Safitri, Amalia Zahra, and Mirna Adriani. Spoken Language Identification Using Phonotactic Methods on Minangkabau, Sundanese, and Javanese. In Proceedings of the 5th Workshop on Spoken Language Technologies for Under-resourced Languages, 2016.
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura. The use of semantic and acoustic features for open-domain TED talk summarization. In Proceedings of APSIPA 2014, pp. 1-4. paper

Text-to-Speech (TTS)

Our latest publications in this task:

Kurniawati Azizah, Mirna Adriani, and Wisnu Jatmiko. Hierarchical Transfer Learning for Multilingual, Multi-Speaker, and Style Transfer DNN-Based TTS on Low-Resource Languages. IEEE Access, Volume 8, 2020. paper
Kurniawati Azizah and Mirna Adriani. Hierarchical Transfer Learning for Text-to-Speech in Indonesian, Javanese, and Sundanese Languages. In Proceedings of the 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 17-18 Oct. 2020. paper