Research

Information Retrieval

Information Retrieval explores methods and techniques for organizing, representing, storing, and searching information in textual and multimedia forms (speech, image, and music). It has many subfields, such as Cross-Language Information Retrieval (CLIR), Geographical Information Retrieval, Music Retrieval, and Image Retrieval.

Cross-Language Information Retrieval
We have conducted several studies on methods for Indonesian-English CLIR, in which Indonesian queries are posed to retrieve relevant English documents.

  • Sari, Syandra; Hayurani, Herika; Adriani, Mirna. 2007. Using Query Expansion for Indonesian-English Cross Language Information Retrieval. National Conference on Computer Science and Information Technology. Jakarta: 29-30 March 2007.
  • Sari, Syandra; Hayurani, Herika; Adriani, Mirna. 2007. Memperbaiki Teknik Penerjemahan Query Dan Penerjemahan Dokumen Untuk Perolehan Informasi Lintas Bahasa Indonesia-Inggris. National Conference on Computer Science and Information Technology. Jakarta: 29-30 March 2007.
  • Hayurani, Herika; Sari, Syandra; Adriani, Mirna. 2006. Evaluating Language Resources for English-Indonesian CLIR. CLEF 2006 Workshop.
  • Adriani, Mirna; Wahyu, Ihsan. 2005. University of Indonesia's Participation in Ad Hoc in CLIR-CLEF 2005. CLEF 2005 Workshop.

Geographical Information Retrieval
Geographical Information Retrieval focuses on improving retrieval quality by using the geographical information embedded in a query.

  • Adriani, Mirna; Nasikhin. In Proceedings of the Workshop on Geographic Information Retrieval, CLEF 2007.
  • Adriani, Mirna; Paramita, Monica Lestari. Identifying Location in Indonesian Documents for Geographic Information Retrieval. In Proceedings of the Workshop on Geographic Information Retrieval, CIKM 2007, Lisbon, Portugal.
  • Paramita, Monica Lestari; Adriani, Mirna. 2007. Geographic Information Retrieval System for Indonesian Documents. National Conference on Computer Science and Information Technology. Jakarta: 29-30 March 2007.

Image Retrieval
Image Retrieval focuses on developing systems that enable users to search for and retrieve images from a large image database.

  • Adriani, Mirna; Framadhan. The University of Indonesia's Participation in IMAGE-CLEF 2005. CLEF 2005 Workshop.

Web Information Retrieval
Web Information Retrieval focuses on handling web documents in large-scale collections.

  • Wijaya, Syntia; Widhi, Bimo; Khoerniawan, Tommy; Adriani, Mirna. 2007. Analisa Struktur Dokumen Pada Perolehan Informasi Dengan Dokumen Web. National Conference on Computer Science and Information Technology. Jakarta: 29-30 March 2007.
  • Wijaya, Syntia; Widhi, Bimo; Khoerniawan, Tommy; Adriani, Mirna. 2006. Using Document Structure on Retrieving Webpages at the Web-CLEF 2006. CLEF 2006 Workshop.

Natural Language Processing

Natural Language Processing is a field that models natural language using formal rule-based representations, or grammar formalisms. These representations can be categorized by linguistic level: phonetics, morphology, syntax, semantics, and discourse. The models are implemented as software that can process language artifacts, including utterances, sentences, and text documents.

Language modelling of this kind serves many purposes. In linguistics, it helps us better understand language processes and artifacts. In computer science, it supports various applications, such as Information Retrieval.

Our activities in this area are as follows:

  • Building information resources such as text corpora, especially in Bahasa Indonesia.
  • Designing and implementing formal rules for Bahasa Indonesia, especially at the morphological and syntactic levels.
  • Using these resources and rules to improve IR system performance.

Several Publications in this area:

  • Meidy, Angga Kho and Ruli Manurung. An Initial Indonesian Semantic Analyser that Leverages SUMO Inferential Power. FOURTH INTERNATIONAL MALINDO WORKSHOP 2010. Jakarta, August 2, 2010.
  • Arawinda Dinakaramani, Fam Rashel, Andry Luthfi, Ruli Manurung. Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus. IALP 2014: 66-69
  • Fam Rashel, Andry Luthfi, Arawinda Dinakaramani, and Ruli Manurung. Building an Indonesian Rule-Based Part-of-Speech Tagger. International Conference on Asian Language Processing (IALP 2014). Kuching, 20-22 October 2014.
  • Nazief, Bobby. Development of computational linguistics research: a challenge for Indonesia. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics. 2000.

Question Answering System

Question Answering is an area within Information Retrieval and Natural Language Processing that focuses on building systems that automatically answer questions posed by humans in natural language.

Several Publications in this area:

  • Toba, Hapnes, Zhaoyan Ming, Mirna Adriani, Tat-Seng Chua. Discovering high quality answers in community question answering archives using a hierarchy of classifiers. Information Sciences (2014) 261: 101-115.
  • Sjafrizal, Yolanda, Indra Budi, Mirna Adriani, Philip Arthur. Question Answering System Using Query Expansion and Heuristic Features. CLEF (Working Notes) 2013
  • Toba, Hapnes, Mirna Adriani, and Ruli Manurung. Predicting Answer Location Using Shallow Semantic Analogical Reasoning in a Factoid Question Answering System. 26th Pacific Asia Conference on Language Information and Computation (PACLIC 26). Bali, 8-10 November 2012.

Text Mining

Text Mining seeks approaches for structuring textual data, deriving patterns from the structured text, and finally interpreting the results to extract useful information.

Several Publications in this area:

  • Anggi Maulidyani, Ruli Manurung. Automatic Identification of Age-Appropriate Ratings of Song Lyrics. Beijing, China, ACL (2) 2015: 583-587.
  • Alfan Farizki Wicaksono, Clara Vania, Bayu Distiawan, Mirna Adriani. Automatically Building A Corpus for Sentiment Analysis on Indonesian Tweets. PACLIC 2014.
  • Natasha and Ruli Manurung. Building an Indonesian News Aggregator using Naive Bayes Classification and Non-Negative Matrix Factorization Clustering Algorithms. FOURTH INTERNATIONAL MALINDO WORKSHOP 2010. Jakarta, August 2, 2010.
  • Wahyudi, Gatot and Mirna Adriani. Indonesian Named Entity Recognition Using Support Vector Machine. FOURTH INTERNATIONAL MALINDO WORKSHOP 2010. Jakarta, August 2, 2010.
  • Adriani, Mirna; Sitawati, Haryani Diah. 2006. Summarization System for Indonesian Documents. Information System & Technology National Conference (SNASTI).

Machine Translation

Machine Translation is a sub-field of computational linguistics that seeks computational models for automatically translating text or speech from one language into another. The Information Retrieval Lab has published several works in this area, especially on Indonesian-English translation.

Several Publications in this area:

  • Yulianti, Evi, Indra Budi, Achmad N. Hidayanto, Hisar M. Manurung, and Mirna Adriani. Developing Indonesian-English Hybrid Machine Translation System. International Conference on Advanced Computer Science and Information System (ICACSIS 2011). Jakarta, 17-18 December 2011.
  • Wanagiri, Metti Zakaria. Indonesian-English Machine Translation. FOURTH INTERNATIONAL MALINDO WORKSHOP 2010. Jakarta, August 2, 2010.
  • Marsye, Aurora and Mirna Adriani. Evaluating Various Corpora for Building Indonesian-English Statistical Machine Translation System. THIRD INTERNATIONAL MALINDO WORKSHOP, Co-located Event ACL-IJCNLP 2009. Singapore, August 1, 2009.

Speech Processing

In our lab, we conduct research on Automatic Speech Recognition (ASR), which recognizes spoken language and transcribes it into text. This area draws on disciplines from computational linguistics and electrical engineering.

Several Publications in this area:

  • Nur Endah Safitri, Amalia Zahra, and Mirna Adriani. Spoken Language Identification Using Phonotactic Methods on Minangkabau, Sundanese, and Javanese. 5th Workshop on Spoken Language Technologies for Under-resourced Languages, 2016.
  • Andros Tjandra, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura. Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR. ICASSP, Brisbane, Australia, 2015: 4525-4529.
  • Wanagiri, Metti Z. and Mirna Adriani. Developing and Analyzing ASR System for Accented Indonesian Speech. The 15th Oriental COCOSDA Conference. Macau, 9-12 December 2012.

Past Research Activities

  • 2007-2008: Mind Your Language: Corpora and Algorithms for the Fundamental Natural Language Processing Tasks in Information Retrieval and Extraction for the Indonesian and Malay languages. A joint research project with Dr. Stephane Bressan, National University of Singapore (NUS), Singapore, and Dr. RANAIVO-MALANÇON Balisoamanandray, Universiti Sains Malaysia.
  • 2007-2008: Development of Indonesian WordNet
  • 2007: Indonesian Speech Recognition for Telephone Applications
  • 2006-2007: Cross-Language Information Retrieval based on Parallel Corpora
  • 2006: Information Retrieval System on an Adaptive Peer-to-Peer System