Mohammad Sadegh Rasooli
Principal applied scientist, Speech and Language Group at Microsoft, Mountain View, CA
Former postdoctoral researcher, University of Pennsylvania
Former research scientist at Facebook AI
PhD in Computer Science, Columbia University
mrasooli-at-microsoft.[com]
CV (Updated: October 2024)

Publications

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Sreyan Ghosh, Mohammad Sadegh Rasooli, Michael Levit, Peidong Wang, Jian Xue, Dinesh Manocha, and Jinyu Li.
arXiv:2410.13198, 2024. [abstract] [bibtex]@misc{ghosh2024failingforwardimprovinggenerative, title={Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation}, author={Sreyan Ghosh and Mohammad Sadegh Rasooli and Michael Levit and Peidong Wang and Jian Xue and Dinesh Manocha and Jinyu Li}, year={2024}, eprint={2410.13198}, archivePrefix={arXiv}, primaryClass={eess.AS}, url={https://arxiv.org/abs/2410.13198}, }
Generative Error Correction (GEC) has emerged as a powerful post-processing method to enhance the performance of Automatic Speech Recognition (ASR) systems. However, we show that GEC models struggle to generalize beyond the specific types of errors encountered during training, limiting their ability to correct new, unseen errors at test time, particularly in out-of-domain (OOD) scenarios. This phenomenon is amplified for named entities (NEs), where, in addition to insufficient contextual information or knowledge about the NEs, novel NEs keep emerging. To address these issues, we propose DARAG (Data- and Retrieval-Augmented Generative Error Correction), a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios. We augment the GEC training dataset with synthetic data generated by prompting LLMs and text-to-speech models, thereby simulating additional errors from which the model can learn. For OOD scenarios, we simulate test-time errors from new domains similarly and in an unsupervised fashion. Additionally, to better handle named entities, we introduce retrieval-augmented correction by augmenting the input with entities retrieved from a database. Our approach is simple, scalable, and both domain- and language-agnostic. We experiment on multiple datasets and settings, showing that DARAG outperforms all our baselines, achieving 8%-30% relative WER improvements in ID and 10%-33% improvements in OOD settings.
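The following toy sketch (Python) illustrates the general shape of retrieval-augmented correction as described above: retrieve database entities similar to the ASR hypothesis and expose them in the correction prompt. The entity store, similarity heuristic, and prompt wording are illustrative assumptions, not the DARAG implementation.

import difflib

NE_DATABASE = ["Punta Cana", "Jinyu Li", "Dinesh Manocha"]  # hypothetical entity store

def retrieve_entities(hypothesis, k=2):
    # Crude string similarity between each stored entity and the hypothesis.
    scored = [(difflib.SequenceMatcher(None, e.lower(), hypothesis.lower()).ratio(), e)
              for e in NE_DATABASE]
    return [e for _, e in sorted(scored, reverse=True)[:k]]

def build_gec_prompt(nbest):
    entities = retrieve_entities(nbest[0])
    lines = ["Correct the ASR transcript. Possibly relevant entities: " + ", ".join(entities)]
    lines += ["Hypothesis %d: %s" % (i + 1, h) for i, h in enumerate(nbest)]
    lines.append("Corrected transcript:")
    return "\n".join(lines)

print(build_gec_prompt(["call gin you lee about the demo", "call jin yu li about the demo"]))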
External Language Model Integration for Factorized Neural Transducers
Michael Levit, Sarangarajan Parthasarathy, Cem Aksoylar, Mohammad Sadegh Rasooli, and Shuangyu Chang.
arXiv:2305.17304, 2023. [abstract] [bibtex]@misc{levit2023external, title={External Language Model Integration for Factorized Neural Transducers}, author={Michael Levit and Sarangarajan Parthasarathy and Cem Aksoylar and Mohammad Sadegh Rasooli and Shuangyu Chang}, year={2023}, eprint={2305.17304}, archivePrefix={arXiv}, primaryClass={cs.CL} }
We propose an adaptation method for factorized neural transducers (FNT) with external language models. We demonstrate that both neural and n-gram external LMs add significantly more value when linearly interpolated with the predictor output compared to shallow fusion, thus confirming that FNT forces the predictor to act like a regular language model. Further, we propose a method to integrate class-based n-gram language models into the FNT framework, resulting in accuracy gains similar to a hybrid setup. We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario through a combination of class-based n-gram and neural LMs.
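A toy numeric illustration (Python) of the two integration points contrasted above: shallow fusion adds a weighted external-LM score to the final model score, while the proposed scheme linearly interpolates the external LM with the predictor output before combining it with the rest of the model. All scores below are made up, and real decoding operates over beams rather than a single token.

import math

def log_interp(lp_a, lp_b, lam):
    # Linear interpolation of two probabilities, computed in log space.
    return math.log((1 - lam) * math.exp(lp_a) + lam * math.exp(lp_b))

lp_predictor = math.log(0.20)  # FNT vocabulary predictor P(y | history)
lp_external = math.log(0.35)   # external LM P(y | history)
lp_rest = math.log(0.50)       # remaining transducer score for this token

shallow_fusion = lp_rest + lp_predictor + 0.3 * lp_external
predictor_level = lp_rest + log_interp(lp_predictor, lp_external, lam=0.3)

print("shallow fusion:  %.3f" % shallow_fusion)
print("predictor-level: %.3f" % predictor_level)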
Bidirectional Language Models Are Also Few-shot Learners
Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch.
ICLR, 2023. [abstract] [bibtex]@inproceedings{Patel2022BidirectionalLM, title={Bidirectional Language Models Are Also Few-shot Learners}, author={Ajay Patel and Bryan Li and Mohammad Sadegh Rasooli and Noah Constant and Colin Raffel and Chris Callison-Burch}, year={2022} }
Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
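A greatly simplified sketch (Python, assuming the Hugging Face mT5 checkpoint) of the SAP idea: repeatedly append a mask sentinel, let the span-denoising model fill it, keep a short chunk of the fill, and iterate, which yields left-to-right generation from a bidirectional model. The prompt, chunk size, and stopping rule are illustrative, not the paper's exact recipe.

from transformers import AutoTokenizer, MT5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

def sap_generate(prompt, steps=6, chunk_words=3):
    text = prompt
    for _ in range(steps):
        inputs = tok(text + " <extra_id_0>", return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=10)
        # The fill for <extra_id_0>; sentinel/pad tokens are stripped by decode.
        fill = tok.decode(out[0], skip_special_tokens=True).strip().split()
        if not fill:
            break
        text += " " + " ".join(fill[:chunk_words])  # keep only a short prefix
    return text

# Few-shot translation prompt in the spirit of the paper (content illustrative):
print(sap_generate("English: I like tea. German: Ich mag Tee. English: The house is small. German:"))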
Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation
Bryan Li, Mohammad Sadegh Rasooli, Ajay Patel, Chris Callison-Burch.
LoResMT, pp 16-31, 2023. [abstract] [bibtex]@article{li2022multilingual, title={Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation}, author={Li, Bryan and Patel, Ajay and Callison-Burch, Chris and Rasooli, Mohammad Sadegh}, journal={arXiv preprint arXiv:2209.02821}, year={2022} }
We propose a two-stage training approach for developing a single NMT model to translate unseen languages both to and from English. For the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 25 languages to English. We find this model can generalize to zero-shot translations on unseen languages. For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then train with successive rounds of back-translation. The final model extends to the English-to-Many direction, while retaining Many-to-English performance. We term our approach EcXTra (English-centric Crosslingual (X) Transfer). Our approach sequentially leverages auxiliary parallel data and monolingual data, and is conceptually simple, only using a standard cross-entropy objective in both stages. The final EcXTra model is evaluated on unsupervised NMT on 8 low-resource languages achieving a new state-of-the-art for English-to-Kazakh (22.3 > 10.4 BLEU), and competitive performance for the other 15 translation directions.
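A schematic sketch (Python) of one stage-two round: the current model translates monolingual text in both directions to create synthetic parallel data, and training continues on the union with the standard cross-entropy objective. The translate and train callables are hypothetical stand-ins for a real NMT toolkit.

from typing import Callable, List, Tuple

def back_translation_round(
    translate: Callable[[str, str], str],            # (sentence, direction) -> translation
    train: Callable[[List[Tuple[str, str]]], None],  # one training round on (src, tgt) pairs
    mono_target: List[str],
    mono_english: List[str],
) -> None:
    # Synthetic (target -> English) pairs from target-language monolingual text.
    to_english = [(s, translate(s, "xx-en")) for s in mono_target]
    # Synthetic (English -> target) pairs extend the model to the reverse direction.
    from_english = [(translate(s, "en-xx"), s) for s in mono_english]
    train(to_english + from_english)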
The Persian Dependency Treebank Made Universal
Mohammad Sadegh Rasooli, Pegah Safari, Amirsaeid Moloodi, and Alireza Nourian.
LREC 2022. [abstract] [bibtex][code+data]@inproceedings{safari-etal-2022-persian, title = "The {P}ersian Dependency Treebank Made Universal", author = "Safari, Pegah and Rasooli, Mohammad Sadegh and Moloodi, Amirsaeid and Nourian, Alireza", booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference", month = jun, year = "2022", address = "Marseille, France", publisher = "European Language Resources Association", url = "https://aclanthology.org/2022.lrec-1.766", pages = "7078--7087", abstract = "We describe an automatic method for converting the Persian Dependency Treebank (Rasooli et al., 2013) to Universal Dependencies. This treebank contains 29107 sentences. Our experiments along with manual linguistic analysis show that our data is more compatible with Universal Dependencies than the Uppsala Persian Universal Dependency Treebank (Seraji et al., 2016), larger in size and more diverse in vocabulary. Our data brings in labeled attachment F-score of 85.2 in supervised parsing. Also, our delexicalized Persian-to-English parser transfer experiments show that a parsing model trained on our data is {\mbox{$\approx$}}2{\%} absolutely more accurate than that of Seraji et al. (2016) in terms of labeled attachment score.", }
We describe an automatic method for converting the Persian Dependency Treebank (Rasooli et al., 2013) to Universal Dependencies. This treebank contains 29107 sentences. Our experiments along with manual linguistic analysis show that our data is more compatible with Universal Dependencies than the Uppsala Persian Universal Dependency Treebank (Seraji et al., 2016), and is larger in size and more diverse in vocabulary. Our data brings in a labeled attachment F-score of 85.2 in supervised parsing. Our delexicalized Persian-to-English parser transfer experiments show that a parsing model trained on our data is ~2% absolutely more accurate than that of Seraji et al. (2016) in terms of labeled attachment score.
"Wikily" Supervised Neural Translation Tailored to Cross-Lingual Tasks
Mohammad Sadegh Rasooli, Chris Callison-Burch and Derry Tanti Wijaya.
EMNLP 2021. [abstract] [bibtex][code]@inproceedings{rasooli-etal-2021-wikily, title = "{``}Wikily{''} Supervised Neural Translation Tailored to Cross-Lingual Tasks", author = "Rasooli, Mohammad Sadegh and Callison-Burch, Chris and Wijaya, Derry Tanti", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.124", doi = "10.18653/v1/2021.emnlp-main.124", pages = "1655--1670", abstract = "We present a simple but effective approach for leveraging Wikipedia for neural machine translation as well as cross-lingual tasks of image captioning and dependency parsing without using any direct supervision from external parallel data or supervised models in the target language. We show that first sentences and titles of linked Wikipedia pages, as well as cross-lingual image captions, are strong signals for a seed parallel data to extract bilingual dictionaries and cross-lingual word embeddings for mining parallel text from Wikipedia. Our final model achieves high BLEU scores that are close to or sometimes higher than strong \textit{supervised} baselines in low-resource languages; e.g. supervised BLEU of 4.0 versus 12.1 from our model in English-to-Kazakh. Moreover, we tailor our \textit{wikily} translation models to unsupervised image captioning, and cross-lingual dependency parser transfer. In image captioning, we train a multi-tasking machine translation and image captioning pipeline for Arabic and English from which the Arabic training data is a \textit{wikily} translation of the English captioning data. Our captioning results on Arabic are slightly \textit{better} than that of its supervised model. In dependency parsing, we translate a large amount of monolingual text, and use it as an artificial training data in an \textit{annotation projection} framework. We show that our model outperforms recent work on cross-lingual transfer of dependency parsers.", }
We present a simple but effective approach for leveraging Wikipedia for neural machine translation as well as the cross-lingual tasks of image captioning and dependency parsing, without using any direct supervision from external parallel data or supervised models in the target language. We show that first sentences and titles of linked Wikipedia pages, as well as cross-lingual image captions, provide strong signals for seed parallel data used to extract bilingual dictionaries and cross-lingual word embeddings for mining parallel text from Wikipedia. Our final model achieves high BLEU scores that are close to or sometimes higher than strong supervised baselines in low-resource languages; e.g. supervised BLEU of 4.0 versus 12.1 from our model in English-to-Kazakh. Moreover, we tailor our wikily translation models to unsupervised image captioning and cross-lingual dependency parser transfer. In image captioning, we train a multi-tasking machine translation and image captioning pipeline for Arabic and English in which the Arabic training data is a wikily translation of the English captioning data. Our captioning results on Arabic are slightly better than those of the supervised model. In dependency parsing, we translate a large amount of monolingual text, and use it as artificial training data in an annotation projection framework. We show that our model outperforms recent work on cross-lingual transfer of dependency parsers.
Cultural and Geographical Influences on Image Translatability of Words across Languages
Nikzad Khani, Isidora Chara Tourni, Mohammad Sadegh Rasooli, Chris Callison-Burch and Derry Tanti Wijaya.
NAACL 2021. [abstract] [bibtex]tbd
Neural Machine Translation (NMT) models have been observed to produce poor translations when there are few/no parallel sentences to train the models. In the absence of parallel data, several approaches have turned to the use of images to learn translations. Since images of words, e.g., horse, may be unchanged across languages, translations can be identified via images associated with words in different languages that have a high degree of visual similarity. However, translating via images has been shown to improve upon text-only models only marginally. To better understand when images are useful for translation, we study image translatability of words, which we define as the translatability of words via images, by measuring intra- and inter-cluster similarities of image representations of words that are translations of each other. We find that images of words are not always invariant across languages, and that language pairs with shared culture, meaning having either a common language family, ethnicity or religion, have improved image translatability (i.e., have more similar images for similar words) compared to its converse, regardless of their geographic proximity. In addition, in line with previous works showing that images help more in translating concrete words, we find that concrete words have improved image translatability compared to abstract ones.
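A hedged numeric sketch (Python/NumPy) of the intra- vs. inter-cluster measurement described above: embed images associated with a word and with its translation, then compare average within-word similarity to across-word similarity. The random vectors below are stand-ins for real image features.

import numpy as np

def mean_cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a @ b.T).mean())

rng = np.random.default_rng(0)
imgs_src = rng.normal(size=(10, 512))  # e.g. features of images for "horse"
imgs_tgt = rng.normal(size=(10, 512))  # features of images for its translation

intra = (mean_cosine(imgs_src, imgs_src) + mean_cosine(imgs_tgt, imgs_tgt)) / 2
inter = mean_cosine(imgs_src, imgs_tgt)
# Inter-cluster similarity close to intra-cluster similarity suggests the
# word pair is highly "image translatable".
print("intra: %.3f  inter: %.3f" % (intra, inter))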
ParsiNLU: A Suite of Language Understanding Challenges for Persian
Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, and Yadollah Yaghoobzadeh.
Transactions of the ACL, 9:1147–1162, 2021. [abstract] [bibtex][code+data]@article{khashabi-etal-2021-parsinlu, title = "{P}arsi{NLU}: A Suite of Language Understanding Challenges for {P}ersian", author = "Khashabi, Daniel and Cohan, Arman and Shakeri, Siamak and Hosseini, Pedram and Pezeshkpour, Pouya and Alikhani, Malihe and Aminnaseri, Moin and Bitaab, Marzieh and Brahman, Faeze and Ghazarian, Sarik and Gheini, Mozhdeh and Kabiri, Arman and Mahabagdi, Rabeeh Karimi and Memarrast, Omid and Mosallanezhad, Ahmadreza and Noury, Erfan and Raji, Shahab and Rasooli, Mohammad Sadegh and Sadeghi, Sepideh and Azer, Erfan Sadeqi and Samghabadi, Niloofar Safi and Shafaei, Mahsa and Sheybani, Saber and Tazarv, Ali and Yaghoobzadeh, Yadollah", journal = "Transactions of the Association for Computational Linguistics", volume = "9", year = "2021", address = "Cambridge, MA", publisher = "MIT Press", url = "https://aclanthology.org/2021.tacl-1.68", doi = "10.1162/tacl_a_00419", pages = "1147--1162", abstract = "Abstract Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks{---}reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.1", }
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, most of this progress has been concentrated on resource-rich languages like English. This work focuses on Persian, a widely spoken language for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark for Persian that includes a range of high-level tasks: reading comprehension, textual entailment, and more. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. In addition, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
Automatic Standardization of Colloquial Persian
Mohammad Sadegh Rasooli, Farzane Bakhtyari, Fatemeh Shafiei, Mahsa Ravanbakhsh, and Chris Callison-Burch.
arXiv:2012.05879, Dec. 2020. [abstract] [bibtex][code+data]@misc{rasooli2020automatic, title={Automatic Standardization of Colloquial Persian}, author={Mohammad Sadegh Rasooli and Farzane Bakhtyari and Fatemeh Shafiei and Mahsa Ravanbakhsh and Chris Callison-Burch}, year={2020}, eprint={2012.05879}, archivePrefix={arXiv}, primaryClass={cs.CL} }
The Iranian Persian language has two varieties: standard and colloquial. Most natural language processing tools for Persian assume that the text is in standard form; this assumption fails in many real applications, especially for web content. This paper describes a simple and effective standardization approach based on sequence-to-sequence translation. We design an algorithm for generating artificial parallel colloquial-to-standard data for learning a sequence-to-sequence model. Moreover, we annotate a publicly available evaluation set consisting of 1912 sentences from a diverse set of domains. Our intrinsic evaluation shows a higher BLEU score of 62.8 versus 61.7 for an off-the-shelf rule-based standardization model, where the original text scores 46.4. We also show that our model improves English-to-Persian machine translation when the training data comes from colloquial Persian, by 1.4 absolute BLEU on the development data and 0.8 on the test data.
Multitask Learning for Cross-Lingual Transfer of Broad-coverage Semantic Dependencies
Maryam Aminian, Mohammad Sadegh Rasooli, and Mona Diab.
EMNLP 2020. [abstract] [bibtex]@inproceedings{aminian-etal-2020-multitask, title = "Multitask Learning for Cross-Lingual Transfer of Broad-coverage Semantic Dependencies", author = "Aminian, Maryam and Rasooli, Mohammad Sadegh and Diab, Mona", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-main.663", doi = "10.18653/v1/2020.emnlp-main.663", pages = "8268--8274", abstract = "We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available. We leverage a multitask learning framework coupled with annotation projection. We use syntactic parsing as the auxiliary task in our multitask setup. Our annotation projection experiments from English to Czech show that our multitask setup yields 3.1{\%} (4.2{\%}) improvement in labeled F1-score on in-domain (out-of-domain) test set compared to a single-task baseline.", }
We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available. We leverage a multitask learning framework coupled with annotation projection. We use syntactic parsing as the auxiliary task in our multitask setup. Our annotation projection experiments from English to Czech show that our multitask setup yields 3.1% (4.2%) improvement in labeled F1-score on in-domain (out-of-domain) test set compared to a single-task baseline.
Low-Resource Syntactic Transfer with Unsupervised Source Reordering
Mohammad Sadegh Rasooli, and Michael Collins.
NAACL 2019. [abstract] [bibtex]@inproceedings{rasooli-collins-2019-low, title = "Low-Resource Syntactic Transfer with Unsupervised Source Reordering", author = "Rasooli, Mohammad Sadegh and Collins, Michael", booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)", month = jun, year = "2019", address = "Minneapolis, Minnesota", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/N19-1385", doi = "10.18653/v1/N19-1385", pages = "3845--3856", abstract = "We describe a cross-lingual transfer method for dependency parsing that takes into account the problem of word order differences between source and target languages. Our model only relies on the Bible, a considerably smaller parallel data than the commonly used parallel data in transfer methods. We use the concatenation of projected trees from the Bible corpus, and the gold-standard treebanks in multiple source languages along with cross-lingual word representations. We demonstrate that reordering the source treebanks before training on them for a target language improves the accuracy of languages outside the European language family. Our experiments on 68 treebanks (38 languages) in the Universal Dependencies corpus achieve a high accuracy for all languages. Among them, our experiments on 16 treebanks of 12 non-European languages achieve an average UAS absolute improvement of 3.3{\%} over a state-of-the-art method.", }
We describe a cross-lingual transfer method for dependency parsing that takes into account the problem of word order differences between source and target languages. Our model relies only on the Bible, a considerably smaller parallel corpus than those commonly used in transfer methods. We use the concatenation of projected trees from the Bible corpus and the gold-standard treebanks in multiple source languages, along with cross-lingual word representations. We demonstrate that reordering the source treebanks before training on them for a target language improves the accuracy of languages outside the European language family. Our experiments on 68 treebanks (38 languages) in the Universal Dependencies corpus achieve a high accuracy for all languages. Among them, our experiments on 16 treebanks of 12 non-European languages achieve an average UAS absolute improvement of 3.3% over a state-of-the-art method.
Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles
Maryam Aminian, Mohammad Sadegh Rasooli, and Mona Diab.
IWCS 2019. [abstract] [bibtex]@inproceedings{aminian-etal-2019-cross, title = "Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles", author = "Aminian, Maryam and Rasooli, Mohammad Sadegh and Diab, Mona", booktitle = "Proceedings of the 13th International Conference on Computational Semantics - Long Papers", month = may, year = "2019", address = "Gothenburg, Sweden", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/W19-0417", doi = "10.18653/v1/W19-0417", pages = "200--210", abstract = "We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model considers using character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. Our experiments outperform a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.", }
We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model considers using character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. Our experiments outperform a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.
Cross-Lingual Transfer of Natural Language Processing Systems
Mohammad Sadegh Rasooli.
PhD Thesis, Columbia University, 2019. [abstract] [bibtex]@PhdThesis{rasoolithesis, author = {Mohammad Sadegh Rasooli}, title = {Cross-Lingual Transfer of Natural Language Processing Systems}, school = {Columbia University}, year = {2018}, address = {New York}, month = {December}, }
Accurate natural language processing systems rely heavily on annotated datasets. In the absence of such datasets, transfer methods can help to develop a model by transferring annotations from one or more rich-resource languages to the target language of interest. These methods are generally divided into two approaches: 1) annotation projection from translation data, aka parallel data, using supervised models in rich-resource languages, and 2) direct model transfer from annotated datasets in rich-resource languages.
In this thesis, we demonstrate different methods for transfer of dependency parsers and sentiment analysis systems. We propose an annotation projection method that performs well in the scenarios for which a large amount of in-domain parallel data is available. We also propose a method which is a combination of annotation projection and direct transfer that can leverage a minimal amount of information from a small out-of-domain parallel dataset to develop highly accurate transfer models. Furthermore, we propose an unsupervised syntactic reordering model to improve the accuracy of dependency parser transfer for non-European languages. Finally, we conduct a diverse set of experiments for the transfer of sentiment analysis systems in different data settings.
A summary of our contributions is as follows:
• We develop accurate dependency parsers using parallel text in an annotation projection framework. We make use of the fact that the density of word alignments is a valuable indicator of reliability in annotation projection.
• We develop accurate dependency parsers in the absence of a large amount of parallel data. We use the Bible data, which is orders of magnitude smaller than a conventional parallel dataset, to provide minimal cues for creating cross-lingual word representations. Our model is also capable of boosting the performance of annotation projection with a large amount of parallel data. Our model develops cross-lingual word representations to go beyond traditional delexicalized direct transfer methods. Moreover, we propose a simple but effective word translation approach that brings in explicit lexical features from the target language in our direct transfer method.
• We develop different syntactic reordering models that can change the source treebanks in rich-resource languages, thus preventing the parser from learning a wrong word order for an unrelated target language. Our experimental results show substantial improvements on non-European languages.
• We develop transfer methods for sentiment analysis in different data availability scenarios. We show that we can leverage cross-lingual word embeddings to create accurate sentiment analysis systems in the absence of annotated data in the target language of interest.
We believe that the novelties that we introduce in this thesis indicate the usefulness of transfer methods. This is appealing in practice, especially since we suggest eliminating the requirement for new annotated datasets in low-resource languages, which are expensive, if not impossible, to obtain.
Entity-Aware Language Model as an Unsupervised Reranker
Mohammad Sadegh Rasooli, and Sarangarajan Parthasarathy.
INTERSPEECH 2018. [abstract] [bibtex]@inproceedings{Rasooli2018, author={Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy}, title={Entity-Aware Language Model as an Unsupervised Reranker}, year=2018, booktitle={Proc. Interspeech 2018}, pages={406--410}, doi={10.21437/Interspeech.2018-62}, url={http://dx.doi.org/10.21437/Interspeech.2018-62} }
In language modeling, it is difficult to incorporate entity relationships from a knowledge-base. One solution is to use a reranker trained with global features, in which global features are derived from n-best lists. However, training such a reranker requires manually annotated n-best lists, which is expensive to obtain. We propose a method based on the contrastive estimation method that alleviates the need for such data. Experiments in the music domain demonstrate that global features, as well as features extracted from an external knowledge-base, can be incorporated into our reranker. Our final model achieves a 0.44 absolute word error rate improvement on the blind test data.
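A toy sketch (Python) in the spirit of the approach: instead of manually annotated n-best lists, treat each observed sentence as preferable to artificially perturbed "neighbor" sentences and learn reranker weights from that contrast. The features, neighborhood, and perceptron-style update are simplified stand-ins for the paper's contrastive estimation setup.

import random

def features(sent):
    # Toy global features: bigram indicators (a real system would add
    # knowledge-base and entity-relationship features).
    return {"bigram:%s_%s" % (a, b): 1.0 for a, b in zip(sent, sent[1:])}

def score(w, sent):
    return sum(w.get(k, 0.0) * v for k, v in features(sent).items())

def neighbors(sent, n=4):
    out = []
    for _ in range(n):
        s = list(sent)
        i, j = random.sample(range(len(s)), 2)
        s[i], s[j] = s[j], s[i]  # word-swap neighborhood
        out.append(s)
    return out

def train(corpus, epochs=3, lr=0.1):
    w = {}
    for _ in range(epochs):
        for sent in corpus:
            for neg in neighbors(sent):
                if score(w, neg) >= score(w, sent):  # observed should outscore neighbor
                    for k, v in features(sent).items():
                        w[k] = w.get(k, 0.0) + lr * v
                    for k, v in features(neg).items():
                        w[k] = w.get(k, 0.0) - lr * v
    return w

weights = train([["play", "some", "jazz", "music"], ["call", "my", "mom"]])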
Cross-Lingual Sentiment Transfer with Limited Resources
Mohammad Sadegh Rasooli, Noura Farra, Axinia Radeva, Tao Yu and Kathleen McKeown.
Machine Translation, Volume 32, Issue 1–2, pp 143–165, 2018. [abstract] [bibtex] [code]@Article{Rasooli2018, author="Rasooli, Mohammad Sadegh and Farra, Noura and Radeva, Axinia and Yu, Tao and McKeown, Kathleen", title="Cross-lingual sentiment transfer with limited resources", journal="Machine Translation", year="2018", month="Jun", day="01", volume="32", number="1", pages="143--165", issn="1573-0573", doi="10.1007/s10590-017-9202-6", url="https://doi.org/10.1007/s10590-017-9202-6" }
We describe two transfer approaches for building sentiment analysis systems without having gold labeled data in the target language. Unlike previous work that is focused on using only English as the source language and a small number of target languages, we use multiple source languages to learn a more robust sentiment transfer model for 16 languages from different language families. Our approaches explore the potential of using an annotation projection approach and a direct transfer approach using cross-lingual word representations and neural networks. Whereas most previous work relies on machine translation, we show that we can build cross-lingual sentiment analysis systems without machine translation or even high quality parallel data. We have conducted experiments assessing the availability of different resources such as in-domain parallel data, out-of-domain parallel data, and in-domain comparable data. Our experiments show that we can build a robust transfer system whose performance can in some cases approach that of a supervised system.
Transferring Semantic Roles Using Translation and Syntactic Information
Maryam Aminian, Mohammad Sadegh Rasooli, and Mona Diab.
IJCNLP 2017. [abstract] [bibtex]@InProceedings{I17-2003, author = "Aminian, Maryam and Rasooli, Mohammad Sadegh and Diab, Mona", title = "Transferring Semantic Roles Using Translation and Syntactic Information", booktitle = "Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)", year = "2017", publisher = "Asian Federation of Natural Language Processing", pages = "13--19", location = "Taipei, Taiwan", url = "http://aclweb.org/anthology/I17-2003" }
Annotation projection for semantic role labeling is a transfer method that aims to develop systems for resource-poor languages using supervised annotations of a resource-rich language through parallel data. We propose a method that employs information from source and target syntactic dependencies as well as word alignment density to improve the quality of an iterative bootstrapping method. Our experiments yield a 3.5 absolute labeled F-score improvement over a standard annotation projection method.
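Both this method and the density-driven parsing work below rely on word-alignment density as a confidence signal. A minimal sketch (Python) of that filtering idea, with illustrative data structures: keep a projected sentence only when a large fraction of its words actually receive a projected annotation.

from typing import List, Optional

def alignment_density(projected: List[Optional[int]]) -> float:
    # Fraction of words whose annotation survived projection through alignments.
    covered = sum(1 for p in projected if p is not None)
    return covered / len(projected)

def filter_dense(corpus, threshold=0.9):
    # corpus: list of (words, projected_annotations); keep densely projected ones.
    return [(w, p) for w, p in corpus if alignment_density(p) >= threshold]

words = ["او", "کتاب", "را", "خواند"]  # "He read the book"
projected = [4, 4, None, 0]            # one word received no projection
print(alignment_density(projected))    # 0.75 -> filtered out at threshold 0.9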
Cross-Lingual Syntactic Transfer with Limited Resources
Mohammad Sadegh Rasooli and Michael Collins.
Transactions of the ACL, 5:279--293, 2017. [abstract] [bibtex] [code]@article{rasooli_16, author = {Rasooli, Mohammad Sadegh and Collins, Michael }, title = {Cross-Lingual Syntactic Transfer with Limited Resources}, journal = {Transactions of the Association for Computational Linguistics}, volume = {5}, year = {2017}, keywords = {}, issn = {2307-387X}, url = {https://transacl.org/ojs/index.php/tacl/article/view/922}, pages = {279--293} }
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. The method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.
Density-Driven Cross-Lingual Transfer of Dependency Parsers
Mohammad Sadegh Rasooli and Michael Collins.
EMNLP 2015. [abstract] [bibtex] [Slides] [Video] [Models & Runnable jar]@InProceedings{rasooli-collins:2015:EMNLP, author = {Rasooli, Mohammad Sadegh and Collins, Michael}, title = {Density-Driven Cross-Lingual Transfer of Dependency Parsers}, booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing}, month = {September}, year = {2015}, address = {Lisbon, Portugal}, publisher = {Association for Computational Linguistics}, pages = {328--338}, url = {http://aclweb.org/anthology/D15-1039} }
We present a novel method for the crosslingual transfer of dependency parsers. Our goal is to induce a dependency parser in a target language of interest without any direct supervision: instead we assume access to parallel translations between the target and one or more source languages, and to supervised parsers in the source language(s). Our key contributions are to show the utility of dense projected structures when training the target language parser, and to introduce a novel learning algorithm that makes use of dense structures. Results on several languages show an absolute improvement of 5.51% in average dependency accuracy over the state-of-the-art method of (Ma and Xia, 2014). Our average dependency accuracy of 82.18% compares favourably to the accuracy of fully supervised methods.
On the Importance of Ezafe Construction in Persian Parsing
Alireza Nourian, Mohammad Sadegh Rasooli, Mohsen Imany and Heshaam Faili.
ACL-IJCNLP 2015. [abstract] [bibtex] [Poster]@InProceedings{nourian-EtAl:2015:ACL-IJCNLP, author = {Nourian, Alireza and Rasooli, Mohammad Sadegh and Imany, Mohsen and Faili, Heshaam}, title = {On the Importance of Ezafe Construction in Persian Parsing}, booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)}, month = {July}, year = {2015}, address = {Beijing, China}, publisher = {Association for Computational Linguistics}, pages = {877--882}, url = {http://www.aclweb.org/anthology/P15-2144} }
Ezafe construction is an idiosyncratic phenomenon in the Persian language. It is a good indicator for phrase boundaries and dependency relations but mostly does not appear in the text. In this paper, we show that adding information about Ezafe construction can give 4.6% relative improvement in dependency parsing and 9% relative improvement in shallow parsing. For evaluation purposes, Ezafe tags are manually annotated in the Persian dependency treebank. Furthermore, to be able to conduct experiments on shallow parsing, we develop a dependency to shallow phrase structure converter based on the Persian dependencies.
Yara Parser: A Fast and Accurate Dependency Parser
Mohammad Sadegh Rasooli and Joel Tetreault.
arXiv:1503.06733v2 [cs.CL], 2015. [abstract] [bibtex] [Code]@article{DBLP:journals/corr/RasooliT15, author = {Mohammad Sadegh Rasooli and Joel R. Tetreault}, title = {Yara Parser: {A} Fast and Accurate Dependency Parser}, journal = {CoRR}, volume = {abs/1503.06733}, year = {2015}, url = {http://arxiv.org/abs/1503.06733}, timestamp = {Thu, 09 Apr 2015 11:33:20 +0200}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/RasooliT15}, bibsource = {dblp computer science bibliography, http://dblp.org} }
Dependency parsers are among the most crucial tools in natural language processing as they have many important applications in downstream tasks such as information retrieval, machine translation and knowledge acquisition. We introduce the Yara Parser, a fast and accurate open-source dependency parser based on the arc-eager algorithm and beam search. It achieves an unlabeled accuracy of 93.32 on the standard WSJ test set which ranks it among the top dependency parsers. At its fastest, Yara can parse about 4000 sentences per second when in greedy mode (1 beam). When optimizing for accuracy (using 64 beams and Brown cluster features), Yara can parse 45 sentences per second. The parser can be trained on any syntactic dependency treebank and different options are provided in order to make it more flexible and tunable for specific tasks. It is released with the Apache version 2.0 license and can be used for both commercial and academic purposes. The parser can be found at https://github.com/yahoo/YaraParser.
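For readers unfamiliar with the algorithm, a minimal arc-eager transition loop (Python) is sketched below; Yara adds a trained classifier in place of the oracle, rich features, and beam search, none of which are shown here.

def parse(words, oracle):
    # oracle(stack, buffer, heads) -> "SH" (shift), "RA" (right-arc),
    # "LA" (left-arc) or "RE" (reduce); a real parser scores these with a model.
    stack = [0]                              # 0 is the artificial root
    buffer = list(range(1, len(words) + 1))  # token indices
    heads = {}
    while buffer:
        action = oracle(stack, buffer, heads)
        if action == "SH":
            stack.append(buffer.pop(0))
        elif action == "RA":                 # stack top governs buffer front
            heads[buffer[0]] = stack[-1]
            stack.append(buffer.pop(0))
        elif action == "LA":                 # buffer front governs stack top
            heads[stack[-1]] = buffer[0]
            stack.pop()
        elif action == "RE":                 # stack top already has a head
            stack.pop()
    return heads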
Persian Syntactic Treebank: a Research Based on Dependency Grammar
Mohammad Sadegh Rasooli, Manouchehr Kouhestani and Amirsaeid Moloodi.
SCICT; in Persian; ISBN 313-388-3388-81-3.
Improving Deep Neural Network Acoustic Modeling For Audio Corpus Indexing Under The IARPA Babel Program
Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash and Vaibhava Goel.
INTERSPEECH 2014. [abstract] [bibtex]@inproceedings{cui2014improving, title={Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA babel program.}, author={Cui, Xiaodong and Kingsbury, Brian and Cui, Jia and Ramabhadran, Bhuvana and Rosenberg, Andrew and Rasooli, Mohammad Sadegh and Rambow, Owen and Habash, Nizar and Goel, Vaibhava}, booktitle={INTERSPEECH}, pages={2103--2107}, year={2014} }
This paper is focused on several techniques that improve deep neural network (DNN) acoustic modeling for audio corpus indexing in the context of the IARPA Babel program. Specifically, fundamental frequency variation (FFV) and channel-aware (CA) features and data augmentation based on stochastic feature mapping (SFM) are investigated not only for improved automatic speech recognition (ASR) performance but also for their impact on the final spoken term detection on the pre-indexed audio corpus. Experimental results on development languages of Babel option period one show that the improved DNN acoustic models can reduce word error rates in ASR and also help the keyword search performance compared to already competitive DNN baseline systems.
Unsupervised Morphology-Based Vocabulary Expansion
Mohammad Sadegh Rasooli, Thomas Lippincott, Nizar Habash and Owen Rambow.
ACL 2014. [abstract] [bibtex] [Poster]@InProceedings{rasooli-EtAl:2014:P14-1, author = {Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen}, title = {Unsupervised Morphology-Based Vocabulary Expansion}, booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = {June}, year = {2014}, address = {Baltimore, Maryland}, publisher = {Association for Computational Linguistics}, pages = {1349--1359}, url = {http://www.aclweb.org/anthology/P14-1127} }
We present a novel way of generating unseen words, which is useful for certain applications such as automatic speech recognition or optical character recognition in low-resource languages. We test our vocabulary generator on seven low-resource languages by measuring the decrease in out-of-vocabulary word rate on a held-out test set. The languages we study have very different morphological properties; we show how our results differ depending on the morphological complexity of the language. In our best result (on Assamese), our approach can predict 29% of the token-based out-of-vocabulary with a small amount of unlabeled training data.
Non-Monotonic Parsing of Fluent Umm I mean Disfluent Sentences
Mohammad Sadegh Rasooli and Joel Tetreault.
EACL 2014. [abstract] [bibtex] [Slides]@InProceedings{rasooli-tetreault:2014:EACL2014-SP, author = {Rasooli, Mohammad Sadegh and Tetreault, Joel}, title = {Non-Monotonic Parsing of Fluent Umm I mean Disfluent Sentences}, booktitle = {Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers}, month = {April}, year = {2014}, address = {Gothenburg, Sweden}, publisher = {Association for Computational Linguistics}, pages = {48--53}, url = {http://www.aclweb.org/anthology/E14-4010} }
Parsing disfluent sentences is a challenging task which involves detecting disfluencies as well as identifying the syntactic structure of the sentence. While there have been several studies recently into solely detecting disfluencies at a high performance level, there has been relatively little work into joint parsing and disfluency detection that has reached state-of-the-art performance in disfluency detection. We improve upon recent work in this joint task through the use of novel features and learning cascades to produce a model which performs at 82.6 F-score. It outperforms the previous best in disfluency detection on two different evaluations.
Joint Parsing and Disfluency Detection in Linear Time
Mohammad Sadegh Rasooli and Joel Tetreault.
EMNLP 2013. [abstract] [bibtex] [Slides]@InProceedings{rasooli-tetreault:2013:EMNLP, author = {Rasooli, Mohammad Sadegh and Tetreault, Joel}, title = {Joint Parsing and Disfluency Detection in Linear Time}, booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing}, month = {October}, year = {2013}, address = {Seattle, Washington, USA}, publisher = {Association for Computational Linguistics}, pages = {124--129}, url = {http://www.aclweb.org/anthology/D13-1013} }
We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-of-domain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
Orthographic and Morphological Processing for Persian to English Statistical Machine Translation
Mohammad Sadegh Rasooli, Ahmed El Kholy and Nizar Habash.
IJCNLP 2013. [abstract] [bibtex] [Poster]@InProceedings{rasooli-elkholy-habash:2013:IJCNLP, author = {Rasooli, Mohammad Sadegh and El Kholy, Ahmed and Habash, Nizar}, title = {Orthographic and Morphological Processing for Persian-to-English Statistical Machine Translation}, booktitle = {Proceedings of the Sixth International Joint Conference on Natural Language Processing}, month = {October}, year = {2013}, address = {Nagoya, Japan}, publisher = {Asian Federation of Natural Language Processing}, pages = {1047--1051}, url = {http://www.aclweb.org/anthology/I13-1144} }
In statistical machine translation, data sparsity is a challenging problem especially for languages with rich morphology and inconsistent orthography, such as Persian. We show that orthographic preprocessing and morphological segmentation of Persian verbs in particular improves the translation quality of Persian-English by 1.9 BLEU points on a blind test set.
Development of a Persian Syntactic Dependency Treebank
Mohammad Sadegh Rasooli, Manouchehr Kouhestani and Amirsaeid Moloodi.
NAACL 2013. [abstract] [bibtex] [Poster] [Data]@InProceedings{rasooli-kouhestani-moloodi:2013:NAACL-HLT, author = {Rasooli, Mohammad Sadegh and Kouhestani, Manouchehr and Moloodi, Amirsaeid}, title = {Development of a Persian Syntactic Dependency Treebank}, booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, month = {June}, year = {2013}, address = {Atlanta, Georgia}, publisher = {Association for Computational Linguistics}, pages = {306--314}, url = {http://www.aclweb.org/anthology/N13-1031} }
This paper describes the annotation process and linguistic properties of the Persian syntactic dependency treebank. The treebank consists of approximately 30,000 sentences annotated with syntactic roles in addition to morpho-syntactic features. One of the unique features of this treebank is that there are almost 4800 distinct verb lemmas in its sentences making it a valuable resource for educational goals. The treebank is constructed with a bootstrapping approach by means of available tagging and parsing tools and manually correcting the annotations. The data is split into standard train, development and test sets in the CoNLL dependency format and is freely available to researchers.
Unsupervised Induction of Persian Semantic Verb Classes Based on Syntactic Information
Maryam Aminian, Mohammad Sadegh Rasooli and Hossein Sameti.
International Conference on Language Processing and Intelligent Information Systems, 2013. [abstract] [bibtex]@inproceedings{aminian2013unsupervised, title={Unsupervised Induction of Persian Semantic Verb Classes Based on Syntactic Information}, author={Aminian, Maryam and Rasooli, Mohammad Sadegh and Sameti, Hossein}, booktitle={Language Processing and Intelligent Information Systems: 20th International Conference, IIS 2013, Warsaw, Poland, June 17-18, 2013, Proceedings}, volume={7912}, pages={112}, year={2013}, organization={Springer} }
Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs have motivated us to use unsupervised algorithms for Persian verb clustering. In this paper, we have done experiments on inducing the semantic classes of Persian verbs based on Levin's theory for verb classes. Syntactic information extracted from dependency trees is used as base features for clustering the verbs. Since there has been no manual classification of Persian verbs prior to this paper, we have prepared a manual classification of 265 verbs into 43 semantic classes. We show that the spectral clustering algorithm outperforms K-means and improves on the baseline algorithm by about 17% in F-measure and 0.13 in Rand index.
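A hedged sketch (Python/scikit-learn) of the clustering step just described: represent each verb by syntactic-slot statistics and cluster with spectral clustering. The verbs and the feature matrix below are toy stand-ins for counts extracted from dependency trees.

import numpy as np
from sklearn.cluster import SpectralClustering

verbs = ["خوردن", "نوشیدن", "رفتن", "آمدن"]  # eat, drink, go, come
# Rows: verbs; columns: illustrative relative frequencies of, e.g.,
# subject, direct-object, and prepositional-complement slots.
X = np.array([[0.9, 0.8, 0.1],
              [0.9, 0.7, 0.2],
              [0.8, 0.1, 0.7],
              [0.9, 0.1, 0.6]])

labels = SpectralClustering(n_clusters=2, affinity="rbf", random_state=0).fit_predict(X)
print(dict(zip(verbs, labels)))  # transitive verbs vs. motion verbs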
Unsupervised Extraction of Verb Valency in Persian
Mohammad Sadegh Rasooli, Behrouz Minaei-Bidgoli, Heshaam Faili and Maryam Aminian.
Journal of Signal and Data Processing, 2(18), pp. 3-12, 2013; in Persian.
Fast Unsupervised Dependency Parsing with Arc-Standard Transitions
Mohammad Sadegh Rasooli and Heshaam Faili.
Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP, 2012. [abstract] [bibtex]@InProceedings{rasooli-faili:2012:ROBUS-UNSUP2012, author = {Rasooli, Mohammad Sadegh and Faili, Heshaam}, title = {Fast Unsupervised Dependency Parsing with Arc-Standard Transitions}, booktitle = {Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP}, month = {April}, year = {2012}, address = {Avignon, France}, publisher = {Association for Computational Linguistics}, pages = {1--9}, url = {http://www.aclweb.org/anthology/W12-0701} }
Unsupervised dependency parsing is one of the most challenging tasks in natural language processing. The task involves finding the best possible dependency trees from raw sentences without any aid from annotated data. In this paper, we illustrate that by applying a supervised incremental parsing model to unsupervised parsing, parsing with linear time complexity is faster than other methods. With only 15 training iterations with linear time complexity, we gain results comparable to those of other state-of-the-art methods. By employing two simple universal linguistic rules inspired by classical dependency grammar, we improve the results in some languages and achieve state-of-the-art results. We also test our model on a part of the ongoing Persian dependency treebank. This is the first such work on the Persian language.
Persian Verb Valency Lexicon: An Attempt Toward Teaching Persian to Non-native Persian Speakers
Manouchehr Kouhestani, Amirsaeid Moloodi and Mohammad Sadegh Rasooli.
International Conference on Spread of Persian Language and Literature, 2012; in Persian.
Unsupervised Identification of Persian Compound Verbs
Mohammad Sadegh Rasooli, Heshaam Faili and Behrouz Minaei-Bidgoli.
10th Mexican International Conference on Artificial Intelligence (MICAI 2011). [abstract] [bibtex]@inproceedings{Rasooli:2011:UIP:2178197.2178234, author = {Rasooli, Mohammad Sadegh and Faili, Heshaam and Minaei-Bidgoli, Behrouz}, title = {Unsupervised Identification of Persian Compound Verbs}, booktitle = {Proceedings of the 10th Mexican International Conference on Advances in Artificial Intelligence - Volume Part I}, series = {MICAI'11}, year = {2011}, isbn = {978-3-642-25323-2}, location = {Puebla, Mexico}, pages = {394--406}, numpages = {13}, url = {http://dx.doi.org/10.1007/978-3-642-25324-9_34}, doi = {10.1007/978-3-642-25324-9_34}, acmid = {2178234}, publisher = {Springer-Verlag}, address = {Berlin, Heidelberg}, keywords = {K-means, Persian, bootstrapping, light verb constructions, multiword expression, unsupervised identification}, }
One of the main tasks related to multiword expressions (MWEs) is compound verb identification. There has been substantial work on unsupervised identification of multiword verbs in many languages, but no notable work on Persian yet. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC) with syntactic flexibility, such as unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics have made the task in Persian very difficult. In this paper, two different unsupervised methods are proposed to automatically detect compound verbs in Persian. In the first method, a bootstrapping method that extends the pointwise mutual information (PMI) measure is applied. In the second, the K-means clustering algorithm is used. Our experiments show that the proposed approaches outperform the baseline, which uses the PMI measure as its association metric.
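A toy computation (Python) of the PMI association measure at the core of the first method; all counts are invented, and the actual approach extends PMI with bootstrapping over a large corpus.

import math

def pmi(c_pair, c_nonverbal, c_light_verb, n):
    # PMI of a (nonverbal element, light verb) pair under corpus size n.
    return math.log2((c_pair / n) / ((c_nonverbal / n) * (c_light_verb / n)))

# e.g. "حرف زدن" (lit. "word hit", i.e. to talk) vs. an unrelated pair:
print(pmi(400, 500, 20000, 1_000_000))  # strongly associated -> high PMI
print(pmi(5, 3000, 20000, 1_000_000))   # weakly associated -> low/negative PMI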
Extracting Parallel Paragraphs and Sentences from English-Persian Translated Documents
Mohammad Sadegh Rasooli, Omid Kashefi and Behrouz Minaei-Bidgoli.
The Seventh Asia Information Retrieval Societies Conference (AIRS 2011). [abstract] [bibtex]@inproceedings{Rasooli:2011:EPP:2189339.2189398, author = {Rasooli, Mohammad Sadegh and Kashefi, Omid and Minaei-Bidgoli, Behrouz}, title = {Extracting Parallel Paragraphs and Sentences from English-persian Translated Documents}, booktitle = {Proceedings of the 7th Asia Conference on Information Retrieval Technology}, series = {AIRS'11}, year = {2011}, isbn = {978-3-642-25630-1}, location = {Dubai, United Arab Emirates}, pages = {574--583}, numpages = {10}, url = {http://dx.doi.org/10.1007/978-3-642-25631-8_52}, doi = {10.1007/978-3-642-25631-8_52}, acmid = {2189398}, publisher = {Springer-Verlag}, address = {Berlin, Heidelberg}, keywords = {English, Persian, bilingual corpus, machine translation, paragraph alignment, parallel corpus, sentence alignment}, }
The task of sentence and paragraph alignment is essential for preparing parallel texts that are needed in applications such as machine translation. The lack of sufficient linguistic data for under-resourced languages like Persian is a challenging issue. In this paper, we propose a hybrid sentence and paragraph alignment model for Persian-English parallel documents based on simple linguistic features as well as length similarity between sentences and paragraphs of the source and target languages. We apply a small bilingual dictionary of Persian-English nouns, punctuation marks, and length similarity as alignment metrics. We combine these features in a linear model and use a genetic algorithm to learn the linear equation weights. Evaluation results show that the extracted features improve the baseline model, which uses only length.
A Syntactic Valency Lexicon for Persian Verbs: The First Steps towards Persian Dependency Treebank
Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani and Behrouz Minaei-Bidgoli.
5th Language & Technology Conference (LTC 2011). [bibtex] [Data]@inproceedings{rasooli2011syntactic, title={A syntactic valency lexicon for Persian verbs: The first steps towards Persian dependency treebank}, author={Rasooli, Mohammad Sadegh and Moloodi, Amirsaeid and Kouhestani, Manouchehr and Minaei-Bidgoli, Behrouz}, booktitle={5th Language \& Technology Conference (LTC): Human Language Technologies as a Challenge for Computer Science and Linguistics}, pages={227--231}, year={2011} }
Effect of Adaptive Spell Checking in Persian
Mohammad Sadegh Rasooli, Omid Kashefi and Behrouz Minaei-Bidgoli.
7th Conference on Natural Language Processing and Knowledge Engineering (NLPKE 2011). [abstract] [bibtex]@INPROCEEDINGS{6138186, author={Mohammad Sadegh Rasooli and Omid Kashefi and Behrouz Minaei-Bidgoli}, booktitle={2011 7th International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)}, title={Effect of adaptive spell checking in Persian}, year={2011}, pages={161-164}, doi={10.1109/NLPKE.2011.6138186}, month={Nov},}
In the digital era, the volume of produced documents has overwhelmed traditional manual spell checking, and machine text production has introduced a new class of misspellings: typographical errors. Given the intolerable load of checking digital text by hand and the accuracy and speed of computers, automatic spell checking is an important application of computer systems. Different users have their own misspelling patterns and habits, so a traditional spell checker with a fixed set of rules may not perform well for all misspelling patterns. In this paper, we therefore investigate the effect of adaptive spell checking for Persian compared with non-adaptive traditional spell checking. Evaluation results show that, after a short period of usage, adaptive spell checking is superior to and more efficient than traditional spell checking with a fixed set of rules.
A New Approach for Persian Spellchecking
Mohammad Sadegh Rasooli and Behrouz Minaei-Bidgoli.
2nd Data Mining Conference (IDMC 2008); in Persian. [abstract] [bibtex]@inproceedings{rasooli2008new, title={A new approach for Persian spellchecking}, author={Mohammad Sadegh Rasooli and Behrouz Minaei-Bidgoli}, booktitle={2nd Data Mining Conference}, address = {Tehran, Iran}, year={2008} }
In this paper, a spellchecking method is developed after surveying several approaches to spellchecking in Persian and reviewing the challenges and problems these approaches face. The method also resolves the problem of Persian characters that have more than one code in computer editors, which removes the program's portability problem entirely. After checking, the program presents correction suggestions to the user; different approaches for producing suggestions are studied and implemented. To find the right suggestions for a misspelled word, the neighboring words are used and the misspelled word itself is analyzed to derive three candidate corrections. Stemming of Persian nouns, adjectives, adverbs, and verbs is studied and implemented in this spellchecker: verb infinitives are categorized by tense and stemmed accordingly, giving two separate ways of recovering Persian verbs, and for nouns, singular/plural, definite/indefinite, and affixation are handled. The program can be integrated into Microsoft Office.