Saturday, June 14, 2014

How to install Moses on Ubuntu 14.04 LTS (x64)?

Posted By: Achchuthan Yogarajah - 6/14/2014 07:39:00 AM


Phrase-based statistical machine translation has emerged as the dominant paradigm in machine translation research. However, until now, most work in this field has been carried out on proprietary and in-house research systems. This lack of openness has created a high barrier to entry for researchers as many of the components required have had to be duplicated. This has also hindered effective comparisons of the different elements of the systems.

By providing a free and complete toolkit, we hope that this will stimulate the development of the field. For this system to be adopted by the community, it must demonstrate performance that is comparable to the best available systems. Moses has shown that it achieves results comparable to the most competitive and widely used statistical machine translation systems in translation quality and runtime. It features all the capabilities of the closed-source Pharaoh decoder.

Apart from providing an open-source toolkit for SMT, a further motivation for Moses is to extend phrase-based translation with factors and confusion network decoding. The current phrase-based approach to statistical machine translation is limited to the mapping of small text chunks without any explicit use of linguistic information, be it morphological, syntactic, or semantic. These additional sources of information have been shown to be valuable when integrated into pre-processing or post-processing steps.

Moses also integrates confusion network decoding, which allows the translation of ambiguous input. This enables, for instance, the tighter integration of speech recognition and machine translation. Instead of passing along the one-best output of the recognizer, a network of different word choices may be examined by the machine translation system. Efficient data structures in Moses for the memory-intensive translation model and language model allow the exploitation of much larger data resources with limited hardware.

The toolkit is a complete out-of-the-box translation system for academic research. It consists of all the components needed to preprocess data and to train the language models and the translation models. It also contains tools for tuning these models using minimum error rate training and for evaluating the resulting translations using the BLEU score (Papineni et al. 2002). Moses uses standard external tools for some of the tasks to avoid duplication, such as GIZA++ (Och and Ney 2003) for word alignments and SRILM for language modeling. Also, since these tasks are often CPU intensive, the toolkit has been designed to work with the Sun Grid Engine parallel environment to increase throughput. In order to unify the experimental stages, a utility has been developed to run repeatable experiments. This uses the tools contained in Moses and requires minimal changes to set up and customize.

The toolkit has been hosted and developed under sourceforge.net since inception. Moses has an active research community and has reached over 1000 downloads as of 1 March 2007. The main online presence is at http://www.statmt.org/moses/ where many sources of information about the project can be found. Moses was the subject of this year’s Johns Hopkins University Workshop on Machine Translation (Koehn et al. 2006). The decoder is the core component of Moses. To minimize the learning curve for many researchers, the decoder was developed as a drop-in replacement for Pharaoh, the popular phrase-based decoder.

In order for the toolkit to be adopted by the community, and to make it easy for others to contribute to the project, we kept to the following principles when developing the decoder:
  • Accessibility
  • Easy to Maintain
  • Flexibility
  • Easy for distributed team development
  • Portability

It was developed in C++ for efficiency and follows a modular, object-oriented design.

Figure 1 - Installation Directory

Step 1 - Installing the required packages

Install the following packages:

g++
git
subversion
automake
libtool
zlib1g-dev
libboost-all-dev
libbz2-dev
liblzma-dev
python-dev
libtcmalloc-minimal4

They can all be installed with a single command:

sudo apt-get install g++ git subversion automake libtool zlib1g-dev libboost-all-dev libbz2-dev liblzma-dev python-dev libtcmalloc-minimal4

To compile Moses, you need both g++ and Boost installed on your machine.
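
Before moving on, it can help to confirm that the command-line tools from the packages above are actually on your PATH. The following is only a minimal sketch: the function name missing_tools is my own, and the command names (svn for subversion, libtoolize for libtool) are what I assume those packages provide.

```shell
#!/bin/sh
# Print every given command that is NOT on PATH, one per line.
# Returns non-zero if at least one command is missing.
# (missing_tools is a hypothetical helper, not part of Moses.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Command names assumed to come from the packages listed above.
if missing_tools g++ git svn automake libtoolize; then
    echo "all build tools found"
else
    echo "install the tools listed above before continuing"
fi
```

Any name printed before the final message is a tool that still needs to be installed.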

Step 2 - Installing Boost

I did the following in my home directory. If wget does not work on your machine, download the file manually and move it to your home directory.
wget http://downloads.sourceforge.net/project/boost/boost/1.55.0/boost_1_55_0.tar.gz
tar zxvf boost_1_55_0.tar.gz
cd boost_1_55_0/
./bootstrap.sh
./b2 -j5 --prefix=$PWD --libdir=$PWD/lib64 --layout=tagged link=static threading=multi,single install || echo FAILURE
This creates the library files in the lib64 directory inside boost_1_55_0, NOT in the system directory.
Note: in the last command, -j5 tells b2 to run up to five build jobs in parallel; it does not have to match the processor model (a Core i5, for example, does not have five cores).
If your machine has a different number of cores, change the value accordingly.
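
If you are not sure what value to use for -j, you can derive one from the number of cores. This is just a sketch, assuming the nproc command from GNU coreutils is available (it falls back to 1 otherwise); the variable name JOBS is my own choice.

```shell
#!/bin/sh
# Derive a -j value from the number of available cores.
# nproc is part of GNU coreutils; fall back to 1 if it is unavailable.
JOBS=$(nproc 2>/dev/null || echo 1)
echo "building with -j$JOBS"

# The value can then replace the hard-coded 5, for example:
#   ./b2 -j"$JOBS" --prefix=$PWD --libdir=$PWD/lib64 --layout=tagged link=static threading=multi,single install
```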

Step 3 - Installing Moses

Once these are installed, clone the Moses repository and run bjam:
git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder/
./bjam -j5
To see what options are available with bjam, run
./bjam --help
Once Boost is installed, you can compile Moses against it. Because the Boost from Step 2 is not in a system directory, you must tell Moses where it is with the
--with-boost flag. This is the exact command I use to compile Moses:
./bjam --with-boost=~/boost_1_55_0 -j5
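
A quick way to see whether the build worked is to look for the decoder binary. This is a rough sketch, assuming Moses was cloned into ~/mosesdecoder and that the build places the decoder at bin/moses inside it; the function name moses_decoder_built and the MOSES_DIR variable are my own.

```shell
#!/bin/sh
# Succeeds if an executable decoder exists at <dir>/bin/moses.
# (moses_decoder_built is a hypothetical helper, not part of Moses.)
moses_decoder_built() {
    [ -x "$1/bin/moses" ]
}

# Assumption: the repository was cloned into the home directory.
MOSES_DIR="${MOSES_DIR:-$HOME/mosesdecoder}"
if moses_decoder_built "$MOSES_DIR"; then
    echo "decoder built: $MOSES_DIR/bin/moses"
else
    echo "decoder not found under $MOSES_DIR/bin; check the bjam output for errors"
fi
```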

Step 4 - Installing GIZA++

GIZA++ is hosted at Google Code, and a mirror of the original documentation is also available. I recommend that you download the latest version via svn:

svn checkout http://giza-pp.googlecode.com/svn/trunk/ giza-pp
cd giza-pp
make
This should create the binaries ~/giza-pp/GIZA++-v2/GIZA++, ~/giza-pp/GIZA++-v2/snt2cooc.out
and ~/giza-pp/mkcls-v2/mkcls. These need to be copied to a place where Moses can find
them, as follows:
cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out \
~/giza-pp/mkcls-v2/mkcls tools
When you come to run the training, you need to tell the training script where GIZA++ was
installed using the -external-bin-dir argument:
train-model.perl -external-bin-dir $HOME/mosesdecoder/tools
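
Before starting a training run, it is worth checking that all three binaries really ended up in the tools directory. The following is a minimal sketch under the assumption that you used ~/mosesdecoder/tools as above; the function name giza_tools_ready is my own.

```shell
#!/bin/sh
# Return 0 only if all three word-alignment binaries are executable
# in the given directory; otherwise report the first one missing.
# (giza_tools_ready is a hypothetical helper, not part of Moses.)
giza_tools_ready() {
    dir=$1
    for bin in GIZA++ snt2cooc.out mkcls; do
        if [ ! -x "$dir/$bin" ]; then
            echo "missing: $dir/$bin"
            return 1
        fi
    done
    return 0
}

# Assumption: the binaries were copied to ~/mosesdecoder/tools.
if giza_tools_ready "$HOME/mosesdecoder/tools"; then
    echo "GIZA++ tools are in place"
else
    echo "copy the binaries again before running train-model.perl"
fi
```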

Step 5 - Installing IRSTLM 

IRSTLM is a language modelling toolkit from FBK, and is hosted on SourceForge [http://sourceforge.net/projects/irstlm/]. Again, you should download the latest version. I used version 5.80.03 for this guide. Assuming you downloaded the tarball into your home directory (and making the obvious changes if you download a later version), the following commands should build and install IRSTLM:

tar zxvf irstlm-5.80.03.tgz
cd irstlm-5.80.03
./regenerate-makefiles.sh
./configure --prefix=$HOME/irstlm-5.80.03
make install

You should now have several binaries and scripts in ~/irstlm-5.80.03/bin, in particular build-lm.sh.
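
To call these scripts without typing the full path, you can add the bin directory to your PATH. This is only a sketch, assuming the --prefix used above; as far as I know the IRSTLM scripts also expect an IRSTLM environment variable pointing at the install directory, so I set that as well, but treat it as an assumption and check the IRSTLM documentation for your version.

```shell
#!/bin/sh
# Assumption: IRSTLM was installed with --prefix=$HOME/irstlm-5.80.03 as above.
# Assumption: the IRSTLM scripts read the IRSTLM variable to locate the install.
export IRSTLM="$HOME/irstlm-5.80.03"
export PATH="$IRSTLM/bin:$PATH"
echo "IRSTLM tools will be looked up in $IRSTLM/bin"
```

Adding the two export lines to ~/.bashrc makes the setting permanent for new shells.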
