The page you are trying to access is out-of-date. Please use https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory.

JRC-ACQUIS Multilingual Parallel Corpus V2.2

This page is obsolete.

Please go to the DGT-TM section of the JRC's Language Technology resource page:

https://ec.europa.eu/jrc/en/language-technologies
You will find many more useful linguistic resources there.

DGT Translation Units, Version 1.0

The "translation units" are aligned sentences provided by the European Commission's Directorate-General for Translation (DGT). They were extracted from one of DGT's large shared translation memories in Euramis (European Advanced Multilingual Information System). To keep the data set smaller, the extraction uses English as the source language. Users can then select the language combination they want with the extraction tool that DGT also provides.

DGT provides an extraction tool and the data, grouped in 60 zip files (Volume_1.zip … Volume_60.zip) of approximately 18 MB each. Each zip file contains dozens of TMX files, identified by the EUR-Lex number of the underlying acquis document, plus a plain-text file list specifying the languages in which the documents are available. Users can extract any language pair with the extraction tool TMXtract, as follows:
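To make the data layout more concrete, here is a minimal Python sketch of how the translation units in one TMX file might be read. It assumes the standard TMX layout (each `<tu>` element holds one `<tuv>` per language, each with a `<seg>`). Because the exact attribute used in these files is not stated here, it accepts the language code in either the TMX 1.4 `xml:lang` attribute or the older `lang` attribute. The function names and the two sample units are hypothetical. In line with the extraction described above, English is treated as the source side, and only units that also contain the requested target language are returned.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary_language(tuv):
    """Return the primary subtag of a <tuv> language code, e.g. 'EN' for 'EN-GB'."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.upper().split("-")[0]


def extract_pairs(tmx_bytes, source="EN", target="DE"):
    """Yield (source_text, target_text) for every <tu> containing both languages."""
    source, target = source.upper(), target.upper()
    root = ET.fromstring(tmx_bytes)
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup
                segments[primary_language(tuv)] = "".join(seg.itertext()).strip()
        if source in segments and target in segments:
            yield segments[source], segments[target]


# Two hypothetical units: one EN/DE (xml:lang), one EN/FR (older lang attribute)
sample = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><body>
  <tu>
    <tuv xml:lang="EN-GB"><seg>This Regulation shall enter into force.</seg></tuv>
    <tuv xml:lang="DE-DE"><seg>Diese Verordnung tritt in Kraft.</seg></tuv>
  </tu>
  <tu>
    <tuv lang="EN-GB"><seg>Done at Brussels.</seg></tuv>
    <tuv lang="FR-FR"><seg>Fait à Bruxelles.</seg></tuv>
  </tu>
</body></tmx>"""

for en, de in extract_pairs(sample.encode("utf-8"), "EN", "DE"):
    print(en, "|", de)  # only the first unit has a German side
```

Matching on the primary subtag means `EN` also matches regional variants such as `EN-GB`. If the distinction between regional variants matters to you, compare the full code instead.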

·        download the zip files (do not unzip them!), the extraction tool TMXtract (an .exe file) and the file swt-win32-3218.dll onto your PC, placing all of them in the same directory;

·        open TMXtract;

·        select Input files (Volume_1.zip, etc.; multiple selection is possible);

·        specify the Output file (the result is always a single file);

·        choose the Source and Target languages;

·        click on Start.
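
Readers who cannot run the Windows executable could approximate the steps above in Python. The sketch below is hypothetical and does not describe how TMXtract works internally. It leaves the zip files unzipped, accepts several input volumes, and writes a single output file for one source/target pair. It assumes that each `.tmx` member follows the standard TMX layout and that other members (such as the plain-text file list) can be skipped. It also writes tab-separated lines, which may differ from TMXtract's own output format. The function names `_segments` and `extract_to_file` are illustrative.

```python
import xml.etree.ElementTree as ET
import zipfile

# ElementTree exposes xml:lang under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _segments(tu):
    """Map primary language subtag ('EN', 'DE', ...) to segment text for one <tu>."""
    out = {}
    for tuv in tu.iter("tuv"):
        code = (tuv.get(XML_LANG) or tuv.get("lang") or "").upper().split("-")[0]
        seg = tuv.find("seg")
        if code and seg is not None:
            out[code] = "".join(seg.itertext()).strip()
    return out


def extract_to_file(zip_paths, output_path, source="EN", target="DE"):
    """Read every .tmx member of every zip (without unzipping to disk) and write
    'source<TAB>target' lines for units containing both languages.
    Returns the number of pairs written."""
    source, target = source.upper(), target.upper()
    written = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in zip_paths:
            with zipfile.ZipFile(path) as archive:
                for name in archive.namelist():
                    if not name.lower().endswith(".tmx"):
                        continue  # e.g. skip the plain-text file list
                    with archive.open(name) as member:
                        tree = ET.parse(member)
                    for tu in tree.getroot().iter("tu"):
                        segs = _segments(tu)
                        if source in segs and target in segs:
                            # Collapse tabs/newlines so each pair stays on one line
                            src = " ".join(segs[source].split())
                            tgt = " ".join(segs[target].split())
                            out.write(f"{src}\t{tgt}\n")
                            written += 1
    return written
```

Under these assumptions, `extract_to_file(["Volume_1.zip", "Volume_2.zip"], "en-de.txt", "EN", "DE")` would write the English-German pairs from the first two volumes to one file. `ET.parse` loads one TMX member into memory at a time. For very large members, `ET.iterparse`, which clears elements after they are processed, would keep memory use flatter.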

Documentation provided by DG Translation:

  1. Extraction acquis-def.doc
  2. Extraction acquis readme-def.doc
  3. EUR-Lex preprocessing.doc

Page last updated 2007-04-26, LT Group - JRC