CAT tool with easily adjustable segmentation rules

Forum: CAT Tools Technical Help
Topic: CAT tool with easily adjustable segmentation rules
Poster: Samuel Murray
Post title: @Csaba

[quote]Csaba Lehel wrote:
Translation is normally done in a DOC table, left column for source, right for target. There is one special problem: since this will be subtitle, an average sentence is split into a few cells. [/quote]

OmegaT has an option that makes it ignore line breaks in TXT files, so you could do the following:

1. Copy/paste the source column into a new Word file.
2. Convert the table to text.
3. Add <#> at the end of every line (i.e. find [b]^p[/b], replace with [b]<#>^p[/b]).
4. Save as plain text, i.e. TXT in Unicode or UTF-8.
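If you'd rather not do step 3 by hand in Word, the same marker insertion can be sketched in a few lines of Python. This is only an illustration, assuming you have already saved the converted column as a UTF-8 text file with one cell per line; the function names and any file paths you pass in are made up for the example.

```python
from pathlib import Path

MARKER = "<#>"  # the end-of-cell marker suggested in this post


def add_markers(text: str) -> str:
    """Append MARKER to every line, like replacing ^p with <#>^p in Word."""
    lines = text.splitlines()
    return "".join(line + MARKER + "\n" for line in lines)


def convert(src: Path, dst: Path) -> None:
    """Read a UTF-8 text file (one cell per line) and write the marked copy."""
    dst.write_text(add_markers(src.read_text(encoding="utf-8")), encoding="utf-8")
```

For example, `add_markers("Hello\nworld\n")` gives `"Hello<#>\nworld<#>\n"`, which is what the Word find/replace produces.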

In OmegaT:

1. Create a new project and add the TXT file to it.
2. Go to Options > File Filters. Select "Text" and click the Options button. Set the "Segment source text into paragraphs on" setting to "Never".
3. Go to Options > Preferences > Tag Processing. In the field called "Regular expression for custom tags", type <#> (or, if there is already something in that field, add [b]|[/b]<#> to it).
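To show what adding [b]|[/b]<#> to an existing expression does, here is a small Python sketch. OmegaT itself uses Java regular expressions, but for a pattern this simple the behaviour should be the same. The existing pattern `\{\d+\}` is purely hypothetical, as is the helper name.

```python
import re


def add_custom_tag(existing: str, new_tag: str = "<#>") -> str:
    """Return new_tag alone, or existing|new_tag if the field is not empty."""
    existing = existing.strip()
    return f"{existing}|{new_tag}" if existing else new_tag


# Hypothetical pattern that was already in the custom tags field.
combined = add_custom_tag(r"\{\d+\}")

# Both the old tags and the new marker are now matched.
found = re.findall(combined, "Hello {1} world<#>")
```

Here `combined` is `\{\d+\}|<#>`, and `found` contains both `{1}` and `<#>`.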

Then do the translation. To create a line break, press Shift+Enter. Make sure every segment that has <#> in the source also has it in the target (an easy way to insert it is to press Ctrl+Space a few times). You can check whether you've forgotten any by using Tools > Check issues at any time, and again when you create the final file. If you keep forgetting that "Enter" moves you to a new segment, you can disable that in Options > Preferences > General (use TAB to advance), but you'll still have to use Shift+Enter to insert a new line.

The reason for the <#> is to help you to check the final file to make sure it has the same number of lines as the original one, and that the cells are likely to match up when you paste it into MS Word in the end. I mean, if you're happy that you don't need to use the <#>, then you don't have to, obviously.
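That final check can also be sketched in Python, if counting by eye is too tedious. This is just one way to do it, assuming you have the marked source text and the translated text as strings (for example, read from the two TXT files); the function names are invented for the example.

```python
MARKER = "<#>"  # the end-of-cell marker suggested in this post


def count_check(source_text: str, target_text: str) -> dict:
    """Count lines and markers in the source and target texts."""
    return {
        "source_lines": len(source_text.splitlines()),
        "target_lines": len(target_text.splitlines()),
        "source_markers": source_text.count(MARKER),
        "target_markers": target_text.count(MARKER),
    }


def looks_aligned(source_text: str, target_text: str) -> bool:
    """True if both texts have the same number of lines and of markers."""
    c = count_check(source_text, target_text)
    return (
        c["source_lines"] == c["target_lines"]
        and c["source_markers"] == c["target_markers"]
    )
```

If `looks_aligned` returns False, the counts from `count_check` show whether a line or a marker went missing. Matching counts don't prove that every cell lines up, only that it is likely.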

One downside to this method is that it's very difficult in OmegaT to merge or split a segment, so if you have more than one sentence in a single cell, OmegaT will show them as separate segments (which may not be a problem, but it's worth keeping in mind).

[Edited at 2019-06-09 11:59 GMT]
