Forum: CAT Tools Technical Help
Topic: CAT tool with easily adjustable segmentation rules
Poster: Samuel Murray
Post title: @Csaba
[quote]Csaba Lehel wrote:
Translation is normally done in a DOC table, left column for source, right for target. There is one special problem: since these will be subtitles, an average sentence is split into a few cells. [/quote]
OmegaT has an option that makes it ignore line breaks in TXT files, so here is what you could do:
1. Copy/paste the source column into a new Word file.
2. Convert the table to text.
3. Add <#> at the end of every line (i.e. find [b]^p[/b], replace with [b]<#>^p[/b]).
4. Save as plain text (TXT), with the encoding set to Unicode or Unicode (UTF-8).
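If you'd rather script the Word steps above, here is a rough Python sketch that does the same thing: it puts <#> at the end of every source cell and saves the result as UTF-8 plain text. It assumes you already have the source column as a list of strings, one per cell, each holding a single line. The function names are just made up for this example.

```python
from pathlib import Path

TAG = "<#>"  # end-of-line marker from step 3


def mark_lines(cells: list[str]) -> str:
    """Put TAG at the end of each cell, one cell per line.

    Same effect as Word's find ^p / replace <#>^p, assuming each
    cell holds a single line of text.
    """
    return "".join(cell.rstrip("\r\n") + TAG + "\n" for cell in cells)


def write_marked(cells: list[str], path: Path) -> None:
    """Save the marked text as plain text in UTF-8 (step 4)."""
    path.write_text(mark_lines(cells), encoding="utf-8")
```

For example, write_marked(["Hello, how", "are you today?"], Path("source.txt")) would give you a two-line file with "Hello, how<#>" and "are you today?<#>" (the file name is only an example).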
In OmegaT:
1. Create a new project and add the TXT file to it.
2. Go to Options > File Filters, select "Text" and click the Options button. Set "Segment source text into paragraphs on" to "Never".
3. Go to Options > Preferences > Tag Processing. In the field "Regular expression for custom tags", type <#> (or, if the field already contains something, add [b]|[/b]<#> to the end of it).
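The [b]|[/b] in step 3 is regex "or", so the field ends up matching whatever it matched before plus <#>. Here is a small Python illustration of that. OmegaT uses Java regexes, not Python's, but < and # are plain characters in both. The existing pattern \{\d+\} is only a made-up example of something that might already be in the field.

```python
import re


def add_custom_tag(existing: str, tag_pattern: str = "<#>") -> str:
    """Return the custom-tag regex with tag_pattern added as one more alternative."""
    if not existing:
        return tag_pattern          # empty field: just type <#>
    return existing + "|" + tag_pattern  # otherwise append |<#>


# Hypothetical existing content of the field, e.g. {1}, {2} style tags
combined = add_custom_tag(r"\{\d+\}")
print(combined)
print(re.findall(combined, "Hello {1} world<#>"))
```

Because | has the lowest precedence in a regex, appending |<#> still works even if the existing pattern has its own alternatives.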
Then do the translation. To create a line break, press Shift+Enter. Make sure that every segment with <#> in the source also has it in the target (an easy way to insert it is to press Ctrl+Space a few times). You can check for any you've forgotten with Tools > Check Issues at any time, and again when you create the final file. If you keep forgetting that Enter moves you to the next segment, you can turn on "use TAB to advance" in Options > Preferences > General, but you'll still have to press Shift+Enter to insert a new line.
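In case it helps to see the idea behind the "forgotten tag" check, here is a rough Python sketch that compares how many <#> tags each source and target segment has and lists the segments whose target is short. This is not how OmegaT's Check Issues is implemented. It is just the same idea, assuming you have the segments as (source, target) pairs.

```python
import re
from collections import Counter

# The custom tag pattern entered under Tag Processing (assumed to be just <#>)
TAG_RE = re.compile(r"<#>")


def missing_tags(source: str, target: str) -> int:
    """Number of tag occurrences present in the source but missing from the target."""
    src = Counter(TAG_RE.findall(source))
    tgt = Counter(TAG_RE.findall(target))
    return sum((src - tgt).values())  # Counter subtraction keeps only positive counts


def check_segments(pairs):
    """Yield (segment number, missing count) for every segment that lacks a source tag."""
    for i, (src, tgt) in enumerate(pairs, start=1):
        n = missing_tags(src, tgt)
        if n:
            yield i, n
```

With a Counter, the same check still works if you later add more custom tags to TAG_RE with |.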
The <#> tags are there to help you check that the final file has the same number of lines as the original, so that the cells are likely to line up when you paste the text back into MS Word at the end. Of course, if you're happy to do without the <#>, you don't have to use it.
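Here is a rough Python sketch of that final check. It compares the line counts of the original and translated TXT files, flags any translated line that doesn't end in <#>, and strips the tags before you paste the text back into Word. It assumes both files have already been read into strings. The function names are only for illustration.

```python
TAG = "<#>"


def report(source_text: str, target_text: str) -> list[str]:
    """List problems that would stop the cells from lining up in Word."""
    problems = []
    src, tgt = source_text.splitlines(), target_text.splitlines()
    if len(src) != len(tgt):
        problems.append(f"line count differs: source {len(src)}, target {len(tgt)}")
    for n, line in enumerate(tgt, start=1):
        if not line.rstrip().endswith(TAG):
            problems.append(f"line {n} has no {TAG} at the end")
    return problems


def strip_tags(target_text: str) -> list[str]:
    """Remove the trailing TAG from each line, ready to paste into the target column."""
    return [line.rstrip().removesuffix(TAG) for line in target_text.splitlines()]
```

An empty list from report() means the counts match and every line is tagged. That's a good sign the cells will match up, but it isn't a guarantee. (str.removesuffix needs Python 3.9 or later.)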
One downside of this method is that merging or splitting segments in OmegaT is very difficult, so if a single cell contains more than one sentence, OmegaT will show those sentences as separate segments (which may not be a problem, but it's worth keeping in mind).
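To show why a cell with two sentences turns into two segments, here is a deliberately naive Python sentence splitter that breaks after ., ! or ? when whitespace follows. OmegaT's real segmentation uses SRX-style break/no-break rules, which are much more refined, so treat this as a simplified stand-in rather than OmegaT's actual behaviour.

```python
import re

# Naive rule (assumption, not OmegaT's): break after . ! or ? followed by whitespace
_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(cell: str) -> list[str]:
    """Split one cell's text into sentence-like segments."""
    return [s for s in _BREAK.split(cell.strip()) if s]


print(split_sentences("Come in. Sit down!<#>"))
```

Here a single cell ends up as two segments, "Come in." and "Sit down!<#>", and only the second one carries the <#>.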
[Edited at 2019-06-09 11:59 GMT]