Channel: ProZ.com Translation Forums

Split multi-sentence TMX segments into single sentence segments | Filter via regular expression

Forum: CAT Tools Technical Help
Topic: Split multi-sentence TMX segments into single sentence segments
Poster: CafeTran Training
Post title: Filter via regular expression

[quote]Samuel Murray wrote:

Something that would help me a great deal is a tool that can extract all segments with more than one sentence in it. [/quote]

You can use a regular expression to filter on segments that contain multiple sentences.
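As a rough sketch of what such a filter could look like outside CafeTran, the Python snippet below flags segments that appear to contain more than one sentence. The pattern is an assumption, not the expression used in the post: it looks for sentence-ending punctuation, optional closing quotes or brackets, whitespace, and then an ASCII capital letter or digit. The names `MULTI_SENTENCE` and `has_multiple_sentences` are made up for illustration, and CafeTran's own regex dialect may differ in details.

```python
import re

# Assumed heuristic (not from the original post): terminal punctuation,
# optional closing quote/bracket, whitespace, then an ASCII capital or digit.
MULTI_SENTENCE = re.compile(r'[.!?]["\')\]]*\s+[A-Z0-9]')


def has_multiple_sentences(segment: str) -> bool:
    """Return True if the segment seems to hold more than one sentence."""
    return MULTI_SENTENCE.search(segment) is not None


segments = [
    "This is one sentence.",
    "First sentence. Second sentence.",
    "Is this a question? Yes, it is!",
]

# Keep only the segments the heuristic considers multi-sentence.
multi = [s for s in segments if has_multiple_sentences(s)]
print(multi)
```

Because the check only covers ASCII capitals, languages with other scripts or with lowercase sentence starts would need a different character class.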

Here is the project (CafeTran can handle TMX files like projects):

Filtered:

You can then Split and Merge left and right (source and target).

Of course, it's also possible to write these filtered segments to a new file (and delete them from the current one), and then to duplicate or triplicate that file.

Then, using Find and Replace with regular expressions, you can remove the second and all later (third etc.) sentences from every segment, on both the left and the right side. Repeat this in another copy of the file for all first and fourth (fifth etc.) sentences.
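The idea of keeping one sentence per copy of the file can be sketched in Python as follows. This is only an illustration under assumptions: the `BOUNDARY` pattern and the `keep_sentence` helper are hypothetical, and the boundary rule (terminal punctuation, whitespace, then an ASCII capital or digit) is a simplification rather than the expression used in CafeTran.

```python
import re

# Hypothetical sentence boundary: whitespace that follows terminal
# punctuation and precedes an ASCII capital letter or digit.
BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9])')


def keep_sentence(segment: str, n: int) -> str:
    """Return only the n-th sentence (0-based), or '' if there is none."""
    parts = BOUNDARY.split(segment)
    return parts[n] if n < len(parts) else ""


source = "One. Two. Three."
# One pass per copy of the file: copy 0 keeps the first sentences,
# copy 1 keeps the second sentences, and so on.
for n in range(3):
    print(n, keep_sentence(source, n))
```

Applying the same index to the source and the target only produces aligned pairs when both sides contain the same number of sentences, so the results would still need checking.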

Here this is demonstrated by removing the first sentence of a paragraph:

Of course this doesn't take abbrev. into account; you'll have to enhance the regular expression for that, if that's possible at all.
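One possible way to soften the abbreviation problem is to combine the regular expression with a small exception list, shown below as a Python sketch. Everything here is an assumption for illustration: the `ABBREVIATIONS` set is deliberately short and hypothetical, the `BOUNDARY` pattern is a simplified sentence-boundary rule, and `drop_first_sentence` is not a CafeTran function.

```python
import re

# Hypothetical, deliberately short exception list; a real one would be
# language-specific and much longer.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "No."}

# Assumed boundary: whitespace after terminal punctuation, followed by an
# ASCII capital letter or digit.
BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9])')


def drop_first_sentence(segment: str) -> str:
    """Remove the first sentence, skipping boundaries after abbreviations."""
    for match in BOUNDARY.finditer(segment):
        words = segment[:match.start()].split()
        if words and words[-1] in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation
        return segment[match.end():]
    return segment  # no usable boundary found: leave the segment unchanged


print(drop_first_sentence("See Dr. Smith today. He is in."))
```

This still fails when a sentence really does end in a listed abbreviation (for example "... apples, pears, etc. Then we left."), which is why a fully reliable pattern may not exist.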

[Edited at 2016-06-27 17:11 GMT]
