Channel: ProZ.com Translation Forums

When a 50% match isn't a 50% match? | It's a bit of science and a bit of magic

Forum: CAT Tools Technical Help
Topic: When a 50% match isn't a 50% match?
Poster: Samuel Murray
Post title: It's a bit of science and a bit of magic

[quote]Chris S wrote:
I did a rare CAT job today and noticed this:
...
This came up as a 50% match. [/quote]

By character, 65% of the segment in the TM matches 40% of the segment in the source text.

I believe CAT tools that can't do proper morphological stemming/tokenization may try to strike a balance between word matching and character matching. My own CAT tool, WFC, favours character matching when the segment is short, and word matching when the segment is long. This can produce surprising matches of the kind Roman mentioned (construction/instruction).
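
To make the idea concrete, here is a rough sketch of how a tool might blend the two kinds of matching by segment length. This is not WFC's actual code or any other tool's algorithm: the function names, the thresholds `short_len` and `long_len`, and the linear blend are all assumptions made up for illustration, and Python's standard `difflib` stands in for whatever similarity measure a real tool uses.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Similarity of two segments compared character by character (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def word_similarity(a: str, b: str) -> float:
    """Similarity of two segments compared word by word (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def blended_match(source: str, tm_segment: str,
                  short_len: int = 5, long_len: int = 15) -> float:
    """Hypothetical blend: character matching for short segments,
    word matching for long ones, a linear mix in between."""
    n_words = max(len(source.split()), len(tm_segment.split()))
    if n_words <= short_len:
        word_weight = 0.0      # short segment: characters only
    elif n_words >= long_len:
        word_weight = 1.0      # long segment: words only
    else:
        word_weight = (n_words - short_len) / (long_len - short_len)
    return ((1.0 - word_weight) * char_similarity(source, tm_segment)
            + word_weight * word_similarity(source, tm_segment))


if __name__ == "__main__":
    # One-word segments are scored by characters, so two different words
    # that share most of their letters still get a high score.
    print(round(blended_match("construction", "instruction"), 2))
    # Compared as whole words they do not match at all.
    print(word_similarity("construction", "instruction"))
```

With a scheme like this, "construction" and "instruction" score high as short segments because they share "struction", even though a pure word comparison would give them zero.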

[quote]Since when do four words out of 14 make a 50% match? [/quote]

Is the proposed translation 50% useful to you? If yes, then it is a true 50% match. If not, then they didn't get the magic quite right, but magic isn't precise anyway.

[quote]Jean Dimitriadis wrote:
Out of curiosity, which CAT tool did you use?
Mine gives a 33% match, also counting the word "til". [/quote]

Without any morphological analysis (i.e. default tokenizer, "language unknown"), OmegaT says it's below the match threshold (i.e. below 30%). With an English tokenizer, OmegaT says it's a 38% match. But with the Danish tokenizer, OmegaT says it's a 50% match.
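
Why would the tokenizer change the percentage? A toy illustration, not OmegaT's actual implementation: if words are reduced to stems before comparison, inflected forms that would otherwise count as different words start to count as matches. The `toy_stem` function below is a made-up stand-in that only strips two English suffixes; a real language-specific tokenizer is far more thorough, and the scoring via `difflib` is likewise an assumption.

```python
from difflib import SequenceMatcher


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def token_match(source: str, tm_segment: str, stem=None) -> float:
    """Word-level similarity (0.0 to 1.0), optionally comparing stems."""
    def tokens(text: str) -> list:
        words = text.lower().split()
        return [stem(w) for w in words] if stem else words

    return SequenceMatcher(None, tokens(source), tokens(tm_segment)).ratio()


if __name__ == "__main__":
    a = "open the files"
    b = "opening the file"
    print(token_match(a, b))                 # only "the" matches
    print(token_match(a, b, stem=toy_stem))  # all three stems match
```

The same pair of segments scores very differently depending on whether, and how, the words are stemmed, which is consistent with one tool reporting three different figures for three tokenizer settings.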

[Edited at 2018-11-16 18:34 GMT]
