Forum: CAT Tools Technical Help
Topic: When a 50% match isn't a 50% match?
Poster: DZiW
Post title: culture-dependents: half-full is half-empty
Without dwelling too much on such "secret vendor know-how" as hashes, checksums, shingles, clusters, vectors, Levenshtein distances, encoders, Soundex, and other weird stuff, I'd say it's all just an attempt to obscure the fact that the very idea of "similar sentences", let alone across different languages, is an expensive miscalculation.
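To show how an edit-distance percentage can mislead, here is a minimal Python sketch. It assumes a word-level Levenshtein distance normalized by the length of the longer segment. Real CAT tools use their own, often undisclosed, formulas, so `word_levenshtein` and `fuzzy_match_percent` are hypothetical names and this is not any vendor's actual scoring.

```python
def word_levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # delete wa
                          curr[j - 1] + 1,     # insert wb
                          prev[j - 1] + cost)  # substitute, or keep a matching word
        prev = curr
    return prev[len(b)]


def fuzzy_match_percent(source: str, tm_segment: str) -> float:
    """Hypothetical fuzzy score: 100 * (1 - distance / length of the longer segment)."""
    a = source.lower().split()
    b = tm_segment.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty segments are treated as identical
    return 100.0 * (1 - word_levenshtein(a, b) / longest)


score = fuzzy_match_percent(
    "The valve must be opened before starting the pump.",
    "The valve must be closed before starting the pump.",
)
print(round(score, 1))
```

The two segments mean the opposite, yet one substituted word out of nine still yields a score of about 88.9%. The score measures surface overlap, not meaning.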
Little by little, modern approaches are moving toward per-language aggregation of structural parts [subject-predicate-object], taking synonyms into account and weighting antonyms while sacrificing functional parts. A couple of years ago I was pleasantly surprised by a demonstration in which an app analyzed simple, complex, and compound sentences and could judge how similar their context was, identifying the antecedents (the meaning).
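The structural idea can be pictured with a toy Python sketch. It assumes the (subject, predicate, object) triples have already been extracted by some parser (here they are written by hand), and the `SYNONYMS` and `ANTONYMS` tables are small hypothetical stand-ins for real lexical resources. Function words are simply never compared.

```python
# Hypothetical lexical tables; a real system would draw on proper lexical resources.
SYNONYMS = {"start": "begin", "launch": "begin", "halt": "stop"}
ANTONYMS = {frozenset({"begin", "stop"}), frozenset({"open", "close"})}

Triple = tuple[str, str, str]  # (subject, predicate, object)


def normalize(word: str) -> str:
    """Map a word to its canonical synonym, if one is listed."""
    return SYNONYMS.get(word, word)


def triple_similarity(t1: Triple, t2: Triple) -> float:
    """Each matching slot is worth 1/3; synonyms count as matches,
    and antonymous predicates zero the whole score."""
    n1 = tuple(normalize(w) for w in t1)
    n2 = tuple(normalize(w) for w in t2)
    if frozenset({n1[1], n2[1]}) in ANTONYMS:
        return 0.0
    matches = sum(1 for x, y in zip(n1, n2) if x == y)
    return matches / 3


print(triple_similarity(("operator", "start", "pump"), ("operator", "launch", "pump")))
print(triple_similarity(("operator", "start", "pump"), ("operator", "halt", "pump")))
```

Unlike the edit-distance score, this treats "start"/"launch" as a full match and "start"/"halt" as no match at all, whatever the surface overlap.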
However, even with a new or small TM I have never used 50% fuzzy matches, because I doubt that so many false positives are of any use in speeding up the process.