Channel: ProZ.com Translation Forums
Viewing all 3905 articles

When a 50% match isn't a 50% match? | SDL Trados – up there with the best of them

Forum: CAT Tools Technical Help
Topic: When a 50% match isn't a 50% match?
Poster: Endre Both
Post title: SDL Trados – up there with the best of them

SDL Trados Studio 2017 thinks 0 out of 1 word plus different punctuation equates to a 62% match:

Both words are uppercased, after all.

Web-based CAT Tool solutions to install on own server | Wordfast Server + client

Forum: CAT Tools Technical Help
Topic: Web-based CAT Tool solutions to install on own server
Poster: Fi2 n Co
Post title: Wordfast Server + client

Hi all, I just stumbled upon this thread.

Why not try Wordfast Server?
It is software you install on any Windows computer or Windows server. It will happily host your TMs and glossaries up to massive sizes, with extremely sharp and fast response times. Powerful concordance too.

BTW: It's free for freelancers (not agencies or organizations of any kind) for up to three client connections simultaneously.

Clients:
_You can connect all WF CAT tools to it.
_If you want a web-based version, you can open a Wordfast Anywhere account (free) and connect your TMs and glossaries from your own web server to it. I have tried it and it works!
_You just need to get the API (for TMs and gloss combinations) from your local server install, and then add it to your client (WFA, WFPro, WF Classic).
_If you're good at programming, you can make your own UI to make API calls to your server (concordance searches etc.).

If some of you want additional information please ask specific questions, I can try to explain a little more.
Otherwise, I'm preparing a video series on this (slow process though) and I have existing videos describing the API connection system on my FI2Pro channel. CAT Guru also has some old ones on WF Server.

That's an interesting and powerful setup, if some of you actually get to do it, let me know how you feel about it.

Hope this helps

My bests :)

Segment length analysis?

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Mirko Mainardi

Hi everyone.

For some reason I never really thought about this before... but are there CAT tools that provide an analysis of segments based on their length? I mean, everyone knows that translating 2500 isolated 1 word terms is totally different from translating 25 segments comprised of 100 words of cohesive text each, even if the total wordcount is the same.

And if the answer to the above question is "no"... why?

Segment length analysis? | What would you like to see?

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: SDL Community
Post title: What would you like to see?

You already have the analysis by character, word and segment. So if there were 2500 segments and 2500 words you'd know. How would you like to see the analysis?

Regards

Paul

Segment length analysis? | Agnostic

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Mirko Mainardi
Post title: Agnostic

Thank you for your reply Paul. At any rate, this was supposed to be an agnostic question (i.e. not specifically SDL-related).

Also, if I'm not mistaken, what the Studio analysis says is based on totals and "fuzzy bands", while what I meant is an analysis specifically based on number of words per segment.

In other words, if the analysis says a file has 100 segments, 2500 words, and 10000 characters, all I know is what the average length per segment is (25 words), but in reality, I could have a few segments with big chunks of text and a lot of smaller/tiny segments.

So, what I'm talking about is a breakdown based on segment length rather than (or "in addition to", of course...) fuzzy matching, so that a translator would have an additional metric to discern how time consuming a task could be, at a glance.
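
A minimal sketch of what such a breakdown could compute, in Python. Everything here is an assumption for illustration: the band boundaries are arbitrary, `count_words` is a naive whitespace split (real CAT tools apply their own counting rules), and `length_breakdown` is a hypothetical helper, not a feature of any existing tool.

```python
def count_words(segment):
    # Naive whitespace tokenisation; real CAT tools count words differently.
    return len(segment.split())


def length_breakdown(segments, bands=((1, 2), (3, 6), (7, 19), (20, None))):
    """Return one row per band: (label, segment count, word count, % of total words).

    Bands are inclusive (low, high) word counts; high=None means "and above".
    Segments with zero words fall outside every band and are ignored.
    """
    counts = [count_words(s) for s in segments]
    total_words = sum(counts)
    rows = []
    for low, high in bands:
        in_band = [c for c in counts if c >= low and (high is None or c <= high)]
        label = f"{low}+" if high is None else f"{low}-{high}"
        share = 100 * sum(in_band) / total_words if total_words else 0.0
        rows.append((label, len(in_band), sum(in_band), round(share, 1)))
    return rows


# Illustrative input: two 1-word strings, one 3-word string, one 10-word sentence.
segments = [
    "OK",
    "Cancel",
    "Save file as",
    "Click the button to continue with the installation process now",
]

for label, n_segments, n_words, share in length_breakdown(segments):
    print(f"{label:>5} words: {n_segments} segments, {n_words} words ({share}%)")
```

The average here is 3.75 words per segment, which says nothing about half of the segments being single words; the per-band rows make that visible.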

Segment length analysis? | Warning

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Philippe Etienne
Post title: Warning

A breakdown by source segment length could look like this:
< 5 words 18% (titles, software strings, headlines, tables: more time)
5-19 words 64% (sentences: standard)
> 19 words 18% (long sentences: perhaps more time to convey with style)

The middle band deserves discounts, I think.

Philippe

Segment length analysis? | Filter segments by length

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Jean Dimitriadis
Post title: Filter segments by length

In CafeTran Espresso, in addition to the CAT file analysis (number of segments/words/characters) [and SDL Trados can also provide these details without any fuzzy matching/TM attached], which gives a good idea of the average number of words per segment, you can quickly sort (filter) segments by length (short or long first). I think MemoQ offers that as well.

You can also use a QA step for displaying only segments above a user-defined maximum character count.

This is not a standard analysis as you mean it, but it does provide a rough overview that should be sufficient to understand at a glance whether the project has many short segments, many long segments, or a mix.

When speaking of translation difficulty, a quantitative analysis (especially total word count alone) can only get you so far.

I’m still refining my own pre-translation analysis process for time and translation difficulty estimation, it is a tricky subject for sure.

[Edited at 2019-04-26 18:23 GMT]

Segment length analysis? | Additional metric

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Mirko Mainardi
Post title: Additional metric

[quote]Jean Dimitriadis wrote:

When speaking of translation difficulty, a quantitative analysis (especially total word count alone) can only get you so far.
[/quote]

Yes Jean, I do agree, and that's why I wrote this would be "an additional metric to discern how time consuming a task could be, at a glance".

However, good point about sorting segments by length, although I would much prefer a report.

[quote]Philippe Etienne wrote:

A breakdown by source segment length could look like this:
< 5 words 18% (titles, software strings, headlines, tables: more time)
5-19 words 64% (sentences: standard)
> 19 words 18% (long sentences: perhaps more time to convey with style)

The middle band deserves discounts, I think.
[/quote]

Yeah, something like that, even though I would like a detailed breakdown, especially for shorter (i.e. <7 words) segments. Also, I don't think this could be used to give or request further discounts (in addition to those for fuzzies...). Just to give an example, a lot of 1-2 word segments would basically amount to glossary building, or would in any case take more time than longer, cohesive text, so in my opinion it would be useful to have a quick way to check that (ideally before accepting a project...).

[Edited at 2019-04-26 20:05 GMT]

Segment length analysis? | @Paul

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Samuel Murray
Post title: @Paul

[quote]SDL Community wrote:
So if there were 2500 segments and 2500 words you'd know. [/quote]

What if there were 1000 segments and 10 000 words? That's 10 words per segment, on average. But the time saving on very long segments does not cancel out the time wastage on very short segments. A 30-word segment does not really take more time per word than a 20-word segment, but a 3-word segment takes up much more time per word than a 10-word segment.

I mean, suppose 100 of those segments have only 1 word, and 100 have only 2 words, and 100 have only 3 words, then the average length of the remaining 700 segments (the remaining 9400 words) is 13 words per segment. The 300 short segments will take up far more time per word than the average.

It takes me (generally) just as long to translate a 1-word segment as a 3-word segment or even a 5-word segment. So for me, if I had wanted the weighted word count to be an accurate indication of the amount of time it will take to do the job, all segments of 5 words or less should be counted as 5 words.

So let's recalculate the 10 000-word example:

100 x 1-word segments: 100 words actual, 500 words weighted
100 x 2-word segments: 200 words actual, 500 words weighted
100 x 3-word segments: 300 words actual, 500 words weighted
Other segments: 9400 words actual

The adjusted word count, then, is 10900 words (i.e. it would take two to three hours longer to complete the job than a strictly average 10 000 words).
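
The recalculation above can be sketched in a few lines of Python. This is only an illustration of the "count every short segment as 5 words" idea: `weighted_word_count` is a hypothetical helper, and since the post only gives a total of 9400 words for the other 700 segments, the split into 300 segments of 14 words and 400 of 13 words is an arbitrary assumption that matches that total.

```python
def weighted_word_count(segment_lengths, floor=5):
    # Every segment shorter than `floor` words is counted as `floor` words;
    # longer segments keep their actual word count.
    return sum(max(n, floor) for n in segment_lengths)


# 100 one-word, 100 two-word and 100 three-word segments, as in the example.
short_segments = [1] * 100 + [2] * 100 + [3] * 100

# Assumed fill for the remaining 700 segments: 300 * 14 + 400 * 13 = 9400 words.
other_segments = [14] * 300 + [13] * 400

lengths = short_segments + other_segments

print("segments:      ", len(lengths))
print("actual words:  ", sum(lengths))
print("weighted words:", weighted_word_count(lengths))
```

Changing `floor` to 3 or 4 shows how sensitive the adjusted count is to where the threshold sits.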

[Edited at 2019-04-27 06:25 GMT]

Segment length analysis? | All agreed...

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: SDL Community
Post title: All agreed...

[quote]Samuel Murray wrote:

[quote]SDL Community wrote:
So if there were 2500 segments and 2500 words you'd know. [/quote]

What if there were 1000 segments and 10 000 words? That's 10 words per segment, on average. But the time saving on very long segments does not cancel out the time wastage on very short segments. A 30-word segment does not really take more time per word than a 20-word segment, but a 3-word segment takes up much more time per word than a 10-word segment.

I mean, suppose 100 of those segments have only 1 word, and 100 have only 2 words, and 100 have only 3 words, then the average length of the remaining 700 segments (the remaining 9400 words) is 13 words per segment. The 300 short segments will take up far more time per word than the average.

It takes me (generally) just as long to translate a 1-word segment as a 3-word segment or even a 5-word segment. So for me, if I had wanted the weighted word count to be an accurate indication of the amount of time it will take to do the job, all segments of 5 words or less should be counted as 5 words.

So let's recalculate the 10 000-word example:

100 x 1-word segments: 100 words actual, 500 words weighted
100 x 2-word segments: 200 words actual, 500 words weighted
100 x 3-word segments: 300 words actual, 500 words weighted
Other segments: 9400 words actual

The adjusted word count, then, is 10900 words (i.e. it would take two to three hours longer to complete the job than a strictly average 10 000 words).

[Edited at 2019-04-27 06:25 GMT] [/quote]

That's why I asked what you'd like to see. In terms of helping with project estimation this seems like an interesting way forward. Perhaps this is something we could do as a small plugin so you have an additional analysis. Any developer could add this using the API... but assuming nobody here can develop perhaps I'll add it to our list of things to do.

Regards

Paul

Which CAT tool(s) is the most effective and efficient for complex math equations

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: jssco90

Hello,

I would like to know which CAT tool would best help me translate a high-level mathematical book. I'm particularly concerned about the symbols and equations. Which CAT tools would help me preserve the complex math symbols and equations?

Thanks in advance

Which CAT tool(s) is the most effective and efficient for complex math equations | Unicode

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: DZiW
Post title: Unicode

Modern CATs support Unicode for special chars, so it's not a problem.

However, what exactly is your issue: LaTeX? MS Word formulae? Or something else?
We never translated equations, often marking them as untranslatable or saving them as pictures to prevent editing.

Which CAT tool(s) is the most effective and efficient for complex math equations | Copying and pasting the equations from original to translated version

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: jssco90
Post title: Copying and pasting the equations from original to translated version

I translated a lot of math content with the help of MateCat; however, when it downloads my translation as a Microsoft Word document, I lose all the equations. Then when I try to copy the equations from the original PDF in Adobe and paste them into the translated document (whether it is doc or pdf), it simply does not work. I have to include the equations, symbols and graphs in the translated version. Since MateCat does not reproduce the equations, how can I copy them from the original document and paste them into the translated document?
Something that appears so simple is presenting a lot of difficulties for me.

Which CAT tool(s) is the most effective and efficient for complex math equations | not a CAT issue

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: DZiW
Post title: not a CAT issue

I'm afraid it's not a CAT-only issue, because (1) a PDF is mostly a DTP product not intended for editing and (2) there's no 100% equivalence between any different formats.

We used clean snapshots/pictures, adding relevant tags for searching.

The problem with editable equations is well known and still not really solved, because the only almost* sure-fire way to copy a formula from one format to another is... recreating the formula, alas. It does take more time, so just charge your client accordingly)

Which CAT tool(s) is the most effective and efficient for complex math equations | Probably no CAT solution

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: Kevin Fulton
Post title: Probably no CAT solution

From your description, you appear to be working with pdf files, the source of which may have been created either via DTP or by distilling from Word. The embedded equations are images which do not lend themselves to translation by CAT tools, as another poster has indicated. In such cases, I've used the "snapshot" feature of Adobe Acrobat (available in older editions, I can't say whether it's currently available) to copy and paste the original equation into my target document (indicated by a dummy placeholder in the translation). I'm surprised this hasn't worked for you.
When working with Word documents with embedded equations (created in Word), I've sometimes had success with MemoQ.

I suspect that you will have to use an equation editor to recreate your equations. Until relatively recently, this was a module in MS Word. There are commercially-available equation editors available, and, for all I know, freeware/shareware products as well.
Good luck!

Which CAT tool(s) is the most effective and efficient for complex math equations | Copy and pasting equations Adobe

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: jssco90
Post title: Copy and pasting equations Adobe

Thanks for your guys' input and suggestions.

I have attempted to use Adobe's snapshot feature; unfortunately, it was to no avail. I first used it to paste the equations into the translated doc file. It did work for a few equations, but for most, it did not. I also attempted to use the snapshot to copy and paste from the original PDF file into the target file (which I converted to PDF) and that did not work at all.

Should I request that the original file be in DOC format? Wouldn't that resolve the problem of having to reproduce all of the equations and graphs on my own?

Which CAT tool(s) is the most effective and efficient for complex math equations | Word doc might be a solution

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: Kevin Fulton
Post title: Word doc might be a solution

[quote]j
snip

Should I request that the original file be in DOC format? Wouldn't that resolve the problem of having to reproduce all of the equations and graphs on my own? [/quote]

If your client is an agency, you might not get the source DOC file, but it wouldn't hurt to ask, especially since I have the impression that this is a long file. It may be that the equations are also images, but if they are, they should be easier to copy into your target file.

I find it odd, however, that the client has requested a pdf file. Normally a translation is checked by another set of eyes, and Word documents are easier to edit. I know that clients sometimes want a pdf anyway, but they're harder to edit.

Which CAT tool(s) is the most effective and efficient for complex math equations | Use a PDF to DOC converter

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: IanDhu
Post title: Use a PDF to DOC converter

Not mainly for the text in this case, but chiefly to capture the equations in a Word environment. For this, you should set the conversion parameters (settings) so that embedded images are not converted by OCR. I use Nuance Power PDF, which I find to be the most useful software for this purpose. I run Windows 7 on a 64-bit Dell machine. I have yet to explore the capabilities of Studio 2019 for handling equations, although I suspect that compiling each from scratch is lengthy and, in any case, may lie outside your client's requirements - do check this point, though, with the client.

I hope this helps.

With kind regards,

Adam Warren (IanDhu - translator 41189)

Which CAT tool(s) is the most effective and efficient for complex math equations | What is the original format?

Forum: CAT Tools Technical Help
Topic: Which CAT tool(s) is the most effective and efficient for complex math equations
Poster: Samuel Murray
Post title: What is the original format?

[quote]jssco90 wrote:
When I try to copy the equations from the original PDF in Adobe... [/quote]

PDF is, itself, an export format. If you wish to retain the equations, you may have to figure out what the original format was, and then translate the original file (the file that was used to generate the PDF file). Do you know what the original format is, and do you have access to files in the original format? Or did the client just send you PDFs and expect you to deliver formatted DOC(X) files in return?

It is entirely possible that despite all your efforts to reproduce the equations in DOC(X) format, the client's DTP officer is simply going to ignore them and retain the equations that are already in the original file (and attempt to update the translatable bits of it). In other words, the client isn't going to send your DOC(X) file to the printers; instead, her DTP officer will copy/paste text from the DOC(X) file into their own DTP software. Ask the client if it would be okay for you to just refer to the equations using textual descriptions, and/or by pasting screenshots of them.

[Edited at 2019-04-29 06:42 GMT]

Segment length analysis? | MeToo

Forum: CAT Tools Technical Help
Topic: Segment length analysis?
Poster: Philippe Etienne
Post title: MeToo

[quote]Samuel Murray wrote:
...
It takes me (generally) just as long to translate a 1-word segment as a 3-word segment or even a 5-word segment. So for me, if I had wanted the weighted word count to be an accurate indication of the amount of time it will take to do the job, all segments of 5 words or less should be counted as 5 words.
...[/quote]
While opposed to potentially getting weighted wordcounts higher than the actual wordcount for philosophical reasons, I see the point. In all fairness, small segments shouldn't be "discounted".

Simpler to visualise than segment wordcount breakdown, I think such a weighted wordcount would already lead to a much more accurate anticipation of the translation time required.

But CAT tool makers, when coming up with "partial matches", "analyses", "non-existing matches that will exist later", "tags/numbers that don't count" and stuff, haven't implemented a kind of threshold (I also think that around 3-5 words is realistic) below which the contents of small segments are reported as full words, neither weighted, nor discounted.
If there are only a few mini-segments, the buyer would "lose" a few pennies, and if there are a lot, the translator would actually be paid for the extra time needed.
However, I am aware that weighted wordcounts have long lost their primary function of anticipating the time required: for instance, 80% discounts on 95-99% concordance matches seem to be common practice with a certain type of agency, whereas 15 years ago, most used a single discount rate for all 75-99% matches.
To actually anticipate the time needed, I use a slightly amended historical version of the three-thirds 33/66/100, with fuzzies in the 75-99% concordance band.
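
For readers unfamiliar with the three-thirds grid, here is a rough Python sketch. It assumes the usual reading of 33/66/100 (repetitions and 100% matches weighted at 33%, 75-99% fuzzies at 66%, everything else at 100%); the "slightly amended" version mentioned above is not specified, so this is not it. The function name, band labels and word counts are all made up for illustration.

```python
# Assumed weights in percent, following the common reading of 33/66/100.
THREE_THIRDS = {
    "repetitions_and_100": 33,  # repetitions and 100% matches
    "fuzzy_75_99": 66,          # fuzzy matches in the 75-99% band
    "no_match": 100,            # everything below 75%, counted in full
}


def three_thirds_weighted(words_per_band, weights=THREE_THIRDS):
    # Multiply each band's word count by its weight and sum the results.
    return sum(words_per_band[band] * pct for band, pct in weights.items()) / 100


# Hypothetical analysis of a 10 000-word job.
analysis = {
    "repetitions_and_100": 1000,
    "fuzzy_75_99": 2000,
    "no_match": 7000,
}

print(three_thirds_weighted(analysis))
```

With these made-up figures the weighted count is 330 + 1320 + 7000 words.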

Besides, I can't imagine any CAT tool maker implementing any small-segment threshold, because its analyses would consistently yield higher weighted wordcounts compared to the competition. Hardly a selling argument in the agency market, which to a significant extent shapes what translators buy as CAT tools.
After almost 20 years of daily use of CAT tools, I've never seen any "ground-breaking", "innovating" or "killer" feature increase weighted wordcounts! And don't get me started on the "significant productivity gains" to justify the downward trend of weighted wordcounts together with the downward trend of discount grids together with the stagnation of the unit rate.

Philippe