Forum: CAT Tools Technical Help
Topic: CAT-Tool FEATURE REQUEST
Poster: Samuel Murray
Post title: @Mikhailo
[quote]mikhailo wrote:
[quote]Samuel Murray wrote:
And the reason you're looking for such a utility is that it is very cumbersome to do that inside Trados itself, right?[/quote]
This can be said for any CAT, that stores TM in standalone file or DB... [/quote]
Well, no, we can't say that for certain unless we examine each tool.
Trados is somewhat cumbersome in this respect because you can't add files to its SDLXLIFF import dialog by drag and drop. This means that all SDLXLIFF files must first be copied into a single folder. However, once they are in a single folder, you can add that folder to the TM import dialog, and Trados will process all of the SDLXLIFF files in it. It's not very fast, though.
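Since the Trados dialog wants everything in one folder, the copying step can be scripted. The Python sketch below is only an illustration (the function name `gather_sdlxliff` and the numeric-prefix naming scheme are my own assumptions, not anything Trados requires): it walks a projects folder, finds every `.sdlxliff` file, and copies it into one flat folder, prefixing each copy with a counter so that identically named files from different projects don't overwrite each other.

```python
import shutil
from pathlib import Path


def gather_sdlxliff(source_root: Path, target_dir: Path) -> list[Path]:
    """Copy every *.sdlxliff file under source_root into one flat folder.

    Hypothetical helper. Files from different subfolders may share a name,
    so each copy gets a numeric prefix to avoid overwriting an earlier copy.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() builds the full list before any copying starts
    for i, src in enumerate(sorted(source_root.rglob("*.sdlxliff"))):
        dest = target_dir / f"{i:05d}_{src.name}"
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

Keep the target folder outside the source folder; otherwise a second run would pick up the copies made by the first.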
I ran a test with 1000 SDLXLIFF files totalling 2.3 GB, and it took Trados about 45 minutes to import 53 000 segments. Final TM size: 78 MB. 53 000 seems very low, so I'm not confident that Trados imported all the segments (or perhaps it skipped duplicate segments). In any case, each segment was imported with the user ID and the date/time of segment creation, although the file names were not retained.
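One way to check whether 53 000 reflects deduplication rather than missing segments is to count the segments in the SDLXLIFF files yourself, both in total and as distinct source/target pairs. The sketch below assumes that SDLXLIFF follows XLIFF 1.2 with each segment wrapped in an `<mrk mtype="seg">` element inside `<seg-source>` and `<target>`; the function names are hypothetical. If the distinct count comes out close to 53 000, duplicates are the likely explanation (Trados may also skip other segments, e.g. empty ones, so treat this as a rough check).

```python
import xml.etree.ElementTree as ET

# Assumption: SDLXLIFF uses the XLIFF 1.2 namespace, and segments are
# <mrk mtype="seg" mid="..."> elements inside <seg-source> and <target>.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def segment_pairs(source):
    """Yield (source text, target text) for every segment in one file."""
    root = ET.parse(source).getroot()
    for tu in root.iterfind(".//x:trans-unit", NS):
        sources = {m.get("mid"): "".join(m.itertext())
                   for m in tu.iterfind("x:seg-source//x:mrk[@mtype='seg']", NS)}
        targets = {m.get("mid"): "".join(m.itertext())
                   for m in tu.iterfind("x:target//x:mrk[@mtype='seg']", NS)}
        for mid, src in sources.items():
            # Pair source and target segments by their shared mid value
            yield src, targets.get(mid, "")


def count_segments(sources):
    """Return (total segments, distinct source/target pairs) across files."""
    total = 0
    unique = set()
    for source in sources:
        for pair in segment_pairs(source):
            total += 1
            unique.add(pair)
    return total, len(unique)
```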
In this Trados test, the import speed did not decline over the course of the import: the last files took just as long to import as the first ones. One downside is that you can't stop the import partway: if you click "Cancel" midway, the entire import operation fails, so it may be a good idea to import only 100 files at a time.
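If you do split the import into chunks of 100 files, a small script can prepare the batch folders. This is a sketch under the assumption that the file names are already unique (for example after flattening them into one folder); `split_into_batches` and the `batch_001` naming are hypothetical. Each resulting folder can then be added to the TM import dialog in turn, so a cancelled run only costs one batch.

```python
import shutil
from pathlib import Path


def split_into_batches(files, batch_root: Path, batch_size: int = 100) -> list[Path]:
    """Copy files into batch_001, batch_002, ... folders of at most batch_size files.

    Hypothetical helper; assumes file names are unique across all inputs.
    """
    files = sorted(files)
    batches = []
    for start in range(0, len(files), batch_size):
        folder = batch_root / f"batch_{start // batch_size + 1:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for f in files[start:start + batch_size]:
            shutil.copy2(f, folder / f.name)
        batches.append(folder)
    return batches
```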
I tried the same 1000 SDLXLIFF files with the SDLXliff2TMX utility that I linked to in a previous post. This utility does support drag and drop, so the SDLXLIFF files don't all need to be in the same folder. It also offers more options with regard to what you want to filter. It took 10 minutes and extracted 460 000 segments (presumably duplicate segments are retained). As with the Trados process, this utility retained the user ID and date/time for each segment, but not the file name.
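To test the guess that the 460 000 segments include duplicates, one could count the translation units in the resulting TMX and how many of them are distinct. A minimal sketch, assuming a standard TMX file where each `<tu>` holds `<tuv>` elements with an `xml:lang` attribute (or `lang` in older TMX versions) and a `<seg>`; two units count as duplicates only if every language variant has identical text, including any inline tag content. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu):
    """Return a hashable, order-independent key of (language, text) pairs."""
    variants = []
    for tuv in tu.iter("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        # itertext() includes text inside inline tags such as <bpt>/<ept>/<ph>
        text = "".join(seg.itertext()) if seg is not None else ""
        variants.append((lang, text))
    return tuple(sorted(variants))


def dedupe_counts(tmx_source):
    """Return (total <tu> count, distinct <tu> count) for one TMX file."""
    root = ET.parse(tmx_source).getroot()
    keys = [tu_key(tu) for tu in root.iter("tu")]
    return len(keys), len(set(keys))
```

`ET.parse` loads the whole file into memory; for a very large TMX, `ET.iterparse` would be the lower-memory alternative.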
I would recommend that you try the SDLXliff2TMX utility. It outputs TMX, but it appears to be a TMX variant that preserves the tags in such a way that Trados retains them when you convert the TMX to SDLTM format.
By the way, I tried a similar thing in Wordfast Pro 3 with 1200 TXML files totalling 300 MB (the total is smaller because TXML files do not contain the output files inside them). It took 10 minutes and yielded 370 000 segments (duplicate segments retained). Final TM size: 175 MB. Wordfast added the file name to each segment, but it wrote the same user ID and the same date for all segments.
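The user ID and date issue can be checked after export. Assuming the Wordfast TM is exported to TMX and the tool writes TMX's `creationid` and `creationdate` attributes on each `<tu>` (some tools put them on `<tuv>` instead, which this sketch does not read), a quick tally shows whether all segments share one value. `metadata_summary` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def metadata_summary(tmx_source):
    """Tally creationid values and creation dates across all <tu> elements."""
    root = ET.parse(tmx_source).getroot()
    ids = Counter()
    dates = Counter()
    for tu in root.iter("tu"):
        ids[tu.get("creationid", "<none>")] += 1
        # TMX dates look like 20190208T125600Z; keep only the YYYYMMDD part
        dates[tu.get("creationdate", "<none>")[:8]] += 1
    return ids, dates
```

A single entry in either counter for a TM built from many files would confirm that the metadata was flattened.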
[Edited at 2019-02-08 12:56 GMT]