Forum: CAT Tools Technical Help
Topic: TMLookup
Poster: Michael Joseph Wdowiak Beijer
Post title: aha
[quote]FarkasAndras wrote:
[quote]Michael Joseph Wdowiak Beijer wrote:
The TMX was created by Déjà Vu X3. Just sent it to you.
Thanks for the new version!
Michael [/quote]
As expected, this is due to creative tag formatting. The TMX has the tuv tag split across two lines, and TMLookup expects it to be on one line. Like a previous issue, this happens because TMLookup doesn't have a proper XML parser, and I can't be bothered to learn how to implement one. So instead of wrangling with a horrible coding problem, I'm left wrangling with somewhat less horrible troubleshooting problems every now and then. So it goes. I could just look for xml:lang= without the tuv, but in principle some TMX files could have other elements where the language is specified with xml:lang=, not just the text itself, and then it could break on those. This can be solved in multiple ways of course, but none of them are trivial or appetizing to me. Implementing a proper parser is the least appetizing of all. So maybe I'll fix this... maybe not. As a workaround, adding the language codes to the filename should work.
In the meantime, the new version of SQLite that will allow for somewhat fancier/faster text searches is trickling down the pipeline. It has gone through two stages and is now one step away from the point where I can start fiddling with it. We'll see.
[/quote]
Indeed, I just had another look at it, and they sure got creative with the line breaks.
Until you fix it, I'll just add the language codes to the filename, which seems to work fine. I might also mention to Atril support that their TMX files are a little weird.
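
For anyone hitting the same thing, here is a minimal sketch of the two approaches discussed above. It is in Python purely for illustration and is not TMLookup's actual code; the function names, the sample TMX and the exact way the tuv tag is broken across lines are all assumptions. The first function keeps the regex approach but lets the match run across line breaks while still requiring the tuv tag, so xml:lang on other elements (a note here) is ignored. The second uses the standard library's XML parser, which does not care about line breaks inside tags.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: the first tuv tag is split across two lines,
# and a note element also carries xml:lang.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" srclang="en-US"/>
  <body>
    <tu>
      <note xml:lang="fr-FR">remarque</note>
      <tuv
        xml:lang="en-US"><seg>Hello</seg></tuv>
      <tuv xml:lang="nl-NL"><seg>Hallo</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# [^>]* also matches newlines, so a tag broken across lines still matches.
TUV_LANG = re.compile(r"<tuv\b[^>]*?\bxml:lang\s*=\s*[\"']([^\"']+)[\"']")


def unique_in_order(items):
    """Drop repeats while keeping first-seen order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def tuv_languages_regex(tmx_text):
    """Collect language codes from tuv tags only, using a regex on the whole text."""
    return unique_in_order(TUV_LANG.findall(tmx_text))


def tuv_languages_parser(tmx_text):
    """Collect language codes from tuv elements using a real XML parser."""
    root = ET.fromstring(tmx_text)
    langs = []
    for tuv in root.iter("tuv"):
        # Older TMX versions use a plain lang attribute instead of xml:lang.
        lang = tuv.get(XML_LANG) or tuv.get("lang")
        if lang:
            langs.append(lang)
    return unique_in_order(langs)


if __name__ == "__main__":
    print(tuv_languages_regex(SAMPLE))
    print(tuv_languages_parser(SAMPLE))
```

The regex version has to read the text as a whole (or at least tag by tag) rather than line by line, which is the part that would need changing in a line-oriented tool. The parser version is more robust but requires the file to be well-formed XML.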