Forum: CAT Tools Technical Help
Topic: TMLookup
Poster: Michael Beijer
Post title: cool, thanks!
[quote]FarkasAndras wrote:
[quote]Michael Beijer wrote:
Any idea what is causing this? [/quote]
This is caused by the unusual TMX generated by Felix and the half-assed TMX processing done by TMLookup. I'll fix it in the next release.
Boring technical details: TML doesn't properly parse TMX because I don't know much about XML parsers, and the only available parser I found that was specifically written for TMX is crap. So I just hacked together some regex code that extracts the relevant parts from TMX, but there's always the chance for screwups with a solution like that if the XML/TMX is not exactly how you expected it. I think I noted this here and asked for testing. Anyway, my code expects the tuv tag to end right after the language is declared, with nothing else following the language code. Felix tacks on the creationdate, and importing the TMX fails because TMLookup can't tell what the languages are. Olifant removes the creationdate from one of the two langs, so at least one is recognized and the import proceeds (with a warning message). I've now fixed the bug and added some primitive error handling so that the import can continue even if the languages are not recognized.
Nobody has reported this bug so far, which means that Felix might be the only tool that adds something in the tuv tag after the language code. Most tools add the creationdate in the tu (segment level) tag, which makes a lot more sense than adding the creationdate [i]twice[/i] into every segment, once in each tuv (language level) tag. Not sure if what Felix does is just a little unusual or wrong, but it doesn't matter much. [/quote]
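For anyone following along, here is a rough Python sketch of the kind of regex problem described in the quote. It is not TMLookup's actual code: the pattern names, the helper function and the sample segment (including the language codes and dates) are all made up for illustration, and it assumes the language is declared with an xml:lang or lang attribute on the tuv tag.

```python
import re

# Strict pattern (hypothetical): the tuv tag must close right after the
# language attribute, so any extra attribute makes the match fail.
STRICT_TUV = re.compile(r'<tuv\s+(?:xml:)?lang="([^"]+)"\s*>')

# Tolerant pattern (hypothetical): other attributes may appear before or
# after the language attribute inside the same tuv tag.
TOLERANT_TUV = re.compile(r'<tuv\b[^>]*?\s(?:xml:)?lang="([^"]+)"[^>]*>')


def tuv_languages(tu_xml, pattern):
    """Return the language codes of every tuv tag that the pattern matches."""
    return pattern.findall(tu_xml)


# Made-up segment with a creationdate on each tuv, as described in the quote.
SAMPLE_TU = (
    '<tu>'
    '<tuv xml:lang="EN-US" creationdate="20120101T120000Z"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="NL-NL" creationdate="20120101T120000Z"><seg>Hallo</seg></tuv>'
    '</tu>'
)

if __name__ == "__main__":
    print("strict:  ", tuv_languages(SAMPLE_TU, STRICT_TUV))
    print("tolerant:", tuv_languages(SAMPLE_TU, TOLERANT_TUV))
```

With the strict pattern no language is found in the sample, while the tolerant one picks up both. A real XML parser would be more robust than either, since regexes like these still assume double-quoted attributes and a tag that fits the pattern.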
I'll ask him. TMX isn't the standard TM format in his tool (he has his own, special .ftm XML format), so maybe it was just an oversight.
Michael