How to extract content from SGML to create TMX file | Bitext and regex

Forum: CAT Tools Technical Help
Topic: How to extract content from SGML to create TMX file
Poster: Meta Arkadia
Post title: Bitext and regex

[quote]Michael Beijer wrote:
I don't see how this will work. If you remove all the markup, you have also removed the info you need to convert it into its two languages. The text in your screenshot is all run together. How are you going to turn that into vn-en?[/quote]

It looks like bitext, so the lot should be preceded by the code. Unfortunately, I don't know how to do that, especially not for Vietnamese. We'll have to ask Andras.

[quote]SDL Community wrote:
This will look at the ID column (column A) and check if it's an even number or not. If it is, then it will copy the contents of the cell. If it's an odd number, it puts nothing.[/quote]
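For anyone who would rather script it than use a spreadsheet formula, the even/odd idea quoted above could be sketched roughly like this in Python. This is only a sketch under assumptions: the function name `split_alternating_rows` is made up for illustration, and it assumes that odd IDs hold one language and the even ID right after holds its translation, which may not match the actual file.

```python
def split_alternating_rows(rows):
    """Pair up alternating rows of a bitext.

    rows: list of (row_id, text) tuples.
    Assumption: an odd ID is a source segment and the next even ID
    is its translation. Swap the test if the file is the other way round.
    """
    pairs = []
    pending_source = None
    for row_id, text in rows:
        if row_id % 2 == 1:
            # Odd ID: remember it as the source segment
            pending_source = text
        elif pending_source is not None:
            # Even ID: pair it with the preceding odd row
            pairs.append((pending_source, text))
            pending_source = None
    return pairs


rows = [(1, "Xin chào"), (2, "Hello"), (3, "Cảm ơn"), (4, "Thank you")]
pairs = split_alternating_rows(rows)
```

Each resulting pair would then become one translation unit in the TMX.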

With my first try, I got something similar, if not easier:

However, I encountered encoding problems that I didn't want to try to solve because I used Mac-only apps nobody else uses, and my Vietnamese is lousy at that.

How did you get rid of all those superfluous spaces? Did you regex them away, or did they disappear automagically? What are they doing there anyway?
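In case it helps, regexing the spaces away might look something like the sketch below. It is not what was actually done in the thread; `collapse_spaces` is a hypothetical helper name, and it assumes the extra spaces are ordinary whitespace (spaces, tabs, line breaks, no-break spaces) left behind by the stripped markup.

```python
import re


def collapse_spaces(segment):
    # Replace every run of whitespace with a single space,
    # then trim whatever is left at the start and end.
    return re.sub(r"\s+", " ", segment).strip()


cleaned = collapse_spaces("  Xin   chào \t bạn ")
```

Working on Unicode strings like this also sidesteps most of the encoding trouble, as long as the file is read with the right encoding in the first place.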

Cheers,

Hans

