Forum: CAT Tools Technical Help
Topic: Any CAT tool or TMS has a functionality of extracting website content for localisation?
Poster: Samuel Murray
Post title: @Devaki
[quote]languagesbureau wrote:
We have a requirement of quoting for localising a website with hundreds of links. The client won't give us ready content and wants us to extract it. [/quote]
When you say "links", do you mean that the client has given you links to all the pages that must be downloaded and translated? In other words, the client isn't asking you to visit pages and then follow additional links on those pages to additional pages, but instead: the only pages that you must translate are the pages whose links you have?
If so, then all you need is a program that you can give a list of links to and that will then download those pages.
WinHTTrack is a well-known program for this. When you create your project, you have to edit the settings and set the scanning depth to zero; otherwise it will attempt to follow links within pages.
(Of course, what the others said about CMS etc. is also valid, but if the client gave you a list of links, I assume they'd be fine with you translating the HTML versions of the web pages.)
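If you'd rather script the download than use a dedicated program, the same idea (fetch only the listed pages, never follow links inside them) can be sketched in a few lines of Python. This is only a rough sketch using the standard library; the function names, the output folder and the file naming scheme are my own assumptions, not something any particular tool prescribes.

```python
import hashlib
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url):
    """Build a readable, unique file name for a URL (naming scheme is an assumption)."""
    parsed = urlparse(url)
    # Replace anything that is awkward in file names with underscores.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", parsed.netloc + parsed.path).strip("_") or "index"
    # A short hash of the full URL keeps pages apart that differ only in their query string.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{stem}_{digest}.html"


def download_listed_pages(urls, out_dir="pages", fetch=None):
    """Save each listed URL as one HTML file; links inside the pages are never followed."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = {}
    for url in urls:
        target = out / filename_for(url)
        target.write_bytes(fetch(url))  # raw bytes, so the page encoding is left untouched
        saved[url] = target
    return saved
```

You would call it with the client's list, for example `download_listed_pages(open("links.txt").read().split())`, where `links.txt` is a hypothetical file with one link per line. Pages that only show their text after JavaScript has run would not be captured properly this way.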
[quote]languagesbureau wrote:
I am not sure how they will be uploading the content - they will need translation in Excel/Google Sheets. [/quote]
Aah, so you'd still have to paste the content into Excel files (presumably source text in column A and target text in column B). Do they want separate Excel files for each HTML file? Sheesh.
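The pasting could in principle be scripted too. Below is a rough sketch, assuming that a CSV file (which both Excel and Google Sheets can open) is acceptable in place of a real Excel file, and that one row per text chunk is good enough. The class and function names are made up for illustration, and it uses only Python's standard library.

```python
import csv
import io
from html.parser import HTMLParser


class TextSegments(HTMLParser):
    """Collect visible text chunks, skipping script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.segments.append(text)


def html_to_rows(html):
    """Return a header row plus one (source, empty target) row per text chunk."""
    parser = TextSegments()
    parser.feed(html)
    parser.close()
    return [("Source", "Target")] + [(segment, "") for segment in parser.segments]


def rows_to_csv(rows):
    """Serialise the rows as CSV text: column A is source, column B is target."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Note that this splits text at every tag, so a sentence containing bold text or a link ends up in several rows, and text hidden in attributes (alt text, meta descriptions) is ignored. A CAT tool's HTML filter handles segmentation far better, so treat this only as a way to get a rough picture of the content and volume.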
[Edited at 2024-12-20 09:26 GMT]