Forum: CAT Tools Technical Help
Topic: List of standard subject categories for a translation memory (TM) wanted
Poster: Oliver Walter
Post title: That UNESCO list
Today (about 3 years late!) I've had a look at that UNESCO list.
It is a PDF file, about 1.2 MB in size, containing 19 pages, with the actual list of names of areas of science occupying 17 and a half pages (pp 2-19).
The page contents are graphical (i.e. images), with no extractable text, and I estimate it has a total of about 1900 science areas - probably much too detailed for practical use as TM subjects. The quality of the graphics is probably too poor for OCR to be of much use.
The top 2 levels of subject-area headings may be of interest for this purpose, e.g. at the top level:
11 Logic, 12 Mathematics, 21 Astronomy and Astrophysics - - - 63 Sociology, 71 Ethics, 72 Philosophy.
At the 2nd level:
1103 Deductive logic, 1203 Computer sciences, 1201 Algebra, 1204 Geometry, 2406 Biophysics.
The actual detailed subject areas are the 3rd level (e.g. Boolean algebra, Hybrid computing, Solar physics, Fused salts, Human anatomy, Pharmacodynamics).
If you want to estimate how many titles there are at each level, download your copy, count the relevant titles on one or two of the pages 2 to 18 and multiply by 17.5 (if you counted 1 page) or by 8.75 (if you counted 2 pages, to get a slightly more accurate estimate). I haven't done that!
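For anyone who would rather let a script do the multiplication, here is a minimal Python sketch of that estimate. The function name and the page counts in the example are made up for illustration (I have not counted any pages); the only figure taken from the PDF is the 17.5 pages that the list occupies.

```python
def estimate_total(counts_per_page, list_pages=17.5):
    """Extrapolate from titles counted on a few sample pages to the whole list.

    counts_per_page: number of titles counted on each sampled page.
    list_pages: how many pages the list occupies (17.5 in this PDF).
    """
    if not counts_per_page:
        raise ValueError("count the titles on at least one page")
    average = sum(counts_per_page) / len(counts_per_page)
    return average * list_pages


# Hypothetical counts, NOT taken from the actual document.
print(estimate_total([110]))       # one page counted: 110 * 17.5
print(estimate_total([110, 106]))  # two pages counted: (110 + 106) * 8.75
```

Averaging the sampled pages and multiplying by 17.5 gives the same result as adding up two pages and multiplying by 8.75, and it also works if you count three or more pages.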
I would guess these details at the third level are of interest to librarians like Christine (of some years ago) but translators are more likely to be interested in the first or second level.
Michael's list, from TC, might be more useful.
I've just discovered: there is (unsurprisingly) some information about this nomenclature in Wikipedia, e.g.:
[url removed]