This page provides the individual stand-off phenotype annotations created by four concept recognition systems on four corpora. In addition, for each textual corpus, system-based silver standard corpora have been created using both exact matching and sentence-level matching.
The four systems are:
The four corpora are:
All annotations are stored in a tab-separated stand-off format, in files whose names correspond to the files of the original corpus. For the PubMed and CT_Phentoype corpora, the file names are PubMed or Clinical Trials IDs, so the source documents can be retrieved directly from their original publishers. Each line of an annotation file has the form:

startOffset::endOffset [tab] original text span [tab] comma-separated list of CUIs

The silver standard corpora created from the system annotations omit the original text span and list only the offsets and the CUIs. Archives corresponding to the four corpora can be downloaded using the links below:
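The stand-off format described above can be read with a few lines of code. The following is a minimal Python sketch, not a tool distributed with the corpora. It assumes that each non-empty line holds either three tab-separated fields (system annotations) or two (silver standard files), that the text span itself contains no tab characters, and that offsets are character offsets with an exclusive end offset; that last point is an assumption, so check it against a few documents before relying on `span_matches`. The names `Annotation`, `parse_line`, `parse_file` and `span_matches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Annotation:
    start: int
    end: int
    text: Optional[str]  # None for silver standard lines, which omit the span
    cuis: List[str]


def parse_line(line: str) -> Annotation:
    """Parse one line: 'start::end<TAB>[text<TAB>]CUI1,CUI2,...'."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:  # system annotation: offsets, text span, CUIs
        offsets, text, cui_field = fields
    elif len(fields) == 2:  # silver standard: offsets, CUIs
        offsets, cui_field = fields
        text = None
    else:
        raise ValueError(f"unexpected number of fields in {line!r}")
    start_str, end_str = offsets.split("::")
    # Split the comma-separated CUI list, dropping empty entries
    cuis = [c.strip() for c in cui_field.split(",") if c.strip()]
    return Annotation(int(start_str), int(end_str), text, cuis)


def parse_file(lines: Iterable[str]) -> List[Annotation]:
    """Parse all non-empty lines of one annotation file."""
    return [parse_line(line) for line in lines if line.strip()]


def span_matches(document: str, ann: Annotation) -> bool:
    """Check the stored span against the source text.

    ASSUMPTION: offsets index characters and the end offset is exclusive.
    Lines without a text span (silver standard) are accepted as-is.
    """
    return ann.text is None or document[ann.start:ann.end] == ann.text
```

For example, with a made-up document and made-up CUIs (not taken from the corpora):

```python
doc = "Patient has muscle atrophy."
anns = parse_file(["12::26\tmuscle atrophy\tC0000001,C0000002\n", "12::26\tC0000001\n"])
print(anns[0].cuis, anns[1].text, span_matches(doc, anns[0]))
```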