Keyword identification data maps

https://whimsical.com/en-4xif7jNqXbJ5pVfmEZjWeH

Keyword extraction targets

In PatentPia, keywords are extracted from two kinds of text: i) original text and ii) text recognized through OCR. The corresponding sources are i) patents, articles, and web sources, and ii) image sources.

Patent keyword extraction locations

PatentPia extracts keywords from each part of a patent disclosure. Typical locations include i) title of invention + abstract + patent claims, ii) technical field + background art + (summary), iii) embodiments + description of the invention, and iv) patent drawings. Extracting keywords from 'title of invention + abstract + patent claims' is the most basic keyword extraction track.

Calculate keyword importance

Each keyword is weighted according to the location it was extracted from, how often it appears, and similar factors. Keywords from the title of invention receive the highest weight.
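
As a rough illustration of combining location and frequency, the sketch below adds a location weight for every appearance of a keyword. The numeric weights, the section names, and the function name `score_keywords` are assumptions made for this example, not PatentPia's actual values. The only fact taken from the text is that the title of invention carries the highest weight.

```python
from collections import Counter

# Hypothetical location weights (assumed for illustration only).
# The text only states that the title of invention has the highest weight.
SECTION_WEIGHTS = {
    "title": 5.0,
    "abstract": 3.0,
    "claims": 3.0,
    "description": 1.0,
}


def score_keywords(occurrences):
    """Sum a location weight for every appearance of each keyword.

    occurrences: iterable of (keyword, section) pairs, one pair per appearance,
    so frequency is reflected by the number of pairs.
    """
    scores = Counter()
    for keyword, section in occurrences:
        # Unknown sections fall back to the lowest weight (an assumption).
        scores[keyword] += SECTION_WEIGHTS.get(section, 1.0)
    return dict(scores)


occurrences = [
    ("solar cell", "title"),
    ("solar cell", "claims"),
    ("electrode", "claims"),
    ("electrode", "description"),
    ("electrode", "description"),
]
scores = score_keywords(occurrences)
# solar cell: 5.0 + 3.0 = 8.0, electrode: 3.0 + 1.0 + 1.0 = 5.0
```

With these assumed weights, a keyword that appears only twice can still outrank a more frequent one when one of its appearances is in the title.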

Keyword recognition

Many expressions look like keywords but are not, and others have no value as keywords. Expressions that are too long (high word count) are also difficult to treat as keywords.
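
A minimal sketch of such a filter might look like the following. The word-count limit, the list of non-keyword expressions, and the function name `is_keyword_candidate` are all hypothetical. A real system would likely rely on much larger curated lists or a trained model.

```python
# Hypothetical limit and exclusion list, assumed for illustration only.
MAX_WORDS = 5
NON_KEYWORDS = {"present invention", "said", "embodiment", "the method"}


def is_keyword_candidate(phrase):
    """Return True if the phrase passes the simple keyword filters."""
    # Lowercase and collapse repeated whitespace before comparing.
    normalized = " ".join(phrase.lower().split())
    if not normalized:
        return False  # empty input
    if normalized in NON_KEYWORDS:
        return False  # looks like a keyword but has no keyword value
    if len(normalized.split()) > MAX_WORDS:
        return False  # too long to treat as a keyword
    return True
```

For example, `is_keyword_candidate("lithium ion battery")` passes, while `is_keyword_candidate("Present  Invention")` is rejected by the exclusion list after normalization.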

Identify keyword equivalence

The same keyword often has to be recognized even when the expression itself differs. Typical cases are i) British English vs. American English, ii) synonyms, iii) equivalent structures (A of B = B A), and iv) abbreviations, numbers, and special symbols (hyphens, etc.).
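
One way to picture this is to map each expression to a canonical key, so that equivalent expressions share the same key. The sketch below is illustrative only: the tiny dictionaries, the function name `canonical_key`, and the order of the steps are assumptions, and numbers are not handled at all.

```python
import re

# Tiny hypothetical lookup tables; real ones would be far larger.
BRITISH_TO_AMERICAN = {"colour": "color", "fibre": "fiber", "aluminium": "aluminum"}
ABBREVIATIONS = {"led": "light emitting diode"}
SYNONYMS = {"cellular phone": "mobile phone"}


def canonical_key(expression):
    """Map an expression to a canonical form shared by its equivalents."""
    text = expression.lower()
    # Special symbols: treat hyphens, underscores, and slashes as spaces.
    text = re.sub(r"[-_/]", " ", text)
    text = " ".join(text.split())
    # Equivalent structure: rewrite "A of B" as "B A".
    match = re.fullmatch(r"(.+?) of (.+)", text)
    if match:
        text = f"{match.group(2)} {match.group(1)}"
    # British vs. American spelling, word by word.
    text = " ".join(BRITISH_TO_AMERICAN.get(word, word) for word in text.split())
    # Abbreviations and synonyms are looked up on the whole expression.
    text = ABBREVIATIONS.get(text, text)
    text = SYNONYMS.get(text, text)
    return text
```

Under these assumptions, "optical fibre" and "Optical-Fiber" share one key, as do "manufacturing method of semiconductor" and "semiconductor manufacturing method", and "LED" and "light-emitting diode".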