Technology identification from patent texts: a novel named entity recognition method
Puccetti, Giovanni
2023
Abstract
Identifying technologies is a key element for mapping a domain and its evolution. It allows managers and decision makers to anticipate trends for an accurate forecast and effective foresight. Researchers and practitioners are taking advantage of the rapid growth of publicly accessible sources to map technological domains. Among these sources, patents are the most extensive open-access technical database used in the literature and in practice. Nowadays, Natural Language Processing (NLP) techniques enable new methods for the analysis of patent texts. Among these techniques, in this paper we explore the use of Named Entity Recognition (NER) to identify the technologies mentioned in patent texts. We compare three different NER methods, gazetteer-based, rule-based and deep learning-based (e.g. BERT), measuring their performance in terms of precision, recall and computational time. We test the approaches on 1600 patents from four assorted IPC classes as case studies. Our NER systems collected over 4500 fine-grained technologies, achieving the best results thanks to the combination of the three methodologies. The proposed method improves on the literature thanks to its ability to filter out generic technological terms. Our study delineates a valid technology identification tool that can be integrated in any text analysis pipeline to support academics and companies in investigating a technological domain.

File | Size | Format |
---|---|---|
Technology identification from patent texts A novel named entity recognition method - 1-s2.0-S0040162522006813-main.pdf (Closed access; Type: Published version; License: Non-public) | 1.08 MB | Adobe PDF |
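
As a rough illustration of the evaluation described in the abstract, the sketch below runs a simple gazetteer lookup over one sentence and scores it with precision and recall. Everything in it is an assumption made for illustration: the gazetteer entries, the sample sentence, the gold annotations and the function names `gazetteer_ner` and `precision_recall` are hypothetical and are not taken from the paper, whose actual systems are not reproduced here.

```python
import re

# Hypothetical gazetteer of technology terms (illustrative, not from the paper).
GAZETTEER = {"neural network", "lithium-ion battery", "laser", "touch screen"}


def gazetteer_ner(text, gazetteer):
    """Return the gazetteer terms that occur in text as whole words."""
    lowered = text.lower()
    found = set()
    for term in gazetteer:
        # Lookarounds reject matches that sit inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(term)
    return found


def precision_recall(predicted, gold):
    """Set-based precision and recall over extracted terms."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence and gold annotations, assumed for this example.
text = ("The device couples a lithium-ion battery to a touch screen "
        "and an infrared sensor.")
gold = {"lithium-ion battery", "touch screen", "infrared sensor"}

predicted = gazetteer_ner(text, GAZETTEER)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

In this toy case the lookup finds only terms already in the list and misses "infrared sensor", which is the kind of recall gap that combining a gazetteer with rule-based or learned taggers is meant to narrow.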
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.