Identiying technologies is a key element or mapping a domain and its evolution. It allows managers and de- cision makers to anticipate trends or an accurate orecast and eective oresight. Researchers and practitioners are taking advantage o the rapid growth o the publicly accessible sources to map technological domains. Among these sources, patents are the widest technical open access database used in the literature and in practice. Nowadays, Natural Language Processing (NLP) techniques enable new methods or the analysis o patent texts. Among these techniques, in this paper we explore the use o Named Entity Recognition (NER) with the purpose to identiy the technologies mentioned in patents' text. We compare three dierent NER methods, gazetteer-based, rule-based and deep learning-based (e.g. BERT), measuring their perormances in terms o precision, recall and computational time. We test the approaches on 1600 patents rom our assorted IPC classes as case studies. Our NER systems collected over 4500 ne-grained technologies, achieving the best results thanks to the combination o the three methodologies. The proposed method overcomes the literature thanks to the ability to lter generic technological terms. Our study delineates a valid technology identication tool that can be integrated in any text analysis pipeline to support academics and companies in investigating a technological domain.

Technology identification from patent texts: A novel named entity recognition method

Puccetti, Giovanni
2023

Abstract

Identiying technologies is a key element or mapping a domain and its evolution. It allows managers and de- cision makers to anticipate trends or an accurate orecast and eective oresight. Researchers and practitioners are taking advantage o the rapid growth o the publicly accessible sources to map technological domains. Among these sources, patents are the widest technical open access database used in the literature and in practice. Nowadays, Natural Language Processing (NLP) techniques enable new methods or the analysis o patent texts. Among these techniques, in this paper we explore the use o Named Entity Recognition (NER) with the purpose to identiy the technologies mentioned in patents' text. We compare three dierent NER methods, gazetteer-based, rule-based and deep learning-based (e.g. BERT), measuring their perormances in terms o precision, recall and computational time. We test the approaches on 1600 patents rom our assorted IPC classes as case studies. Our NER systems collected over 4500 ne-grained technologies, achieving the best results thanks to the combination o the three methodologies. The proposed method overcomes the literature thanks to the ability to lter generic technological terms. Our study delineates a valid technology identication tool that can be integrated in any text analysis pipeline to support academics and companies in investigating a technological domain.
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
Information retrieval; named entity recognition; natural language processing; patents; technology analysis
Files in this item:
File: Technology identification from patent texts A novel named entity recognition method - 1-s2.0-S0040162522006813-main.pdf
Access: Closed
Type: Published version
License: Non-public
Size: 1.08 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/131223