
Getting entities from pre-saved DocBin

Asked by Matthew Harrington

I have around 700k documents that I want to process in spaCy and save into a DocBin for later use.

I wrote some code to do a keyword search using PhraseMatcher, and it worked great. I'm now trying to build a knowledge graph out of the DocBin I have, but I can't seem to access the entities to use them in the graph logic. I read somewhere that DocBins don't keep that information (?), but when I print DocBin.tokens I get some values and not just empty output.

This might be a very stupid question but I'm quite lost and the documentation does not seem to be detailed enough for this.

import spacy
from spacy.tokens import DocBin

nlp = spacy.load('fr_dep_news_trf')
DocBinPath = r'C:\[Redacted]\FRdocBin.nlp'
loadedDocBin = DocBin().from_disk(DocBinPath)
DocList = list(loadedDocBin.get_docs(nlp.vocab))
for doc in DocList:
    People = list(set(ent.text for ent in doc.ents if ent.label_ == 'PERSON'))

This doesn't produce any errors but doc.ents is empty.

This is the code for saving the DocBin:

FRdoc_bin = DocBin(store_user_data=True,
                   attrs=['ENT_TYPE', 'LEMMA', 'LIKE_EMAIL', 'LIKE_URL',
                          'LIKE_NUM', 'ORTH', 'POS', 'HEAD', 'DEP'])
doc = frNLP(text)
FRdoc_bin.add(doc)
FRdoc_bin.to_disk(CreatedModelPath + r'\FRdocBin' + '.nlp')

2 Answers

If you want to use custom attrs, you need to include both ENT_IOB and ENT_TYPE for entities to be preserved; ENT_TYPE alone is not enough.

Are you sure that you need custom attrs in the first place? Have you customized the values for LIKE_URL or other lexical attrs? If not, the default attrs for DocBin should be fine.
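A minimal round-trip sketch of the point above. It uses a blank pipeline and a hand-labelled span instead of the real model, so nothing here depends on downloading fr_dep_news_trf; the sentence and the span are made up for illustration:

```python
import spacy
from spacy.tokens import DocBin, Span

# A blank French pipeline stands in for the real model (assumption: no
# downloaded models are available here), with a PERSON span set by hand
# to simulate NER output.
nlp = spacy.blank("fr")
doc = nlp("Emmanuel Macron habite Paris")
doc.ents = [Span(doc, 0, 2, label="PERSON")]

# ENT_IOB together with ENT_TYPE is what lets entities survive the
# round trip when listing custom attrs.
doc_bin = DocBin(attrs=["ENT_IOB", "ENT_TYPE", "LEMMA"])
doc_bin.add(doc)

restored = list(DocBin().from_bytes(doc_bin.to_bytes()).get_docs(nlp.vocab))[0]
people = [ent.text for ent in restored.ents if ent.label_ == "PERSON"]
```

Dropping "ENT_IOB" from the attrs list in this sketch leaves restored.ents empty, which is exactly the symptom in the question.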


Edit: I figured out the issue from the spaCy discussion: it's quite simply that the fr model I was using, fr_dep_news_trf, doesn't include an NER component. I switched to fr_core_news_lg and it worked :)
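A quick way to catch this up front is to check the pipeline's components before relying on doc.ents. A sketch, again using a blank pipeline as a stand-in since the models themselves aren't downloaded here (a blank pipeline, like fr_dep_news_trf, has no "ner" component):

```python
import spacy

# Check whether the loaded pipeline actually has an NER component;
# if it doesn't, doc.ents will always be empty.
nlp = spacy.blank("fr")
has_ner = "ner" in nlp.pipe_names
print(nlp.pipe_names, has_ner)
```

With fr_core_news_lg loaded instead, "ner" appears in nlp.pipe_names.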
