CamemBERT-bio

a Tasty French Language Model Better for your Health

Inria Paris

ALMAnaCH

CamemBERT-bio

CamemBERT-bio is a state-of-the-art French biomedical language model built by continual pretraining from camembert-base.

It was trained on a public French biomedical corpus of 413M words containing scientific documents, drug leaflets, and clinical cases extracted from theses and articles. On average, it improves the F1 score by 2.54 points over camembert-base across 5 biomedical named entity recognition tasks.

CamemBERT-bio was trained and evaluated by Rian Touchent and Eric Villemonte de La Clergerie.

Download

Hugging Face

CamemBERT-bio is available on Hugging Face 🤗:
https://huggingface.co/almanach/camembert-bio-base
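
As a quick check that the download works, the model can be queried for masked-token predictions. The snippet below is a minimal sketch, not an official usage guide. It assumes the `transformers` library (with a PyTorch backend) is installed and that the Hub identifier is `almanach/camembert-bio-base`. The helper names `build_masked_sentence` and `top_predictions` and the `{mask}` placeholder are illustrative choices and not part of any released API.

```python
from typing import List

# Assumption: Hub repository id of the checkpoint linked above.
MODEL_ID = "almanach/camembert-bio-base"


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the illustrative "{mask}" placeholder with the tokenizer's mask token."""
    return template.replace("{mask}", mask_token)


def top_predictions(template: str, model_id: str = MODEL_ID, top_k: int = 5) -> List[str]:
    """Return the top_k candidate tokens for the masked position in `template`."""
    # Imported lazily so the helper above stays usable without transformers installed.
    from transformers import pipeline

    # Downloads the weights from the Hub on first use, then reads them from the local cache.
    fill_mask = pipeline("fill-mask", model=model_id)

    # Ask the tokenizer for its mask token instead of hard-coding it.
    sentence = build_masked_sentence(template, fill_mask.tokenizer.mask_token)

    # With a single masked position, the pipeline returns one dict per candidate.
    return [candidate["token_str"] for candidate in fill_mask(sentence, top_k=top_k)]
```

For example, `top_predictions("Le patient souffre d'une {mask} chronique.")` returns the five most likely fillers for the masked word. The exact candidates depend on the released checkpoint, so none are listed here.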