CamemBERT-bio is a state-of-the-art French biomedical language model built by continual pre-training of camembert-base.
It was trained on a public French biomedical corpus of 413M words containing scientific documents, drug leaflets, and clinical cases extracted from theses and articles. Compared to camembert-base, it improves the F1 score by 2.54 points on average across 5 biomedical named entity recognition tasks.
CamemBERT-bio was trained and evaluated by Rian Touchent and Eric Villemonte de La Clergerie.
CamemBERT-bio is available on Hugging Face 🤗:
https://huggingface.co/almamach/camembert-bio-base
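As a minimal sketch, the checkpoint can be loaded through the generic `fill-mask` pipeline of the `transformers` library. This assumes that `transformers` (with a PyTorch backend) is installed, that the Hub is reachable, that the checkpoint ships with its masked-language-modeling head, and that the repository id matches the URL above. The helper names `load_fill_mask` and `top_predictions` are hypothetical and only used for illustration.

```python
# Minimal sketch (not an official usage guide): query CamemBERT-bio as a
# masked language model via the transformers fill-mask pipeline.
# Assumptions: transformers + torch installed, Hub reachable, and the
# repository id below (taken from the URL above) is correct.

MODEL_ID = "almamach/camembert-bio-base"  # assumed from the URL above


def load_fill_mask(model_id: str = MODEL_ID):
    """Hypothetical helper: build a fill-mask pipeline for the given model id."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id, tokenizer=model_id)


def top_predictions(fill_mask, sentence: str, k: int = 5):
    """Hypothetical helper: return (token, score) pairs for the <mask> slot.

    `sentence` must contain exactly one "<mask>" token (the CamemBERT mask
    token); with a single mask the pipeline returns a flat list of dicts.
    """
    results = fill_mask(sentence, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]
```

For example, `top_predictions(load_fill_mask(), "Le patient présente une <mask> aiguë.")` lists the five most likely completions. For named entity recognition, the same id can instead be passed to `AutoModelForTokenClassification.from_pretrained` and fine-tuned on labeled data.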