Medicines Text Mining Tool

Medicines text mining tool

Repository owner: NHS Digital Analytical Services

To contact us raise an issue on Github or via email and we will respond promptly.

Clinical statement

NHS Digital's Interoperable Medicines Programme has been involved in establishing a flow of medicine data from secondary care to improve medication safety, gain insights into overprescribing, understand the overuse of antibiotics, and improve the treatments related to COVID-19. Initial investigatory work with trusts and suppliers concluded that the standard for describing and coding medicines (dictionary of medicines and devices - dm+d) is only partially adopted by secondary care organisations. To enable any medicines data collection from individual hospitals to be comparable across England the programme has developed text mining functionality to map a hospital medicine description to the closest match in the dm+d standard (this functionality is known as Medicines Text Mining Tool - MTMT). Where there are too many variations between the medicine description and the closest dm+d description the match will appear as unmapped (exceeds a threshold). The mapped outputs can only be used for the secondary uses of data and must not be used for direct care (e.g. must not be used to map a hospital drug dictionary for direct care use).

The Medicines Text Mining Tool has been developed from data derived from CareFlow Medicines Management electronic Prescribing and Medicines Administration (ePMA) systems utilised in 25 trusts in England. The data set contained 50,841,362 prescribed items with 49,844 unique descriptions. From the list of prescribed items 89.3% were mapped to the dm+d standard.
There were 24,081,702 prescribed items where it was possible to compare the Medicines Text Mining Tool against manual dm+d mapping performed by Trusts. The results matched exactly 60.8% of the time and identified the same active ingredient (that is had a common VTM or VMP code) 99.3% of the time.
Assurances cannot be provided around the coverage of patient episodes nor bed days included in the data that was used to develop the "tool"
All reasonable endeavours have been undertaken to clinically assure the Medicines Text Mining Tool and reasonable attempts were undertaken to rectify issues and errors that were identified.
A list of issues that have been identified and could not be rectified are provided in appendix A, please note that list is not comprehensive - there may be further issues and incorrect mappings that have yet been identified.
Therefore, for reasons outlined above, NHS Digital cannot accept clinical responsibility for use of the "tool" and any outputs from use of the mapped data. The use of both the mapping tool and mapped data is the clinical responsibility of the user.

APPENDIX A - Known issues with the Medicines Text Mining Tool

No	Issue	Examples
1	Differences in dm+d and ePMA naming conventions	dm+d includes the salt name (sodium) in its naming of Levothyroxine sodium. If the ePMA term does not mention the salt name, the mapping tool may wrongly match it to liothyronine or leave it as unmappable dm+d has 2 VTMs for phenytoin (phenytoin and phenytoin sodium). The ePMA naming convention affects which MTMT match is returned The format of the ePMA term can affect the accuracy of the mapping. For example, the ePMA term 'Potassium Chloride (Sando-K) 12mmol K and 8mmol Cl Tablet' was matched to VTM Potassium chloride
2	Similar names may yield the wrong match	ePMA term 'Trospium 20 mg Tablets' was matched to the VTM Chlordiazepoxide due to its association with AMPs with the brand name 'Tropium' ePMA term 'Exorex cream' was matched to the VTM Coal tar due to its lexical similarity with 'Exorex lotion' which has a VTM of Coal tar ePMA term 'OPTIUM (FREESTYLE) Test Strip' was matched to the VTM Opium ePMA terms for 'Aquacel Ag' were matched to dm+d terms for 'Aquacel Ag+' ePMA terms containing the word 'clear', for example 'RESOURCE THICKEN UP STAGE 1 = LEVEL 2 CLEAR POWDER' was matched to the VTM 'Docusate' due to the lexical similarity with the word 'clear' in the AMP term for the product 'Clear ear drop' ePMA term 'FACTOR VIII 500iu/vWF 1200iu (VOCENTO) Injection' was matched to the VTM Factor XIII ePMA terms for 'Hepatitis B vaccine' matched to the VTM Riboflavin due to the term 'Vitamin B' ePMA term 'FLUPENTIXOL DECANOATE 100 mg/1 mL Injection' matched to VTM Flupentixol dihydrochloride
3	Multiple ingredient ePMA terms being matched inaccurately due to lexically similar dm+d entries	ePMA terms for 'Meningococcal ACWY' were matched to dm+d terms for 'Meningococcal group C' ePMA terms for 'Hepatitis A + B vaccine' were matched to dm+d terms for 'Hepatitis A vaccine' (and vice versa) ePMA terms for 'diphtheria, pertussis, polio and tetanus vaccines' were matched to the dm+d terms for 'diphtheria, tetanus and pertussis'
4	ePMA typos	Misspelling of pyrazinamide as pyrizinamide. ePMA term 'Rifater (rifampicin, isoniazid and pyrizinamide) tablets' was matched to VTM Rifampicin + Isoniazid Misspeling tafluprost as safluprost. ePMA term 'Safluprost + Timolol (Taptiqom) Unit Dose PF Eye Drops' was matched to VTM Timolol
5	Ambiguous ePMA term	ePMA term 'MOVICOL / LAXIDO sachets' was unmappable
6	ePMA term contains additional pre/post-fixed information, abbreviations, additional spacing or brevity of the ePMA term	ePMA term 'Adcal - D3 Caplet' matched to VTM Calcium carbonate ePMA term 'TRELEGY ELLIPTA fluticasone 92mcg/umeclidinium 55mcg/vilante' matched to VTM Fluticasone
7	Closest lexical match is not the correct match	ePMA terms for 'phosphate enema' matched to the VTM Phosphate. The VMPs 'Phosphates enema (Formula B) 128ml long tube' and 'Phosphates enema (Formula B) 128ml standard tube' are linked to the VTM 'Sodium acid phosphate + Sodium phosphate'
8	Mismatching due to flavours	ePMA term 'FORTISIP COMPACT PROTEIN (TROPICAL GINGER) Liquid' matched to VTM Ginger ePMA term 'Peppermint water' matched to VTM Water
9	Non-medicinal product entries in ePMA	ePMA terms such as 'drug not listed 1' and 'FLUIDS: See Paper Chart' were unmappable

Getting started

This code runs on a Databricks cluster (Spark version 2.4.5) with the following packages:

collections
datetime
enum
functools
operator
os
pandas
pyspark.broadcast
pyspark.sql
random
re
time
traceback
typing
unittest
uuid
warnings

The following dm+d tables are required as inputs to the pipeline:

amp
vmp
vtm
amp_parsed
vmp_parsed
form
route
unit_of_measure

To run the pipeline

Locate and run the init_schemas notebook. You will need to specify the following, which are the database and table names where the outputs will be written to:

db
match_lookup_final_name
unmappable_table_name
accuracy_table_name

Locate and run the run_notebooks notebook. You will need to specify:

source_dataset: Whether the input data is of type source_A or source_B
raw_input_table: If input is of type source_B, this must contain string fields named "medication_name_value" and "form_in_text". If input is of the type source_A, this must contain a string field named "Drug"
db: The database
batch_size: The number of rows you'd like to process (i.e. The number of rows in raw_input_table)
notebook_root: the location of this notebook

Licence

Medicines Text Mining Tool codebase is released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
coe_code		coe_code
notebooks		notebooks
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
init_schemas.py		init_schemas.py
run_notebooks.py		run_notebooks.py
run_tests.py		run_tests.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Medicines Text Mining Tool

Clinical statement

APPENDIX A - Known issues with the Medicines Text Mining Tool

Getting started

To run the pipeline

Licence

About

Releases

Packages

Languages

License

simonrnss/medicines-text-mining-tool

Folders and files

Latest commit

History

Repository files navigation

Medicines Text Mining Tool

Clinical statement

APPENDIX A - Known issues with the Medicines Text Mining Tool

Getting started

To run the pipeline

Licence

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages