!pip install -q imageio
!pip install -q termcolor
import requests, json
import matplotlib.pyplot as plt
import imageio
import numpy as np
from termcolor import colored
import time
- These helper functions are used only for this notebook for displaying results returned from the microservice API gateway.
- They are not necessary when using the the microservice APIs when deploying the frontend.
class bcolors:
HEADER = '\033[95m'
OKBLUE = '\033[94m'
OKCYAN = '\033[96m'
OKGREEN = '\033[92m'
WARNING = '\033[93m'
FAIL = '\033[91m'
ENDC = '\033[0m'
BOLD = '\033[1m'
UNDERLINE = '\033[4m'
def print_title(title, max_len = 110):
title = title +" " * (max_len-len(title)) +' "Cite'
print(bcolors.BOLD + bcolors.HEADER + title + bcolors.ENDC)
def print_year(year):
print(bcolors.BOLD + bcolors.OKBLUE + str(year) + bcolors.ENDC)
def print_section_name( section_name ):
print(bcolors.BOLD + section_name + bcolors.ENDC)
def print_highlighted_sentence(sen, end ="\n"):
sen = sen.replace("\n"," ").replace("\r"," ")
print(bcolors.BOLD + bcolors.WARNING + str(sen)[:150] + bcolors.ENDC, end=end, flush =True)
def remove_none(a):
if a is None:
return ""
else:
return a
def print_authors(authors):
author_list = []
for author in authors:
given_name = remove_none(author.get("GivenName", ""))
family_name = remove_none(author.get("FamilyName", ""))
author_list.append( given_name+" "+family_name)
print(" ".join(author_list))
def print_pagination_results(pagination_results):
for res in pagination_results:
print_title(res.get("Title",""))
if isinstance(res.get("PublicationDate",{}),dict):
print_year(res.get("PublicationDate",{}).get("Year",""))
if isinstance(res.get("Author",[]),list) and len(res.get("Author",[]))>0:
print_authors(res.get("Author",[]))
if isinstance(res.get("relevant_sentences",[]),list) and len(res.get("relevant_sentences",[]))>0:
print()
print("...".join(res.get("relevant_sentences",[])))
if res.get("Content",{}).get("Abstract","").strip() != "":
print( res.get("Content",{}).get("Abstract","") )
print("\n")
print()
def print_paper( paper_info, collection):
print_title(paper_info.get("Title",""))
print("source: ", collection)
if isinstance(paper_info.get("PublicationDate",{}),dict):
print_year(paper_info.get("PublicationDate",{}).get("Year",""))
print_authors(paper_info.get("Author",[]))
print("\n")
if "Content" in paper_info:
for tag in ["Abstract_Parsed","Fullbody_Parsed"]:
if tag not in paper_info["Content"]:
continue
for section in paper_info["Content"][tag]:
print_section_name( section.get("section_title","") )
print("\n")
for para in section.get("section_text",[]):
for sen in para.get("paragraph_text",[]):
sim = sen.get("sentence_similarity",0.0)
text = sen.get("sentence_text","")
if sim >0:
print_highlighted_sentence(text, end = " ")
else:
print(text, end=" ", flush = True)
print("\n")
def color_sen(sen):
sen_text = sen["sentence_text"]
cite_spans = sen["cite_spans"]
colored_text = ""
current_pos = 0
for cite_span in cite_spans:
start, end = int(cite_span["start"]), int(cite_span["end"])
colored_text += sen_text[current_pos:start]+colored(sen_text[start:end],"blue")
current_pos = end
colored_text += sen_text[current_pos:]
return colored_text
def print_para( para ):
corlored_para = ""
for sen in para["paragraph_text"]:
corlored_para += color_sen(sen) + " "
print(corlored_para)
def print_sec( sec ):
print(bcolors.BOLD+ sec["section_title"] +bcolors.ENDC + "\n")
for para in sec["section_text"]:
print_para(para)
print()
As shown in the architecture, once the backend service has been started, the API gateway is used the bridge the connection between frontend and backend. Therefore, from the perspective of frontend (or API user), we only need to know the address of the API gateway.
Suppose that in the file final_api_gateway/docker-compose.yaml, the port mapping rule looks like:
ports:
- 8060:8060 # Host PORT : container PORT
this means it maps the host machine's 8060 port to the port 8060 of API gateway's docker container. In this case, when we can to send request to the API gateway, the base address (without endpoint name) is:
http://localhost:8060
where localhost refers to the hostmachine. We can further use ngrok to map the port 8060 to www, to make the API gateway accessible from anywhere in the world. Then the API address will be changed to
https://xxxxx.ngrok.io (no port number is needed)
In this example, we run the backend in our local machine, so we set (suppose the host port is 8060):
api_gateway_address = "http://localhost:8060"
Service address refers to the address to the corresponding endpoint: api gateway's base address + endpoint name
api_gateway_address+"/ml-api/doc-search/v1.0"
'http://localhost:8060/ml-api/doc-search/v1.0'
-
Function: perform document search under different scenarios, including:
a) semantic search: selected content (text or paper) as ranking source & keywords filtering
b) search from my library
c) generic search: given the title of a paper, find this paper from our database -
Input: a request dictionary that contains the following key-value pairs:
a) "ranking_variable":
optional, default: ""(empty string)
texts that users type in the "ranking source text" box, or texts that are selected from the manuscript as the query. If "ranking_variable" is provided, then the value of "ranking_variable" will be used as final ranking source.
b) "ranking_collection":
optional, default: ""
If users want to find relevant papers of a certain query paper, then the collection name of that query paper must be provided.
c) "ranking_id_field":
optional, default: ""
If users want to find relevant papers of a certain query paper, then the paper id field of that query paper must be provided. E.g. the paper id type can be "id_int", which is most commonly used. It is also possible to use the "DOI" field as paper id.
d) "ranking_id_value":
optional, default: ""
If users want to find relevant papers of a certain query paper, then the paper id value of that query paper must be provided.
e) "ranking_id_type":
optional, default: "", not necessary
f) "keywords":
optional, default: ""
This is used for keyword boolean filter. If all the keys above are not provided and only keywords provided, then use keywords alse as ranking source. When users want to perform a generic search, e.g. search for a paper given title, then simply use the title as "keywords". In this case, the ranking variable can either be the abstract (if available) or "" (empty string).
g) "paper_list":
optional, default: ""
This is used to specify a range of papers within wich users search for relevant papers. For example, if users want to search for relevant papers from "My Libraray", then set the value of "paper_list" as the list of the ids of the papers in users' libraray, with the following format:
[ { "collection":"PMCOA", "id_field": "id_int", "id_value": 123 }, ... ]
h) "username":
optional, default: ""
i) "nResults":
The maximum number of returned paper ids
optional, default: 100
Max number of results to be returned. -
Output: A results dictionary with the following format:
{ "query_id": "fa27a7a6-0026-45f9-9c81-e47cf4ea5c57",
"response": [ {'collection': 'pubmed', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 123}, ... ],
'search_stats': {'DurationTotalSearch': 87, 'nMatchingDocuments': 1000000}
}
Here "query_id" is a randomly generated unique id string that is associated with current search behavior. This can be used to achieve the click feedback function. For example, when users search for something, after they get the searched results and they click a certain paper, then the frontend can return current "query_id" and the id information of the clicked paper back to the click-feedback api (described below).
ranking_variable = "recent progress in Covid-19"
keywords = "covid"
query_data ={
"ranking_variable":ranking_variable,
"keywords":keywords,
}
output = requests.post( api_gateway_address+"/ml-api/doc-search/v1.0",
data = json.dumps( query_data ),
headers = {"Content-Type": "application/json"} ).json()
print(output)
{'query_id': 'd510ee29-6c02-490f-b38d-a5e57fdf7c97', 'response': [{'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 798}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 499}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 664}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 251}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 38}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 722}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 213}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 504}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 1011}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 638}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 346}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 666}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 905}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 237}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 708}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 712}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 340}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 1007}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 787}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 535}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 404}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 700}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 461}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 784}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 212}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 12}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 143}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 640}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 292}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 843}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 44}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 557}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 547}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 813}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 448}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 822}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 23}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 752}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 337}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 745}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 530}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 786}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 250}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 781}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 1027}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 502}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 435}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 639}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 77}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 704}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 331}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 870}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 993}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 641}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 1006}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 732}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 645}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 324}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 680}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 89}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 808}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 32}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 966}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 554}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 505}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 5}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 815}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 189}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 391}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 359}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 740}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 768}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 466}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 624}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 97}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 337}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 869}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 586}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 246}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 180}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 725}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 858}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 847}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 1020}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 1019}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 855}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 67}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 501}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 650}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 898}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 173}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 586}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 77}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 979}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 306}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 394}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 666}, {'collection': 'PMCOA', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 986}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 111}, {'collection': 'arXiv', 'id_field': 'id_int', 'id_type': 'int', 'id_value': 607}], 'search_stats': {'DurationTotalSearch': 9695, 'nMatchingDocuments': 143}}
api_gateway_address+"/ml-api/get-papers/v1.0"
'http://localhost:8060/ml-api/get-papers/v1.0'
-
Function: Given a list of paper ids, return the content of those papers. The frontend can also specify which field (e.g abstract/title/date) to return. This API can be generally used to display a list papers. For example, display a page of search results, or display all papers in my library
-
Input:
a) "paper_list":
mandatory
a list of papers' ids to be displayed and highlighted. For example, a list of ids of the papers to be displayed on the first retrived page, or a list ids of the papers in "My Library"
Format: [ { "collection":"PMCOA", "id_field": "id_int", "id_value": 123 }, ... ]
b) "projection":
optional, default: ""
The frontend can specify which field to return, with a high level of flexibility. This parameter is similar to the pymongoDB api. If "projection" is not specified, the by default the server will return the following fields: "PublicationDate", "Author", "Title", "Venue", "id_int", "relevant_sentences". If the frontend requires only title and abstract, then set the value of "projection" as the following: { "Title":1, "Content.Abstract":1 } -
Output: A list of papers, each of which has structured information confined by the "projection".
Note: the output will always be a list, even if the value of the "paper_list" is a list of one element.
-
If in the paper_list at a certain position a wrong paper id information provide, the service will return a placeholder {} for the corresponding paper, in order to make sure the returned paper list and the paper_list parameter have the same length
-
If the paper_list parameter is empty [], the service will just return an empty list []
Given a query, search for relevant papers and get the paper's content
ranking_variable = "recent progress in Covid-19"
keywords = "covid"
query_data ={
"ranking_variable":ranking_variable,
"keywords":keywords,
}
searched_document_ids = requests.post( api_gateway_address+"/ml-api/doc-search/v1.0",
data = json.dumps( query_data ),
headers = {"Content-Type": "application/json"} ).json()["response"]
search_document_contents = requests.post( api_gateway_address+ "/ml-api/get-papers/v1.0",
data = json.dumps( {
"paper_list":searched_document_ids,
} ),
headers = {"Content-Type": "application/json"} ).json()["response"]
Let's have a look at the first 10 search results:
print_pagination_results( search_document_contents[:10])
�[1m�[95mA systematic review of participatory approaches to empower health workers in low- and middle-income countries, highlighting Health Workers for Change "Cite�[0m
�[1m�[94m2023�[0m
�[1m�[95mAcceptance of Public Health Measures During the COVID-19 Pandemic: A Cross-Sectional Study of the Swiss Population’s Beliefs, Attitudes, Trust, and Information-Seeking Behavior "Cite�[0m
�[1m�[94m2023�[0m
Maddalena Fiordelli Maddalena Fiordelli Sara Rubinelli Sara Rubinelli Nicola Diviani Nicola Diviani
�[1m�[95mDoes COVID-19 vaccine exacerbate rotator cuff symptoms? A prospective study "Cite�[0m
�[1m�[94m2023�[0m
Servet İğrek İbrahim Ulusoy Aytek Hüseyin Çeliksöz
�[1m�[95mA rapid cell-free expression and screening platform for antibody discovery "Cite�[0m
�[1m�[94m2023�[0m
Andrew C. Hunt Andrew C. Hunt Bastian Vögeli Bastian Vögeli Ahmed O. Hassan Laura Guerrero Laura Guerrero Weston Kightlinger Weston Kightlinger Danielle J. Yoesep Danielle J. Yoesep Antje Krüger Antje Krüger Madison DeWinter Madison DeWinter Madison DeWinter Michael S. Diamond Michael S. Diamond Michael S. Diamond Michael S. Diamond Ashty S. Karim Ashty S. Karim Michael C. Jewett Michael C. Jewett Michael C. Jewett Michael C. Jewett Michael C. Jewett
�[1m�[95mAssociations of online religious participation during COVID-19 lockdown with subsequent health and well-being among UK adults "Cite�[0m
�[1m�[94m2023�[0m
Koichiro Shiba Koichiro Shiba Richard G. Cowden Natasha Gonzalez Yusuf Ransome Atsushi Nakagomi Ying Chen Matthew T. Lee Tyler J. VanderWeele Tyler J. VanderWeele Tyler J. VanderWeele Daisy Fancourt
�[1m�[95mThe diversity of providers’ and consumers’ views of virtual versus inpatient care provision: a qualitative study "Cite�[0m
�[1m�[94m2023�[0m
Robyn Clay-Williams Peter Hibbert Peter Hibbert Ann Carrigan Ann Carrigan Natalie Roberts Elizabeth Austin Diana Fajardo Pulido Isabelle Meulenbroeks Hoa Mi Nguyen Mitchell Sarkies Sarah Hatem Katherine Maka Graeme Loy Jeffrey Braithwaite
�[1m�[95mUtilizing a Capture-Recapture Strategy to Accelerate Infectious Disease Surveillance "Cite�[0m
�[1m�[94m2023�[0m
Lin Ge Yuzi Zhang Lance Waller Robert Lyles
�[1m�[95mWhat is the health status of girls and boys in the COVID-19 pandemic? Selected results of the KIDA study "Cite�[0m
�[1m�[94m2023�[0m
Julika Loss Miriam Blume Laura Neuperdt Nadine Flerlage Tim Weihrauch Kristin Manz Roma Thamm Christina Poethko-Müller Elvira Mauz Petra Rattay Jennifer Allen Mira Tschorn Mira Tschorn
�[1m�[95mProlonged T-cell activation and long COVID symptoms independently associate with severe COVID-19 at 3 months "Cite�[0m
�[1m�[94m2023�[0m
Marianna Santopaolo Michaela Gregorova Fergus Hamilton David Arnold Anna Long Aurora Lacey Elizabeth Oliver Alice Halliday Holly Baum Kristy Hamilton Rachel Milligan Olivia Pearce Lea Knezevic Begonia Morales Aza Alice Milne Emily Milodowski Eben Jones Rajeka Lazarus Anu Goenka Anu Goenka Adam Finn Adam Finn Adam Finn Nicholas Maskell Andrew D Davidson Kathleen Gillespie Linda Wooldridge Laura Rivino
�[1m�[95mUsing technology to reduce critical deterioration (the DETECT study): a cost analysis of care costs at a tertiary children's hospital in the United Kingdom "Cite�[0m
�[1m�[94m2023�[0m
Eduardo Costa Eduardo Costa Céu Mateus Bernie Carter Holly Saron Chin-Kien Eyton-Chong Fulya Mehta Steven Lane Sarah Siner Jason Dean Michael Barnes Chris McNally Caroline Lambert Caroline Lambert Bruce Hollingsworth Enitan D. Carrol Enitan D. Carrol Gerri Sefton
api_gateway_address+"/ml-api/extractive-summarize/v1.0"
'http://localhost:8060/ml-api/extractive-summarize/v1.0'
-
Function: Given a document containing a list of sentences, return the highlights of it.
-
Input:
a) "sentence_list" : a list of sentences, e.g., we can convert a full scientific paper into a list of consecutive sentences, and use it as the input -
Output: Extracted summary (highlights) of the given paper. It is a dictionary containing
"summary": extracted_sen, "sentence_position":sen_pos
We can search for a paper, and get the extractive summary of it
ranking_variable = "recent progress in Covid-19"
keywords = "covid"
query_data ={
"ranking_variable":ranking_variable,
"keywords":keywords,
}
searched_document_ids = requests.post( api_gateway_address+"/ml-api/doc-search/v1.0",
data = json.dumps( query_data ),
headers = {"Content-Type": "application/json"} ).json()["response"]
## We get the content of the first paper
document_content = requests.post( api_gateway_address+ "/ml-api/get-papers/v1.0",
data = json.dumps( {
"paper_list":searched_document_ids,
} ),
headers = {"Content-Type": "application/json"} ).json()["response"][0]
## we need to convert the paper's full text into a list of sentences
sentences = []
for sec in document_content["Content"]["Abstract_Parsed"] + document_content["Content"]["Fullbody_Parsed"]:
for para in sec["section_text"]:
for sen in para["paragraph_text"]:
sentences.append(sen["sentence_text"])
## summarize it!
summary = requests.post( api_gateway_address+"/ml-api/extractive-summarize/v1.0",
data = json.dumps( {
"sentence_list":sentences,
} ),
headers = {"Content-Type": "application/json"} ).json()["response"]
summary
{'summary': ['Abstract This systematic review assesses participatory approaches to motivating positive change among health workers in low- and middle-income countries (LMICs).',
'The mistreatment of clients at health centres has been extensively documented, causing stress among clients, health complications and even avoidance of health centres altogether.',
'Health workers, too, face challenges, including medicine shortages, task shifting, inadequate training and a lack of managerial support.',
'Solutions are urgently needed to realise global commitments to quality primary healthcare, country ownership and universal health coverage.',
'This review searched 1243 titles and abstracts, of which 32 were extracted for full text review using a published critical assessment tool.',
'Eight papers were retained for final review, all using a single methodology, ‘Health Workers for Change’ (HWFC).',
'Health workers acknowledged their negative behaviour towards clients, often as a way of coping with their own unmet needs.'],
'sentence_position': [0, 1, 2, 3, 4, 5, 8]}
api_gateway_address + "/ml-api/generate-citation/v1.0"
'http://localhost:8060/ml-api/generate-citation/v1.0'
-
Function: Batch-wise generation of citation sentences given a batch of local contexts, keywords, and papers to be cited. (The batchwise operation is used to handle concurrent requests, e.g. generating citation sentences for multiple papers simultaneously.)
-
Input:
a) "context_list" : a batch of local context texts (text in the manuscript before the target citation sentence.)
b) "keywords_list": a batch of keywords used to guide the generation
c) "papers": a batch of papers' content to be cited -
Output:
a batch of generated citation sentences
Given a query, we can search for relevant papers, and generated citation sentences for top 5 relevant papers.
ranking_variable = "There have been many recent advances in the treatment of Covid-19."
keywords = "Covid"
query_data ={
"ranking_variable":ranking_variable,
"keywords":keywords,
}
searched_document_ids = requests.post( api_gateway_address+"/ml-api/doc-search/v1.0",
data = json.dumps( query_data ),
headers = {"Content-Type": "application/json"} ).json()["response"]
## We get the content of the top 5 paper
document_contents = requests.post( api_gateway_address+ "/ml-api/get-papers/v1.0",
data = json.dumps( {
"paper_list":searched_document_ids[:5],
} ),
headers = {"Content-Type": "application/json"} ).json()["response"]
## Generate citation sentences:
citation_sentences = requests.post( api_gateway_address+ "/ml-api/generate-citation/v1.0",
data = json.dumps( {
## we use the ranking variable also as context, as an example
"context_list":[ranking_variable for _ in range(len(document_contents))],
## we use the ranking keywords also as the keywords to guide generation, but this can be changed to other keywords
"keywords_list":[keywords for _ in range(len(document_contents))],
"papers":document_contents
} ),
headers = {"Content-Type": "application/json"} ).json()["response"]
citation_sentences
['However, the treatment of Covid-19 has been largely based on a lack of knowledge, skills, and experience [ #CIT ].',
'However, there is still a lack of data on the effects of Covid-19 vaccination on shoulder injury [ #CIT ].',
'IL-5 is a well-known anti-inflammatory cytokine that plays an important role in the treatment of COVID-19 [ #CIT ].',
'For example, a large reservoir-based continuous positive airway pressure (CPAP) has been developed for the treatment of COVID-19 [ #CIT ].',
'For example, the treatment of Covid-19 has been mainly based on the use of a variety of drugs [ #CIT ].']
api_gateway_address + "/ml-api/citation-formatting-service/v1.0"
'http://localhost:8060/ml-api/citation-formatting-service/v1.0'
-
Function: Given a list of paper ids, return the formatted citation information (e.g. in bibtex or mla format) of the input papers
-
Input:
a) "paper_list":
mandatory
a list of papers' ids: Format: [ { "collection":"PMCOA", "id_field": "id_int", "id_value": 123 }, ... ]
(NOTE: This has to be a list even if there is only one paper)b) "username": same as doc-serach api
-
Output: A list of papers' citation entries, each citation entry is a dictionary that contains two keys "bibtex" and "mla". The values of these two keys represent two different format of the citation information
paper_ids = [
{"collection":"PMCOA","id_field":"id_int", "id_value":1},
{"collection":"PMCOA","id_field":"id_int", "id_value":2},
]
requests.post( api_gateway_address +"/ml-api/citation-formatting-service/v1.0",
data = json.dumps(
{
"paper_list":paper_ids
}
),
headers = {"Content-Type": "application/json"} ).json()
{'response': [{'bibtex': '@article{2022, paperIDInfo={collection:PMCOA, id_value:1, id_type:int}, title={“Bringing you the Best”: John Player & Sons, Cricket, and the Politics of Tobacco Sport Sponsorship in Britain, 1969–1986}, volume={80}, ISSN={2666-7711}, url={http://dx.doi.org/10.1163/26667711-bja10022}, DOI={10.1163/26667711-bja10022}, number={1}, journal={European Journal for the History of Medicine and Health}, publisher={Brill}, author={O’Neill, Daniel and Greenwood, Anna}, year={2022}, month={Aug}, pages={152–184} }',
'mla': 'O’Neill, Daniel, and Anna Greenwood. “‘Bringing You the Best’: John Player & Sons, Cricket, and the Politics of Tobacco Sport Sponsorship in Britain, 1969–1986.” European Journal for the History of Medicine and Health, vol. 80, no. 1, Aug. 2022, pp. 152–84. Crossref, https://doi.org/10.1163/26667711-bja10022.'},
{'bibtex': '@article{2022, paperIDInfo={collection:PMCOA, id_value:2, id_type:int}, title={Acetylcholine Boosts Dendritic NMDA Spikes in a CA3 Pyramidal Neuron Model}, volume={489}, ISSN={0306-4522}, url={http://dx.doi.org/10.1016/j.neuroscience.2021.11.014}, DOI={10.1016/j.neuroscience.2021.11.014}, journal={Neuroscience}, publisher={Elsevier BV}, author={Humphries, Rachel and Mellor, Jack R. and O’Donnell, Cian}, year={2022}, month={May}, pages={69–83} }',
'mla': 'Humphries, Rachel, et al. “Acetylcholine Boosts Dendritic NMDA Spikes in a CA3 Pyramidal Neuron Model.” Neuroscience, vol. 489, May 2022, pp. 69–83. Crossref, https://doi.org/10.1016/j.neuroscience.2021.11.014.'}]}
api_gateway_address+"/ml-api/title-generic-search/v1.0"
'http://localhost:8060/ml-api/title-generic-search/v1.0'
-
Function: Given a list of titles, find the corresponding papers from our database.
-
Input:
a) "titles":-
"titles" can be a list of strings. In this case, we assume each string is the title of a paper;
-
"titles" can also be a list of dictionaries, and each dictionary contains the metadata (Title and Author) of a paper, among the metadata, the key "Title" is mandatory, while "Author" is not mandantory but recommended. For example:
[ {"Title":"bird song ...","Author":[{"GivenName":"Tom", "FamilyName":"Lee"}]}, {"Title": "machine learning"} ]
b) "projection": same as pagination api
optional. Default: None (return all the content of the found paper)
"projection" specifies which field to return as usual. If "projection" is set to None, all the content of the found paper will be returned. -
-
Output: A list of papers, each of which has structured information confined by the "projection". If a paper is not found, then the value of the key "found" is False. If the paper is found, then "found"=True, and "collection","id_field","id_type" and "id_value" will be returned apart from the items defined by the projection
- This API returns a list of papers, each paper is a dictionary. Given a title/metadata, if we find the paper, we will return the id information (collection, id_field, id_type, id_value), and in the dictionary, the value of "found" will be True. Moreover, the paper's content will also be included in the returned results, determined by the "projection" given when calling the API.
- If we do not find the paper, in the returned dictionary the value of "found" is False.
query_data = {
"titles":[
## Either is okay: 1) a title's string or 2) a dictionary with "Title" (and "Author") as keys.
"A rapid cell-free expression and screening platform for antibody discovery ",
{"Title":"Adaptive GLCM sampling for transformer-based COVID-19 detection on CT ",
"Author": [{'GivenName': 'Jung', 'FamilyName': 'Okchul'}]
},
],
"projection": {}
}
searched_papers = requests.post( api_gateway_address+"/ml-api/title-generic-search/v1.0",
data = json.dumps( query_data ),
headers = {"Content-Type": "application/json"} ).json()["response"]
searched_papers
[{'Title': 'A rapid cell-free expression and screening platform for antibody discovery ',
'Author': [],
'First_Author': '',
'found': True,
'collection': 'PMCOA',
'id_field': 'id_int',
'id_type': 'int',
'id_value': 251,
'_id': 'PMCOA_251'},
{'Title': 'Adaptive GLCM sampling for transformer-based COVID-19 detection on CT ',
'Author': [{'GivenName': 'Jung', 'FamilyName': 'Okchul'}],
'First_Author': 'Jung Okchul',
'found': False}]