刘耀文

Python script: batch-fetching BibTeX entries

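Motivation: when you write a paper in LaTeX, you often need to convert your references into BibTeX format. With a large bibliography, this repetitive work is not worth doing by hand. If you already use a reference manager such as EndNote or Zotero, you can export everything with one click; if you don't, this post offers a solution.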

Approach: Crossref API + Google Scholar API

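Crossref is the largest DOI registration platform for international (non-Chinese) literature and holds metadata for almost all of it. Some works, however, including but not limited to arXiv preprints, cannot be found there (arXiv's DOIs are registered with DataCite rather than Crossref). In those cases Google Scholar fills the gap.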
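For readers curious what a Crossref lookup involves, here is a minimal sketch using only the standard library; it is not the `get_bibtex` package's code. It assumes Crossref's public REST endpoint `https://api.crossref.org/works` with its `query.bibliographic` (free-text citation search) and `mailto` (identifies the caller for the "polite" pool) parameters. The function names and the score threshold of 50 are hypothetical choices for illustration, not Crossref rules.

```python
from typing import Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(reference: str, email: str, rows: int = 1) -> str:
    """Build a Crossref search URL for one free-text reference line.

    query.bibliographic matches against a whole citation string;
    mailto tells Crossref who is calling (its "polite" pool).
    """
    params = {"query.bibliographic": reference, "rows": rows, "mailto": email}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def best_doi(response: dict, min_score: float = 50.0) -> Optional[str]:
    """Return the DOI of the top hit, or None if nothing matches well enough.

    The min_score cut-off is an arbitrary, hypothetical heuristic.
    """
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    return top["DOI"] if top.get("score", 0) >= min_score else None
```

To use it, fetch the URL (e.g. with `urllib.request.urlopen`), decode the body with `json.load`, and pass the resulting dict to `best_doi`; a score cut-off keeps a weak fuzzy match from silently returning the wrong paper.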

To save you time, I have wrapped both APIs into a package; just install it with pip:

pip install get_bibtex

Then use it as follows:

import os

from apiModels.get_bibtex_from_crossref import GetBibTex
from apiModels.get_bibtex_from_google_scholar import GetBibTexFromGoogleScholar

if __name__ == '__main__':
    google_scholar_api_key = "your_google_scholar_api_key"
    # replace the placeholder with your own email address
    get_bibtex_from_crossref = GetBibTex("your_email@example.com")
    get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(google_scholar_api_key, GetBibTexFromGoogleScholar.APA)

    # one reference per line; strip newlines and skip blank lines
    with open("inputfile/Bibliographyraw.txt", "r", encoding='utf-8') as f:
        raws = [line.strip() for line in f if line.strip()]

    # get BibTeX from Crossref, plus the references it could not find
    success_bibtexs_crossref, failed_results = get_bibtex_from_crossref.get_bibtexs(raws)

    # retry each failed reference with Google Scholar
    success_bibtexs_google, failed_results = get_bibtex_from_google_scholar.get_bibtexs(failed_results)

    # make sure the output directory exists before writing
    os.makedirs("outputfile", exist_ok=True)

    with open("outputfile/BibliographyCrossRef.txt", "w", encoding='utf-8') as f:
        for bibtex in success_bibtexs_crossref:
            f.write(bibtex + "\n")

    with open("outputfile/BibliographyGoogleScholar.txt", "w", encoding='utf-8') as f:
        for index, bibtex in enumerate(success_bibtexs_google):
            # number each entry: "[0] ...", "[1] ...", ...
            f.write(f"[{index}] {bibtex}\n")

    with open("outputfile/not_find.txt", "w", encoding='utf-8') as f:
        for result in failed_results:
            f.write(result + "\n")

    print("found via Crossref:", len(success_bibtexs_crossref))
    print("found via Google Scholar:", len(success_bibtexs_google))
    print("not found:", len(failed_results))

Key code explained

Bibliographyraw.txt contains the references to look up, one per line. For example:
J. Hu, L. Shen, S. Albanie, G. Sun, and A. Vedaldi, “Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks.” arXiv, Jan. 12, 2019. doi: 10.48550/arXiv.1810.12348.
X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local Neural Networks.” arXiv, Apr. 13, 2018. doi: 10.48550/arXiv.1711.07971.
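Both sample lines already carry a DOI. When one is present, you can skip the search step and ask the DOI resolver for BibTeX directly via DOI content negotiation at `doi.org`, which Crossref and DataCite (and therefore arXiv DOIs) support. The sketch below is a hypothetical helper, not part of `get_bibtex`; the regex is a common approximation of DOI syntax, not a complete validator.

```python
import re
import urllib.request
from typing import Optional

# Approximate DOI pattern: "10." + registrant code + "/" + non-space suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")


def extract_doi(reference: str) -> Optional[str]:
    """Return the DOI embedded in a reference line, or None."""
    match = DOI_RE.search(reference)
    # Trailing punctuation usually ends the citation, not the DOI.
    return match.group(0).rstrip(".,;") if match else None


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a doi.org content-negotiation request that asks for BibTeX."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )
```

Sending the request with `urllib.request.urlopen(bibtex_request(doi))` and decoding the body as UTF-8 should yield a BibTeX entry; lines without a DOI still need a search service such as the ones the package wraps.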
------------------
success_bibtexs_crossref, failed_results = get_bibtex_from_crossref.get_bibtexs(raws)
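It returns two values: the first is a list of BibTeX entries, the second is a list of the original references that could not be found.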
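The `(found, failed)` pair is what lets one service hand its misses to the next. If you write your own single-reference lookup, a batch wrapper with the same return shape takes only a few lines. This is a minimal sketch; `lookup_all` and the `lookup` callable (returning a BibTeX string or `None`) are hypothetical, not the package's API.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def lookup_all(
    references: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[List[str], List[str]]:
    """Run a single-reference lookup over many references.

    Returns (found, failed): BibTeX strings for the hits, and the
    original reference lines the lookup could not resolve.
    """
    found: List[str] = []
    failed: List[str] = []
    for ref in references:
        ref = ref.strip()
        if not ref:
            continue  # ignore blank lines from the input file
        result = lookup(ref)
        if result:
            found.append(result)
        else:
            failed.append(ref)
    return found, failed
```

Keeping failures as the original text (rather than, say, indices) means the list can be written straight to a "not found" file or fed to another service unchanged.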

success_bibtexs_google, failed_results = get_bibtex_from_google_scholar.get_bibtexs(failed_results)
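References that Crossref could not find are then passed to the Google Scholar API. In practice it finds almost all of them, since very few papers are missing from Google Scholar.
Note: this actually returns APA-formatted citations, as selected by the second argument at initialization. There is also a parameter that makes it return BibTeX instead, for example: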
get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(google_scholar_api_key, GetBibTexFromGoogleScholar.APA, flag=True)
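Why APA by default? Assuming the package uses SerpApi's Google Scholar Cite endpoint (`engine=google_scholar_cite`), the response carries formatted citations inline under `citations` (each with a `title` such as "APA" and a `snippet`), while BibTeX is only offered as a download link under `links`. That link points at a Google Scholar host, which would plausibly explain the proxy requirement for BibTeX; this is an inference, not something the package documents. The helpers below are hypothetical and parse that assumed response shape.

```python
from typing import Optional


def pick_citation(cite_response: dict, style: str = "APA") -> Optional[str]:
    """Return the formatted citation text for `style` (e.g. "APA", "MLA")."""
    for item in cite_response.get("citations", []):
        if item.get("title") == style:
            return item.get("snippet")
    return None


def bibtex_link(cite_response: dict) -> Optional[str]:
    """Return the URL of the BibTeX export, if the response lists one."""
    for link in cite_response.get("links", []):
        if link.get("name") == "BibTeX":
            return link.get("link")
    return None
```

The APA text needs no second request, whereas the BibTeX route costs an extra download from the linked host.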
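Setting flag=True, however, requires a proxy server, for example:
import os

# route HTTP and HTTPS traffic through a local proxy (adjust host/port to your setup)
os.environ["http_proxy"] = "http://127.0.0.1:7890"
os.environ["https_proxy"] = "http://127.0.0.1:7890"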

Important: you must first apply for an API key at serpapi.com. The free plan includes 100 searches per month, which is usually enough.
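With only 100 free searches a month, it pays not to spend a search twice on the same reference. One way is a small wrapper that caches results by normalized reference text and refuses to exceed a budget. This is a hypothetical sketch (`QuotaGuard` is not part of the package); the default budget mirrors the free-tier figure quoted above.

```python
from typing import Callable, Dict, Optional


class QuotaGuard:
    """Wrap a metered lookup: serve repeats from a cache and cap real calls."""

    def __init__(self, lookup: Callable[[str], Optional[str]], budget: int = 100):
        self.lookup = lookup
        self.budget = budget
        self.calls = 0
        # Misses (None) are cached too, so a known failure costs nothing.
        self.cache: Dict[str, Optional[str]] = {}

    def __call__(self, reference: str) -> Optional[str]:
        key = " ".join(reference.split()).lower()  # normalize whitespace and case
        if key in self.cache:
            return self.cache[key]
        if self.calls >= self.budget:
            raise RuntimeError(f"search budget of {self.budget} exhausted")
        self.calls += 1
        result = self.lookup(reference)
        self.cache[key] = result
        return result
```

Raising an error instead of silently returning `None` makes it obvious that the remaining references were skipped for quota reasons rather than not found.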

The rest of the code should be self-explanatory :)

There is also a single-reference version: just drop the trailing s and call get_bibtex().

Update (2024.4.16)

1. Added a DBLP interface:

from apiModels.get_bibtex_from_dblp import GetBibTexFromDBLP
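DBLP covers computer-science venues well and exposes a public search API. Assuming its documented endpoint `https://dblp.org/search/publ/api` (parameters `q`, `format`, `h` for the maximum number of hits) and the `/rec/<key>.bib` export URL, a lookup can be sketched as below. This is not the package's `GetBibTexFromDBLP` code; the helper names and the record key in the test are hypothetical.

```python
from typing import List
from urllib.parse import quote, urlencode


def dblp_search_url(title: str, max_hits: int = 3) -> str:
    """Build a DBLP publication-search URL that returns JSON."""
    params = {"q": title, "format": "json", "h": max_hits}
    return "https://dblp.org/search/publ/api?" + urlencode(params)


def dblp_keys(response: dict) -> List[str]:
    """Extract DBLP record keys from a JSON search response.

    With zero results DBLP omits the "hit" list, hence the defaults.
    """
    hits = response.get("result", {}).get("hits", {}).get("hit", [])
    return [hit["info"]["key"] for hit in hits if "key" in hit.get("info", {})]


def dblp_bibtex_url(key: str) -> str:
    """Each DBLP record offers its BibTeX at /rec/<key>.bib."""
    return f"https://dblp.org/rec/{quote(key)}.bib"
```

Searching by title alone tends to work better on DBLP than pasting the whole citation, because its search matches words rather than parsing author lists.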

2. Easier to use

A ready-made class is now provided that wraps both the Crossref and DBLP APIs:
from apiModels.workflow.crossref2dblp import Crossref2Dblp

Usage (without a Google Scholar API key):
crossref2dblp = Crossref2Dblp("your email", "inputfile/Bibliographyraw.txt", "outputfile/Bibliography.txt")
crossref2dblp.running()
Then just wait for it to finish.

Usage (with a Google Scholar API key):
from apiModels.workflow.crossref2dblp import Crossref2Dblp
from apiModels.get_bibtex_from_google_scholar import GetBibTexFromGoogleScholar
get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(api_key="your api key")
Pass the Google Scholar client you created as the last argument:
crossref2dblp = Crossref2Dblp("your_email@example.com", "inputfile/Bibliographyraw.txt", "outputfile/Bibliography.txt", get_bibtex_from_google_scholar)
crossref2dblp.running()
Then just wait for it to finish.

Or, if you want to define the order in which the APIs are called yourself:
from apiModels.workflow.make_workflow import MakeWorkflow
from apiModels.get_bibtex_from_google_scholar import GetBibTexFromGoogleScholar
from apiModels.get_bibtex_from_crossref import GetBibTex

get_bibtex_from_google_scholar = GetBibTexFromGoogleScholar(api_key="your api key")
get_bibtex_from_crossref = GetBibTex("[email protected]")
make_workflow = MakeWorkflow("inputfile/Bibliographyraw.txt", "outputfile/Bibliography.txt", get_bibtex_from_google_scholar, get_bibtex_from_crossref)
make_workflow.running()
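Conceptually, a workflow like this chains services so that each one only sees the references the previous ones missed, just as the first example in this post does by hand. Below is a minimal sketch of that idea, not the package's `MakeWorkflow` implementation; `run_cascade` and `Source` are hypothetical names, and each source is assumed to have the same `(found, failed)` shape as `get_bibtexs`.

```python
from typing import Callable, List, Sequence, Tuple

# A source takes reference lines and returns (found_bibtex, still_missing),
# mirroring the return shape of get_bibtexs described earlier.
Source = Callable[[List[str]], Tuple[List[str], List[str]]]


def run_cascade(
    references: List[str], sources: Sequence[Source]
) -> Tuple[List[str], List[str]]:
    """Query sources in order, passing only the misses to the next one.

    Returns (all_found, still_missing).
    """
    found: List[str] = []
    pending = list(references)
    for source in sources:
        if not pending:
            break  # everything resolved; don't call (possibly metered) sources
        hits, pending = source(pending)
        found.extend(hits)
    return found, pending
```

Order matters for cost: putting free services such as Crossref and DBLP first means the metered Google Scholar API only sees what they could not resolve, whereas the `MakeWorkflow` example above queries Google Scholar first.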

Before using these features, install version 1.1.0:
pip install get_bibtex==1.1.0

Improvements are welcome.

This post was synced to xLog by Mix Space.
Original link: https://me.liuyaowen.club/posts/default/20240816and2

