LLMs | FinDKG: Translation and Commentary on "FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets"
Overview: This paper proposes FinDKG, a method that uses large language models to construct a dynamic knowledge graph (DKG) for the financial domain and applies it to forecasting financial market trends. By combining ICKG and KGTransformer, the approach achieves strong results in automated knowledge graph construction, link prediction, and thematic investing, offering a new direction for research and applications in financial technology.
>> Background and pain points:
● Complexity of financial data: financial market information is scattered across unstructured data sources (e.g., news articles), making the relational information they contain hard to extract and exploit effectively.
● Limitations of dynamic knowledge graphs: existing DKG construction methods are inefficient, struggle with large-scale financial data, and lack effective modelling of finance-specific relations.
● Underuse of LLMs in finance: although large language models (LLMs) excel at natural language processing, their application to dynamic knowledge graph construction and financial trend forecasting is still at an early stage, and models specialised for the financial domain are lacking.
>> Proposed solutions: the paper introduces two core components:
● ICKG (Integrated Contextual Knowledge Graph Generator): an open-source fine-tuned LLM that extracts entities, relations, and timestamps from financial news articles to construct a dynamic knowledge graph.
● KGTransformer: an attention-based graph neural network (GNN) architecture for analysing FinDKG, used for link prediction and thematic investing analysis. KGTransformer effectively exploits meta-entity information (entity categories) to improve model performance.
>> Core workflow: the FinDKG system is built as follows:
● Data collection: gather a large corpus of financial news articles.
● Knowledge graph generation: use the ICKG model to extract entities, relations, timestamps, and entity categories from the articles, forming event quintuples (subject, relation, object, time, entity category).
● Entity disambiguation: disambiguate the extracted entities using Sentence-BERT.
● Dynamic knowledge graph construction: assemble the extracted event quintuples into FinDKG.
● Graph learning: analyse FinDKG with the KGTransformer model for link prediction and thematic investing. KGTransformer uses temporally evolving embeddings and structural embeddings to model the temporal and structural characteristics of the DKG.
● Trend identification and thematic investing: identify financial trends by analysing graph centrality metrics in FinDKG, and use KGTransformer to design and evaluate thematic investment strategies.
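Steps 2–4 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the alias table stands in for Sentence-BERT similarity matching, and the sample events are invented.

```python
from collections import defaultdict
from datetime import date

# Toy alias table standing in for Sentence-BERT disambiguation: in the
# real pipeline, entity mentions are merged when their sentence
# embeddings are sufficiently similar.
ALIASES = {"Open AI": "OpenAI", "Alphabet Inc.": "Google"}

def canonicalise(entity):
    """Map an extracted entity mention to its canonical name."""
    return ALIASES.get(entity, entity)

def build_dkg(quintuples):
    """Assemble disambiguated (subject, relation, object, time,
    categories) events into time-indexed snapshots of the DKG."""
    snapshots = defaultdict(set)
    for s, r, o, t, cats in quintuples:
        snapshots[t].add((canonicalise(s), r, canonicalise(o), cats))
    return dict(snapshots)

# Invented sample output of the ICKG extraction step.
events = [
    ("Open AI", "Invent", "ChatGPT",
     date(2022, 11, 30), ("Company", "Product")),
    ("OpenAI", "Partner with", "Microsoft",
     date(2023, 1, 23), ("Company", "Company")),
]
dkg = build_dkg(events)
# Both mentions of the company now share the canonical name "OpenAI".
```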
>> Advantages:
● Automated knowledge graph construction: the ICKG model automates the construction of financial knowledge graphs, improving efficiency and scalability.
● An efficient graph neural network architecture: KGTransformer effectively combines attention mechanisms, meta-entity information, and temporal information, improving link prediction accuracy.
● Open source: both the ICKG model and the FinDKG dataset are open source, making it easy for researchers to reproduce and extend the work.
● Applicable to thematic investing: FinDKG and KGTransformer can be used for thematic investing, achieving returns that outperform existing thematic ETFs.
>> Conclusions and takeaways:
● KGTransformer excels at link prediction, achieving over a 10% performance uplift on the FinDKG dataset.
● FinDKG successfully captures major trends in financial news, such as the impact of the COVID-19 pandemic.
● The FinDKG-based thematic investment strategy (FinDKG-AI) outperformed both market benchmarks and existing AI-themed ETFs on AI thematic investing.
● The study demonstrates the potential of large language models for constructing financial dynamic knowledge graphs and forecasting financial market trends.
Translation and Commentary on "FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets"
Paper: https:///abs/2407.10909 | Published: July 15, 2024 (updated October 15, 2024) | Authors: Imperial College London
Abstract
Dynamic knowledge graphs (DKGs) are popular structures to express different types of connections between objects over time. They can also serve as an efficient mathematical tool to represent information extracted from complex unstructured data sources, such as text or images. Within financial applications, DKGs could be used to detect trends for strategic thematic investing, based on information obtained from financial news articles. In this work, we explore the properties of large language models (LLMs) as dynamic knowledge graph generators, proposing a novel open-source fine-tuned LLM for this purpose, called the Integrated Contextual Knowledge Graph Generator (ICKG). We use ICKG to produce a novel open-source DKG from a corpus of financial news articles, called FinDKG, and we propose an attention-based GNN architecture for analysing it, called KGTransformer. We test the performance of the proposed model on benchmark datasets and FinDKG, demonstrating superior performance on link prediction tasks. Additionally, we evaluate the performance of KGTransformer on FinDKG for thematic investing, showing it can outperform existing thematic ETFs.

Keywords: dynamic knowledge graphs, graph attention networks, graph neural networks, graph transformers, large language models.
1. Introduction
A knowledge graph (KG) is a data structure that encodes information consisting of entities and different types of relations between them. Formally, a KG can be represented as G = {E, R, F}, where E and R denote the sets of entities and relations respectively, and F ⊆ E × R × E represents a set of facts, consisting of relations of different types between entities. The triplet (s, r, o) ∈ F is the fundamental building block of a KG, where s ∈ E represents the source entity, r ∈ R the relation, and o ∈ E the object entity. For instance, the triplet (OpenAI, Invent, ChatGPT) shows how entities and relations combine to form a fact, with OpenAI and ChatGPT as entities and Invent as the relation. Temporal or dynamic knowledge graphs (DKGs) extend static KGs by incorporating temporal dynamics. Each fact in a DKG is associated with a timestamp t ∈ ℝ₊, allowing the model to capture the temporal evolution of events. Therefore, events occur as quadruples (s_i, r_i, o_i, t_i) ∈ E × R × E × ℝ₊, where t_i is the event time, such that t_i ≤ t_j for i < j, i, j ∈ ℕ. Then, the DKG G_t = (E, R, F_t) at time t can be expressed via a time-varying set of facts F_t defined as

(1)  F_t = {(s_i, r_i, o_i, t_i) : s_i, o_i ∈ E, r_i ∈ R, t_i < t}.

The task of estimating a model for G_t from observed data is called dynamic knowledge graph learning. This typically involves data-driven training of graph neural networks, designed to model both the structure and the temporal dynamics of the KG over time.
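Equation (1) translates directly into code: the graph at time t consists of all facts observed strictly before t. A minimal sketch with invented event quadruples:

```python
from datetime import date

# Event quadruples (s_i, r_i, o_i, t_i), assumed ordered in time.
events = [
    ("OpenAI", "Invent", "ChatGPT", date(2022, 11, 30)),
    ("Fed", "Raise", "Interest rates", date(2023, 3, 22)),
]

def facts_at(events, t):
    """F_t = {(s_i, r_i, o_i, t_i) : t_i < t}, cf. equation (1)."""
    return [e for e in events if e[3] < t]

# At the start of 2023 only the first event has occurred.
print(len(facts_at(events, date(2023, 1, 1))))  # 1
```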
In real-world applications such as finance, entities and relations can be further grouped into categories, often called meta-entities. For example, consider the relation between the entity Jeff Bezos, which is of type Person, and the entity Amazon, which is of type Company. The relation between them is Founder Of, which could be considered to have the type Business action. In this work, inspired by heterogeneous graph transformers (HGT, Hu et al., 2020), we discuss a way to introduce the additional meta-entity information within a dynamic knowledge graph learning procedure based on graph attention networks (GAT, Veličković et al., 2017) and EvoKG (Park et al., 2022). This results in the Knowledge Graph Transformer (KGTransformer), an attention-based graph neural network (GNN) designed to create dynamic lower-dimensional representations of entities and relations. In addition to DKGs, Large Language Models (LLMs) have also been gaining popularity recently within the financial sector, demonstrating potential in enhancing various financial tasks through advanced natural language processing (NLP) capabilities (Nie et al., 2024).
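To illustrate how meta-entities can enter an attention mechanism, the sketch below computes HGT-style single-head attention scores in which each node's type (Person, Company, …) selects its own projection matrices, so a Person→Company edge is scored with different weights than a Company→Company edge. This is a simplified illustration of the idea, not the paper's KGTransformer; the dimensions, weight matrices, and node features are randomly invented.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension

# Per-type projection matrices: the meta-entity (type) of each node
# selects which weights are used, as in heterogeneous graph transformers.
types = ["Person", "Company"]
W_q = {t: rng.normal(size=(d, d)) for t in types}
W_k = {t: rng.normal(size=(d, d)) for t in types}

def attention_score(h_src, t_src, h_dst, t_dst):
    """Scaled dot-product score between a typed source and target node."""
    q = W_q[t_dst] @ h_dst
    k = W_k[t_src] @ h_src
    return float(q @ k) / np.sqrt(d)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# One target node ("Amazon", a Company) attending over two typed
# neighbours: a Person ("Jeff Bezos") and a Company ("AWS").
h = {"Jeff Bezos": rng.normal(size=d),
     "AWS": rng.normal(size=d),
     "Amazon": rng.normal(size=d)}
scores = [attention_score(h["Jeff Bezos"], "Person", h["Amazon"], "Company"),
          attention_score(h["AWS"], "Company", h["Amazon"], "Company")]
weights = softmax(np.array(scores))
assert np.isclose(weights.sum(), 1.0)  # attention weights are normalised
```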
Popular models such as BERT, the GPT series, and financial-specific variants such as FinBERT (Araci, 2019) and FinGPT (Yang et al., 2023) leverage LLMs to improve the state of the art in tasks such as financial sentiment analysis.

The application of LLMs to dynamic knowledge graphs has so far been limited in the literature. Therefore, one of the main contributions of this work is to also propose a pipeline for generative knowledge graph construction (KGC) via Large Language Models (LLMs), resulting in the Integrated Contextual Knowledge Graph Generator (ICKG) large language model. In particular, we develop a fine-tuned LLM that systematically extracts entities and relationships from textual data via engineered input queries or "prompts", subsequently assembling them into event quadruples of the same form as (1). We use the proposed ICKG LLM to generate an open-source financial knowledge graph dataset, called FinDKG. In summary, our contributions in this work are threefold:

(1) We propose KGTransformer, an attention-based GNN architecture for dynamic knowledge graph learning that includes information about meta-entities (cf. Section 4), combining existing work on GATs (Veličković et al., 2017), HGTs (Hu et al., 2020) and EvoKG (Park et al., 2022). We demonstrate substantial improvements in link prediction metrics (cf. Section 5.1) on real-world DKGs.
(2) We develop an open-source LLM for dynamic knowledge graph generation for finance, called the Integrated Contextual Knowledge Graph Generator (ICKG, cf. Section 3).
(3) We utilise ICKG to create an open-source dynamic knowledge graph based on financial news articles, called FinDKG (cf. Section 3.1). FinDKG is used for thematic investing capitalising on the AI trend, improving upon other AI-themed portfolios (cf. Section 5.3).

The remainder of this work is organised as follows: Section 2 discusses related literature. Next, Sections 3 and 4 discuss the main contributions of our work: ICKG and KGTransformer. Finally, Section 5 discusses applications on real-world DKGs.
Conclusion
In this work, we provided three contributions around the use of dynamic knowledge graphs (DKGs) and large language models (LLMs) within financial applications. First, we investigated the performance of fine-tuned open-source LLMs in generating knowledge graphs, proposing the novel open-source Integrated Contextual Knowledge Graph Generator (ICKG) LLM. Next, the ICKG LLM is used to create an open-source dataset from a corpus of financial news articles, called FinDKG. Additionally, we proposed an attention-based architecture called KGTransformer, which incorporates information from meta-entities within the learning process, combining architectures such as HGT (Hu et al., 2020) and EvoKG (Park et al., 2022).

Our findings show that the proposed KGTransformer architecture improves the state-of-the-art link prediction performance on two benchmark datasets, and it achieves the best performance with over 10% uplift on FinDKG. The generalisability of the ICKG LLM extends beyond financial news and the financial domain, as evidenced by applications in the recent literature adopting similar frameworks (Sarmah et al., 2024; Ouyang et al., 2024). Code associated with this work can be found in the GitHub repository xiaohui-victor-li/FinDKG, and an online portal to visualise FinDKG is available at https://xiaohui-victor-li./FinDKG/.