OpenAlex · Updated hourly · Last updated: 15 May 2026, 07:33

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Fine-tuning small and open LLMs to automate geoscience data analysis workflows: A scalable approach

2025 · 2 citations · Applied Computing and Geosciences · Open Access

Citations: 2 · Authors: 6 · Year: 2025

Abstract

With the recent integration of Large Language Models (LLMs) into geoscience applications, agentic LLM-driven workflows have emerged as an innovative approach to streamline automated data analysis processes. Advanced proprietary LLMs like ChatGPT demonstrate strong performance in customized workflows due to their substantial computational resources and extensive pretraining on diverse datasets. However, deploying such workflows with commercial LLMs can incur significant costs, especially in terms of token consumption, necessitating a shift toward open-source models. In this study, we fine-tuned an open-source LLM (Llama 3.1) to handle geoscience data analysis tasks, leveraging the self-instruct method to generate synthetic training datasets. The proposed pipeline for designing LLM-driven workflows and fine-tuning open-source models using synthetic datasets enables scalability, allowing the integration of additional LLM agents to accommodate more complex tasks. Furthermore, this workflow serves as a template for researchers in other domains to develop similar solutions tailored to their specific needs. Our experimental evaluation compares the performance of ChatGPT-4o with the fine-tuned Llama 3.1 in the context of the proposed geoscience data analysis workflow. Results demonstrate that the fine-tuned open-source model achieves performance comparable to proprietary models, extending the applicability of open LLMs to domain-specific agentic workflows in data analysis.

• Present a workflow of using open LLMs to automate geoscience data analysis.
• The fine-tuned open LLM shows performance comparable to commercial models for the domain-specific tasks.
• Deploy a quick technique of generating synthetic dataset for LLM fine-tuning.
• The open LLM-driven workflow is scalable and can be adapted to many topics.

Topics

Topic Modeling · Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education