This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

OpenResume: Advancing Career Trajectory Modeling with Anonymized and Synthetic Resume Datasets

2024 · 2 citations · Open Access

Abstract

Despite substantial advancements in various fields of AI, computational research in career and job domains has been significantly hindered by a critical lack of accessible datasets. This limitation is mainly due to the proprietary nature of job platforms, which restrict the sharing of job-domain datasets with the research community. The scarcity is particularly pronounced for career trajectory and resume datasets, severely constraining academic researchers in developing and evaluating new models. In this paper, we address the unavailability of resume datasets in the job domain, a gap identified through our comprehensive comparison of existing job-domain machine learning studies. We introduce OpenResume, which is, to the best of our knowledge, the first publicly available, anonymized, and structured resume dataset designed specifically for job-domain downstream tasks. This dataset aims to catalyze advancements in AI and foster new markets for machine learning and data science within career trajectory modeling. OpenResume is comprehensively processed from real-world resume data. We anonymize and substitute personal identifiers and company names, normalize job titles to titles from ESCO (one of the most common occupation taxonomies), and apply differential privacy techniques to temporal features to ensure open accessibility and privacy protection. Additionally, we augment OpenResume with a synthetically generated resume dataset derived from the post-processed real-world data, extending its diversity and utility. To demonstrate that OpenResume retains challenges and properties similar to real-world job datasets, we benchmark state-of-the-art job-domain prediction models on OpenResume across four prevalent downstream tasks: (1) next job title prediction, (2) next company prediction, (3) turnover prediction, and (4) link prediction. Our experimental results show that these job-domain models perform comparably on OpenResume and the original data across all tasks, supporting OpenResume as a valuable career trajectory dataset for both academic research and practical applications. We also outline OpenResume's applicability to eight other downstream tasks. Our datasets are available at: https://tinyurl.com/OpenResumeData.

Topics

Machine Learning in Healthcare · Privacy-Preserving Technologies in Data · Artificial Intelligence in Healthcare and Education
