This project implements a scalable, cloud-native ETL (Extract, Transform, Load) pipeline that aggregates, enriches, and persists historical music chart data covering more than six decades (1965–present). It combines traditional web scraping with generative AI (LLMs) and third-party APIs to build a comprehensive metadata repository. The design emphasizes high availability, cost optimization through caching, and the processing of unstructured data.
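
As a rough illustration of the caching idea mentioned above, the sketch below checks a cache before calling a paid enrichment source (an LLM or third-party API), so repeated chart entries for the same song cost only one call. The names `CachedEnricher` and `fake_fetch`, the in-memory dictionary, and the `(artist, title)` key are all assumptions made for this example; the project's actual cache backend and key scheme are not described here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: a normalized (artist, title) pair.
Key = Tuple[str, str]


class CachedEnricher:
    """Wraps an expensive metadata lookup with a simple in-memory cache."""

    def __init__(self, fetch: Callable[[str, str], dict]) -> None:
        self._fetch = fetch              # the costly call (LLM or external API)
        self._cache: Dict[Key, dict] = {}
        self.calls = 0                   # number of times the costly call ran

    def enrich(self, artist: str, title: str) -> dict:
        # Normalize so trivially different spellings share one cache entry.
        key = (artist.strip().lower(), title.strip().lower())
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._fetch(artist, title)
        return self._cache[key]


def fake_fetch(artist: str, title: str) -> dict:
    # Stand-in for a real enrichment source; returns placeholder metadata.
    return {"artist": artist, "title": title, "genre": "unknown"}


if __name__ == "__main__":
    enricher = CachedEnricher(fake_fetch)
    enricher.enrich("The Beatles", "Yesterday")
    enricher.enrich(" the beatles ", "YESTERDAY")  # served from the cache
    print(enricher.calls)
```

A song that stays on the chart for many weeks appears in many scraped rows, which is why a lookup keyed on the song rather than on the chart row can reduce the number of external calls. In a deployed pipeline the dictionary would presumably be replaced by a persistent store.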