Accepted Papers: 6th International Conference on Cloud and Big Data (CLBD 2025)

Accepted Papers

Leveraging AI to Reduce Technical DEBT

Vijay Pahuja and Vishal Padh, Technical Debt, AI, Software Engineering, Development Tools United States of America

ABSTRACT

Technical Debt is one of the biggest issues hindering the digital transformation of organizations. Cost of addressing debt has been rising. AI powered tools can overcome problems of traditional tools as they continuously learn and adapt new patterns. They can proactively detect issues, suggest refactoring, and provide insight to areas of improvement in the codebase, pushing for more sustainable software development practices. While AI offers tremendous potential for managing and reducing technical debt, AI based tools come with their own challenges as AI is heavily dependent on the quality and quantity of data on which they are trained. As organizations rely more and more on AI, they may end up with monotonous codebases producing mediocre products as use of AI will lead to skill degradation and affect critical thinking.

KEYWORDS

Technical Debt, AI, Software Engineering, Development Tools.

Multi-label Commit Message Classification Through P-Tuning

Xia Li, Tanvi Mistry, The Department of Software Engineering and Game Design and Development, Kennesaw State University, Marietta, USA

ABSTRACT

Version control systems (VCS) play a crucial role by enabling developers to record changes, revert to previous versions, and coordinate work across distributed teams. In version control systems (e.g., GitHub), commit message serves as concise descriptions of code changes made during development. In our study, we evaluate the performance of multi-label commit message classification using p-tuning (learnable prompt templates) through three pre-trained models such as BERT, RoBERTa and DistilBERT. The experimental results demonstrate that RoBERTa model outperforms other two models in terms of the widely used evaluation metrics (e.g., achieving 81.99% F1 score).

KEYWORDS

Multi-label commit message classification, p-tuning, pre-trained models.

Explainable Optimized Machine Learning for Customer Churn Prediction

Nuwan Kaluarachchi, Arathi Arakala, Sevvandi Kandanaarachchi and Kristen Moore, Technical University of Crete, , Greece

ABSTRACT

This study proposes a multi-objective Mayfly Optimization Algorithm to tackle the NP-hard problem of selecting tuning hyperparameters and features in Random Forest models. The optimization simultaneously targets enhanced accuracy and F1 score, while minimizing the number of input variables, thus improving model efficiency without compromising performance. The results indicate that the optimized model performs better than standard configurations, achieving notable improvements in both predictive metrics and model simplicity. A critical aspect of the study lies in the interpretability of the model’s outputs through SHAP (SHapley Additive exPlanations) values, which offer transparency into how individual features influence predictions. In the context of large-scale customer datasets, the use of SHAP proved valuable in isolating the dominant predictors of churn, such as age, number of products, and activity status. The SHAP beeswarm plot demonstrates that older age and fewer products strongly correlate with higher churn risk, whereas active users and higher engagement tend to reduce it. While some features like salary and credit score were less impactful, the explainable outputs enhanced the trust and usability of the model, especially in complex environments with high-dimensional data. The study underscores the value of combining optimization with explainability for handling real-world big data applications, where accuracy alone isnt enough without knowing why a model behaves as it does.

KEYWORDS

random forest, mayfly, hyperparameters, feature selection, SHAP .

Skip-gram Based Grammar Corrector Using Semantic and Syntactic Analyzer for Nepali

Archit Yajnik Department of Mathematics, Sikkim Manipal Institute of Technology, Sikkim, India

ABSTRACT

The article represents the Grammar corrector (GC) based on the syntactic and semantic information of a Nepali sentence. Skip-gram model is used for the word to vector encoding. Window size of 3 context words is employed for the word to vector encoding. The network is trained up to the negative log entropy goes to 0.05. The network is tested over 500 incorrect syntactics and semantics of Nepali sentences. The network has suggested the corrections with the accuracy of 96.4%.

KEYWORDS

Skip-Gram, Grammar Corrector, Word Embedding.

Transformative Applications of Machine Learning Across Industry Domains

Michael O. Eniolade Department of Information Technology, University of the Cumberlands, USA

ABSTRACT

Machine learning (ML) represents a pivotal development in the technological era, offering solutions across numerous industries by enabling systems to learn and adapt without explicit programming. The expansive field of ML applications has significantly influenced healthcare, finance, natural language processing (NLP), and computer vision, among others. This paper explores the fundamental principles of ML applications, highlighting key developments, use cases, challenges, and emerging trends. Drawing upon recent scholarly research and industry practices, we illustrate how ML has evolved from theoretical concepts to practical tools that transform industries. The findings suggest that while ML offers unprecedented opportunities for innovation, it also presents significant challenges that demand ethical consideration and methodological advancement.

KEYWORDS

Machine Learning (ML), applications, natural language processing (NLP), healthcare, finance, digital, computer vision

Multimodal Proposal for AI Based Tool Increasing Cross Assessment of Messages

Alejandro Alvarez Castro¹ and Joaqu´ın Ordieres-Mere², ¹Master student at the AI master. Universidad Politecnica de Madrid. Madrid, 28040, ²Industrial Engineering School. Universidad Politecnica de Madrid. Madrid, 28006

ABSTRACT

Earnings calls represent a uniquely rich and semi-structured source of financial communication, blending scripted managerial commentary with unscripted analyst dialogue. While recent advances in financial sentiment analysis have integrated multimodal signals, such as textual content and vocal tone—most systems rely on flat document-level or sentence-level models, failing to capture the layered discourse structure of these interactions. This paper introduces a novel multimodal framework that encodes entire earnings calls as hierarchical discourse trees. Each node, comprising either a monologue or a question–answer pair, is enriched with emotional signals derived from text, audio, and video, as well as structured metadata including coherence scores, topic labels, and answer coverage assessments. A twostage transformer architecture is proposed: the first encodes multimodal content and discourse metadata at the node level using contrastive learning, while the second synthesizes a global embedding for the entire conference. Experimental results reveal that the resulting embeddings form stable, semantically meaningful representations that reflect affective tone, structural logic, and thematic alignment. Beyond financial reporting, the proposed system generalizes to other unscripted, high-stakes communicative domains such as telemedicine, education, and political discourse, offering a robust and explainable approach to multimodal discourse representation.

KEYWORDS

Multimodal Learning, Neural Machine Translation (NMT), Speech-Text Alignment, Crossmodal Embeddings, Transformer Models, Multilingual Corpora, Representation Learning, Sequence-toSequence Models, Self-supervised Learning.

Enhanced Bartangi Lemmatization and Word Embedding Pipeline: Improving Linguistic Consistency for Low-resource Nlp

Warda Tariq, Department of Computer Science, Higher School of Economics (HSE University), Moscow, Russia

ABSTRACT

This paper presents a reproducible and systematically improved pipeline for Bartangi language lemmatization and word embedding modeling, designed to advance the state of computational methods for this under-resourced language. Building upon prior work, we address several key challenges that limited the ef ectiveness of earlier versions. Specifically, we focus on the issues of over-filtering, which resulted in the exclusion of important lexical items; af ix misanalysis, which introduced incorrect lemma representations; and sentence-level sparsity, which reduced the contextual richness of the lemmatized data. These problems significantly impacted the quality, interpretability, and usability of the resulting corpora in linguistic and computational tasks.To resolve these challenges, we propose a set of refined morphological parsing and part-of-speech (POS) filtering rules. Our approach ensures that semantically and syntactically meaningful tokens, including infinitive verbs, common nouns, and core grammatical elements, are preserved during preprocessing. At the same time, af ix-like tokens, short or malformed lemmas, and other non-essential forms are systematically removed. This process produces a clean and linguistically reliable Bartangi corpus, which is well-suited for training distributional semantic models.Using this corpus, we train two word embedding models based on the Word2Vec framework, namely Skip-gram and CBOW. These models capture the co-occurrence patterns and latent structure of Bartangi words in dif erent contexts. To better understand and compare the learned representations, we employ dimensionality reduction techniques, including Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). These visualizations reveal distinct clustering behaviors, where Skip-gram embeddings more ef ectively capture rare and context-sensitive words, while CBOW embeddings are more suitable for modeling frequent and contextually stable lexical relations. The entire pipeline encompassing the enhanced lemmatization process, the cleaned corpus, trained Word2Vec models, and the visualization tools is publicly released through a dedicated GitHub repository. By making these resources openly available, this work contributes to the growing body of reproducible research in low-resource language processing. Furthermore, it of ers a valuable foundation for future ef orts in endangered language documentation, computational linguistics, and NLP applications targeting Bartangi and related languages.

KEYWORDS

Bartangi language, Low-resource languages, Lemmatization, Morphological parsing, Word embeddings, Word2Vec (CBOW and Skip-gram), t-SNE visualization, NLP pipeline.

AI-powered Digital Literacy for Adult Learners: A Practice-based Study on Confidence and Skill Development in Technology use

Salih Mansur, Harvard University, Division of Continuing Education, Cambridge, USA

ABSTRACT

This practice-based study examines the impact of an AI-supported digital literacy course (DigiLit) on adult learners’ technology skills and confidence. Using a mixed-methods approach, we collected before-and-after surveys and written reflections from 11 participants, primarily from service-based professions. Learners used ChatGPT and Google Workspace tools to complete real-life tasks. Results showed a 40% increase in confidence using AI tools and marked improvement in digital fluency. Qualitative data highlighted themes such as overcoming fear of technology and applying new skills in work scenarios. The findings suggest that accessible, well-structured digital literacy programs can significantly support adult learners navigating modern digital environments.

KEYWORDS

artificial intelligence (AI), digital literacy, adult education (Andragogy), Google tools, technology confidenc.

CLBD

6^th International Conference on Cloud and Big Data (CLBD 2025)

August 23 ~ 24, 2025, Dubai, UAE

6^th International Conference on Cloud and Big Data (CLBD 2025)

August 23 ~ 24, 2025, Dubai, UAE

6^th International Conference on Cloud and Big Data (CLBD 2025)

August 23 ~ 24, 2025, Dubai, UAE

6^th International Conference on Cloud and Big Data (CLBD 2025)

August 23 ~ 24, 2025, Dubai, UAE

6^th International Conference on Cloud and Big Data (CLBD 2025)

August 23 ~ 24, 2025, Dubai, UAE

6^th International Conference on Cloud and Big Data (CLBD 2025)

August 23 ~ 24, 2025, Dubai, UAE

Accepted Papers

Leveraging AI to Reduce Technical DEBT

ABSTRACT

KEYWORDS

Multi-label Commit Message Classification Through P-Tuning

ABSTRACT

KEYWORDS

Explainable Optimized Machine Learning for Customer Churn Prediction

ABSTRACT

KEYWORDS

Skip-gram Based Grammar Corrector Using Semantic and Syntactic Analyzer for Nepali

ABSTRACT

KEYWORDS

Transformative Applications of Machine Learning Across Industry Domains

ABSTRACT

KEYWORDS

Multimodal Proposal for AI Based Tool Increasing Cross Assessment of Messages

ABSTRACT

KEYWORDS

Enhanced Bartangi Lemmatization and Word Embedding Pipeline: Improving Linguistic Consistency for Low-resource Nlp

ABSTRACT

KEYWORDS

AI-powered Digital Literacy for Adult Learners: A Practice-based Study on Confidence and Skill Development in Technology use

ABSTRACT

KEYWORDS

Reach Us