This paper proposes a novel clustering approach based on Gower distance and a new cluster centroid selection technique. Traditional clustering methods often struggle to improve the performance of standard classifiers on high-dimensional, heterogeneous data, leading to suboptimal outcomes. Moreover, methods like K-means depend heavily on the initial selection of centers, which must be predefined, and they fail to identify complex manifold clusters effectively. The introduced approach overcomes these problems by segmenting records into blocks of 2–10 samples to characterize their distribution and then selecting cluster centers accordingly. Finally, Gower distance is used to cluster the data by record similarity. The proposed method was validated on two medical datasets, Parkinson’s Disease (PD) and Wisconsin Diagnostic Breast Cancer (WDBC), and on the Bonn EEG dataset. The results show a marked improvement in machine learning classifiers’ performance, with accuracy of almost 100% on both the PD and WDBC datasets and 98% on the Bonn dataset. This demonstrates the robustness and effectiveness of the proposed approach compared with traditional methods such as Particle Swarm Optimization (PSO), which requires iterative optimization. The proposed approach simplifies the clustering process, making it more efficient and less computationally intensive. Implementing it in medical organizations can enhance patient care by providing more accurate and reliable data classification; by handling high-dimensional data efficiently, healthcare professionals can make better-informed treatment decisions and improve overall care quality.
Efficient clustering approach based on Gower distance for high-dimensional medical datasets
Cluster Computing
Vol. 28
Issue 756
1-19
2025
Salwa Shakir Baawi • Mustafa Noaman Kadhim • Dhiah Al-Shammary
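As an illustration of the Gower distance at the core of the approach above, the following is a minimal Python sketch that computes a pairwise Gower distance matrix for mixed numeric and categorical records. It is illustrative only: the paper's block-based center selection step is not reproduced, and the equal per-feature weighting is an assumption.

import numpy as np

def gower_distance(X_num, X_cat):
    """Pairwise Gower distance for mixed data: X_num is an (n, p) array of
    numeric features, X_cat an (n, q) array of categorical codes.
    Illustrative sketch only; not the paper's exact implementation."""
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    # Range-normalise numeric features so each contributes in [0, 1].
    rng = X_num.max(axis=0) - X_num.min(axis=0)
    rng[rng == 0] = 1.0                      # guard against constant columns
    num_diff = np.abs(X_num[:, None, :] - X_num[None, :, :]) / rng
    # Categorical features contribute 0 on a match, 1 on a mismatch.
    cat_diff = (X_cat[:, None, :] != X_cat[None, :, :]).astype(float)
    # Gower distance = mean per-feature dissimilarity.
    return np.concatenate([num_diff, cat_diff], axis=2).mean(axis=2)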
Key frame (KF) extraction is a crucial task in video structure analysis, playing an important role in video summarization, content analysis, video compression, and related applications. The process aims to produce a good video summary by extracting the frame or set of frames that comprehensively represents the video sequence and discarding the remaining frames as redundant. This paper proposes a new method that uses the Histogram of Oriented Gradients (HOG) as a feature together with an adaptive thresholding technique, enabling the detection of substantial changes in visual content between consecutive frames within each video shot. By computing HOG differences and applying an adaptive threshold, the method divides a shot's frames into groups; from each group, a key frame with substantial features is selected, while the other frames are treated as redundant and removed. Experimental evaluations on different videos show that the proposed method extracts key frames that accurately reflect each video's primary content and produces a good summary, reducing redundancy while preserving the essential idea of the video.
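A minimal sketch of the HOG-difference grouping described above, assuming scikit-image is available. The threshold rule (mean plus k standard deviations) and the choice of the middle frame of each group are illustrative assumptions, not the paper's exact settings.

import numpy as np
from skimage.feature import hog

def select_key_frames(frames, k=1.0):
    """Sketch of HOG-difference key-frame selection. `frames` is a list of
    equally sized grayscale images from one shot."""
    feats = [hog(f, orientations=9, pixels_per_cell=(16, 16),
                 cells_per_block=(2, 2)) for f in frames]
    # L2 distance between HOG descriptors of consecutive frames.
    diffs = np.array([np.linalg.norm(feats[i + 1] - feats[i])
                      for i in range(len(feats) - 1)])
    thresh = diffs.mean() + k * diffs.std()   # adaptive threshold (assumed rule)
    # A difference above the threshold starts a new group of frames.
    bounds = [0] + [i + 1 for i, d in enumerate(diffs) if d > thresh] + [len(frames)]
    # Keep the middle frame of each group as its key frame.
    return [(s + e) // 2 for s, e in zip(bounds[:-1], bounds[1:])]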
Efficient feature selection based on Ruzicka similarity for EEG diagnosis
International Journal of Information Technology
Vol. 17
Issue 6
3373–3387
2025
Sarah L Alzamili, Salwa Shakir Baawi, Mustafa Noaman Kadhim, Dhiah Al-Shammary, Ayman Ibaida
This work describes a novel feature selection approach for detecting epileptic seizures in an EEG dataset, based on a Hilbert similarity measurement over a convex set. Because medical datasets are high-dimensional, feature selection is vital for identifying diseases early and protecting human health. Moreover, high-dimensional data reduce the prediction accuracy of machine learning algorithms and increase system complexity, making the results inefficient. This research therefore presents a feature selection model based on the Hilbert mathematical similarity measurement, which computes similarity and relates features exhibiting high harmony within the same feature. Using the proposed model, the similarity between signals is computed, optimal high-similarity features are selected, and ineffective features are removed. We assessed the performance of the system using the EEG data from Bonn University, evaluating the proposed model by recall, precision, and accuracy and comparing it with earlier methods. The findings confirm that the model is effective at extracting optimal features from EEG data, obtaining 100% accuracy while selecting only 10% of the features.
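As a hedged illustration of the similarity-driven selection described above, the sketch below uses the Ruzicka (weighted Jaccard) similarity named in the paper's title; scoring each feature against a reference signal is one plausible reading, not the paper's exact algorithm, and the reference signal is a hypothetical input.

import numpy as np

def ruzicka_similarity(x, y):
    """Ruzicka (weighted Jaccard) similarity for non-negative vectors:
    sum(min(x, y)) / sum(max(x, y)). Undefined if both vectors are all zero."""
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()

def rank_features(X, ref):
    """Score each feature column of X by its Ruzicka similarity to a
    reference signal `ref` and rank from most to least similar
    (an assumed selection rule for illustration)."""
    scores = np.array([ruzicka_similarity(X[:, j], ref)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1]   # indices of most similar features first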
Categorization of Celebrity Photos Based on Deep Machine Learning for Feature Extraction and Classification
Wasit Journal of Computer and Mathematics Science
Vol. 4
Issue 1
1-16
2025
Salwa Shakir Baawi, Farah Jawad Al-Ghanim, Nisreen Ryadh Hamza
The use of strong encryption and hiding techniques is essential for secure digital communication. The aim of this article is to give a thorough study of advanced steganography methods designed especially for embedding information in Arabic text, which may be sensitive in nature and needs to be transmitted securely. This review focuses on four main criteria: embedding rate, invisibility, robustness against detection, and text quality. We conducted a comprehensive survey of text steganography and examined the distinct characteristics of Arabic, including the nature of the script, accents, and linguistic rules. Several researchers have exploited these characteristics of Arabic writing to develop techniques for embedding information. This article gives a detailed analysis of their strengths, weaknesses, possible applications, and major challenges. The conclusions of this review can help researchers and practitioners select an ideal method according to their own needs regarding storage capacity, security, and text quality. Moreover, the article underscores the need to prioritize Arabic text steganography, which has become highly important as the use of hidden communication increases in Arabic-speaking communities, while also preserving cultural and linguistic integrity.
A Proposed Arabic Text Classification Model using Multi-Label System
Journal of Al-Qadisiyah for Computer Science and Mathematics
Vol. 15
Issue 3
117-129
2023
Hussain A. Rahman, Salwa S. Baawi
Steganography is the practice of hiding data, such as images, videos, or text, within a cover image without it being detectable by the human eye. Several factors, such as the capacity, security, and robustness of the technique, are essential when transferring information this way. In this study, we propose a new approach to image steganography that improves the Least Significant Bit (LSB) technique by utilizing 24-bit-per-pixel images. Improving the capacity and security of LSB-based steganography requires a combination of techniques, such as indirect embedding, embedding in multiple channels, and applying cryptographic and compression techniques. The approach works by encrypting each compressed secret-message bit with the most significant bit of the red channel and then saving the output (hidden) bit in the least significant bit of the green or blue channel according to whether the row value is odd or even. To further strengthen security, the suggested approach uses multi-level encryption: it employs RSA to encrypt the secret message before applying Huffman compression, and it encrypts the hidden bit with XOR/XNOR during embedding based on the pixel's row value. Huffman coding is used to shorten the encrypted message to be inserted in the cover image. The color images in this work were acquired from a standard dataset (USC-SIPI). The suggested approach performs better than related prior efforts when measured by mean square error (MSE) and peak signal-to-noise ratio (PSNR).
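A minimal sketch of the bit-embedding rule described above, assuming the secret message has already been RSA-encrypted and Huffman-compressed into a bit stream; the exact row-parity and channel conventions are assumptions made for illustration.

import numpy as np

def embed_bits(img, bits):
    """Embed a bit stream into an (h, w, 3) RGB uint8 image, one bit per
    pixel. For each pixel: XOR (even rows) or XNOR (odd rows) the secret
    bit with the red channel's MSB, then store the result in the LSB of
    green (even rows) or blue (odd rows). Conventions are assumptions."""
    out = img.copy()
    h, w, _ = out.shape
    for idx, bit in enumerate(bits):
        r, c = divmod(idx, w)
        if r >= h:
            raise ValueError("message too long for cover image")
        msb_red = (out[r, c, 0] >> 7) & 1
        # XOR on even rows, XNOR on odd rows.
        hidden = (bit ^ msb_red) if r % 2 == 0 else 1 - (bit ^ msb_red)
        ch = 1 if r % 2 == 0 else 2          # green on even rows, blue on odd
        out[r, c, ch] = (out[r, c, ch] & 0xFE) | hidden
    return out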
Comparative Analysis on Text Watermarking Techniques: Literature Review
Journal of Al-Qadisiyah for Computer Science and Mathematics
Vol. 14
Issue 4
111-120
2022
Ahmed Abdulameer Naji, Salwa Shakir Baawi
One of the most widespread practices for text authentication and ownership verification today is an information-hiding technique known as watermarking, which dates back to the 1990s. This paper investigates digital watermarking and its techniques, focusing primarily on text watermarking, and offers a distinctive taxonomy of watermarking by technique. Two watermarking taxonomies are discussed: by technique type and by attack. Text watermarking techniques can be further separated into three categories: embedding, approaches, and extraction. The approaches to text watermarking can be split into four categories: structure- or format-based methods, text-image-based methods, zero-text watermarking, and linguistic methods. Methods belonging to each category were studied and compared, with the key findings highlighted. This paper also confirms that there are principal requirements that need to be further explored and taken into account in the design of future watermarking systems: imperceptibility, capacity, security, and robustness.
An Image Watermarking Technique Proposed Based on Discrete Cosine Transformation and Pseudo-Random Generator
Journal of Education for Pure Science-University of Thi-Qar
Vol. 11
Issue 1
108-118
2021
This work proposes a technique for embedding an image watermark in a host image based on the discrete cosine transform (DCT) and the Geffe algorithm. The Geffe generator is used to encrypt the watermark. DCT coefficients are then computed by dividing the host image into non-overlapping 8×8 blocks. The performance of the suggested technique was evaluated using the peak signal-to-noise ratio for a 64×64 watermark and 512×512 host images; the Lena, Cameraman, and Pepper images give similar results. This technique does not require the host image in the watermark extraction process, and the watermark is extracted completely and better than its available counterparts.
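A minimal sketch of the block-DCT embedding step described above, assuming wm_bits is the Geffe-encrypted watermark bit stream and the host is a grayscale image; the coefficient position (4, 3) and embedding strength alpha are illustrative assumptions, not the paper's parameters.

import numpy as np
from scipy.fft import dctn, idctn

def embed_watermark(host, wm_bits, alpha=10.0):
    """Embed one watermark bit per non-overlapping 8x8 block of a 2-D
    grayscale host image by shifting a mid-frequency DCT coefficient."""
    out = host.astype(float).copy()
    h, w = out.shape
    idx = 0
    for i in range(0, h - 7, 8):
        for j in range(0, w - 7, 8):
            if idx >= len(wm_bits):
                return out
            block = dctn(out[i:i + 8, j:j + 8], norm='ortho')
            # Push a mid-frequency coefficient up or down by alpha
            # depending on the watermark bit.
            block[4, 3] += alpha if wm_bits[idx] else -alpha
            out[i:i + 8, j:j + 8] = idctn(block, norm='ortho')
            idx += 1
    return out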