LINK PREDICTION BASED ON TOPOLOGICAL AND CONTENT ANALYSIS IN CO-AUTHORSHIP NETWORKS
DOI:
https://doi.org/10.21271/ZJPAS.37.5.13
Keywords:
Link Prediction; Co-authorship Networks; Social Network Analysis; Graph Mining; Text Mining.
Abstract
In network analysis, predicting the connections or associations between entities (nodes) is an important task. Link prediction is the problem of predicting or identifying the existence of a link between two entities in a network. Despite its wide use, it remains a central open problem in complex network applications, particularly in the analysis of co-authorship networks. Two approaches have been proposed for link prediction in collaboration networks: topological methods, which rely on structural analysis of the network, and content-based methods, which rely on textual information from the academic papers in the network. In this paper, we introduce the Content and Graph-Based Link Prediction (CGLP) approach, which integrates topological and content-based features in a hybrid manner to predict links in co-authorship networks. The proposed approach was evaluated on three academic datasets, Hep-th, Hep-lat, and AMC, using various machine learning models. All models performed almost equally well on the three datasets and outperformed the state-of-the-art approach, with a maximum F1 score of 98.05% and ROC AUC of 98.74%.
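The idea of combining topological and content-based features for a candidate author pair can be illustrated with a minimal sketch. This is not the CGLP implementation: the abstract does not list the features used, so the specific choices here (common neighbors, Jaccard coefficient, bag-of-words cosine similarity), the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient of two neighbor sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(graph, texts, u, v):
    """Hybrid feature vector for a candidate pair (u, v).

    graph: dict mapping author -> set of co-authors (hypothetical format)
    texts: dict mapping author -> concatenated paper text (hypothetical format)
    """
    nu, nv = graph.get(u, set()), graph.get(v, set())
    return {
        "common_neighbors": len(nu & nv),                 # topological
        "jaccard": jaccard(nu, nv),                       # topological
        "content_cosine": cosine(texts.get(u, ""), texts.get(v, "")),  # content
    }


# Toy co-authorship graph and per-author text (illustrative data only).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
texts = {
    "alice": "lattice gauge theory simulation",
    "dave": "lattice gauge theory",
}

features = pair_features(graph, texts, "alice", "dave")
print(features)
```

In a supervised setting, feature vectors of this kind would be computed for linked and non-linked author pairs and passed to a classifier; which classifiers and features the paper actually uses is not stated in the abstract.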
License
Copyright (c) 2025 Hajar A. Hasin, Diman Hassan, Ismael A. Ali

This work is licensed under a Creative Commons Attribution 4.0 International License.




