Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Page Not Found

Page not found. Your pixels are in another canvas.

Jupyter notebook markdown generator

Posts

Medium

less than 1 minute read

Published: January 11, 2023

Check my ChatGPT article on Medium.

projects

GEIGER Cybersecurity Counter Project

EU Horizon 2020 Project

publications

Arabic Relation Extraction: A Survey

Published in IJCIT, Vol. 5, Issue 5, 2016

Lying at the intersection of lexical and computational science, Natural Language Processing (NLP) has attracted a great deal of attention in recent years. Relation extraction is a well-studied subject for the English language. However, due to the complexity of the Arabic language, extracting relations from Arabic text is challenging. The main goal of this paper is to discuss the major techniques used in Arabic relation extraction and to investigate their strengths and weaknesses, in order to guide future research towards an enhanced and more suitable relation extraction algorithm.

Recommended citation: Injy Sarhan, Yasser El-Sonbaty, Mohamed Abou Elnasr, “Arabic Relation Extraction: A Survey”, International Journal of Computer and Information Technology. (IJCIT, Vol. 5, issue 5). https://www.ijcit.com/archives/volume5/issue5/Paper050503.pdf

A Semi-Supervised Pattern-Based Algorithm for Arabic Relation Extraction

Published in IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), 2016

While several relation extraction algorithms have been developed in the past decade, mainly for the English language, only a few researchers target the Arabic language owing to its complexity and rich morphology. This paper proposes a semi-supervised, pattern-based bootstrapping technique to extract the Arabic semantic relations that hold between entities. To adapt the approach to the morphologically rich Arabic language, we used stemming, semantic expansion using synonyms, and an automatic scoring technique that measures the reliability of the generated patterns and extracted relations. To further improve performance, a dependency parser was then used to discard negative relations. The proposed system was evaluated on two corpora that differ in both size and genre, achieving a highest F-measure of 75.06%. The effect of adding stemming and synonyms was also tested experimentally. The results show that this bootstrapping methodology outperforms existing state-of-the-art methods and can be extended to include more relations for use in various NLP tasks.

Recommended citation: Injy Sarhan, Yasser El-Sonbaty, Mohamed Abou Elnasr, “A Semi-Supervised Pattern-Based Algorithm for Arabic Relation Extraction”, 28th IEEE International Conference on Tools with Artificial Intelligence, San Jose, California, USA. (2016, Nov.). https://ieeexplore.ieee.org/document/7814596

Uncovering Algorithmic Approaches in Open Information Extraction

Published in 30th Benelux Conference on Artificial Intelligence, 2018

The explosion of mostly unstructured data has further motivated researchers to focus on Natural Language Processing (NLP), thereby encouraging the development of Information Extraction (IE) techniques that retrieve crucial information from unstructured texts. In this paper, we present a literature review on Open Information Extraction (OIE). We compare machine learning and handcrafted rule-based algorithmic approaches and identify the recently proposed Neural OIE approach as a particularly promising area for further research.

Recommended citation: Injy Sarhan and Marco Spruit, “Uncovering Algorithmic Approaches in Open Information Extraction: A Literature Review”, 30th Benelux Conference on Artificial Intelligence, 's-Hertogenbosch, The Netherlands. (2018, Nov.). https://www.researchgate.net/publication/333922642_uncovering_algorithmic_approaches_in_open_information_extraction

Contextualized Word Embeddings in a Neural Open Information Extraction Model

Published in International Conference on Applications of Natural Language to Information Systems, 2019

Open Information Extraction (OIE) is the challenging task of extracting relation tuples from an unstructured corpus. While several OIE algorithms have been developed in the past decade, only a few employ deep learning techniques. In this paper, we present a novel neural OIE model that leverages Recurrent Neural Networks (RNNs) with Gated Recurrent Units (GRUs). Moreover, we integrate contextual word embeddings into our OIE model, which further enhances its performance. The results demonstrate that our proposed neural OIE model outperforms the existing state of the art on two datasets.

Recommended citation: Injy Sarhan and Marco R. Spruit. "Contextualized Word Embeddings in a Neural Open Information Extraction Model.", International Conference on Applications of Natural Language to Information Systems. Springer England. (2019, June). https://link.springer.com/chapter/10.1007/978-3-030-23281-8_31

Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction

Published in Appl. Sci., Vol. 10, Issue 17, 5758, 2020

Various tasks in natural language processing (NLP) suffer from a lack of labelled training data, which deep neural networks are hungry for. In this paper, we relied upon features learned in the open information extraction (OIE) task for generating relation triples. First, we studied how transferable these features are from one OIE domain to another, such as from a news domain to a biomedical domain. Second, we analyzed their transferability to a semantically related NLP task, namely relation extraction (RE). We thereby contribute to answering the question: can OIE help us achieve adequate NLP performance without labelled data? Using inductive transfer learning with only a very small amount of target data, both experiments achieved promising results comparable to traditional learning. When transferring to the OIE biomedical domain, we achieved an F-measure of 78.0%, only 1% lower than traditional learning. Additionally, transferring to RE using an inductive approach scored an F-measure of 67.2%, which was 3.8% lower than training and testing on the same task. Our analysis therefore shows that OIE can act as a reliable source task.

Recommended citation: Injy Sarhan and Marco R. Spruit. “Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction.”, Appl. Sci (2020), 10, 5758. https://www.mdpi.com/2076-3417/10/17/5758

SYMBALS: A Systematic Review Methodology Blending Active Learning and Snowballing

Published in Frontiers in Research Metrics and Analytics 6, 2021

Research output has grown significantly in recent years, often making it difficult to see the forest for the trees. Systematic reviews are the natural scientific tool to provide clarity in these situations. However, they are protracted processes that require expertise to execute. These are problematic characteristics in a constantly changing environment. To solve these challenges, we introduce an innovative systematic review methodology: SYMBALS. SYMBALS blends the traditional method of backward snowballing with the machine learning method of active learning. We applied our methodology in a case study, demonstrating its ability to swiftly yield broad research coverage. We proved the validity of our method using a replication study, where SYMBALS was shown to accelerate title and abstract screening by a factor of 6. Additionally, four benchmarking experiments demonstrated the ability of our methodology to outperform the state-of-the-art systematic review methodology FAST.

Recommended citation: Max van Haastrecht, Injy Sarhan, Bilge Yigit Ozkan, Matthieu Brinkhuis, and Marco Spruit. "SYMBALS: A Systematic Review Methodology Blending Active Learning and Snowballing.", Frontiers in Research Metrics and Analytics 6 (2021). https://www.frontiersin.org/articles/10.3389/frma.2021.685591/full

A Threat-Based Cybersecurity Risk Assessment Approach Addressing SME Needs

Published in ARES 2021: The 16th International Conference on Availability, Reliability and Security, 2021

Cybersecurity incidents are commonplace nowadays, and Small- and Medium-Sized Enterprises (SMEs) are exceptionally vulnerable targets. The lack of cybersecurity resources available to SMEs implies that they are less capable of dealing with cyber-attacks. Motivation to improve cybersecurity is often low, as the prerequisite knowledge and awareness to drive motivation is generally absent at SMEs. A solution that aims to help SMEs manage their cybersecurity risks should therefore not only offer a correct assessment but should also motivate SME users. From Self-Determination Theory (SDT), we know that by promoting perceived autonomy, competence, and relatedness, people can be motivated to take action. In this paper, we explain how a threat-based cybersecurity risk assessment approach can help to address the needs outlined in SDT. We propose such an approach for SMEs and outline the data requirements that facilitate automation. We present a practical application covering various user interfaces, showing how our threat-based cybersecurity risk assessment approach turns SME data into prioritised, actionable recommendations.

Recommended citation: Max van Haastrecht, Injy Sarhan, Alireza Shojaifar, Louis Baumgartner, Wissam Mallouli, and Marco Spruit. “A Threat-Based Cybersecurity Risk Assessment Approach Addressing SME Needs.”, International Conference on Availability, Reliability, and Security (2021, August). https://dl.acm.org/doi/10.1145/3465481.3469199

Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph

Published in Knowledge-Based Systems, Volume 233, 2021

Instant analysis of cybersecurity reports is a fundamental challenge for security experts as an immeasurable amount of cyber information is generated on a daily basis, which necessitates automated information extraction tools to facilitate querying and retrieval of data. Hence, we present Open-CyKG: an Open Cyber Threat Intelligence (CTI) Knowledge Graph (KG) framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings. As a result, security professionals can execute queries to retrieve valuable information from the Open-CyKG framework. Experimental results demonstrate that our proposed components that build up Open-CyKG outperform state-of-the-art models.

Recommended citation: Injy Sarhan and Marco R. Spruit. “Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph.”, Knowledge-Based Systems (2021) 107524. https://www.sciencedirect.com/science/article/pii/S0950705121007863

talks

Conference Presentation: A Semi-Supervised Pattern-Based Algorithm for Arabic Relation Extraction

Published: November 06, 2016

Presented at the 28th IEEE International Conference on Tools with Artificial Intelligence.

Conference Presentation: Contextualized Word Embeddings in a Neural Open Information Extraction Model

Published: June 26, 2019

Presented at the International Conference on Applications of Natural Language to Information Systems.

Guest Lecture: Information Extraction in Data Analytics and Natural Language Processing

Published: October 01, 2019

Delivered a guest lecture on Information Extraction in Data Analytics and Natural Language Processing as part of an M.Sc. course.

Injy Sarhan
