Submissions

Status

The Art of Perfect Bloom Structures

The Bloom filter is a space-efficient data structure for storing an approximation S′ to a set S such that S ⊆ S′ and any element not in S belongs to S′ with probability at most ε. This problem arises in many areas, and Bloom filters are in widespread use. In this paper we present a practical implementation of a theoretical result that provides the same functionality as a Bloom filter for static sets.
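
The guarantees described above can be illustrated with a minimal sketch (this is not the paper's construction; the class name, hashing scheme, and parameters are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting a cryptographic hash with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # May report false positives, never false negatives:
        # every element of S has all its bits set, so S ⊆ S′.
        return all(self.bits[pos] for pos in self._positions(item))
```

With n items stored, the false positive rate is approximately (1 − e^(−kn/m))^k, which is the ε the abstract refers to.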

Accepted

Efficient N-gram Language Modeling Algorithms with Applications to Statistical Machine Translation, Speech Recognition, and Optical Character Recognition

Accepted

The Gentle Early Detection Congestion Algorithm

Computer networks such as the Internet have developed rapidly in recent years. Network connections require resources in order to deliver data to their destinations, including high-speed router buffers through which data is routed. Congestion at the router buffer is one of the main causes of deteriorating network performance, e.g. increased average waiting time and decreased throughput. Gentle Random Early Detection (GRED) is a well-known congestion control algorithm proposed to detect congestion before the router buffer overflows. GRED improves the setting of the maximum threshold position (threshold_max) at the router buffer and the maximum value of the packet dropping probability (max_D). This paper presents an Adaptive GRED algorithm that detects congestion at router buffers at a preliminary stage and enhances the settings of threshold_max and max_D. The simulation results reveal that, during congestion, Adaptive GRED drops fewer packets than GRED and offers marginally better mean queue length and average queuing delay than either GRED or the BLUE-Linear analytical model when heavy congestion exists.

Under Review

VoIP Technologies

Under Review

SESAME and linkSCEEM as Regional Research Opportunities in Jordan

Under Review

WiMap: an Efficient Wi-Fi Access Point Localization Mechanism

The widespread deployment of Wi-Fi clouds and hotspots in public places such as parks, cafés, and campuses has created high user dependency on such networks due to their low cost and easy Internet access. In addition, the popularity of multimedia-based Internet applications such as video calls and stored multimedia streaming requires relatively high-bandwidth access, which in turn demands that the client be in close proximity to the Wi-Fi access point. This paper presents WiMap, a simple and efficient Wi-Fi access point localization mechanism for unknown environments. WiMap randomly collects and processes a relatively small set of signal strength measurements at known locations within the coverage area of a Wi-Fi access point. Experimental testing demonstrated that WiMap can improve the positioning accuracy of existing similar mechanisms by about 50% at only 20% of the cost. Using a filtered set of 12 out of 60 random signal strength measurements, WiMap achieves an average positioning error of 1.37 m, with 80% of errors below 1.5 m and 97% below 2 m.
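
WiMap's exact filtering and processing steps are not spelled out in the abstract; a common baseline for this kind of RSS-based localization, which the sketch below illustrates, is a signal-strength-weighted centroid over measurements taken at known positions (all names and numbers are illustrative):

```python
def weighted_centroid(measurements):
    """Estimate an access point's (x, y) position from RSS samples.

    measurements: list of (x, y, rss_dbm) tuples taken at known locations.
    Converts dBm to linear power and uses it as a weight, so stronger
    readings (taken closer to the AP) pull the estimate toward them.
    """
    weighted = [(x, y, 10 ** (rss_dbm / 10.0)) for x, y, rss_dbm in measurements]
    total = sum(w for _, _, w in weighted)
    est_x = sum(x * w for x, _, w in weighted) / total
    est_y = sum(y * w for _, y, w in weighted) / total
    return est_x, est_y
```

A filtering step such as WiMap's (keeping 12 of 60 samples) would typically discard the weakest or noisiest readings before this averaging.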

Accepted

Effects of Input Plaintext Patterns and Storage on the Performance of AES-128 ECB Encryption on GPU

In recent years, Graphics Processing Units (GPUs) have gained popularity for general-purpose applications, immensely outperforming traditional optimized CPU-based implementations. One class of applications implemented on GPUs to achieve faster execution than on CPUs is cryptographic techniques such as the Advanced Encryption Standard (AES), a widely deployed symmetric encryption/decryption scheme in various electronic communication domains. With the drastic advancements in electronic communication technology and the growth of the user space, the amount of data exchanged electronically has increased substantially, so such cryptographic techniques can become a bottleneck to fast information transfer. In this work, we implement AES-128 ECB encryption on two recent and advanced GPUs (NVIDIA Quadro FX 7000 and Tesla K20c) with different memory usage schemes and varying input plaintext sizes and patterns. We obtain a speedup of up to 87x over an advanced CPU (Intel Xeon X5690) based implementation. Moreover, our experiments reveal that different degrees of pattern repetition in the input plaintext affect encryption performance on the GPU.

Under Review

Thrust: C++ template library for CUDA

Accepted

Bank Conflict Free Access for Matrix Transposition Without Padding on GPU

Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model facilitate new solutions for sparse and dense linear algebra solvers. Matrix transposition is an important linear algebra procedure with deep impact in various computational science and engineering applications. Several factors hinder the expected performance of large matrix transposition on GPU devices; the degradation involves memory access patterns such as coalesced access to global memory and bank conflicts in the shared memory of streaming multiprocessors within the GPU. In this paper, two matrix transpose algorithms are proposed to ensure coalesced access and conflict-free bank access. The proposed algorithms have execution times comparable to the bank-conflict-free matrix transpose implementation in the NVIDIA SDK. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T×T) of the problem space, whereas, to the best of our knowledge, published approaches need to allocate an extra space of T×(T+1). We have also applied the proposed transpose algorithm to the recursive Gaussian implementation of the NVIDIA SDK and achieved about 6% improvement in performance.
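
The paper's specific algorithms are not given in the abstract, but the general idea behind padding-free conflict avoidance, remapping each row's elements diagonally so that a column read touches every bank, can be modeled with plain index arithmetic (this models the bank indexing only; it is not device code and not necessarily the paper's scheme):

```python
BANKS = 32   # shared-memory banks on typical NVIDIA GPUs
T = 32       # tile dimension, chosen equal to the bank count

def bank(row, col):
    """Bank serving element (row, col) of a T x T tile stored
    row-major in shared memory with no padding."""
    return (row * T + col) % BANKS

# Naive layout: a column read by 32 threads hits one bank (32-way conflict).
naive_banks = {bank(r, 5) for r in range(T)}

# Diagonal remapping: element (r, c) is stored at column (c + r) % T,
# so the same column read now touches all 32 banks, with no padding.
diagonal_banks = {bank(r, (5 + r) % T) for r in range(T)}
```

The padded alternative achieves the same spread by widening each row to T+1 entries, which is exactly the extra T×(T+1) allocation the abstract avoids.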

Under Review

MASSIVELY PARALLEL DATA PROCESSING USING CUDA

Under Review

Fast Hash Table Lookup Using Extended Bloom Filter

Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, per-flow state management, and network monitoring. These applications, which typically sit in the data path of high-speed routers, must process and forward packets with little or no buffering, making it important to maintain wire-speed throughput. A poorly designed hash table can critically affect the worst-case throughput of an application, since the number of memory accesses required for each lookup can vary. Hence, high-throughput applications require hash tables with more predictable worst-case lookup performance. While published papers often assume that hash table lookups take constant time, there is significant variation in the number of items that must be accessed in a typical hash table search, leading to search times that vary by a factor of four or more. We present a novel hash table data structure and lookup algorithm which improve performance over a naive hash table by reducing the number of memory accesses needed for the most time-consuming lookups. This allows designers to achieve higher lookup performance for a given memory bandwidth, without requiring large amounts of buffering in front of the lookup engine. Our algorithm extends the multiple-hashing Bloom filter data structure to support exact matches and exploits recent advances in embedded memory technology. Through a combination of analysis and simulation we show that our algorithm is significantly faster than a naive hash table using the same amount of memory; hence it can support better throughput for router applications that use hash tables.
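
The core idea, multiple hash choices per key with a counter array (a counting Bloom filter) steering lookups toward the right bucket, can be sketched as follows. This is a simplification for illustration: the published design keeps the counters in fast on-chip memory and may relocate items, and all names and sizes here are illustrative:

```python
import hashlib

K = 3            # hash choices per key
BUCKETS = 64     # off-chip bucket array size

def candidates(key):
    """K candidate bucket indices for a key, via a salted hash."""
    return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % BUCKETS
            for i in range(K)]

class MultiHashTable:
    """Counter-guided multiple hashing: each key goes to the least-loaded
    of its K candidate buckets; the counter array tells lookups which
    candidates are worth probing first."""

    def __init__(self):
        self.counters = [0] * BUCKETS          # the counting-Bloom-filter part
        self.buckets = [[] for _ in range(BUCKETS)]

    def insert(self, key, value):
        choice = min(candidates(key), key=lambda b: self.counters[b])
        self.counters[choice] += 1
        self.buckets[choice].append((key, value))

    def lookup(self, key):
        # Probe candidates in order of occupancy; scanning all candidates
        # keeps the sketch correct even if counters changed after insertion.
        for b in sorted(candidates(key), key=lambda b: self.counters[b]):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None
```

Because inserts spread keys across their least-loaded choices, the longest bucket, and hence the worst-case number of memory accesses per lookup, stays small and predictable.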

Under Review

Real-Time Tweet Search Engine with Perfect Bloom Structures

The demand for real-time search over high-velocity streams of incoming documents, such as Twitter, is rising. Users desire a filtered list of relevant results as soon as the query is typed or spoken. We introduce Perfect Bloom Structures, which can represent keywords with a constant false positive rate. Our results empirically show the tradeoff among retrieval quality, effectiveness, ranking speed, and succinct memory footprint for this novel data structure.

Accepted

Agile Analysis for Big Data

As massive data acquisition and storage become increasingly affordable, many enterprises are turning to sophisticated and agile data analysis. This paper presents data-parallel algorithms for sophisticated statistical techniques, along with database system features that enable agile design and flexible algorithm development. SQL and MapReduce interfaces over key-value storage mechanisms are compared.

Under Review

OPTIMAL SEQUENCE PATTERN SEARCH FOR TIME SERIES DATABASE SYSTEMS - Two Way Approach

Many database applications require processing and analyzing sequential data, mainly to find typical patterns and trends. We consider the Boyer-Moore algorithm for pattern search and generalize it to search for complex constraint-based sequential patterns of interest in a given database, which may even be a time series database, by considering the logical interdependencies between the elements of a sequential pattern. We search for sequences of tuples with rich structure and infinite possibilities from both ends of the sequential database. The performance of the algorithm is naturally better than that of the naive algorithm, as it need not pass over every element of the input to find the required pattern. The number of comparisons is also naturally smaller than with the OPS algorithm based on a generalization of the KMP algorithm [4], and performance remains better for large databases even when the pattern occurs at the end.
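
For reference, the classical Boyer-Moore skipping behavior that the paper generalizes can be sketched with the bad-character rule alone (a textbook baseline, not the paper's constraint-based generalization):

```python
def boyer_moore_search(text, pattern):
    """Find all occurrences of pattern in text.

    Compares the pattern right-to-left; on a mismatch, the bad-character
    rule shifts the window so the mismatched text character aligns with
    its rightmost occurrence in the pattern, skipping elements entirely.
    """
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost index per char
    m, n = len(pattern), len(text)
    i, hits = 0, []
    while i <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            hits.append(i)
            i += 1          # conservative shift; also finds overlapping matches
        else:
            i += max(1, j - last.get(text[i + j], -1))
    return hits
```

The skips in the `else` branch are what let the search avoid examining every input element, the property the abstract appeals to.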

Under Review

Efficient Distributed Malware Detection

We present the design and implementation of a novel anti-malware system that performs an additional screening step prior to the signature matching phase found in existing approaches. The screening step filters out most non-infected files (90%) and also identifies malware signatures that are not of interest (99%). It significantly improves end-to-end performance because safe files are quickly identified and not processed further, and malware files can subsequently be scanned using only the necessary signatures. Our approach naturally leads to a network-based anti-malware solution in which clients receive only the signatures they need, rather than every malware signature ever created, as with current approaches. We have implemented the algorithm as an extension to ClamAV, the most popular open-source anti-malware software. For the current number of signatures, our implementation is 2x faster and requires 2x less memory than the original ClamAV. These gaps widen as the number of signatures grows.

Under Review

EFFICIENT DOCUMENT CLUSTERING USING HYBRID XOR LOGIC

Clustering is a method gaining practical importance in many applications such as software reuse, medical diagnosis, and text classification. We provide a new approach for clustering a set of given documents, text files, or software components based on a new similarity function, the hybrid XOR function, defined for finding the degree of similarity between two document sets. We construct a similarity matrix of order (n-1)×n for the n given document sets by applying the hybrid XOR function to each pair of document sets. We define and design a document clustering algorithm that takes the similarity matrix as input and outputs a set of clusters formed dynamically, in contrast to other clustering algorithms that predefine the number of clusters and finally fit documents to one of those clusters or classes. The approach is justified by its very simple computational logic and its efficiency in terms of processing.
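
The hybrid XOR function itself is not defined in the abstract; the sketch below shows the general shape of XOR-based similarity over binary feature vectors together with threshold-driven dynamic cluster formation (both the similarity definition and the threshold are illustrative assumptions):

```python
def xor_similarity(a, b):
    """Similarity of two equal-length binary feature vectors: the
    fraction of positions where they agree, i.e. 1 minus the
    normalized XOR (Hamming) distance."""
    assert len(a) == len(b)
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return agree / len(a)

def cluster(docs, threshold=0.7):
    """Dynamic clustering: assign each document to the first cluster whose
    representative is at least `threshold` similar; otherwise start a new
    cluster, so the cluster count is not fixed in advance."""
    clusters = []
    for d in docs:
        for c in clusters:
            if xor_similarity(d, c[0]) >= threshold:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters
```

Because clusters are created on demand, the number of clusters emerges from the data, matching the abstract's contrast with algorithms that predefine the cluster count.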

Under Review

Database Design Tool An E-learning Tool to Perform Conventional and Automated Normalization

Several textbooks and technical papers have been published with the aim of exploring normalization, but most restrict their work to the definitions of the various normal forms and leave it to students or readers to normalize the relations. However, most readers fail to understand the database design process, which is extremely essential for a CSE/IT/IS curriculum student. Some authors made an effort to build e-learning tools, but their work was restricted to a small number of functional dependencies, say 5 to 10. In this work we present an interactive web-based normalization tool that shows the conventional normalization process step by step, as well as an automated method of normalization in tabular form using the dynamic programming principle, which can presently handle as many as 30 redundant attributes in the FDs and more than 50 complex functional dependencies. The Database Design tool that we have developed can be an asset to faculty and students, and can even help database design engineers in industry verify their work.
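
The tool's internals are not described beyond the abstract, but the primitive at the heart of any normalization procedure, attribute closure under a set of functional dependencies, can be sketched as follows (a hypothetical helper, not the tool's code):

```python
def closure(attrs, fds):
    """Closure of an attribute set under functional dependencies.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    A dependency lhs -> rhs "fires" whenever lhs is already contained
    in the result, adding rhs; repeat until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result
```

Checking whether a candidate key determines all attributes, or whether an FD violates a normal form, reduces to calls of this closure computation.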

Under Review

Constraint Based Sequential Pattern Mining with Three Lookaheads in Time Series Databases

Most (99%) of the algorithms available in the literature are text based and do not concentrate on finding sequential patterns with constraints in a given database. Moreover, in query languages such as SQL available today, the SELECT clause does not allow non-aggregate functions as part of query compilation. We propose a pattern matching algorithm based on pre-processing of the sequence pattern that considers the three consecutive characters of the input immediately following the aligned pattern window in the event of a mismatch between pattern and database input. The algorithm makes use of two sliding sequential patterns, with three lookaheads used in the event of a mismatch.

Under Review

Effective Retrieval with Concept Maps

We review the state of the art in effective high-precision retrieval using a large network of nodes maintained on external storage or distributed over the net. The network, fashioned as a distributed concept map, acts as a huge search graph.

Accepted

Information Security Audit in Virtual Environment

Auditing has become an essential part of the IT environment due to regulatory mandates, contractual obligations, and other compliance requirements. Virtualization has gained immense popularity due to its economic benefits and other characteristics such as scalability, availability, and high performance. However, auditing in a virtual environment has become complex due to the creation of abstractions that change the dynamics of architecture, administrator privileges, and system separation. Due to the immaturity of standards and frameworks for auditing in virtual environments, auditors lack a clear process for performing a comprehensive audit for attestation. Hence, in this article, the authors propose an audit process and framework for a successful audit in a virtual environment.

Under Review

State Space reduction of asymmetric TSP problem

Solving the TSP using branch and bound is NP-hard, and on a complete graph it becomes even more complex. This work presents a solution to the asymmetric TSP that reduces the size of the state space tree.

Under Review

Web based ETL Tool with Secure Extraction Phase for Database File Transmission Using Microsoft 8-Bit Extended ASCII Character Set

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration. The conventional process of developing custom code or scripts for this is costly, error prone, and time consuming. There are several ETL tools on the market, but to the best of our knowledge none of them incorporates a security feature. In this paper we extend a web-based ETL framework [2] with added security at the extraction phase, along with a unique feature of preconfigured multi-source connections which can be stored and reused to perform sequences of transformations [5]. A viewable transformation report, with the time taken to perform the transformations and the source-to-target metadata mapping, gives the user scope to measure data quality and accuracy. In addition, the entire loading process of data movement from the source to the target system is made visible to the user.

Under Review

Big Data Analytics: Opportunities and Challenges

In the era of information explosion, enormous amounts of data have become available to decision makers. Big data refers to datasets that grow so large that they become difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and provided in order to handle these datasets and extract value and knowledge from them. Such value can only be provided by using big data analytics, the application of advanced analytics techniques to big data. This paper aims to analyze the different methods and tools which can be applied to big data, as well as the opportunities provided and the challenges which must be faced.

Under Review

Understanding OpenStack as an IaaS Cloud Platform

This paper discusses OpenStack, an open-source IaaS cloud. The aim of the paper is to define the OpenStack cloud platform and describe its components, salient features, and the services it provides to users as an IaaS cloud platform.

Under Review

Machine Learning Approach to Detect Malicious Links in Arabic Web pages

The Internet is the largest reservoir of information humanity has ever known, used by several billion users worldwide. This reservoir contains credible and accurate information as well as spam, so techniques must be devised to filter spam from the Internet and guarantee the flow of trusted information to billions of users worldwide, regardless of the language used to express that information. Many websites post misleading information and embed malicious links or Trojans, which may eventually hurt the website's reputation and mislead its users. This study proposes an approach to detect malicious links in Arabic Web pages. We analyze the impact of some Web metrics on these links and evaluate them using machine learning classifiers.

Under Review

Secured Anti Bot Verification Scheme

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. A CAPTCHA is a program that protects websites against bots (automated scripts) by generating and grading tests that humans can pass but current computer programs cannot. The aim is to allow the server to identify whether a visitor is a human or a computer and to provide services only to humans, improving the server system and user information security. The increase in bots breaking CAPTCHAs shows the ineffectiveness of the text-based CAPTCHAs used on most websites and webmail services today: bots can easily read the distorted letters and words using optical character recognition (OCR) or break the CAPTCHA using a dictionary attack. We summarize the weakness of each CAPTCHA scheme and accordingly design our own, considering the case study results and other points that may pose difficulty for OCR systems. In this paper we propose a new technique to build a hybrid CAPTCHA (both picture and text based, with multiple fonts): an image is rendered on the screen with many text labels in multiple fonts drawn over it, and the user has to identify the correct name of the underlying image among the text labels scattered over it in order to pass the human verification test. We also propose using multiple fonts for the letters of a single word inside the CAPTCHA, which further increases the complexity of training OCR software.

Under Review

Bloom Filters – Short Tutorial

Accepted

A Reference Architecture for Building Highly Available and Scalable Web Application on Amazon Cloud

Building a highly available and scalable web application is not an easy process. Developers should design the system to handle both expected and unexpected failures, and the system should scale to meet any extra load that might occur. Building such an application on traditional web hosting is very costly, which is why entrepreneurs with new web application ideas start to build their applications on the cloud. This paper introduces a reference architecture and best practices for building highly available and scalable web applications on Amazon Cloud. A case study is used to validate the architecture and to illustrate how the best practices are implemented.

Under Review

A HEURISTIC MATHEMATICAL DECISION SUPPORT MODEL FOR NoSQL DATABASE ADOPTABILITY

Many factors should be taken into consideration when deciding whether to adopt an RDBMS or a NoSQL database. This paper introduces the most important differences between RDBMS and NoSQL databases and their usage. In addition, the paper presents a heuristic mathematical model to help in choosing between NoSQL and RDBMS. The model functions as a decision support tool and is designed to factor in the features of NoSQL and RDBMS databases, as well as to quantify developers' need for these features in their projects.
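
The paper's actual heuristic model is not reproduced in the abstract; the sketch below only illustrates the general weighted-scoring shape such a decision-support tool takes, with every feature name and number being an illustrative assumption:

```python
def recommend(weights, nosql_scores, rdbms_scores):
    """Toy weighted-sum decision rule.

    weights: how much the developer's project needs each feature.
    nosql_scores / rdbms_scores: how well each database type supports
    that feature (e.g. on a 0..1 scale). The type with the higher
    weighted total is recommended.
    """
    nosql = sum(weights[f] * nosql_scores[f] for f in weights)
    rdbms = sum(weights[f] * rdbms_scores[f] for f in weights)
    return "NoSQL" if nosql > rdbms else "RDBMS"
```

The point of such a model is that the recommendation flips with the project's priorities: weighting horizontal scalability heavily favors NoSQL, while weighting transactional guarantees heavily favors an RDBMS.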

Under Review

Which NoSQL Database Do I Need? A Reference Framework for NoSQL Database Selection

Many NoSQL database products are available for developers to choose from. These products fall into four categories: Key/Value, Graph, Column Family, and Document-Based data stores. This paper describes the features of and differences between the four NoSQL types. In addition, the paper compares multiple NoSQL database products (MongoDB, CouchDB, DynamoDB, Riak, HBase, Cassandra, Titan, and Neo4j) and provides a framework to be used as a reference for developers to help them choose the right product and understand its features.

Under Review