
Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons


87 Full-Text Articles 233 Authors 0 Downloads 40 Institutions

All Articles in Data Science


87 full-text articles. Page 1 of 5.

Detecting Hacker Threats: Performance Of Word And Sentence Embedding Models In Identifying Hacker Communications, Susan McKeever, Brian Keegan, Andrei Quieroz 2020 Technological University Dublin

Conference papers

Abstract—Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach nowadays is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations and dimension reduction approaches for the detection accuracies of software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost detection accuracies of our models. In addition, we ...


Imaging Data On Characterization Of Retinal Autofluorescent Lesions In A Mouse Model Of Juvenile Neuronal Ceroid Lipofuscinosis (Cln3 Disease), Qing Jun Wang, Kyung Sik Jung, Kabhilan Mohan, Mark E. Kleinman 2020 University of Kentucky

Ophthalmology and Visual Science Faculty Publications

Juvenile neuronal ceroid lipofuscinosis (JNCL, a.k.a. juvenile Batten disease or CLN3 disease), a lethal pediatric neurodegenerative disease without cure, often presents with vision impairment and characteristic ophthalmoscopic features including focal areas of hyper-autofluorescence. In the associated research article “Loss of CLN3, the gene mutated in juvenile neuronal ceroid lipofuscinosis, leads to metabolic impairment and autophagy induction in retinal pigment epithelium” (Zhong et al., 2020) [1], we reported ophthalmoscopic observations of focal autofluorescent lesions or puncta in the Cln3Δex7/8 mouse retina at as early as 8 months of age. In this data article, we performed differential interference contrast and confocal ...


A Tree Frog (Boana Pugnax) Dataset Of Skin Transcriptome For The Identification Of Biomolecules With Potential Antimicrobial Activities, Yamil Liscano Martinez, Claudia Marcela Arenas Gómez, Jeramiah J. Smith, Jean Paul Delgado 2020 Universidad de Antioquia, Colombia

Biology Faculty Publications

Increases in the prevalence of multiply resistant microbes have necessitated the search for new molecules with antimicrobial properties. One noteworthy avenue in this search is inspired by the presence of native antimicrobial peptides in the skin of amphibians. Having the second highest diversity of frogs worldwide, Colombian anurans represent an extensive natural reservoir that could be tapped in this search. Among this diversity, species such as Boana pugnax (the Chirique-Flusse Treefrog) are particularly notable, in that they thrive in a diversity of marginal habitats, utilize both aquatic and arboreal habitats, and are members of one of few genera that are ...


Topic Modeling To Understand Technology Talent, Chad Madding, Allen Ansari, Chris Ballenger, Aswini Thota 2020 Southern Methodist University

SMU Data Science Review

Attracting technology talent in today’s hiring climate is more complicated than ever. Recruiting for technology talent in non-technology industries is even more challenging. This intense hiring landscape is motivating companies not only to attract the right talent but also to create a culture that can retain and grow that talent. In this paper, we developed algorithms and present insights that use data provided in reviews to glean information employers can use to address or even change their priorities to meet the demands of an ever-changing job market. The core of our research is to investigate and attribute the role ...


Cover Song Identification - A Novel Stem-Based Approach To Improve Song-To-Song Similarity Measurements, Lavonnia Newman, Dhyan Shah, Chandler Vaughn, Faizan Javed 2020 Southern Methodist University

SMU Data Science Review

Music is incorporated into our daily lives, whether intentionally or unintentionally. It evokes responses and behavior, so much so that there is an entire study dedicated to the psychology of music. Music creates the mood for dancing, exercising, creative thought or even relaxation. It is a powerful tool that can be used in various venues and through advertisements to influence and guide human reactions. Music is also often "borrowed" in the industry today. The practices of sampling and remixing music in the digital age have made cover song identification an active area of research. While most of this research is focused ...


Time Series Analysis Of Offshore Buoy Light Detection And Ranging (LIDAR) Windspeed Data, Aditya Garapati, Charles J. Henderson, Carl Walenciak, Brian T. Waite 2020 Southern Methodist University

SMU Data Science Review

In this paper, modeling techniques for the forecasting of wind speed using historical values observed by Light Detection and Ranging (LIDAR) sensors in an offshore context are described. Both univariate time series and multivariate time series modeling techniques leveraging meteorological data collected simultaneously with the LIDAR data are evaluated for potential contributions to predictive ability. Accurate and timely ability to predict wind values is essential to the effective integration of wind power into existing power grid systems. It allows for both the management of rapid ramp-up / down of base production capacity due to highly variable wind power inputs and integration ...


Toxic Language Detection Using Robust Filters, Deepti Kunupudi, Shantanu Godbole, Pankaj Kumar, Suhas Pai 2020 Southern Methodist University (SMU)

SMU Data Science Review

Social networks sometimes become a medium for threats, insults, and other types of cyberbullying. A large number of people are involved in online social networks. Hence, the protection of network users from anti-social behavior is a critical activity [19]. One of the significant tasks of such activity is the detection of toxic language. Abusive/toxic language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods use blacklists and regular expressions; however, these measures fall short when contending with more subtle, lesser-known examples of hate speech, profanity, or swearing [6]. Abusive language ...


Reducing Age Bias In Machine Learning: An Algorithmic Approach, Adriana Solange Garcia de Alford, Steven K. Hayden, Nicole Wittlin, Amy Atwood 2020 Southern Methodist University

SMU Data Science Review

In this paper, we study the prevalence of bias in machine learning; we explore the life cycle phases where bias is potentially introduced into a machine learning model; and lastly, we present how adversarial learning can be leveraged to measure unwanted bias and unfair behavior from a machine learning algorithm. This study focuses particularly on the topics of age bias in predicting employee attrition and presents a practical approach for how adversarial learning can be successful in mitigating age bias. To measure bias, we calculate group fairness metrics across five-year age groups and evaluate fairness between a baseline predictive model ...


Forecasting Spare Parts Sporadic Demand Using Traditional Methods And Machine Learning - A Comparative Study, Bhuvana Adur Kannan, Ganesh Kodi, Oscar Padilla, Dough Gray, Barry C. Smith 2020 Southern Methodist University

SMU Data Science Review

Sporadic demand presents a particular challenge to traditional time forecasting methods. In the past 50 years, there have been developments, such as the Croston Model [3], which have improved forecast performance. With the rise of Machine Learning (ML), there is abundant research in the field of applying ML algorithms to predict sporadic demand [8][12][9]. However, most existing research has analyzed this problem from the demand side [17]. In this paper, we tackle this predictive analytics challenge from the supply side. We perform a comparative analysis utilizing a spare parts demand dataset from an Original Equipment Manufacturer (OEM). Since ...


Floor Regularization And Investigation Of Transfer Learning Through Sharing Of Probability Distribution Parameters, Daniel Byrne, Stacey Smith, Joanna Duran, John Santerre 2020 Southern Methodist University

SMU Data Science Review

In this work we introduce a simple new regularization technique, aptly named Floor, which drops low weight connections on every forward pass whenever they fall below a specified event horizon threshold. We compare the results of this technique side by side on identical network architectures between regular Dropout and Floor algorithms. We report similar or improved regularization, with the Floor algorithm versus regular Dropout and/or in concert with regular Dropout.

In this paper we also describe our research into transfer learning by sharing of probability distribution parameters in which we investigated methods of transferring Gaussian prior parameters derived from ...


Impact Of Lost Gas Tax Revenue Due To Sale Of Electric Vehicles: Analysis And Recommendations For The 50 States, Jennifer Ricciuti 2020 La Salle University

Analytics Capstones

Although states might have policy reasons to encourage the use of Electric Vehicles (EVs), the impact of future U.S. EV sales presents a significant loss of gas tax revenue for each of the states, as these vehicles do not require gas to operate. For the last three years the number of Electric Vehicle registrations has doubled and is steadily increasing as a result of people becoming more economically and ecologically minded. This is proving to be an optimal choice for car purchasers over standard Internal Combustion Engine (ICE) vehicles, as research has shown that Electric Vehicles are superior for ...


Tommy John Surgery: Potential Risk Factors And Causes In Major League Pitchers, Ethan Rhinehart 2020 La Salle University

Analytics Capstones

Since 1974, over 270 Tommy John surgeries have been performed on pitchers at the major league level. Thousands more surgeries have been performed on minor league, college, high school and youth pitchers. As more biomechanical and statistical research has been conducted over the past few decades, a clearer picture of some of the risks and causes that lead to serious elbow injuries in pitchers has emerged. This paper explores the research surrounding several of those factors, including pitching mechanics, pitch velocity, and pitch type. Using a data set comprised of major league pitchers that have undergone Tommy John surgery ...


The Transcript Profile Changes With Developmental Maturation Of Fetal Lung Type 2 Cells: An Analysis Of RNAseq Data, Heber C. Nielsen, Volodymyr Orlov, Rebecca Holsapple, Monnie McGee 2020 Southern Methodist University

SMU Data Science Review

In this paper, we utilize next-generation sequencing (NGS) data from the LungMap project to identify and characterize the developmental RNA transcriptome in alveolar epithelial type II cells of embryonic mouse lungs of gestational ages embryonic days 16 (E16) and 18 (E18). Late gestation lung cellular maturation is necessary for survival at birth. Using R and the BioConductor packages for RNAseq analysis, we analyze changes in the mouse lung RNA transcriptome as this maturation process takes place. We particularly identify the cluster of genes whose expression changes markedly between immature (E16) and mature (E18) lungs which can be used to define ...


Forecasting Power Consumption In Pennsylvania During The COVID-19 Pandemic: A SARIMAX Model With External COVID-19 And Unemployment Variables, Jackson Au, Javier Saldaña Jr., Ben Spanswick, John Santerre 2020 Southern Methodist University

SMU Data Science Review

In this paper, we present how electrical consumption can reveal insight into the spread of the novel COVID-19 pandemic. We analyze electrical power consumption provided by PPL Electric Utilities, the Department of Labor’s unemployment claims, and the COVID-19 cases/deaths for the State of Pennsylvania to study the impact of the pandemic on the infrastructure. Using a SARIMA model as our benchmark, we analyzed the use of a SARIMAX model to forecast the power consumption in Pennsylvania 14 days ahead. Our work quantifies and illuminates the effect that the strict legislation passed to minimize the spread of COVID-19 had on ...


Compressed DNA Representation For Efficient AMR Classification, John Partee, Robert Hazell, Anjli Solsi, John Santerre 2020 SMU

SMU Data Science Review

In this paper, we explore a representation methodology for the compression of DNA isolates. Using lossless string compression via tokenization of frequently repeated segments of DNA, we reduce the length of the isolates to be counted as k-mers for classification. With this new representation, we apply a previously established feature sampling method to dramatically reduce the feature space. In understanding the genetic diversity, we also look at conserving biological function across these spaces. Using a random forest model we were able to predict the resistance or susceptibility of bacteria with 85-90% accuracy, with a 30-50% reduction in overall isolate length ...


Spoken Language Recognition On Open-Source Datasets, Brady Arendale, Samira Zarandioon, Ryan Goodwin, Douglas Reynolds 2020 Southern Methodist University

SMU Data Science Review

The field of speaker and language recognition is constantly being researched and developed, but much of this research is done on private or expensive datasets, making the field more inaccessible than many other areas of machine learning. In addition, many papers make performance claims without comparing their models to other recent research. With the recent development of public multilingual speech corpora such as Mozilla's Common Voice as well as several single-language corpora, we now have the resources to attempt to address both of these problems. We construct an eight-language dataset from Common Voice and a Google Bengali corpus as ...


Predicting Attrition - A Driver For Creating Value, Realizing Strategy, And Refining Key HR Processes, Kevin Mendonsa, Maureen Stolberg, Vivek Viswanathan, Scott Crum 2020 Southern Methodist University

SMU Data Science Review

Talent is the most important asset for every organization's success. While attrition (or churn) and turnover can refer to both employees and customers, this paper will focus on employee attrition only. Many organizations accept attrition as an inevitable cost of doing business and do nothing to adopt or implement mitigating strategies to combat it. World-class companies, on the other hand, take deliberate measures to understand, control and mitigate attrition (turnover) at every stage. Unmitigated attrition can have a devastating effect on an organization's bottom line and market value. In addition, the “invisible” costs of low employee morale ...


An Effective Method For Attribute Subset Selection, Considering The Resource In Pattern Recognition, Bakhtiyorjon Bakirovich Akbaraliev 2020 Tashkent University of Information Technologies named after Muhammad al-Khwarizimi Address: Amir Temur st., 1002000, Tashkent city, Republic of Uzbekistan E-mail:b.akbaraliev@gmail.com; b.akbaraliev@tuit.uz, Phone:+998-93-376-54-00.

Chemical Technology, Control and Management

An analytical method for determining informative sets of features (INP) is developed, taking into account the resource for criteria based on the use of a measure of dispersion of classified objects. The areas of existence of the solution are defined. The statements and properties for the Fischer-type information criterion are proved, using which the proposed analytical method for determining the INP guarantees optimal results in the sense of maximizing the selected functional. The appropriateness of choosing this type of informative criterion is justified. A method for transforming attributes is proposed. The universality of the method in relation to the type ...


Fast Streaming K-Means Clustering With Coreset Caching, Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura 2020 Iowa State University

Electrical and Computer Engineering Publications

We present new algorithms for k-means clustering on a data stream with a focus on providing fast responses to clustering queries. Compared to the state-of-the-art, our algorithms provide substantial improvements in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our proposed clustering algorithms systematically reuse the "coresets" (summaries of data) computed for recent queries in answering the current clustering query, a novel technique which we refer to as coreset caching. We also present an algorithm called OnlineCC that integrates the coreset caching idea with a simple sequential streaming ...


Ranking Comments: An Entropy-Based Method With Word Embedding Clustering, Yuyang Zhang 2020 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Automatically ranking comments by their relevance plays an important role in the areas of text mining and text summarization. In this thesis, we first introduce a new text digitalization method: the bag of word clusters model. Unlike the traditional bag of words model that treats each word as an independent item, we group semantically related words into clusters using pre-trained word2vec word embeddings and represent each comment as a distribution of word clusters. This method can extract both semantic and statistical information from texts. Next, we propose an unsupervised ranking algorithm that identifies relevant comments by their distance to the “ideal” comment. The ...