Smart Paper Notes

Fake Hilsa Fish Detection Using Machine Vision

Mirajul Islam, Jannatul Ferdous Ani, Abdur Rahman, Zakia Zaman

Categories: Computer Vision | Artificial Intelligence

2022-01-08

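Hilsa is the national fish of Bangladesh, and the country earns substantial foreign currency by exporting it. Unfortunately, some unscrupulous traders have recently begun selling fake Hilsa fish for profit. Sardines and Sardinella are the fish most often sold as Hilsa in the market. The Bangladesh Food Safety Authority, a government agency, has stated that these fake Hilsa fish contain high levels of cadmium and lead, which are harmful to humans. In this research, we propose a method that can readily distinguish original Hilsa fish from fake Hilsa fish. Based on a review of the literature available online, we are the first to study the identification of original Hilsa fish. We collected more than 16,000 images of original and counterfeit Hilsa fish. To classify these images, we used several deep learning-based models and compared their performance. Among these models, DenseNet201 achieved the highest accuracy, 97.02%.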

Deep Learning Based Classification System For Recognizing Local Spinach

Mirajul Islam, Nushrat Jahan Ria, Jannatul Ferdous Ani, Abu Kaisar Mohammad Masum, Sheikh Abujar, Syed Akhter Hossain

Categories: Computer Vision | Machine Learning

2022-01-06

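Deep learning models deliver impressive results in image processing by learning from training datasets. Spinach is a leaf vegetable that contains vitamins and nutrients. In our research, we use a deep learning method that can automatically identify spinach, with a dataset of 3,785 images covering five species of spinach. Four convolutional neural network (CNN) models are used to classify the spinach; these models give more accurate results for image classification. Before the models are applied, the image data are preprocessed. The preprocessing steps are RGB conversion, filtering, resizing and rescaling, and categorization. After these steps, the image data are ready to be used by the classifier algorithms. The accuracy of these classifiers ranges from 98.68% to 99.79%. Among these models, VGG16 achieved the highest accuracy, 99.79%.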
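
The preprocessing steps named in this abstract (RGB conversion, filtering, resizing, rescaling) can be sketched in plain Python on nested lists. The abstract does not state the filter, the target size, or the input color format, so the grayscale-to-RGB replication, the 3x3 mean filter, the nearest-neighbor resize, and every function name below are illustrative assumptions, not the authors' pipeline. The categorization (label encoding) step is omitted.

```python
def to_rgb(gray):
    """Assumed RGB conversion: replicate each grayscale value into three channels."""
    return [[(v, v, v) for v in row] for row in gray]


def box_filter(img):
    """Assumed filtering step: 3x3 mean filter per channel.

    Border pixels are averaged over the neighbours that actually exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(tuple(sum(p[c] for p in neigh) / len(neigh)
                             for c in range(3)))
        out.append(row)
    return out


def resize_nearest(img, new_h, new_w):
    """Resize by nearest-neighbour sampling (assumed interpolation)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def rescale(img):
    """Rescale 8-bit intensities from [0, 255] to [0, 1]."""
    return [[tuple(c / 255.0 for c in p) for p in row] for row in img]


def preprocess(gray, size=(2, 2)):
    """Chain the four steps; `size` is a hypothetical target (height, width)."""
    return rescale(resize_nearest(box_filter(to_rgb(gray)), *size))


# Usage on a tiny 4x4 checkerboard standing in for a photograph.
gray = [[0, 255, 0, 255],
        [255, 0, 255, 0],
        [0, 255, 0, 255],
        [255, 0, 255, 0]]
image = preprocess(gray, size=(2, 2))
```

In practice an image library would perform these steps; the pure-Python version only makes the order of operations explicit.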

BDSL 49: A Comprehensive Dataset of Bangla Sign Language

Ayman Hasib, Saqib Sizan Khan, Jannatul Ferdous Eva, Mst. Nipa Khatun, Ashraful Haque, Nishat Shahrin, Rashik Rahman, Hasan Murad, Md. Rajibul Islam, Molla Rashied Hussein

Categories: Computer Vision

2022-08-14

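Language is a means by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another orally or in writing. However, each language also has a sign language counterpart. Individuals who are deaf and/or mute communicate through sign language. The Bangla language also has a sign language, called BDSL. This dataset consists of Bangla hand sign images. The collection contains images of 49 individual Bangla alphabet signs. BDSL49 is a dataset of 29,490 images with 49 labels. During data collection, images of 14 different adults were recorded, each with a distinct background and appearance. Several strategies were used during preparation to remove noise from the dataset. The dataset is freely available to researchers, who can use it to develop automated systems with machine learning, computer vision, and deep learning techniques. In addition, two models were applied to this dataset: the first for detection and the second for recognition.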

Jamdani Motif Generation using Conditional GAN

MD Tanvir Rouf Shawon, Raihan Tanvir, Humaira Ferdous Shifa, Susmoy Kar, Mohammad Imrul Jubair

Categories: Computer Vision

2022-12-22

Jamdani is the strikingly patterned textile heritage of Bangladesh. The exclusive geometric motifs woven on the fabric are the most attractive part of this craftsmanship and have a remarkable influence on textile and fine art. In this paper, we develop a technique based on the Generative Adversarial Network that learns to generate entirely new Jamdani patterns from a collection of Jamdani motifs that we assembled; the newly formed motifs can mimic the appearance of the original designs. Users can input the skeleton of a desired pattern as rough strokes, and our system completes the input by generating the full motif, which follows the geometric structure of real Jamdani motifs. To serve this purpose, we collected and preprocessed a dataset containing a large number of Jamdani motif images from authentic sources via fieldwork and applied a state-of-the-art method called pix2pix to it. To the best of our knowledge, this is currently the only available dataset of Jamdani motifs in digital format for computer vision research. Our experimental results with the pix2pix model on this dataset show satisfactory computer-generated images of Jamdani motifs, and we believe that our work will open a new avenue for further research.
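
For context, the standard pix2pix objective (as published for the general method, not restated in this abstract) combines a conditional adversarial loss with an L1 reconstruction term. In the notation assumed here, x is the rough stroke skeleton, y is the real motif, z is a noise input, G is the generator, D is the discriminator, and λ is a weighting hyperparameter; the abstract does not say which value of λ or which variant of the loss was used.

```latex
% Conditional adversarial loss
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr]
    + \mathbb{E}_{x,z}\bigl[\log\bigl(1 - D(x, G(x, z))\bigr)\bigr]

% L1 reconstruction loss between the real motif and the generated one
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\bigl[\lVert y - G(x, z) \rVert_1\bigr]

% Full objective
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
```

The L1 term pulls the generated motif toward the ground-truth image, while the adversarial term pushes it toward outputs the discriminator cannot tell apart from real motifs.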

Stochastic Nonlinear Ensemble Modeling and Control for Robot Team Environmental Monitoring

Victoria Edwards, Thales C. Silva, M. Ani Hsieh

Categories: Robotics

2022-12-22

We seek methods to model, control, and analyze robot teams performing environmental monitoring tasks. During environmental monitoring, the goal is to have teams of robots collect various data throughout a fixed region for extended periods of time. Standard bottom-up task assignment methods do not scale as the number of robots and task locations increases and require computationally expensive replanning. Alternatively, top-down methods have been used to combat computational complexity, but most have been limited to the analysis of methods which focus on transition times between tasks. In this work, we study a class of nonlinear macroscopic models which we use to control a time-varying distribution of robots performing different tasks throughout an environment. Our proposed ensemble model and control maintains desired time-varying populations of robots by leveraging naturally occurring interactions between robots performing tasks. We validate our approach at multiple fidelity levels including experimental results, suggesting the effectiveness of our approach to perform environmental monitoring.

Proportional Control for Stochastic Regulation on Allocation of Multi-Robots

Thales C. Silva, Victoria Edwards, M. Ani Hsieh

Categories: Robotics

2022-12-19

Any strategy used to distribute a robot ensemble over a set of sequential tasks is subject to inaccuracy due to robot-level uncertainties and environmental influences on the robots' behavior. We approach the problem of inaccuracy during task allocation by modeling and controlling the overall ensemble behavior. Our model represents the allocation problem as a stochastic jump process, and we regulate the mean and variance of this process. The main contributions of this paper are establishing a structure for the transition rates of the equivalent stochastic jump process and formally showing that this approach leads to decoupled parameters that allow us to adjust the first- and second-order moments of the ensemble distribution over tasks, which gives the flexibility to decrease the variance in the desired final distribution. This allows us to directly shape the impact of uncertainties on the group allocation over tasks. We introduce a detailed procedure to design the gains to achieve the desired mean and show how the additional parameters impact the covariance matrix, which is directly associated with the degree of task allocation precision. Our simulation and experimental results illustrate the successful control of several robot ensembles during task allocation.
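
To make the idea of a stochastic jump process over tasks concrete, the sketch below simulates an uncontrolled baseline with Gillespie's algorithm: each robot at task i independently jumps to task j at a constant rate `rates[i][j]`. The constant rates, the function name `simulate_allocation`, and all numbers in the usage lines are assumptions for illustration; the paper's transition-rate structure and gain design are not reproduced here.

```python
import random


def simulate_allocation(x0, rates, t_end, seed=0):
    """Simulate robot counts per task up to time t_end (Gillespie algorithm).

    x0[i] is the initial number of robots at task i; rates[i][j] is the
    assumed constant per-robot rate of jumping from task i to task j.
    """
    rng = random.Random(seed)
    x = list(x0)
    m = len(x)
    t = 0.0
    while True:
        # Propensity of each possible jump i -> j is rate * robots at task i.
        props = [(i, j, rates[i][j] * x[i])
                 for i in range(m) for j in range(m)
                 if i != j and rates[i][j] > 0 and x[i] > 0]
        total = sum(p for _, _, p in props)
        if total == 0:
            break
        t += rng.expovariate(total)  # waiting time to the next jump
        if t > t_end:
            break
        # Pick which jump fires, with probability proportional to propensity.
        r = rng.uniform(0, total)
        acc = 0.0
        for i, j, p in props:
            acc += p
            if r <= acc:
                x[i] -= 1
                x[j] += 1
                break
    return x


# Usage: 30 robots, two tasks, k_01 = 1.0 (task 0 -> 1), k_10 = 2.0 (task 1 -> 0).
rates = [[0.0, 1.0],
         [2.0, 0.0]]
finals = [simulate_allocation([30, 0], rates, t_end=10.0, seed=s)
          for s in range(200)]
mean0 = sum(x[0] for x in finals) / len(finals)
var0 = sum((x[0] - mean0) ** 2 for x in finals) / (len(finals) - 1)
```

With constant rates and two tasks, each robot is found at task 0 in steady state with probability p = k_10 / (k_01 + k_10), independently of the others, so the count at task 0 is binomial with mean Np and variance Np(1 - p). In this baseline, fixing the mean therefore also fixes the variance.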

Receding Horizon Control on the Broadcast of Information in Stochastic Networks

Thales C. Silva, Li Shen, Xi Yu, M. Ani Hsieh

Categories: Robotics

2022-12-19

This paper focuses on the broadcast of information on robot networks with stochastic network interconnection topologies. Problematic communication networks are almost unavoidable in areas where we wish to deploy multi-robotic systems, usually due to a lack of environmental consistency, accessibility, and structure. We tackle this problem by modeling the broadcast of information in a multi-robot communication network as a stochastic process with random arrival times, which can be produced by irregular robot movements, wireless attenuation, and other environmental factors. Using this model, we provide and analyze a receding horizon control strategy to control the statistics of the information broadcast. The resulting strategy compels the robots to re-direct their communication resources to different neighbors according to the current propagation process to fulfill global broadcast requirements. Based on this method, we provide an approach to compute the expected time to broadcast the message to all nodes. Numerical examples are provided to illustrate the results.
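
As a rough illustration of broadcast with random arrival times, the sketch below estimates the expected time for a message to reach every node when each link between an informed and an uninformed robot delivers the message after an exponentially distributed delay. The exponential delays, the single shared `rate`, the graph, and the function names are all assumptions made for this example; the receding horizon controller described in the abstract is not implemented.

```python
import random


def broadcast_time(neighbors, rate, source, rng):
    """Time until all nodes are informed, for one random realization.

    neighbors maps each node to a list of adjacent nodes. Every link from an
    informed node to an uninformed node fires at the assumed constant `rate`.
    Returns infinity if some node is unreachable from `source`.
    """
    informed = {source}
    t = 0.0
    while len(informed) < len(neighbors):
        frontier = [(u, v) for u in informed for v in neighbors[u]
                    if v not in informed]
        if not frontier:
            return float("inf")
        # The first of len(frontier) independent exponential clocks to ring.
        t += rng.expovariate(rate * len(frontier))
        _, receiver = rng.choice(frontier)  # each link is equally likely
        informed.add(receiver)
    return t


def mean_broadcast_time(neighbors, rate, source, runs=2000, seed=0):
    """Monte Carlo estimate of the expected broadcast time."""
    rng = random.Random(seed)
    return sum(broadcast_time(neighbors, rate, source, rng)
               for _ in range(runs)) / runs


# Usage: complete graph on 5 robots, unit link rate.
n = 5
complete = {u: [v for v in range(n) if v != u] for u in range(n)}
estimate = mean_broadcast_time(complete, rate=1.0, source=0)
# Closed form for the complete graph: k informed nodes give k * (n - k) links.
exact = sum(1.0 / (k * (n - k)) for k in range(1, n))
```

For the complete graph the expected broadcast time has the closed form sum over k = 1..n-1 of 1 / (rate * k * (n - k)), which equals 5/6 for n = 5 and unit rate, so the Monte Carlo estimate can be checked against it.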

Online Estimation of the Koopman Operator Using Fourier Features

Tahiya Salam, Alice Kate Li, M. Ani Hsieh

Categories: Robotics | Machine Learning

2022-12-03

Transfer operators offer linear representations and global, physically meaningful features of nonlinear dynamical systems. Discovering transfer operators, such as the Koopman operator, requires carefully crafted dictionaries of observables acting on the states of the dynamical system. This process is ad hoc and requires the full dataset for evaluation. In this paper, we offer an optimization scheme that allows joint learning of the observables and the Koopman operator with online data. Our results show that we are able to reconstruct the evolution and represent the global features of complex dynamical systems.
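
For readers unfamiliar with the terms, the standard definitions can be written as follows. The discrete-time setting, the symbols F, g, ψ, ω_i, b_i, and D, and the least-squares form are assumptions based on the usual textbook formulation; the abstract does not give the paper's exact parameterization or its online update rule.

```latex
% Discrete-time dynamics and the Koopman operator acting on an observable g
x_{t+1} = F(x_t), \qquad (\mathcal{K} g)(x) = g\bigl(F(x)\bigr)

% Dictionary of D random Fourier features (assumed form)
\psi(x) = \sqrt{\tfrac{2}{D}}\,
    \bigl[\cos(\omega_1^\top x + b_1), \dots, \cos(\omega_D^\top x + b_D)\bigr]^\top

% Finite-dimensional least-squares approximation K of the operator
K = \arg\min_{K' \in \mathbb{R}^{D \times D}}
    \sum_{t} \bigl\lVert \psi(x_{t+1}) - K' \psi(x_t) \bigr\rVert_2^2
```

The operator is linear in g even when F is nonlinear, which is what makes a linear representation of the dynamics possible. Learning the frequencies ω_i and offsets b_i jointly with K is one way to read the abstract's "joint learning of the observables and the Koopman operator".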

BARTSmiles: Generative Masked Language Models for Molecular Representations

Gayane Chilingaryan, Hovhannes Tamoyan, Ani Tevosyan, Nelly Babayan, Lusine Khondkaryan, Karen Hambardzumyan, Zaven Navoyan, Hrant Khachatrian, Armen Aghajanyan

Categories: Machine Learning

2022-11-29

We discover a robust self-supervised strategy tailored to molecular representations for generative masked language models through a series of in-depth ablations. Using this pre-training strategy, we train BARTSmiles, a BART-like model with an order of magnitude more compute than previous self-supervised molecular representations. In-depth evaluations show that BARTSmiles consistently outperforms other self-supervised representations across classification, regression, and generation tasks, setting a new state of the art on 11 tasks. We then quantitatively show that when applied to the molecular domain, the BART objective learns representations that implicitly encode our downstream tasks of interest. For example, by selecting seven neurons from a frozen BARTSmiles, we can obtain a model with performance within two percentage points of the full fine-tuned model on the ClinTox task. Lastly, we show that standard attribution interpretability methods, when applied to BARTSmiles, highlight certain substructures that chemists use to explain specific properties of molecules. The code and the pretrained model are publicly available.

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu

Categories: Computer Vision

2022-11-27

Document images are a ubiquitous source of data where the text is organized in a complex hierarchical structure ranging from fine granularity (e.g., words) through medium granularity (e.g., regions such as paragraphs or figures) to coarse granularity (e.g., the whole page). The spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks. Existing methods learn features at either the word level or the region level but fail to consider both simultaneously. Word-level models are restricted by the fact that they originate from pure-text language models, which only encode the word-level context. In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features. To deal with these issues, we propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time. MGDoc uses a unified text-visual encoder to obtain multi-modal features across different granularities, which makes it possible to project the multi-granular features into the same hyperspace. To model the region-word correlation, we design a cross-granular attention mechanism and specific pre-training tasks that reinforce the model's learning of the hierarchy between regions and words. Experiments demonstrate that our proposed model can learn better features that perform well across granularities and lead to improvements in downstream tasks.
