Mirko Mazzoleni 2022-08-03T10:49:38+00:00 http://mirkomazzoleni.github.io An agent-based model to assess large-scale COVID-19 vaccination campaigns for the Italian territory: The case study of Lombardy region 2022-07-15T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2022/07/15/CMPB_Covasim <h3 id="abstract">Abstract</h3> <p><em>Background</em> In Italy, the administration of COVID-19 vaccines began in late 2020. In the early stages, the number of available doses was limited. To maximize the effectiveness of the vaccine campaign, the national health agency assigned priority access to at-risk individuals, such as health care workers and the elderly. Current vaccination campaign strategies do not take full advantage of the latest mathematical models, which capture many subtle nuances and allow different territorial situations to be analyzed in order to make context-specific decisions. <em>Objectives</em> The main objective is the definition of an agent-based model using open data and scientific literature to assess and optimize the impact of vaccine campaigns for an Italian region. Specifically, the aim is twofold: (i) estimate the reduction in the number of infections and deaths attributable to vaccines, and (ii) assess the performance of alternative vaccine allocation strategies. <em>Methods</em> The COVID-19 agent-based simulator Covasim has been employed to build an agent-based model, taking the Lombardy region as a case study. The model has been tailored by leveraging open data and knowledge from the scientific literature. Dynamic mobility restrictions and the presence of Variants of Concern have been explicitly represented. Free parameters have been calibrated using grid search. <em>Results</em> The model mimics the COVID-19 wave that hit Lombardy from September 2020 to April 2021. 
It suggests that 168,492 cumulative infections and 2,990 cumulative deaths have been avoided due to the vaccination campaign in Lombardy from January 1 to April 30, 2021. Without vaccines, the number of deaths would have been 66% greater in the 80–89 age group and 114% greater for those over 90. The best vaccine allocation strategy depends on the goal. To minimize infections, the best policy is related to dose availability. If at least 1/3 of the population can be covered in 4 months, targeting at-risk individuals and the elderly first is recommended; otherwise, the youngest people should be vaccinated first. To minimize overall deaths, priority is best given to at-risk groups and the elderly in all scenarios. <em>Conclusions</em> This work proposes a methodological approach that leverages open data and scientific literature to build a model of COVID-19 capable of assessing and optimizing the impact of vaccine campaigns. This methodology can help national institutions to design regional mathematical models that can support pandemic-related decision-making processes. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S0169260722004114">Paper</a></strong>, <strong>[Code]</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> A. Cattaneo, A. Vitali, M. Mazzoleni, F. Previdi, "An agent-based model to assess large-scale COVID-19 vaccination campaigns for the Italian territory: The case study of Lombardy region," in Computer Methods and Programs in Biomedicine, <a href="https://doi.org/10.1016/j.cmpb.2022.107029"> doi: 10.1016/j.cmpb.2022.107029 </a>, 2022. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{CATTANEO2022107029, title = {An agent-based model to assess large-scale COVID-19 vaccination campaigns for the Italian territory: The case study of Lombardy region}, journal = {Computer Methods and Programs in Biomedicine}, volume = {224}, pages = {107029}, year = {2022}, issn = {0169-2607}, doi = {10.1016/j.cmpb.2022.107029}, author = {Andrea Cattaneo and Andrea Vitali and Mirko Mazzoleni and Fabio Previdi}, } </code></pre></div></div> Experimental fault detection of input gripping pliers in bottling plants 2022-06-11T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2022/06/11/IFAC_SAFEPROCESS_Pliers <h3 id="abstract">Abstract</h3> <p>This paper presents a signal-based fault detection scheme for input gripping pliers of the blow molding machine in plastic bottling plants, using accelerometer data. The focus of the diagnosis is on the bearings that support the pliers’ movements on their mechanical cam. The rationale of the algorithm lies in interpreting the pliers’ bearings as the balls in a traditional rolling bearing. Then, strategies inspired by bearing diagnosis are employed and adapted to the specific case of this work. The developed algorithm is validated with experimental tests, following a fault injection step, directly on the real blow molding machine. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896322006061">Paper</a></strong>, <strong>[Code]</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> N. Valceschini, M. Mazzoleni, L. Pitturelli, F. Previdi, "Experimental fault detection of input gripping pliers in bottling plants," in 11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS) 2022, vol. 55, n.6, pp. 
778-783, ISSN: 2405-8963 <a href="https://doi.org/10.1016/j.ifacol.2022.07.221"> doi: 10.1016/j.ifacol.2022.07.221 </a>, 2022. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{VALCESCHINI2022778, title = {Experimental fault detection of input gripping pliers in bottling plants}, journal = {IFAC-PapersOnLine}, volume = {55}, number = {6}, pages = {778-783}, year = {2022}, note = {11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes SAFEPROCESS 2022}, issn = {2405-8963}, doi = {10.1016/j.ifacol.2022.07.221}, author = {N. Valceschini and M. Mazzoleni and L. Pitturelli and F. Previdi} } </code></pre></div></div> Model-based fault diagnosis of sliding gates electro-mechanical actuators transmission components with motor-side measurements 2022-06-11T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2022/06/11/IFAC_SAFEPROCESS_EMA_motor_meas <h3 id="abstract">Abstract</h3> <p>This paper presents a model-based fault detection and isolation scheme for the transmission components of Electro-Mechanical Actuators, applied to the actuation of sliding gates. The most important failures are investigated by a Failure Mode, Effects, and Criticality Analysis procedure. Following Failure Mode, Effects, and Criticality Analysis, the components selected for the development of the diagnostic algorithm are the nylon gear and pinion of the Electro-Mechanical Actuator, and the rack of the gate. The proposed diagnostic algorithm is able to isolate two out of the three types of faults. The overall procedure is validated by experimental results. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896322006073">Paper</a></strong>, <strong>[Code]</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> N. Valceschini, M. Mazzoleni, F. 
Previdi, "Model-based fault diagnosis of sliding gates electro-mechanical actuators transmission components with motor-side measurements," in 11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS) 2022, vol. 55, n.6, pp. 784-789, ISSN: 2405-8963 <a href="https://doi.org/10.1016/j.ifacol.2022.07.222"> doi: 10.1016/j.ifacol.2022.07.222 </a>, 2022. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{VALCESCHINI2022784, title = {Model-based fault diagnosis of sliding gates electro-mechanical actuators transmission components with motor-side measurements}, journal = {IFAC-PapersOnLine}, volume = {55}, number = {6}, pages = {784-789}, year = {2022}, note = {11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes SAFEPROCESS 2022}, issn = {2405-8963}, doi = {10.1016/j.ifacol.2022.07.222}, author = {N. Valceschini and M. Mazzoleni and F. Previdi}, } </code></pre></div></div> Kernel-based system identification with manifold regularization: A Bayesian perspective 2022-04-30T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2022/04/30/AUTOMATICA_Bayesian_manifold <h3 id="abstract">Abstract</h3> <p>This paper presents a nonparametric Bayesian interpretation of kernel-based function learning with manifold regularization. We show that manifold regularization corresponds to an additional likelihood term derived from noisy observations of the function gradient along the regressors graph. The hyperparameters of the method are estimated by a suitable empirical Bayes approach. The effectiveness of the method in the context of dynamical system identification is evaluated on a simulated linear system and on an experimental switching system setup. 
[<strong><a href="https://www.sciencedirect.com/science/article/pii/S0005109822002722">Paper</a></strong>, <strong>[Code]</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, A. Chiuso, M. Scandella, S. Formentin, F. Previdi, "Kernel-based system identification with manifold regularization: A Bayesian perspective," in Automatica, <a href="https://doi.org/10.1016/j.automatica.2022.110419"> doi: 10.1016/j.automatica.2022.110419 </a>, 2022. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2022110419, title = {Kernel-based system identification with manifold regularization: A Bayesian perspective}, journal = {Automatica}, volume = {142}, pages = {110419}, year = {2022}, issn = {0005-1098}, doi = {10.1016/j.automatica.2022.110419}, author = {Mirko Mazzoleni and Alessandro Chiuso and Matteo Scandella and Simone Formentin and Fabio Previdi}, } </code></pre></div></div> A kernel-based control approach for multi-period assets allocation based on lower partial moments 2022-01-24T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2022/01/24/EAAI_Kernel_portfolio <h3 id="abstract">Abstract</h3> <p>In quantitative finance, multi-period portfolio optimization can be reformulated as a stochastic optimal control problem, and standard feedback tools can be employed for its analysis. The performance of the trading solutions strongly depends on the quality of the model of the returns. Therefore, data-driven solutions have been recently proposed to optimize simple linear allocation policies, based only on a set of possible market scenarios. In this work, kernel-based methods are proposed to design more complex and effective control actions, providing better trade-offs in terms of risk and investment performance with respect to linear ones, while preserving convexity. 
The proposed approach relies on the minimization of the Lower Partial Moments (LPM) risk measure. The effectiveness of the method is shown on a set of real historical financial data. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S0952197621004589">Paper</a></strong>, <strong>[Code]</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, G. Maroni, S. Formentin, F. Previdi, "A kernel-based control approach for multi-period assets allocation based on lower partial moments," in Engineering Applications of Artificial Intelligence, <a href="https://doi.org/10.1016/j.engappai.2021.104659"> doi: 10.1016/j.engappai.2021.104659 </a>, 2022. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2022104659, title = {A kernel-based control approach for multi-period assets allocation based on lower partial moments}, journal = {Engineering Applications of Artificial Intelligence}, volume = {110}, pages = {104659}, year = {2022}, issn = {0952-1976}, doi = {10.1016/j.engappai.2021.104659}, author = {M. Mazzoleni and G. Maroni and S. Formentin and F. Previdi}, } </code></pre></div></div> Visualizing Classification Results: Confusion Star and Confusion Gear 2022-01-16T00:00:00+00:00 http://mirkomazzoleni.github.io/blog/2022/01/16/confusion_gears <p><strong>Authors:</strong> Amalia <strong>Luque</strong>$$^1$$, Mirko <strong>Mazzoleni</strong>$$^2$$, Alejandro <strong>Carrasco</strong>$$^3$$ and Antonio <strong>Ferramosca</strong>$$^2$$<br /> $$^1$$ <em>Departamento de Ingeniería del Diseño, Escuela Politécnica Superior, Universidad de Sevilla, 41011 Seville, Spain</em> <br /> $$^2$$ <em>Department of Management, Information and Production Engineering, University of Bergamo, 24044 Dalmine, Italy</em><br /> $$^3$$ <em>Departamento de Tecnología Electrónica, School of Computer Engineering, Universidad de Sevilla, 41012 Seville, Spain</em></p> <p>In this post, we present two new visualization tools, called 
<strong>confusion star</strong> and <strong>confusion gear</strong>, for the <strong>representation of classification performance of multiclass classifiers</strong>. More details can be found in the paper [1]:</p> <ul> <li> <p><a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=9658486">Open-access paper on IEEE Access</a></p> </li> <li> <p><a href="https://github.com/amalialuque/confusionstar">Python code Github repository</a></p> </li> </ul> <p>You can cite the paper as:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@ARTICLE{9658486, author={Luque, Amalia and Mazzoleni, Mirko and Carrasco, Alejandro and Ferramosca, Antonio}, journal={IEEE Access}, title={Visualizing Classification Results: Confusion Star and Confusion Gear}, year={2022}, volume={10}, number={}, pages={1659-1677}, doi={10.1109/ACCESS.2021.3137630} } </code></pre></div></div> <h2 id="matrices-for-representing-classification-results">Matrices for representing classification results</h2> <p>We start our discussion by presenting three matrices:</p> <ul> <li> <p>the <strong>confusion matrix</strong>, used to represent the number of correctly and incorrectly classified instances;</p> </li> <li> <p>the <strong>hits matrix</strong><em>,</em> used to represent the number of correctly classified instances;</p> </li> <li> <p>the <strong>errors matrix</strong><em>,</em> used to represent the number of incorrectly classified instances.</p> </li> </ul> <h3 id="confusion-matrix">Confusion matrix</h3> <p>Classification results from machine learning algorithms are commonly summarized in the form of a <strong>confusion matrix</strong>, especially for dichotomous ($$2$$-class) classification tasks. 
As the name suggests, a confusion matrix is a table representing the <strong>number of correctly and incorrectly classified instances</strong>, see Table 1.</p> <p><img src="/images/2022-01-16-confusion_gears/confusion_matrix.jpg" alt="" /> <em>Table 1: Binary confusion matrix for a $$2$$-class classification task</em></p> <p>In the two-class case, this matrix representation is visually effective: it is immediately clear if the classifier is performing well, as <strong>the number of false positives and false negatives should be as low as possible.</strong></p> <p>However, this representation suffers when <strong>more than two classes</strong> are present. A simple way to visualize the results of a $$C$$-class classification problem (when $$C&gt;2$$) would be to employ a <strong>one-vs-all strategy:</strong></p> <ul> <li> <p>select a class to be the “Positive” one. All the other classes will constitute the “Negative” class;</p> </li> <li> <p>build a binary confusion matrix representing the Positive vs. Negative classes. Repeat this process until all classes have played the role of the “Positive” one;</p> </li> <li> <p>visualize the $$C$$ binary confusion matrices.</p> </li> </ul> <p>While easy to perform, this one-vs-all representation is far less effective.</p> <p>As an example, consider the famous MNIST (Modified National Institute of Standards and Technology) dataset [2], which contains 70,000 images, each representing a handwritten digit (0 to 9). We trained a shallow MLP (Multi-Layer Perceptron) neural network on the first 60,000 images, and evaluated its results on the remaining 10,000 images. In the MNIST case, $$C=10$$ figures have to be depicted and the overall algorithm performance is difficult to evaluate. 
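</p> <p>The one-vs-all procedure described above can be sketched in a few lines of NumPy. This is only an illustration, not the code used for the figures: the function name <code>one_vs_all</code> and the example counts are hypothetical, and the sketch assumes the convention used later in this post, where entry $$(i,j)$$ counts instances of actual class $$j$$ classified to class $$i$$.</p>

```python
import numpy as np


def one_vs_all(cm):
    """Collapse a C x C confusion matrix into C binary (one-vs-all) summaries.

    Assumed convention: cm[i, j] counts instances of actual class j that
    were classified to class i (rows = predicted, columns = actual).
    """
    cm = np.asarray(cm)
    total = cm.sum()
    summaries = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]                # class k classified as k
        fp = cm[k, :].sum() - tp     # other classes classified as k
        fn = cm[:, k].sum() - tp     # class k classified as something else
        tn = total - tp - fp - fn    # instances not involving class k at all
        summaries.append(
            {"tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn)}
        )
    return summaries


# Hypothetical 3-class confusion matrix (made-up counts, for illustration only).
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [6, 3, 46]])
binary = one_vs_all(cm)
```

<p>Each dictionary in <code>binary</code> holds the four cells of one binary confusion matrix, so a $$C$$-class problem yields $$C$$ separate tables to inspect.</p> <p>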
Figure 1 shows the $$10$$ confusion matrices, with cell elements expressed as percentages normalized by column (the actual class).</p> <p><img src="/images/2022-01-16-confusion_gears/onevsall_cm.png" alt="" title="Unit binary confusion matrices for MNIST test data" /> <em>Figure 1: Unit binary confusion matrices for MNIST test data.</em></p> <p>In order to understand the performance of the classifier on the $$10$$ classes, we have to look at each confusion matrix individually. An <strong>overall interpretation is possible but rather difficult.</strong></p> <p>Instead of visualizing multiple confusion matrices, we propose two new visualizations for better assessing the classification performance of a multiclass classifier:</p> <ul> <li> <p><strong>Confusion star:</strong> focused on representing <strong>classification errors</strong> (the lower the better) for each class;</p> </li> <li> <p><strong>Confusion gear:</strong> focused on representing <strong>classification hits</strong> (the higher the better) for each class.</p> </li> </ul> <p>First, we need to introduce what the <strong>errors matrix</strong> and the <strong>hits matrix</strong> are.</p> <h3 id="errors-matrix">Errors matrix</h3> <p>Suppose we have the confusion matrix of a $$3$$-class classification task as</p> $\mathbf{CM} \equiv \begin{bmatrix} m_{11} &amp; m_{12} &amp; m_{13} \\ m_{21} &amp; m_{22} &amp; m_{23} \\ m_{31} &amp; m_{32} &amp; m_{33} \\ \end{bmatrix}$ <p>where $$m_{ij}$$ is the number of instances of the $$j$$-th class classified to the $$i$$-th class. 
Ideally, <em>for a perfect classification,</em> we want $$m_{ij}=0$$ for $$i\neq j$$.</p> <p>The <strong>total number of instances classified</strong> <em>to the</em> $$i$$<em>-th class</em> is the sum of the $$i$$-th row of the confusion matrix, that is</p> $m_i = \sum_{j=1}^{C} m_{ij}$ <p>The <strong>errors matrix</strong> is defined by the $$i,j$$ elements</p> $e_{ij} \equiv \begin{cases} m_i - m_{ii} &amp; i=j \\ m_{ij} &amp; i\neq j \end{cases}$ <p>so that, in our example, we get</p> $\mathbf{EM} \equiv \begin{bmatrix} e_{11} &amp; e_{12} &amp; e_{13} \\ e_{21} &amp; e_{22} &amp; e_{23} \\ e_{31} &amp; e_{32} &amp; e_{33} \\ \end{bmatrix} = \begin{bmatrix} m_{1}-m_{11} &amp; m_{12} &amp; m_{13} \\ m_{21} &amp; m_2-m_{22} &amp; m_{23} \\ m_{31} &amp; m_{32} &amp; m_3-m_{33} \\ \end{bmatrix}$ <p>Here, a <em>perfect classification</em> means $$e_{ij}=0$$ for all $$i,j$$, i.e. $$m_{ij}=0$$ for $$i\neq j$$ (no misclassifications), so that</p> $\mathbf{EM} = \begin{bmatrix} 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0 \\ \end{bmatrix}$ <p>The $$i$$-th row $$\mathbf{EM}_i$$ of this matrix can also be formulated in terms of the ratio over the total number $$m_i$$ of instances classified to the $$i$$-th class</p> $\mathbf{EM}_i \equiv \begin{bmatrix} \epsilon_{i1}\cdot m_i &amp; \epsilon_{i2}\cdot m_i &amp; \epsilon_{i3}\cdot m_i \end{bmatrix}$ <p>where $$\epsilon_{ij}=e_{ij}/m_i$$. The matrix $$\mathbf{E}=\{\epsilon_{ij}\}$$ is called the <strong>unit errors matrix</strong><em>,</em> and it is a normalized version (entries with values 0-1) of the original errors matrix. 
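</p> <p>As a minimal sketch of the errors matrix definitions above (the actual implementation lives in the repository linked at the top of this post and may differ), the following NumPy code computes $$e_{ij}$$ and $$\epsilon_{ij}=e_{ij}/m_i$$. The function names and the example counts are hypothetical, and the sketch assumes every row total $$m_i$$ is nonzero.</p>

```python
import numpy as np


def errors_matrix(cm):
    """Return (EM, m), where EM is the errors matrix and m holds the row totals.

    Off-diagonal entries of EM equal those of cm; the diagonal entry of
    row i is m_i - m_ii, i.e. the sum of the off-diagonal entries of row i.
    """
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)          # m_i: instances classified to class i
    em = cm.copy()
    np.fill_diagonal(em, row_totals - np.diag(cm))
    return em, row_totals


def unit_errors_matrix(cm):
    """Return the unit errors matrix with entries e_ij / m_i.

    Assumes every class has at least one instance classified to it,
    so that no row total is zero.
    """
    em, row_totals = errors_matrix(cm)
    return em / row_totals[:, None]      # divide each row i by m_i


# Hypothetical 3-class confusion matrix (made-up counts, for illustration only).
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [6, 3, 46]])
em, row_totals = errors_matrix(cm)
unit = unit_errors_matrix(cm)
```

<p>With a diagonal confusion matrix (a perfect classification), <code>errors_matrix</code> returns an all-zero matrix, matching the case shown above.</p> <p>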
The unit errors matrix allows reasoning in percentages (0-100%) of errors.</p> <h3 id="hits-matrix">Hits matrix</h3> <p>The <strong>hits matrix</strong> is defined by the $$i,j$$ elements</p> $w_{ij} \equiv \begin{cases} m_{ii} &amp; i=j \\ m_{i} - m_{ij} &amp; i\neq j \end{cases}$ <p>so that, in our example, we get</p> $\mathbf{HM} \equiv \begin{bmatrix} w_{11} &amp; w_{12} &amp; w_{13} \\ w_{21} &amp; w_{22} &amp; w_{23} \\ w_{31} &amp; w_{32} &amp; w_{33} \\ \end{bmatrix} = \begin{bmatrix} m_{11} &amp; m_{1} - m_{12} &amp; m_{1} - m_{13} \\ m_{2} - m_{21} &amp; m_{22} &amp; m_{2} - m_{23} \\ m_{3} - m_{31} &amp; m_{3} - m_{32} &amp; m_{33} \\ \end{bmatrix}$ <p>Here, a <em>perfect classification</em> means $$w_{ij}=m_{i}$$ for all $$j$$, i.e. $$m_{ij}=0$$ for $$i\neq j$$ (no misclassifications), so that</p> $\mathbf{HM} = \begin{bmatrix} m_{1} &amp; m_1 &amp; m_1 \\ m_2 &amp; m_2 &amp; m_2 \\ m_{3} &amp; m_{3} &amp; m_{3} \\ \end{bmatrix}$ <p>The $$i$$-th row $$\mathbf{HM}_i$$ of this matrix can also be formulated in terms of the ratio over the total number $$m_i$$ of instances classified to the $$i$$-th class</p> $\mathbf{HM}_i \equiv \begin{bmatrix} \psi_{i1}\cdot m_i &amp; \psi_{i2}\cdot m_i &amp; \psi_{i3}\cdot m_i \end{bmatrix}$ <p>where $$\psi_{ij}=w_{ij}/m_i$$. The matrix $$\boldsymbol{\Psi}=\{\psi_{ij}\}$$ is called the <strong>unit hits matrix</strong><em>,</em> and it is a normalized version (entries with values 0-1) of the original hits matrix. The unit hits matrix allows reasoning in percentages (0-100%) of correct classifications.</p> <h2 id="confusion-star-plot">Confusion star plot</h2> <p>The confusion star plot depicts the <strong>non-diagonal elements of the errors matrix</strong> $$\mathbf{EM}$$ or $$\mathbf{E}$$ in a circle. This gives a visual inspection of the <strong>classification errors from one class relative to all other ones</strong>. 
The circle is divided into $$C$$ arcs (one for each class), and each arc is in turn divided into $$C-1$$ sectors (one for every class except the arc’s own).</p> <h4 id="confusion-star-for-final-classification-performance">Confusion star for final classification performance</h4> <p>Figure 2 shows the confusion star based on the unit errors matrix of the model trained on the MNIST dataset. We have $$C=10$$ arcs, named from 0 to 9. Then, each arc has $$C-1=9$$ sectors, each one numbered from 0 to 9 but without the corresponding class of that arc.</p> <p>Consider for instance <strong>class 0</strong> and <strong>class 2</strong>. Here, it is immediately apparent that class 0 instances are mostly confused with class 5 ones, and that class 2 struggles with class 1 and class 7. We remark that a <strong>perfect classification will result in an empty confusion star</strong><em>.</em></p> <p style="color:gray; font-size: 100%; text-align: center;"><img src="/images/2022-01-16-confusion_gears/balanced_cf.png" style="width: 400px;" class="center_img" /> <em>Figure 2: Confusion star.</em></p> <p>When small errors hinder the visualization, it is possible to employ the <strong>logarithm of the errors matrix</strong>, see Figure 3. 
Here, the center of the circle does not correspond to a null error but to an arbitrarily chosen small value (0.01 in the graphic).</p> <p style="color:gray; font-size: 100%; text-align: center;"><img src="/images/2022-01-16-confusion_gears/balanced_cf_log.png" style="width: 400px;" class="center_img" /> <em>Figure 3: Logarithmic confusion star.</em></p> <h4 id="confusion-star-for-understanding-the-learning-process">Confusion star for understanding the learning process</h4> <p>Comparing confusion stars at <strong>different numbers of training samples</strong> can be useful to understand the learning process in addition to the <strong>learning curve</strong>, see Figure 4.</p> <p style="color:gray; font-size: 100%; text-align: center;"><img src="/images/2022-01-16-confusion_gears/learning_curve.png" style="width: 400px;" class="center_img" /> <em>Figure 4: Learning curve of the MNIST dataset using a neural network with a single 128-neuron hidden layer.</em></p> <p>Consider the significant increase in accuracy occurring around 500 training instances. While the <strong>learning curve does not detail what this improvement is due to or how it is distributed</strong> among the classes, an analysis of the confusion stars can shed more light on the question.</p> <p>In Figure 5 the confusion stars corresponding to a point with 502 samples (before the jump, accuracy of 38%) and another point with 610 samples (after the jump, accuracy of 67%) are shown. Substantial improvements (smaller errors) can be observed in, for example, the 0 classified as a 2, the 2 classified as a 1, and so on. 
In other words, <strong>this representation of the confusion matrix not only informs us of the overall improvement of the classifier, but also of how this improvement is distributed.</strong></p> <p style="color:gray; font-size: 100%; text-align: center;"><img src="/images/2022-01-16-confusion_gears/learning_cf.png" class="center_img" /> <em>Figure 5: Confusion stars (in logarithmic scale) corresponding to a pair of points before and after the first jump in the learning curve: 502 instances (left; accuracy of 38.13%) and 610 instances (right; accuracy of 67.23%).</em></p> <h2 id="confusion-gear-plot">Confusion gear plot</h2> <p>The confusion gear plot depicts the <strong>non-diagonal elements of the hits matrix</strong> $$\mathbf{HM}$$ or $$\boldsymbol{\Psi}$$ in a circle. This gives a visual inspection of the <strong>classification hits from one class relative to all other ones</strong>. As in the confusion star, the circle is divided into $$C$$ arcs (one for each class), and each arc is in turn divided into $$C-1$$ sectors (one for every class except the arc’s own).</p> <p>Figure 6 depicts the confusion gear for the MNIST example. Consider again <strong>class 2</strong> as reference. Here, we see lower values corresponding to class 1 and class 7, consistent with the higher errors found in Figure 2 or Figure 3. 
We remark that a <strong>perfect classification will result in a full confusion gear</strong><em>.</em></p> <p>Since the hits are usually not so small, there is no need to consider logarithmic transformations as in confusion stars.</p> <p style="color:gray; font-size: 100%; text-align: center;"><img src="/images/2022-01-16-confusion_gears/confusion_gear.png" style="width: 400px;" class="center_img" /> <em>Figure 6: Confusion gear.</em></p> <h2 id="conclusions">Conclusions</h2> <p>A new way of representing the information conveyed by confusion matrices is proposed in the form of a <strong>confusion star</strong> (focusing on the errors) or a <strong>confusion gear</strong> (centered on the hits). The new tools represent <strong>multiclass classification results in the form of a radial plot.</strong></p> <p>The traditional way to represent a confusion matrix uses colors (and possibly text) to indicate the number of instances belonging to an actual class that are classified to an estimated class. Instead, confusion stars and gears <strong>use shapes to convey that information</strong>. Replacing colors with shapes significantly improves the readability of the proposed graphics.</p> <p>An additional property of the confusion stars and gears is that <strong>the enclosed area provides information about the overall classification performance</strong>. The relation of these areas to standard classification metrics has also been derived in the paper [1].</p> <p>Finally, the new graphic tools can usefully be employed to <strong>visualize the performance of a sequence of classifiers.</strong></p> <h2 id="references">References</h2> <p>[1] A. Luque, M. Mazzoleni, A. Carrasco and A. Ferramosca, “Visualizing Classification Results: Confusion Star and Confusion Gear,” in <em>IEEE Access</em>, vol. 10, pp. 1659-1677, 2022, doi: 10.1109/ACCESS.2021.3137630.</p> <p>[2] L. 
Deng, “The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web],” in <em>IEEE Signal Processing Magazine</em>, vol. 29, no. 6, pp. 141-142, Nov. 2012, doi: 10.1109/MSP.2012.2211477.</p> Visualizing Classification Results: Confusion Star and Confusion Gear 2021-12-22T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2021/12/22/IEEE_ACCESS_confusion_gears <h3 id="abstract">Abstract</h3> <p>Recent developments in machine learning applications are deeply concerned with the poor interpretability of most of these techniques. To gain some insight into the process of designing data-based models, it is common to graphically represent the algorithm’s results, either in their final or intermediate stage. Especially challenging is the task of plotting multiclass classification results, as they involve categorical variables (classes) rather than numeric results. Using the well-known MNIST dataset and a simple neural network as an example, this paper reviews the existing techniques to visualize classification results, from those centered on a particular instance or set of instances, to those representing an overall performance metric. As classification results are commonly summarized in the form of a confusion matrix, special attention is paid to its graphical representation. From this analysis, a new visualization tool is derived, which is presented in two forms: confusion star and confusion gear. The confusion star is centered on the classification errors, while the confusion gear focuses on the classification hits. The proposed visualization tools are also evaluated when facing: (i) balanced and imbalanced classifiers issues; (ii) the problem of representing errors with different orders of magnitude. By using shapes instead of colors to represent the value of each matrix cell, the new tools significantly improve the readability of the confusion matrices. 
Furthermore, we show how the area enclosed by the confusion stars and gears is directly related to standard classification metrics. The new graphic tools can also be usefully employed to visualize the performance of a sequence of classifiers. [<strong><a href="https://ieeexplore.ieee.org/document/9658486">Paper</a></strong>, <strong><a href="https://github.com/amalialuque/confusionstar">Code</a></strong>]</p> <h4 id="reference">Reference</h4> <blockquote> A. Luque, M. Mazzoleni, A. Carrasco and A. Ferramosca, "Visualizing Classification Results: Confusion Star and Confusion Gear," in IEEE Access, <a href="https://doi.org/10.1109/ACCESS.2021.3137630"> doi: 10.1109/ACCESS.2021.3137630 </a>, 2021. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@ARTICLE{9658486, author={Luque, Amalia and Mazzoleni, Mirko and Carrasco, Alejandro and Ferramosca, Antonio}, journal={IEEE Access}, title={Visualizing Classification Results: Confusion Star and Confusion Gear}, year={2021}, volume={}, number={}, pages={1-1}, doi={10.1109/ACCESS.2021.3137630} } </code></pre></div></div> Nonparametric continuous-time identification of linear systems: theory, implementation and experimental results 2021-10-27T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2021/10/27/IFAC_MECC_kernel_cont_application <h3 id="abstract">Abstract</h3> <p>This paper presents an algorithm for continuous-time identification of linear dynamical systems using kernel methods. When the system is asymptotically stable, the identified model is also guaranteed to share such a property. The approach embeds the selection of the model complexity through optimization of the marginal likelihood of the data thanks to its Bayesian interpretation. The output of the algorithm is the continuous-time transfer function of the estimated model. 
In this work, we show the algorithmic and computational details of the approach, and test it on real experimental data from an Electro Hydro-Static Actuator (EHSA). [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896321022977">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, S. Formentin, F. Previdi, "Nonparametric continuous-time identification of linear systems: theory, implementation and experimental results", IFAC-PapersOnLine, Volume 54, Issue 20, 2021, pp. 699-704, ISSN 2405-8963, <a href="https://doi.org/10.1016/j.ifacol.2021.11.253"> doi: 10.1016/j.ifacol.2021.11.253 </a>. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2021699, title = {Nonparametric continuous-time identification of linear systems: theory, implementation and experimental results}, journal = {IFAC-PapersOnLine}, volume = {54}, number = {20}, pages = {699-704}, year = {2021}, note = {Modeling, Estimation and Control Conference MECC 2021}, issn = {2405-8963}, doi = {10.1016/j.ifacol.2021.11.253}, author = {M. Mazzoleni and M. Scandella and S. Formentin and F. Previdi}, } </code></pre></div></div> A comparison of envelope and statistical analyses for bearing diagnosis in hot steel rolling mill lines 2021-10-16T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2021/10/16/IEEE_IECON_Rolling_mill <h3 id="abstract">Abstract</h3> <p>Steel-working industries are characterized by high temperatures and pressures, elevated production speeds, and intense throughput, so that their sudden interruption leads to large financial losses. Undoubtedly, they would benefit greatly from Industry 4.0 advancements in predicting anomalies and breakdowns. 
However, in these industries, the adoption of predictive maintenance methodologies based on the analysis of historical data is a challenging task. Indeed, to avoid costly and dangerous breakdowns, plant managers prefer to apply an early substitution of machine components long before the end of their useful life, making data on fault events, as well as trends on parts degradation, rarely available. This paper reports the outcome of an industrial research project on data-driven fault diagnosis in a steel making production process. The study aims to identify early stage degradations in rotating machines components in hot rolling mill lines. We compare two methodologies: a well-known frequency-domain analysis of vibrations data is correlated with an ad-hoc designed statistical analysis. The comparison has been conducted on experimental data collected in a steel making plant placed in the South of Italy. [<strong><a href="https://ieeexplore.ieee.org/abstract/document/9589440">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> K. Sarda, A. Acernese, L. Russo and M. Mazzoleni, "A comparison of envelope and statistical analyses for bearing diagnosis in hot steel rolling mill lines," IECON 2021 – 47th Annual Conference of the IEEE Industrial Electronics Society, 2021, pp. 1-6, <a href="https://doi.org/10.1109/IECON48115.2021.9589440"> doi: 10.1109/IECON48115.2021.9589440 </a>. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{9589440, author={Sarda, Kisan and Acernese, Antonio and Russo, Luigi and Mazzoleni, Mirko}, booktitle={IECON 2021 – 47th Annual Conference of the IEEE Industrial Electronics Society}, title={A comparison of envelope and statistical analyses for bearing diagnosis in hot steel rolling mill lines}, year={2021}, volume={}, number={}, pages={1-6}, doi={10.1109/IECON48115.2021.9589440}} </code></pre></div></div> A SIAT3HE model of the COVID-19 pandemic in Bergamo, Italy 2021-09-22T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2021/09/22/IFAC_BMS_SIATHE <h3 id="abstract">Abstract</h3> <p>The aim of this article is to give a better understanding of the dynamics of the SARS-CoV-2 pandemic in the Bergamo province (Italy), one of the hardest-hit areas of the world, between February and April 2020. A new compartmental model, called SIAT3HE, was designed and fitted on accurate data about the pandemic provided by ATS Bergamo, the health protection agency of the Bergamo province. Our results show that SARS-CoV-2 reached Bergamo in January and infected 318,000 people, 28.8% of the province population. Of the infected individuals, 43.1% remained asymptomatic. As 6,028 people had died due to COVID-19 by April 30th, the infection fatality ratio of SARS-CoV-2 in the Bergamo province was 1.9%. These results are in very good agreement with available information: the number of infections is consistent with the results of recent serological surveys and the number of deaths due to COVID-19 is close to the excess mortality of the considered period. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896321016694">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Polver, F. Previdi, M. Mazzoleni, A. 
Zucchi, "A SIAT3HE model of the COVID-19 pandemic in Bergamo, Italy", IFAC-PapersOnLine, Volume 54, Issue 15, 2021, pp. 263-268, ISSN 2405-8963, <a href="https://doi.org/10.1016/j.ifacol.2021.10.266"> doi: 10.1016/j.ifacol.2021.10.266 </a> </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{POLVER2021263, title = {A SIAT3HE model of the COVID-19 pandemic in Bergamo, Italy}, journal = {IFAC-PapersOnLine}, volume = {54}, number = {15}, pages = {263-268}, year = {2021}, note = {11th IFAC Symposium on Biological and Medical Systems BMS 2021}, issn = {2405-8963}, doi = {https://doi.org/10.1016/j.ifacol.2021.10.266}, author = {Marco Polver and Fabio Previdi and Mirko Mazzoleni and Alberto Zucchi}, } </code></pre></div></div> Inertial load classification of low-cost electro-mechanical systems under dataset shift with fast end of line testing 2021-08-31T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2021/08/31/EAAI_dataset_shift <h3 id="abstract">Abstract</h3> <p>This paper presents a rationale for designing a machine learning algorithm under dataset shift. In particular, we focus on the classification of the inertial load of low-cost Electro-Mechanical Actuators (EMAs) into several weight categories. In these low-cost settings, due to uncertainties in the manufacturing process, raw materials and usage, even if the EMA part number is the same, its serial numbers (i.e. items or exemplars) may show different physical behaviors. Thus, a learning model trained on data from a set of items can perform poorly when applied to other ones. The proposed solution comprises tailored normalization and cross validation procedures for training the classifier, along with suitable End Of Line (EOL) experiments for the characterization of a new produced EMA item. 
The approach is experimentally validated on the classification of the mass of sliding gates, using only measurements available on the gate EMA. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S0952197621002943?dgcid=coauthor">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> N. Valceschini, M. Mazzoleni and F. Previdi, "Inertial load classification of low-cost electro-mechanical systems under dataset shift with fast end of line testing," in <strong>Engineering Applications of Artificial Intelligence</strong>, Elsevier, <a href="https://doi.org/10.1016/j.engappai.2021.104446"> doi: 10.1016/j.engappai.2021.104446</a>, ISSN: 0952-1976, vol. 105, 2021. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{VALCESCHINI2021104446, title = {Inertial load classification of low-cost electro-mechanical systems under dataset shift with fast end of line testing}, journal = {Engineering Applications of Artificial Intelligence}, volume = {105}, pages = {104446}, year = {2021}, issn = {0952-1976}, author = {Nicholas Valceschini and Mirko Mazzoleni and Fabio Previdi}, doi = {https://doi.org/10.1016/j.engappai.2021.104446}, url = {https://www.sciencedirect.com/science/article/pii/S0952197621002943}, } </code></pre></div></div> Piecewise nonlinear regression with data augmentation 2021-07-13T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2021/07/13/IFAC_SYSID_pwnl <h3 id="abstract">Abstract</h3> <p>Piecewise regression represents a powerful tool to derive accurate yet modular models describing complex phenomena or physical systems. This paper presents an approach for learning PieceWise NonLinear (PWNL) functions in both a supervised and semi-supervised setting. 
We further equip the proposed technique with a method for the automatic generation of additional unsupervised data, which are leveraged to improve the overall accuracy of the estimate. The performance of the proposed approach is preliminarily assessed on two simple simulation examples, where we show the benefits of using nonlinear local models and artificially generated unsupervised data. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896321011708?via%3Dihub">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, V. Breschi, S. Formentin, "Piecewise nonlinear regression with data augmentation", in <strong>19th IFAC Symposium on System Identification</strong> IFAC-PapersOnLine, vol. 54, Issue 7, 2021, pp. 421-426, ISSN 2405-8963, <a href="https://doi.org/10.1016/j.ifacol.2021.08.396"> doi: 10.1016/j.ifacol.2021.08.396.</a> </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2021421, title = {Piecewise nonlinear regression with data augmentation}, journal = {IFAC-PapersOnLine}, volume = {54}, number = {7}, pages = {421-426}, year = {2021}, note = {19th IFAC Symposium on System Identification SYSID 2021}, issn = {2405-8963}, doi = {https://doi.org/10.1016/j.ifacol.2021.08.396} } </code></pre></div></div> Modeling and simulation of bimetallic strips in industrial circuit breakers 2021-07-13T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2021/07/13/IFAC_SYSID_bimetalli <h3 id="abstract">Abstract</h3> <p>This paper presents a dynamical model of the bimetallic strip in industrial circuit breakers. The strip acts as a thermo-mechanical actuator that opens the circuit breaker in case of overloads. The overall model can be decomposed into two submodels: an electrothermal and a thermo-mechanical one. 
The first submodel is derived as a gray-box model, while the second one is derived as a black-box model. Given the overall estimated model, the final aim is to determine appropriate calibration actions on the device prior to its delivery. The developed model is tested on experimental data from real industrial circuit breakers. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896321012350">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, L. Maurelli, F. Previdi, "Modeling and simulation of bimetallic strips in industrial circuit breakers", in <strong>19th IFAC Symposium on System Identification</strong> IFAC-PapersOnLine, vol. 54, Issue 7, 2021, pp. 803-808, ISSN 2405-8963, <a href="https://doi.org/10.1016/j.ifacol.2021.08.460"> doi: 10.1016/j.ifacol.2021.08.460</a>. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAURELLI2021803, title = {Modeling and simulation of bimetallic strips in industrial circuit breakers}, journal = {IFAC-PapersOnLine}, volume = {54}, number = {7}, pages = {803-808}, year = {2021}, note = {19th IFAC Symposium on System Identification SYSID 2021}, issn = {2405-8963}, doi = {https://doi.org/10.1016/j.ifacol.2021.08.460} } </code></pre></div></div> Electro-Mechanical Actuators for the More Electric Aircraft 2021-01-30T00:00:00+00:00 http://mirkomazzoleni.github.io/book/2021/01/30/SPRINGER_FDI_Aero_book <p><img src="/images/2021-01-30-SPRINGER_FDI_Aero_book/FDI_aero_book_cover_full.jpg" style="width: 250px;" class="center_img" /></p> <h3 id="outline-of-the-book">Outline of the book</h3> <ol> <li>Introduction</li> <li>Reliability and Safety of Electro-Mechanical Actuators for Aircraft Applications</li> <li>Fault Diagnosis and Condition Monitoring Approaches</li> <li>Fault Diagnosis and Condition Monitoring of Aircraft Electro-Mechanical Actuators</li> <li>Concluding Remarks</li> </ol> 
<h3 id="abstract">Abstract</h3> <p>This book presents recent results on fault diagnosis and condition monitoring of airborne electromechanical actuators, illustrating both algorithmic and hardware design solutions to enhance the reliability of onboard more electric aircraft.</p> <p>The book begins with an introduction to the current trends in the development of electrically powered actuation systems for aerospace applications. Practical examples are proposed to help present approaches to reliability, availability, maintainability and safety analysis of airborne equipment. The terminology and main strategies for fault diagnosis and condition monitoring are then reviewed. The core of the book focuses on the presentation of relevant case studies of fault diagnosis and monitoring design for airborne electromechanical actuators, using different techniques. The last part of the book is devoted to a summary of lessons learned and practical suggestions for the design of fault diagnosis solutions of complex airborne systems.</p> <p>The book is written with the idea of providing practical guidelines on the development of fault diagnosis and monitoring algorithms for airborne electromechanical actuators. It will be of interest to practitioners in aerospace, mechanical, electronic, reliability and systems engineering, as well as researchers and postgraduates interested in dynamical systems, automatic control and safety-critical systems. [<strong><a href="https://www.springer.com/it/book/9783030617981">Link</a></strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, G. Di Rito and F. Previdi, "Electro-Mechanical Actuators for the More Electric Aircraft," in <strong>Advances in Industrial Control</strong>, Springer International Publishing, <a href="https://doi.org/10.1007/978-3-030-61799-8"> doi: 10.1007/978-3-030-61799-8</a>, ISBN: 978-3-030-61799-8. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@book{doi:10.1007/978-3-030-61799-8, author = {Mirko Mazzoleni and Gianpietro Di Rito and Fabio Previdi }, title = {Electro-Mechanical Actuators for the More Electric Aircraft}, isbn = {978-3-030-61799-8}, year = {2021}, publisher = {Springer International Publishing}, doi = {10.1007/978-3-030-61799-8}, } </code></pre></div></div> Kernel-based identification of asymptotically stable continuous-time linear dynamical systems 2021-01-29T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2021/01/29/IJC_kernel_continuous <h3 id="abstract">Abstract</h3> <p>In many engineering applications, continuous-time models are preferred to discrete-time ones, in that they provide good physical insight and can be derived also from non-uniformly sampled data. However, for such models, model selection is a hard task if no prior physical knowledge is given. In this paper, we propose a non-parametric approach to infer a continuous-time linear model from data, by automatically selecting a proper structure of the transfer function and guaranteeing to preserve the system stability properties. By means of benchmark simulation examples, the proposed approach is shown to outperform state-of-the-art continuous-time methods, also in the critical case when short sequences of canonical input signals, like impulses or steps, are used for model learning. [<strong><a href="https://www.tandfonline.com/doi/abs/10.1080/00207179.2020.1868580">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Scandella, M. Mazzoleni, S. Formentin and F. Previdi, "Kernel-based identification of asymptotically stable continuous-time linear dynamical systems," in <strong>International Journal of Control</strong>, vol. 0, no. 0, pp. 1-14, Feb. 
2021, <a href="https://doi.org/10.1080/00207179.2020.1868580"> doi: 10.1080/00207179.2020.1868580</a>. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> @article{doi:10.1080/00207179.2020.1868580, author = { Matteo Scandella and Mirko Mazzoleni and Simone Formentin and Fabio Previdi }, title = {Kernel-based identification of asymptotically stable continuous-time linear dynamical systems}, journal = {International Journal of Control}, volume = {0}, number = {0}, pages = {1-14}, year = {2021}, publisher = {Taylor &amp; Francis}, doi = {10.1080/00207179.2020.1868580}, } </code></pre></div></div> Mechatronics applications of condition monitoring using a statistical change detection method 2020-07-20T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2020/07/20/IFAC_WC_MechApplicationsCD <h3 id="abstract">Abstract</h3> <p>In this paper, we propose the use of a change detection strategy to perform condition monitoring of mechanical components. The method looks for statistical changes in the distribution of features extracted from raw measurements, such as Root Mean Square or Crest Factor indicators. The proposed method works in a batch fashion, comparing data from one experiment to another. When these distributions differ by a specified amount, a degradation score is increased. The approach is tested on three experimental applications: (i) an ElectroMechanical Actuator (EMA) employed in flight applications, where the focus of the monitoring is on the ballscrew transmission; (ii) a CNC workbench, where the focus is on the vertical shaft bearing, (iii) an industrial EMA with focus on the ballscrew bearing. All components have undergone a severe experimental degradation process, that ultimately led to their failure. Results show how the proposed method is able to assess component degradation prior to their failure. 
[<strong><a href="https://reader.elsevier.com/reader/sd/pii/S2405896320303566?token=0F6A95C9CA048B82A62C77B1E7F8C69A8701EA4B6AF5C144FB475A75625AE0EBC0BA440928C9E60F09C827E9D658115D&amp;originRegion=eu-west-1&amp;originCreation=20210420095257">Paper</a></strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, L. Maurelli, F. Previdi, "Mechatronics applications of condition monitoring using a statistical change detection method," in <strong>21st IFAC World Congress</strong>, vol. 53, no. 2, pp. 92-97, Jul. 2020, <a href="https://doi.org/10.1016/j.ifacol.2020.12.100"> doi: 10.1016/j.ifacol.2020.12.100 </a>, ISSN: 2405-8963. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI202092, title = {Mechatronics applications of condition monitoring using a statistical change detection method}, journal = {IFAC-PapersOnLine}, volume = {53}, number = {2}, pages = {92-97}, year = {2020}, note = {21th IFAC World Congress}, issn = {2405-8963}, doi = {10.1016/j.ifacol.2020.12.100}, author = {M. Mazzoleni and M. Scandella and L. Maurelli and F. Previdi}, keywords = {Predictive maintenance, condition monitoring, actuators, bearings}, } </code></pre></div></div> KBERG: A MatLab toolbox for nonlinear kernel-based regularization and system identification 2020-07-20T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2020/07/20/IFAC_WC_KBERG <h3 id="abstract">Abstract</h3> <p>We present KBERG, a MatLab package for nonlinear Kernel-BasEd ReGularization and system identification. The toolbox provides a complete environment for running experiments on simulated and experimental data from both static and dynamical systems. The whole identification procedure is supported: (i) data generation, (ii) excitation signals design; (iii) kernel-based estimation and (iv) evaluation of the results. 
One of the main differences of the proposed package with respect to existing frameworks lies in the possibility to separately define experiments, algorithms and test, then combining them as desired by the user. Once these three quantities are defined, the user can simply run all the computations with only a command, waiting for results to be analyzed. As additional noticeable feature, the toolbox fully supports the manifold regularization rationale, in addition to the standard Tikhonov one, and the possibility to compute different (but equivalent) types of solutions other than the standard one. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896320317468/pdf?md5=2722c8e77cc7207348670482c22831ba&amp;pid=1-s2.0-S2405896320317468-main.pdf">Paper</a></strong>, <strong><a href="https://cal.unibg.it/wp-content/uploads/papers/20191104-KBERG.7z">Code</a></strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, F. Previdi, "KBERG: A MatLab toolbox for nonlinear kernel-based regularization and system identification," in <strong>21st IFAC World Congress</strong>, vol. 53, no. 2, pp. 1231-1236, Jul. 2020, <a href="https://doi.org/10.1016/j.ifacol.2020.12.1340"> doi: 10.1016/j.ifacol.2020.12.1340 </a>, ISSN: 2405-8963. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI20201231, title = {KBERG: A MatLab toolbox for nonlinear kernel-based regularization and system identification}, journal = {IFAC-PapersOnLine}, volume = {53}, number = {2}, pages = {1231-1236}, year = {2020}, note = {21th IFAC World Congress}, issn = {2405-8963}, doi = {10.1016/j.ifacol.2020.12.1340}, author = {M. Mazzoleni and M. Scandella and F. 
Previdi}, keywords = {Kernel methods, System Identification} } </code></pre></div></div> Identification of dynamic textures using Dynamic Mode Decomposition 2020-07-20T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2020/07/20/IFAC_WC_DynText <h3 id="abstract">Abstract</h3> <p>Dynamic Textures (DTs) are image sequences of moving scenes that present stationary properties in time. In this paper, we apply Dynamic Mode Decomposition (DMD) and Dynamic Mode Decomposition with Control (DMDc) to identify a parametric model of dynamic textures. The identification results are compared with a benchmark method from the dynamic texture literature, both from a mathematical and from a computational complexity point of view. Extensive simulations are carried out to assess the performance of the proposed algorithms with regards to synthesis and denoising purposes, with different types of dynamic textures. Results show that DMD and DMDc present lower error, lower residual noise and lower variance compared to the benchmark approach. [<strong><a href="https://reader.elsevier.com/reader/sd/pii/S2405896320302974?token=AECBEF76D0D212C90F3B020A66367234D9686759348BEEDCAA1211EEA171E8C9D26681B11602D61599727328A8723F04&amp;originRegion=eu-west-1&amp;originCreation=20210420094658">Paper</a></strong>, <strong><a href="https://drive.google.com/file/d/1EmNG39q_EfrSVCJ9scKkCrwvf3eGONNK/view">Resources</a></strong>]</p> <h4 id="reference">Reference</h4> <blockquote> D. Previtali, N. Valceschini, M. Mazzoleni, F. Previdi, "Identification of dynamic textures using Dynamic Mode Decomposition," in <strong>21st IFAC World Congress</strong>, vol. 53, no. 2, pp. 2423-2428, Jul. 2020, <a href="https://doi.org/10.1016/j.ifacol.2020.12.045"> doi: 10.1016/j.ifacol.2020.12.045 </a>, ISSN: 2405-8963. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{PREVITALI20202423, title = {Identification of dynamic textures using Dynamic Mode Decomposition}, journal = {IFAC-PapersOnLine}, volume = {53}, number = {2}, pages = {2423-2428}, year = {2020}, note = {21th IFAC World Congress}, issn = {2405-8963}, doi = {10.1016/j.ifacol.2020.12.045}, author = {D. Previtali and N. Valceschini and M. Mazzoleni and F. Previdi}, keywords = {Dynamic textures, System Identification, Texture Synthesis, Dynamic Mode Decomposition}, } </code></pre></div></div> A Note on the Numerical Solutions of Kernel-Based Learning Problems 2020-04-23T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2020/04/23/IEEE_TAC_kernel_note <h3 id="abstract">Abstract</h3> <p>In the last decade, kernel-based learning approaches typically employed for classification and regression have shown outstanding performance also in dynamic system identification. The typical way to compute the solution of this learning problem subsumes the inversion of the kernel matrix. However, due to limited machine precision, this might not be possible in many practical applications. In this article, we analyze the aforementioned problem and show that the typical estimate is just one of the possible infinite solutions that can be leveraged, considering both the supervised and the semisupervised settings. We show under which conditions the infinite solutions are equivalent, and if it is not the case, we provide a bound on the mismatch between two generic solutions. Then, we propose two specific solutions that are particularly suited to boost sparsity or performance. [<strong><a href="https://ieeexplore.ieee.org/document/9076872">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Scandella, M. Mazzoleni, S. Formentin and F. 
Previdi, "A Note on the Numerical Solutions of Kernel-Based Learning Problems," in <strong>IEEE Transactions on Automatic Control</strong>, vol. 66, no. 2, pp. 940-947, Feb. 2021, <a href="https://doi.org/10.1109/TAC.2020.2989769"> doi: 10.1109/TAC.2020.2989769 </a>, ISSN: 1558-2523. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{doi:10.1109/TAC.2020.2989769, author={M. {Scandella} and M. {Mazzoleni} and S. {Formentin} and F. {Previdi}}, journal={IEEE Transactions on Automatic Control}, title={A Note on the Numerical Solutions of Kernel-Based Learning Problems}, year={2021}, volume={66}, number={2}, pages={940-947}, doi={10.1109/TAC.2020.2989769} } </code></pre></div></div> Data on the first endurance activity of a Brushless DC motor for aerospace applications 2020-02-01T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2020/02/01/DATAINBRIEF_Reprise <h3 id="abstract">Abstract</h3> <p>This article describes the data acquired during the first test activity carried out in the Reliable Electromechanical actuator for PRImary SurfacE with health monitoring (REPRISE) H2020 project. The data consist of a set of measures from an Electro-Mechanical Actuator (EMA) employed in small aircraft, such as phase currents, positions, temperature and loads. A test bench was developed to perform endurance sessions under various loads and working conditions. Specifically, two datasets are provided: (i) measurements used to monitor the EMA degradation through time; (ii) measurements that characterize the EMA closed-loop dynamic behaviour in healthy condition. The data are helpful to develop and test system identification methods and condition monitoring approaches. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2352340920300470">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, F. Previdi, M. 
Scandella and G. Pispola, "Data on the first endurance activity of a Brushless DC motor for aerospace applications", <strong>Elsevier Data in Brief</strong>, 2020, <a href="https://doi.org/10.1016/j.dib.2020.105153"> doi: 10.1016/j.dib.2020.105153 </a>, ISSN: 2352-3409, vol. 29. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2020105153, title = "Data on the first endurance activity of a Brushless DC motor for aerospace applications", journal = "Data in Brief", volume = "29", pages = "105153", year = "2020", issn = "2352-3409", author = "Mirko Mazzoleni and Matteo Scandella and Fabio Previdi and Giulio Pispola", doi = "10.1016/j.dib.2020.105153" } </code></pre></div></div> A comparison of manifold regularization approaches for kernel-based system identification 2019-12-04T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2019/12/04/IFAC_ALCOS_Comparison_manifold <h3 id="abstract">Abstract</h3> <p>In this paper, we present a simulation study to investigate the role of manifold regularization in kernel-based approaches for nonparametric nonlinear SISO (Single-Input Single-Output) system identification. This problem is tackled as the estimation of a static nonlinear function that maps regressors (that contain past values of both input and output of the dynamic system) to the system outputs. Manifold regularization, as opposed to the Tikhonov one, enforces a local smoothing constraint on the estimated function. It is based on the assumption that the regressors lie on a manifold in the regressor space. This manifold is usually approximated with a weighted graph that connects the regressors. The present work analyzes the performance of kernel-based estimates when different choices are made for the graph connections and their respective weights. 
The approach is tested on benchmark nonlinear system models, for different connection and weight strategies. Results give an intuition about the most promising choices in order to adopt manifold regularization for system identification. [<strong><a href="https://www.sciencedirect.com/science/article/pii/S2405896319325868">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, F. Previdi, "A comparison of manifold regularization approaches for kernel-based system identification", <strong>IFAC Adaptive and Learning Control Systems (ALCOS) conference</strong>, 2019. <a href="https://doi.org/10.1016/j.ifacol.2019.12.641">, doi: 10.1016/j.ifacol.2019.12.641 </a> ISSN: 2405-8963, IFAC-PapersOnLine, vol. 52, issue 29, pp. 180-185. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2019180, title = "A comparison of manifold regularization approaches for kernel-based system identification", journal = "IFAC-PapersOnLine", volume = "52", number = "29", pages = "180 - 185", year = "2019", note = "13th IFAC Workshop on Adaptive and Learning Control Systems ALCOS 2019", issn = "2405-8963", doi = "https://doi.org/10.1016/j.ifacol.2019.12.641", url = "http://www.sciencedirect.com/science/article/pii/S2405896319325868", author = "M. Mazzoleni and M. Scandella and F. Previdi", } </code></pre></div></div> Experimental Development of a Health Monitoring Method for Electro-Mechanical Actuators of Flight Control Primary Surfaces in More Electric Aircrafts 2019-11-04T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2019/11/04/IEEE_Access_Reprise <h3 id="abstract">Abstract</h3> <p>This paper presents a health monitoring approach for Electro-Mechanical Actuators (EMA). We define four different indicators to continuously evaluate the health state of the system. 
The four indicators are computed by leveraging the output from a Statistical Process Monitoring (SPM) method based on multivariate statistics, such as the Hotelling’s $$T^{2}$$ statistic and the $$Q$$ statistic. SPM approaches give a dichotomous answer, i.e. the presence/absence of a fault. In this work, we propose four ways to compute a continuous indicator starting from the discrete SPM output, that is better suited for health monitoring. We test the approach using a dataset collected from a large experimental campaign on a 1:1 scale EMA for primary flight controls of small aircrafts, that led to EMA failure. Results show the effectiveness of the method. [<strong><a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=8878102">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, F. Previdi, M. Scandella and G. Pispola, "Experimental Development of a Health Monitoring Method for Electro-Mechanical Actuators of Flight Control Primary Surfaces in More Electric Aircrafts", <strong>IEEE Access</strong>, 2019. <a href="https://doi.org/10.1109/ACCESS.2019.2948781">, doi: 10.1109/ACCESS.2019.2948781 </a>, ISSN: 2169-3536, vol. 7, pp. 153618-153634. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{8878102, author={M. {Mazzoleni} and F. {Previdi} and M. {Scandella} and G. 
{Pispola}}, journal={IEEE Access}, title={Experimental Development of a Health Monitoring Method for Electro-Mechanical Actuators of Flight Control Primary Surfaces in More Electric Aircrafts}, year={2019}, volume={7}, number={}, pages={153618-153634}, keywords={Actuators;Monitoring;Aircraft;Fault diagnosis;Aerospace control;Fault detection;Safety;Actuators;aerospace components;aerospace safety;condition monitoring;electromechanical systems;fault detection;predictive maintenance;statistical process monitoring}, doi={10.1109/ACCESS.2019.2948781}, ISSN={}, month={},} </code></pre></div></div> Nonlinear system identification via data augmentation 2019-06-19T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2019/06/19/SCL_DataAugmentedSysid <h3 id="abstract">Abstract</h3> <p>This paper presents a novel nonparametric approach to the identification of nonlinear dynamical systems. The proposed methodology exploits the potential of manifold learning on an artificially augmented dataset, obtained without running new experiments on the plant. The additional data are employed for approximating the manifold where input regressors lie. The knowledge of the manifold acts as prior information on the system, which induces a proper regularization term on the identification cost. The new regularization term, as opposed to the standard Tikhonov one, enforces local smoothness of the function along the manifold. A graph-based algorithm tailored to dynamical systems is proposed to generate the augmented dataset. The hyperparameters of the method, along with the order of the system, are estimated from the available data. Numerical results on a benchmark Nonlinear Finite Impulse Response (NFIR) system show that the proposed approach may outperform state-of-the-art nonparametric methods. 
[<strong><a href="">Paper</a></strong>, <strong><a href="https://www.sciencedirect.com/science/article/pii/S0167691119300532">ScienceDirect</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> S. Formentin, M. Mazzoleni, M. Scandella and F. Previdi, "Nonlinear system identification via data augmentation", <strong>Systems &amp; Control Letters</strong>, 2019, <a href="https://doi.org/10.1016/j.sysconle.2019.04.004"> doi: 10.1016/j.sysconle.2019.04.004 </a>, ISSN: 0167-6911, pp. 56-63. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{FORMENTIN201956, title = "Nonlinear system identification via data augmentation", journal = "Systems &amp; Control Letters", volume = "128", pages = "56 - 63", year = "2019", issn = "0167-6911", doi = "https://doi.org/10.1016/j.sysconle.2019.04.004", url = "http://www.sciencedirect.com/science/article/pii/S0167691119300532", author = "Simone Formentin and Mirko Mazzoleni and Matteo Scandella and Fabio Previdi", keywords = "System identification, Semi-supervised learning" } </code></pre></div></div> Classification of light charged particles via learning-based system identification 2018-12-14T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/12/14/IEEE_CDC_Particles <h3 id="abstract">Abstract</h3> <p>This paper presents a nonparametric learning approach for the automatic classification of particles produced by the collision of a heavy ion beam on a target, by focusing on the identification of isotopes of the most energetic light charged particles (LCP). In particular, it is shown that the measurement of the particle collision can be traced back to the impulse response of a linear dynamical system and, by employing recent kernel-based approaches, a nonparametric model is found that effectively trades off bias and variance of the model estimate. 
Then, the smoothed signals can be employed to classify the different types of particles. Experimental results show that the proposed method outperforms the state-of-the-art approaches. All the experiments are carried out with the large detector array CHIMERA (Charge Heavy Ions Mass and Energy Resolving Array) in Catania, Italy. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2018-IEEE-CDC-Classification-of-light-charged-particles-via-learning-based-system-identification_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, S. Formentin and F. Previdi, "Classification of light charged particles via learning-based system identification", <strong> 57th IEEE Conference on Decision and Control (CDC) </strong>, Miami Beach, Florida, USA, 2018. <a href="https://doi.org/10.1109/CDC.2018.8618946"> doi: 10.1109/CDC.2018.8618946 </a>, ISBN: 978-1-5386-1395-5, pp. 6053-6058. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{8618946, author={M. {Mazzoleni} and M. {Scandella} and S. {Formentin} and F. 
{Previdi}}, booktitle={2018 IEEE Conference on Decision and Control (CDC)}, title={Classification of Light Charged Particles Via Learning-Based System Identification}, year={2018}, volume={}, number={}, pages={6053-6058}, keywords={Kernel;Detectors;Atmospheric measurements;Particle measurements;Ions;Covariance matrices;Atomic measurements}, doi={10.1109/CDC.2018.8618946}, ISSN={2576-2370}, month={Dec},} </code></pre></div></div> Condition monitoring of electro-mechanical actuators for aerospace using batch change detection algorithms 2018-08-24T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/08/24/IEEE_CCTA_Reprise <h3 id="abstract">Abstract</h3> <p>This paper proposes the use of a change detection algorithm to monitor the degradation of mechanical components of Electro-Mechanical Actuators (EMA) employed in the aerospace industry. Contrary to the standard on-line application of change detection methods, the presented approach can be applied in a batch mode, leveraging the knowledge of when the data were collected. The methodology is applied to data measured during an endurance test campaign on a real EMA employed in aerospace, by means of a purpose-built test bench that progressively brings the EMA to failure. Three rationales for building an indicator of degradation are tested. Results show that the method is able to assess the degradation of the actuator over time, constituting a first step towards a condition monitoring solution for the more-electric-aircraft of the future. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2018-IEEE-CCTA-Condition-monitoring-of-electro-mechanical-actuators-for-aerospace-using-batch-change-detection-algorithms_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, Y. Maccarana, F. Previdi, G. Pispola, N. 
Porzi, "Condition monitoring of electro-mechanical actuators for aerospace using batch change detection algorithms", <strong> 2nd IEEE Conference on Control Technology and Applications (CCTA) </strong>, Copenhagen, Denmark, 2018. <a href="https://doi.org/10.1109/CCTA.2018.8511334"> doi:10.1109/CCTA.2018.8511334 </a>, ISBN: 978-1-5386-7698-1, pp. 1747-1752. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{8511334, author={M. Mazzoleni and M. Scandella and Y. Maccarana and F. Previdi and G. Pispola and N. Porzi}, booktitle={2018 IEEE Conference on Control Technology and Applications (CCTA)}, title={Condition Monitoring of Electro-Mechanical Actuators for Aerospace Using Batch Change Detection Algorithms}, year={2018}, volume={}, number={}, pages={1747-1752}, keywords={aerospace industry;condition monitoring;electromechanical actuators;batch change detection algorithms;mechanical components;aerospace industry;test bench;condition monitoring;electro-mechanical actuators;leveraging;Degradation;Actuators;Monitoring;Condition monitoring;Estimation;Aerospace industry;Standards}, doi={10.1109/CCTA.2018.8511334}, ISSN={}, month={Aug},} </code></pre></div></div> Identification of nonlinear dynamical system with synthetic data: a preliminary investigation 2018-07-10T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/07/10/IFAC_SYSID_S3I <h3 id="abstract">Abstract</h3> <p>This paper introduces a new rationale for learning nonlinear dynamical systems. The method makes use of an additional identification dataset, obtained without performing a new experiment on the system under study. The data are generated automatically, starting from a set of experimentally acquired measurements. 
In order to leverage the additional generated information, fundamental techniques from the machine learning field known as Semi-Supervised Learning (SSL) are employed and adapted. The problem is then cast as a regularized parametric learning problem. The effectiveness of the proposed approach is assessed on various nonlinear benchmark systems via repeated simulations, comparing the obtained results with a standard regularization method for learning parametric models. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2018/10/2018-IFAC-SYSID-Identification-of-nonlinear-dynamical-system-with-synthetic-data-a-preliminary-investigation.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, S. Formentin, F. Previdi, "Identification of nonlinear dynamical system with synthetic data: a preliminary investigation", <strong> 18th IFAC Symposium on System Identification (SYSID)</strong>, Stockholm, Sweden, 2018, <a href="https://doi.org/10.1016/j.ifacol.2018.09.227"> doi: 10.1016/j.ifacol.2018.09.227 </a>, ISSN: 2405-8963, pp. 622 - 627. <a href=""> </a> </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2018622, title = "Identification of nonlinear dynamical system with synthetic data: a preliminary investigation", journal = "IFAC-PapersOnLine", volume = "51", number = "15", pages = "622 - 627", year = "2018", note = "18th IFAC Symposium on System Identification SYSID 2018", issn = "2405-8963", doi = "https://doi.org/10.1016/j.ifacol.2018.09.227", author = "M. Mazzoleni and M. Scandella and S. Formentin and F. 
Previdi", keywords = "System Identification, Semi-Supervised Learning, Regularization" } </code></pre></div></div> Condition assessment of electro-mechanical actuators for aerospace using relative density-ratio estimation 2018-06-21T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/06/21/IFAC_SYSID_Reprise <h3 id="abstract">Abstract</h3> <p>This paper faces the problem of developing an effective Condition Monitoring algorithm (CM) for Electro-Mechanical Actuators (EMA) in aerospace applications. In this view, a test campaign has been carried out in order to progressively bring the EMA near to failure, by means of a test bench suitably developed. Various indicators have been computed from measured data, for a set of the EMA’s working regimes. The statistical distribution of the computed features is assessed and tracked over time. We propose an online statistical approach, based on density estimation techniques, in order to detect potential changes in the data distribution. The discovered changes are then interpreted as a modification of the EMA’s health state, leading to a first building block for a complete condition assessment strategy. [<strong><a href="https://cal.unibg.it/wp-content/uploads/papers/2018-IFAC-SYSID-Condition-assessment-of-electro-mechanical-actuators-for-aerospace-using-relative-density-ratio-estimation.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, M. Scandella, Y. Maccarana, F. Previdi, G. Pispola, N. Porzi, "Condition assessment of electro-mechanical actuators for aerospace using relative density-ratio estimation", <strong> 18th IFAC Symposium on System Identification (SYSID)</strong>, Stockholm, Sweden, 2018, <a href="https://doi.org/10.1016/j.ifacol.2018.09.070"> doi: 10.1016/j.ifacol.2018.09.070 </a>, ISSN: 2405-8963, pp. 957 - 962. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2018957, title = "Condition assessment of electro-mechanical actuators for aerospace using relative density-ratio estimation", journal = "IFAC-PapersOnLine", volume = "51", number = "15", pages = "957 - 962", year = "2018", note = "18th IFAC Symposium on System Identification SYSID 2018", issn = "2405-8963", doi = "https://doi.org/10.1016/j.ifacol.2018.09.070", author = "M. Mazzoleni and M. Scandella and Y. Maccarana and F. Previdi and G. Pispola and N. Porzi", keywords = "Condition monitoring, Change-point detection, Kernel methods, Time-series" } </code></pre></div></div> Development and Experimental Testing of a Health Monitoring System of Electro-Mechanical Actuators for Small Airplanes 2018-06-21T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/06/21/IEEE_MED_Reprise <h3 id="abstract">Abstract</h3> <p>This paper reports the preliminary results of the REPRISE (Reliable Electromechanical actuator for PRImary SurfacE with health monitoring) project, which aims to design a novel Electro-Mechanical Actuator (EMA) to be used on primary flight surfaces of small aircrafts. An important element of the actuator control system is a Health Monitoring (HM) module. This component is an algorithm able to detect anomalies in the device even if there is no evident loss of ability in pursuing its main function (position tracking). In particular, the project aim is to identify any degradation in the mechanical transmission elements, the ballscrew and other components such as bearings. Moreover, it is strongly advisable that the HM algorithm is based on a feature whose value can be easily computed and monitored during the actuator life. 
In this work, a large experimental activity has been carried out with the purpose of bringing the actuator close to failure, by progressive fault injection in overload operating conditions. A feature named $$\Sigma$$, that is, the mean of the RMS of the three phase currents (the input to the electric motor), is proposed as a parameter for HM. The effectiveness of this parameter in detecting the mechanical transmission degradation is experimentally tested. The degradation has been confirmed by visual inspection and screw thread profile measurements. In spite of this, the actuator is still able to perform position tracking in an effective way. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2018-IEEE-MED-Development-and-Experimental-Testing-of-a-Health-Monitoring-System_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> F. Previdi, Y. Maccarana, M. Mazzoleni, M. Scandella, G. Pispola and N. Porzi, "Development and Experimental Testing of a Health Monitoring System of Electro-Mechanical Actuators for Small Airplanes," <strong> 26th Mediterranean Conference on Control and Automation (MED) </strong>, Zadar, Croatia, 2018, <a href="https://doi.org/10.1109/MED.2018.8442734"> doi:10.1109/MED.2018.8442734</a>, ISBN: 978-1-5386-7890-9, ISSN: 2473-3504, pp. 673-678. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{8442734, author={F. Previdi and Y. Maccarana and M. Mazzoleni and M. Scandella and G. Pispola and N. 
Porzi}, booktitle={2018 26th Mediterranean Conference on Control and Automation (MED)}, title={Development and Experimental Testing of a Health Monitoring System of Electro-Mechanical Actuators for Small Airplanes}, year={2018}, volume={}, number={}, pages={673-678}, keywords={Actuators;Monitoring;Degradation;Performance evaluation;Current measurement;Aircraft;Fasteners}, doi={10.1109/MED.2018.8442734}, ISSN={2473-3504}, ISBN = {978-1-5386-7890-9}, month={June},} </code></pre></div></div> Semi-supervised learning of dynamical systems: a preliminary study 2018-05-01T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/05/01/ECC_S3I <h3 id="abstract">Abstract</h3> <p>System identification has, in recent years, drawn inspiration from techniques and concepts of the statistical learning research area. Examples include the wide adoption of regularization and kernel methods, in order to better condition the identification problem. Pursuing the same purpose, we introduce the concept of semi-supervised learning to tackle the system identification challenge. The problem, cast into the framework of the Reproducing Kernel Hilbert Spaces, leads to a new regularization technique, called manifold regularization. An application to the identification of an NFIR model is carried out, and a comparison with the standard Tikhonov regularization technique is shown. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2018-ECC-Semi-supervised-learning-of-dynamical-systems-a-preliminary-study_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, S. Formentin, M. Scandella, F. Previdi, "Semi-supervised learning of dynamical systems: a preliminary study", <strong> 16th European Control Conference (ECC) </strong>, Limassol, Cyprus, 2018. <a href="https://doi.org/10.23919/ECC.2018.8550550"> doi: 10.23919/ECC.2018.8550550 </a>, ISBN: 978-3-9524-2698-2, pp. 2824-2829. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{8550550, author={M. Mazzoleni and S. Formentin and M. Scandella and F. Previdi}, booktitle={2018 European Control Conference (ECC)}, title={Semi-supervised learning of dynamical systems: a preliminary study}, year={2018}, volume={}, number={}, pages={2824-2829}, keywords={Manifolds;Kernel;Semisupervised learning;Statistical learning;Symmetric matrices;Standards;Hilbert space}, doi={10.23919/ECC.2018.8550550}, ISSN={}, month={June},} </code></pre></div></div> Kernel manifold regression for the coupled electric drives dataset 2018-04-12T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2018/04/12/Nonlinear_Sysid_Workshop <h3 id="abstract">Abstract</h3> <p>The aim of this work is to introduce the concept of manifold regularization to the identification of dynamic systems. The method has been tested on the coupled electric drives problem, using a purely black-box approach in the framework of the Reproducing Kernel Hilbert Spaces (RKHS). [<strong><a href="http://cal.unibg.it/cal/papers/kernel-manifold-regression-for-the-coupled-electric-drives-dataset/">Presentation</a></strong>, <strong><a href="http://cal.unibg.it/cal/papers/kernel-manifold-regression-for-the-coupled-electric-drives-dataset/">Code</a></strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, G. Maroni, Y. Maccarana, S. Formentin and F. Previdi, "Kernel manifold regression for the coupled electric drives dataset", <strong>3rd Nonlinear System Identification Benchmarks Workshop</strong>, Liege, Belgium, 2018. 
</blockquote> <h4 id="bibtex">Bibtex</h4> <p>Please cite the following papers:</p> <ul> <li><a href="/conference/2018/05/01/ECC_S3I/">Semi-supervised learning of dynamical systems: a preliminary study</a></li> </ul> Unsupervised Learning of Fundamental Emotional States via Word Embeddings 2017-12-09T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2017/12/09/IEEE_SSCI_Sentiment <h3 id="abstract">Abstract</h3> <p>This paper presents a novel approach for the detection of emotional states from textual data. The considered sentiments are those known as Ekman’s basic emotions (Anger, Disgust, Sadness, Happiness, Fear, Surprise). The method is completely unsupervised and it is based on the concept of word embeddings. This technique represents a single word as a vector, giving a mathematical representation of the word’s semantics. The focus of the work is to assign percentages of the aforementioned emotions to short sentences. The method has been tested on a collection of Twitter messages and on the SemEval 2007 news headlines dataset. After preprocessing, the entire sentence is expressed as the mean of the vectors of the words that compose it. Finally, the sentence representation is compared with each emotion’s word vector, to find the emotion most representative of the sentence’s vector. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2018-IEEE-SSCI-Unsupervised-Learning-of-Fundamental-Emotional-states-via-word-embeddings_copyright.pdf">Paper</a> </strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, G. Maroni and F. Previdi, "Unsupervised learning of fundamental emotional states via word embeddings", <strong> 2017 IEEE Symposium Series on Computational Intelligence (SSCI) </strong>, Honolulu, Hawaii, USA, <a href="https://doi.org/10.1109/SSCI.2017.8280819"> doi: 10.1109/SSCI.2017.8280819 </a>, ISBN: 978-1-5386-2726-6. 
<a href=""> </a> </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{8280819, author={M. Mazzoleni and G. Maroni and F. Previdi}, booktitle={2017 IEEE Symposium Series on Computational Intelligence (SSCI)}, title={Unsupervised learning of fundamental emotional states via word embeddings}, year={2017}, volume={}, number={}, pages={1-6}, doi={10.1109/SSCI.2017.8280819}, ISBN={978-1-5386-2726-6}, } </code></pre></div></div> Control-oriented modeling of SKU-level demand in retail food market 2017-07-14T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2017/07/14/IFAC_WC_sku <h3 id="abstract">Abstract</h3> <p>In the food market, modeling the dynamics of Stock-Keeping Unit (SKU) requests is of fundamental importance, not only to understand the market but also for optimization and control purposes. In fact, based on model-based predictions of future demand, an efficient planning of the promotional calendar can be devised. Moreover, better inventory management can be achieved, by reducing losses due to expired food products left unsold and improving distribution operations. In this work, data-driven control-oriented modeling of such a demand is discussed and a novel switching dynamical strategy is proposed. When applied to experimental data from a real food company, the above strategy is shown to accurately predict future sales under fixed promotion events and outperform the state-of-the-art modeling methods. [<strong><a href="http://cal.unibg.it/cal/wp-content/uploads/papers/2017-IFAC-WC-retail-food-market.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni and S. Formentin and F. Previdi and S.M. 
Savaresi, "Control-oriented modeling of SKU-level demand in retail food market", <strong>20th IFAC World Congress </strong>, Toulouse, France, 2017, <a href="https://doi.org/10.1016/j.ifacol.2017.08.1951"> doi: 10.1016/j.ifacol.2017.08.1951 </a>, ISSN: 2405-8963, pp. 13003-13008. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI201713003, title = "Control-oriented modeling of SKU-level demand in retail food market", journal = "20th IFAC World Congress", volume = "50", number = "1", pages = "13003 - 13008", year = "2017", issn = "2405-8963", doi = "https://doi.org/10.1016/j.ifacol.2017.08.1951", author = "M. Mazzoleni and S. Formentin and F. Previdi and S.M. Savaresi", } </code></pre></div></div> A comparison of data-driven fault detection methods with application to aerospace electro-mechanical actuators 2017-07-14T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2017/07/14/IFAC_WC_modelFreeFD <h3 id="abstract">Abstract</h3> <p>In this paper, a model-free framework is proposed in order to equip electromechanical actuators, deployed in aerospace applications, with health-monitoring capabilities. A large experimental activity has been carried out to perform acquisitions with both healthy and faulty components, taking into consideration the standard regulations for environmental testing of avionics hardware. The injected faults followed a Fault Tree Analysis and Failure Mode and Effect Analysis. Features, belonging to different domains, have been extracted from the measured signals. These indexes are based largely on the motor driving currents, in order to avoid the installation of new sensors. Finally, a Gradient Tree Boosting algorithm has been chosen to detect the system status: the choice has been dictated by a comparison with other known classification algorithms. 
Furthermore, the most promising features from a classification point of view are reported. [<strong><a href="http://cal.unibg.it/cal/wp-content/uploads/papers/2017-IFAC-WC-Holmes-model-free.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni and Y. Maccarana and F. Previdi, "A comparison of data-driven fault detection methods with application to aerospace electro-mechanical actuators", <strong>20th IFAC World Congress </strong>, Toulouse, France, 2017, <a href="https://doi.org/10.1016/j.ifacol.2017.08.1837"> doi: 10.1016/j.ifacol.2017.08.1837 </a>, ISSN: 2405-8963, pp. 12797 - 12802. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI201712797, title = "A comparison of data-driven fault detection methods with application to aerospace electro-mechanical actuators", journal = "20th IFAC World Congress", volume = "50", number = "1", pages = "12797 - 12802", year = "2017", issn = "2405-8963", doi = "https://doi.org/10.1016/j.ifacol.2017.08.1837", author = "M. Mazzoleni and Y. Maccarana and F. Previdi", } </code></pre></div></div> Fault detection in airliner electro-mechanical actuators via hybrid particle filtering 2017-07-14T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2017/07/14/IFAC_WC_ModelBasedFD <h3 id="abstract">Abstract</h3> <p>In this paper, a modification of the standard particle filter algorithm is applied to the fault detection problem on an electro-mechanical actuator. The variant, based on a hybrid system interpretation of the health monitoring problem, is known as OTPF (Observation and Transition Particle Filter). By modeling each fault condition as a hybrid system mode, the method is able to assess the most likely regime for each time stamp. 
Following this approach, data were acquired from an electro-mechanical actuator, used in an aerospace environment, under various fault conditions. The injected mechanical defects consisted of damage to the steel spheres inside a ballscrew transmission system. Then, a model for each condition was identified and the proposed methodology applied. Simulation results show the superiority of the method with respect to the EKF (Extended Kalman Filter), especially because the distribution of the disturbances which affect the system is usually not Gaussian. [<strong><a href="http://cal.unibg.it/cal/wp-content/uploads/papers/2017-IFAC-WC-Holmes-particle-filter.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, G. Maroni, Y. Maccarana, S. Formentin and F. Previdi, "Fault detection in airliner electro-mechanical actuators via hybrid particle filtering", <strong>20th IFAC World Congress </strong>, Toulouse, France, 2017, <a href="https://doi.org/10.1016/j.ifacol.2017.08.640"> doi: 10.1016/j.ifacol.2017.08.640 </a>, ISSN: 2405-8963, pp. 2860-2865. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI20172860, title = "Fault detection in airliner electro-mechanical actuators via hybrid particle filtering", journal = "20th IFAC World Congress", volume = "50", number = "1", pages = "2860 - 2865", year = "2017", issn = "2405-8963", doi = "https://doi.org/10.1016/j.ifacol.2017.08.640", author = "M. Mazzoleni and G. Maroni and Y. Maccarana and S. Formentin and F. 
Previdi", } </code></pre></div></div> Development of a reliable electro-mechanical actuator for primary control surfaces in small aircrafts 2017-07-01T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2017/07/01/IEEE_AIM <h3 id="abstract">Abstract</h3> <p>This paper lays the foundation for the development of an innovative electro-mechanical actuator for flight-control surfaces. The main features of the enhanced system will be the introduction of new sensor types and health monitoring capabilities. A dedicated test bench has been developed in order to perform endurance tests, leading the mechanical components to failure. In this view, a Condition Monitoring (CM) algorithm is expected to assess the progressive faults degradation, estimating their progression and the Remaining Useful Life (RUL) of related subsystems. Based on the development of new hardware and software components, the REPRISE project is expected to deliver a significant contribution to the More Electric Aircraft mission. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2017-IEEE-AIM-Reprise_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, Y. Maccarana, F. Previdi, G. Pispola, M. Nardi, F. Perni and S. Toro, "Development of a reliable electro-mechanical actuator for primary control surfaces in small aircrafts", <strong> IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) </strong>, Munich, Germany, 2017, <a href="https://doi.org/10.1109/AIM.2017.8014172"> doi: 10.1109/AIM.2017.8014172 </a>, ISBN: 978-1-5090-6000-9, ISSN: 2159-6255, pp. 1142-1147. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{8014172, author={M. Mazzoleni and Y. Maccarana and F. Previdi and G. Pispola and M. Nardi and F. Perni and S. 
Toro}, booktitle={2017 IEEE International Conference on Advanced Intelligent Mechatronics (AIM)}, title={Development of a reliable electro-mechanical actuator for primary control surfaces in small aircrafts}, year={2017}, volume={}, number={}, pages={1142-1147}, doi={10.1109/AIM.2017.8014172}, ISSN={2159-6255}, ISBN = {978-1-5090-6000-9}, month={July}, } </code></pre></div></div> Classification algorithms analysis for brain-computer interface in drug craving therapy 2017-02-20T00:00:00+00:00 http://mirkomazzoleni.github.io/journal/2017/02/20/BSC_BCI <h3 id="abstract">Abstract</h3> <p>This paper presents a novel therapy to recover patients from drug craving diseases, with the use of brain–computer interfaces (BCIs). The clinical protocol consists of trying to mentally repel drug-related images, and a Stroop test is used to evaluate the therapy effect. The method requires a BCI hardware package and a software program which communicates with the device. In order to improve the BCI detection rates, data were collected from five different healthy subjects during the training. These measurements are then used to design a better classification algorithm with respect to the default BCI classifier. The investigated algorithms are logistic regression, support vector machines, decision trees, k-nearest neighbors and Naive Bayes. Although the low number of participants is not enough to guarantee statistically significant results, the designed algorithms perform better than the default one, in terms of accuracy, F1-score and area under the curve (AUC). The Naive Bayes method has been chosen as the best classifier among the tested ones, giving a +12.21% performance boost in the F1-score metric. The presented methodology can be extended to other types of craving problems, such as food and alcohol. Results relative to the effectiveness of the proposed approach are reported on a set of patients with drug craving problems. 
[<strong><a href="http://cal.unibg.it/cal/wp-content/uploads/papers/2017-BSPC-Brain-Computer-Interface.pdf">Paper</a></strong>, <strong><a href="https://www.sciencedirect.com/science/article/pii/S1746809417300198?dgcid=author">ScienceDirect</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, F. Previdi, S. Bonfiglio, "Classification algorithms analysis for brain-computer interface in drug craving therapy", <strong>Biomedical Signal Processing and Control</strong>, Volume 52, 2019, Pages 463-472, ISSN 1746-8094. <a href="https://doi.org/10.1016/j.bspc.2017.01.011"> doi: 10.1016/j.bspc.2017.01.011 </a> </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2017, title = "Classification algorithms analysis for brain–computer interface in drug craving therapy", journal = "Biomedical Signal Processing and Control", volume = "52", pages = "463 - 472", year = "2019", issn = "1746-8094", doi = "https://doi.org/10.1016/j.bspc.2017.01.011", author = "Mirko Mazzoleni and Fabio Previdi and Natale Salvatore Bonfiglio", } </code></pre></div></div> Human perception of probability 2016-12-17T00:00:00+00:00 http://mirkomazzoleni.github.io/blog/2016/12/17/perception_of_probability <p>Back in 1964, <a href="https://en.wikipedia.org/wiki/Sherman_Kent">Sherman Kent</a> tried to address the problem of misleading odds expressions in <a href="https://en.wikipedia.org/wiki/National_Intelligence_Estimate">National Intelligence Estimates</a> (NIE). Recognizing communication problems caused by imprecise probabilistic statements, Kent proposed a schema for standardizing the uncertainty ranges associated with words used to communicate the likelihood of an event. 
In his work <a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/sherman-kent-and-the-board-of-national-estimates-collected-essays/6words.html">Words of Estimative Probability</a>, the author defined five ranges of uncertainty and related expressions to convey them:</p> <ol> <li><strong>Almost Certain</strong>: $$93\% \pm 6\%$$</li> <li><strong>Probable</strong>: $$75\% \pm 12\%$$</li> <li><strong>Chances About Even</strong>: $$50\% \pm 10\%$$</li> <li><strong>Probably Not</strong>: $$30\% \pm 10\%$$</li> <li><strong>Almost Certainly Not</strong>: $$7\% \pm 5\%$$</li> </ol> <p>Kent also defined a set of expressions equivalent to the ones above:</p> <ol> <li><strong>Almost Certain</strong>: virtually certain, highly probable, highly likely, odds (or chances) overwhelming</li> <li><strong>Probable</strong>: likely, we believe, we estimate</li> <li><strong>Chances About Even</strong>: chances a little better (or less) than even</li> <li><strong>Probably Not</strong>: improbable, unlikely, we believe that…not, we estimate that…not, we doubt, doubtful</li> <li><strong>Almost Certainly Not</strong>: virtually impossible, almost impossible, some slight chance, highly doubtful</li> </ol> <p><a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/art15.html#rft144">This article</a> describes an experiment that was performed with 23 NATO military officers accustomed to reading intelligence reports. They were given a number of sentences such as: “It is highly unlikely that…”: all the sentences were the same except that the verbal expressions of probability changed. The officers were asked what percentage probability they would attribute to each statement if they read it in an intelligence report. Each dot in the table represents one officer’s probability assignment. 
While there was broad consensus about the meaning of “better than even”, there was a wide disparity in the interpretation of other probability expressions. The shaded areas in the table show the ranges proposed by Kent.</p> <p><img src="/images/2016-12-17-perception_of_probability/perception.gif" style="width: 500px;" class="center_img" /></p> <p>The experiment showed how differently people can perceive the probabilities conveyed by different expressions. In <a href="https://www.reddit.com/r/dataisbeautiful/comments/3hi7ul/oc_what_someone_interprets_when_you_say_probably/">this</a> Reddit thread, the <a href="https://www.reddit.com/user/zonination">author</a> replicated the experiment by collecting data via the subreddit <a href="https://www.reddit.com/r/samplesize">r/samplesize</a>. The data can be found <a href="https://github.com/zonination/perceptions">here</a>. The notched boxplots of these data show that the results match the earlier CIA experiment very well.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Import files, load plot and data packages</span><span class="w"> </span><span class="n">probly</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read.csv</span><span class="p">(</span><span class="s2">"probly.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">stringsAsFactors</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">reshape2</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">scales</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span 
class="n">RColorBrewer</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">ggthemes</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Melt data into column format.</span><span class="w"> </span><span class="n">probly</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">melt</span><span class="p">(</span><span class="n">probly</span><span class="p">)</span><span class="w"> </span><span class="n">probly</span><span class="o">$</span><span class="n">variable</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"[.]"</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="n">probly</span><span class="o">$</span><span class="n">variable</span><span class="p">)</span><span class="w"> </span><span class="c1"># Reorder probability levels</span><span class="w"> </span><span class="n">probly</span><span class="o">$</span><span class="n">variable</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">probly</span><span class="o">$</span><span class="n">variable</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Chances Are Slight"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Highly Unlikely"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Almost No Chance"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Little Chance"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Probably Not"</span><span class="p">,</span><span class="w"> 
</span><span class="s2">"Unlikely"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Improbable"</span><span class="p">,</span><span class="w"> </span><span class="s2">"We Doubt"</span><span class="p">,</span><span class="w"> </span><span class="s2">"About Even"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Better Than Even"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Probably"</span><span class="p">,</span><span class="w"> </span><span class="s2">"We Believe"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Likely"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Probable"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Very Good Chance"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Highly Likely"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Almost Certainly"</span><span class="p">))</span><span class="w"> </span></code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Plot probability data</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">probly</span><span class="p">,</span><span class="n">aes</span><span class="p">(</span><span class="n">reorder</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">FUN</span><span class="o">=</span><span class="n">median</span><span class="p">),</span><span class="n">value</span><span class="p">))</span><span class="o">+</span><span class="w"> </span><span class="n">geom_boxplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">variable</span><span 
class="p">),</span><span class="n">alpha</span><span class="o">=</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="n">notch</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">geom_jitter</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="n">variable</span><span class="p">),</span><span class="n">size</span><span class="o">=</span><span class="m">4</span><span class="p">,</span><span class="n">alpha</span><span class="o">=</span><span class="m">.2</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">coord_flip</span><span class="p">()</span><span class="o">+</span><span class="w"> </span><span class="n">guides</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Sentence"</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Assigned Probability (%)"</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">theme_few</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">panel.border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">())</span><span 
class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">panel.grid.major.x</span><span class="w"> </span><span class="o">=</span><span class="n">element_line</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s2">"grey90"</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">.25</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="o">=</span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">100</span><span class="p">,</span><span class="m">10</span><span class="p">))</span><span class="w"> </span></code></pre></div></div> <p><img src="/images/2016-12-17-perception_of_probability/perception_boxplots.svg" alt="" class="center-image" /><!-- --></p> <p>Several things in the graph draw attention. Particularly interesting are the outliers for terms such as:</p> <ul> <li>“Highly likely” at $$15\%$$</li> <li>“We believe” and “Better than even” at $$5\%$$</li> </ul> <p>and, on the opposite side:</p> <ul> <li>“Probably not”, “We doubt”, “Little chance” at $$100\%$$</li> <li>“Highly unlikely”, “Almost no chance” at $$90-95\%$$</li> </ul> <p>It would be interesting to investigate which experiences lead people to such out-of-the-box judgements! Notched boxplots make it possible to compare groups: if the notches of two boxes do not overlap, this suggests that the medians are significantly different. The notches represent an <a href="http://stats.stackexchange.com/questions/184516/why-is-the-95-ci-for-the-median-supposed-to-be-%C2%B11-57iqr-sqrtn">empirical</a> $$95\%$$ confidence interval on the group median. 
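</p> <p>As a rough sketch of where such a notch comes from: base R’s <code>boxplot.stats</code> computes it as the median plus or minus $$1.58 \cdot IQR/\sqrt{n}$$, with the IQR measured between the hinges, and ggplot2 documents the same $$1.58 \cdot IQR/\sqrt{n}$$ extent for its notches. The vector <code>x</code> below is made-up data standing in for the answers to a single expression; it is not taken from the survey.</p>

```r
# Hypothetical answers (in %) for a single expression; NOT taken from the survey data
x <- c(60, 65, 70, 70, 75, 75, 80, 80, 85, 90, 95)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
iqr  <- five[4] - five[2]

# Approximate 95% interval around the median: median +/- 1.58 * IQR / sqrt(n)
notch <- five[3] + c(-1.58, 1.58) * iqr / sqrt(length(x))
print(notch)

# boxplot.stats() reports the same interval in its `conf` element
print(boxplot.stats(x)$conf)
```

<p>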
It is then possible to visually cluster the expressions:</p> <ol> <li>Highly Likely, Almost Certainly</li> <li>Probably, We Believe, Likely, Probable, Very Good Chance</li> <li>About Even, Better Than Even</li> <li>Chances Are Slight, Little Chance, Probably Not, Unlikely, Improbable, We Doubt</li> <li>Almost No Chance, Highly Unlikely</li> </ol> <p>Starting from a similar analysis, the book <a href="https://www.amazon.com/Critical-Thinking-Strategic-Intelligence-Katherine/dp/1452226679/ref=sr_1_2?ie=UTF8&amp;qid=1481480362&amp;sr=8-2&amp;keywords=Critical+Thinking+For+Strategic+Intelligence">Critical Thinking For Strategic Intelligence</a> provides some guidelines for analysts who need to translate their findings for the general public, for example:</p> <ul> <li>Try to minimize the use of such words as <em>might</em> or <em>could</em> because sentences that contain them usually convey little useful information to the reader</li> <li>The best way to convey a level of likelihood is to follow the probabilistic word, percentage, or bettor’s odds with the word <em>because</em> and a response to complete the sentence that includes a list of key factors that support the judgement</li> <li>A good technique for assessing the soundness of a numeric probability judgement is to check if the percentage of a hypothesis being wrong and the percentage of it being right add to 100</li> <li>The key to presenting levels of confidence is for analysts to state not just how confident they are as analysts, but why they are confident</li> <li>A source summary statement is a powerful tool for giving readers an overall sense of an analyst’s level of confidence and the quality of the sources used to support the analysis before they start reading the paper</li> </ul> <p>In conclusion, when customers are given explicit language laying out why a specific word or percentage was selected, they can make their own independent calculations of the probability of the event occurring. 
The analyst, knowing how a specific sentence is interpreted, can refine their language to better convey information.</p> <p>The entire code for this post can be found <a href="https://github.com/MirkoMazzoleni/MirkoMazzoleni.github.io/blob/master/Rmarkdowns/2016-12-17-perception_of_probability.Rmd">here</a>.</p> The three ways of statistical inference 2016-11-28T00:00:00+00:00 http://mirkomazzoleni.github.io/blog/2016/11/28/three_ways_of_inference <p>I recently skimmed through the course <a href="https://www.coursera.org/learn/statistical-inferences"><em>Improving your statistical inferences</em></a>, held on the <a href="https://www.coursera.org/">Coursera</a> MOOC platform. The lessons are taught by Daniel Lakens, associate professor at the department of Human-Technology Interaction, Eindhoven University of Technology.</p> <p>I found his comparison of the different conceptual frameworks of statistical inference that exist nowadays fascinating. In yoga philosophy, the <a href="https://en.wikipedia.org/wiki/Three_Yogas">three paths of realization</a> are:</p> <ul> <li>The <strong>Karma</strong> yoga</li> <li>The <strong>Jnana</strong> yoga</li> <li>The <strong>Bhakti</strong> yoga</li> </ul> <p>These disciplines entail different meanings. Respectively, they are known as:</p> <ul> <li>The path of <strong>Action</strong></li> <li>The path of <strong>Knowledge</strong></li> <li>The path of <strong>Devotion</strong></li> </ul> <p>But what does this have in common with statistical inference? In day-to-day work, the researcher faces three fundamental questions:</p> <ul> <li>What should I <strong>do</strong>?</li> <li>What’s the <strong>relative evidence</strong>?</li> <li>What should I <strong>believe</strong>?</li> </ul> <p>The Path of Action gives us a guideline to act. It searches for rules to govern our behavior such that, in the long run, we will not be wrong too often. This is the hypothesis testing framework. 
The well-known problem with NHST (Null Hypothesis Significance Testing) is that a rule governing our behavior in the long run tells us nothing about the current test.</p> <p>The Path of Knowledge is based on the data actually observed (which is what matters, isn’t it?). Building on the concept of likelihood, the aim here is to compare the likelihood of different hypotheses via the likelihood ratio, given the data (and the chosen model). A discussion on the likelihood approach to inference can be found in the textbook <a href="https://www.amazon.com/Statistical-Evidence-Likelihood-Monographs-Probability/dp/0412044110/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1480364103&amp;sr=1-1&amp;keywords=Statistical+Evidence%3A+A+Likelihood+Paradigm">Statistical Evidence: A Likelihood Paradigm</a> by Richard Royall (1997).</p> <p>We arrive, finally, at the last way: the Path of Devotion. Here, personal experience, intuition and previous knowledge can be leveraged to express the evidence as degrees of belief. Based on likelihoods, it lets us incorporate prior information into our computations. Bayes factors, which quantify the relative evidence for one model over another, are used to compare the different hypotheses. Bayesian inference can of course also be used to perform parameter estimation, by similarly specifying a prior distribution on the possible parameter values. 
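</p> <p>To make the last two paths concrete, here is a minimal sketch in base R. Everything in it is an illustrative assumption rather than material from the course: the data (7 successes in 10 trials), the two point hypotheses being compared, and the flat Beta(1, 1) prior.</p>

```r
# Hypothetical data: 7 successes out of 10 trials
k <- 7
n <- 10

# Path of Knowledge: likelihood ratio between two point hypotheses
# (success probability 0.7 versus 0.5), given the observed data
lr <- dbinom(k, n, prob = 0.7) / dbinom(k, n, prob = 0.5)
print(lr)

# Path of Devotion: a Beta(a, b) prior on the success probability is
# updated to a Beta(a + k, b + n - k) posterior by conjugacy
a <- 1
b <- 1
post_a <- a + k
post_b <- b + n - k

post_mean <- post_a / (post_a + post_b)           # posterior mean
ci <- qbeta(c(0.025, 0.975), post_a, post_b)      # central 95% credible interval
print(post_mean)
print(ci)
```

<p>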
The book <a href="https://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp/0124058884/ref=sr_1_1?ie=UTF8&amp;qid=1480365777&amp;sr=8-1&amp;keywords=doing+bayesian+data+analysis">Doing Bayesian Data Analysis</a>, by John Kruschke (2014), is an excellent practical guide to get started with modern Bayesian procedures.</p> <p>In conclusion, statistical inference frameworks can be viewed in light of the three yoga paths:</p> <ul> <li>The path of <strong>Action</strong>: Neyman-Pearson</li> <li>The path of <strong>Knowledge</strong>: Likelihoods</li> <li>The path of <strong>Devotion</strong>: Bayesian statistics</li> </ul> <p>The presented material is borrowed from the course taught by Daniel Lakens. It is definitely a must-follow course, full of inspiring insights like the ones presented here.</p> A Games of Tufte - Part II 2016-09-20T00:00:00+00:00 http://mirkomazzoleni.github.io/blog/2016/09/20/game_of_tufte_part_II <p>The <a href="/blog/2016/08/06/game_of_tutfe/">first part</a> of the exploratory data analysis of the Game of Thrones dataset hosted at <a href="https://www.kaggle.com/mylesoneill/game-of-thrones">Kaggle</a> dealt with data understanding and cleaning. This second post digs further into visualization, unveiling interesting aspects of the battles.</p> <p>The entire code for this post can be found <a href="https://github.com/MirkoMazzoleni/MirkoMazzoleni.github.io/blob/master/Rmarkdowns/2016-08-05-game_of_tutfe.Rmd">here</a>.</p> <h2 id="more-questions">More questions</h2> <p>This section highlights and depicts more questions that arise from the data.</p> <h3 id="major-deaths-and-captures-rate">Major deaths and captures rate</h3> <p>The following plot shows how the major deaths and captures are distributed over the sequence of battles across the years. Most of them happen in the year $$299$$, where almost every battle generates a major death. In the year $$300$$, it seems that no more major characters die in battle, with only one of them captured. 
It can be seen that the majority of deaths and captures happen in the <em>Riverlands</em> and in the <em>North</em>. The graph shows that the cumulative number of deaths is always higher than the cumulative number of captures, which suggests that the soldiers prefer to kill rather than take prisoners. <img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-39-1.svg" alt="" /><!-- --></p> <h3 id="number-of-battles-won-by-each-commander-when-attacking">Number of battles won by each commander, when attacking</h3> <p>The plot shows the number of battles won by each king thanks to a specific commander, with a different color for each region where the battle was fought. It can be noticed that the Greyjoys fought almost exclusively in the North, while the Lannisters and the Starks fought all across Westeros. Gregor Clegane (the Mountain) is the commander who won the most battles. Forces with no king fought only in the Riverlands. <img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-40-1.svg" alt="" /><!-- --></p> <h3 id="type-of-battles-won--by-each-king-when-attacking">Type of battles won by each king, when attacking</h3> <p>The plot shows the number of battles won by each king thanks to a specific tactic, with a different color for each year in which the battle was fought. We can see that Robb Stark preferred to fight with ambushes, while the Lannisters preferred pitched battles since the year $$299$$ and sieges from the year $$300$$.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-41-1.svg" alt="" /><!-- --></p> <h3 id="number-of-battles-fought-in-summer-and-winter">Number of battles fought in summer and winter</h3> <p>The plot shows the number of battles won by each king in summer (1) or winter (0), with a different color for each year in which the battle was fought. Poor Robb Stark died during the Red Wedding in year $$299$$, so he did not fight any battle during year $$300$$. We can see that winter has finally come in year $$300$$. 
Brace yourselves!!</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-42-1.svg" alt="" /><!-- --></p> <h3 id="king-vs-king">King vs. King</h3> <p>This plot shows which opposing king each king fought most of his battles against.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-43-1.svg" alt="" /><!-- --></p> <p>Inspecting the plot, it seems strange that <em>Balon/Euron Greyjoy</em> attacks himself. The battle in question is the <em>Sack of Torrhen’s Square</em>. In fact, the defender king would be <em>Bran Stark</em>, since Robb is dead. I decided to replace the defender king with <em>Robb Stark</em>, to indicate that the defenders are the Starks.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                       name       attacker_king       defender_king
13 Sack of Torrhen's Square Balon/Euron Greyjoy Balon/Euron Greyjoy
</code></pre></div></div> <p>The corrected plot is the following. We can see that Robb Stark fought mainly against Joffrey Baratheon and vice versa. The troops without a king fought each other. Stannis fought first against his brother Renly and then against the Lannisters. <img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-45-1.svg" alt="" /><!-- --></p> <p>The previous plots do not change except for those of <strong>Question Q4</strong>, for which the answer remains unchanged. The corrected plots are reported here.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-46-1.svg" alt="" /><!-- --></p> <h3 id="battle-success-given-army-forces">Battle success given army forces</h3> <p>The plot seems to suggest that there is a linear relationship between the size of the attacking army and the size of the defending one, perhaps due to the information that each side has about the other. It seems that beyond a certain attacker size threshold, a loss becomes more probable. Maybe it is difficult to coordinate so many men against a large defending army. 
Having a bigger army, therefore, does not guarantee victory. <img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-47-1.svg" alt="" /><!-- --></p> <h2 id="graph-of-thrones">Graph of Thrones</h2> <p>The following section (which owes a lot to the Kaggle user <a href="https://www.kaggle.com/colinfraser/d/mylesoneill/game-of-thrones/battles-investigation">ColinFraser</a>) deals with the use of graph algorithms to analyse the social relations between houses during the battles. By using the variables <em>attacker_i</em> and <em>defender_i</em>, with $$i=1\dots 4$$, it is possible to build a graph whose vertices are the house names, with an edge between two of them if a battle has been fought between the two houses.</p> <p>The direction of the edge is from a loser to a winner, and the edge color shows how many battles had that outcome: the darker the edge, the more battles. It can be seen that the Freys fought with many houses and won many battles, and that the Night’s Watch is isolated from the rest of the world. Interestingly, the Tyrells fought only against the Greyjoys. Both the Tullys and the Starks won more battles against the Lannisters than they lost to them, but the Starks trail the Greyjoys in direct matches won. Also interesting is the loop around house Baratheon, which represents the fights of Stannis against his brother Renly.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-49-1.svg" alt="" /><!-- --></p> <h3 id="houses-which-won-most">Houses which won most</h3> <p>By counting the in-degree (number of battles won) and the total degree of a node (number of battles fought), it is possible to compute how each house performed in terms of efficiency. 
Since the number of battles for each house is small, we employ the <em>Laplace correction</em> to correct for the small sample size:</p> $$p(win)=\frac{n+1}{n+m+2}$$ <p>where $$n$$ is the number of won battles, $$m$$ is the number of lost battles and $$p(win)$$ is the probability of winning a battle. For example, a house with $$n=2$$ wins and $$m=0$$ losses gets $$p(win)=3/4$$ rather than $$1$$.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-53-1.svg" alt="" /><!-- --></p> <h3 id="most-powerful-houses">Most powerful houses</h3> <p>By using the <strong>PageRank</strong> algorithm, it is possible to assign a value to each node in the graph. In the case of Google, each node is a document, a web page, and the edges are links between pages. If a page has an incoming link from an important page, that link carries more value. In our context, we assume this means that a win against the Lannisters, for example, carries more value than a win against the Glovers, and the algorithm is able to exploit this based on the graph structure of won and lost battles (again, incoming and outgoing edges). From Wikipedia:</p> <blockquote> <p>The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page</p> </blockquote> <p>Adapting the reasoning to our case, we can think of it in terms of houses and battle outcomes. By walking through the graph following the edge direction (which is from loser to winner), we discover the stationary probability distribution of the associated Markov chain. 
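</p> <p>The random walk just described can be sketched in a few lines of base R using power iteration. This is only an illustration under stated assumptions: the four houses <code>A</code> to <code>D</code> and their battle outcomes are made up, the damping factor of $$0.85$$ is simply a common default, and the code is not the one used to produce the results of this post.</p>

```r
# Toy battle graph: rows are losers, columns are winners (made-up outcomes)
houses <- c("A", "B", "C", "D")
wins <- matrix(0, 4, 4, dimnames = list(houses, houses))
wins["B", "A"] <- 2   # B lost twice to A
wins["C", "A"] <- 1   # C lost once to A
wins["A", "B"] <- 1   # A lost once to B
wins["D", "C"] <- 1   # D lost once to C

d <- 0.85             # damping factor (assumed, a common default)
n <- nrow(wins)

# Row-normalise into transition probabilities; a house that never lost
# (no outgoing edge) jumps to a uniformly random house
P <- wins
for (i in seq_len(n)) {
  P[i, ] <- if (sum(wins[i, ]) > 0) wins[i, ] / sum(wins[i, ]) else rep(1 / n, n)
}

# Power iteration on the damped chain until the distribution stops changing
pr <- rep(1 / n, n)
for (iter in 1:1000) {
  pr_new <- as.vector((1 - d) / n + d * (pr %*% P))
  if (max(abs(pr_new - pr)) < 1e-12) break
  pr <- pr_new
}
names(pr) <- houses
print(round(pr, 4))
```

<p>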
This represents the probability that, starting from a house at random and moving in the direction of battle outcomes, after a while we find ourselves celebrating the victory of a given house; in other words, the long-run fraction of time spent at that house.</p> <p>The results show that House Frey is the most powerful: $$11.7\%$$ of the time we expect to arrive at House Frey throwing a party for their victory.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                       Frey                   Lannister
                 0.11706463                  0.10476410
                     Bolton                   Baratheon
                 0.09134930                  0.08013144
              Night's Watch                     Greyjoy
                 0.06422775                  0.04357728
                      Stark Brotherhood without Banners
                 0.04331037                  0.04023057
                    Bracken                       Tully
                 0.04023057                  0.03713693
                      Darry                    Karstark
                 0.03713693                  0.03440543
                    Mormont                      Glover
                 0.03440543                  0.03440543
           Brave Companions                   Mallister
                 0.02823198                  0.02823198
                  Free folk                      Tyrell
                 0.02823198                  0.02823198
                  Blackwood                      Thenns
                 0.02823198                  0.02823198
                     Giants
                 0.02823198
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># It has to output 1, being a probability distribution</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">pr</span><span class="o">$</span><span class="n">vector</span><span class="p">)</span><span class="w"> </span><span class="m">1</span><span class="w"> </span></code></pre></div></div> <h1 id="acknowledgements">Acknowledgements</h1> <ul> <li><a href="https://www.kaggle.com/sergeycherkasov/d/mylesoneill/game-of-thrones/test-test">Some analysis of Game of Throne data</a> - Sergey Cherkasov</li> <li><a href="https://www.kaggle.com/shaildeliwala/d/mylesoneill/game-of-thrones/exploratory-analysis-and-predictions">Exploratory Analysis and Predictions</a> - Shail Deliwala</li> <li><a href="https://www.kaggle.com/gowrishankarin/d/mylesoneill/game-of-thrones/analysis-on-battles">Systematic Analysis on GoT Battles</a> - Gowri Shankar</li> <li><a 
href="https://www.kaggle.com/colinfraser/d/mylesoneill/game-of-thrones/battles-investigation">Battles investigation</a> - ColinFraser</li> <li><a href="http://motioninsocial.com/tufte/">Tufte in R</a> - Lukasz Piwek</li> </ul> A Games of Tufte - Part I 2016-08-06T00:00:00+00:00 http://mirkomazzoleni.github.io/blog/2016/08/06/game_of_tutfe <!-- image --> <p>This report concerns the first part of an exploratory data analysis based on the Game of Thrones dataset hosted on <a href="https://www.kaggle.com/mylesoneill/game-of-thrones">Kaggle</a>. The aim of this work is to become familiar with the data for subsequent analysis, and to use the Tufte design rules to draw the plots. During the process, personal domain knowledge (acquired from the books and not the TV series) is used to motivate hypotheses and decisions. Since no specific motivation or question led me to collect the data, we let the Exploratory Data Analysis phase generate the questions for us. A sound answer to those questions would require at least another dataset, so we should keep in mind that we are simply describing the dataset at hand, without the temptation to make inferences or other types of final statements.</p> <p>The entire code for this post can be found <a href="https://github.com/MirkoMazzoleni/MirkoMazzoleni.github.io/blob/master/Rmarkdowns/2016-08-05-game_of_tutfe.Rmd">here</a>.</p> <h2 id="data-cleaning-and-questions-generation">Data cleaning and questions generation</h2> <p>After loading the required libraries and the dataset, which contains information about the main battles in the realm of Westeros during the <a href="http://awoiaf.westeros.org/index.php/War_of_the_Five_Kings">War of the Five Kings</a>, let’s first take an overview of the dataset at hand by checking the variables at our disposal. We can see $$38$$ observations for each of the $$25$$ variables. 
The next step will be to gain confidence with the features and the values they can take.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'data.frame': 38 obs. of 25 variables:
 $ name              : Factor w/ 38 levels "Battle at the Mummer's Ford",..: 13 1 7 14 18 10 25 5 3 17 ...
 $ year              : int 298 298 298 298 298 298 298 299 299 299 ...
 $ battle_number     : int 1 2 3 4 5 6 7 8 9 10 ...
 $ attacker_king     : Factor w/ 5 levels "","Balon/Euron Greyjoy",..: 3 3 3 4 4 4 3 2 2 2 ...
 $ defender_king     : Factor w/ 7 levels "","Balon/Euron Greyjoy",..: 6 6 6 3 3 3 6 6 6 6 ...
 $ attacker_1        : Factor w/ 11 levels "Baratheon","Bolton",..: 10 10 10 11 11 11 10 9 9 9 ...
 $ attacker_2        : Factor w/ 8 levels "","Bolton","Frey",..: 1 1 1 1 8 8 1 1 1 1 ...
 $ attacker_3        : Factor w/ 3 levels "","Giants","Mormont": 1 1 1 1 1 1 1 1 1 1 ...
 $ attacker_4        : Factor w/ 2 levels "","Glover": 1 1 1 1 1 1 1 1 1 1 ...
 $ defender_1        : Factor w/ 13 levels "","Baratheon",..: 12 2 12 8 8 8 6 11 11 11 ...
 $ defender_2        : Factor w/ 3 levels "","Baratheon",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ defender_3        : logi NA NA NA NA NA NA ...
 $ defender_4        : logi NA NA NA NA NA NA ...
 $ attacker_outcome  : Factor w/ 3 levels "","loss","win": 3 3 3 2 3 3 3 3 3 3 ...
 $ battle_type       : Factor w/ 5 levels "","ambush","pitched battle",..: 3 2 3 3 2 2 3 3 5 2 ...
 $ major_death       : int 1 1 0 1 1 0 0 0 0 0 ...
 $ major_capture     : int 0 0 1 1 1 0 0 0 0 0 ...
 $ attacker_size     : int 15000 NA 15000 18000 1875 6000 NA NA 1000 264 ...
 $ defender_size     : int 4000 120 10000 20000 6000 12625 NA NA NA NA ...
 $ attacker_commander: Factor w/ 32 levels "","Asha Greyjoy",..: 8 6 9 22 16 18 6 30 2 28 ...
 $ defender_commander: Factor w/ 29 levels "","Amory Lorch",..: 7 4 10 28 12 14 15 1 1 1 ...
 $ summer            : int 1 1 1 1 1 1 1 1 1 1 ...
 $ location          : Factor w/ 28 levels "","Castle Black",..: 8 13 17 9 27 17 4 12 5 23 ...
 $ region            : Factor w/ 7 levels "Beyond the Wall",..: 7 5 5 5 5 5 5 3 3 3 ...
 $ note              : Factor w/ 6 levels "","Greyjoy's troop number based on the Battle of Deepwood Motte, in which Asha had 1000 soldier on 30 longships. That comes out to"| __truncated__,..: 1 1 1 1 1 1 1 1 1 2 ...
</code></pre></div></div> <h3 id="question-expectation-and-answers">Question, Expectation and Answers</h3> <p>In this section, we will follow this logical framework:</p> <ol> <li>Let the exploration generate the questions</li> <li>Set an expectation for what the data will tell us</li> <li>Revise or confirm the expectation in light of the analysed data</li> </ol> <p>Since the dataset is composed mostly of categorical variables, we first want to know the discrete values that these variables can take, and see if any question arises.</p> <h3 id="categorical-data">Categorical Data</h3> <p>This section deals with the understanding and cleaning of categorical variables in the dataset.</p> <h4 id="attacker-king">Attacker King</h4> <p>The variable represents the attacker’s king. A slash indicates that the king changes over the course of the war. The levels of this variable are:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Balon/Euron Greyjoy" "Joffrey/Tommen Baratheon" "Robb Stark" "Stannis Baratheon" </code></pre></div></div> <p>The fact that, in some circumstances, there is no attacking king is not an error: it may simply be that no king is commanding the troops. In this case throwing away missing data can be detrimental, as they can be a source of information. For example, a value of “ “ can mean “unknown” or “not applicable”, so it should be encoded that way. I usually look for trends in missing data to see whether they are missing for a reason.</p> <p><strong>Question Q1</strong>: Does the “ “ level mean something? <strong>Expectation E1</strong>: Yes, simply there’s no king. 
<strong>Answer A1</strong>: The “ “ stands for “NoKing”.</p> <ul> <li>Check the names of the battles where there is no attacker king, and any notes on those battles:</li> </ul> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                         note
23 Battle of the Burning Septry
30 Sack of Saltpans
</code></pre></div></div> <ul> <li>Is the missing king due to the battle type? ==&gt; NO: pitched battles usually have a king who leads the army, whereas troops without a king are expected to act as bandits, preferring a razing or ambush strategy:</li> </ul> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                 FALSE TRUE
                     1    0
  ambush            10    0
  pitched battle    13    1
  razing             1    1
  siege             11    0
</code></pre></div></div> <ul> <li>Is the missing king due to the attacker? ==&gt; YES, the Brave Companions and the Brotherhood don’t have a king:</li> </ul> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                              FALSE TRUE
  Baratheon                       6    0
  Bolton                          2    0
  Bracken                         1    0
  Brave Companions                0    1
  Brotherhood without Banners     0    1
  Darry                           1    0
  Free folk                       1    0
  Frey                            2    0
  Greyjoy                         7    0
  Lannister                       8    0
  Stark                           8    0
</code></pre></div></div> <ul> <li>Is the previous conclusion the right one? Let’s check for other associations. It seems that the Riverlands are a no man’s land, and bandits or sellswords prefer to fight in that area!</li> </ul> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                  FALSE TRUE
  Beyond the Wall     1    0
  The Crownlands      2    0
  The North          10    0
  The Reach           2    0
  The Riverlands     15    2
  The Stormlands      3    0
  The Westerlands     3    0
</code></pre></div></div> <p>The most plausible answer is that the missing value is related to fighters who don’t have a king.
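</p> <p>Cross-tabulations like the ones above can be obtained with the base R function table(). A minimal sketch on a toy data frame (illustrative values, not the real dataset):</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy stand-in for the battles data (illustrative values only)
battles = data.frame(
  attacker_king = factor(c("Robb Stark", "", "Stannis Baratheon", "")),
  battle_type   = factor(c("ambush", "pitched battle", "siege", "razing")),
  attacker_1    = factor(c("Stark", "Brotherhood without Banners",
                           "Baratheon", "Brave Companions"))
)

# TRUE where the attacker king is recorded as an empty string
no_king = battles$attacker_king == ""

# Rows: levels of the other variable; columns: FALSE/TRUE for "king is missing"
table(battles$battle_type, no_king)
table(battles$attacker_1, no_king)
</code></pre></div></div> <p>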
We can then fill the missing value with a new one, the “NoKing” value in this case.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_king</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_king</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NoKing"</span><span class="w"> </span></code></pre></div></div> <p><strong>Question Q2</strong>: Which king attacked the most? <strong>Expectation E2</strong>: Joffrey/Tommen Baratheon, Joffrey being the most sadistic character. <strong>Answer A2</strong>: Joffrey/Tommen Baratheon.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-9-1.svg" alt="" /><!-- --></p> <h4 id="defender-king">Defender King</h4> <p>This variable represents the defender’s king. The levels are:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Balon/Euron Greyjoy" "Joffrey/Tommen Baratheon" "Mance Rayder" "Renly Baratheon" "Robb Stark" "Stannis Baratheon" </code></pre></div></div> <p><strong>Question Q3</strong>: Does the “ “ level mean something? <strong>Expectation E3</strong>: Yes, there is simply no king.
<strong>Answer A3</strong>: No king.</p> <p>From the data we can see that there are 3 battles without a defending king.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                         note
23 Battle of the Burning Septry
25 Retaking of Harrenhal
30 Sack of Saltpans
</code></pre></div></div> <p>Digging a little deeper shows that two of them were fought against the Brave Companions, who don’t have a king, and the remaining one is a razing, that is, an attack against an undefended position:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                      ambush pitched battle razing siege
                    0      0              0      1     0
  Baratheon         0      0              0      0     0
  Blackwood         0      0              0      0     0
  Bolton            0      0              0      0     0
  Brave Companions  0      0              2      0     0
  Darry             0      0              0      0     0
  Greyjoy           0      0              0      0     0
  Lannister         0      0              0      0     0
  Mallister         0      0              0      0     0
  Night's Watch     0      0              0      0     0
  Stark             0      0              0      0     0
  Tully             0      0              0      0     0
  Tyrell            0      0              0      0     0
</code></pre></div></div> <p>I feel confident relabelling the “ “ value as “NoKing”.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_king</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_king</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NoKing"</span><span class="w"> </span></code></pre></div></div> <p><strong>Question Q4</strong>: Which king underwent the most attacks? <strong>Expectation E4</strong>: Robb Stark, it seems that everyone wants the North.
<strong>Answer A4</strong>: Robb Stark.</p> <p><img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-13-1.svg" alt="" /><!-- --></p> <p><strong>Observation O1</strong>: From the previous plots, it seems that both Renly Baratheon and Mance Rayder did not have the time, or the will, to perform any attack: they only defended their position.</p> <h4 id="attackers">Attackers</h4> <p>These variables indicate the major attacking houses.</p> <p>Main attackers:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> "Baratheon" "Bolton" "Bracken" "Brave Companions" "Brotherhood without Banners" "Darry" "Free folk" "Frey" "Greyjoy" "Lannister" "Stark" </code></pre></div></div> <p>Second attackers:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Bolton" "Frey" "Greyjoy" "Karstark" "Lannister" "Thenns" "Tully" </code></pre></div></div> <p>Third attackers:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Giants" "Mormont" </code></pre></div></div> <p>Fourth attackers:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Glover" </code></pre></div></div> <p>It can be seen that the attacker levels, except for the variable <em>attacker_1</em>, also contain missing information, represented as an empty string. Intuitively, this reflects the fact that the number of allies is not the same in every battle.
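</p> <p>The empty-string recoding applied throughout this post can be wrapped in a small helper. A minimal sketch, where rename_level is a hypothetical name that does not appear in the original code:</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: rename one level of a factor, leaving the others untouched
rename_level = function(f, old, new) {
  levels(f)[levels(f) == old] = new
  f
}

# Toy factor with empty strings standing for "no ally in this battle"
x = factor(c("", "Bolton", "", "Frey"))
x = rename_level(x, "", "NotPresent")
levels(x)
</code></pre></div></div> <p>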
We can encode the missing values as “NotPresent”, indicating that an attacker is not present for that battle.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_2</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_2</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w">
</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_3</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_3</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w">
</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_4</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_4</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w">
</span></code></pre></div></div> <p>Another observation is that House Glover is present every time 4 attackers took part in a battle.</p> <p><strong>Question Q5</strong>: Which battles had the maximum number of attackers?
<strong>Expectation E5</strong>: Probably the battles to conquer the North. <strong>Answer A5</strong>: The battles were those carried out by Stannis Baratheon to free the North from the Greyjoys and Winterfell from the Boltons:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                       attacker_king     defender_king            attacker_outcome region
31 Retaking of Deepwood Motte Stannis Baratheon Balon/Euron Greyjoy      win              The North
38 Siege of Winterfell        Stannis Baratheon Joffrey/Tommen Baratheon                  The North
</code></pre></div></div> <p><strong>Observation O2</strong>: This last battle does not have an outcome, because in the books we only know of a letter sent to Jon Snow by Ramsay Bolton, which tells him that Stannis died, but we are not sure whether the letter can be trusted.</p> <p><strong>Question Q6</strong>: Are there other cases with a missing <em>attacker_outcome</em>? <strong>Expectation E6</strong>: Probably not, since that is the last battle of the books until now. <strong>Answer A6</strong>: No, that battle is the only one.
We can set the missing value to “unknown”:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                attacker_king     defender_king            region
38 Siege of Winterfell Stannis Baratheon Joffrey/Tommen Baratheon The North
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_outcome</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_outcome</span><span class="p">))]</span><span class="o">=</span><span class="s2">"unknown"</span><span class="w"> </span></code></pre></div></div> <h4 id="defenders">Defenders</h4> <p>These variables indicate the major defending houses.</p> <p>Main defenders:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Baratheon" "Blackwood" "Bolton" "Brave Companions" "Darry" "Greyjoy" "Lannister" "Mallister" "Night's Watch" "Stark" "Tully" "Tyrell" </code></pre></div></div> <p>Second defenders:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Baratheon" "Frey" </code></pre></div></div> <p>Third defenders:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> NULL </code></pre></div></div> <p>Fourth defenders:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> NULL </code></pre></div></div> <p>It can be seen that the defender levels also contain missing information, represented as an empty string.
As with the attackers, we can encode the missing values as “NotPresent”, indicating that a defender is not present for that battle.</p> <p><strong>Observation O3</strong>: It can be further noticed that nobody defended with more than one ally, so the <em>defender_3</em> and <em>defender_4</em> columns can be removed from the dataset.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_2</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_2</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w">
</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_3</span><span class="o">=</span><span class="kc">NULL</span><span class="w">
</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_4</span><span class="o">=</span><span class="kc">NULL</span><span class="w">
</span></code></pre></div></div> <p><strong>Question Q7</strong>: What does a “ “ value in the variable <em>defender_1</em> mean? <strong>Expectation E7</strong>: A battle was fought without defenders, and it was probably a razing. <strong>Answer A7</strong>: The battle was indeed a razing, and there were neither attacker nor defender kings.
We can set the missing value to the “NotPresent” one:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name             attacker_king attacker_1       defender_king battle_type
30 Sack of Saltpans NoKing        Brave Companions NoKing        razing
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_1</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_1</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w"> </span></code></pre></div></div> <h4 id="attacker-outcomes">Attacker outcomes</h4> <p>This variable indicates the outcome from the perspective of the attacker. Categories: win, loss, draw.</p> <p><strong>Question Q8</strong>: What are the possible outcomes? <strong>Expectation E8</strong>: From the codebook, the possible values are “draw”, “win”, “loss”. <strong>Answer A8</strong>: The values match the expectation, but no battle ended with a “draw”:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"unknown" "loss" "win" </code></pre></div></div> <h4 id="battle-types">Battle types</h4> <p>A classification of the battle’s primary type.
Categories:</p> <ul> <li>Pitched_battle: armies meet in a location and fight.</li> <li>Ambush: a battle where stealth or subterfuge was the primary means of attack.</li> <li>Siege: a prolonged attack on a fortified position.</li> <li>Razing: an attack against an undefended position.</li> </ul> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "ambush" "pitched battle" "razing" "siege" </code></pre></div></div> <p><strong>Question Q9</strong>: What does the value “ “ in the variable <em>battle_type</em> mean? <strong>Expectation E9</strong>: Probably an unknown battle type. <strong>Answer A9</strong>: The value is not indicated because it is unknown how the battle went and what its outcome was, the battle being the Siege of Winterfell by Stannis Baratheon:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                attacker_king     attacker_outcome defender_king            battle_type
38 Siege of Winterfell Stannis Baratheon unknown          Joffrey/Tommen Baratheon
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">battle_type</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">battle_type</span><span class="p">))]</span><span class="o">=</span><span class="s2">"unknown"</span><span class="w"> </span></code></pre></div></div> <h4 id="attacker-commander">Attacker commander</h4> <p>Major commanders of the attackers. Commanders’ names are included without honorific titles and commanders are separated by commas.
Since there are many commanders, only the first ones are reported:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Asha Greyjoy" "Dagmer Cleftjaw" "Daven Lannister, Ryman Fey, Jaime Lannister" "Euron Greyjoy, Victarion Greyjoy" "Gregor Clegane" </code></pre></div></div> <p><strong>Question Q10</strong>: What does the value “ “ in the variable <em>attacker_commander</em> mean? <strong>Expectation E10</strong>: Probably a missing or unknown commander. <strong>Answer A10</strong>: The value is not indicated because there wasn’t a commander, the battle being led by the Brotherhood without Banners. We can set the missing value to a “NotPresent” one:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                         attacker_king attacker_1                  defender_king battle_type
23 Battle of the Burning Septry NoKing        Brotherhood without Banners NoKing        pitched battle
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_commander</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">attacker_commander</span><span class="p">))]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w"> </span></code></pre></div></div> <h4 id="defender-commander">Defender commander</h4> <p>Major commanders of the defenders. Commanders’ names are included without honorific titles and commanders are separated by commas.
Since there are many commanders, only the first ones are reported:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Amory Lorch" "Asha Greyjoy" "Beric Dondarrion" "Bran Stark" "Brynden Tully" </code></pre></div></div> <p><strong>Question Q11</strong>: What does the value “ “ in the variable <em>defender_commander</em> mean? <strong>Expectation E11</strong>: Probably a missing or unknown commander. <strong>Answer A11</strong>: The value is not indicated because there wasn’t a commander, or it was unknown. In the battles where there is “NoKing” as <em>defender_king</em>, we can assume that a <em>defender_commander</em> was not present. In the rest of the battles, most of which are led by the Greyjoys, there probably was a <em>defender_commander</em>, but it is not indicated and is thus unknown:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                                                 attacker_king            defender_king            battle_type
8  Battle of Moat Cailin                                Balon/Euron Greyjoy      Robb Stark               pitched battle
9  Battle of Deepwood Motte                             Balon/Euron Greyjoy      Robb Stark               siege
10 Battle of the Stony Shore                            Balon/Euron Greyjoy      Robb Stark               ambush
13 Sack of Torrhen's Square                             Balon/Euron Greyjoy      Balon/Euron Greyjoy      siege
21 Siege of Darry                                       Robb Stark               Joffrey/Tommen Baratheon siege
23 Battle of the Burning Septry                         NoKing                   NoKing                   pitched battle
29 Fall of Moat Cailin                                  Joffrey/Tommen Baratheon Balon/Euron Greyjoy      siege
30 Sack of Saltpans                                     NoKing                   NoKing                   razing
32 Battle of the Shield Islands                         Balon/Euron Greyjoy      Joffrey/Tommen Baratheon pitched battle
33 Invasion of Ryamsport, Vinetown, and Starfish Harbor Balon/Euron Greyjoy      Joffrey/Tommen Baratheon razing
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_commander</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_commander</span><span class="p">),</span><span class="w"> </span><span class="s2">"NotPresent"</span><span class="p">,</span><span class="s2">"unknown"</span><span class="p">)</span><span class="w">
</span><span class="n">battles</span><span class="p">[</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_king</span><span class="o">==</span><span class="s2">"NoKing"</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">battles</span><span class="o">$</span><span class="n">defender_commander</span><span class="o">==</span><span class="s2">""</span><span class="p">,</span><span class="s2">"defender_commander"</span><span class="p">]</span><span class="o">=</span><span class="s2">"NotPresent"</span><span class="w">
</span><span class="n">battles</span><span class="p">[</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_commander</span><span class="o">==</span><span class="s2">""</span><span class="p">,</span><span class="s2">"defender_commander"</span><span class="p">]</span><span class="o">=</span><span class="s2">"unknown"</span><span class="w">
</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_commander</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">droplevels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">defender_commander</span><span class="p">)</span><span class="w">
</span></code></pre></div></div> <h4 id="battles-locations">Battles locations</h4> <p>This variable represents the battle location.
Levels are:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"" "Castle Black" "Crag" "Darry" "Deepwood Motte" "Dragonstone" </code></pre></div></div> <p><strong>Question Q12</strong>: What does the value “ “ in the variable <em>location</em> mean? <strong>Expectation E12</strong>: Probably a missing or unknown location. <strong>Answer A12</strong>: The location <a href="http://awoiaf.westeros.org/index.php/Battle_at_the_burning_septry">is not known</a>:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                         attacker_king defender_king
23 Battle of the Burning Septry NoKing        NoKing
</code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">location</span><span class="p">)[</span><span class="n">match</span><span class="p">(</span><span class="s2">""</span><span class="p">,</span><span class="n">levels</span><span class="p">(</span><span class="n">battles</span><span class="o">$</span><span class="n">location</span><span class="p">))]</span><span class="o">=</span><span class="s2">"unknown"</span><span class="w"> </span></code></pre></div></div> <h4 id="battle-regions">Battle regions</h4> <p>The region where the battle takes place. Categories: Beyond the Wall, The North, The Iron Islands, The Riverlands, The Vale of Arryn, The Westerlands, The Crownlands, The Reach, The Stormlands, Dorne.</p> <p><strong>Question Q13</strong>: What are the values assumed by the variable? <strong>Expectation E13</strong>: The values assumed by the variable are those described in the codebook.
<strong>Answer A13</strong>: The answer meets the expectation, except for the regions “The Iron Islands”, “The Vale of Arryn” and “Dorne”, probably because no battles were fought in those regions:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"Beyond the Wall" "The Crownlands" "The North" "The Reach" "The Riverlands" "The Stormlands" "The Westerlands" </code></pre></div></div> <h3 id="numerical-data">Numerical Data</h3> <p>This section deals with the understanding and cleaning of numerical variables in the dataset.</p> <h4 id="year">Year</h4> <p>The year of the battle. We convert it to a factor variable for convenience and representation, since it assumes only $$3$$ different values.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  298.0   299.0   299.0   299.1   300.0   300.0
</code></pre></div></div> <h4 id="attacker-size">Attacker size</h4> <p>The size of the attacker’s force. No distinction is made between the types of soldiers such as cavalry and footmen:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     20    1375    4000    9943    8250  100000      14
</code></pre></div></div> <p>From the summary we can see that the attacker army size has a mean of about $$10000$$ soldiers, but the distribution is very scattered, with many missing values. Particularly impressive is the maximum of $$100000$$ men.</p> <p><strong>Question Q14</strong>: Which is the battle with $$100000$$ men? <strong>Expectation E14</strong>: A battle in the North with the wildlings. <strong>Answer A14</strong>: The battle was the assault on Castle Black by the wildlings and free folk, when Jon Snow loses Ygritte.
We can see that there is an error in the data, because we know that Stannis Baratheon was on the side of the Night’s Watch, defending it and capturing Mance Rayder. Furthermore, Stannis won the battle and Mance Rayder lost it, thus the <em>attacker_king</em> and <em>defender_king</em> variables should be swapped. The number of $$100000$$ is more meaningful now if we think of it as the army of all the free folk, as it is also reported <a href="http://awoiaf.westeros.org/index.php/Battle_of_Castle_Black">here</a>:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   name                   attacker_king     defender_king attacker_1 defender_1    attacker_outcome
28 Battle of Castle Black Stannis Baratheon Mance Rayder  Free folk  Night's Watch loss
</code></pre></div></div> <p><strong>Question Q15</strong>: When attacking, which battle type required more men? <strong>Expectation E15</strong>: Probably the <em>pitched battle</em> type, since it requires more men than an ambush or a siege, which rely respectively on stealth and on siege equipment (trebuchets, rams). <strong>Answer A15</strong>: The <em>pitched battle</em> has the highest median (about $$10000$$ troops) and it is very skewed around this value, with only $$25\%$$ of values lower than $$3000$$ troops. Perhaps surprisingly, the median of the <em>ambush</em> distribution is similar to that of <em>siege</em>; the latter is more concentrated, suggesting that there is some standard number of troops needed for a siege. Here we have isolated cases of ambushes with fewer than $$30$$ men, and a siege with $$100000$$ men (the Mance Rayder attack on Castle Black). We do not consider “unknown” or “razing” battles since they have few or no observations. <img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-35-1.svg" alt="" /><!-- --></p> <p><strong>Question Q16</strong>: When attacking, which king had the most numerous army?
<strong>Expectation E16</strong>: We already know that Mance Rayder commanded $$100000$$ men. <strong>Answer A16</strong>: Mance had the most numerous attacking army, but he attacked only once, so it is more interesting to consider the other kings. I also chose to exclude the “NoKing” category from the comparison, since it has few observations. We can see from the plot that the Greyjoys had the smallest armies, ranging from $$10$$ to $$1000$$ men. The Lannisters and the Starks show a high median army size, but also great variation in their forces, mainly for Robb Stark. This is probably due to his tendency to perform ambush attacks with few men. Stannis Baratheon’s forces seem to have undergone few losses, showing a quite concentrated distribution. <img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-36-1.svg" alt="" /><!-- --></p> <h4 id="defender-size">Defender size</h4> <p>The size of the defender’s force. No distinction is made between the types of soldiers such as cavalry and footmen.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    100    1070    6000    6428   10000   20000      19
</code></pre></div></div> <p><strong>Question Q17</strong>: When defending, which king had the most numerous army? <strong>Expectation E17</strong>: Probably the Lannisters or Stannis Baratheon. <strong>Answer A17</strong>: Apart from “NoKing” and “Renly Baratheon”, which have few observations, the plots show that the Lannisters defended their position with more troops than the others, and that Stannis Baratheon, despite attacking with a high number of troops, defended with very few.
<img src="/images/2016-08-06-games_of_tufte/unnamed-chunk-38-1.svg" alt="" /><!-- --></p> <h1 id="acknowledgements">Acknowledgements</h1> <ul> <li><a href="https://www.kaggle.com/sergeycherkasov/d/mylesoneill/game-of-thrones/test-test">Some analysis of Game of Throne data</a> - Sergey Cherkasov</li> <li><a href="https://www.kaggle.com/shaildeliwala/d/mylesoneill/game-of-thrones/exploratory-analysis-and-predictions">Exploratory Analysis and Predictions</a> - Shail Deliwala</li> <li><a href="https://www.kaggle.com/gowrishankarin/d/mylesoneill/game-of-thrones/analysis-on-battles">Systematic Analysis on GoT Battles</a> - Gowri Shankar</li> <li><a href="http://motioninsocial.com/tufte/">Tufte in R</a> - Lukasz Piwek</li> </ul> Modeling and identification of an Electro-Hydraulic Actuator 2016-06-01T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2016/06/01/IEEE_ICCA_Hydraulic <h3 id="abstract">Abstract</h3> <p>In this work, a physical non-linear model of an Electro-Hydraulic Actuator has been developed. Each system component (valves, pipes, cylinders) and their interactions have been modeled by means of conservation and constitutive laws. The actuator dynamics, with lumped-parameter element models, have been treated accurately, with special attention given to modeling friction. Finally, a global parametric-identification procedure has been performed for all unknown parameters. Throughout this paper all the modeling assumptions and results from system identification are verified with experimental data. [<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/01/2016-IEEE-ICCA-Holmes-Load-system-modeling_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> A. L. Cologni, M. Mazzoleni and F. 
Previdi, "Modeling and identification of an Electro-Hydraulic Actuator", <strong>12th IEEE International Conference on Control and Automation (ICCA)</strong>, Kathmandu, Nepal, 2016, <a href="https://doi.org/10.1109/ICCA.2016.7505299"> doi: 10.1109/ICCA.2016.7505299 </a>, ISBN: 978-1-5090-1738-6, pp. 335-340.</blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{7505299,
  author={A. L. Cologni and M. Mazzoleni and F. Previdi},
  booktitle={2016 12th IEEE International Conference on Control and Automation (ICCA)},
  title={Modeling and identification of an Electro-Hydraulic Actuator},
  year={2016},
  volume={},
  number={},
  pages={335-340},
  doi={10.1109/ICCA.2016.7505299},
  ISSN={},
  ISBN={978-1-5090-1738-6},
  month={June},
}
</code></pre></div></div> A Comparison of Classification Algorithms for Brain Computer Interface in Drug Craving Treatment 2015-08-20T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2015/08/20/IFAC_BMS_BCI <h3 id="abstract">Abstract</h3> <p>In this paper, the use of Brain Computer Interfaces (BCIs) is proposed as a means to help patients recover from craving diseases, with the aim of establishing a clinical protocol. In order to understand the BCI messages, a classification algorithm based on logistic regression has been developed. The choice was dictated by a comparison with other known classification techniques based on different types of reasoning, highlighting their pros and cons. Finally, a result regarding the brain areas that are most involved during the activity is reported. [<strong><a href="http://cal.unibg.it/cal/wp-content/uploads/papers/2015-IFAC-BMS-BCI.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni and F.
Previdi, "A Comparison of Classification Algorithms for Brain Computer Interface in Drug Craving Treatment", <strong>9th IFAC Symposium on Biological and Medical Systems (BMS)</strong>, Berlin, Germany, 2015, <a href="https://doi.org/10.1016/j.ifacol.2015.10.188"> doi: 10.1016/j.ifacol.2015.10.188 </a>, ISSN: 2405-8963, pp. 487-492. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{MAZZOLENI2015487,
  title = "A Comparison of Classification Algorithms for Brain Computer Interface in Drug Craving Treatment",
  journal = "9th IFAC Symposium on Biological and Medical Systems BMS 2015",
  volume = "48",
  number = "20",
  pages = "487 - 492",
  year = "2015",
  issn = "2405-8963",
  doi = "https://doi.org/10.1016/j.ifacol.2015.10.188",
  author = "M. Mazzoleni and F. Previdi",
}
</code></pre></div></div> Fault Detection via modified Principal Direction Divisive Partitioning and application to aerospace electro-mechanical actuators 2014-12-15T00:00:00+00:00 http://mirkomazzoleni.github.io/conference/2014/12/15/IEEE_CDC_PDDP <h3 id="abstract">Abstract</h3> <p>In this paper, the use of the Principal Direction Divisive Partitioning (PDDP) method for unsupervised learning is discussed and analyzed with a focus on fault detection applications. Specifically, a geometric limit of the standard algorithm is highlighted by means of a simulation example and a modified version of PDDP is introduced. Such a method is shown to correctly perform data clustering even when the standard algorithm fails. The modified strategy is based on the use of a Chi-squared statistical test and offers more guarantees in terms of detecting a malfunction of the system. The proposed algorithm is finally experimentally tested on a fault detection application for aerospace electro-mechanical actuators, for which a comparison with k-means and fuzzy k-means approaches is also provided.
[<strong><a href="http://cal.unibg.it/wp-content/uploads/2019/02/2014-IEEE-CDC-FDI-PDDP_copyright.pdf">Paper</a></strong>, <strong>Code</strong>]</p> <h4 id="reference">Reference</h4> <blockquote> M. Mazzoleni, S. Formentin, F. Previdi and S. M. Savaresi, "Fault Detection via modified Principal Direction Divisive Partitioning and application to aerospace electro-mechanical actuators", <strong>53rd IEEE Conference on Decision and Control (CDC)</strong>, Los Angeles, CA, USA, 2014, <a href="https://doi.org/10.1109/CDC.2014.7040292"> doi: 10.1109/CDC.2014.7040292 </a>, ISBN: 978-1-4673-6090-6, pp. 5770-5775. </blockquote> <h4 id="bibtex">Bibtex</h4> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@INPROCEEDINGS{7040292,
  author={M. Mazzoleni and S. Formentin and F. Previdi and S. M. Savaresi},
  booktitle={53rd IEEE Conference on Decision and Control},
  title={Fault Detection via modified Principal Direction Divisive Partitioning and application to aerospace electro-mechanical actuators},
  year={2014},
  volume={},
  number={},
  pages={5770-5775},
  doi={10.1109/CDC.2014.7040292},
  ISSN={0191-2216},
  ISBN={978-1-4673-6090-6},
  month={Dec},
}
</code></pre></div></div>