I'm having trouble creating a data frame from the following data, which is stored in a text file. The surrounding text has the same structure throughout. I know it's really convoluted ...

<Annotations><Annotation LineColor="255" Name="Trial for orientation" Visible="True"><Regions><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4541" Y="1558" /><V X="4724" Y="1799" /></Vertices></Region></Regions></Annotation><Annotation LineColor="65280" Name="vRNA+ cells" Visible="True"><Regions><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4379" Y="1790" /><V X="4390" Y="1799" /></Vertices></Region></Regions></Annotation><Annotation LineColor="65280" Name="vRNA+ cells" Visible="True"><Regions><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4386" Y="1828" /><V X="4397" Y="1837" /></Vertices></Region></Regions></Annotation><Annotation LineColor="65280" Name="vRNA+ cells" Visible="True"><Regions><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6493" Y="5094" /><V X="6504" Y="5106" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3812" Y="3623" /><V X="3825" Y="3637" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5929" Y="4178" /><V X="5945" Y="4194" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6625" Y="2950" /><V X="6657" Y="2978" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4558" Y="4108" /><V X="4573" Y="4123" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4790" Y="3634" /><V X="4813" Y="3662" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4518" Y="3659" /><V X="4531" Y="3671" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4605" Y="3672" /><V X="4624" Y="3694" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5402" Y="5809" /><V X="5414" Y="5822" /></Vertices></Region><Region Type="Rectangle" 
HasEndcaps="0" NegativeROA="0"><Vertices><V X="5950" Y="4281" /><V X="5976" Y="4308" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6874" Y="3009" /><V X="6892" Y="3025" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5747" Y="5081" /><V X="5771" Y="5107" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6252" Y="2950" /><V X="6269" Y="2966" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5849" Y="2824" /><V X="5870" Y="2837" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5928" Y="4387" /><V X="5942" Y="4399" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5687" Y="6327" /><V X="5707" Y="6340" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6752" Y="1357" /><V X="6778" Y="1372" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5354" Y="4828" /><V X="5377" Y="4847" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="7917" Y="3164" /><V X="7940" Y="3175" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="7912" Y="3149" /><V X="7928" Y="3163" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6297" Y="3778" /><V X="6313" Y="3799" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="7084" Y="3362" /><V X="7101" Y="3379" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4970" Y="5380" /><V X="4982" Y="5395" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3445" Y="1445" /><V X="3457" Y="1456" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" 
NegativeROA="0"><Vertices><V X="6426" Y="5157" /><V X="6436" Y="5171" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5376" Y="1552" /><V X="5397" Y="1570" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4577" Y="2321" /><V X="4609" Y="2346" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2637" Y="2264" /><V X="2664" Y="2288" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6650" Y="1357" /><V X="6671" Y="1378" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="8594" Y="3417" /><V X="8611" Y="3437" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2988" Y="2342" /><V X="3006" Y="2356" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4986" Y="2410" /><V X="5000" Y="2420" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="7647" Y="5031" /><V X="7662" Y="5044" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6844" Y="2660" /><V X="6858" Y="2670" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5284" Y="3289" /><V X="5304" Y="3308" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2681" Y="2457" /><V X="2707" Y="2483" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4648" Y="3349" /><V X="4662" Y="3361" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2233" Y="1564" /><V X="2247" Y="1579" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6117" Y="4809" /><V X="6144" Y="4833" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" 
NegativeROA="0"><Vertices><V X="5478" Y="6361" /><V X="5494" Y="6374" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5489" Y="6427" /><V X="5497" Y="6436" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5472" Y="6476" /><V X="5481" Y="6487" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5471" Y="6440" /><V X="5485" Y="6458" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="7226" Y="4961" /><V X="7237" Y="4975" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4725" Y="6452" /><V X="4745" Y="6470" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6949" Y="2462" /><V X="6972" Y="2479" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5000" Y="2420" /><V X="5006" Y="2428" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2089" Y="4001" /><V X="2104" Y="4016" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2958" Y="3938" /><V X="2968" Y="3953" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4854" Y="6259" /><V X="4873" Y="6276" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5420" Y="4202" /><V X="5441" Y="4227" /></Vertices></Region></Regions></Annotation><Annotation LineColor="16777215" Name="FDC trapped virus" Visible="True"><Regions><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5416" Y="1480" /><V X="5536" Y="1576" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5695" Y="3512" /><V X="5767" Y="3611" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="1897" Y="1636" 
/><V X="2093" Y="1888" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4770" Y="2430" /><V X="4846" Y="2531" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5192" Y="1246" /><V X="5306" Y="1441" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4776" Y="1457" /><V X="4878" Y="1586" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="1669" Y="1563" /><V X="1794" Y="1617" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4096" Y="1504" /><V X="4195" Y="1591" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4544" Y="1566" /><V X="4719" Y="1788" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3905" Y="1350" /><V X="3971" Y="1426" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5302" Y="3416" /><V X="5369" Y="3479" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6100" Y="2670" /><V X="6240" Y="2822" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6090" Y="2919" /><V X="6247" Y="3010" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3145" Y="2292" /><V X="3268" Y="2426" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3207" Y="1841" /><V X="3424" Y="1955" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3522" Y="1510" /><V X="3717" Y="1751" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3838" Y="1661" /><V X="4054" Y="1865" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5897" Y="961" /><V X="6000" 
Y="1060" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5746" Y="1319" /><V X="5965" Y="1529" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4948" Y="1375" /><V X="5146" Y="1536" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6806" Y="2219" /><V X="6924" Y="2334" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2984" Y="2411" /><V X="3026" Y="2470" /></Vertices></Region></Regions></Annotation><Annotation LineColor="16711935" Name="Artifact" Visible="True"><Regions><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6407" Y="5169" /><V X="6422" Y="5185" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4959" Y="3499" /><V X="5002" Y="3554" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="8740" Y="4189" /><V X="8787" Y="4238" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5431" Y="4342" /><V X="5470" Y="4371" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="8211" Y="3461" /><V X="8238" Y="3495" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4660" Y="4481" /><V X="4690" Y="4500" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5969" Y="5897" /><V X="5998" Y="5920" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2391" Y="4223" /><V X="2410" Y="4239" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="9055" Y="2277" /><V X="9079" Y="2302" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5446" Y="6683" /><V X="5457" Y="6694" /></Vertices></Region><Region Type="Rectangle" 
HasEndcaps="0" NegativeROA="0"><Vertices><V X="8384" Y="1065" /><V X="8401" Y="1081" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="1906" Y="3761" /><V X="1930" Y="3776" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6475" Y="2238" /><V X="6491" Y="2253" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6504" Y="2234" /><V X="6525" Y="2247" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6968" Y="2274" /><V X="6998" Y="2295" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3849" Y="5101" /><V X="3874" Y="5127" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4177" Y="3950" /><V X="4212" Y="3984" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="1906" Y="3311" /><V X="1927" Y="3331" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3358" Y="3828" /><V X="3378" Y="3843" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5421" Y="6670" /><V X="5436" Y="6685" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3669" Y="1428" /><V X="3693" Y="1446" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4982" Y="6011" /><V X="5033" Y="6049" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6113" Y="948" /><V X="6145" Y="971" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3227" Y="1543" /><V X="3251" Y="1562" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="1535" Y="3082" /><V X="1554" Y="3095" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" 
NegativeROA="0"><Vertices><V X="3411" Y="3585" /><V X="3440" Y="3613" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2216" Y="2010" /><V X="2231" Y="2024" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="5223" Y="4131" /><V X="5252" Y="4156" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="4365" Y="4154" /><V X="4391" Y="4174" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6756" Y="4481" /><V X="6792" Y="4513" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="2645" Y="2191" /><V X="2665" Y="2206" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6569" Y="4468" /><V X="6607" Y="4501" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="3246" Y="2027" /><V X="3266" Y="2040" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="7095" Y="1673" /><V X="7113" Y="1685" /></Vertices></Region><Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices><V X="6340" Y="3752" /><V X="6362" Y="3780" /></Vertices></Region></Regions></Annotation></Annotations>

What I'm trying to do is get a data frame with two columns, X and Y, paired as you see them in the text. For example, the first X and Y would be 4541 and 1558 in the same row, followed by 4724 and 1799 in the next row.

I tried importing the file with "=" as the delimiter, but that just creates many columns of data when what I really need is the two-column data frame. Does anyone have any thoughts?

Cardtrick, Oct 27, 2020 at 22:22

2 answers

Best answer

Your data is in XML format, so it makes sense to parse it as XML rather than to extract values from it as a string.

Here is a simple example that reads the data you provided (saved as an XML file), uses the xml2 package to find every <V /> node, extracts the X and Y coordinates, and assembles them into a new tibble, converting the values to integers along the way:

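library(tidyverse)
library(xml2)

# Parse the XML file
data <- read_xml("path/to/your/data.xml")

# Select every <V> node anywhere in the document
vertices <- xml_find_all(data, "//V")

# One row per vertex; attributes are read as strings, so convert to integer
coordinates <- tibble(
  X = as.integer(xml_attr(vertices, "X")),
  Y = as.integer(xml_attr(vertices, "Y"))
)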

The result is:

# A tibble: 222 x 2
       X     Y
   <int> <int>
 1  4541  1558
 2  4724  1799
 3  4379  1790
 4  4390  1799
 5  4386  1828
 6  4397  1837
 7  6493  5094
 8  6504  5106
 9  3812  3623
10  3825  3637
# ... with 212 more rows
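
Selecting every <V /> node at once discards which annotation each vertex came from. If that label matters, one possible extension is sketched below, assuming you want the Name attribute of the enclosing <Annotation> next to each coordinate. The inline sample and the names sample_xml and annotation are illustrative only, and a base data.frame is used so that xml2 is the only dependency.

```r
library(xml2)

# Small inline sample in the same shape as the posted data (illustrative only)
sample_xml <- '<Annotations>
  <Annotation LineColor="255" Name="Trial for orientation" Visible="True"><Regions>
    <Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices>
      <V X="4541" Y="1558" /><V X="4724" Y="1799" />
    </Vertices></Region>
  </Regions></Annotation>
  <Annotation LineColor="65280" Name="vRNA+ cells" Visible="True"><Regions>
    <Region Type="Rectangle" HasEndcaps="0" NegativeROA="0"><Vertices>
      <V X="4379" Y="1790" /><V X="4390" Y="1799" />
    </Vertices></Region>
  </Regions></Annotation>
</Annotations>'

doc <- read_xml(sample_xml)
vertices <- xml_find_all(doc, "//V")

# For each <V>, look up its enclosing <Annotation>; xml_find_first() returns
# one result per input node, so the columns stay aligned
parents <- xml_find_first(vertices, "ancestor::Annotation")

coordinates <- data.frame(
  annotation = xml_attr(parents, "Name"),
  X = as.integer(xml_attr(vertices, "X")),
  Y = as.integer(xml_attr(vertices, "Y"))
)

print(coordinates)
```

With the real file, replace sample_xml with the path passed to read_xml().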
semaphorism, Oct 27, 2020 at 20:06

The data is structured as XML, so you can use XPath expressions to extract it.

Suppose the data you posted above is in foo.xml:

library(tidyverse)
library(xml2)

vertices <- read_xml('foo.xml') %>% xml_find_all('.//Vertices')

xy_data <-
    tibble(
        x1 = vertices %>% xml_find_all('./*[1]') %>% xml_attr('X'),
        y1 = vertices %>% xml_find_all('./*[1]') %>% xml_attr('Y'),
        x2 = vertices %>% xml_find_all('./*[2]') %>% xml_attr('X'),
        y2 = vertices %>% xml_find_all('./*[2]') %>% xml_attr('Y')
    ) %>%
    mutate_all(as.integer)

print(xy_data)
# A tibble: 111 x 4
      x1    y1    x2    y2
   <int> <int> <int> <int>
 1  4541  1558  4724  1799
 2  4379  1790  4390  1799
 3  4386  1828  4397  1837
 4  6493  5094  6504  5106
 5  3812  3623  3825  3637
 6  5929  4178  5945  4194
 7  6625  2950  6657  2978
 8  4558  4108  4573  4123
 9  4790  3634  4813  3662
10  4518  3659  4531  3671
# … with 101 more rows
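
The code above relies on every <Vertices> element having exactly two children, which holds for the posted data. When xml_find_all() is applied to a set of nodes it returns only the matches it finds, so a region with a missing vertex would leave the columns with different lengths. A minimal sketch of a more defensive variant, assuming xml_find_first() is acceptable in its place, is shown here; the inline sample with a one-vertex region is hypothetical and exists only to show the NA.

```r
library(xml2)

# Hypothetical inline sample; the last region deliberately has only one vertex
sample_xml <- '<Annotations><Annotation Name="demo"><Regions>
  <Region><Vertices><V X="4541" Y="1558" /><V X="4724" Y="1799" /></Vertices></Region>
  <Region><Vertices><V X="4379" Y="1790" /><V X="4390" Y="1799" /></Vertices></Region>
  <Region><Vertices><V X="4386" Y="1828" /></Vertices></Region>
</Regions></Annotation></Annotations>'

vertices <- xml_find_all(read_xml(sample_xml), ".//Vertices")

# xml_find_first() returns exactly one result per <Vertices> node,
# using a missing node where there is no match
first  <- xml_find_first(vertices, "./V[1]")
second <- xml_find_first(vertices, "./V[2]")

# Attributes of a missing node come back as NA, so rows stay aligned
xy_data <- data.frame(
  x1 = as.integer(xml_attr(first, "X")),
  y1 = as.integer(xml_attr(first, "Y")),
  x2 = as.integer(xml_attr(second, "X")),
  y2 = as.integer(xml_attr(second, "Y"))
)

print(xy_data)
```

The incomplete region shows up as a row with NA in x2 and y2 instead of shifting later rows.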
Greg Foletta, Oct 27, 2020 at 22:22