J'ai donc une petite table d'Excel, que j'aimerais lire dans Pandas. En fait, j'ai plusieurs des goûts, et je voudrais simplement les intégrer directement dans mon script plutôt que de garder une trace de fichiers séparés.

Mon fichier peut être un tableau comme celui-ci entrez la description de l'image ici

Et maintenant, je veux le rendre intégrable:

import pandas as pd
pd.set_option("display.width", 1000)

df = pd.read_excel("/your/excel/here/TEST.xlsx")
my_json = df.to_json()

# print of the above json that I want to keep in the script
read_this = {"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}}

new_df = pd.DataFrame.from_dict(read_this)

print("original\n", df, "\n")

print("from json\n", new_df)

Et je reçois ce qui suit

original
             970     1250       2000         3000         4000         5000       5500          6000
0.000000  0.0000  0.7200   0.360000     2.160000     8.640000     7.200000    285.480    563.760000
0.975302  0.1530  0.6608   1.183680     5.037120    10.954286   134.188981    548.688    963.187019
1.950604  0.7110  0.5616   3.504960     9.858240    16.268571   492.067019   1290.456   2088.844981
2.925905  1.2690  0.4624   5.383636    13.331520    24.386667   849.945056   2032.224   3214.502943
3.901207  1.7775  0.3632   6.398182    15.831360    33.480000  1207.823094   2773.992   4340.160906
4.876509  1.3125  0.3600   9.031304    18.573750    36.506667  1632.171428   3515.760   5399.348572
5.851811  0.8475  0.3600  12.913043    21.503250    34.853333  2814.281143   5088.960   7363.638857
6.827113  0.3825  0.3600  14.779200    24.432750   387.681231  3996.390856   6662.160   9327.929144
7.802414  0.0000  0.3600  15.820800   818.440258  1301.771077  5178.500572   8235.360  11292.219430
8.777716  0.0000  0.3600  16.560000  1625.416258  2215.860923  6360.610284   9808.560  13256.509720
9.753018  0.0000  0.3600  16.560000  2041.920000  2908.800000  7542.720000  11381.760  15220.800000 

from json
                1250       2000         3000         4000         5000       5500          6000     970
0.0          0.7200   0.360000     2.160000     8.640000     7.200000    285.480    563.760000  0.0000
0.975301809  0.6608   1.183680     5.037120    10.954286   134.188981    548.688    963.187019  0.1530
1.950603618  0.5616   3.504960     9.858240    16.268571   492.067019   1290.456   2088.844981  0.7110
2.925905427  0.4624   5.383636    13.331520    24.386667   849.945056   2032.224   3214.502943  1.2690
3.901207236  0.3632   6.398182    15.831360    33.480000  1207.823094   2773.992   4340.160906  1.7775
4.876509045  0.3600   9.031304    18.573750    36.506667  1632.171428   3515.760   5399.348572  1.3125
5.851810854  0.3600  12.913043    21.503250    34.853333  2814.281143   5088.960   7363.638857  0.8475
6.827112663  0.3600  14.779200    24.432750   387.681231  3996.390856   6662.160   9327.929144  0.3825
7.802414472  0.3600  15.820800   818.440258  1301.771077  5178.500572   8235.360  11292.219430  0.0000
8.777716281  0.3600  16.560000  1625.416258  2215.860923  6360.610284   9808.560  13256.509720  0.0000
9.75301809   0.3600  16.560000  2041.920000  2908.800000  7542.720000  11381.760  15220.800000  0.0000

Si proche, mais pas vraiment pareil. Comment puis-je conserver la structure d'origine en tant que ligne de texte intégrable?

Boîte à outils de fichier Excel disponible ici

0
komodovaran_ 17 avril 2018 à 11:05

3 réponses

Meilleure réponse

OK, enfin je comprends ce que vous voulez: inclure le contenu de votre fichier Excel (c'est-à-dire une matrice 2D) directement en tant que variable dans le code source de votre script, afin que vous n'ayez plus à lire le fichier. Ai-je raison ?

La structure de données native capable de stocker des matrices 2D est une liste de listes . Cela peut être obtenu à partir de votre fichier Excel par le code suivant:

import pandas as pd
df = pd.read_excel("/your/excel/here/TEST.xlsx")
print("mat =", df.values.tolist())

Qui devrait imprimer quelque chose comme:

mat = [['', 970, 1250....], [0.00, 0, 0.72...], ...]

Ensuite, vous copiez simplement les lignes imprimées avec votre souris et les collez au début de votre code, pour créer une matrice mat qui stocke vos données.

Si vous avez besoin de panda DataFrame, changez simplement la ligne print en:

print("df = pd.DataFrame(%s)" % df.values.tolist())

Et appliquer le même processus de copier / coller

1
sciroccorics 17 avril 2018 à 09:07

D'accord, j'ai donc joué un peu et j'ai trouvé le moyen exact d'être comme ça:

import pandas as pd
from collections import OrderedDict

pd.set_option("display.width", 1000)

# Load in the original file (this needs to happen just once)
df = pd.read_excel("/your/excel/here/TEST.xlsx")

# Write this whole table to a single line of dict-formatted JSON for portability
embeddable_json = df.to_json()

# Print it to see the actual output
print(embeddable_json)

# Paste it in here, and wrap it in an OrderedDict to maintain original order
read_this = OrderedDict({"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}})

# Read in the above to see if it matches the original file
new_df = pd.DataFrame.from_dict(read_this)

# It does!
print("original\n", df, "\n")
print("from json\n", new_df)

Alors maintenant, je peux garder de petites tables dans le script python lui-même, sans aucun fichier externe, application, pâte, etc.

La prochaine fois, je peux simplement lancer ceci, et la table est déjà là. Pas besoin de garder une trace de quoi que ce soit d'autre:

import pandas as pd
from collections import OrderedDict

read_this = OrderedDict({"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}})
df = pd.DataFrame.from_dict(read_this)

Je n'ai testé que pour ce cas spécifique, donc je ne sais pas s'il existe des trames de données qui se comportent mal.

J'espère que je n'ai pas perdu trop de temps pour tout le monde.

0
komodovaran_ 17 avril 2018 à 09:01

À mon avis, HDF5 est une meilleure option pour les données numériques. Il a une sensibilité de type, une compression native et une fonctionnalité de mémoire insuffisante.

De plus, ce format est accessible dans toutes les langues.

Quelques exemples de code ci-dessous, mais plus d'informations disponibles dans la documentation h5py.

import pandas as pd
import h5py

# read data
df = pd.read_excel('file.xlsx')

# convert to numpy array
arr = df.values

# extract index and column names
idx = df.index
cols = df.columns

# write to HDF5 file
with h5py.File('out.h5', 'w') as hf:

    hf['/'].attrs['index'] = idx
    hf['/'].attrs['columns'] = cols

    hf.create_dataset('arr', shape=arr.shape, data=arr,
                      compression='gzip', compression_opts=9)
0
jpp 17 avril 2018 à 08:28