Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Tuesday, April 19, 2016

How can I split a large file csv file (7GB) in Python


I have a 7GB csv file which I'd like to split into smaller chunks, so it is readable and faster for analysis in Python on a notebook. I would like to grab a small set from it, maybe 250MB, so how can I do this?

Answer by jonrsharpe for How can I split a large file csv file (7GB) in Python


See the Python docs on file objects (the object returned by open(filename)): you can choose to read a specified number of bytes, or use readline to work through one line at a time.
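As a rough sketch of the line-at-a-time approach, the following copies whole lines from the start of the big file into a smaller one until a byte budget is reached. The function name copy_head, the file names and the 250MB default are illustrative assumptions, not part of the original answer.

```python
def copy_head(src_path, dst_path, max_bytes=250 * 1024 * 1024):
    """Copy whole lines from src_path to dst_path until about max_bytes are written."""
    written = 0
    # Binary mode skips text decoding and keeps the byte count exact.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # iterating a file object yields one line at a time
            # Always keep the first line (usually the header), then stop
            # before the next line would push the output past the budget.
            if written and written + len(line) > max_bytes:
                break
            dst.write(line)
            written += len(line)
    return written
```

Calling copy_head('big.csv', 'sample.csv') would leave a sample of roughly 250MB that never ends in the middle of a line. It assumes that no quoted field contains an embedded newline.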

Answer by Thomas Orozco for How can I split a large file csv file (7GB) in Python


You don't need Python to split a csv file. Using your shell:

$ split -l 100 data.csv  

This would split data.csv into chunks of 100 lines.
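The split command cuts purely by line count, so only the first output file keeps the CSV header row. If every chunk needs the header, a minimal Python sketch of the same idea could look like this. The function name split_by_lines and the output naming scheme are assumptions made for illustration, and the code assumes no quoted field spans several lines.

```python
import itertools

def split_by_lines(src_path, lines_per_chunk=100, prefix="chunk"):
    """Split src_path into numbered files of lines_per_chunk data lines each,
    repeating the header line at the top of every file."""
    out_paths = []
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", newline="") as src:
        header = src.readline()
        for number in itertools.count(1):
            # islice pulls at most lines_per_chunk lines without reading the rest.
            lines = list(itertools.islice(src, lines_per_chunk))
            if not lines:
                break
            out_path = "{}-{}.csv".format(prefix, number)
            with open(out_path, "w", newline="") as dst:
                dst.write(header)
                dst.writelines(lines)
            out_paths.append(out_path)
    return out_paths
```

Only one chunk is held in memory at a time, so the size of the input file does not matter.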

Answer by dstromberg for How can I split a large file csv file (7GB) in Python

Maybe something like this?

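#!/usr/local/cpython-3.3/bin/python
import csv

divisor = 10  # number of rows per output file
outfileno = 1
outfile = None

# newline='' lets the csv module handle line endings itself
with open('big.csv', 'r', newline='') as infile:
    for index, row in enumerate(csv.reader(infile)):
        if index % divisor == 0:
            # start a new output file every `divisor` rows
            if outfile is not None:
                outfile.close()
            outfilename = 'big-{}.csv'.format(outfileno)
            outfile = open(outfilename, 'w', newline='')
            outfileno += 1
            writer = csv.writer(outfile)
        writer.writerow(row)

# close the last output file
if outfile is not None:
    outfile.close()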

Answer by Quentin Febvre for How can I split a large file csv file (7GB) in Python


I had to do a similar task, and used the pandas package:

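import pandas as pd
# each chunk is a DataFrame of up to 500,000 rows
for i, chunk in enumerate(pd.read_csv('bigfile.csv', chunksize=500000)):
    chunk.to_csv('chunk{}.csv'.format(i), index=False)  # index=False omits the row-index column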
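The same chunksize iteration can also be used to analyse the file without writing any intermediate files. The sketch below sums one numeric column chunk by chunk; the function name column_total and the column name in the usage note are hypothetical, and it assumes pandas is installed and the column parses as numeric.

```python
import pandas as pd

def column_total(path, column, chunksize=500000):
    """Sum one numeric column while holding at most chunksize rows in memory."""
    total = 0
    rows = 0
    # usecols limits parsing to the single column that is needed.
    for chunk in pd.read_csv(path, usecols=[column], chunksize=chunksize):
        total += chunk[column].sum()
        rows += len(chunk)
    return total, rows
```

For example, column_total('bigfile.csv', 'value') would return the sum of a column named value together with the number of rows read.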

Answer by Jimmy for How can I split a large file csv file (7GB) in Python


I agree with @jonrsharpe: readline should be able to read one line at a time, even for big files.

If you are dealing with big csv files, might I suggest using pandas.read_csv? I often use it for the same purpose and always find it awesome (and fast). It takes a bit of time to get used to the idea of DataFrames, but once you get over that, it speeds up large operations like yours massively.

Hope it helps.

