Saturday, 9 May 2015

Join rows in a CSV with different-sized sections in Python

I have a CSV file structured like this:

|     publish_date     |sentence_number|character_count|    sentence       |
----------------------------------------------------------------------------
|          1           |               |               |                   |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |      -1       |       0       | Sentence 1 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       0       |      14       | Sentence 2 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       1       |      28       | "Sentence 3 here. |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       2       |      42       | Sentence 4 here." |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       3       |      56       | Sentence 5 here.  |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------
|          2           |               |               |                   |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |      -1       |       0       | Sentence 1 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       0       |      14       | Sentence 2 here.  |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------

What I'd like to do is combine each block of sentences into a paragraph, so that each block is output as its own list of sentences:

['Sentence 1 here.', 'Sentence 2 here.', '"Sentence 3 here.', 'Sentence 4 here."', 'Sentence 5 here.']

Some quotes continue across into the next sentence, whilst others are entirely embedded within a single sentence.

So far I've got this:

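import csv

def read_file():
    included_cols = [3]  # index of the "sentence" column

    # newline='' is the mode the csv module recommends for reading
    with open('test.csv', newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            # Keep only the selected columns of this row
            content = [row[i] for i in included_cols]
            print(content)

    # Note: only the last row's content is returned
    return content

read_file()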

But this just prints each sentence as its own single-item list, like so:

['Sentence 1 here.']
['Sentence 2 here.']

Any suggestions appreciated.
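
One possible approach, offered as a minimal sketch rather than a definitive answer: keep a running list for the current block, append every non-empty sentence to it, and close the block whenever a row with `end` in the first column appears. This assumes Python 3, that the real file is plain comma-separated text with the header as its first row, and that any quote characters inside sentences are escaped the way the csv module expects. The names `read_paragraphs`, `SENTENCE_COL`, and `SAMPLE` are made up for illustration.

```python
import csv
import io

SENTENCE_COL = 3  # index of the "sentence" column, as in the question


def read_paragraphs(rows):
    """Return one list of sentences per block of rows.

    `rows` is any iterable of CSV rows (lists of strings), e.g. a csv.reader.
    """
    rows = iter(rows)
    next(rows, None)  # assumption: the first row is the header
    paragraphs, current = [], []
    for row in rows:
        marker = row[0].strip() if row else ""
        sentence = row[SENTENCE_COL].strip() if len(row) > SENTENCE_COL else ""
        if sentence:
            # A data row: collect its sentence into the open block.
            current.append(sentence)
        elif marker == "end" and current:
            # An "end" row closes the block; an extra "end" with nothing open is ignored.
            paragraphs.append(current)
            current = []
    if current:
        # Keep a final block even if its "end" row is missing.
        paragraphs.append(current)
    return paragraphs


# Hypothetical sample mirroring the layout shown in the question.
SAMPLE = """publish_date,sentence_number,character_count,sentence
1,,,
02/01/2012 00:12:00,-1,0,Sentence 1 here.
02/01/2012 00:12:00,0,14,Sentence 2 here.
end,,,
2,,,
02/01/2012 00:12:00,-1,0,Sentence 1 here.
end,,,
end,,,
"""

paragraphs = read_paragraphs(csv.reader(io.StringIO(SAMPLE)))
for paragraph in paragraphs:
    print(paragraph)
```

To run it against the real file, pass `csv.reader(f)` for a file opened with `open('test.csv', newline='')` instead of the in-memory sample. Each inner list can then be joined with `' '.join(paragraph)` if a single paragraph string is wanted.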
