茫茫網海中的冷日 - 對這文章發表回應
茫茫網海中的冷日
         
茫茫網海中的冷日
發生過的事,不可能遺忘,只是想不起來而已!
 恭喜您是本站第 1673291 位訪客!  登入  | 註冊
主選單

Google 自訂搜尋

Goole 廣告

隨機相片
PIMG_00102.jpg

授權條款

使用者登入
使用者名稱:

密碼:


忘了密碼?

現在就註冊!

對這文章發表回應

發表限制: 非會員 可以發表

發表者: 冷日 發表時間: 2015/12/18 8:36:04

PowerShell Basics #1: Reading and parsing CSV

I will be giving a talk on the topic of “PowerShell for Developers” at TechDays 2010 in Helsinki, Finland. As a warm-up to my presentation, I will be publishing a series of blog posts on various aspects of PowerShell. My viewpoint is specifically one of developer utility: What can PowerShell do to make a dev’s life easier?

I want to start with something that touches on data. Often, developers receive data in Excel format – usually in order to then import it into a database. Reading data from Excel is somewhat painful, but fortunately, Excel allows for easy saving to the CSV format. PowerShell, on the other hand, provides for several quite easy manipulations.

Simple imports with Import-Csv

Let’s start with a simple CSV file, customers.csv:


ID,Name,Country
1,John,United States
2,Beatrice,Germany
3,Jouni,Finland
4,Marcel,France

Turning this into objects in PowerShell is very straightforward:


PS D:\temp> import-csv customers.csv
ID Name Country
-- ---- -------
1 John United States
2 Beatrice Germany
3 Jouni Finland
4 Marcel France

As you can see, the first line in the text file gets parsed as a header, specifying the names on the PowerShell objects. The import doesn’t have a notion of strong typing; therefore, all the properties are imported as pure text. Often this is enough, but if it isn’t, look below…

Headerlessness and other cultures

There are a few scenarios where this won’t work. For example, if your CSV doesn’t have headers, you would get objects with column names such as “1”, “John” and “United States”. Lacking headers, you can supply them as a parameter:

import-csv .\customers.csv -header ID,Name,Country

That was easy (but don’t do it when your data has headers, or you end up duplicating them).

Well then, perhaps you live in a region where the field separator isn’t the usual comma? This is no problem to PowerShell, either


PS D:\temp> type customers-fi.csv
ID;Name;Country
1;John;United States
2;Beatrice;Germany

PS D:\temp> import-csv .\customers-fi.csv -delimiter ';'

ID Name Country
-- ---- -------
1 John United States
2 Beatrice Germany

If you know the file originated from your current UI culture, you can just dispense with the delimiter specification and type import-csv –useCulture customers-fi.csv. That will pick up the delimiter from your Windows settings.

When your CSV ain’t a file…

Often you get your CSV data in a file. Occasionally, you might download it through HTTP, or even pull it from a database. No matter how, you may end up with an array of strings that contains your CSV data. The Import-Csv cmdlet reads a file, but if you need to parse the data from another source, use ConvertFrom-Csv.


PS D:\temp> $csv = 'ID,Name,Country
>> 1,John,United States
>> 2,Beatrice,Germany'
>>
PS D:\temp> ConvertFrom-Csv $csv
ID Name Country
-- ---- -------
1 John United States
2 Beatrice Germany

As far as the culture switches go, everything discussed above also applies to ConvertFrom-Csv.

How about CSV content oddities?

There are some uses of CSV that veer away from the normal, safe path. The first and a reasonably common scenario is having the field delimiter in the data, something that is usually handled by quoting the field. Of course, up next is the scenario where a field contains the quotation mark.


image
And finally, there is the really controversial aspect of having a newline in a CSV field. Many parsers struggle with this, and in fact, the correct behavior isn’t exactly clear. Of course, for practical purposes, anything that makes Excel exports work correctly is usually good. But let’s look at an example that contains all of these anomalies (the original content for this is shown in the Excel screenshot to the right).


Number,String,Multiline,Date
512,"Comma,Rocks","Line 1
Line 2",15.1.2010 15:14
57,"Cool ""quotes""","First
Second",7.1.2010 9:33

The data is split across five lines, but actually contains two records and a header. This alone is somewhat controversial, given CSV’s starting point of one line per record. Anyway, it often still needs to be parsed, and PowerShell does a good job here:


PS D:\temp> Import-Csv .\oddstrings.csv
Number String Multiline Date
------ ------ --------- ----
512 Comma,Rocks Line 1... 15.1.2010 15:14
57 Cool "quotes" First... 7.1.2010 9:33

There is one key thing to notice. Import-Csv works, because it treats the data source as a single whole. However, ConvertFrom-Csv misparses the multiline fields, as it handles the input line-by-line.


PS D:\temp> type .\oddstrings.csv | ConvertFrom-Csv
Number String Multiline Date
------ ------ --------- ----
512 Comma,Rocks Line 1
Line 2" 15.1.2010 15:14
57 Cool "quotes" First
Second" 7.1.2010 9:33

Strong typing, then?


For this, there are no pre-cooked solutions. But once your data is imported correctly, separators, multiline fields and all, it’s rather easy to just typecast the stuff, providing you input validation at the same time. Consider this CSV file of event descriptions:


StartsAt,Title,Venue
9.3.2010,TechDays 2010,Helsinki
15.2.2010,Mobile World Congress,Barcelona

Next, you want to filter the data to just show the occurring within the next 30 days. For this, you'll want the datetimes parsed into System.DateTime objects.


PS D:\temp> $events = Import-Csv event.csv | foreach {
New-Object PSObject -prop @{
StartsAt = [DateTime]::Parse($_.StartsAt);
Title = $_.Title;
Venue = $_.Venue
}
}

PS D:\temp> $events
StartsAt Venue Title
-------- ----- -----
9.3.2010 0:00:00 Helsinki TechDays 2010
15.2.2010 0:00:00 Barcelona Mobile World Congress

Now, filtering the list is a snap.


PS D:\temp> $events | where { $_.StartsAt -lt (get-date).AddDays(30) }
StartsAt Venue Title
-------- ----- -----
15.2.2010 0:00:00 Barcelona Mobile World Congress

One more thing to notice here: In the example above, it worked because the date format (“15.2.2010”) was in Finnish and I happened to run the PowerShell in a thread with the Finnish culture. However, if your data happens to come from a culture different than your UI, you need to pass in the correct culture specifier. For example, to parse a date in the US locale, use the following:


$usCulture = [System.Globalization.CultureInfo]::CreateSpecificCulture("en-US")
[DateTime]::Parse("04/07/2009", $usCulture)

Note that specifying the culture explicitly is always a good idea if you plan on saving the script and reusing it later. Although you might pay attention to the parsing details at the time of writing, it is quite conceivable for someone to run the script later on in another thread. As dates are prone to silent misparsing (for example, is 04/07/2009 7th April or 4th July?), you could end up manipulating incorrect data.

In addition to dates, you’ll want to look at cultures when parsing decimal numbers and such. Remember: the –UseCulture switch on Import-Csv only applies to the separator settings, not the optional parsing phase.

Enjoy!


原文出處: Heikniemi Hardcoded » PowerShell Basics #1: Reading and parsing CSV
內容圖示
url email imgsrc image code quote
樣本
bold italic underline linethrough   












 [詳情...]
validation picture

注意事項:
預覽不需輸入認證碼,僅真正發送文章時才會檢查驗證碼。
認證碼有效期10分鐘,若輸入資料超過10分鐘,請您備份內容後,重新整理本頁並貼回您的內容,再輸入驗證碼送出。

選項

Powered by XOOPS 2.0 © 2001-2008 The XOOPS Project|