Friday, May 27, 2011

Introduction to Character Functions (1)

1 LENGTH statement: sets the storage length of a character variable;
LENGTHC: returns the storage length of a character string
data chars1;
   file print;
   string = 'abc';
   length string $ 7;
   storage_length = lengthc(string);
   display = ":" || string || ":";
   put storage_length=;
   put display=;
run;
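Here storage_length is 3: the assignment to string has already fixed its length at 3, so the LENGTH statement that follows has no effect.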
data chars2;
   file print;
   length string $ 7;
   string = 'abc';
   storage_length = lengthc(string);
   display = ":" || string || ":";
   put storage_length=;
   put display=;
run;
Here storage_length is 7. The position of the LENGTH statement therefore matters: it must come before the variable is first used.

2 COMPBL: converts runs of multiple blanks into a single blank
data multiple;
   input #1 @1  name    $20.
         #2 @1  address $30.
         #3 @1  city    $15.
            @20 state    $2.
            @25 zip      $5.;
   name = compbl(name);
   address = compbl(address);
   city = compbl(city);
datalines;
Ron Cody
89 Lazy Brook Road
Flemington         NJ   08822
Bill     Brown
28   Cathy   Street
North   City       NY   11518
;
title "Listing of Data Set MULTIPLE";
proc print data=multiple noobs;
   id name;
   var address city state zip;
run;
Result:
   name            address             city       state     zip

Ron Cody      89 Lazy Brook Road    Flemington     NJ      08822
Bill Brown    28 Cathy Street       North City     NY      11518

3 COMPRESS: removes specified characters from a string. With no second argument it removes blanks.
data phone;
   input phone $ 1-15;
   phone1 = compress(phone);
   phone2 = compress(phone,'(-) ');
datalines;
(908)235-4490
(201) 555-77 99
;
title "Listing of Data Set PHONE";
proc print data=phone noobs;
run;
Result:
Listing of Data Set PHONE

     phone            phone1          phone2

(908)235-4490      (908)235-4490    9082354490
(201) 555-77 99    (201)555-7799    2015557799
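
Listing every unwanted character becomes tedious when only the digits are wanted. A minimal sketch, assuming SAS 9 or later, where COMPRESS accepts an optional third argument of modifiers: 'k' keeps the listed characters instead of removing them, and 'd' adds all digits to the list. The data set name phone_digits and the variable digits are made up for this illustration.

```sas
data phone_digits;
   input phone $ 1-15;
   /* second argument left empty; 'kd' = keep (k) only the digits (d) */
   digits = compress(phone, , 'kd');
datalines;
(908)235-4490
(201) 555-77 99
;
title "Listing of Data Set PHONE_DIGITS";
proc print data=phone_digits noobs;
run;
```

For these two records digits should equal phone2 from the example above (9082354490 and 2015557799), without the parentheses, hyphens and blanks having to be named.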

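4 VERIFY: returns the position of the first character in a string that is not in the list of valid characters, or 0 if every character is valid.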
data verify;
   input @1  id $3.
         @5  answer $5.;
   position = verify(answer,'abcde');
datalines;
001 acbed
002 abxde
003 12cce
004 abc e
;
title "Listing of Data Set VERIFY";
proc print data=verify noobs;
run;
Result:
  Listing of Data Set VERIFY

id     answer    position

001    acbed         0
002    abxde         3
003    12cce         1
004    abc e         4 
Take care with VERIFY when the string being checked may contain blanks, since they easily produce unexpected results:
data trailing;
   length string $ 10;
   string = 'abc';
   pos = verify(string,'abcde');
run;
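Here POS=4, because SAS pads string with trailing blanks up to its length of 10, and a blank is not in the list of valid characters. The TRIM function solves the problem: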
pos = verify(trim(string),'abcde');
Now POS=0.

5 SUBSTR: extracts part of a longer string.
data pieces_parts;
   input id $ 1-9;
   length state $ 2;
   state = substr(id,1,2);
   num = input(substr(id,7,3),3.);
datalines;
NYXXXX123
NJ1234567
;
title "Listing of Data Set PIECES_PARTS";
proc print data= pieces_parts noobs;
run;

Another use of SUBSTR: placed on the left of the equals sign, it assigns a value to selected characters of a string:
data pressure;
   input sbp dbp @@;
   length sbp_chk dbp_chk $ 4;
   sbp_chk = put(sbp,3.);
   dbp_chk = put(dbp,3.);
   if sbp gt 160 then
      substr(sbp_chk,4,1) = '*';
   if dbp gt 90 then
      substr(dbp_chk,4,1) = '*';
datalines;
120 80 180 92 200 110
;
title "Listing of Data Set PRESSURE";
proc print data=pressure noobs;
run;
Here the fourth character of sbp_chk and of dbp_chk is set to * whenever the reading exceeds its limit.

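6 SCAN: extracts words or short substrings from a longer string:
SCAN syntax: SCAN(char_var,n,'list-of-delimiters'); it returns the nth word of char_var. If char_var contains fewer than n words, the result is blank; if n is negative, SCAN counts words from right to left.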
data parse; 
   input long_str $ 1-80;
   array pieces[5] $ 10   
      piece1-piece5;
   do i = 1 to 5;
      pieces[i] = scan(long_str,i,',.! ');
   end;
   drop long_str i;
datalines;
this line,contains!five.words
abcdefghijkl xxx yyy
;
title "Listing of Data Set PARSE";
proc print data=parse noobs;
run;
Result:
Listing of Data Set PARSE

piece1        piece2    piece3      piece4    piece5

this          line      contains    five      words
abcdefghij    xxx       yyy

SCAN: getting the last word of a string:
data first_last;
   input @1  name  $20.
         @21 phone $13.;
   ***extract the last name from name;
   last_name = scan(name,-1,' ');
datalines;
Jeff W. Snoker       (908)782-4382
Raymond Albert       (732)235-4444
Alfred Edward Newman (800)123-4321
Steven J. Foster     (201)567-9876
Jose Romerez         (516)593-2377
;
title "Names and Phone Numbers in Alphabetical Order (by Last Name)";
proc report data=first_last nowd;
   columns name phone last_name;
   define last_name / order noprint width=20;
   define name      / display 'Name' left width=20;
   define phone     / display 'Phone Number' width=13 format=$13.;
run;

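8 INDEX: returns the position of the second argument (a substring) within the first argument (a string)
INDEXC: returns the position in the first argument of the first occurrence of any character from the remaining arguments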
data locate;
   input string $ 1-10;
   first = index(string,'xyz');
   first_c = indexc(string,'x','y','z');
datalines;
abcxyz1234
1234567890
abcx1y2z39
abczzzxyz3
;
title "Listing of Data Set LOCATE";
proc print data=locate noobs;
run;
Result:
  string      first    first_c

abcxyz1234      4         4
1234567890      0         0
abcx1y2z39      0         4
abczzzxyz3      7         4

9 UPCASE: converts all letters to uppercase
LOWCASE: converts all letters to lowercase
data up_down;
   length a b c d e $ 1;
   input a b c d e x y;
datalines;
M f P p D 1 2
m f m F M 3 4
;
data upper;
   set up_down;
   array all_c[*] _character_;
   do i = 1 to dim(all_c);
      all_c[i] = upcase(all_c[i]);
   end;
   drop i;
run;

title "Listing of Data Set UPPER";
proc print data=upper noobs;
run;
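
LOWCASE is used in exactly the same way. The sketch below reuses the up_down data set created above; the data set name lower is made up for this illustration.

```sas
data lower;
   set up_down;
   /* _character_ picks up every character variable (a b c d e) */
   array all_c[*] _character_;
   do i = 1 to dim(all_c);
      all_c[i] = lowcase(all_c[i]);
   end;
   drop i;
run;

title "Listing of Data Set LOWER";
proc print data=lower noobs;
run;
```

The numeric variables x and y are untouched, because the array contains character variables only.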

10 PROPCASE: capitalizes the first letter of each word and converts all other letters to lowercase
data proper;
   input name $40.;
   name=PROPCASE(name);
datalines;
rOn coDY
the tall and the short
the "%$#@!" escape
;
title "Listing of Data Set PROPER";
proc print data=proper noobs;
run;
Result:
Listing of Data Set PROPER

name
Ron Cody
The Tall And The Short
The "%$#@!" Escape

11 TRANWRD: replaces one substring with another, for example Road with Rd.
Syntax: TRANWRD(char_var,'find_str','replace_str');
data convert;
   input @1 address $20. ;
   *** Convert Street, Avenue and
   Road to their abbreviations;
   address = tranwrd(address,'Street','St.');
   address = tranwrd (address,'Avenue','Ave.');
   address = tranwrd (address,'Road','Rd.');
datalines;
89 Lazy Brook Road 
123 River Rd.
12 Main Street
;
title "Listing of Data Set CONVERT";
proc print data=convert;
run;
Result:
Obs    address

 1     89 Lazy Brook Rd.
 2     123 River Rd.
 3     12 Main St.

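13 SPEDIS: fuzzy comparison (spelling distance). It returns 0 if the two strings are identical; otherwise, the less similar the strings, the larger the value returned. Syntax: SPEDIS(string1,string2);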
data compare;
   length string1 string2 $ 15;
   input string1 string2;
   points = spedis(string1,string2);
datalines;
same same
same sam
firstletter xirstletter
lastletter lastlettex
receipt reciept
;
title "Listing of Data Set COMPARE";
proc print data=compare noobs;
run;
Result:
Listing of Data Set COMPARE

string1        string2        points

same           same              0
same           sam               8
firstletter    xirstletter      18
lastletter     lastlettex       10
receipt        reciept           7

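14 The ANY functions: return the position of the first character that belongs to a given class (0 if there is none)
ANYALNUM: any letter or digit
ANYALPHA: any letter
ANYDIGIT: any digit
ANYPUNCT: any punctuation character
ANYSPACE: any white-space character (blank, tab and so on)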
data find_alpha_digit;
   input string $20.;
   first_alpha = anyalpha(string);
   first_digit = anydigit(string);
datalines;
no digits here
the 3 and 4
123 456 789
;
 title "Listing of Data Set FIND_ALPHA_DIGIT";
proc print data=find_alpha_digit noobs;
run;
Result:
string         first_alpha     first_digit

no digits here       1         0
the 3 and 4          1         5
123 456 789          0         1

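15 The NOT functions: return the position of the first character that does not belong to a given class (0 if there is none)
NOTALNUM: not a letter or digit
NOTALPHA: not a letter
NOTDIGIT: not a digit
NOTPUNCT: not a punctuation character
NOTSPACE: not a white-space character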

data data_cleaning;
   input string $20.;
   only_alpha = notalpha(trim(string));
   only_digit = notdigit(trim(string));
datalines;
abcdefg
1234567
abc123
1234abcd
;
title "Listing of Data Set DATA_CLEANING";
proc print data=data_cleaning noobs;
run;
Result:
string     only_alpha    only_digit

abcdefg       0        1
1234567       1        0
abc123        4        1
1234abcd      1        5

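16 CATS, CATX: concatenate strings.
Strings can be concatenated with the "||" or "!!" operator, but these two functions have the advantage of automatically removing the leading and trailing blanks of each string.
Syntax:
CATS(string1,string2<,stringn>);
CATX(separator,string1,string2<,stringn>); CATX inserts a separator of your choice between the strings it joins.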

data join_up;
   length cats $ 6 catx $ 17;
   string1 = 'ABC   ';
   string2 = '   XYZ   ';
   string3 = '12345';
   cats = cats(string1,string2);
   catx = catx('***',string1,string2,string3);
run;
title "Listing of Data Set JOIN_UP";
proc print data=join_up noobs;
run;
Result:
Listing of Data Set JOIN_UP

string1    string2    string3     cats           catx

  ABC        XYZ       12345     ABCXYZ    ABC***XYZ***12345

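17 LENGTH, LENGTHN, LENGTHC: string length
LENGTH: returns the length of a string, not counting trailing blanks
LENGTHN: returns the length of a string, not counting trailing blanks
LENGTHC: returns the storage length of a string
LENGTH and LENGTHN differ only for a blank string: LENGTH returns 1, whereas LENGTHN returns 0.
Example: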
data how_long;
   one = 'ABC   ';
   two = ' ';
   three = 'ABC   XYZ';
   length_one = length(one);
   lengthn_one = lengthn(one);
   lengthc_one = lengthc(one);
   length_two = length(two);
   lengthn_two = lengthn(two);
   lengthc_two = lengthc(two);
   length_three = length(three);
   lengthn_three = lengthn(three);
   lengthc_three = lengthc(three);
run;
title "Listing of Data Set HOW_LONG";
proc print data=how_long noobs;
run;
Result:
Listing of Data Set HOW_LONG

one    two    three        length_one    lengthn_one    lengthc_one

ABC           ABC   XYZ         3             3              6

length_two    lengthn_two    lengthc_two

     1             0              1

length_three    lengthn_three    lengthc_three

      9               9                9

18 COMPARE: compares two strings and returns 0 if they are equal
COMPARE(string1, string2 <,'modifiers'>)
One or more modifiers can be supplied in the optional third argument.
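
A minimal sketch of COMPARE with and without modifiers. It assumes two modifiers described in the SAS documentation: 'i' (ignore case) and 'l' (remove leading blanks before comparing). The data set and variable names are made up for this illustration.

```sas
data compare_demo;
   length string1 string2 $ 10;
   string1 = 'Robert';
   string2 = '  ROBERT';
   /* no modifiers: the strings already differ at the first character,
      so the result is nonzero */
   plain = compare(string1, string2);
   /* 'i' ignores case and 'l' removes leading blanks: result is 0 */
   relaxed = compare(string1, string2, 'il');
   put plain= relaxed=;
run;
```

A nonzero result gives the position of the first character at which the two strings differ, and its sign shows which string comes first in the sort sequence.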

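19 STRIP: removes both the leading and the trailing blanks of a string
LEFT: left-aligns a string, moving any leading blanks to the end
TRIM: removes the trailing blanks of a string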
if strip(string) = 'abc' then result = 'yes';
if left(trim(string)) = 'abc' then result = 'yes';
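
The effect is easiest to see with a value that has leading blanks. A minimal sketch (the data set and variable names are made up for this illustration): the plain comparison fails because of the two leading blanks, whereas the comparison after STRIP succeeds.

```sas
data strip_demo;
   length string $ 9 result_raw result_strip $ 3;
   string = '  abc';
   /* leading blanks count in a comparison; trailing blanks do not */
   if string = 'abc' then result_raw = 'yes';
   else result_raw = 'no';
   if strip(string) = 'abc' then result_strip = 'yes';
   else result_strip = 'no';
   put result_raw= result_strip=;   /* writes result_raw=no result_strip=yes */
run;
```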

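20 COUNT: counts the occurrences of a substring in a string
COUNTC: counts the occurrences in a string of any of the characters in a list
Syntax: count(string,find_string<,'modifiers'>)
countc(string,find_characters<,'modifiers'>)
The modifiers can be i, to ignore case, or t, to ignore trailing blanks in the arguments.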
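
A minimal sketch of both functions, with and without the i modifier. The data set name, the variable names and the sample sentence are made up for this illustration.

```sas
data count_demo;
   string = 'The cat sat on THE mat';
   /* case-sensitive: there is no lowercase "the" in the string */
   n_the = count(string, 'the');
   /* 'i' ignores case: matches "The" and "THE" */
   n_the_i = count(string, 'the', 'i');
   /* "at" occurs in cat, sat and mat */
   n_at = count(string, 'at');
   /* COUNTC counts single characters: every vowel, in either case */
   n_vowels = countc(string, 'aeiou', 'i');
   put n_the= n_the_i= n_at= n_vowels=;
run;
```

The log should show n_the=0, n_the_i=2, n_at=3 and n_vowels=6.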
