NiuNiu's Warehouse: Introduction to Character Functions, Part 1

Original paper: http://www2.sas.com/proceedings/sugi31/247-31.pdf

When reposting, please credit the source: http://blog.sina.com.cn/s/blog_5d3b177c0100b685.html

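1 LENGTH: sets the storage length of a character variable (used here as the LENGTH statement);

LENGTHC: returns the storage length of a character variable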

data chars1;

file print;

string = 'abc';

length string $ 7;

storage_length = lengthc(string);

display = ":" || string || ":";

put storage_length=;

put display=;

run;

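Here storage_length is 3: the assignment string = 'abc' comes first, so the length is already fixed at 3 and the later LENGTH statement has no effect.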

data chars2;

file print;

length string $ 7;

string = 'abc';

storage_length = lengthc(string);

display = ":" || string || ":";

put storage_length=;

put display=;

run;

Here storage_length is 7. So when using LENGTH, pay attention to statement order: it has to appear before the variable is first used.

2 COMPBL: converts multiple consecutive blanks into a single blank

data multiple;

input #1 @1 name $20.

#2 @1 address $30.

#3 @1 city $15.

@20 state $2.

@25 zip $5.;

name = compbl(name);

address = compbl(address);

city = compbl(city);

datalines;

Ron Cody

89 Lazy Brook Road

Flemington         NJ   08822

Bill Brown

28 Cathy Street

North City         NY   11518

;

title "Listing of Data Set MULTIPLE";

proc print data=multiple noobs;

id name;

var address city state zip;

run;

Result:

name          address               city          state    zip

Ron Cody      89 Lazy Brook Road    Flemington     NJ      08822
Bill Brown    28 Cathy Street       North City     NY      11518

3 COMPRESS: removes specified characters from a string (blanks, if no characters are specified).

data phone;

input phone $ 1-15;

phone1 = compress(phone);

phone2 = compress(phone,'(-) ');

datalines;

(908)235-4490

(201) 555-77 99

;

title "Listing of Data Set PHONE";

proc print data=phone noobs;

run;

Result:

Listing of Data Set PHONE

phone              phone1           phone2

(908)235-4490      (908)235-4490    9082354490
(201) 555-77 99    (201)555-7799    2015557799

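4 VERIFY: checks whether a string contains any character outside a list of valid characters. It returns the position of the first such character, or 0 if there is none.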

data verify;

input @1 id $3.

@5 answer $5.;

position = verify(answer,'abcde');

datalines;

001 acbed

002 abxde

003 12cce

004 abc e

;

title "Listing of Data Set VERIFY";

proc print data=verify noobs;

run;

Result:

Listing of Data Set VERIFY

id     answer    position

001    acbed         0
002    abxde         3
003    12cce         1
004    abc e         4

Be careful when using VERIFY: if the string being checked contains blanks, the result can be surprising:

data trailing;

length string $ 10;

string = 'abc';

pos = verify(string,'abcde');

run;

Here POS=4, because SAS pads string with trailing blanks up to its length of 10. To solve this problem, use the TRIM function:

pos = verify(trim(string),'abcde');

Here POS=0.
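The two calls can be put side by side in one DATA step. This is only a minimal sketch: the data set name trailing_fixed and the variable names pos_padded, pos_trimmed and pos_blank_ok are made up for illustration. The last line shows an alternative, which is to add a blank to the list of valid characters.

```sas
data trailing_fixed;
   length string $ 10;
   string = 'abc';                                /* stored as 'abc' plus 7 trailing blanks */
   pos_padded   = verify(string,'abcde');         /* 4: the first trailing blank is not in the list */
   pos_trimmed  = verify(trim(string),'abcde');   /* 0: trailing blanks removed before checking */
   pos_blank_ok = verify(string,'abcde ');        /* 0: the blank is itself listed as valid */
   put pos_padded= pos_trimmed= pos_blank_ok=;
run;
```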

5 SUBSTR: extracts part of a longer string.

data pieces_parts;

input id $ 1-9;

length state $ 2;

state = substr(id,1,2);

num = input(substr(id,7,3),3.);

datalines;

NYXXXX123

NJ1234567

;

title "Listing of Data Set PIECES_PARTS";

proc print data= pieces_parts noobs;

run;

Another use of SUBSTR: placed on the left of the equals sign, it assigns a value to selected characters of a string:

data pressure;

input sbp dbp @@;

length sbp_chk dbp_chk $ 4;

sbp_chk = put(sbp,3.);

dbp_chk = put(dbp,3.);

if sbp gt 160 then

substr(sbp_chk,4,1) = '*';

if dbp gt 90 then

substr(dbp_chk,4,1) = '*';

datalines;

120 80 180 92 200 110

;

title "Listing of Data Set PRESSURE";

proc print data=pressure noobs;

run;

Here we set the fourth character of sbp_chk and dbp_chk to *.

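6 SCAN: extracts words or short strings from a longer string:

SCAN syntax: SCAN(char_var,n,'list-of-delimiters'); n selects the nth word of char_var. If char_var has fewer than n words, the returned value is blank; if n is negative, SCAN counts words from right to left.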
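The three cases described above (an ordinary n, an n larger than the number of words, and a negative n) can be sketched as follows. The data set name scan_demo and the variable names are assumptions chosen for this illustration only.

```sas
data scan_demo;
   length word_first word_last word_next_last word_sixth $ 10;
   text = 'this line,contains!five.words';
   word_first     = scan(text, 1, ',.! ');   /* this */
   word_last      = scan(text, -1, ',.! ');  /* words: counted from the right */
   word_next_last = scan(text, -2, ',.! ');  /* five */
   word_sixth     = scan(text, 6, ',.! ');   /* blank: there are only five words */
   put word_first= word_last= word_next_last= word_sixth=;
run;
```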

data parse;

input long_str $ 1-80;

array pieces[5] $ 10

piece1-piece5;

do i = 1 to 5;

pieces[i] = scan(long_str,i,',.! ');

end;

drop long_str i;

datalines;

this line,contains!five.words

abcdefghijkl xxx yyy

;

title "Listing of Data Set PARSE";

proc print data=parse noobs;

run;

Result:

Listing of Data Set PARSE

piece1        piece2    piece3      piece4    piece5

this          line      contains    five      words
abcdefghij    xxx       yyy

SCAN: getting the last word of a string:

data first_last;

input @1 name $20.

@21 phone $13.;

***extract the last name from name;

last_name = scan(name,-1,' ');

datalines;

Jeff W. Snoker      (908)782-4382
Raymond Albert      (732)235-4444
Alfred Edward Newman(800)123-4321
Steven J. Foster    (201)567-9876
Jose Romerez        (516)593-2377

;

title "Names and Phone Numbers in Alphabetical Order (by Last Name)";

proc report data=first_last nowd;

columns name phone last_name;

define last_name / order noprint width=20;

define name / display 'Name' left width=20;

define phone / display 'Phone Number' width=13 format=$13.;

run;

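8 INDEX: returns the position of the second argument (a substring) within the first argument (a string), or 0 if it is not found

INDEXC: returns the earliest position in the first argument at which any character from the remaining arguments occurs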

data locate;

input string $ 1-10;

first = index(string,'xyz');

first_c = indexc(string,'x','y','z');

datalines;

abcxyz1234

1234567890

abcx1y2z39

abczzzxyz3

;

title "Listing of Data Set LOCATE";

proc print data=locate noobs;

run;

Result:

obs    first    first_c

 1       4         4
 2       0         0
 3       0         4
 4       7         4

9 UPCASE: converts all letters to uppercase

LOWCASE: converts all letters to lowercase

data up_down;

length a b c d e $ 1;

input a b c d e x y;

datalines;

M f P p D 1 2

m f m F M 3 4

;

data upper;

set up_down;

array all_c[*] _character_;

do i = 1 to dim(all_c);

all_c[i] = upcase(all_c[i]);

end;

drop i;

run;

title "Listing of Data Set UPPER";

proc print data=upper noobs;

run;
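The example above only exercises UPCASE. A minimal sketch showing both functions on the same value follows; the data set name case_demo and its variable names are hypothetical.

```sas
data case_demo;
   length mixed upper lower $ 10;
   mixed = 'rOn coDY';
   upper = upcase(mixed);    /* RON CODY */
   lower = lowcase(mixed);   /* ron cody */
   put upper= lower=;
run;
```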

10 PROPCASE: capitalizes the first letter of each word and converts all other letters to lowercase

data proper;

input name $40.;

name=PROPCASE(name);

datalines;

rOn coDY

the tall and the short

the "%$#@!" escape

;

title "Listing of Data Set PROPER";

proc print data=proper noobs;

run;

Result:

Listing of Data Set PROPER

name

Ron Cody

The Tall And The Short

The "%$#@!" Escape

11 TRANWRD: replaces one substring with another, for example replacing Road with Rd.

Syntax: TRANWRD(char_var,'find_str','replace_str');

data convert;

input @1 address $20. ;

*** Convert Street, Avenue and

Road to their abbreviations;

address = tranwrd(address,'Street','St.');

address = tranwrd (address,'Avenue','Ave.');

address = tranwrd (address,'Road','Rd.');

datalines;

89 Lazy Brook Road

123 River Rd.

12 Main Street

;

title "Listing of Data Set CONVERT";

proc print data=convert;

run;

Result:

OBS    ADDRESS

 1     89 Lazy Brook Rd.
 2     123 River Rd.
 3     12 Main St.

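13 SPEDIS: fuzzy comparison. It returns 0 if the two strings are identical; otherwise, the less similar they are, the larger the returned value. Syntax: SPEDIS(string1,string2);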

data compare;

length string1 string2 $ 15;

input string1 string2;

points = spedis(string1,string2);

datalines;

same same

same sam

firstletter xirstletter

lastletter lastlettex

receipt reciept

;

title "Listing of Data Set COMPARE";

proc print data=compare noobs;

run;

Result:

Listing of Data Set COMPARE

string1        string2        points

same           same              0
same           sam               8
firstletter    xirstletter      18
lastletter     lastlettex       10
receipt        reciept           7

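14 The ANY functions: return the position of the first occurrence of a character of a given class (0 if there is none)

ANYALNUM: any letter or digit

ANYALPHA: any letter

ANYDIGIT: any digit

ANYPUNCT: any punctuation character

ANYSPACE: any white-space character (blank, tab, and so on)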

data find_alpha_digit;

input string $20.;

first_alpha = anyalpha(string);

first_digit = anydigit(string);

datalines;

no digits here

the 3 and 4

123 456 789

;

title "Listing of Data Set FIND_ALPHA_DIGIT";

proc print data=find_alpha_digit noobs;

run;

Result:

string            first_alpha    first_digit

no digits here         1              0
the 3 and 4            1              5
123 456 789            0              1

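15 The NOT functions: return the position of the first character that is not of a given class (0 if there is none)

NOTALNUM: first character that is not a letter or digit

NOTALPHA: first character that is not a letter

NOTDIGIT: first character that is not a digit

NOTPUNCT: first character that is not a punctuation character

NOTSPACE: first character that is not a white-space character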

data data_cleaning;

input string $20.;

only_alpha = notalpha(trim(string));

only_digit = notdigit(trim(string));

datalines;

abcdefg

1234567

abc123

1234abcd

;

title "Listing of Data Set DATA_CLEANING";

proc print data=data_cleaning noobs;

run;

Result:

string      only_alpha    only_digit

abcdefg          0             1
1234567          1             0
abc123           4             1
1234abcd         1             5
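The examples above cover only the ALPHA and DIGIT members of the two families. The other members follow the same pattern, as in this minimal sketch; the data set name class_demo, the test value and the variable names are assumptions made for illustration.

```sas
data class_demo;
   string = 'abc 12,x';                  /* length 8, so no trailing blanks */
   first_alnum     = anyalnum(string);   /* 1: 'a' */
   first_space     = anyspace(string);   /* 4: the blank */
   first_punct     = anypunct(string);   /* 7: the comma */
   first_non_alnum = notalnum(string);   /* 4: the blank is neither letter nor digit */
   first_non_space = notspace(string);   /* 1: 'a' */
   put first_alnum= first_space= first_punct= first_non_alnum= first_non_space=;
run;
```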

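16 CATS, CATX: concatenate strings.

We can concatenate strings with "||" or "!!", but the advantage of these two functions is that they automatically remove leading and trailing blanks from each string.

Syntax:

CATS(string1,string2,<stringn>);

CATX(separator,string1,string2,<stringn>); CATX inserts a user-defined separator between the concatenated strings.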

data join_up;

length cats $ 6 catx $ 17;

string1 = 'ABC ';

string2 = ' XYZ ';

string3 = '12345';

cats = cats(string1,string2);

catx = catx('***',string1,string2,string3);

run;

title "Listing of Data Set JOIN_UP";

proc print data=join_up noobs;

run;

Result:

Listing of Data Set JOIN_UP

string1    string2    string3    cats      catx

ABC        XYZ        12345      ABCXYZ    ABC***XYZ***12345
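To make the difference from "||" concrete, here is a minimal sketch that joins the same two values three ways. The data set name join_compare and the variable names are made up for this illustration.

```sas
data join_compare;
   string1 = 'ABC   ';                          /* three trailing blanks */
   string2 = '   XYZ';                          /* three leading blanks */
   with_bars = string1 || string2;              /* ABC      XYZ: all six blanks are kept */
   with_cats = cats(string1, string2);          /* ABCXYZ */
   with_catx = catx('-', string1, string2);     /* ABC-XYZ */
   put with_bars= with_cats= with_catx=;
run;
```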

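17 LENGTH, LENGTHN, LENGTHC: string length

LENGTH: returns the length of a string, not counting trailing blanks

LENGTHN: returns the length of a string, not counting trailing blanks

LENGTHC: returns the storage length of a string

The difference between LENGTH and LENGTHN: for a blank string, LENGTH returns 1, whereas LENGTHN returns 0.

Example: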

data how_long;

one = 'ABC   ';

two = ' ';

three = 'ABC   XYZ';

length_one = length(one);

lengthn_one = lengthn(one);

lengthc_one = lengthc(one);

length_two = length(two);

lengthn_two = lengthn(two);

lengthc_two = lengthc(two);

length_three = length(three);

lengthn_three = lengthn(three);

lengthc_three = lengthc(three);

run;

title "Listing of Data Set HOW_LONG";

proc print data=how_long noobs;

run;

Result:

Listing of Data Set HOW_LONG

one    two    three        length_one    lengthn_one    lengthc_one    length_two

ABC           ABC   XYZ        3              3              6              1

lengthn_two    lengthc_two    length_three    lengthn_three    lengthc_three

     0              1              9                9                9

18 COMPARE: compares two strings

COMPARE(string1, string2 <,'modifiers'>)

The modifiers are as follows; you can use one or more of them:
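As a hedged illustration of how modifiers are passed, the sketch below uses two of them, i (ignore case) and l (remove leading blanks), combined in one string. The data set name compare_demo and the variable names are hypothetical, and the sketch does not attempt to cover the full modifier list.

```sas
data compare_demo;
   length a b $ 10;
   a = 'Robert';
   b = ' ROBERT';                             /* leading blank, different case */
   exact      = compare(a, b);                /* nonzero: the strings differ */
   ignore_all = compare(a, b, 'il');          /* 0: case and leading blanks are ignored */
   put exact= ignore_all=;
run;
```

A return value of 0 means the two strings are considered equal under the chosen modifiers.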

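19 STRIP: removes leading and trailing blanks from a string

LEFT: left-aligns a string, moving its leading blanks to the end

TRIM: removes trailing blanks from a string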

if strip(string) = 'abc' then result = 'yes';

if left(trim(string)) = 'abc' then result = 'yes';
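The effect of each function is easier to see when the result is wrapped in visible delimiters. The following is a minimal sketch; the data set name strip_demo and the variable names are assumptions for illustration.

```sas
data strip_demo;
   length string $ 9;
   string = '  abc';                           /* 2 leading blanks, padded with 4 trailing blanks */
   raw      = ':' || string || ':';            /* :  abc    : */
   stripped = ':' || strip(string) || ':';     /* :abc: */
   trimmed  = ':' || trim(string) || ':';      /* :  abc: */
   lefted   = ':' || left(string) || ':';      /* :abc      : (blanks moved to the end) */
   put raw= / stripped= / trimmed= / lefted=;
run;
```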

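20 COUNT: counts the number of times a substring occurs in a string

COUNTC: counts the number of characters in a string that appear in a given list of characters

Syntax: count(string,find_string,<'modifiers'>)

countc(string,find_string,<'modifiers'>)

Here modifiers can be i, to ignore case, or t, to ignore trailing blanks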
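A minimal sketch of both functions, including the i modifier, is shown below. The data set name count_demo, the sample sentence and the variable names are made up for this illustration.

```sas
data count_demo;
   string = 'The cat sat on the mat';
   n_the    = count(string, 'the');         /* 1: only the lowercase "the" matches */
   n_the_i  = count(string, 'the', 'i');    /* 2: "The" and "the" */
   n_at     = count(string, 'at');          /* 3: cat, sat, mat */
   n_vowels = countc(string, 'aeiou');      /* 6: e, a, a, o, e, a */
   put n_the= n_the_i= n_at= n_vowels=;
run;
```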


Friday, May 27, 2011
