Michal Paluch, Alois Kneip, and Werner Hildenbrand,
"Individual vs. Aggregate Income Elasticities for Heterogeneous Populations", Journal of Applied Econometrics, forthcoming

Bandwidth selection

For each year and commodity group there are two bandwidth selection tasks:

Task 1. Optimal bandwidth for the mean regression of c on y and a
Task 2. Optimal bandwidth for the mean regression of log(c) on y and a

The optimal bandwidths were selected with N (nonparametric software by Jeff Racine, http://www.economics.mcmaster.ca/racine/).

Task 1. The N-input file for this task is as follows (example for total expenditure, 1993):

3
2
macrodata_11_93.txt
2
3
3
3000
y
n
n
5
macrodata_11_93.txt
2
3
3
6680
y
y
11
4

where macrodata_11_93.txt is the sample data for total expenditure (cat = 11) for 1993. It contains observations on c, y and a, with the columns ordered according to the following structure (R code):

A = rawmatrix[[i]]
Dat[[i]] = data.frame(A[,cat], factor(A[,18]), factor(A[,19]), factor(A[,20]), ordered(A[,3]-A[,6]), ordered(A[,6]), ordered(A[,7]), A[,2], log(A[,14]))
colnames(Dat[[i]]) <- c("exp.for.category","self.employed", "unemployed", "retired","n.adults", "n.child", "n.working", "age","log.income")

The parameters are as follows (output from N)

$Id: main.c,v 1.273 2004/05/11 17:35:29 jracine Exp jracine $
Main menu option:                      Kernel Regression
Regression menu option:                Computation for Training Data
Filename:                              macrodata_11_93.txt
Kernel for continuous predictors:      Second Order Epanechnikov Kernel
Kernel for unordered predictors:       Aitchison & Aitken Unordered Kernel
Kernel for ordered predictors:         Wang & Van Ryzin Ordered Kernel
Estimator for continuous predictors:   Local Linear Fixed Bandwidth Estimator
Number of continuous predictors:       2
Number of unordered predictors:        3
Number of ordered predictors:          3
Number of observations:                6680
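
As a cross-check on the layout, the column types in the data frame above can be tallied against the predictor counts reported by N. The following Python sketch is illustrative only and is not part of the original programs; the names COLUMNS, KERNELS and predictor_counts are hypothetical. It assumes that R factor columns enter N as unordered predictors, ordered columns as ordered predictors, and numeric columns as continuous predictors, which is consistent with the counts in the output above.

```python
# Column layout of the data file, following the colnames in the R snippet.
# The "kind" labels are an assumption about how each R type is passed to N.
COLUMNS = [
    ("exp.for.category", "response"),
    ("self.employed", "unordered"),
    ("unemployed", "unordered"),
    ("retired", "unordered"),
    ("n.adults", "ordered"),
    ("n.child", "ordered"),
    ("n.working", "ordered"),
    ("age", "continuous"),
    ("log.income", "continuous"),
]

# Kernel reported by N for each kind of predictor (copied from the N output).
KERNELS = {
    "continuous": "Second Order Epanechnikov Kernel",
    "unordered": "Aitchison & Aitken Unordered Kernel",
    "ordered": "Wang & Van Ryzin Ordered Kernel",
}


def predictor_counts(columns):
    """Count the predictors of each kind, skipping the response column."""
    counts = {}
    for _, kind in columns:
        if kind != "response":
            counts[kind] = counts.get(kind, 0) + 1
    return counts


for kind, count in predictor_counts(COLUMNS).items():
    print(kind, count, KERNELS[kind])
```

The tally gives 3 unordered, 3 ordered and 2 continuous predictors, the same counts as in the N output.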


Task 2. The N-input file for this task is as follows (example for total expenditure, 1993):

3
2
microdata_11_93.txt
2
3
3
3000
y
n
n
5
microdata_11_93.txt
2
3
3
6680
y
y
11
4

where microdata_11_93.txt is the sample data for total expenditure (cat = 11) for 1993. It contains observations on log(c), y and a, with the columns ordered according to the structure given under Task 1.

The parameters are as follows (output from N)

$Id: main.c,v 1.273 2004/05/11 17:35:29 jracine Exp jracine $
Main menu option:                      Kernel Regression
Regression menu option:                Computation for Training Data
Filename:                              microdata_11_93.txt
Kernel for continuous predictors:      Second Order Epanechnikov Kernel
Kernel for unordered predictors:       Aitchison & Aitken Unordered Kernel
Kernel for ordered predictors:         Wang & Van Ryzin Ordered Kernel
Estimator for continuous predictors:   Local Linear Fixed Bandwidth Estimator
Number of continuous predictors:       2
Number of unordered predictors:        3
Number of ordered predictors:          3
Number of observations:                6680


The bandwidths obtained from N are scaling factors, which have to be converted into absolute bandwidths. This has been done in MATLAB:

---- 
bwfilem = sprintf('E:/phk/results/nonparametric/bwmicro_%d_%d.txt',year,cat); %Bandwidth for the mean regression of log(c) on y and a
bwfileM = sprintf('E:/phk/results/results/nonparametric/bwmacro_%d_%d.txt',year,cat); %Bandwidth for the mean regression of c on y and a
cdffile = sprintf('E:/phk/results/nonparametric/bwcdf_%d_%d.txt',cat,year); %Bandwidth for the quantile regression of log(c) on y and a
bwcdf = dlmread(cdffile); sfm = dlmread(bwfilem);sfM = dlmread(bwfileM);  %read data from the above files (scaling factors)

%calculation of the opt. bandwidth
%for continuous regressors h(x) = sf(x)*std(x)*n^(-1/6);
%for discrete regressors lambda(x) = sf(x)*n^(-2/6);
[n,m] = size(d); %n is the number of observations in the data
fc = n^(-1/6);fd = n^(-2/6);seq = [1:n]'; %define adjustment factors (fc: continuous regressors, fd: discrete regressors)
s = 1.5*std(d)'; S = 1.5*std(D)'; %1.5 times the standard deviations of the data (d - log(c) on y and a; D - c on y and a)

hy = bwcdf(1)*s(1)*fd; %rule of thumb: bandwidth for consumption expenditure is proportional to n^(-2/6). See Li and Racine (2006)
h(1,1) = s(8)*sfm(1)*fc;h(2,1) = s(9)*sfm(2)*fc; %bw for age and income mean estim
H(1,1) = S(8)*sfM(1)*fc;H(2,1) = S(9)*sfM(2)*fc; %bw for age and income mean estim
hcdf(1,1) = s(8)*bwcdf(2)*fc;hcdf(2,1) = s(9)*bwcdf(3)*fc; %bw for age and income cdf estim

h(3,1) = sfm(3)*fd;h(4,1) = sfm(4)*fd;h(5,1) = sfm(5)*fd;h(6,1) = sfm(6)*fd;h(7,1) = sfm(7)*fd;h(8,1) = sfm(8)*fd;
H(3,1) = sfM(3)*fd;H(4,1) = sfM(4)*fd;H(5,1) = sfM(5)*fd;H(6,1) = sfM(6)*fd;H(7,1) = sfM(7)*fd;H(8,1) = sfM(8)*fd;
hcdf(3,1) = bwcdf(4)*fd;hcdf(4,1) = bwcdf(5)*fd;hcdf(5,1) = bwcdf(6)*fd;hcdf(6,1) = bwcdf(7)*fd;hcdf(7,1) = bwcdf(8)*fd;hcdf(8,1) = bwcdf(9)*fd;
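%output matrix: first row is [0,0,hy]; the next 8 rows hold the bandwidths of the 8 regressors,
%with columns h (mean regression of log(c)), H (mean regression of c) and hcdf (quantile regression)
HH = [0,0,hy;h,H,hcdf];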
bwoutfile = sprintf('E:/phk/results/nonparametric/bwall_%d_%d.txt',cat,year);
dlmwrite(bwoutfile,HH,'\t');
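
The rescaling rule applied in the MATLAB code above can be restated compactly. The following Python sketch is illustrative only and is not part of the original programs; the function name absolute_bandwidths, its arguments and the toy data are hypothetical. It mirrors the code above: a continuous regressor gets sf * 1.5 * std * n^(-1/6) and a discrete regressor gets sf * n^(-2/6). It uses the sample standard deviation (divisor n-1), which is also the default of MATLAB's std.

```python
import statistics


def absolute_bandwidths(continuous_columns, scale_factors, spread=1.5):
    """Convert scaling factors into absolute bandwidths.

    continuous_columns: one list of observations per continuous regressor.
    scale_factors: factors for the continuous regressors first (same order
        as continuous_columns), followed by those for the discrete regressors.
    spread: multiplier on the standard deviation (1.5 in the MATLAB code).
    """
    n = len(continuous_columns[0])
    fc = n ** (-1.0 / 6.0)  # adjustment factor for continuous regressors
    fd = n ** (-2.0 / 6.0)  # adjustment factor for discrete regressors
    k = len(continuous_columns)

    bandwidths = []
    for column, sf in zip(continuous_columns, scale_factors[:k]):
        bandwidths.append(sf * spread * statistics.stdev(column) * fc)
    for sf in scale_factors[k:]:
        bandwidths.append(sf * fd)
    return bandwidths


# Toy data, invented purely for illustration (not taken from the survey).
age = [20.0 + (i % 40) for i in range(200)]
log_income = [9.0 + 0.01 * i for i in range(200)]
toy_scale_factors = [0.8, 1.2, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

for value in absolute_bandwidths([age, log_income], toy_scale_factors):
    print(round(value, 4))
```

With two continuous and six discrete regressors, as in the data described above, the function returns eight bandwidths in the same order as the rows of h, H and hcdf.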


