Structured data

Data set

Let $X \in \mathbb{R}^{N \times D}$ be a data matrix with N samples (rows) and D variables (columns). Consider the following toy sample, built from three groups of Nc points each:

Nc = 50; % number of instances per class (balanced problem)

% Three blocks of Nc samples in 3 variables, stacked row-wise
X = [rand(Nc,3); ...            % class 1: uniform in (0, 1)
     1.8*rand(Nc,3) + 0.8; ...  % class 2: uniform in (0.8, 2.6)
     -1.5*rand(Nc,3) - 0.5];    % class 3: uniform in (-2, -0.5)

scatter3(X(:,1),X(:,2),X(:,3),40,'filled')

title('Raw data'), xlabel('Variable 1 - x axis')

ylabel('Variable 2 - y axis')

zlabel('Variable 3 - z axis')
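As a quick sanity check on the toy sample, the sketch below rebuilds X and reports its dimensions and the value range of each block of Nc rows. The names `rows`, `B`, `block_min` and `block_max` are illustrative only and are not part of the original script. Note that the first two blocks overlap slightly in range, since the second starts at 0.8 and the first extends up to 1.

```matlab
Nc = 50;                                   % instances per class
X = [rand(Nc,3); 1.8*rand(Nc,3) + 0.8; -1.5*rand(Nc,3) - 0.5];

[N,D] = size(X);                           % N = 3*Nc samples, D = 3 variables
fprintf('N = %d samples, D = %d variables\n', N, D);

block_min = zeros(1,3);                    % smallest value found in each block
block_max = zeros(1,3);                    % largest value found in each block
for k = 1:3
    rows = (k-1)*Nc + (1:Nc);              % row indices of the k-th block
    B = X(rows,:);
    block_min(k) = min(B(:));
    block_max(k) = max(B(:));
    fprintf('Block %d: values in [%.2f, %.2f]\n', k, block_min(k), block_max(k));
end
```

The printed ranges should fall inside (0, 1), (0.8, 2.6) and (-2, -0.5) respectively, which is what the scaling and shift of each block imply.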

Let $y \in \{1, 2, 3\}^N$ be the labelling vector, where $y_i$ is the class of the $i$-th sample (row $i$ of $X$).

y = [ones(1,Nc) 2*ones(1,Nc) 3*ones(1,Nc)]';

scatter3(X(:,1),X(:,2),X(:,3),40,y,'filled')

title('Labeled raw data'), xlabel('Variable 1 - x axis')

ylabel('Variable 2 - y axis')

zlabel('Variable 3 - z axis')
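One way to confirm that the labelling vector describes a balanced problem is to count the samples per class. The sketch below uses `accumarray` for this; the variable names `counts` and `is_balanced` are illustrative and not part of the original script.

```matlab
Nc = 50;                                      % instances per class
y = [ones(1,Nc) 2*ones(1,Nc) 3*ones(1,Nc)]';  % labelling vector (column)

counts = accumarray(y, 1);                    % counts(k) = number of samples with label k
is_balanced = all(counts == counts(1));       % true when every class has the same size

disp(counts')
disp(is_balanced)
```

Here every class should contain Nc = 50 samples, so `is_balanced` is expected to be true.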

p = 1; % index of the sample to highlight

x_p = X(p,:); % p-th sample (row p of X)

scatter3(X(:,1),X(:,2),X(:,3),40,y,'filled')

title('Labeled raw data'), xlabel('Variable 1 - x axis')

ylabel('Variable 2 - y axis')

zlabel('Variable 3 - z axis')

hold on

scatter3(x_p(1),x_p(2),x_p(3),80,'black','filled'), hold off

Let us now consider Fisher's iris data set:

load fisheriris.mat

X = meas;

[~,~,y] = unique(species); % integer class label for each sample

[N,D] = size(X);

sel_feat = [1 2 3];

scatter3(X(:,sel_feat(1)),X(:,sel_feat(2)),X(:,sel_feat(3)),50,y,'filled')

title('Iris Dataset')

xlabel(['Feature ' num2str(sel_feat(1))])

ylabel(['Feature ' num2str(sel_feat(2))])

zlabel(['Feature ' num2str(sel_feat(3))])
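The call `[~,~,y] = unique(species)` turns the species names into integer labels. The sketch below shows how that mapping works and how to recover the names from the labels. To stay self-contained it uses a small hypothetical cell array, `species_demo`, in place of the real `species` variable; `class_names`, `y_demo` and `recovered` are likewise illustrative names.

```matlab
% Hypothetical stand-in for the species variable (not the real data)
species_demo = {'setosa'; 'virginica'; 'setosa'; 'versicolor'; 'virginica'};

% class_names: sorted list of distinct names
% y_demo:      index into class_names for every sample
[class_names, ~, y_demo] = unique(species_demo);

disp(class_names')          % setosa, versicolor, virginica (alphabetical order)
disp(y_demo(:)')            % 1 3 1 2 3

% Indexing class_names with the labels gives back the original names
recovered = class_names(y_demo);
```

Keeping the first output of `unique` is a convenient way to label plot legends or colorbars with class names instead of integers.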