Probably Approximately Correct

MSc Machine Learning @ UCL | Alumnus @ IIT Madras | Google DeepMind Scholar | Interests: Machine learning

Sunday, December 24, 2023

Classifier Voting Mechanism for Multi-Class Decision Making: #High_Performance_Python(Post_2)


In classification tasks, ensemble methods commonly combine the predictions of their member models in one of two ways: hard voting and soft voting. Hard voting collects the predicted class label from each model and selects the class that receives the most votes. Soft voting instead uses each model's predicted probabilities: the probabilities for each class are summed (or, equivalently, averaged) across models, and the class with the highest total is selected as the prediction. Today, we shall explore hard voting through a 'toy' example.
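To make the difference concrete, here is a minimal sketch that applies both schemes to made-up class probabilities from three hypothetical models for a single sample. The helpers hard_vote and soft_vote are illustrative names, not part of any library.

```python
import numpy as np

def hard_vote(probs):
    # Each model (row) casts one vote for its most probable class
    labels = probs.argmax(axis=1)
    counts = np.bincount(labels, minlength=probs.shape[1])
    return int(counts.argmax())

def soft_vote(probs):
    # Sum the probabilities for each class (column) across models
    return int(probs.sum(axis=0).argmax())

# Made-up probabilities: rows are models, columns are classes 0, 1, 2
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.30, 0.40, 0.30],
    [0.30, 0.45, 0.25],
])
print(hard_vote(probs), soft_vote(probs))  # 1 0
```

Here the two schemes disagree: two of the three models narrowly prefer class 1, so hard voting returns 1, while the first model's strong confidence in class 0 dominates the summed probabilities (1.50 versus 0.90 and 0.60), so soft voting returns 0.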


The code below implements hard voting for the one-vs-one (pairwise) approach to multi-class classification: there is one binary classifier for each pair of classes, each classifier votes for one class of its pair, and the class with the most votes is chosen.




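import itertools
import numpy as np

# All one-vs-one class pairs for 3 classes: [(0, 1), (0, 2), (1, 2)]
combinations = list(itertools.combinations(range(3), 2))
print("combination Oloop:", combinations)

def voting(output):
    # Vote tally per class, sized for up to 10 classes (only 0-2 are used here)
    vote_n = np.zeros(10)
    # Map each classifier's sign to a position in its pair: negative -> 0, positive -> 1
    vote = ((np.sign(output) + 1) // 2).astype(int).tolist()
    print(vote)
    for i, j in enumerate(vote):
        # Classifier i votes for class combinations[i][j]
        print(i, j)
        print("combinations:", combinations[i][j])
        vote_n[combinations[i][j]] += 1
        print("vote:", vote_n)
    # Class with the most votes (np.argmax breaks ties in favour of the lowest index)
    return np.argmax(vote_n)

# Toy classifier outputs: [-1, 1, -1]
output = [-1 if i % 2 == 0 else 1 for i in range(3)]
results = voting(output)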


Output:

combination Oloop: [(0, 1), (0, 2), (1, 2)]
[0, 1, 0]
0 0
combinations: 0
vote: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
1 1
combinations: 2
vote: [1. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
2 0
combinations: 1
vote: [1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]



The combinations list is built with itertools.combinations, which generates every pair of classes from the set {0, 1, 2}: (0, 1), (0, 2) and (1, 2). Each pair corresponds to one binary classifier. The voting function takes the outputs of these classifiers and determines which class receives the most votes. Inside it, vote_n is a NumPy array of zeros that tallies the votes per class; it is sized for ten classes, even though only classes 0, 1 and 2 appear in this toy example. The expression ((np.sign(output) + 1) // 2) converts each output into a binary vote based on its sign: a positive output becomes 1 and a negative output becomes 0 (an output of exactly 0 also becomes 0).
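A quick check of that sign mapping, on made-up outputs that include an exact zero, shows that for ordinary (non-NaN) values it gives the same result as the simpler comparison (output > 0), which avoids computing np.sign and the integer division. This is a small sketch for illustration, not part of the original code.

```python
import numpy as np

# Made-up classifier outputs, including an exact zero
output = np.array([-2.5, 0.0, 3.1])

vote_sign = ((np.sign(output) + 1) // 2).astype(int)  # mapping used in voting()
vote_cmp = (output > 0).astype(int)                   # simpler comparison-based mapping

print(vote_sign, vote_cmp)  # [0 0 1] [0 0 1]
```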


The function then iterates over these binary votes. For classifier i, the vote j selects one class from the pair combinations[i]: the first class when j is 0 and the second when j is 1. The tally for that class in vote_n is then incremented. Once all votes are counted, np.argmax(vote_n) returns the index of the class with the highest count. In this example every class receives exactly one vote, a three-way tie; because np.argmax returns the first index of the maximum, results is 0.
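Since this series is about high-performance Python, here is one possible vectorized sketch of the same one-vs-one counting, replacing the Python loop with np.where and np.bincount. The function name ovo_vote and its n_classes parameter are illustrative assumptions; it assumes the classifier outputs are ordered like itertools.combinations(range(n_classes), 2), as in the code above.

```python
import itertools
import numpy as np

def ovo_vote(output, n_classes):
    """Hard one-vs-one voting without a Python loop over classifiers (a sketch).

    output[k] is the signed decision of the k-th pairwise classifier, with pairs
    ordered as itertools.combinations(range(n_classes), 2). A positive output
    votes for the second class of the pair; zero or negative for the first.
    """
    pairs = np.array(list(itertools.combinations(range(n_classes), 2)))  # (n_pairs, 2)
    output = np.asarray(output)
    # Pick column 1 (second class) where the output is positive, else column 0
    winners = np.where(output > 0, pairs[:, 1], pairs[:, 0])
    votes = np.bincount(winners, minlength=n_classes)
    return votes, int(np.argmax(votes))

votes, label = ovo_vote([-1, 1, -1], n_classes=3)
print(votes, label)  # [1 1 1] 0
```

With ten classes there are 45 pairwise classifiers, and the work is done in a few NumPy calls instead of a 45-iteration Python loop.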








Saturday, December 23, 2023

Iterating and Indexing in Nested Lists for Data Selection: #High_Performance_Python(Post_1)



Often in image processing, we encounter image data organized in nested lists. A notable example is the image data titled 'dtrain123.dat', available at the link 

To visualize such data, we loop over the classes and index into these nested structures to pick out specific images. Let's start with a 'toy' example to illustrate the basic concept and its output. After that, I'll show the actual code used to display the image data from 'dtrain123.dat'.

Code:

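import numpy as np

x = np.arange(10, 20)                                # values 10..19
y = np.array([0, 1, 2, 3, 0, 2, 1, 3, 0, 2])         # category label for each position
print(x, y)

# index[i] holds the positions in y where the label equals i
index = [np.argwhere(y == i).flatten() for i in range(4)]
print(index)

for i in range(4):
    if len(index[i]) >= 3:                           # skip categories with fewer than 3 occurrences
        img_index = index[i][2]                      # position of the third occurrence
        print(f"Category {i}, Index: {img_index}, "
              f"Value in x: {x[img_index]}")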


The code builds an `index` list that maps each category (0 to 3) to the positions where it occurs in the array `y`, using `np.argwhere(y == i).flatten()`. It then iterates over the categories, skipping any with fewer than three occurrences to avoid an IndexError. For each remaining category, `index[i][2]` gives the position of the third occurrence, which is then used to look up the corresponding value in `x`. Here only categories 0 and 2 occur at least three times, so the loop prints index 8 (value 18) for category 0 and index 9 (value 19) for category 2.
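The list comprehension scans `y` once per category. As an alternative sketch (not the author's method), a single stable `np.argsort` groups all positions by label while keeping their original order, so the n-th occurrence of every label can be read off directly. The helper name `nth_occurrence` is hypothetical.

```python
import numpy as np

def nth_occurrence(y, n_classes, n):
    """Map each label 0..n_classes-1 to the position of its n-th occurrence (0-based n).

    A sketch: one stable sort groups positions by label in their original order.
    Labels with fewer than n + 1 occurrences are omitted.
    """
    y = np.asarray(y)
    order = np.argsort(y, kind="stable")                     # positions grouped by label
    counts = np.bincount(y, minlength=n_classes)             # occurrences per label
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))   # start of each label's group in `order`
    return {int(c): int(order[starts[c] + n])
            for c in range(n_classes) if counts[c] > n}

x = np.arange(10, 20)
y = np.array([0, 1, 2, 3, 0, 2, 1, 3, 0, 2])
for label, pos in nth_occurrence(y, n_classes=4, n=2).items():
    print(f"Category {label}, Index: {pos}, Value in x: {x[pos]}")
```

This prints the same two lines as the loop above (categories 0 and 2, at positions 8 and 9).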


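The same logic is applied to the image data, where each row of `x` is a flattened 16 × 16 image and `y` holds its class label. Here `np.argwhere` is used without `.flatten()`, so each entry of `img_indices` is a column array and the third occurrence is read with `[2, 0]`.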

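import numpy as np
import matplotlib.pyplot as plt

# Assumes x (one flattened 16x16 image per row) and y (class labels) are already loaded
img_indices = [np.argwhere(y == i) for i in range(10)]   # (k, 1) arrays of positions per class
plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    img_index = img_indices[i][2, 0]                      # third image of class i
    print(img_index)
    plt.imshow(x[img_index, :].reshape(16, 16))
    plt.axis('off')
plt.show()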

[Figure: desired output, a 2 × 5 grid showing the third image of each class]
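The plotting loop draws each selected image in its own subplot. As a possible alternative (a sketch, not the author's code), the ten selected images can be tiled into a single array with one reshape and transpose and then displayed with a single imshow call. The helper name montage and the synthetic x and y below are assumptions standing in for the real dtrain123.dat data.

```python
import numpy as np

def montage(images, rows, cols, h=16, w=16):
    """Tile rows * cols flattened h*w images into one (rows*h, cols*w) array, row by row."""
    tiles = np.asarray(images).reshape(rows, cols, h, w)
    # Reorder axes to (grid row, pixel row, grid col, pixel col), then merge each pair
    return tiles.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)

# Synthetic stand-ins for the real data: 100 flattened 16x16 "images", labels 0-9
x = np.arange(100 * 256).reshape(100, 256)
y = np.arange(100) % 10

# Position of the third image of each class (same selection as the plotting loop)
picks = [np.flatnonzero(y == c)[2] for c in range(10)]
grid = montage(x[picks], rows=2, cols=5)
print(grid.shape)  # (32, 80)
# The whole grid could then be shown with a single call such as plt.imshow(grid)
```

The tiles are laid out row by row, matching the order in which plt.subplot(2, 5, i + 1) fills its panels.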



