Scale-Invariant Feature Transform (SIFT)

들어가며: SIFT가 필요한 이유

SIFT는 Scale-Invariant Feature Transform의 약자로, 이미지에서 크기 변화에 강인한 특징점을 찾고 기술하는 알고리즘입니다. Harris corner detection을 수행할 때 필터의 scale에 따라 corner 검출 여부가 달라지는 문제가 있었는데요, SIFT는 바로 이러한 문제를 해결하기 위해 David Lowe가 1999년에 제안한 알고리즘입니다.

SIFT 알고리즘의 4단계 프로세스

1단계: Scale-space Extrema Detection

우리가 어떤 장면을 볼 때는 전체적인 틀과 세부적인 내용을 동시에 파악합니다. 성공적인 컴퓨터 비전을 위해서는 이런 다중 스케일 관찰을 모방해야 합니다. 이 단계에서는 Scale Space란 개념이 나오는데 여러 스케일로 본 이미지들을 모아놓은 공간을 Scale Space라고 합니다. 이미지 스케일을 변화시키는 가장 간단한 방법은 이미지를 확대하거나 축소시키는 것입니다.

가우시안 블러의 활용 SIFT는 사람이 물체를 관찰하는 방식을 모방합니다:

가까이서 보면: 세세한 부분까지 선명하게 보임
멀리서 보면: 전체적인 윤곽만 흐릿하게 보임

이를 구현하기 위해 가우시안 블러를 사용합니다. 블러의 강도(σ, 시그마 값)를 조절하여 마치 다른 거리에서 본 것 같은 효과를 만들어냅니다. σ가 클수록 더 멀리서 본 것처럼 흐릿해집니다.

LoG에서 DoG로의 최적화

LoG(Laplacian of Gaussian) 연산은 이미지의 엣지와 코너를 효과적으로 검출하지만, 계산량이 매우 많습니다. 따라서 SIFT는 계산 효율성이 뛰어나면서도 LoG의 근사치인 DoG(Difference of Gaussian)를 사용합니다. DoG 계산은 매우 간단합니다: 같은 옥타브 내에서 인접한 두 개의 블러 이미지를 빼주면 됩니다.

2단계: Keypoint Localization

Keypoint 찾기 Keypoint를 찾기 위해 3장의 DoG 이미지를 사용합니다:

중간 DoG 이미지에서 한 픽셀을 선택
현재 이미지의 주변 8개 픽셀
위아래 이미지의 각 9개 픽셀
총 26개 픽셀과 비교하여 최댓값 또는 최솟값인 경우 keypoint로 선정

이후 threshold를 적용하여 약한 keypoint는 제거합니다. 이렇게 찾은 keypoint들은 scale invariance(스케일 불변성)를 갖게 됩니다. (원래는 더 2차 Taylor approximation을 하는데 그 부분은 생략했습니다.)

3단계: Orientation Assignment

그런데 어떻게 Descriptor를 생성해야할까요? 각각의 Descriptor를 비교해봅시다. Keypoint를 매칭하기 어려운 이유는 같은 패치가 다른 픽셀 값으로 나타나고, 패치가 다른 스케일, 위치, 회전으로 나타나기 때문입니다. 아래 5가지 Descriptor가 있는데 SIFT는 이 중에서 마지막을 채택하였습니다.

Descriptor 방법론 비교

Pixel Values: 단순하지만 조명과 기하 변화에 취약
Image Gradient: 조명 변화에 강하지만 여전히 기하 변형에 취약
Color Histogram: 회전/스케일에 불변하지만 공간 정보 손실
Spatial Histogram: 대략적 공간 정보 보존하지만 큰 회전에 취약
Orientation Normalization (SIFT 채택): 회전 불변성 달성, 가장 효과적

이제 keypoint들에 방향을 할당하여 rotation invariance(회전 불변성)를 달성합니다.

Orientation Histogram 생성 과정:

가우시안 가중치 적용: Keypoint 주변 윈도우에 가우시안 블러를 적용합니다. 이는 중심에서 먼 픽셀의 영향을 줄여 노이즈에 강건하게 만들고, keypoint 중심부의 gradient 정보를 더 중요하게 반영하기 위함입니다.

Histogram 생성: 36개 빈(10도 간격)으로 구성된 histogram을 만듭니다. 주의할 점은 gradient의 방향별 빈도수가 아니라 magnitude 값을 누적한다는 것입니다.

주 방향 결정:

가장 높은 peak를 keypoint의 주 방향으로 설정
최대 peak의 80% 이상인 다른 peak도 별도의 방향으로 인정
따라서 하나의 위치에서 여러 방향을 가진 keypoint가 생성될 수 있음

4단계: Keypoint Descriptor

마지막으로 각 keypoint에 고유한 "지문"을 부여합니다.

16×16 윈도우 설정: Keypoint 주변에 16×16 픽셀 윈도우를 만들고, 이를 16개의 4×4 서브 윈도우로 분할

각 서브 윈도우에서 Histogram 생성:

8개 방향 빈 사용 (45도 간격)
Gradient magnitude와 orientation 계산
16개 서브윈도우 × 8개 빈 = 128차원 벡터

회전 불변성 달성: 각 gradient 방향에서 keypoint의 주 방향을 빼서 상대적 각도로 변환합니다. 예를 들어, keypoint 방향이 24.5도라면 모든 gradient 방향에서 이 값을 빼어 keypoint 방향에 상대적인 각도로 만듭니다.

조명 불변성 달성: 최종 128차원 벡터를 정규화하여 밝기 변화에 강건하게 만듭니다.

마무리

SIFT는 컴퓨터 비전 분야의 획기적인 알고리즘으로, scale과 rotation 불변성을 달성한 최초의 실용적인 방법입니다. 비록 최신 딥러닝 기법들이 등장했지만, SIFT의 원리는 여전히 feature detection과 description의 기초를 이해하는 데 필수적입니다.

코드 구현

아래 4가지 과정에서 1,2번째 과정은 생략하였습니다. 앞서 배운 Harris Corner detection을 통해 검출된 keypoint를 활용하여 3,4번째 과정을 수행해 SIFT를 수행하였습니다.

• Multi-scale extrema detection
• Keypoint localization
• Orientation assignment
• Keypoint descriptor

코드 구조는 아래와 같습니다. SIFT클래스의 __call__ 함수에서 이미지를 받으면 get_keypoints함수에서 keypoints를 반환하고, keypoints의 orientation assignment과정을 수행하기 위해 get_magnitude_and_angle함수에서 image의 image gradient의 방향과 크기를 반환합니다. 이 정보를 바탕으로 orientation_assignment를 수행하게되고, create_descriptors함수에서 descriptor를 생성합니다.

    def __call__(self, image):
        keypoints = self.get_keypoints(image)
        magnitude, angle = self.get_magnitude_and_angle(image)
        ori_assigned_keypoints = self.orientation_assignment(keypoints, magnitude, angle, image.shape)

        final_keypoints, descriptors = self.create_descriptors(ori_assigned_keypoints, magnitude, angle, image.shape)

        return final_keypoints, descriptors

KeyPoint객체는 opencv클래스의 객체입니다. Angle은 아직 계산하지 않았기 때문에 -1로 지정합니다.

    def get_keypoints(self, image):
        """
        ## Args
        image
            - 3채널 이미지임에 주의!
            - shape: (image height, image width, 3)
            - dtype: np.float32

        ## Return
        keypoints
            - List[cv2.KeyPoints]
        """
        assert image.shape[-1] == 3, "입력 이미지는 3채널이어야합니다."

        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.

        responses = cv2.cornerHarris(gray_image, 3, 3, 0.04)
        responses[responses < 0.01 * responses.max()] = 0
        responses = non_maximum_suppression(responses, (21, 21))

        keypoints_coords_y, keypoints_coords_x = np.nonzero(responses)

        keypoints = []
        for x, y in zip(keypoints_coords_x, keypoints_coords_y):
            keypoint = KeyPoint(
                float(x), float(y),   # Coordinate
                None, -1,             # Size, angle
                responses[y, x],      # Response
                0, -1                 # Octave, class id
            )
            keypoints.append(keypoint)

        return keypoints

아래 코드에서는 각 픽셀의 image gradient의 magnitude와 angle을 계산합니다.

    def get_magnitude_and_angle(self, image):
        """
        ## Args
        image
            - 3채널 이미지임에 주의!
            - shape: (image height, image width, 3)
            - dtype: np.float32

        ## Return
        magnitude
            - shape: (image height, image width)
            - dtype: np.float32
        
        angle
            - shape: (image height, image width)
            - dtype: np.float32
        """
        assert image.shape[-1] == 3, "입력 이미지는 3채널이어야합니다."

        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.

        # 1. Filtering을 통해 이미지에 대한 수평 및 수직방향 기울기 구하기
        sobel_x, sobel_y = get_sobel_filter()
        Ix = filter2D(gray_image, sobel_x.astype(np.float32), repetition_padding)
        Iy = filter2D(gray_image, sobel_y.astype(np.float32), repetition_padding)

        # 2. Magnitude 및 angle 구하기
        magnitude = np.sqrt(Ix**2 + Iy**2)
        angle = np.arctan2(Iy, Ix)
        angle = np.degrees(angle)

        # 3. (-180도 ~ 180도)인 angle을 (0도 ~ 360도)로 변환하기
        angle = (angle + 360) % 360

        return magnitude, angle

Orientation assignment과정입니다. 계산된 angle을 histogram에 넣어줍니다. 이때 다른 구현에서는 0~10도, 10~20도를 빈으로 나누지만, 저의 과제에서는 355~5도 이런식으로 나누라고 해서 약간 변형했습니다(~~조교님의 귀여운 트릭~~)

각각의 빈에 magnitude와 gaussian weight를 곱해 넣어줍니다.(magnitude에 gaussian weight를 곱하는 이유는 keypoint 중심으로부터의 거리에 따라 각 픽셀의 기여도를 차등 부여하기 위함입니다.) 그 후, keypoint의 주각도를 설정해줍니다. 이때, 가장 값이 높은 빈을 기준으로 80% 이상의 값을 가지는 빈도 keypoint의 angle로 저장해줍니다.

    def orientation_assignment(self, keypoints, magnitude, angle, image_shape):

        """
        ## Args
        keypoints
            - List[cv2.KeyPoints]
        
        magnitude
            - shape: (image height, image width)
            - dtype: np.float32
        
        angle
            - shape: (image height, image width)
            - dtype: np.float32

        image_shape
            - (image height, image width, channel)
            - Tuple or List
        
        ## Return
        ori_assigned_keypoints
            - List[cv2.KeyPoints]
        """

        img_h, img_w = image_shape[:2]

        patch_start = -(self.patch_size//2)
        patch_end = patch_start + self.patch_size

        ori_assigned_keypoints = []

        for keypoint in keypoints:
            kx, ky = keypoint.pt

            hist = np.zeros((36,))

            # 16x16 patch에 대한 주각도 찾기
            for dy in range(patch_start, patch_end):
                for dx in range(patch_start, patch_end):
                    target_x = int(kx + dx)
                    target_y = int(ky + dy)

                    # Out of range 처리
                    if ((target_x < 0) or (target_x >= img_w)) or ((target_y < 0) or (target_y >= img_h)):
                        continue
                    
                    # 1. 주각도에 대한 histogram bin index 찾기 
                    target_angle_bin = int((angle[target_y, target_x] + 5) % 360 / 10)
                    
                    # 2. 찾은 index에 magnitude 값과 gaussian weight을 곱해서 더해주기
                    gaussian_weight = get_gaussian_weight(dx, dy, sigma=self.gaussian_weight_sigma)
                    hist[target_angle_bin] += magnitude[target_y, target_x] * gaussian_weight
            
            # 3. Histogram을 토대로 주각도를 찾아 angle값이 포함된 keypoint 객체로 생성
            # 이때 histogram의 최댓값의 80% 이상인 각도는 모두 keypoint로 생성해야 함
            max_value = np.max(hist)
            for bin_idx, value in enumerate(hist):
                if value >= 0.8 * max_value:
                    new_keypoint = KeyPoint(
                        kx, ky,                  # Coordinate
                        None,                    # Size
                        bin_idx * 10,            # Angle (bin_idx * 10 degrees)
                        keypoint.response,       # Response
                        0, -1                    # Octave, class id
                    )
                    ori_assigned_keypoints.append(new_keypoint)
            
        return ori_assigned_keypoints

descriptor를 생성해주는 함수입니다. keypoint의 주각도를 바탕으로 회전행렬을 만들어 반대로 회전을 해줍니다. 이렇게 회전된 이미지를 바탕으로

    def create_descriptors(self, keypoints, magnitude, angle, image_shape):
        """
        ## Args
        keypoints
            - List[cv2.KeyPoints]
            - 주각도 정보가 존재하는 keypoint여야 함
        
        magnitude
            - shape: (image height, image width)
            - dtype: np.float32
        
        angle
            - shape: (image height, image width)
            - dtype: np.float32
        
        image_shape
            - (image height, image width, channel)
            - Tuple or List
        """
        img_h, img_w = image_shape[:2]

        descriptor_length = self.blocks_per_row * self.blocks_per_row * self.descriptor_histogram_bin_count
        descriptors = np.zeros((len(keypoints), descriptor_length))

        patch_start = -(self.patch_size//2)
        patch_end = patch_start + self.patch_size

        for keypoint_idx, keypoint in enumerate(keypoints):
            kx, ky = keypoint.pt

            # 1. 회전행렬 준비
            dominant_angle = np.radians(keypoint.angle)
            rotation_matrix = np.array([
                [np.cos(dominant_angle), -np.sin(dominant_angle)],
                [np.sin(dominant_angle), np.cos(dominant_angle)]
            ])
            
            descriptor = np.zeros((self.blocks_per_row, self.blocks_per_row, self.descriptor_histogram_bin_count))
            
            for dy in range(patch_start, patch_end):
                for dx in range(patch_start, patch_end):
                    # 회전하기 전 좌표
                    og_x = int(kx + dx)
                    og_y = int(ky + dy)

                    # Out of range 처리
                    if ((og_x < 0) or (og_x >= img_w)) or ((og_y < 0) or (og_y >= img_h)):
                        continue

                    # 2. 주각도의 반대 방향으로 좌표 회전
                    rot_coords = rotation_matrix.T @ np.array([dx, dy])
                    rot_dx = rot_coords[0]
                    rot_dy = rot_coords[1]

                    # 3. 각도가 음수가 될 수 있으므로 (0도 ~ 360도)가 될 수 있도록 처리
                    descriptor_angle = (angle[og_y, og_x] - keypoint.angle + 360) % 360

                    # 4. Histogram에 대한 bin 위치 찾기
                    target_angle_bin = int(descriptor_angle / (360 / self.descriptor_histogram_bin_count)) % self.descriptor_histogram_bin_count
                    
                    # 5. 값이 들어갈 sub-block 위치 찾기
                    sub_block_idx_y = int((rot_dy + self.patch_size/2) / (self.patch_size / self.blocks_per_row))
                    sub_block_idx_x = int((rot_dx + self.patch_size/2) / (self.patch_size / self.blocks_per_row))

                    # Out of range 처리
                    if ((sub_block_idx_x < 0) or (sub_block_idx_x >= self.blocks_per_row)) or ((sub_block_idx_y < 0) or (sub_block_idx_y >= self.blocks_per_row)):
                        continue
                    
                    # 6. Magnitude 값과 gaussian weight을 곱해서 더해주기
                    gaussian_weight = get_gaussian_weight(rot_dx, rot_dy, sigma=self.gaussian_weight_sigma)
                    descriptor_value = magnitude[og_y, og_x] * gaussian_weight

                    descriptor[sub_block_idx_y, sub_block_idx_x, target_angle_bin] += descriptor_value

            # 7. Descriptor normalization
            descriptor = descriptor.flatten()
            descriptor = descriptor / (np.linalg.norm(descriptor) + 1e-7)
            # Clipping to 0.2 and re-normalize (as in original SIFT)
            descriptor = np.minimum(descriptor, 0.2)
            descriptor = descriptor / (np.linalg.norm(descriptor) + 1e-7)

            descriptors[keypoint_idx] = descriptor

        return keypoints, descriptors

결과물

'Robot > Computer Vision' 카테고리의 다른 글

[Computer vision] Table of contents (0)	2025.10.10
[Computer Vision] Two Image Feature Matching (0)	2025.10.09

[컴퓨터비전] SIFT(Scale Invariant Feature Transform) 알고리즘 설명 및 구현