1. Background
The Spark community provides Spark Streaming connectors for many data sources, but some of the less common ones are not covered. Because of our company's technology-stack choices, we use Alibaba Cloud's MQ service ONS, so to meet our real-time requirements we had to write our own Receiver.
2. Implementation
1. The official example is fairly detailed, but getting it to work in practice still takes some step-by-step debugging; see the official documentation.
2. The implementation consists of three parts: the receiver, the input stream, and a util object.
3. Receiver code:
```scala
import java.io.Serializable
import java.util.Properties

import com.aliyun.openservices.ons.api._
import com.aliyun.openservices.ons.api.impl.ONSFactoryImpl
import org.apache.spark.internal.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class OnsReceiver(
    cid: String,
    accessKey: String,
    secretKey: String,
    addr: String,
    topic: String,
    tag: String,
    func: Message => Array[Byte])
  extends Receiver[Array[Byte]](StorageLevel.MEMORY_AND_DISK_2)
  with Serializable with Logging {
  receiver =>

  private var consumer: Consumer = null
  private var workerThread: Thread = null

  override def onStart(): Unit = {
    // Start the ONS consumer on a dedicated daemon thread so onStart()
    // returns immediately, as the Receiver contract requires.
    workerThread = new Thread(new Runnable {
      override def run(): Unit = {
        val properties = new Properties
        properties.put(PropertyKeyConst.ConsumerId, cid)
        properties.put(PropertyKeyConst.AccessKey, accessKey)
        properties.put(PropertyKeyConst.SecretKey, secretKey)
        properties.put(PropertyKeyConst.ONSAddr, addr)
        properties.put(PropertyKeyConst.MessageModel, "CLUSTERING")
        properties.put(PropertyKeyConst.ConsumeThreadNums, "50")
        val onsFactoryImpl = new ONSFactoryImpl
        consumer = onsFactoryImpl.createConsumer(properties)
        consumer.subscribe(topic, tag, new MessageListener() {
          override def consume(message: Message, context: ConsumeContext): Action = {
            try {
              // Convert the message with the user-supplied func and hand
              // the bytes to Spark; only then ack the message to ONS.
              receiver.store(func(message))
              Action.CommitMessage
            } catch {
              case e: Throwable =>
                e.printStackTrace()
                Action.ReconsumeLater
            }
          }
        })
        consumer.start()
      }
    })
    workerThread.setName(s"Aliyun ONS Receiver $streamId")
    workerThread.setDaemon(true)
    workerThread.start()
  }

  override def onStop(): Unit = {
    if (workerThread != null) {
      if (consumer != null) {
        consumer.shutdown()
      }
      workerThread.join()
      workerThread = null
      logInfo(s"Stopped receiver for streamId $streamId")
    }
  }
}
```
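The `func` argument controls how an ONS `Message` is converted into the `Array[Byte]` that the receiver stores in Spark. A minimal sketch, assuming you only need the raw message body (`getBody` is the standard ONS accessor):

```scala
import com.aliyun.openservices.ons.api.Message

// Keep only the raw message body; if the downstream job also needs the
// topic, key, or user properties, serialize them together here.
val func: Message => Array[Byte] = (msg: Message) => msg.getBody
```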
InputDStream code:
```scala
import com.aliyun.openservices.ons.api.Message
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

class OnsInputDStream(
    @transient _ssc: StreamingContext,
    cid: String,
    topic: String,
    tag: String,
    accessKey: String,
    secretKey: String,
    addr: String,
    func: Message => Array[Byte])
  extends ReceiverInputDStream[Array[Byte]](_ssc) {

  override def getReceiver(): Receiver[Array[Byte]] = {
    new OnsReceiver(cid, accessKey, secretKey, addr, topic, tag, func)
  }
}
```
Util code:
```scala
import com.aliyun.openservices.ons.api.Message
import org.apache.spark.annotation.Experimental
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}

object OnsUtils {

  @Experimental
  def createStream(
      ssc: StreamingContext,
      cid: String,
      topic: String,
      tag: String,
      accessKey: String,
      secretKey: String,
      addr: String,
      func: Message => Array[Byte]): ReceiverInputDStream[Array[Byte]] = {
    new OnsInputDStream(ssc, cid, topic, tag, accessKey, secretKey, addr, func)
  }

  @Experimental
  def createStreams(
      ssc: StreamingContext,
      consumerIdTopicTags: Array[(String, String, String)],
      accessKey: String,
      secretKey: String,
      addr: String,
      func: Message => Array[Byte]): DStream[Array[Byte]] = {
    // Reject duplicate (consumerId, topic) pairs, and reject one consumerId
    // subscribing to more than one topic.
    val invalidTuples1 =
      consumerIdTopicTags.groupBy(e => (e._1, e._2)).filter(e => e._2.length > 1)
    val invalidTuples2 =
      consumerIdTopicTags.map(e => (e._1, e._2)).groupBy(e => e._1).filter(e => e._2.length > 1)
    // Any invalid group at all is an error (the original check used
    // `.size > 1`, which missed the single-duplicate case).
    if (invalidTuples1.nonEmpty || invalidTuples2.nonEmpty) {
      throw new RuntimeException("Inconsistent consumer subscription.")
    } else {
      ssc.union(consumerIdTopicTags.map({ case (consumerId, topic, tags) =>
        createStream(ssc, consumerId, topic, tags, accessKey, secretKey, addr, func)
      }))
    }
  }
}
```
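For completeness, a sketch of calling `createStreams` with several (consumerId, topic, tag) tuples. The IDs and topics below are placeholders, and `ssc`, `config`, and `func` are assumed to be defined as in the usage section below:

```scala
// Placeholder consumer IDs and topics; one receiver is created per tuple.
// Note: the same consumerId must not appear with two different topics,
// or createStreams throws "Inconsistent consumer subscription."
val tuples = Array(
  ("CID_A", "TOPIC_A", "tag_a"),
  ("CID_B", "TOPIC_B", "tag_b"))

val unioned = OnsUtils.createStreams(
  ssc, tuples,
  config.getString("ons.access_key"),
  config.getString("ons.sercet_key"),
  config.getString("ons.ons_addr"),
  func)
```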
3. Usage
```scala
val stream = (0 until 3).map(i => {
  OnsUtils.createStream(
    ssc,
    "CID",
    "BI_CALL",
    "call_log_ons",
    config.getString("ons.access_key"),
    config.getString("ons.sercet_key"),
    config.getString("ons.ons_addr"),
    func)
})
val unionStream = ssc.union(stream).foreachRDD(...)
```
The size of `stream` determines how many receivers are created. This number must be less than or equal to `num-executors` when running Spark on YARN, because each receiver permanently occupies an executor; by default the received data takes up half of an executor's memory.
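As a rough illustration of that sizing rule, the three-receiver job above could reserve executors as follows. This is a sketch only; the instance count, memory size, and app name are illustrative, not taken from the original job:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Three receivers each pin one executor for the lifetime of the job,
// so request more executors than receivers to leave room for processing.
val conf = new SparkConf()
  .setAppName("ons-streaming")          // hypothetical app name
  .set("spark.executor.instances", "6") // >= 3 receivers, plus workers
  .set("spark.executor.memory", "4g")

val ssc = new StreamingContext(conf, Seconds(10))
```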