该类在java的java.util包中。

基本原理

BitSet是位操作的对象，值只有0或1即false和true，内部维护了一个long数组，初始只有一个long，所以BitSet最小的size是64，当随着存储的元素越来越多，BitSet内部会动态扩充，最终内部是由N个long来存储，这些针对操作都是透明的。
用1位来表示一个数据是否出现过，0为没有出现过，1表示出现过。使用用的时候既可根据某一个是否为0表示，此数是否出现过。
一个1G的空间，有 8*1024*1024*1024=8.58*10^9bit，也就是可以表示85亿个不同的数。

该类的成员变量

/*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

    /**
     * Whether the size of "words" is user-specified.  If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

    /* use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 7997698588986878753L;

wordIndex函数,代码如下

/**
 * Given a bit index, return word index containing it.
 */
private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

这个函数用来获取某个数在words数组中的索引，通过bitIndex右移ADDRESS_BITS_PER_WORD位,即右移6位。

正常情况下，右移n位得到的值为原值除以2的n次方，而2的6次方是64，也就是bitIndex/64，words数组中每个元素能够代表64个数字，所以存储的值除以64来得到数组索引值，例如0到63的索引值为0,那么这64个数字划归到第一个数组元素中，64到127划归到第二个数组元素中，以此类推。

checkInvariants()函数,代码如下

重要的内部检查函数。

/**
 * Every public method must preserve these invariants.
 */
private void checkInvariants() {
    assert(wordsInUse == 0 || words[wordsInUse - 1] != 0);
    assert(wordsInUse >= 0 && wordsInUse <= words.length);
    assert(wordsInUse == words.length || words[wordsInUse] == 0);
}

这个函数用来验证当前对象的有效性。

recalculateWordsInUse()函数,代码如下

重要的内部检查函数。

/**
 * Sets the field wordsInUse to the logical size in words of the bit set.
 * WARNING:This method assumes that the number of words actually in use is
 * less than or equal to the current value of wordsInUse!
 */
private void recalculateWordsInUse() {
    // Traverse the bitset until a used word is found
    int i;
    for (i = wordsInUse-1; i >= 0; i--)
        if (words[i] != 0)
            break;

    wordsInUse = i+1; // The new logical size
}

这个函数用来计算当前已经使用了的最大数组元素个数。

初始化函数

/**
 * Creates a new bit set. All bits are initially {@code false}.
 */
public BitSet() {
   //创建一个words数组，大小为1
    initWords(BITS_PER_WORD);
    //该数组大小不限定
    sizeIsSticky = false;
}

/**
 * Creates a bit set whose initial size is large enough to explicitly
 * represent bits with indices in the range {@code 0} through
 * {@code nbits-1}. All bits are initially {@code false}.
 *
 * @param  nbits the initial size of the bit set
 * @throws NegativeArraySizeException if the specified initial size
 *         is negative
 */
public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    //该数组大小限定
    sizeIsSticky = true;
}

 /**
     * Creates a bit set using words as the internal representation.
     * The last word (if there is one) must be non-zero.
     */
    private BitSet(long[] words) {
      //将该数组赋值给BitSet对象中的words数组
        this.words = words;
      //将当前数组正在使用的最大数组索引设置为数组元素大小
        this.wordsInUse = words.length;
       //验证当前对象的合法性
        checkInvariants();
    }

两个构造函数，分别是一个指定了初始大小，一个没指定。如果没指定，我们可以看到默认的初始大小为, 2^6 = 64-1=63 bit. 我们知道java中long的大小就是8个字节，也就是8*8=64bit。也就是说，bitset默认的是一个long整形的大小。初始化函数指定了必要的大小。

注：如果指定了bitset的初始化大小，那么会把它规整到一个大于或者等于这个数字的64的整倍数。比如64位，bitset的大小是1个long，而65位时，bitset大小是2个long，即128位。做这么一个规定，主要是为了内存对齐，同时避免考虑到不要处理特殊情况，简化程序。

initWords函数，代码如下

private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1];
}

这个函数用来创建数组words,该数组元素大小通过wordIndex(nbits-1)+1来获取，而wordIndex上面我们讲了，用来得到某个值所在words数组的索引，然后创建words数组的时候一定要保证数组个数大于等于该值所在的索引值。

valueof函数，代码如下

/**
 * Returns a new bit set containing all the bits in the given long array.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(longs).get(n) == ((longs[n/64] & (1L<<(n%64))) != 0)}
 * <br>for all {@code n < 64 * longs.length}.
 *
 * <p>This method is equivalent to
 * {@code BitSet.valueOf(LongBuffer.wrap(longs))}.
 *
 * @param longs a long array containing a little-endian representation
 *        of a sequence of bits to be used as the initial bits of the
 *        new bit set
 * @return a {@code BitSet} containing all the bits in the long array
 * @since 1.7
 */
public static BitSet valueOf(long[] longs) {
    int n;
  //倒序从longs数组中获取第一个不为0的元素所在的索引值
    for (n = longs.length; n > 0 && longs[n - 1] == 0; n--)
        ;
   //从longs数组中拷贝索引从0到n的元素，放到新创建的数组中，然后将该新数组构造BitSet类对象
    return new BitSet(Arrays.copyOf(longs, n));
}

/**
 * Returns a new bit set containing all the bits in the given long
 * buffer between its position and limit.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(lb).get(n) == ((lb.get(lb.position()+n/64) & (1L<<(n%64))) != 0)}
 * <br>for all {@code n < 64 * lb.remaining()}.
 *
 * <p>The long buffer is not modified by this method, and no
 * reference to the buffer is retained by the bit set.
 *
 * @param lb a long buffer containing a little-endian representation
 *        of a sequence of bits between its position and limit, to be
 *        used as the initial bits of the new bit set
 * @return a {@code BitSet} containing all the bits in the buffer in the
 *         specified range
 * @since 1.7
 */
public static BitSet valueOf(LongBuffer lb) {
    lb = lb.slice();
    int n;
    for (n = lb.remaining(); n > 0 && lb.get(n - 1) == 0; n--)
        ;
    long[] words = new long[n];
    lb.get(words);
    return new BitSet(words);
}

/**
 * Returns a new bit set containing all the bits in the given byte array.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(bytes).get(n) == ((bytes[n/8] & (1<<(n%8))) != 0)}
 * <br>for all {@code n <  8 * bytes.length}.
 *
 * <p>This method is equivalent to
 * {@code BitSet.valueOf(ByteBuffer.wrap(bytes))}.
 *
 * @param bytes a byte array containing a little-endian
 *        representation of a sequence of bits to be used as the
 *        initial bits of the new bit set
 * @return a {@code BitSet} containing all the bits in the byte array
 * @since 1.7
 */
public static BitSet valueOf(byte[] bytes) {
    return BitSet.valueOf(ByteBuffer.wrap(bytes));
}

/**
 * Returns a new bit set containing all the bits in the given byte
 * buffer between its position and limit.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(bb).get(n) == ((bb.get(bb.position()+n/8) & (1<<(n%8))) != 0)}
 * <br>for all {@code n < 8 * bb.remaining()}.
 *
 * <p>The byte buffer is not modified by this method, and no
 * reference to the buffer is retained by the bit set.
 *
 * @param bb a byte buffer containing a little-endian representation
 *        of a sequence of bits between its position and limit, to be
 *        used as the initial bits of the new bit set
 * @return a {@code BitSet} containing all the bits in the buffer in the
 *         specified range
 * @since 1.7
 */
public static BitSet valueOf(ByteBuffer bb) {
    bb = bb.slice().order(ByteOrder.LITTLE_ENDIAN);
    int n;
    for (n = bb.remaining(); n > 0 && bb.get(n - 1) == 0; n--)
        ;
    long[] words = new long[(n + 7) / 8];
    bb.limit(n);
    int i = 0;
    while (bb.remaining() >= 8)
        words[i++] = bb.getLong();
    for (int remaining = bb.remaining(), j = 0; j < remaining; j++)
        words[i] |= (bb.get() & 0xffL) << (8 * j);
    return new BitSet(words);
}

ensureCapacity()函数,代码如下

/**
 * Ensures that the BitSet can hold enough words.
 * @param wordsRequired the minimum acceptable number of words.
 */
private void ensureCapacity(int wordsRequired) {
  //如果words的大小小于所需要的
    if (words.length < wordsRequired) {
        //扩容最终的大小，最小为原来的两倍
        int request = Math.max(2 * words.length, wordsRequired);
      //创建新的数组，容量为request,然后将原来的数组拷贝到新的数组中
        words = Arrays.copyOf(words, request);
      //数组大小不固定的标志
        sizeIsSticky = false;
    }
}

这个函数用来动态扩容。

flip()函数，用来将指定的位数取反操作,也就是原来位为0，取反后变成1,原来为1,取反后变成0。

set函数，用来将指定的位设置为1。

clear函数，用来将指定的位设置成0。

get函数,用来获取某一位是否有效，或者获取某个BitSet类对象的指定范围的数值到一个新的BitSet类对象中。

nextSetBit()函数，获取从fromIndex开始第一个设置为1的bit索引值。

nextClearBit函数，获取从fromIndex开始第一个设置为0的bit索引值。

previousSetBit函数

将该fromIndex所在的bit以及它的低位设置成1，然后去和相应的words数组元素进行&操作(此时fromIndex所在的bit以及它的低位之外的其他bit都变成了0)，如果不为0,说明该words元素上有符合要求的bit位，然后查找的规则就是从高位开始查找，直到遇到第一个bit为1的，就是我们要找的。

cardinality函数，代码如下

/**
 * Returns the number of bits set to {@code true} in this {@code BitSet}.
 *
 * @return the number of bits set to {@code true} in this {@code BitSet}
 * @since  1.4
 */
public int cardinality() {
    int sum = 0;
    for (int i = 0; i < wordsInUse; i++)
        sum += Long.bitCount(words[i]);
    return sum;
}

用来获取words数组中每个元素bit为1的总bit数。

使用场景

常见的应用是那些需要对海量数据进行一些统计工作的时候，比如日志分析、用户数统计等等。

阿里面试题：有1千万个随机数，随机数的范围在1到1亿之间。现在要求写出一种算法，将1到1亿之间没有在随机数中的数求出来？

public static void main(String[] args){
        Random random=new Random();
        
        List<Integer> list=new ArrayList<>();
        for(int i=0;i<10000000;i++){
            int randomResult=random.nextInt(100000000);
            list.add(randomResult);
        }
        System.out.println("产生的随机数有");
        for(int i=0;i<list.size();i++){
            System.out.println(list.get(i));
        }
        BitSet bitSet=new BitSet(100000000);
        for(int i=0;i<10000000;i++) {
            bitSet.set(list.get(i));
        }
        
        System.out.println("0~1亿不在上述随机数中有"+bitSet.cardinality());
        for (int i = 0; i < 100000000; i++) {
            if(!bitSet.get(i)){
                System.out.println(i);
            }
        }     
    }

网站首页 > 网络编程 > 其它综合正文

Java BitSet学习

基本原理

该类的成员变量

wordIndex函数,代码如下

checkInvariants()函数,代码如下

recalculateWordsInUse()函数,代码如下

初始化函数

initWords函数，代码如下

valueof函数，代码如下

ensureCapacity()函数,代码如下

flip()函数，用来将指定的位数取反操作,也就是原来位为0，取反后变成1,原来为1,取反后变成0。

set函数，用来将指定的位设置为1。

clear函数，用来将指定的位设置成0。

get函数,用来获取某一位是否有效，或者获取某个BitSet类对象的指定范围的数值到一个新的BitSet类对象中。

nextSetBit()函数，获取从fromIndex开始第一个设置为1的bit索引值。

nextClearBit函数，获取从fromIndex开始第一个设置为0的bit索引值。

previousSetBit函数

cardinality函数，代码如下

使用场景

猜你喜欢

本文暂时没有评论，来添加一个吧(●'◡'●)

取消回复欢迎你发表评论:

网站首页 > 网络编程 > 其它综合 正文

Java BitSet学习

基本原理

该类的成员变量

wordIndex函数,代码如下

checkInvariants()函数,代码如下

recalculateWordsInUse()函数,代码如下

初始化函数

initWords函数，代码如下

valueof函数，代码如下

ensureCapacity()函数,代码如下

flip()函数，用来将指定的位数取反操作,也就是原来位为0，取反后变成1,原来为1,取反后变成0。

set函数，用来将指定的位设置为1。

clear函数，用来将指定的位设置成0。

get函数,用来获取某一位是否有效，或者获取某个BitSet类对象的指定范围的数值到一个新的BitSet类对象中。

nextSetBit()函数，获取从fromIndex开始第一个设置为1的bit索引值。

nextClearBit函数，获取从fromIndex开始第一个设置为0的bit索引值。

previousSetBit函数

cardinality函数，代码如下

使用场景

猜你喜欢

本文暂时没有评论，来添加一个吧(●'◡'●)

取消回复欢迎 你 发表评论:

网站首页 > 网络编程 > 其它综合正文

取消回复欢迎你发表评论: